Apache Accumulo 2.1.0

01 Nov 2022

About

Apache Accumulo 2.1.0 brings many new features and updates since 1.10 and 2.0. The 2.1 release series is an LTM series, and as such, is expected to receive stability-improving bugfixes, as needed. This makes this series suitable for production environments where stability is preferable over new features that might appear in subsequent non-LTM releases.

This release has received more than 1200 commits from over 50 contributors, including numerous bugfixes, updates, and features.

Minimum Requirements

This version of Accumulo requires at least Java 11 to run. Various Java 11 versions from different distributors were used throughout its testing and development, so we expect it to work with any standard OpenJDK-based Java distribution.

At least Hadoop 3 is required, though it is recommended to use a more recent version. Version 3.3 was used extensively during testing, but we have no specific knowledge that an earlier version of Hadoop 3 will not work. Whichever major/minor version you use, it is recommended to use the latest bugfix/patch version available. By default, our POM depends on 3.3.4.

During much of this release’s development, ZooKeeper 3.5 was used as a minimum. However, that version reach its end-of-life during development, and we do not recommend using end-of-life versions of ZooKeeper. The latest bugfix version of 3.6, 3.7, or 3.8 should also work fine. By default, our POM depends on 3.8.0.

Binary Incompatibility

This release is known to be incompatible with prior versions of the client libraries. That is, the 2.0.0 or 2.0.1 version of the client libraries will not be able to communicate with a 2.1.0 or later installation of Accumulo, nor will the 2.1.0 or later version of the client libraries communicate with a 2.0.1 or earlier installation.

Major New Features

Overhaul of Table Compactions

Significant changes were made to how Accumulo compacts files in this release. See compaction for details, below are some highlights.

  • Multiple concurrent compactions per tablet on disjoint files is now supported. Previously only a single compaction could run on a tablet. This allows tablets that are running long compactions on large files to concurrently compact new smaller files that arrive.
  • Multiple compaction thread pools per tablet server are now supported. Previously only a single thread pool existed within a tablet server for compactions. With a single thread pool, if all threads are working on long compactions it can starve quick compactions. Now compactions with little data can be processed by dedicated thread pools.
  • Accumulo’s default algorithm for selecting files to compact was modified to select the smallest set of files that meet the compaction ratio criteria instead of the largest set. This change makes tablets more aggressive about reducing their number files while still doing logarithmic compaction work. This change also enables efficiently compacting new small files that arrive during a long running compaction.
  • Having dedicated compaction threads pools for tables is now supported through configuration. The default configuration for Accumulo sets up dedicated thread pools for compacting the Accumulo metadata table.
  • Merging minor compactions were dropped. These were added to Accumulo to address the problem of new files arriving while a long running compaction was running. Merging minor compactions could cause O(N^2) compaction work. The new compaction changes in this release can satisfy this use case while doing a logarithmic amount of work.

CompactionStrategy was deprecated in favor of new public APIs. CompactionStrategy was never public API as it used internal types and one of these types FileRef was removed in 2.1. Users who have written a CompactionStrategy can replace FileRef with its replacement internal type StoredTabletFile but this is not recommended. Since it is very likely that CompactionStrategy will be removed in a future release, any work put into rewriting a CompactionStrategy will be lost. It is recommended that users implement CompactionSelector, CompactionConfigurer, and CompactionPlanner instead. The new compaction changes in 2.1 introduce new algorithms for optimally scheduling compactions across multiple thread pools, configuring a deprecated compaction strategy may result is missing out on the benefits of these new algorithms.

See the javadoc for more information.

GitHub tickets related to these changes: #564 #1605 #1609 #1649

External Compactions (experimental)

This feature includes two new optional server components, CompactionCoordinator and Compactor, that enables the user to run major compactions outside of the TabletServer. See design , compaction , and the External Compaction blog post for more information. This work was completed over many tickets, see the GitHub project for the related issues. #2096

Scan Servers (experimental)

This feature includes a new optional server component, Scan Server, that enables the user to run scans outside of the TabletServer. See design , #2411, and #2665 for more information. Importantly, users can utilize this feature to avoid bogging down the TabletServer with long-running scans, slow iterators, etc., provided they are willing to tolerate eventual consistency.

New Per-Table On-Disk Encryption (experimental)

On-disk encryption can now be configured on a per table basis as well as for the entire instance (all tables). See on-disk-encryption for more information.

New jshell entry point

Created new “jshell” convenience entry point. Run bin/accumulo jshell to start up jshell, preloaded with Accumulo classes imported and with an instance of AccumuloClient already created for you to connect to Accumulo (assuming you have a client properties file on the class path) #1870 #1910

Major Improvements

Fixed GC Metadata hotspots

Prior to this release, Accumulo stored GC file candidates in the metadata table using rows of the form ~del<URI>. This row schema lead to uneven load on the metadata table and metadata tablets that were eventually never used. In #1043 / #1344, the row format was changed to ~del<hash(URI)><URI> resulting in even load on the metadata table and even data spread in the tablets. After upgrading, there may still be splits in the metadata table using the old row format. These splits can be merged away as shown in the example below which starts off with splits generated from the old and new row schema. The old splits with the prefix ~delhdfs are merged away.

root@uno> getsplits -t accumulo.metadata
2<
~
~del55
~dela7
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000a0.rf
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000kb.rf
root@uno> merge -t accumulo.metadata -b ~delhdfs -e ~delhdfs~
root@uno> getsplits -t accumulo.metadata
2<
~
~del55
~dela7

Master Renamed to Manager

In order to use more inclusive language in our code, the Accumulo team has renamed all references to the word “master” to “manager” (with the exception of deprecated classes and packages retained for compatibility). This change includes the master server process, configuration properties with master in the name, utilities with master in the name, and packages/classes in the code base. Where these changes affect the public API, the deprecated “master” name will still be supported until Accumulo 3.0.

Important One particular change to be aware of is that certain state for the manager process is stored in ZooKeeper, previously in under a directory named masters. This directory has been renamed to managers, and the upgrade will happen automatically if you launch Accumulo using the provided scripts. However, if you do not use the built in scripts (e.g., accumulo-cluster or accumulo-service), then you will need to perform a one-time upgrade of the ZooKeeper state by executing the RenameMasterDirInZK utility:

  ${ACCUMULO_HOME}/bin/accumulo org.apache.accumulo.manager.upgrade.RenameMasterDirInZK

Some other specific examples of these changes include:

  • All configuration properties starting with master. have been renamed to start with manager. instead. The master.* property names in the site configuration file (or passed on the command-line) are converted internally to the new name, and a warning is printed. However, the old name can still be used until at least the 3.0 release of Accumulo. Any master.* properties that have been set in ZooKeeper will be automatically converted to the new manager.* name when Accumulo is upgraded. The old property names can still be used by the config shell command or via the methods accessible via AccumuloClient, but a warning will be generated when the old names are used. You are encouraged to update all references to master in your site configuration files to manager when installing Accumulo 2.1.
  • The tablet balancers in the org.apache.accumulo.server.master.balancer package have all been relocated to org.apache.accumulo.server.manager.balancer. DefaultLoadBalancer has been also renamed to SimpleLoadBalancer along with the move. The default balancer has been updated from org.apache.accumulo.server.master.balancer.TableLoadBalancer to org.apache.accumulo.server.manager.balancer.TableLoadBalancer, and the default per-table balancer has been updated from org.apache.accumulo.server.master.balancer.DefaultLoadBalancer to org.apache.accumulo.server.manager.balancer.SimpleLoadBalancer. If you have customized the tablet balancer configuration, you are strongly encouraged to update your configuration to reference the updated balancer names. If you have written a custom tablet balancer, it should be updated to implement the new interface org.apache.accumulo.server.manager.balancer.TabletBalancer rather than extending the deprecated abstract org.apache.accumulo.server.master.balancer.TabletBalancer.
  • The configuration file masters for identifying the manager host(s) has been deprecated. If this file is found, a warning will be printed. The replacement file managers should be used (i.e., rename your masters file to managers) instead.
  • The master argument to the accumulo-service script has been deprecated, and the replacement manager argument should be used instead.
  • The -master argument to the org.apache.accumulo.server.util.ZooZap utility has been deprecated and the replacement -manager argument should be used instead.
  • The GetMasterStats utility has been renamed to GetManagerStats.
  • org.apache.accumulo.master.state.SetGoalState is deprecated, and any custom scripts that invoke this utility should be updated to call org.apache.accumulo.manager.state.SetGoalState instead.
  • masterMemory in minicluster.properties has been deprecated and managerMemory should be used instead in any minicluster.properties files you have configured.
  • See also #1640 #1642 #1703 #1704 #1873 #1907

New Tracing Facility

HTrace support was removed in this release and has been replaced with OpenTelemetry. Trace information will not be shown in the monitor. See comments in #2259 for an example of how to configure Accumulo to emit traces to supported OpenTelemetry sinks. #2257

New Metrics Implementation

The Hadoop Metrics2 framework is no longer being used to emit metrics from Accumulo. Accumulo is now using the Micrometer framework. Metric name and type changes have been documented in org.apache.accumulo.core.metrics.MetricsProducer, see the javadoc for more information. See comments in #2305 for an example of how to configure Accumulo to emit metrics to supported Micrometer sinks. #1134

New SPI Package

A new Service Plugin Interface (SPI) package was created in the accumulo-core jar, at org.apache.accumulo.core.spi, under which exists interfaces for the various pluggable components. See #1900 #1905 #1880 #1891 #1426

Minor Improvements

New listtablets Shell Command

A new command was created for debugging called listtablets, that shows detailed tablet information on a single line. This command aggregates data about a tablet such as status, location, size, number of entries and HDFS directory name. It even shows the start and end rows of tablets, displaying them in the same sorted order they are stored in the metadata. See example command output below. #1317 #1821

root@uno> listtablets -t test_ingest -h
2021-01-04T15:12:47,663 [Shell.audit] INFO : root@uno> listtablets -t test_ingest -h
NUM  TABLET_DIR      FILES WALS  ENTRIES   SIZE      STATUS     LOCATION                       ID    START (Exclusive)    END
TABLE: test_ingest
1    t-0000007       1     0            60       552 HOSTED     CURRENT:ip-10-113-12-25:9997   2     -INF                 row_0000000005
2    t-0000006       1     0           500     2.71K HOSTED     CURRENT:ip-10-113-12-25:9997   2     row_0000000005       row_0000000055
3    t-0000008       1     0         5.00K    24.74K HOSTED     CURRENT:ip-10-113-12-25:9997   2     row_0000000055       row_0000000555
4    default_tablet  1     0         4.44K    22.01K HOSTED     CURRENT:ip-10-113-12-25:9997   2     row_0000000555       +INF
root@uno> listtablets -t accumulo.metadata
2021-01-04T15:13:21,750 [Shell.audit] INFO : root@uno> listtablets -t accumulo.metadata
NUM  TABLET_DIR      FILES WALS  ENTRIES   SIZE      STATUS     LOCATION                       ID    START (Exclusive)    END
TABLE: accumulo.metadata
1    table_info      2     0     7         524       HOSTED     CURRENT:ip-10-113-12-25:9997   !0    -INF                 ~
2    default_tablet  0     0     0         0         HOSTED     CURRENT:ip-10-113-12-25:9997   !0    ~                    +INF

New Utility for Generating Splits

A new command line utility was created to generate split points from 1 or more rfiles. One or more HDFS directories can be given as well. The utility will iterate over all the files provided and determine the proper split points based on either the size or number given. It uses Apache Datasketches to get the split points from the data. #2361 #2368

New Option for Cloning Offline

Added option to leave cloned tables offline #1474 #1475

New Max Tablets Option in Bulk Import

The property table.bulk.max.tablets was created in new bulk import technique. This property acts as a cluster performance failsafe to prevent a single ingested file from being distributed across too much of a cluster. The value is enforced by the new bulk import technique and is the maximum number of tablets allowed for one bulk import file. When this property is set, an error will be thrown when the value is exceeded during a bulk import. #1614

New Health Check Thread in TabletServer

A new thread was added to the tablet server to periodically verify tablet metadata. #2320 This thread also prints to the debug log how long it takes the tserver to scan the metadata table. The property tserver.health.check.interval was added to control the frequency at which this health check takes place. #2583

New ability for user to define context classloaders

Deprecated the existing VFS ClassLoader for eventual removal and created a new mechanism for users to load their own classloader implementations. The new VFS classloader and VFS context classloaders are in a new repo and can now be specified using Java’s own system properties. Alternatively, one can set their own classloader (they always could do this). #1747 #1715

Please reference the Known Issues section of the 2.0.1 release notes for an issue affecting the VFSClassLoader.

Change in uncaught Exception/Error handling in server-side threads

Consolidated and normalized thread pool and thread creation. All threads created through this code path will have an UncaughtExceptionHandler attached to it that will log the fact that the Thread encountered an uncaught Exception and is now dead. When an Error is encountered in a server process, it will attempt to print a message to stderr then terminate the VM using Runtime.halt. On the client side, the default UncaughtExceptionHandler will only log the Exception/Error in the client and does not terminate the VM. Additionally, the user has the ability to set their own UncaughtExceptionHandler implementation on the client. #1808 #1818 #2554

Updated hash algorithm

With the default password Authenticator, Accumulo used to store password hashes using SHA-256 and using custom code to add a salt. In this release, we now use Apache commons-codec to store password hashes in the crypt(3) standard format. With this change, we’ve also defaulted to using the stronger SHA-512. Existing stored password hashes (if upgrading from an earlier version of Accumulo) will automatically be upgraded when users authenticate or change their passwords, and Accumulo will log a warning if it detects any passwords have not been upgraded. #1787 #1788 #1798 #1810

Various Performance improvements when deleting tables

  • Make delete table operations cancel user compactions #2030 #2169.
  • Prevent compactions from starting when delete table is called #2182 #2240.
  • Added check to not flush when table is being deleted #1887.
  • Added log message before waiting for deletes to finish #1881.
  • Added code to stop user flush if table is being deleted #1931

New Monitor Pages, Improvements & Features

  • A page was added to the Monitor that lists the active compactions and the longest running active compaction. As an optimization, this page will only fetch data if a user loads the page and will only do so a maximum of once a minute. This optimization was also added for the Active Scans page, along with the addition of a “Fetched” column indicating when the data was retrieved.
  • A new feature was added to the TabletServer page to help users identify which tservers are in recovery mode. When a tserver is recovering, its corresponding row in the TabletServer Status table will be highlighted.
  • A new page was also created for External Compactions that allows users to see the progress of compactions and other details about ongoing compactions (see below).

#2283 #2294 #2358 #2663

External Compactions

External Compactions Details

New tserver scan timeout property

The new property tserver.scan.results.max.timeout was added to allow configuration of the timeout. A bug was discovered where tservers were running out of memory, partially due to this timeout being so short. The default value is 1 second, but now it can be increased. It is the max time for the thrift client handler to wait for scan results before timing out. #2599 #2598

Always choose volumes for new tablet files

In #1389, we changed the behavior of the VolumeChooser. It now runs any time a new file is created. This means VolumeChooser decisions are no longer “sticky” for tablets. This allows tablets to balance their files across multiple HDFS volumes, instead of the first selected. Now, only the directory name is “sticky” for a tablet, but the volume is not. So, new files will appear in a directory named the same on different volumes that the VolumeChooser selects.

Iterators package is now public API

#1390 #1400 #1411 We declared that the core.iterators package is public API, so it will now follow the semver rules for public API.

Better accumulo-gc memory usage

#1543 #1650 Switch from batching file candidates to delete based on the amount of available memory, and instead use a fixed-size batching strategy. This allows the accumulo-gc to run consistently using a batch size that is configurable by the user. The user is responsbile for ensuring the process is given enough memory to accommodate the batch size they configure, but this makes the process much more consistent and predictable.

Log4j2

#1528 #1514 #1515 #1516 While we still use slf4j, we have upgraded the default logger binding to log4j2, which comes with a bunch of features, such as dynamic reconfiguration, colorized console logging, and more.

Added forEach method to Scanner

#1742 #1765 We added a forEach method to Scanner objects, so you can easily iterate over the results using a lambda / BiConsumer that accepts a key-value pair.

New public API to set multiple properties atomically

#2692 We added a new public API added to support setting multiple properties at once atomically using a read-modify-write pattern. This is available for table, namespace, and system properties, and is called modifyProperties(). This builds off a related change that allows us to more efficiently store and properties in ZooKeeper, which also results in fewer ZooKeeper watches.

Simplified cluster configuration

#2138 #2903 Modified the accumulo-cluster script to read the server locations from a single file, cluster.yaml, in the conf directory instead of multiple files (tserver, manager, gc, etc.). Starting the new scan server and compactor server types is supported using this new file. It also contains options for starting multiple Tablet and Scan Servers per host.

Other notable changes

  • #1174 #816 Abstract metadata and change root metadata schema
  • #1309 Explicitly prevent cloning metadata table to prevent poor user experience
  • #1313 #936 Store Root Tablet list of files in Zookeeper
  • #1294 #1299 Add optional -t tablename to importdirectory shell command.
  • #1332 Disable FileSystemMonitor checks of /proc by default (to be removed in future)
  • #1345 #1352 Optionally disable gc-initiated compactions/flushes
  • #1397 #1461 Replace relative paths in the metadata tables on upgrade.
  • #1456 #1457 Prevent catastrophic tserver shutdown by rate limiting the shutdown
  • #1053 #1060 #1576 Support multiple volumes in import table
  • #1568 Support multiple tservers / node in accumulo-service
  • #1644 #1645 Fix issue with minor compaction not retrying
  • #1660 Dropped unused MemoryManager property
  • #1764 #1783 Parallelize listcompactions in shell
  • #1797 Add table option to shell delete command.
  • #2039 #2045 Add bulk import option to ignore empty dirs
  • #2117 #2236 Make sorted recovery write to RFiles. New tserver.wal.sort.file. property to configure
  • #2076 Sorted recovery files can now be encrypted
  • #2441 Upgraded to Junit 5
  • #2462 Added SUBMITTED FaTE status to differentiate between things submitted vs. running
  • #2467 Added fate shell command option to cancel FaTE operations that are NEW or SUBMITTED
  • #2807 Added several troubleshooting utilities to the accumulo admin command.
  • #2820 #2900 du command performance improved by using the metadata table for computation instead of HDFS
  • #2966 Upgrade Thrift to 0.17.0

Upgrading

View the Upgrading Accumulo documentation for guidance.

2.1.0 GitHub Project

All tickets related to 2.1.0.

Known Issues

At the time of release, the following issues were known:

  • #3045 - External compactions may appear stuck until the coordinator is restarted
  • #3048 - The monitor may not show times in the correct format for the user’s locale
  • #3053 - ThreadPool creation is a bit spammy by default in the debug logs
  • #3057 - The monitor may have an annoying popup on the external compactions page if the coordinator is offline

View all releases in the archive