Apache Accumulo 2.1.0

19 Jan 2020

** DRAFT RELEASE NOTES **

Binary Incompatibility

This release is known to be incompatible with prior versions of the client libraries. That is, the 2.0.0 or 2.0.1 version of the client libraries will not be able to communicate with a 2.1.0 or later installation of Accumulo, nor will the 2.1.0 or later version of the client libraries be able to communicate with a 2.0.1 or earlier installation.

Notable Changes

Compaction Changes

Significant changes were made to how Accumulo compacts files in this release. See the compaction documentation for details; the highlights are listed below.

  • Multiple concurrent compactions per tablet on disjoint files are now supported. Previously only a single compaction could run on a tablet at a time. This allows tablets that are running long compactions on large files to concurrently compact new smaller files as they arrive.
  • Multiple compaction thread pools per tablet server are now supported. Previously only a single thread pool existed within a tablet server for compactions. With a single thread pool, long compactions that occupy every thread can starve quick compactions. Now compactions with little data can be processed by dedicated thread pools.
  • Accumulo’s default algorithm for selecting files to compact was modified to select the smallest set of files that meets the compaction ratio criteria instead of the largest set. This change makes tablets more aggressive about reducing their number of files while still doing a logarithmic amount of compaction work. This change also enables efficient compaction of new small files that arrive during a long-running compaction.
  • Dedicated compaction thread pools for tables are now supported through configuration. The default configuration for Accumulo sets up dedicated thread pools for compacting the Accumulo metadata table.
  • Merging minor compactions were removed. These were added to Accumulo to address the problem of new files arriving while a long-running compaction was in progress. Merging minor compactions could cause O(N^2) compaction work. The new compaction changes in this release satisfy this use case while doing a logarithmic amount of work.
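The file-selection change described in the list above can be illustrated with a small shell sketch. It assumes the compaction ratio criterion is that the total size of a candidate set must be at least the ratio times the size of the largest file in that set, and it only considers sets built from the smallest files upward. The function name pick_smallest_set and this simplification are hypothetical, chosen for illustration; this is not Accumulo's actual planner code.

```shell
# pick_smallest_set RATIO SIZE...
# Illustrative only: prints the smallest group of the smallest files whose
# combined size is at least RATIO times the largest file in the group.
# ASSUMPTION: integer ratio and sizes; the real planner is more involved.
pick_smallest_set() {
  pss_ratio=$1
  shift
  # Consider files from smallest to largest.
  pss_sorted=$(printf '%s\n' "$@" | sort -n)
  pss_sum=0
  pss_count=0
  pss_set=""
  for pss_size in $pss_sorted; do
    pss_sum=$((pss_sum + pss_size))
    pss_count=$((pss_count + 1))
    pss_set="${pss_set:+$pss_set }$pss_size"
    # The file just added is the largest in the group so far.
    if [ "$pss_count" -ge 2 ] && [ "$pss_sum" -ge $((pss_ratio * pss_size)) ]; then
      printf '%s\n' "$pss_set"
      return 0
    fi
  done
  # No group of two or more files meets the ratio.
  return 1
}
```

For example, with a ratio of 3 and file sizes 1000, 10, 10 and 10, calling pick_smallest_set 3 1000 10 10 10 selects the three small files and leaves the large file alone, which mirrors how small new files can be compacted while a large file is handled separately.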

CompactionStrategy was deprecated in favor of new public APIs. See its javadoc for more information.

Fixed GC Metadata hotspots

Prior to this release, Accumulo stored GC file candidates in the metadata table using rows of the form ~del<URI>. This row schema led to uneven load on the metadata table and to metadata tablets that eventually went unused. In #1043 the row format was changed to ~del<hash(URI)><URI>, resulting in even load on the metadata table and an even spread of data across its tablets. After upgrading, there may still be splits in the metadata table that use the old row format. These splits can be merged away, as shown in the example below, which starts with splits generated from both the old and new row schemas. The old splits with the prefix ~delhdfs are merged away.

root@uno> getsplits -t accumulo.metadata 
2<
~
~del55
~dela7
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000a0.rf
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000kb.rf
root@uno> merge -t accumulo.metadata -b ~delhdfs -e ~delhdfs~
root@uno> getsplits -t accumulo.metadata 
2<
~
~del55
~dela7
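To see why the new row format spreads load, the following shell sketch builds rows of the form ~del<hash(URI)><URI>. The release notes do not say which hash function or prefix length Accumulo uses, so this sketch substitutes a two-hex-digit prefix derived from the POSIX cksum CRC purely as a stand-in; the function name gc_candidate_row is also hypothetical.

```shell
# gc_candidate_row URI
# Illustrative only: prints a row of the form ~del<hash(URI)><URI>.
# ASSUMPTION: the cksum CRC reduced to two hex digits stands in for the
# real hash, which is not specified in these notes.
gc_candidate_row() {
  gcr_crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  printf '~del%02x%s\n' $((gcr_crc % 256)) "$1"
}
```

Because the prefix depends on the whole URI, candidates for files in the same directory sort into different parts of the ~del range instead of clustering under a common ~delhdfs://... prefix.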

Master Renamed to Manager

In order to support more inclusive language in our code, the Accumulo team has renamed all references to the word “master” to “manager” (with the exception of deprecated classes and packages retained for compatibility). This change includes the master process, configuration properties with master in the name, utilities with master in the name, and packages/classes in the code base. Where these changes affect the public API, the deprecated “master” name will still be supported until at least Accumulo 3.0.

Important: One particular change to be aware of is that certain state for the manager process is stored in ZooKeeper, previously under a directory named masters. This directory has been renamed to managers, and the upgrade will happen automatically if you launch Accumulo using the provided scripts. However, if you do not use the built-in scripts (e.g., accumulo-cluster or accumulo-service), then you will need to perform a one-time upgrade of the ZooKeeper state by executing the RenameMasterDirInZK utility:

  ${ACCUMULO_HOME}/bin/accumulo org.apache.accumulo.manager.upgrade.RenameMasterDirInZK

Some other specific examples of these changes include:

  • All configuration properties starting with master. have been renamed to start with manager. instead. The master.* property names in the site configuration file (or passed on the command line) are converted internally to the new names, and a warning is printed. The old names can still be used until at least the 3.0 release of Accumulo. Any master.* properties that have been set in ZooKeeper will be automatically converted to the new manager.* names when Accumulo is upgraded. The old property names can still be used with the config shell command or through the methods accessible via AccumuloClient, but a warning will be generated when they are used. You are encouraged to update all references to master in your site configuration files to manager when installing Accumulo 2.1.
  • The tablet balancers in the org.apache.accumulo.server.master.balancer package have all been relocated to org.apache.accumulo.server.manager.balancer. DefaultLoadBalancer has also been renamed to SimpleLoadBalancer along with the move. The default balancer has been updated from org.apache.accumulo.server.master.balancer.TableLoadBalancer to org.apache.accumulo.server.manager.balancer.TableLoadBalancer, and the default per-table balancer has been updated from org.apache.accumulo.server.master.balancer.DefaultLoadBalancer to org.apache.accumulo.server.manager.balancer.SimpleLoadBalancer. If you have customized the tablet balancer configuration, you are strongly encouraged to update your configuration to reference the updated balancer names. If you have written a custom tablet balancer, it should be updated to implement the new interface org.apache.accumulo.server.manager.balancer.TabletBalancer rather than extending the deprecated abstract class org.apache.accumulo.server.master.balancer.TabletBalancer.
  • The configuration file masters for identifying the manager host(s) has been deprecated. If this file is found, a warning will be printed. The replacement file managers should be used (i.e., rename your masters file to managers) instead.
  • The master argument to the accumulo-service script has been deprecated, and the replacement manager argument should be used instead.
  • The -master argument to the org.apache.accumulo.server.util.ZooZap utility has been deprecated and the replacement -manager argument should be used instead.
  • The GetMasterStats utility has been renamed to GetManagerStats.
  • org.apache.accumulo.master.state.SetGoalState is deprecated, and any custom scripts that invoke this utility should be updated to call org.apache.accumulo.manager.state.SetGoalState instead.
  • masterMemory in minicluster.properties has been deprecated and managerMemory should be used instead in any minicluster.properties files you have configured.
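The property and balancer renames listed above can be applied to a site configuration file with a short sed filter. The following is a minimal sketch, assuming a properties-style file with one key=value pair per line; the function name rename_master_props is hypothetical, and the output should be reviewed before it replaces a live configuration file.

```shell
# rename_master_props FILE
# Illustrative only: prints FILE with
#   - property keys starting with "master." rewritten to "manager."
#   - DefaultLoadBalancer in the old balancer package rewritten to
#     SimpleLoadBalancer in the new package
#   - any other class in the old balancer package moved to the new package
# ASSUMPTION: one key=value property per line.
rename_master_props() {
  sed -E \
    -e 's/^([[:space:]]*)master\./\1manager./' \
    -e 's/server\.master\.balancer\.DefaultLoadBalancer/server.manager.balancer.SimpleLoadBalancer/g' \
    -e 's/server\.master\.balancer\./server.manager.balancer./g' \
    "$1"
}
```

The filter writes to standard output so the result can be compared with the original file before anything is overwritten. Only keys at the start of a line are renamed, so a value such as a host name that happens to contain "master." is left unchanged.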
