Apache Accumulo 2.1.0
19 Jan 2020
These are draft release notes for a future release of Accumulo!
Please view the latest release notes.
This version is a non-LTM release (not a Long Term Maintenance release).
** DRAFT RELEASE NOTES **
This release is known to be incompatible with prior versions of the client libraries. That is, the 2.0.0 or 2.0.1 version of the client libraries will not be able to communicate with a 2.1.0 or later installation of Accumulo, nor will the 2.1.0 or later version of the client libraries communicate with a 2.0.1 or earlier installation.
Significant changes were made to how Accumulo compacts files in this release. See compaction for details; some highlights are listed below.
- Multiple concurrent compactions per tablet on disjoint files are now supported. Previously only a single compaction could run on a tablet at a time. This allows tablets that are running long compactions on large files to concurrently compact new smaller files that arrive.
- Multiple compaction thread pools per tablet server are now supported. Previously only a single thread pool existed within a tablet server for compactions. With a single thread pool, if all threads are working on long compactions it can starve quick compactions. Now compactions with little data can be processed by dedicated thread pools.
- Accumulo’s default algorithm for selecting files to compact was modified to select the smallest set of files that meets the compaction ratio criteria instead of the largest set. This change makes tablets more aggressive about reducing their number of files while still doing logarithmic compaction work. This change also enables efficiently compacting new small files that arrive during a long running compaction.
- Having dedicated compaction thread pools for tables is now supported through configuration. The default configuration for Accumulo sets up dedicated thread pools for compacting the Accumulo metadata table.
- Merging minor compactions were dropped. These were added to Accumulo to address the problem of new files arriving while a long running compaction was running. Merging minor compactions could cause O(N^2) compaction work. The new compaction changes in this release can satisfy this use case while doing a logarithmic amount of work.
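The file selection change in the list above can be pictured with a small sketch. It is an illustration only, not Accumulo's planner code. It assumes the usual reading of the compaction ratio (a set of files qualifies when the sum of its file sizes is at least the ratio times the size of its largest file) and simplifies "smallest set" to the shortest run of the smallest files that qualifies. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; names are hypothetical and this is not Accumulo's planner.
class CompactionRatioSketch {

  // Assumed criterion: at least two files, and sum(sizes) >= ratio * max(sizes).
  static boolean meetsRatio(List<Long> sizes, double ratio) {
    long sum = 0;
    long max = 0;
    for (long s : sizes) {
      sum += s;
      max = Math.max(max, s);
    }
    return sizes.size() > 1 && sum >= ratio * max;
  }

  // Sorts sizes ascending and returns the shortest prefix (of two or more files)
  // that meets the ratio criterion, or an empty list if no prefix qualifies.
  static List<Long> selectSmallestSet(List<Long> fileSizes, double ratio) {
    List<Long> sorted = new ArrayList<>(fileSizes);
    Collections.sort(sorted);
    for (int end = 2; end <= sorted.size(); end++) {
      List<Long> candidate = sorted.subList(0, end);
      if (meetsRatio(candidate, ratio)) {
        return new ArrayList<>(candidate);
      }
    }
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    // Two large files plus three small files that arrived recently.
    List<Long> sizes = List.of(1000L, 10L, 12L, 11L, 300L);
    System.out.println(selectSmallestSet(sizes, 2.0)); // prints [10, 11, 12]
  }
}
```

In this example the three small files are selected without touching the large files, which is the behavior the notes describe for small files arriving during a long running compaction.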
CompactionStrategy was deprecated in favor of new public APIs. See its javadoc for more information.
Fixed GC Metadata hotspots
Prior to this release, Accumulo stored GC file candidates in the metadata table using rows of the form ~del<URI>. This row schema led to uneven load on the metadata table and to metadata tablets that were eventually never used. In #1043 the row format was changed to ~del<hash(URI)><URI>, resulting in even load on the metadata table and an even spread of data across the tablets. After upgrading, there may still be splits in the metadata table using the old row format. These splits can be merged away as shown in the example below, which starts off with splits generated from both the old and new row schemas. The old splits with the prefix ~delhdfs are merged away.
root@uno> getsplits -t accumulo.metadata
2<
~
~del55
~dela7
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000a0.rf
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000kb.rf
root@uno> merge -t accumulo.metadata -b ~delhdfs -e ~delhdfs~
root@uno> getsplits -t accumulo.metadata
2<
~
~del55
~dela7
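To see why the hashed prefix spreads load, the following sketch builds both row forms for a file URI. The hash used here (the low byte of a CRC32, written as two hex digits) is purely an assumption for illustration; the actual hash is defined by the change in #1043 and may differ. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch only; the hash function is an assumption, not the one from #1043.
class GcCandidateRowSketch {
  static final String PREFIX = "~del";

  // Hypothetical hash: low byte of the CRC32 of the URI, as two lowercase hex digits.
  static String hashPrefix(String uri) {
    CRC32 crc = new CRC32();
    crc.update(uri.getBytes(StandardCharsets.UTF_8));
    return String.format("%02x", crc.getValue() & 0xff);
  }

  // Old schema: ~del<URI>. Rows sort by URI, so they cluster under shared URI prefixes.
  static String oldRow(String uri) {
    return PREFIX + uri;
  }

  // New schema: ~del<hash(URI)><URI>. The hash comes first, so rows scatter across the range.
  static String newRow(String uri) {
    return PREFIX + hashPrefix(uri) + uri;
  }

  public static void main(String[] args) {
    String base = "hdfs://localhost:8020/accumulo/tables/2/default_tablet/";
    for (String file : new String[] {"F00000a0.rf", "F00000kb.rf"}) {
      System.out.println(oldRow(base + file));
      System.out.println(newRow(base + file));
    }
  }
}
```

Because every old row begins with ~delhdfs, all candidates sort into one narrow range of the metadata table, while the hashed rows are distributed by their leading hash characters.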
Master Renamed to Manager
In order to support more inclusive language in our code, the Accumulo team has renamed all references to the word “master” to “manager” (with the exception of deprecated classes and packages retained for compatibility). This change includes the master process, configuration properties with master in the name, utilities with master in the name, and packages/classes in the code base. Where these changes affect the public API, the deprecated “master” name will still be supported until at least Accumulo 3.0.
Important: One particular change to be aware of is that certain state for the manager process is stored in ZooKeeper, previously under a directory named masters. This directory has been renamed to managers, and the upgrade will happen automatically if you launch Accumulo using the provided scripts. However, if you do not use the built-in scripts (e.g., accumulo-cluster or accumulo-service), then you will need to perform a one-time upgrade of the ZooKeeper state by executing the
Some other specific examples of these changes include:
- All configuration properties with the prefix master. have been renamed to use the prefix manager. instead. Any master.* property names in the site configuration file (or passed on the command-line) are converted internally to the new name, and a warning is printed. However, the old name can still be used until at least the 3.0 release of Accumulo. Any master.* properties that have been set in ZooKeeper will be automatically converted to the new manager.* name when Accumulo is upgraded. The old property names can still be used by the config shell command or via the methods accessible via AccumuloClient, but a warning will be generated when the old names are used. You are encouraged to update all references to master in your site configuration files to manager when installing Accumulo 2.1.
- The tablet balancers in the org.apache.accumulo.server.master.balancer package have all been relocated to org.apache.accumulo.server.manager.balancer. DefaultLoadBalancer has also been renamed to SimpleLoadBalancer along with the move. The default balancer is now org.apache.accumulo.server.manager.balancer.TableLoadBalancer, and the default per-table balancer is now org.apache.accumulo.server.manager.balancer.SimpleLoadBalancer. If you have customized the tablet balancer configuration, you are strongly encouraged to update your configuration to reference the updated balancer names. If you have written a custom tablet balancer, it should be updated to implement the new interface org.apache.accumulo.server.manager.balancer.TabletBalancer rather than extending the deprecated abstract class.
- The configuration file masters for identifying the manager host(s) has been deprecated. If this file is found, a warning will be printed. The replacement file managers should be used instead (i.e., rename your masters file to managers).
- The master argument to the accumulo-service script has been deprecated, and the replacement manager argument should be used instead.
- The -master argument to the org.apache.accumulo.server.util.ZooZap utility has been deprecated, and the replacement -manager argument should be used instead.
- The GetMasterStats utility has been renamed to
- org.apache.accumulo.master.state.SetGoalState is deprecated, and any custom scripts that invoke this utility should be updated to call
- minicluster.properties has been deprecated and managerMemory should be used instead in any minicluster.properties files you have configured.
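The property renaming described in the list above (old master.* names converted to manager.* names, with a warning) can be pictured with the following sketch. It is not Accumulo's implementation; the class name, method name, and warning text are hypothetical, and the property keys in the example are used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not Accumulo's conversion code.
class PropertyRenameSketch {
  static final String OLD_PREFIX = "master.";
  static final String NEW_PREFIX = "manager.";

  // Returns a copy of props in which every key starting with "master." is rewritten
  // to start with "manager.". One warning is appended per renamed key.
  // If both the old and new key are present, the later entry wins in this sketch.
  static Map<String, String> convert(Map<String, String> props, List<String> warnings) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : props.entrySet()) {
      String key = e.getKey();
      if (key.startsWith(OLD_PREFIX)) {
        String renamed = NEW_PREFIX + key.substring(OLD_PREFIX.length());
        warnings.add("Deprecated property " + key + " replaced by " + renamed);
        key = renamed;
      }
      out.put(key, e.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> site = new LinkedHashMap<>();
    site.put("master.port.client", "9999");
    site.put("tserver.port.client", "9997");
    List<String> warnings = new ArrayList<>();
    System.out.println(convert(site, warnings));
    warnings.forEach(System.out::println);
  }
}
```

A similar pass over your own site configuration files is one way to find the references that the notes encourage you to update when installing Accumulo 2.1.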