Apache Accumulo 2.1.0
19 Jan 2020
These are draft release notes for a future release of Accumulo!
Please view the latest release notes.
This version is not a Long Term Maintenance (LTM) release.
** DRAFT RELEASE NOTES **
Binary Incompatibility
This release is known to be incompatible with prior versions of the client libraries. That is, the 2.0.0 or 2.0.1 version of the client libraries will not be able to communicate with a 2.1.0 or later installation of Accumulo, nor will the 2.1.0 or later version of the client libraries communicate with a 2.0.1 or earlier installation.
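For deployments that mix client and server versions, the boundary described above can be checked before connecting. The following is a minimal sketch, not part of Accumulo: the function names are hypothetical, it expects plain numeric version strings such as 2.0.1, and it assumes any version before 2.1.0 sits on the old side of the boundary. It says nothing about compatibility between two versions on the same side.

```python
# Hypothetical pre-flight check; these helpers are not part of Accumulo.

BOUNDARY = (2, 1, 0)  # first release on the new side of the incompatibility


def parse_version(text):
    """Turn a plain numeric version such as '2.0.1' into (2, 0, 1)."""
    return tuple(int(part) for part in text.split("."))


def known_incompatible(client_version, server_version):
    """Return True when exactly one side is 2.1.0 or later."""
    client_new = parse_version(client_version) >= BOUNDARY
    server_new = parse_version(server_version) >= BOUNDARY
    return client_new != server_new


if __name__ == "__main__":
    for client, server in [("2.0.1", "2.1.0"), ("2.1.0", "2.0.0"), ("2.1.0", "2.1.1")]:
        print(client, server, known_incompatible(client, server))
```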
Notable Changes
Compaction Changes
Significant changes were made to how Accumulo compacts files in this release. See compaction for details; some highlights are listed below.
- Multiple concurrent compactions per tablet on disjoint files are now supported. Previously only a single compaction could run on a tablet. This allows tablets that are running long compactions on large files to concurrently compact new smaller files that arrive.
- Multiple compaction thread pools per tablet server are now supported. Previously only a single thread pool existed within a tablet server for compactions. With a single thread pool, if all threads are working on long compactions, quick compactions can be starved. Now compactions with little data can be processed by dedicated thread pools.
- Accumulo’s default algorithm for selecting files to compact was modified to select the smallest set of files that meet the compaction ratio criteria instead of the largest set. This change makes tablets more aggressive about reducing their number of files while still doing logarithmic compaction work. This change also enables efficiently compacting new small files that arrive during a long running compaction.
- Having dedicated compaction thread pools for tables is now supported through configuration. The default configuration for Accumulo sets up dedicated thread pools for compacting the Accumulo metadata table.
- Merging minor compactions were dropped. These were added to Accumulo to address the problem of new files arriving while a long running compaction was running. Merging minor compactions could cause O(N^2) compaction work. The new compaction changes in this release can satisfy this use case while doing a logarithmic amount of work.
CompactionStrategy was deprecated in favor of new public APIs. See its javadoc for more information.
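The difference between selecting the smallest and the largest qualifying set of files can be illustrated with a simplified model. The sketch below is an assumption-laden illustration, not Accumulo's implementation: it only considers prefixes of the files sorted by size, and it treats a set as meeting the compaction ratio when the sum of its file sizes is larger than the ratio multiplied by the size of its largest file. All function names are hypothetical.

```python
# Simplified model of ratio-based file selection; not Accumulo's actual code.

def qualifies(sizes, ratio):
    """A set qualifies when its total size exceeds ratio * its largest file."""
    return len(sizes) >= 2 and sum(sizes) > ratio * max(sizes)


def candidate_sets(file_sizes, ratio):
    """Yield each prefix of the size-sorted files that meets the ratio."""
    ordered = sorted(file_sizes)
    for count in range(2, len(ordered) + 1):
        prefix = ordered[:count]
        if qualifies(prefix, ratio):
            yield prefix


def select_smallest(file_sizes, ratio):
    """Return the first (smallest) qualifying prefix, or [] if none exists."""
    return next(candidate_sets(file_sizes, ratio), [])


def select_largest(file_sizes, ratio):
    """Return the last (largest) qualifying prefix, or [] if none exists."""
    chosen = []
    for prefix in candidate_sets(file_sizes, ratio):
        chosen = prefix
    return chosen


if __name__ == "__main__":
    sizes = [1000, 1000, 1000, 10, 10, 10, 10]
    print(select_smallest(sizes, 2.5))  # [10, 10, 10]
    print(select_largest(sizes, 2.5))   # [10, 10, 10, 10, 1000, 1000, 1000]
```

In this model the smallest-set policy rewrites 30 units of data to remove two files, while the largest-set policy rewrites all 3040 units in one long compaction, during which newly arriving small files would have to wait.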
Fixed GC Metadata hotspots
Prior to this release, Accumulo stored GC file candidates in the metadata table using rows of the form ~del<URI>. This row schema led to uneven load on the metadata table and to metadata tablets that eventually were never used. In #1043 the row format was changed to ~del<hash(URI)><URI>, resulting in even load on the metadata table and an even spread of data across the tablets. After upgrading, there may still be splits in the metadata table using the old row format. These splits can be merged away as shown in the example below, which starts off with splits generated from both the old and new row schemas. The old splits with the prefix ~delhdfs are merged away.
root@uno> getsplits -t accumulo.metadata
2<
~
~del55
~dela7
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000a0.rf
~delhdfs://localhost:8020/accumulo/tables/2/default_tablet/F00000kb.rf
root@uno> merge -t accumulo.metadata -b ~delhdfs -e ~delhdfs~
root@uno> getsplits -t accumulo.metadata
2<
~
~del55
~dela7
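The effect of the new row format can be sketched in a few lines. This is an illustration only: the release notes do not say which hash function Accumulo uses, so CRC32 rendered as eight hex digits stands in for hash(URI) here, and the helper names are hypothetical. The point is that old-format rows all share the long ~delhdfs://... prefix and sort next to each other, while new-format rows begin with hex digits derived from the URI (compare the ~del55 and ~dela7 splits above).

```python
import zlib

DEL_PREFIX = "~del"


def old_row(uri):
    """Old schema: ~del<URI>. Rows for one filesystem cluster together."""
    return DEL_PREFIX + uri


def new_row(uri):
    """New schema: ~del<hash(URI)><URI>.

    CRC32 is a stand-in hash chosen for this sketch (an assumption);
    Accumulo's real hash function and prefix width may differ.
    """
    digest = format(zlib.crc32(uri.encode("utf-8")), "08x")
    return DEL_PREFIX + digest + uri


if __name__ == "__main__":
    base = "hdfs://localhost:8020/accumulo/tables/2/default_tablet/"
    for name in ("F00000a0.rf", "F00000kb.rf"):
        print(old_row(base + name))
        print(new_row(base + name))
```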
Master Renamed to Manager
In order to support more inclusive language in our code, the Accumulo team has renamed all references to the word “master” to “manager” (with the exception of deprecated classes and packages retained for compatibility). This change includes the master process, configuration properties with master in the name, utilities with master in the name, and packages/classes in the code base. Where these changes affect the public API, the deprecated “master” name will still be supported until at least Accumulo 3.0.
Important: One particular change to be aware of is that certain state for the manager process is stored in ZooKeeper, previously under a directory named masters. This directory has been renamed to managers, and the upgrade will happen automatically if you launch Accumulo using the provided scripts. However, if you do not use the built-in scripts (e.g., accumulo-cluster or accumulo-service), then you will need to perform a one-time upgrade of the ZooKeeper state by executing the RenameMasterDirInZK utility:
${ACCUMULO_HOME}/bin/accumulo org.apache.accumulo.manager.upgrade.RenameMasterDirInZK
Some other specific examples of these changes include:
- All configuration properties starting with master. have been renamed to start with manager. instead. The master.* property names in the site configuration file (or passed on the command-line) are converted internally to the new name, and a warning is printed. However, the old name can still be used until at least the 3.0 release of Accumulo. Any master.* properties that have been set in ZooKeeper will be automatically converted to the new manager.* name when Accumulo is upgraded. The old property names can still be used by the config shell command or via the methods accessible via AccumuloClient, but a warning will be generated when the old names are used. You are encouraged to update all references to master in your site configuration files to manager when installing Accumulo 2.1.
- The tablet balancers in the org.apache.accumulo.server.master.balancer package have all been relocated to org.apache.accumulo.server.manager.balancer. DefaultLoadBalancer has also been renamed to SimpleLoadBalancer along with the move. The default balancer has been updated from org.apache.accumulo.server.master.balancer.TableLoadBalancer to org.apache.accumulo.server.manager.balancer.TableLoadBalancer, and the default per-table balancer has been updated from org.apache.accumulo.server.master.balancer.DefaultLoadBalancer to org.apache.accumulo.server.manager.balancer.SimpleLoadBalancer. If you have customized the tablet balancer configuration, you are strongly encouraged to update your configuration to reference the updated balancer names. If you have written a custom tablet balancer, it should be updated to implement the new interface org.apache.accumulo.server.manager.balancer.TabletBalancer rather than extending the deprecated abstract class org.apache.accumulo.server.master.balancer.TabletBalancer.
- The configuration file masters for identifying the manager host(s) has been deprecated. If this file is found, a warning will be printed. The replacement file managers should be used instead (i.e., rename your masters file to managers).
- The master argument to the accumulo-service script has been deprecated, and the replacement manager argument should be used instead.
- The -master argument to the org.apache.accumulo.server.util.ZooZap utility has been deprecated, and the replacement -manager argument should be used instead.
- The GetMasterStats utility has been renamed to GetManagerStats.
- org.apache.accumulo.master.state.SetGoalState is deprecated, and any custom scripts that invoke this utility should be updated to call org.apache.accumulo.manager.state.SetGoalState instead.
- masterMemory in minicluster.properties has been deprecated, and managerMemory should be used instead in any minicluster.properties files you have configured.
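When updating site configuration by hand, the property and balancer renames listed above amount to a simple mapping. The sketch below is a hypothetical helper, not an Accumulo tool: it takes properties as a plain dictionary, rewrites the master. prefix to manager., replaces the two default balancer class names named above, and emits a warning for each renamed property, loosely mirroring the internal conversion described in the first bullet. The property names in the usage example are illustrative inputs.

```python
import warnings

OLD_PREFIX = "master."
NEW_PREFIX = "manager."

# Balancer class renames taken from the list above.
BALANCER_RENAMES = {
    "org.apache.accumulo.server.master.balancer.TableLoadBalancer":
        "org.apache.accumulo.server.manager.balancer.TableLoadBalancer",
    "org.apache.accumulo.server.master.balancer.DefaultLoadBalancer":
        "org.apache.accumulo.server.manager.balancer.SimpleLoadBalancer",
}


def migrate_properties(props):
    """Return a copy of props with old names and balancer classes updated."""
    migrated = {}
    for name, value in props.items():
        if name.startswith(OLD_PREFIX):
            new_name = NEW_PREFIX + name[len(OLD_PREFIX):]
            warnings.warn(f"{name} is deprecated; use {new_name}", DeprecationWarning)
            name = new_name
        # Values that are not a known old balancer class pass through unchanged.
        migrated[name] = BALANCER_RENAMES.get(value, value)
    return migrated


if __name__ == "__main__":
    site = {
        "master.tablet.balancer":
            "org.apache.accumulo.server.master.balancer.TableLoadBalancer",
        "tserver.port.client": "9997",
    }
    for key, val in migrate_properties(site).items():
        print(f"{key}={val}")
```

A custom balancer class name is left untouched by this mapping; such classes need a code change to the new interface, as noted above.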