Apache Accumulo 1.6.6

18 Sep 2016

Apache Accumulo 1.6.6 is a maintenance release on the 1.6 version branch. This release contains changes from more than 40 issues, comprised of bug-fixes, performance improvements, build quality improvements, and more. See JIRA for a complete list.

Below are resources for this release:

Users of any previous 1.6.x release are strongly encouraged to update as soon as possible to benefit from the improvements with very little concern in change of underlying functionality.

As of this release, active development has ceased for the 1.6 release line, so users should consider upgrading to a newer, actively maintained version when they can. While the developers may release another 1.6 version to address a severe issue, there’s a strong possibility that this will be the last 1.6 release. That would also mean that this will be the last Accumulo version to support Java 6 and Hadoop 1.

Highlights

Write-Ahead Logs can be prematurely deleted

There were cases where the Accumulo Garbage Collector may inadvertently delete a WAL for a tablet server that it has erroneously determined to be down, causing data loss. This has been corrected. See ACCUMULO-4157 for additional detail.

Upgrade to Commons-VFS 2.1

UPDATE (20161202): This change was reverted prior to releasing 1.6.6, because it broke the build with Hadoop 1. Hadoop 1 support was dropped in 1.7.0 and later, so builds were not affected in those branches. It is still possible to Apache Commons VFS 2.1, but you may need backport the patch. See ACCUMULO-3470 for details.

Upgrading to Apache Commons VFS 2.1 fixes several issues with classloading out of HDFS. For further detail see ACCUMULO-4146. Additional fixes to a potential HDFS class loading deadlock situation were made in ACCUMULO-4341.

Native Map failed to increment mutation count properly

There was a bug (ACCUMULO-4148) where multiple put calls with identical keys and no timestamp would exhibit different behaviour depending on whether native maps were enabled or not. This behaviour would result in hidden mutations with native maps, and has been corrected.

Open WAL files could prevent DataNode decommission

An improvement was introduced to allow a max age before WAL files would be automatically rolled. Without a max age, they could stay open for writing indefinitely, blocking the Hadoop DataNode decommissioning process. For more information, see ACCUMULO-4004.

Remove unnecessary copy of cached RFile index blocks

Accumulo maintains an cache for file blocks in-memory as a performance optimization. This can be done safely because Accumulo RFiles are immutable, thus their blocks are also immutable. There are two types of these blocks: index and data blocks. Index blocks refer to the b-tree style index inside of each Accumulo RFile, while data blocks contain the sorted Key-Value pairs. In previous versions, when Accumulo extracted an Index block from the in-memory cache, it would copy the data. ACCUMULO-4164 removes this unnecessary copy as the contents are immutable and can be passed by reference. Ensuring that the Index blocks are not copied when accessed from the cache is a big performance gain at the file-access level.

Analyze Key-length to avoid choosing large Keys for RFile Index blocks

Accumulo’s RFile index blocks are made up of a Key which exists in the file and points to that specific location in the corresponding RFile data block. Thus, the size of the RFile index blocks is largely dominated by the size of the Keys which are used by the index. ACCUMULO-4314 is an improvement which uses statistics on the length of the Keys in the Rfile to avoid choosing Keys for the index whose length is greater than three standard deviations for the RFile. By choosing smaller Keys for the index, Accumulo can access the RFile index faster and keep more Index blocks cached in memory. Initial tests showed that with this change, the RFile index size was nearly cut in half.

Gson version bump

Due to an upstream bug with Gson 2.2.2, we’ve bumped our bundled dependency (ACCUMULO-4345) to version 2.2.4. Please take note of this when you upgrade, if you were using the version shipped with Accumulo, and were relying on the buggy behavior in the previous version in your own code.

Minor performance improvements.

A performance issue was identified and corrected (ACCUMULO-1755) where the BatchWriter would block calls to addMutation while looking up destination tablet server metadata. The writer has been fixed to allow both operations in parallel.

Other Notable Changes

  • ACCUMULO-4155 No longer publish javadoc for non-public API to website. (Still available in javadoc jars in maven)
  • ACCUMULO-4334 Ingest rates reported through JMX did not match rates reported by Monitor.
  • ACCUMULO-4335 Error conditions that result in a Halt should ensure non-zero process exit code.

Testing

Each unit and functional test only runs on a single node, while the RandomWalk and Continuous Ingest tests run on any number of nodes. Agitation refers to randomly restarting Accumulo processes and Hadoop Datanode processes, and, in HDFS High-Availability instances, forcing NameNode failover.

OS/Environment Hadoop Nodes ZooKeeper HDFS HA Tests
CentOS 7 1.2.1 1 3.3.6 No Unit tests and Integration Tests
CentOS 7 2.2.0 1 3.3.6 No Unit tests and Integration Tests

View all releases in the archive