Apache Accumulo 1.5.1

06 Mar 2014

Apache Accumulo 1.5.1 is a maintenance release on the 1.5 version branch. This release contains changes from over 200 issues, comprised of bug fixes (client side and server side), new test cases, and updated Hadoop support contributed by over 30 different contributors and committers.

Below are resources for this release:

As this is a maintenance release, Apache Accumulo 1.5.1 has no client API incompatibilities over Apache Accumulo 1.5.0 and requires no manual upgrade process. Users of 1.5.0 are strongly encouraged to update as soon as possible to benefit from the improvements.

Notable Improvements

While new features are typically not added in a bug-fix release as 1.5.1, the community does create a variety of improvements that are API compatible. Contained here are some of the more notable improvements.

PermGen Leak from Client API

Accumulo’s client code creates background threads that users presently cannot stop through the API. This is quick to cause problems when invoking the Accumulo API in application containers such as Apache Tomcat or JBoss and repeatedly redeploying an application. ACCUMULO-2128 introduces a static utility, org.apache.accumulo.core.util.CleanUp, that users can invoke as part of a teardown hook in their container that will stop these threads and avoid the eventual OutOfMemoryError “PermGen space”.

Prefer IPv4 when starting Accumulo processes

While Hadoop does not support IPv6 networks, attempting to run on a system that does not have IPv6 completely disabled can cause strange failures. ACCUMULO-2262 invokes the JVM-provided configuration parameter at process startup to prefer IPv4 over IPv6.

Memory units in configuration

In previous versions, units of memory had to be provided as upper-case (e.g. ‘2G’, not ‘2g’). Additionally, a non-intuitive error was printed when a lower-case unit was provided. ACCUMULO-1933 allows lower-case memory units in all Accumulo configurations.

Apache Thrift maximum frame size

Apache Thrift is used as the internal RPC service. ACCUMULO-2360 allows users to configure the maximum frame size an Accumulo server will read. This prevents non Accumulo client from connecting and causing memory exhaustion.

MultiTableBatchWriter concurrency

The MultiTableBatchWriter is a class which allows multiple tables to be written to from a single object that maintains a single buffer for caching Mutations across all tables. This is desirable as it greatly simplifies the JVM heap usage from caching Mutations across many tables. Sadly, in Apache Accumulo 1.5.0, concurrent access to a single MultiTableBatchWriter heavily suffered from synchronization issues. ACCUMULO-1833 introduces a fix which alleviates the blocking and idle-wait that previously occurred when multiple threads accessed a single MultiTableBatchWriter instance concurrently.

Hadoop Versions

Since Apache Accumulo 1.5.0 was released, Apache Hadoop 2.2.0 was also released as the first generally available (GA) Hadoop 2 release. This was a very exciting release for a number of reasons, but this also caused additional effort on Accumulo’s part to ensure that Apache Accumulo continues to work across multiple Hadoop versions. Apache Accumulo 1.5.1 should function with any recent Hadoop 1 or Hadoop 2 without any special steps, tricks or instructions required.

Notable Bug Fixes

As with any Apache Accumulo release, we have numerous bug fixes that have been fixed. Most are very subtle and won’t affect the common user; however, some notable bugs were resolved as a part of 1.5.1 that are rather common.

Failure of ZooKeeper server in quorum kills connected Accumulo services

Apache ZooKeeper provides a number of wonderful features that Accumulo uses to accomplish a variety of tasks, most notably a distributed locking service. Typically, multiple ZooKeeper servers are run to provide resilience against a certain number of node failures. ACCUMULO-1572 resolves an issue where Accumulo processes would kill themselves when the ZooKeeper server they were communicating with died instead of failing over to another ZooKeeper server in the quorum.

Monitor table state isn’t updated

The Accumulo Monitor contains a column for the state of each table in the Accumulo instance. The previous resolution was to restart the Monitor process when it got in this state. ACCUMULO-1920 resolves an issue where the Monitor would not see updates from ZooKeeper.

Two locations for the same extent

The !METADATA table is the brains behind the data storage for each table, tracking information like which files comprise a Tablet, and which TabletServers are hosting which Tablets. ACCUMULO-2057 fixes an issue where the !METADATA table contained multiple locations (hosting server) for a single Tablet.

Deadlock on !METADATA tablet unload

Tablets are unloaded, typically, when a shutdown request is issued. ACCUMULO-1143 resolves a potential deadlock issue when a merging-minor compaction is issued to flush in-memory data to disk before unloading a Tablet.

Other notable fixes

Known Issues

When using Accumulo 1.5 and Hadoop 2, Accumulo will call hsync() on HDFS. Calling hsync improves durability by ensuring data is on disk (where other older Hadoop versions might lose data in the face of power failure); however, calling hsync frequently does noticeably slow writes. A simple work around is to increase the value of the tserver.mutation.queue.max configuration parameter via accumulo-site.xml.

A value of “4M” is a better recommendation, and memory consumption will increase by the number of concurrent writers to that TabletServer. For example, a value of 4M with 50 concurrent writers would equate to approximately 200M of Java heap being used for mutation queues.

For more information, see ACCUMULO-1950 and this comment.

Documentation

The following documentation updates were made:

Testing

Below is a list of all platforms that 1.5.1 was tested against by developers. Each Apache Accumulo release has a set of tests that must be run before the candidate is capable of becoming an official release. That list includes the following:

  1. Successfully run all unit tests
  2. Successfully run all functional test (test/system/auto)
  3. Successfully complete two 24-hour RandomWalk tests (LongClean module), with and without “agitation”
  4. Successfully complete two 24-hour Continuous Ingest tests, with and without “agitation”, with data verification
  5. Successfully complete two 72-hour Continuous Ingest tests, with and without “agitation”

Each unit and functional test only runs on a single node, while the RandomWalk and Continuous Ingest tests run on any number of nodes. Agitation refers to randomly restarting Accumulo processes and Hadoop Datanode processes, and, in HDFS High-Availability instances, forcing NameNode failover.

OS Hadoop Nodes ZooKeeper HDFS High-Availability Tests
CentOS 6.5 HDP 2.0 (Apache 2.2.0) 6 HDP 2.0 (Apache 3.4.5) Yes (QJM) All required tests
CentOS 6.4 CDH 4.5.0 (2.0.0+cdh4.5.0) 7 CDH 4.5.0 (3.4.5+cdh4.5.0) Yes (QJM) Unit, functional and 24hr Randomwalk w/ agitation
CentOS 6.4 CDH 4.5.0 (2.0.0+cdh4.5.0) 7 CDH 4.5.0 (3.4.5+cdh4.5.0) Yes (QJM) 2x 24/hr continuous ingest w/ verification
CentOS 6.3 Apache 1.0.4 1 Apache 3.3.5 No Local testing, unit and functional tests
RHEL 6.4 Apache 2.2.0 10 Apache 3.4.5 No Functional tests

View all releases in the archive