Apache Accumulo 1.6.0

02 May 2014

Apache Accumulo 1.6.0 adds some major new features and fixes many bugs. This release contains changes from 609 issues contributed by 36 contributors and committers.

Accumulo 1.6.0 runs on Hadoop 1; however, Hadoop 2 with an HA namenode is recommended for production systems. In addition to high availability, Hadoop 2 offers better data durability guarantees than Hadoop 1 in the case where nodes lose power.

Notable Improvements

Multiple volume support

BigTable’s design allows its internal metadata to automatically spread across multiple nodes. Accumulo has followed this design and scales very well as a result. There is one impediment to scaling, though: the HDFS namenode. The namenode presents two problems for scaling. First, it stores all of its filesystem metadata in memory on a single machine, which places an upper bound on the number of files Accumulo can have. Second, there is an upper bound on the number of file operations per second that a single namenode can support. For example, a namenode can only support a few thousand delete or create file requests per second.

To overcome this bottleneck, support for multiple namenodes was added under ACCUMULO-118. This change allows Accumulo to store its files across multiple namenodes. To use this feature, place a comma-separated list of namenode URIs in the new instance.volumes configuration property in accumulo-site.xml. When upgrading to 1.6.0, if multiple namenode support is desired, modify this setting only after a successful upgrade.
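
As a minimal sketch, the accumulo-site.xml entry would look like the following (the namenode URIs below are hypothetical):

<property>
  <name>instance.volumes</name>
  <value>hdfs://namenode1.example.com:8020/accumulo,hdfs://namenode2.example.com:8020/accumulo</value>
</property>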

Table namespaces

Administering an Accumulo instance with many tables is cumbersome. To ease this, ACCUMULO-802 introduced table namespaces, which allow tables to be grouped into logical collections. Configuration and permission changes can be made to a namespace, and they will apply to all of its tables.
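
A minimal sketch of the new client API follows, assuming conn is an existing Connector; the namespace, table, and property values are hypothetical:

// create a namespace and a table grouped inside it
conn.namespaceOperations().create("myns");
conn.tableOperations().create("myns.mytable");

// a property set on the namespace applies to all of its tables
conn.namespaceOperations().setProperty("myns", "table.file.replication", "2");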

Conditional Mutations

Accumulo now offers a way to make atomic read-modify-write changes to a row from the client side, using atomic test-and-set row operations. ACCUMULO-1000 added conditional mutations and a conditional writer. A conditional mutation has tests on columns that must pass before any changes are made. These tests are executed in server processes while a row lock is held. Below is a simple recipe for making atomic row changes using conditional mutations.

  1. Read columns X,Y,SEQ into a,b,s from row R1 using an isolated scanner.
  2. For row R1 write conditional mutation X=f(a),Y=g(b),SEQ=s+1 if SEQ==s.
  3. If the conditional mutation failed, go back to step 1.

The only built-in tests that conditional mutations support are equality and isNull. However, iterators can be configured on a conditional mutation to run before these tests, which makes it possible to implement any number of tests, such as less than, greater than, or contains.
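
Here is a minimal sketch of the recipe above using the new ConditionalWriter API; conn is assumed to be an existing Connector, and the table, column, and value names are hypothetical:

import org.apache.accumulo.core.client.ConditionalWriter;
import org.apache.accumulo.core.client.ConditionalWriterConfig;
import org.apache.accumulo.core.data.Condition;
import org.apache.accumulo.core.data.ConditionalMutation;

ConditionalWriter cw = conn.createConditionalWriter("mytable", new ConditionalWriterConfig());

// write X, Y, and SEQ only if SEQ still holds the value read earlier ("5")
ConditionalMutation cm = new ConditionalMutation("R1");
cm.addCondition(new Condition("meta", "SEQ").setValue("5"));
cm.put("data", "X", "f(a)");
cm.put("data", "Y", "g(b)");
cm.put("meta", "SEQ", "6");

// a REJECTED status means another writer changed SEQ; re-read the row and retry
ConditionalWriter.Status status = cw.write(cm).getStatus();
cw.close();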

Encryption

Encryption is still an experimental feature, but much progress has been made since 1.5.0. Support for encrypting rfiles and write-ahead logs was added in ACCUMULO-958 and ACCUMULO-980. Support for encrypting data over the wire using SSL was added in ACCUMULO-1009.

When a tablet server fails, its write ahead logs are sorted and stored in HDFS. In 1.6.0, encrypting these sorted write ahead logs is not supported. ACCUMULO-981 is open to address this issue.

Pluggable compaction strategies

One of the key elements of the BigTable design is use of the Log Structured Merge Tree. This entails sorting data in memory, writing out sorted files, and then later merging multiple sorted files into a single file. These automatic merges happen in the background, and Accumulo decides when to merge files by comparing the relative sizes of files to a compaction ratio. Before 1.6.0, adjusting the compaction ratio was the only way a user could control this process. ACCUMULO-1451 introduces pluggable compaction strategies, which allow users to choose when and what files to compact. ACCUMULO-1808 adds a compaction strategy that prevents compaction of files over a configurable size.
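
As a sketch, the size-limiting strategy from ACCUMULO-1808 could be enabled per table from the shell; the table name is hypothetical, and the exact property keys shown here should be checked against the 1.6.0 user manual:

root@instance> config -t mytable -s table.majc.compaction.strategy=org.apache.accumulo.tserver.compaction.SizeLimitCompactionStrategy
root@instance> config -t mytable -s table.majc.compaction.strategy.opts.sizeLimit=1G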

Lexicoders

Accumulo sorts data only lexicographically. Getting something like a pair of (String, Integer) to sort correctly in Accumulo is tricky, because the integers should only be compared when the strings are equal. It is possible to make such data sort properly in Accumulo if it is encoded correctly, but doing so can be difficult. To make this easier, ACCUMULO-1336 added lexicoders to the Accumulo API. Lexicoders provide an easy way to serialize data so that it sorts properly lexicographically. Below is a simple example.

import org.apache.accumulo.core.client.lexicoder.IntegerLexicoder;
import org.apache.accumulo.core.client.lexicoder.PairLexicoder;
import org.apache.accumulo.core.client.lexicoder.StringLexicoder;
import org.apache.accumulo.core.util.ComparablePair;

PairLexicoder<String,Integer> plex = new PairLexicoder<String,Integer>(new StringLexicoder(), new IntegerLexicoder());

byte[] ba1 = plex.encode(new ComparablePair<String,Integer>("b", 1));
byte[] ba2 = plex.encode(new ComparablePair<String,Integer>("aa", 1));
byte[] ba3 = plex.encode(new ComparablePair<String,Integer>("a", 2));
byte[] ba4 = plex.encode(new ComparablePair<String,Integer>("a", 1));
byte[] ba5 = plex.encode(new ComparablePair<String,Integer>("aa", -3));

// sorting ba1..ba5 lexicographically results in the same order as sorting the
// ComparablePairs: ("a",1), ("a",2), ("aa",-3), ("aa",1), ("b",1)

Locality groups in memory

In cases where a very small amount of data is stored in a locality group, one would expect fast scans over that locality group. However, this was not always the case, because recently written data stored in memory was not partitioned by locality group. Therefore, if a table had 100GB of data in memory and 1MB of that was in locality group A, then scanning A would have required reading all 100GB. ACCUMULO-112 changes this and partitions data by locality group as it is written.
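
Locality groups themselves are assigned the same way as before; a minimal Java sketch follows, with conn an existing Connector and the table and column family names hypothetical:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.io.Text;

// place column family "cfA" in locality group "A"; in 1.6.0 recently
// written data in memory is also partitioned by these groups
Map<String,Set<Text>> groups = new HashMap<String,Set<Text>>();
groups.put("A", Collections.singleton(new Text("cfA")));
conn.tableOperations().setLocalityGroups("mytable", groups);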

Service IP addresses

Previous versions of Accumulo always used IP addresses internally. This could be problematic in virtual machine environments where IP addresses change. ACCUMULO-1585 changed this; Accumulo now uses the exact hostnames from its config files for internal addressing.

All Accumulo processes running on a cluster are locatable via ZooKeeper, so using well-known ports is not strictly required. ACCUMULO-1664 makes it possible for all Accumulo processes to use random ports, which makes it easier to run multiple Accumulo instances on a single node.

While Hadoop does not support IPv6 networks, attempting to run on a system that does not have IPv6 completely disabled can cause strange failures. ACCUMULO-2262 sets the JVM-provided configuration parameter for preferring IPv4 over IPv6 (java.net.preferIPv4Stack) at process startup.

ViewFS

Multiple bug-fixes were made to support running Accumulo over multiple HDFS instances using ViewFS. ACCUMULO-2047 is the parent ticket that contains numerous fixes to enable this support.

Maven Plugin

This version of Accumulo is accompanied by a new Maven plugin for testing client apps (ACCUMULO-1030). You can execute the accumulo-maven-plugin inside your project by adding the following to the build plugins section of your pom.xml:

<plugin>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-maven-plugin</artifactId>
  <version>1.6.0</version>
  <configuration>
    <instanceName>plugin-it-instance</instanceName>
    <rootPassword>ITSecret</rootPassword>
  </configuration>
  <executions>
    <execution>
      <id>run-plugin</id>
      <goals>
        <goal>start</goal>
        <goal>stop</goal>
      </goals>
    </execution>
  </executions>
</plugin>

This plugin is designed to work in conjunction with the maven-failsafe-plugin. A small test instance of Accumulo will run during the pre-integration-test phase of the Maven build lifecycle and will be stopped in the post-integration-test phase. Your integration tests, executed by maven-failsafe-plugin, can access this instance with a MiniAccumuloInstance connector (the plugin uses MiniAccumuloInstance internally), as in the following example:

import java.io.File;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.minicluster.MiniAccumuloInstance;
import org.junit.BeforeClass;

private static Connector conn;

@BeforeClass
public static void setUp() throws Exception {
  String instanceName = "plugin-it-instance";
  Instance instance = new MiniAccumuloInstance(instanceName, new File("target/accumulo-maven-plugin/" + instanceName));
  conn = instance.getConnector("root", new PasswordToken("ITSecret"));
}

This plugin is quite limited, currently only supporting an instance name and a root user password as configuration parameters. Improvements are expected in future releases, so feedback is welcome and appreciated (file bugs/requests under the “maven-plugin” component in the Accumulo JIRA).

Packaging

One notable change that was made to the binary tarball is the purposeful omission of a pre-built copy of the Accumulo “native map” library. This shared library is used at ingest time to implement an off-JVM-heap sorted map that greatly increases ingest throughput while side-stepping issues such as JVM garbage collection pauses. In earlier releases, a pre-built copy of this shared library was included in the binary tarball; however, the decision was made to omit this due to the potential variance in toolchains on the target system.

It is recommended that users invoke the provided build_native_library.sh before running Accumulo:

$ACCUMULO_HOME/bin/build_native_library.sh

Be aware that you will need a C++ compiler/toolchain installed to build this library. Check your GNU/Linux distribution documentation for the package manager command.

Size-Based Constraint on New Tables

A Constraint is an interface that can determine whether a Mutation should be applied or rejected server-side. After ACCUMULO-466, new tables created in 1.6.0 will automatically have the DefaultKeySizeConstraint set. As performance can suffer when large Keys are inserted into a table, this Constraint will reject any Key that is larger than 1MB. If this constraint is undesired, it can be removed using the constraint shell command. See the help message on the command for more information.
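
For example, listing and then removing the constraint from the shell might look like the following sketch; the table name and the constraint's assigned number are hypothetical:

root@instance> constraint -l -t mytable
org.apache.accumulo.core.constraints.DefaultKeySizeConstraint=1
root@instance> constraint -d 1 -t mytable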

Other notable changes

  • ACCUMULO-842 Added FATE administration to shell
  • ACCUMULO-1042 CTRL-C no longer kills shell
  • ACCUMULO-1345 Stuck compactions now log a warning with a stack trace, tablet id, and filename.
  • ACCUMULO-1442 JLine2 support was added to the shell. This adds features like history search and other nice things GNU Readline has.
  • ACCUMULO-1481 The root tablet is now the root table.
  • ACCUMULO-1537 Python functional tests were converted to Maven integration tests that use MiniAccumuloCluster
  • ACCUMULO-1566 The point at which read-ahead starts in the scanner is now configurable.
  • ACCUMULO-1650 Made common admin commands easier to run; try bin/accumulo admin --help
  • ACCUMULO-1667 Added synchronous versions of the online and offline table operations
  • ACCUMULO-1706 Admin utilities now respect EPIPE
  • ACCUMULO-1833 The multi-table batch writer is now faster when used by multiple threads
  • ACCUMULO-1933 Memory units can now be specified in lowercase.
  • ACCUMULO-1985 Configuration to bind Monitor on all network interfaces.
  • ACCUMULO-2128 Provide resource cleanup via static utility
  • ACCUMULO-2360 Allow configuration of the maximum thrift message size a server will read.

Notable Bug Fixes

  • ACCUMULO-324 System/site constraints and iterators should NOT affect the METADATA table
  • ACCUMULO-335 Can’t batchscan over the !METADATA table
  • ACCUMULO-391 Added support for reading from multiple tables in a Map Reduce job.
  • ACCUMULO-1018 Client does not give informative message when user can not read table
  • ACCUMULO-1492 bin/accumulo should follow symbolic links
  • ACCUMULO-1572 Single node zookeeper failure kills connected Accumulo servers
  • ACCUMULO-1661 AccumuloInputFormat cannot fetch empty column family
  • ACCUMULO-1696 Deep copy in the compaction scope iterators can throw off the stats
  • ACCUMULO-1698 stop-here doesn’t consider system hostname
  • ACCUMULO-1901 start-here.sh starts only one GC process even if more are defined
  • ACCUMULO-1920 Monitor was not seeing zookeeper updates for tables
  • ACCUMULO-1994 Proxy does not handle Key timestamps correctly
  • ACCUMULO-2037 Tablets are now assigned to the last location
  • ACCUMULO-2174 VFS Classloader has potential to collide localized resources
  • ACCUMULO-2225 Need to better handle DNS failure propagation from Hadoop
  • ACCUMULO-2234 Cannot run offline mapreduce over non-default instance.dfs.dir value
  • ACCUMULO-2261 Duplicate locations for a Tablet.
  • ACCUMULO-2334 Lacking fallback when ACCUMULO_LOG_HOST isn’t set
  • ACCUMULO-2408 metadata table not assigned after root table is loaded
  • ACCUMULO-2519 FATE operation failed across upgrade

Known Issues

Slower writes than previous Accumulo versions

When using Accumulo 1.6 with Hadoop 2, Accumulo will call hsync() on HDFS. Calling hsync improves durability by ensuring data is on disk (older Hadoop versions might lose data in the face of power failure); however, calling hsync frequently does noticeably slow writes. A simple workaround is to increase the value of the tserver.mutation.queue.max configuration parameter via accumulo-site.xml.

A value of “4M” is a better recommendation than the smaller default, but be aware that memory consumption will increase with the number of concurrent writers to that TabletServer. For example, a value of 4M with 50 concurrent writers would equate to approximately 200M of Java heap used for mutation queues.
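
For example, in accumulo-site.xml:

<property>
  <name>tserver.mutation.queue.max</name>
  <value>4M</value>
</property>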

For more information, see ACCUMULO-1950.

Another possible cause of slower writes is the change in write-ahead log replication between 1.4 and 1.5. Accumulo 1.4 defaulted to two logger servers. Accumulo 1.5 and 1.6 store write-ahead logs in HDFS and default to a replication factor of three datanodes.

BatchWriter hold time error

If a BatchWriter fails with a MutationsRejectedException whose message contains "# server errors 1", then it may be ACCUMULO-2388. To confirm this, look in the tablet server logs for org.apache.accumulo.tserver.HoldTimeoutException around the time the BatchWriter failed. If this happens often, a possible workaround is to set general.rpc.timeout to 240s.
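
Again, as an accumulo-site.xml sketch:

<property>
  <name>general.rpc.timeout</name>
  <value>240s</value>
</property>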

Other known issues

  • ACCUMULO-981 Sorted write ahead logs are not encrypted.
  • ACCUMULO-1507 Dynamic Classloader still can’t keep proper track of jars
  • ACCUMULO-1588 Monitor XML and JSON differ
  • ACCUMULO-1628 NPE on deep copied dumped memory iterator
  • ACCUMULO-1708 ACCUMULO-2495 Out of memory errors do not always kill tservers leading to unexpected behavior
  • ACCUMULO-2008 Block cache reserves section for in-memory blocks
  • ACCUMULO-2059 Namespace constraints easily get clobbered by table constraints
  • ACCUMULO-2677 Tserver failure during map reduce reading from table can cause sub-optimal performance

API Changes

The following deprecated methods were removed in ACCUMULO-1533:

  • Many map reduce methods deprecated in ACCUMULO-769 were removed
  • SecurityErrorCode o.a.a.core.client.AccumuloSecurityException.getErrorCode() deprecated in ACCUMULO-970
  • Connector o.a.a.core.client.Instance.getConnector(AuthInfo) deprecated in ACCUMULO-1024
  • Connector o.a.a.core.client.ZooKeeperInstance.getConnector(AuthInfo) deprecated in ACCUMULO-1024
  • static String o.a.a.core.client.ZooKeeperInstance.getInstanceIDFromHdfs(Path) deprecated in ACCUMULO-1
  • static String ZooKeeperInstance.lookupInstanceName(ZooCache, UUID) deprecated in ACCUMULO-765
  • void o.a.a.core.client.ColumnUpdate.setSystemTimestamp(long) deprecated in ACCUMULO-786

Testing

Below is a list of the platforms that developers tested 1.6.0 against. Each Apache Accumulo release has a set of tests that must be run before the candidate is capable of becoming an official release. That list includes the following:

  1. Successfully run all unit tests
  2. Successfully run all functional tests (test/system/auto)
  3. Successfully complete two 24-hour RandomWalk tests (LongClean module), with and without “agitation”
  4. Successfully complete two 24-hour Continuous Ingest tests, with and without “agitation”, with data verification
  5. Successfully complete two 72-hour Continuous Ingest tests, with and without “agitation”

Each unit and functional test only runs on a single node, while the RandomWalk and Continuous Ingest tests run on any number of nodes. Agitation refers to randomly restarting Accumulo processes and Hadoop Datanode processes, and, in HDFS High-Availability instances, forcing NameNode failover.

The following acronyms are used in the testing table below.

  • CI : Continuous Ingest
  • HA : High-Availability
  • IT : Integration test, run w/ mvn verify
  • RW : Random Walk
OS | Java | Hadoop | Nodes | ZooKeeper | HDFS HA | Version/Commit hash | Tests
--- | --- | --- | --- | --- | --- | --- | ---
CentOS 6.5 | CentOS OpenJDK 1.7 | Apache 2.2.0 | 20 EC2 nodes | Apache 3.4.5 | No | 1.6.0 RC1 + ACCUMULO-2668 patch | 24-hour CI w/o agitation. Verified.
CentOS 6.5 | CentOS OpenJDK 1.7 | Apache 2.2.0 | 20 EC2 nodes | Apache 3.4.5 | No | 1.6.0 RC2 | 24-hour RW (Conditional.xml module) w/o agitation
CentOS 6.5 | CentOS OpenJDK 1.7 | Apache 2.2.0 | 20 EC2 nodes | Apache 3.4.5 | No | 1.6.0 RC5 | 24-hour CI w/ agitation. Verified.
CentOS 6.5 | CentOS OpenJDK 1.6 and 1.7 | Apache 1.2.1, 2.2.0 | Single | Apache 3.3.6 | No | 1.6.0 RC5 | All unit and ITs w/ -Dhadoop.profile=2 and -Dhadoop.profile=1
Gentoo | Sun JDK 1.6.0_45 | Apache 1.2.1, 2.2.0, 2.3.0, 2.4.0 | Single | Apache 3.4.5 | No | 1.6.0 RC5 | All unit and ITs. 2B entries ingested/verified with CI
CentOS 6.4 | Sun JDK 1.6.0_31 | CDH 4.5.0 | 7 | CDH 4.5.0 | Yes | 1.6.0 RC4 and RC5 | 24-hour RW (LongClean) with and without agitation
CentOS 6.4 | Sun JDK 1.6.0_31 | CDH 4.5.0 | 7 | CDH 4.5.0 | Yes | 3a1b38 | 72-hour CI with and without agitation. Verified.
CentOS 6.4 | Sun JDK 1.6.0_31 | CDH 4.5.0 | 7 | CDH 4.5.0 | Yes | 1.6.0 RC2 | 24-hour CI without agitation. Verified.
CentOS 6.4 | Sun JDK 1.6.0_31 | CDH 4.5.0 | 7 | CDH 4.5.0 | Yes | 1.6.0 RC3 | 24-hour CI with agitation. Verified.
