Apache Accumulo 1.6.4

03 Oct 2015

Apache Accumulo 1.6.4 is a maintenance release on the 1.6 version branch. This release contains changes from 21 issues, comprised of bug-fixes, performance improvements and better test cases. See JIRA for a complete list.

Below are resources for this release:

Users of any previous 1.6.x release are strongly encouraged to update as soon as possible to benefit from the improvements with very little concern in change of underlying functionality. Users of 1.4 or 1.5 that are seeking to upgrade to 1.6 should consider 1.6.4 as a starting point.

Silent data-loss via bulk imported files

A user recently reported that a simple bulk-import application would occasionally lose some records. Through investigation, it was found that when bulk imports into a table failed the initial assignment, the logic that automatically retries the imports was incorrectly choosing the tablets to import the files into. ACCUMULO-3967 contains more information on the cause and identification of the bug. The data-loss condition would only affect entire files. If records from a file exist in Accumulo, it is still guaranteed that all records within that imported file were successful.

As such, users who have bulk import applications using previous versions of Accumulo should verify that all of their data was correctly ingested into Accumulo and immediately update to Accumulo 1.6.4.

Other bug fixes

  • ACCUMULO-3979 Fixed an issue where the BulkImporter failed with an error message “QUERY_METADATA already started”.
  • ACCUMULO-3965 The listscans shell command did not contain the scanId attribute for currently running scans.
  • ACCUMULO-3946 Verified that all user-facing operations contained appropriate audit messages.
  • ACCUMULO-3977 Isolated scans with Iterators in use incorrectly fail around invocation of deepCopy.
  • ACCUMULO-3905 RowDeletingIterator functions incorrectly when columns are provided by the client. This restores intended functionality without the need for a workaround.
  • ACCUMULO-3959 ACCUMULO-3934 Multiple documentation improvements to BatchScanner.

Testing

Each unit and functional test only runs on a single node, while the RandomWalk and Continuous Ingest tests run on any number of nodes. Agitation refers to randomly restarting Accumulo processes and Hadoop Datanode processes, and, in HDFS High-Availability instances, forcing NameNode failover.

OS Hadoop Nodes ZooKeeper HDFS HA Tests
Amazon Linux 2014.09 2.6.0 20 3.4.5 No ContinuousIngest w/ verification w/ and w/o agitation (37B entries)

View all releases in the archive