Apache Accumulo 1.10.0

03 Sep 2020

About

Apache Accumulo 1.10.0 is a continuation of the 1.x release line, and is essentially the next maintenance release of 1.8/1.9, following the 1.9.3 version with some small additional internal improvements. Earlier 1.x versions are now superseded by this maintenance release, and will no longer be maintained.

The semver minor version number increase (1.9 to 1.10) signals that this release is backwards compatible with previous minor releases (1.8 and 1.9). Rather than API additions, the primary reason for this minor version increase is due to the decision to make Java 8 the minimum supported Java version (see below for more).

This release contains contributions from more than 13 contributors from the Apace Accumulo community in over 80 commits and 16 months of work since the 1.9.3 release. These release notes are highlights of those changes. The full detailed changes can be seen in the git history. If anything is missing from this list, please contact us to have it included.

According to the Long Term Maintenance (LTM) strategy, the intent is to maintain the 1.10 release line with critical bug and security fixes until one year after the next LTM version is released. However, this is anticipated to be the final 1.x legacy release, so it is not expected to receive any new features or significant non-critical updates, so users wanting new features should plan to upgrade to a 2.x release, where new feature development is still being done.

Users of 1.9.3 or earlier are urged to upgrade to 1.10.0 as soon as it is available, as this is a continuation of the 1.9 maintenance line. and to consider migrating to a 2.x version when a suitable one becomes available. Accumulo 2.0.0 is currently available, and 2.1.0 is anticipated to be the next LTM release. If you would like to start preparing for 2.1.0 now, one way to do this is to start building and testing the next version of your software against Accumulo 2.0.0 because all 2.x releases should be backwards compatible with 2.0.0, following semantic versioning.

Minimum Requirements

The versions mentioned here are a guide only. It is not expected that our convenience binary tarball will work out-of-the-box with your particular environment, and some responsibility is placed on users to properly configure Accumulo, or even patch and rebuild it from source, for their particular environment.

Please contact us or file a bug report if you have trouble with a specific version or wish to seek tips. Be prepared to provide details of the problems you encounter, as well as perform some troubleshooting steps of your own, in order to get the best response.

Java 8

Java 8 is now the minimum supported Java version, and it is designed to work on Java 11, as well. To build the project from source, Java 11 or later is required. Please contact us if you find any bugs on any Java version.

Hadoop 2 or 3

This release has been built using Hadoop 2.6.5, and is expected to work with any Hadoop version 2.6.5 or later. It has also been tested with 3.0.3, and is expected to work with Hadoop 3.0 versions as well. Hadoop 3.1.3, 3.2.1, and 3.3.0 have also been tested with this version, and are known to work (with at least basic functionality) with some class path modifications (specifically, using Guava 27.0-jre instead of the provided 14.0 version).

Particular class path pain points are known to be guava, commons-io, commons-vfs2, and possibly other commons libraries.

ZooKeeper

This release has been built agains ZooKeeper 3.4.14, the latest 3.4 release. It is known to work against 3.5 and 3.6 versions as well, when configured properly.

Major Bug Fixes

Accumulo GC Bug

  • #1314, #1318 Eliminate task creation leak caused by the an additional timed-task created for each accumulo-gc cycle

Bulk Import Concurrency Bug

  • #1153 Prevent multiple threads from working on same bulk file

Prevent Metadata Corruption

  • #1309 Prevent cloning of the metadata table, which could lead to data loss during accumulo-gc for either the clone or the original accumulo.metadata table
  • #1310 Improve GC handling of WALs used by root tablet. If the root tablet had WALs, the GC did not consider them during collection
  • #1379 During GC scans, an error will be thrown if the GC fails consistency checks; added a check to ensure the last tablet was seen

Other Miscellaneous Bug Fixes

  • #1107 Fix ConcurrentModificationException in HostRegexTableLoadBalancer
  • #1185 Fixed a bug where we were using an unauthenticated ZooKeeper client to try to read data with an ACL configured; this was previously permitted until ZooKeeper fixed a security bug in their own code, which revealed our incorrect ZooKeeper client code
  • #1371 Fix a bug in our MapReduce code that prevented some users from reading tables they had valid permissions to read
  • #1401 Display trace information correctly in accumulo-monitor
  • #1478 Don’t ignore the instance and zookeepers parameters on the command-line when running certain utilities
  • #1532 Remove need for ANT on classpath
  • #1555 Fix idempotency bug in importtable
  • #1644 Retry minor compactions to prevent transient iterator issues blocking forever

Major Improvements

Performance Enhancements

  • #990 Avoid multiple threads loading same cache block
  • #1352 Add an option to configure the metadata action after an accumulo-gc cycle using a new property instead of a hard-coded compaction
  • #1462, #1526 Temporarily cache the existence check for recovery WALs, so multiple tablets pointing to the same WAL to avoid expensive redundant checks

Identifying Busy Tablets

TServer Startup and Shutdown Protections

New Metrics

  • #1406 Add GC cycle metrics (file and wal collection) to be reported via the hadoop2 metrics. This exposes the gc cycle metrics available in the monitor to external metrics systems and includes run time for the new gc post operation (compact, flush)
    • Enable with new property, gc.metrics.enabled
      • AccGcCandidates - number of candidates for GC
      • AccGcDeleted - number of candidates deleted
      • AccGcErrors - number of deletion errors
      • AccGcFinished - timestamp of GC cycle finished
      • AccGcInUse - number of candidates still in use
      • AccGcPostOpDuration - duration of compact / flush
      • AccGcRunCycleCount - 1-up cycle count
      • AccGcStarted - timestamp of GC cycle start
      • AccGcWalCandidates - number of WAL candidates for collection
      • AccGcWalDeleted - number of WALs deleted
      • AccGcWalErrors - number of errors during WAL deletion
      • AccGcWalFinished - timestamp of WAL collection completion
      • AccGcWalInUse - number of WALs in use
      • AccGcWalStarted - timestamp of WAL collection start

Other Miscellaneous Improvements

  • #1108 Improve logging when ZooKeeper session expires
  • #1299 Add optional -t tablename to importdirectory shell command
  • #1338 Reduce verbose logging of merge operations in Master log
  • #1475 Option to leave cloned tables offline on creation
  • #1503 Support ZooKeeper 3.5 (and later), in addition to 3.4

View all releases in the archive