Apache Accumulo 1.9.3

10 Apr 2019

Apache Accumulo 1.9.3 contains bug fixes for Write Ahead Logs and compaction. Users of 1.9.2 are encouraged to upgrade.

  • User Manual - In-depth developer and administrator documentation
  • Javadocs - Accumulo 1.9 API
  • Examples - Code with corresponding readme files that give step by step instructions for running example code

Notable Changes

Multiple Fixes for Write Ahead Logs

This release fixes Write Ahead Logs issues that slow or prevent recovery and in some cases lead to data loss. The fixes reduce the number of WALS referenced by a tserver, improve error handing, and improve clean up.

  • Eliminates a race condition that could result in data loss during recovery. If the GC deletes unreferenced WALs from ZK while the master is reading recovery WALs from ZK, the master may skip WALs it should not, resulting in data loss. Fixed in #866.

  • Opening a new WAL in DFS may fail, but still be advertised in ZK. This could result in a missing WAL during recovery, preventing tablets from loading. There is no data loss in this case, just WAL references that should not exists. Reported in #949 and fixed in #1005.

  • tserver failures could result in many empty WALs that unnecessarily slow recovery. This was fixed in #823.

  • Some write patterns caused tservers to unnecessarily reference a lot of WALs, which could slow any recovery. In #854 the max WALs referenced was limited regardless of the write pattern, avoiding long recovery times.

  • During tablet recovery, filter out logs that do not define the tablet. #881

  • If a tserver fails sorting, a marker file is written to the recovery directory. This marker prevents any subsequent recovery attempts from succeeding. Fixed by modifying the WAL RecoveryLogReader to handle failed file markers in #961.

  • Improve performance of serializing mutations to a WAL by avoiding frequent synchronization. #669

Multiple Fixes for Compaction Issues

  • Stop locking during compaction. Compactions acquired the tablet lock between each key value. This created unnecessary contention with other operations like scan and bulk imports. The synchronization was removed #1031.

  • Only re-queue compaction when there is activity. #759

Fix ArrayOutOfBounds error when new files are created (affects all previous versions)

If the 7 digit base 36 number used to name files attempted to go to 8 digits, then compactions would fail. This was fixed in #562.

Updated Master Metrics to include FATE metrics.

Added master metrics to provide a snapshot of current FATE operations. The metrics added:

  • the number of current FATE transactions in progress,
  • the count of child operations that have occurred on the zookeeper FATE node
  • a count of zookeeper connection errors when the snapshot is taken.

The number of child operations provides a light-weight surrogate for FATE transaction progression between snapshots. The metrics are controlled with the following properties:

  • master.fate.metrics.enabled - default to false preserve current metric reporting
  • master.fate.metrics.min.update.interval - default to 60s - there is a hard limit of 10s.

When enabled, the metrics are published to JMX and can optionally be configured using standard hadoop metrics2 configuration files.

Fixed issues with Native Maps with libstdc++ 8.2 and higher

Versions of libstdc++ 8.2 and higher triggered errors within within the native map code. This release fixes issues #767, #769, #1064 , and #1070 .

Fixed splitting tablets with files and no data

The split code assumed that if a tablet had files that it had data in those files. There are some edge case where this is not true. Updated the split code to handle this #998.

Log when a scan waits a long time for files.

Accumulo has a configurable limit on the max number of files open in a tserver for all scans. When too many files are open, scans must wait. In #978 and #981 scans that wait too long for files now log a message.

Fixed race condition in table existence check.

The Accumulo client code that checks if tables exists had a race condition. The race was fixed in #768 and #973

Support running Mini Accumulo using Java 11

Mini Accumulo made some assumptions about classloaders that were no longer true in Java 11. This caused Mini to fail in Java 11. In #924 Mini was updated to work with Java 11, while still working with Java 7 and 8.

Fixed issue with improperly configured Snappy

If snappy was configured and the snappy libraries were not available then minor compactions could hang forever. In #920 and #925 this was fixed and minor compactions will proceed when a different compression is configured.

Handle bad locality group config.

Improperly configured locality groups could cause a tablet to become inoperative. This was fixed in #819 and #840.

Fixed bulk import race condition.

There was a race condition in bulk import that could result in files being imported after a bulk import transaction had completed. In the worst case these files were already compacted and garbage collected. This would cause a tablet to have a reference to a file that did not exists. No data would have been lost, but it would cause scans to fail. The race was fixed in #800 and #837

Fixed issue with HostRegexTableLoadBalancer

This addresses an issue when using the HostRegexTableLoadBalancer when the default pool is empty. The load balancer will not assign the tablets at all. Here, we select a random pool to assign the tablets to. This behavior is on by default in the HostRegexTableLoadBalancer but can be disabled via HostRegexTableLoadBalancer configuration setting table.custom.balancer.host.regex.HostTableLoadBalancer.ALL Fixed in #691 - backported to 1.9 in #710

Update to libthrift version

The packaged, binary tarball contains updated version of libthrift to version 0.9.3-1 to address thrift CVE. Issue #1029

Upgrading

View the Upgrading Accumulo documentation for guidance.

View all releases in the archive