Apache Accumulo 1.10.0
03 Sep 2020
Please check our release archive for a newer version.
About
Apache Accumulo 1.10.0 is a continuation of the 1.x release line and is essentially the next maintenance release of 1.8/1.9, following 1.9.3 with some small additional internal improvements. Earlier 1.x versions are superseded by this maintenance release and will no longer be maintained.
The semver minor version number increase (1.9 to 1.10) signals that this release is backwards compatible with previous minor releases (1.8 and 1.9). Rather than API additions, the primary reason for this minor version increase is the decision to make Java 8 the minimum supported Java version (see below for more).
This release contains contributions from more than 13 contributors from the Apache Accumulo community in over 80 commits and 16 months of work since the 1.9.3 release. These release notes highlight those changes; the full details can be seen in the git history. If anything is missing from this list, please contact us to have it included.
According to the Long Term Maintenance (LTM) strategy, the intent is to maintain the 1.10 release line with critical bug and security fixes until one year after the next LTM version is released. However, this is anticipated to be the final 1.x legacy release and is not expected to receive any new features or significant non-critical updates. Users wanting new features should plan to upgrade to a 2.x release, where new feature development continues.
Users of 1.9.3 or earlier are urged to upgrade to 1.10.0 as soon as possible, since it continues the 1.9 maintenance line, and to consider migrating to a 2.x version when a suitable one becomes available. Accumulo 2.0.0 is currently available, and 2.1.0 is anticipated to be the next LTM release. If you would like to start preparing for 2.1.0 now, one way to do so is to build and test the next version of your software against Accumulo 2.0.0, because all 2.x releases should be backwards compatible with 2.0.0, following semantic versioning.
Minimum Requirements
The versions mentioned here are a guide only. Our convenience binary tarball is not expected to work out-of-the-box in every environment, and users bear some responsibility for properly configuring Accumulo, or even patching and rebuilding it from source, for their particular environment.
Please contact us or file a bug report if you have trouble with a specific version or would like tips. Be prepared to provide details of the problems you encounter, and to perform some troubleshooting steps of your own, in order to get the best response.
Java 8
Java 8 is now the minimum supported Java version, and this release is also designed to work on Java 11. To build the project from source, Java 11 or later is required. Please contact us if you find any bugs on any Java version.
Hadoop 2 or 3
This release has been built using Hadoop 2.6.5 and is expected to work with any Hadoop version 2.6.5 or later. It has also been tested with Hadoop 3.0.3 and is expected to work with other Hadoop 3.0 versions as well. Hadoop 3.1.3, 3.2.1, and 3.3.0 have also been tested with this release and are known to work (with at least basic functionality) with some class path modifications, specifically using Guava 27.0-jre instead of the provided 14.0 version.
Known class path pain points include Guava, commons-io, commons-vfs2, and possibly other Apache Commons libraries.
ZooKeeper
This release has been built against ZooKeeper 3.4.14, the latest 3.4 release. It is known to work against 3.5 and 3.6 versions as well, when configured properly.
Major Bug Fixes
Accumulo GC Bug
- #1314, #1318 Eliminate a task creation leak caused by an additional timed task being created for each accumulo-gc cycle
Bulk Import Concurrency Bug
- #1153 Prevent multiple threads from working on same bulk file
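The idea behind #1153 can be illustrated with a claim set. A thread may process a bulk file only after it atomically claims that file's name, so a second thread attempting the same file is turned away. The following is a minimal sketch that assumes a file name uniquely identifies a bulk file. The BulkFileClaims class is hypothetical and is not Accumulo's implementation.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the idea behind #1153; not Accumulo's actual code.
// Assumes a bulk file is uniquely identified by its name.
final class BulkFileClaims {
    // Concurrent set of files currently being worked on.
    private final Set<String> inProgress = ConcurrentHashMap.newKeySet();

    // Set.add on a concurrent key set is atomic, so only one caller gets true per file.
    boolean tryClaim(String fileName) {
        return inProgress.add(fileName);
    }

    // Called when work on the file finishes, whether it succeeded or not.
    void release(String fileName) {
        inProgress.remove(fileName);
    }
}
```

A caller would do its work only when tryClaim returns true. It would release the file in a finally block, so that a failed attempt does not leave the file claimed forever.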
Prevent Metadata Corruption
- #1309 Prevent cloning of the metadata table, which could lead to data loss during accumulo-gc for either the clone or the original accumulo.metadata table
- #1310 Improve GC handling of WALs used by the root tablet. If the root tablet had WALs, the GC did not consider them during collection
- #1379 During GC scans, an error will be thrown if the GC fails consistency checks; added a check to ensure the last tablet was seen
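#1379 adds a check that the GC's scan actually reached the last tablet. One way to express this kind of consistency check is to require that the scanned tablets form an unbroken chain. Each tablet's previous end row must equal the prior tablet's end row, and the final tablet must have no end row. The sketch below assumes a simplified, hypothetical Extent class in which null means "unbounded". Accumulo's real metadata scan and its checks are more involved.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of a "did we see the last tablet?" check (#1379).
// Not Accumulo's implementation; Extent is a simplified stand-in for a tablet extent.
final class TabletChain {

    static final class Extent {
        final String prevEndRow; // null means the tablet starts at the beginning of the table
        final String endRow;     // null means the tablet extends to the end of the table

        Extent(String prevEndRow, String endRow) {
            this.prevEndRow = prevEndRow;
            this.endRow = endRow;
        }
    }

    // Returns true only if the extents, in scan order, form one unbroken chain
    // from the first tablet through the last tablet (the one with a null end row).
    static boolean isComplete(List<Extent> extents) {
        String expectedPrev = null; // the first tablet must have a null previous end row
        for (int i = 0; i < extents.size(); i++) {
            Extent e = extents.get(i);
            if (!Objects.equals(e.prevEndRow, expectedPrev)) {
                return false; // gap or overlap between adjacent tablets
            }
            if (e.endRow == null) {
                return i == extents.size() - 1; // last tablet seen; nothing may follow it
            }
            expectedPrev = e.endRow;
        }
        return false; // scan ended (or was empty) before the last tablet was seen
    }
}
```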
Other Miscellaneous Bug Fixes
- #1107 Fix ConcurrentModificationException in HostRegexTableLoadBalancer
- #1185 Fix a bug where an unauthenticated ZooKeeper client was used to read data protected by an ACL; this was previously permitted until ZooKeeper fixed a security bug in its own code, which exposed our incorrect ZooKeeper client code
- #1371 Fix a bug in our MapReduce code that prevented some users from reading tables they had valid permissions to read
- #1401 Display trace information correctly in accumulo-monitor
- #1478 Don’t ignore the instance and zookeepers parameters on the command-line when running certain utilities
- #1532 Remove the need for Ant on the classpath
- #1555 Fix idempotency bug in importtable
- #1644 Retry minor compactions so that transient iterator issues do not block them forever
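#1644 retries minor compactions instead of letting a transient iterator failure block them. A common shape for this is a retry loop with exponential backoff. The sketch below is generic and hypothetical. The Retry class, the attempt limit, and the backoff values are all assumptions, and the sketch does not claim to reflect how Accumulo actually schedules its minor compaction retries.

```java
import java.util.function.Supplier;

// Generic, hypothetical retry helper; not Accumulo's minor compaction code.
final class Retry {

    // Runs task up to maxAttempts times, doubling the wait after each failure.
    // Rethrows the last failure if no attempt succeeds.
    static <T> T withBackoff(Supplier<T> task, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure in case no attempt succeeds
                if (attempt < maxAttempts) {
                    sleepQuietly(backoff);
                    backoff = Math.min(backoff * 2, 60_000L); // double, capped at one minute
                }
            }
        }
        throw last;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

Bounding the attempts means a persistent failure still surfaces as an exception instead of looping silently. The backoff gives a transient problem time to clear.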
Major Improvements
Performance Enhancements
- #990 Avoid multiple threads loading same cache block
- #1352 Add an option to configure the metadata action taken after an accumulo-gc cycle, using a new property instead of a hard-coded compaction
- #1462, #1526 Temporarily cache the existence check for recovery WALs, so that multiple tablets pointing to the same WAL avoid expensive redundant checks
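The goal of #990, where only one thread loads a given cache block, is often called "single flight" loading. The first thread to request a missing block installs a placeholder future and performs the load. Later threads wait on that same future instead of loading again. The following is a minimal sketch under that assumption. SingleFlightBlockLoader is a hypothetical name, and this is not Accumulo's block cache code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of "load each block once" (#990); not Accumulo's block cache.
final class SingleFlightBlockLoader {
    private final ConcurrentMap<String, CompletableFuture<byte[]>> blocks = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader; // the expensive read, e.g. from HDFS
    private final AtomicInteger loads = new AtomicInteger();

    SingleFlightBlockLoader(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] get(String blockName) {
        CompletableFuture<byte[]> mine = new CompletableFuture<>();
        CompletableFuture<byte[]> existing = blocks.putIfAbsent(blockName, mine);
        if (existing != null) {
            return existing.join(); // another thread is (or was) loading it; reuse its result
        }
        // This thread won the race, so it alone performs the expensive load.
        loads.incrementAndGet();
        try {
            mine.complete(loader.apply(blockName));
        } catch (RuntimeException e) {
            blocks.remove(blockName, mine); // allow a later caller to retry the load
            mine.completeExceptionally(e);  // waiters see the failure instead of hanging
        }
        return mine.join();
    }

    int loadCount() {
        return loads.get();
    }
}
```

Using a future rather than ConcurrentHashMap.computeIfAbsent keeps the slow I/O outside the map's internal locking. The cost is that load failures have to be handled explicitly.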
Identifying Busy Tablets
- #1291, #1296 Log busy tablets by ingest and query at configurable intervals for better hot-spot detection using new properties
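At its core, a periodic "busy tablets" report ranks tablets by an activity counter and keeps the top N. The sketch below assumes per-tablet counts, such as ingest or query counts, have already been gathered for the interval. The tablet names, the counts map, and the BusyTablets class are hypothetical. The configuration properties that control the real feature are not shown.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of ranking tablets for a periodic busy-tablet log line.
// Not Accumulo's implementation; how the counts are collected is out of scope here.
final class BusyTablets {

    // Returns up to n "tablet=count" strings, busiest first; ties are broken by tablet name.
    static List<String> topN(Map<String, Long> countsByTablet, int n) {
        return countsByTablet.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }
}
```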
TServer Startup and Shutdown Protections
- #1158 Require a configurable number of servers to be online, up to a max wait time, before assignments begin on startup
- #1456 Throttle the number of shutdown requests sent to tservers to prevent cluster self-destruction and give time for triage
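Throttling shutdown requests, as in #1456, can be pictured as a sliding-window limit: at most a fixed number of requests may go out within any window of time. The following is a generic sketch under that assumption. ShutdownThrottle, its limits, and the caller-supplied clock are all hypothetical, and Accumulo's actual mechanism and settings may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window throttle illustrating the idea behind #1456.
// Not Accumulo's implementation; limits and time source are assumptions.
final class ShutdownThrottle {
    private final int maxPerWindow;
    private final long windowMillis;
    private final Deque<Long> recent = new ArrayDeque<>(); // send times still inside the window

    ShutdownThrottle(int maxPerWindow, long windowMillis) {
        if (maxPerWindow < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if another shutdown request may be sent at time nowMillis.
    // The caller passes the clock value, which keeps the logic easy to test.
    synchronized boolean tryAcquire(long nowMillis) {
        // Forget requests that have aged out of the window.
        while (!recent.isEmpty() && nowMillis - recent.peekFirst() >= windowMillis) {
            recent.pollFirst();
        }
        if (recent.size() >= maxPerWindow) {
            return false; // too many recent requests; hold off
        }
        recent.addLast(nowMillis);
        return true;
    }
}
```

A caller that gets false would wait and try again later. This spreads shutdowns out over time and leaves room for an operator to notice and intervene.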
New Metrics
- #1406 Add GC cycle metrics (file and WAL collection) to be reported via the hadoop2 metrics. This exposes the GC cycle metrics available in the monitor to external metrics systems and includes the run time for the new GC post operation (compact, flush)
- Enable with new property, gc.metrics.enabled
- AccGcCandidates: number of candidates for GC
- AccGcDeleted: number of candidates deleted
- AccGcErrors: number of deletion errors
- AccGcFinished: timestamp of GC cycle finished
- AccGcInUse: number of candidates still in use
- AccGcPostOpDuration: duration of compact / flush
- AccGcRunCycleCount: 1-up cycle count
- AccGcStarted: timestamp of GC cycle start
- AccGcWalCandidates: number of WAL candidates for collection
- AccGcWalDeleted: number of WALs deleted
- AccGcWalErrors: number of errors during WAL deletion
- AccGcWalFinished: timestamp of WAL collection completion
- AccGcWalInUse: number of WALs in use
- AccGcWalStarted: timestamp of WAL collection start
Other Miscellaneous Improvements
- #1108 Improve logging when ZooKeeper session expires
- #1299 Add an optional -t tablename option to the importdirectory shell command
- #1338 Reduce verbose logging of merge operations in the Master log
- #1475 Option to leave cloned tables offline on creation
- #1503 Support ZooKeeper 3.5 (and later), in addition to 3.4