Apache Accumulo 1.7.0
18 May 2015
Apache Accumulo 1.7.0 is a significant release that includes many important milestone features which expand the functionality of Accumulo. These include features related to security, availability, and extensibility. Nearly 700 JIRA issues were resolved in this version. Approximately two-thirds were bugs and one-third were improvements.
In the context of Accumulo’s Semantic Versioning guidelines, this is a “minor version”. This means that new APIs have been created, some deprecations may have been added, but no deprecated APIs have been removed. Code written against 1.6.x should work against 1.7.0, likely binary-compatible but definitely source-compatible. As always, the Accumulo developers take API compatibility very seriously and have invested much time to ensure that we meet the promises set forth to our users.
Major Changes
Updated Minimum Requirements
Apache Accumulo 1.7.0 comes with an updated set of minimum requirements.
- Java 7 is required. Java 6 support is dropped.
- Hadoop 2.2.0 or greater is required. Hadoop 1.x support is dropped.
- ZooKeeper 3.4.x or greater is required.
Client Authentication with Kerberos
Kerberos is the de-facto means to provide strong authentication across Hadoop and other related components. Kerberos requires a centralized key distribution center to authenticate users who have credentials provided by an administrator. When Hadoop is configured for use with Kerberos, all users must provide Kerberos credentials to interact with the filesystem, launch YARN jobs, or even view certain web pages.
While Accumulo has long supported operating on Kerberos-enabled HDFS, it still required Accumulo users to use password-based authentication to authenticate with Accumulo. ACCUMULO-2815 added support for allowing Accumulo clients to use the same Kerberos credentials to authenticate to Accumulo that they would use to authenticate to other Hadoop components, instead of a separate user name and password just for Accumulo.
This authentication leverages Simple Authentication and Security Layer (SASL) and GSSAPI to support Kerberos authentication over the existing Apache Thrift-based RPC infrastructure that Accumulo employs.
These additions represent a significant forward step for Accumulo, bringing its client-authentication up to speed with the rest of the Hadoop ecosystem. This results in a much more cohesive authentication story for Accumulo that resonates with the battle-tested cell-level security and authorization model already familiar to Accumulo users.
More information on configuration, administration, and application of Kerberos client authentication can be found in the Kerberos chapter of the Accumulo User Manual.
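For client code, this surfaces as a new Kerberos-based authentication token. The following is a minimal sketch of connecting with Kerberos credentials; the instance name, ZooKeeper quorum, principal, and keytab path are placeholders, and the full client configuration is covered in the Kerberos chapter of the user manual.
// A minimal sketch; replace the instance name, ZooKeeper quorum, principal, and keytab path.
ClientConfiguration clientConf = ClientConfiguration.loadDefault()
    .withInstance("myinstance")
    .withZkHosts("zk1:2181,zk2:2181,zk3:2181")
    .withSasl(true);
// Keytab-based login; the no-argument KerberosToken() constructor instead uses an
// existing ticket obtained via kinit.
KerberosToken token = new KerberosToken("user@EXAMPLE.COM", new File("/path/to/user.keytab"), true);
Connector conn = new ZooKeeperInstance(clientConf).getConnector(token.getPrincipal(), token);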
Data-Center Replication
In previous releases, Accumulo only operated within the constraints of a single installation. Because single instances of Accumulo often consist of many nodes and Accumulo’s design scales (near) linearly across many nodes, it is typical that one Accumulo instance is run per physical installation or data-center. ACCUMULO-378 introduces support in Accumulo to automatically copy data from one Accumulo instance to another.
This data-center replication feature is primarily applicable to users wishing to implement a disaster recovery strategy. Data can be automatically copied from a primary instance to one or more other Accumulo instances. In contrast to normal Accumulo operation, in which ingest and query are strongly consistent, data-center replication is a lazy, eventually consistent operation. This is desirable for replication, as it prevents additional latency for ingest operations on the primary instance. Additionally, the implementation of this feature can sustain prolonged outages between the primary instance and replicas without any administrative overhead.
The Accumulo User Manual contains a new chapter on replication which details the design and implementation of the feature, explains how users can configure replication, and describes special cases to consider when choosing to integrate the feature into a user application.
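The following sketch shows roughly what configuring a primary instance looks like; the property names and the AccumuloReplicaSystem value format follow my reading of the replication chapter and should be verified there, and the peer name, hosts, credentials, and remote table id are placeholders.
// A rough sketch, assuming an existing Connector conn on the primary instance.
// Verify property names against the replication chapter; the peer details are placeholders.
conn.instanceOperations().setProperty("replication.name", "primary");
conn.instanceOperations().setProperty("replication.peer.peerA",
    "org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,peer_instance,peerzk1:2181");
conn.instanceOperations().setProperty("replication.peer.user.peerA", "replication_user");
conn.instanceOperations().setProperty("replication.peer.password.peerA", "replication_password");
// Replicate a table to the peer; the value is the corresponding table id on the peer instance.
conn.tableOperations().setProperty("mytable", "table.replication", "true");
conn.tableOperations().setProperty("mytable", "table.replication.target.peerA", "2");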
User-Initiated Compaction Strategies
Per-table compaction strategies were added in 1.6.0 to provide custom logic to decide which files are involved in a major compaction. In 1.7.0, the ability to specify a compaction strategy for a user-initiated compaction was added in ACCUMULO-1798. This allows surgical compactions on a subset of tablet files. Previously, a user-initiated compaction would compact all files in a tablet.
In the Java API, this new feature can be accessed in the following way:
Connector conn = ...
CompactionStrategyConfig csConfig = new CompactionStrategyConfig(strategyClassName).setOptions(strategyOpts);
CompactionConfig compactionConfig = new CompactionConfig().setCompactionStrategy(csConfig);
conn.tableOperations().compact(tableName, compactionConfig);
In ACCUMULO-3134, the shell’s compact command was modified to enable selecting which files to compact based on size, name, and path.
Options were also added to the shell’s compact command to allow setting RFile options for the compaction output. Setting the output options can be useful for testing; for example, a single tablet could be compacted using snappy compression.
The following is an example shell command that compacts all files less than 10MB, if the tablet has at least two files that meet this criterion. If a tablet had 100MB, 50MB, 7MB, and 5MB files, then the 7MB and 5MB files would be compacted. If a tablet had a 100MB and a 5MB file, then nothing would be done because there are not at least two files meeting the selection criteria.
compact -t foo --min-files 2 --sf-lt-esize 10M
The following is an example shell command that compacts all bulk imported files in a table.
compact -t foo --sf-ename I.*
These convenience options select files and compact them using a specialized compaction strategy. Options were also added to the shell to specify an arbitrary compaction strategy. The option to specify an arbitrary compaction strategy is mutually exclusive with the file selection and file creation options, since those options are unique to the specialized compaction strategy provided. See compact --help in the shell for the available options.
API Clarification
The declared API in 1.6.x was incomplete. Some important classes like ColumnVisibility were not declared as Accumulo API. Significant work was done under ACCUMULO-3657 to correct the API statement and clean up the API to be representative of all classes which users are intended to interact with. The expanded and simplified API statement is in the README.
In some places in the API, non-API types were used. Ideally, public API members would only use public API types. A tool called APILyzer was created to find all API members that used non-API types. Many of the violations found by this tool were deprecated to clearly communicate that a non-API type was used. One example is a public API method that returned a class called KeyExtent. KeyExtent was never intended to be in the public API because it contains code related to Accumulo internals. KeyExtent and the API methods returning it have since been deprecated. These were replaced with a new class for identifying tablets that does not expose internals. Deprecating a type like this from the API makes the API more stable while also making it easier for contributors to change Accumulo internals without impacting the API.
The changes in ACCUMULO-3657 also included an Accumulo API regular expression for use with checkstyle. Starting with 1.7.0, projects building on Accumulo can use this checkstyle rule to ensure they are only using Accumulo’s public API. The regular expression can be found in the README.
Performance Improvements
Configurable Threadpool Size for Assignments
During start-up, the Master quickly assigns tablets to Tablet Servers. However, Tablet Servers load those assigned tablets one at a time. In 1.7, the servers will be more aggressive, and will load tablets in parallel, so long as they do not have mutations that need to be recovered.
ACCUMULO-1085 allows the size of the threadpool used in the Tablet Servers for assignment processing to be configurable.
Group-Commit Threshold as a Factor of Data Size
When ingesting data into Accumulo, the majority of time is spent writing to the write-ahead log. As such, this is a common place for optimizations. One optimization is known as “group-commit”. When multiple clients are writing data to the same Accumulo tablet, it is not efficient for each of them to synchronize the WAL, flush their updates to disk for durability, and then release the lock. The idea of group-commit is that multiple writers can queue the writes for their mutations to the WAL and then wait for a sync that satisfies the durability constraints of their batch of updates. This drastically improves performance, since many threads writing batches concurrently can “share” the same fsync.
In previous versions, Accumulo controlled the frequency with which this group-commit sync was performed as a factor of the number of clients writing to Accumulo. This was both confusing to configure correctly and encouraged sub-par performance with few write threads.
ACCUMULO-1950 introduced a new configuration property, tserver.total.mutation.queue.max, which defines the amount of data that is queued before a group-commit is performed, in a way that is agnostic of the number of writers. This new configuration property is much easier to reason about than the previous (now deprecated) tserver.mutation.queue.max.
Users who have set tserver.mutation.queue.max in the past are encouraged to start using the new tserver.total.mutation.queue.max property.
Other Improvements
Balancing Groups of Tablets
By default, Accumulo evenly spreads each table’s tablets across a cluster. In some situations, it is advantageous for query or ingest to evenly spread groups of tablets within a table. For ACCUMULO-3439, a new balancer was added to evenly spread groups of tablets to optimize performance. This blog post provides more details about when and why users may want to leverage this feature.
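As a rough sketch of how this might be configured, the balancer is set per table and tablet groups are derived from a regular expression over tablet end rows. The balancer class name and the table.custom.balancer.group.regex.* property names below are recollections that should be verified against the 1.7.0 javadoc and the blog post.
// A rough sketch, assuming an existing Connector conn; verify the class and property names
// against the 1.7.0 javadoc for the group balancer.
conn.tableOperations().setProperty("mytable", "table.balancer",
    "org.apache.accumulo.server.master.balancer.RegexGroupBalancer");
conn.tableOperations().setProperty("mytable", "table.custom.balancer.group.regex.pattern", "(\\d\\d).*");
conn.tableOperations().setProperty("mytable", "table.custom.balancer.group.regex.default", "none");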
User-specified Durability
Accumulo constantly tries to balance durability with performance. Guaranteeing durability of every write to Accumulo is very difficult in a massively-concurrent environment that requires high throughput. One common area of focus is the write-ahead log, since it must eventually call fsync on the local filesystem to guarantee that data written is durable in the face of unexpected power failures. In some cases where durability can be sacrificed, either due to the nature of the data itself or redundant power supplies, ingest performance improvements can be attained.
Prior to 1.7, a user could only configure the level of durability for individual tables. With the implementation of ACCUMULO-1957, the durability can be specified by the user when creating a BatchWriter, giving users control over durability at the level of individual writes. Every Mutation written using that BatchWriter will be written with the provided durability. This can result in substantially faster ingest rates when the durability can be relaxed.
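A minimal sketch of the per-writer control, assuming an existing Connector conn and a table named "mytable":
// Assumes an existing Connector conn and a table named "mytable".
BatchWriterConfig bwConfig = new BatchWriterConfig();
// Relax durability for this writer only: the write-ahead log is flushed (hflush)
// rather than synced (hsync), trading some durability for faster ingest.
bwConfig.setDurability(Durability.FLUSH);
BatchWriter bw = conn.createBatchWriter("mytable", bwConfig);
Mutation m = new Mutation("row1");
m.put("cf", "cq", new Value("value".getBytes()));
bw.addMutation(m);
bw.close();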
waitForBalance API
When creating a new Accumulo table, the next step is typically adding splits to that table before starting ingest. This can be extremely important since a table without any splits will only be hosted on a single tablet server, creating an ingest bottleneck until the table begins to naturally split. Adding many splits before ingesting ensures that a table is distributed across many servers, resulting in high throughput when ingest first starts.
Adding splits to a table has long been a synchronous operation, but the assignment of those splits was asynchronous. A large number of splits could be processed, but it was not guaranteed that they would be evenly distributed, resulting in the same problem as having an insufficient number of splits. ACCUMULO-2998 adds a new method to InstanceOperations which allows users to wait for all tablets to be balanced. This method lets users wait until tablets are appropriately distributed so that ingest can be run at full-bore immediately.
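A minimal sketch of pre-splitting a table and then waiting for balance, assuming an existing Connector conn and a newly created table named "mytable":
// Assumes an existing Connector conn and a newly created table named "mytable".
SortedSet<Text> splits = new TreeSet<>();
for (char c = 'a'; c <= 'z'; c++) {
  splits.add(new Text(String.valueOf(c)));
}
conn.tableOperations().addSplits("mytable", splits);
// Block until the Master reports that tablets are evenly spread across the tablet servers.
conn.instanceOperations().waitForBalance();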
Hadoop Metrics2 Support
Accumulo has long had its own metrics system implemented using Java MBeans. This enabled metrics to be reported by Accumulo services, but consumption by other systems often required use of an additional tool like jmxtrans to read the metrics from the MBeans and send them to some other system.
ACCUMULO-1817 replaces Accumulo’s custom metrics system with Hadoop Metrics2. Metrics2 has a number of benefits, the most notable of which is eliminating the need for an additional process to send metrics to common metrics storage and visualization tools. With Metrics2 support, Accumulo can send its metrics to common tools like Ganglia and Graphite.
For more information on enabling Hadoop Metrics2, see the Metrics Chapter in the Accumulo User Manual.
Distributed Tracing with HTrace
HTrace has recently started gaining traction as a standalone project, especially with its adoption in HDFS. Accumulo has long had distributed tracing support via its own “Cloudtrace” library, but this wasn’t intended for use outside of Accumulo.
ACCUMULO-898 replaces Accumulo’s Cloudtrace code with HTrace. This has the benefit of adding timings (spans) from HDFS into Accumulo spans automatically.
Users who inspect traces via the Accumulo Monitor (or another system) will begin to see timings from HDFS during operations like Major and Minor compactions when running with at least Apache Hadoop 2.6.0.
VERSIONS file present in binary distribution
In the pre-built binary distribution, or in distributions built by users from the official source release, users will now see a VERSIONS file in the lib/ directory alongside the Accumulo server-side jars. Because the created tarball strips the versions from the jar file names, extra work was previously required to find the version of each dependent jar (typically by inspecting the jar’s manifest).
ACCUMULO-2863 adds a VERSIONS file to the lib/ directory which contains the Maven groupId, artifactId, and version (GAV) information for each jar file included in the distribution.
Per-Table Volume Chooser
The VolumeChooser interface is a server-side extension point that allows user tables to provide custom logic for choosing where their files are written when multiple HDFS instances are available. By default, a randomized volume chooser implementation is used to evenly balance files across all HDFS instances.
Previously, this VolumeChooser logic was instance-wide, which meant that it affected all tables. This is potentially undesirable, as it might unintentionally impact other users in a multi-tenant system. ACCUMULO-3177 introduces a new per-table property which supports configuration of a VolumeChooser. This ensures that the logic controlling how HDFS instances are used, when multiple are available, can be limited to the intended subset of tables.
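As a rough sketch, the chooser is applied to a single table like any other table property. The per-table property key below is my understanding of ACCUMULO-3177 and should be verified, and com.example.MyVolumeChooser is a hypothetical VolumeChooser implementation.
// A rough sketch, assuming an existing Connector conn; "table.volume.chooser" is the per-table
// property key as I understand ACCUMULO-3177, and com.example.MyVolumeChooser is a hypothetical
// VolumeChooser implementation available on the server classpath.
conn.tableOperations().setProperty("mytable", "table.volume.chooser", "com.example.MyVolumeChooser");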
Table and namespace custom properties
In order to avoid errors caused by mis-typed configuration properties, Accumulo was strict about which configuration properties could be set. However, this prevented users from setting arbitrary properties that could be used by custom balancers, compaction strategies, volume choosers, and iterators. Under ACCUMULO-2841, the ability to set arbitrary table and namespace properties was added. The property names must begin with the table.custom. prefix. The changes made in ACCUMULO-3177 and ACCUMULO-3439 leverage this new feature.
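For example, a custom iterator or balancer could read an application-specific setting stored under the table.custom. prefix. The property name below is illustrative, and an existing Connector conn is assumed.
// The custom property name here is illustrative; anything under the table.custom. prefix is accepted.
conn.tableOperations().setProperty("mytable", "table.custom.myapp.cleanup.age", "7d");
// Namespace properties work the same way through namespaceOperations().
conn.namespaceOperations().setProperty("mynamespace", "table.custom.myapp.cleanup.age", "7d");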
Notable Bug Fixes
SourceSwitchingIterator Deadlock
An instance of SourceSwitchingIterator, the Accumulo iterator which transparently manages whether data for a tablet is read from memory (the in-memory map) or from disk (HDFS, after a minor compaction), was found deadlocked in a production system.
This deadlock prevented the scan and the minor compaction from ever successfully completing without restarting the tablet server. ACCUMULO-3745 fixes the inconsistent synchronization inside of the SourceSwitchingIterator to prevent this deadlock from happening in the future.
The only mitigation for this bug was to restart the deadlocked tablet server.
Table flush blocked indefinitely
While running the Accumulo RandomWalk distributed test, it was observed that all activity in Accumulo had stopped and there was an offline Accumulo metadata table tablet. The system first tried to flush a user tablet, but the metadata table was not online (likely due to the agitation process, which stops and starts Accumulo processes during the test). After this call, a call to load the metadata tablet was queued but could not complete until the previous flush call finished. Thus, a deadlock occurred.
This deadlock happened because the synchronous flush call could not complete before the load tablet call completed, but the load tablet call couldn’t run because of connection caching we perform in Accumulo’s RPC layer to reduce the quantity of sockets we need to create to send data. ACCUMULO-3597 prevents this deadlock by forcing the use of a non-cached connection for the RPC message requesting a metadata tablet to be loaded.
While this fix does result in additional network resources being used, the concern is minimal because the number of metadata tablets is typically very small with respect to the total number of tablets in the system.
The only mitigation for this bug was to restart the tablet server that was hung.
Testing
Each unit and functional test only runs on a single node, while the RandomWalk and Continuous Ingest tests run on any number of nodes. Agitation refers to randomly restarting Accumulo processes and Hadoop DataNode processes, and, in HDFS High-Availability instances, forcing NameNode fail-over.
During testing, multiple Accumulo developers noticed some stability issues with HDFS using Apache Hadoop 2.6.0 when restarting Accumulo processes and HDFS datanodes. The developers investigated these issues as a part of the normal release testing procedures, but were unable to find a definitive cause of these failures. Users are encouraged to follow ACCUMULO-2388 for any future developments.
One possible workaround is to increase the general.rpc.timeout in the Accumulo configuration from 120s to 240s.
| OS | Hadoop | Nodes | ZooKeeper | HDFS HA | Tests |
|---|---|---|---|---|---|
| Gentoo | N/A | 1 | N/A | No | Unit and Integration Tests |
| Gentoo | 2.6.0 | 1 (2 TServers) | 3.4.5 | No | 24hr CI w/ agitation and verification, 24hr RW w/o agitation |
| CentOS 6.6 | 2.6.0 | 3 | 3.4.6 | No | 24hr RW w/ agitation, 24hr CI w/o agitation, 72hr CI w/ and w/o agitation |
| Amazon Linux | 2.6.0 | 20 (m1.large) | 3.4.6 | No | 24hr CI w/o agitation |