Copyright © 2011-2016 The Apache Software Foundation, Licensed under the Apache License, Version 2.0. Apache Accumulo, Accumulo, Apache, and the Apache Accumulo project logo are trademarks of the Apache Software Foundation.

1. Introduction

Apache Accumulo is a highly scalable structured store based on Google’s BigTable. Accumulo is written in Java and operates over the Hadoop Distributed File System (HDFS), which is part of the popular Apache Hadoop project. Accumulo supports efficient storage and retrieval of structured data, including queries for ranges, and provides support for using Accumulo tables as input and output for MapReduce jobs.

Accumulo features automatic load-balancing and partitioning, data compression and fine-grained security labels.

2. Accumulo Design

2.1. Data Model

Accumulo provides a richer data model than simple key-value stores, but is not a fully relational database. Data is represented as key-value pairs, where the key and value are comprised of the following elements:

Key
• Row ID
• Column
    • Family
    • Qualifier
    • Visibility
• Timestamp

Value

All elements of the Key and the Value are represented as byte arrays except for Timestamp, which is a Long. Accumulo sorts keys by element and lexicographically in ascending order. Timestamps are sorted in descending order so that later versions of the same Key appear first in a sequential scan. Tables consist of a set of sorted key-value pairs.
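For illustration, the following snippet (a minimal sketch; the row, column, visibility, and value shown are hypothetical) constructs a Key containing all five elements and an associated Value using the org.apache.accumulo.core.data classes:

Text row = new Text("row1");
Text colFam = new Text("family");
Text colQual = new Text("qualifier");
Text colVis = new Text("public");
long timestamp = System.currentTimeMillis();

// Keys sort by row, column family, column qualifier, column visibility, then timestamp (descending)
Key key = new Key(row, colFam, colQual, colVis, timestamp);
Value value = new Value("some bytes".getBytes());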

2.2. Architecture

Accumulo is a distributed data storage and retrieval system and as such consists of several architectural components, some of which run on many individual servers. Much of the work Accumulo does involves maintaining certain properties of the data, such as organization, availability, and integrity, across many commodity-class machines.

2.3. Components

An instance of Accumulo includes many TabletServers, one Garbage Collector process, one Master server and many Clients.

2.3.1. Tablet Server

The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a write-ahead log, sorting new key-value pairs in memory, periodically flushing sorted key-value pairs to new files in HDFS, and responding to reads from clients by forming a merge-sorted view of all keys and values from all the files it has created and the sorted in-memory store.

TabletServers also perform recovery of a tablet that was previously on a server that failed, reapplying any writes found in the write-ahead log to the tablet.

2.3.2. Garbage Collector

Accumulo processes will share files stored in HDFS. Periodically, the Garbage Collector will identify files that are no longer needed by any process, and delete them. Multiple garbage collectors can be run to provide hot-standby support. They will perform leader election among themselves to choose a single active instance.

2.3.3. Master

The Accumulo Master is responsible for detecting and responding to TabletServer failure. It tries to balance the load across TabletServers by assigning tablets carefully and instructing TabletServers to unload tablets when necessary. The Master ensures each tablet is assigned to exactly one TabletServer, and handles table creation, alteration, and deletion requests from clients. The Master also coordinates startup, graceful shutdown, and recovery of changes in write-ahead logs when TabletServers fail.

Multiple masters may be run. The masters will elect a single active master among themselves, and the others become backups that can take over should the active master fail.

2.3.4. Tracer

The Accumulo Tracer process supports the distributed timing API provided by Accumulo. One or more of these processes can be run on a cluster; they write the timing information to a given Accumulo table for future reference. See the section on Tracing for more information on this support.

2.3.5. Monitor

The Accumulo Monitor is a web application that provides a wealth of information about the state of an instance. The Monitor shows graphs and tables which contain information about read/write rates, cache hit/miss rates, and Accumulo table information such as scan rate and active/queued compactions. Additionally, the Monitor should always be the first point of entry when attempting to debug an Accumulo problem as it will show high-level problems in addition to aggregated errors from all nodes in the cluster. See the section on Monitoring for more information.

Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active at one time. Leader election will be performed internally to choose the active Monitor.

2.3.6. Client

Accumulo includes a client library that is linked to every application. The client library contains logic for finding servers managing a particular tablet, and communicating with TabletServers to write and retrieve key-value pairs.

2.4. Data Management

Accumulo stores data in tables, which are partitioned into tablets. Tablets are partitioned on row boundaries so that all of the columns and values for a particular row are found together within the same tablet. The Master assigns Tablets to one TabletServer at a time. This enables row-level transactions to take place without using distributed locking or some other complicated synchronization mechanism. As clients insert and query data, and as machines are added and removed from the cluster, the Master migrates tablets to ensure they remain available and that the ingest and query load is balanced across the cluster.

2.5. Tablet Service

When a write arrives at a TabletServer it is written to a Write-Ahead Log and then inserted into a sorted data structure in memory called a MemTable. When the MemTable reaches a certain size, the TabletServer writes out the sorted key-value pairs to a file in HDFS called a Relative Key File (RFile), which is a kind of Indexed Sequential Access Method (ISAM) file. This process is called a minor compaction. A new MemTable is then created and the fact of the compaction is recorded in the Write-Ahead Log.

When a request to read data arrives at a TabletServer, the TabletServer does a binary search across the MemTable as well as the in-memory indexes associated with each RFile to find the relevant values. If clients are performing a scan, several key-value pairs are returned to the client in order from the MemTable and the set of RFiles by performing a merge-sort as they are read.

2.6. Compactions

In order to manage the number of files per tablet, periodically the TabletServer performs Major Compactions of files within a tablet, in which some set of RFiles are combined into one file. The previous files will eventually be removed by the Garbage Collector. This also provides an opportunity to permanently remove deleted key-value pairs by omitting key-value pairs suppressed by a delete entry when the new file is created.

2.7. Splitting

When a table is created it has one tablet. As the table grows, its initial tablet eventually splits into two tablets. It is likely that one of these tablets will migrate to another tablet server. As the table continues to grow, its tablets will continue to split and be migrated. The decision to automatically split a tablet is based on the size of a tablet's files. The size threshold at which a tablet splits is configurable per table. In addition to automatic splitting, a user can manually add split points to a table to create new tablets, as shown in the sketch below. Manually splitting a new table can parallelize reads and writes, giving better initial performance without waiting for automatic splitting.
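As a sketch of manually adding split points through the client API (the table name and split points here are hypothetical, and conn is a Connector as shown in the Writing Accumulo Clients chapter), TableOperations.addSplits can be used to pre-split a table:

SortedSet<Text> splits = new TreeSet<Text>();
splits.add(new Text("g"));
splits.add(new Text("n"));
splits.add(new Text("t"));

// Pre-split the table so ingest and queries are spread across tablets immediately
conn.tableOperations().addSplits("mytable", splits);

The shell's addsplits command accomplishes the same thing.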

As data is deleted from a table, tablets may shrink. Over time this can lead to small or empty tablets. To deal with this, merging of tablets was introduced in Accumulo 1.4. This is discussed in more detail later.

2.8. Fault-Tolerance

If a TabletServer fails, the Master detects it and automatically reassigns the tablets that were assigned to the failed server to other servers. Any key-value pairs that were in memory at the time the TabletServer failed are automatically reapplied from the Write-Ahead Log (WAL) to prevent any loss of data.

Tablet servers write their WALs directly to HDFS so the logs are available to all tablet servers for recovery. To make the recovery process efficient, the updates within a log are grouped by tablet. TabletServers can quickly apply the mutations from the sorted logs that are destined for the tablets they have now been assigned.

TabletServer failures are noted on the Master’s monitor page, accessible via http://master-address:50095/monitor.

3. Accumulo Shell

Accumulo provides a simple shell that can be used to examine the contents and configuration settings of tables, insert/update/delete values, and change configuration settings.

The shell can be started by the following command:

$ACCUMULO_HOME/bin/accumulo shell -u [username]

The shell will prompt for the corresponding password to the username specified and then display the following prompt:

Shell - Apache Accumulo Interactive Shell
-
- version 1.6
- instance name: myinstance
- instance id: 00000000-0000-0000-0000-000000000000
-
- type 'help' for a list of available commands
-

3.1. Basic Administration

The Accumulo shell can be used to create and delete tables, as well as to configure table and instance specific options.

root@myinstance> tables
accumulo.metadata
accumulo.root

root@myinstance> createtable mytable

root@myinstance mytable>

root@myinstance mytable> tables
accumulo.metadata
accumulo.root
mytable

root@myinstance mytable> createtable testtable

root@myinstance testtable>

root@myinstance testtable> deletetable testtable
deletetable { testtable } (yes|no)? yes
Table: [testtable] has been deleted.

root@myinstance>

The Shell can also be used to insert updates and scan tables. This is useful for inspecting tables.

root@myinstance mytable> scan

root@myinstance mytable> insert row1 colf colq value1
insert successful

root@myinstance mytable> scan
row1 colf:colq []    value1

The value in brackets "[]" would be the visibility labels. Since none were used, this is empty for this row. You can use the -st option to scan to see the timestamp for the cell, too.

3.2. Table Maintenance

The compact command instructs Accumulo to schedule a compaction of the table during which files are consolidated and deleted entries are removed.

root@myinstance mytable> compact -t mytable
07 16:13:53,201 [shell.Shell] INFO : Compaction of table mytable started for given range

The flush command instructs Accumulo to write all entries currently in memory for a given table to disk.

root@myinstance mytable> flush -t mytable
07 16:14:19,351 [shell.Shell] INFO : Flush of table mytable initiated...

3.3. User Administration

The Shell can be used to add, remove, and grant privileges to users.

root@myinstance mytable> createuser bob
Enter new password for 'bob': *********
Please confirm new password for 'bob': *********

root@myinstance mytable> authenticate bob
Enter current password for 'bob': *********
Valid

root@myinstance mytable> grant System.CREATE_TABLE -s -u bob

root@myinstance mytable> user bob
Enter current password for 'bob': *********

bob@myinstance mytable> userpermissions
System permissions: System.CREATE_TABLE

Table permissions (accumulo.metadata): Table.READ
Table permissions (mytable): NONE

bob@myinstance mytable> createtable bobstable

bob@myinstance bobstable>

bob@myinstance bobstable> user root
Enter current password for 'root': *********

root@myinstance bobstable> revoke System.CREATE_TABLE -s -u bob

4. Writing Accumulo Clients

4.1. Running Client Code

There are multiple ways to run Java code that uses Accumulo. Below is a list of the different ways to execute client code.

• using the java executable

• using the accumulo script

• using the tool script

In order to run client code written to run against Accumulo, you will need to include the jars that Accumulo depends on in your classpath. Accumulo client code depends on Hadoop and ZooKeeper. For Hadoop, add the Hadoop client jar, all of the jars in the Hadoop lib directory, and the conf directory to the classpath. For recent ZooKeeper versions, you only need to add the ZooKeeper jar, and not what is in the ZooKeeper lib directory. You can run the following command on a configured Accumulo system to see what it is using for its classpath.

$ACCUMULO_HOME/bin/accumulo classpath

Another option for running your code is to put a jar file in $ACCUMULO_HOME/lib/ext. After doing this you can use the accumulo script to execute your code. For example, if you create a jar containing the class com.foo.Client and place it in lib/ext, then you could use the command $ACCUMULO_HOME/bin/accumulo com.foo.Client to execute your code.

If you are writing a MapReduce job that accesses Accumulo, then you can use the bin/tool.sh script to run it. See the MapReduce example.

4.2. Connecting

All clients must first identify the Accumulo instance with which they will communicate. Code to do this is as follows:

String instanceName = "myinstance";
String zooServers = "zooserver-one,zooserver-two";
Instance inst = new ZooKeeperInstance(instanceName, zooServers);

Connector conn = inst.getConnector("user", new PasswordToken("passwd"));

The PasswordToken is the most common implementation of an AuthenticationToken. This general interface allows authentication as an Accumulo user to come from a variety of sources or means. The CredentialProviderToken leverages the Hadoop CredentialProviders (new in Hadoop 2.6).

For example, the CredentialProviderToken can be used in conjunction with a Java KeyStore to avoid storing passwords in cleartext. When stored in HDFS, a single KeyStore can be used across an entire instance. Be aware that KeyStores stored on the local filesystem must be made available to all nodes in the Accumulo cluster.
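As a rough sketch (the keystore URL and entry name are hypothetical, and this assumes the CredentialProviderToken constructor that takes an entry name and a comma-separated list of provider URLs), such a token can replace a cleartext PasswordToken:

// The "root.password" entry was previously added to the keystore with the hadoop credential command
AuthenticationToken token =
    new CredentialProviderToken("root.password", "jceks://hdfs/accumulo/credentials.jceks");
Connector conn = inst.getConnector("root", token);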

KerberosToken token = new KerberosToken();
Connector conn = inst.getConnector(token.getPrincipal(), token);

The KerberosToken can be provided to use the authentication provided by Kerberos. Using Kerberos requires external setup and additional configuration, but provides a single point of authentication through HDFS, YARN and ZooKeeper, and allows for password-less authentication with Accumulo.

4.3. Writing Data

Data are written to Accumulo by creating Mutation objects that represent all the changes to the columns of a single row. The changes are made atomically in the TabletServer. Clients then add Mutations to a BatchWriter which submits them to the appropriate TabletServers.

Mutations can be created thus:

Text rowID = new Text("row1");
Text colFam = new Text("myColFam");
Text colQual = new Text("myColQual");
ColumnVisibility colVis = new ColumnVisibility("public");
long timestamp = System.currentTimeMillis();

Value value = new Value("myValue".getBytes());

Mutation mutation = new Mutation(rowID);
mutation.put(colFam, colQual, colVis, timestamp, value);

4.3.1. BatchWriter

The BatchWriter is highly optimized to send Mutations to multiple TabletServers and automatically batches Mutations destined for the same TabletServer to amortize network overhead. Care must be taken to avoid changing the contents of any Object passed to the BatchWriter since it keeps objects in memory while batching.

Mutations are added to a BatchWriter thus:

// BatchWriterConfig has reasonable defaults
BatchWriterConfig config = new BatchWriterConfig();
config.setMaxMemory(10000000L); // bytes available to batchwriter for buffering mutations

BatchWriter writer = conn.createBatchWriter("table", config);

writer.addMutation(mutation);

writer.close();

An example of using the batch writer can be found at accumulo/docs/examples/README.batch.

4.3.2. ConditionalWriter

The ConditionalWriter enables efficient, atomic read-modify-write operations on rows. The ConditionalWriter writes special Mutations which have a list of per-column conditions that must all be met before the mutation is applied. The conditions are checked on the tablet server while a row lock is held (Mutations written by the BatchWriter do not obtain a row lock). The conditions that can be checked for a column are equality and absence. For example, a conditional mutation can require that column A is absent in order to be applied. Iterators can be applied when checking conditions. Using iterators, many other operations besides equality and absence can be checked. For example, using an iterator that converts values less than 5 to 0 and everything else to 1, it is possible to only apply a mutation when a column is less than 5.

In the case where a tablet server dies after a client has sent a conditional mutation, it is not known whether the mutation was applied or not. When this happens the ConditionalWriter reports a status of UNKNOWN for the ConditionalMutation. In many cases this situation can be dealt with by simply reading the row again and possibly sending another conditional mutation. If this is not sufficient, then a higher level of abstraction can be built by storing transactional information within a row.
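The following is a hedged sketch of a conditional update (the table, column names, and values are hypothetical; auths is an Authorizations object as used in the scanning examples): the mutation is only applied if the column lock:owner is currently absent, and an UNKNOWN status is handled by re-reading the row before deciding whether to retry.

ConditionalWriter cw = conn.createConditionalWriter("table",
    new ConditionalWriterConfig().setAuthorizations(auths));

ConditionalMutation cm = new ConditionalMutation("row1");
// a Condition with no value set requires that lock:owner be absent
cm.addCondition(new Condition("lock", "owner"));
cm.put(new Text("lock"), new Text("owner"), new Value("me".getBytes()));

ConditionalWriter.Result result = cw.write(cm);
switch (result.getStatus()) {
  case ACCEPTED:
    // the mutation was applied
    break;
  case REJECTED:
    // the condition failed; another client already wrote lock:owner
    break;
  case UNKNOWN:
    // the tablet server died mid-write; re-read the row before retrying
    break;
  default:
    break;
}

cw.close();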

An example of using the conditional writer can be found at accumulo/docs/examples/README.reservations.

4.3.3. Durability

By default, Accumulo writes out any updates to the Write-Ahead Log (WAL). Every change goes into a file in HDFS and is sync’d to disk for maximum durability. In the event of a failure, writes held in memory are replayed from the WAL. Like all files in HDFS, this file is also replicated. Sending updates to the replicas, and waiting for a permanent sync to disk, can significantly slow write speed.

Accumulo allows users to use less tolerant forms of durability when writing. These levels are:

• none: no durability guarantees are made, the WAL is not used

• log: the WAL is used, but not flushed; loss of the server probably means recent writes are lost

• flush: updates are written to the WAL, and flushed out to replicas; loss of a single server is unlikely to result in data loss.

• sync: updates are written to the WAL, and synced to disk on all replicas before the write is acknowledged. Data will not be lost even if the entire cluster suddenly loses power.

The user can set the default durability of a table in the shell. When writing, the user can configure the BatchWriter or ConditionalWriter to use a different level of durability for the session. This will override the default durability setting.

BatchWriterConfig cfg = new BatchWriterConfig();
// We don't care about data loss with these writes:
// This is DANGEROUS:
cfg.setDurability(Durability.NONE);

Connector conn = ... ;
BatchWriter bw = conn.createBatchWriter(table, cfg);

4.4. Reading Data

Accumulo is optimized to quickly retrieve the value associated with a given key, and to efficiently return ranges of consecutive keys and their associated values.

4.4.1. Scanner

To retrieve data, Clients use a Scanner, which acts like an Iterator over keys and values. Scanners can be configured to start and stop at particular keys, and to return a subset of the columns available.

// specify which visibilities we are allowed to see
Authorizations auths = new Authorizations("public");

Scanner scan = conn.createScanner("table", auths);

scan.setRange(new Range("harry","john"));
scan.fetchColumnFamily(new Text("attributes"));

for(Entry<Key,Value> entry : scan) {
    Text row = entry.getKey().getRow();
    Value value = entry.getValue();
}

4.4.2. Isolated Scanner

Accumulo supports the ability to present an isolated view of rows when scanning. There are three possible ways that a row could change in Accumulo:

• a mutation applied to a table

• iterators executed as part of a minor or major compaction

• bulk import of new files

Isolation guarantees that either all or none of the changes made by these operations on a row are seen. Use the IsolatedScanner to obtain an isolated view of an Accumulo table. When using the regular scanner it is possible to see a non-isolated view of a row. For example, if a mutation modifies three columns, it is possible that you will only see two of those modifications. With the isolated scanner, either all three of the changes are seen or none.

The IsolatedScanner buffers rows on the client side so a large row will not crash a tablet server. By default rows are buffered in memory, but the user can easily supply their own buffer if they wish to buffer to disk when rows are large.
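A minimal sketch of wrapping a regular Scanner (the table name and range are hypothetical):

Scanner scanner = conn.createScanner("table", auths);

// wrap the scanner so each row is read from a single consistent view
Scanner isolated = new IsolatedScanner(scanner);
isolated.setRange(new Range("harry", "john"));

for(Entry<Key,Value> entry : isolated) {
    // every entry returned for a given row reflects the same version of that row
}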

For an example, look at the following

examples/simple/src/main/java/org/apache/accumulo/examples/simple/isolation/InterferenceTest.java

4.4.3. BatchScanner

For some types of access, it is more efficient to retrieve several ranges simultaneously. This arises, for example, when accessing a set of non-consecutive rows whose IDs have been retrieved from a secondary index.

The BatchScanner is configured similarly to the Scanner; it can be configured to retrieve a subset of the columns available, but rather than passing a single Range, BatchScanners accept a set of Ranges. It is important to note that the keys returned by a BatchScanner are not in sorted order since the keys streamed are from multiple TabletServers in parallel.

ArrayList<Range> ranges = new ArrayList<Range>();
// populate list of ranges ...

BatchScanner bscan = conn.createBatchScanner("table", auths, 10);
bscan.setRanges(ranges);
bscan.fetchColumnFamily(new Text("attributes"));

for(Entry<Key,Value> entry : bscan) {
    System.out.println(entry.getValue());
}

An example of the BatchScanner can be found at accumulo/docs/examples/README.batch.

4.5. Proxy

The proxy API allows interaction with Accumulo from languages other than Java. A proxy server is provided in the codebase, and a client can then be generated from the Thrift definition. The proxy API can also be used instead of the traditional ZooKeeperInstance class to provide a single TCP port through which clients can be securely routed through a firewall, without requiring access to all tablet servers in the cluster.

4.5.1. Prerequisites

The proxy server can live on any node on which the basic client API would work. That means it must be able to communicate with the Master, ZooKeepers, NameNode, and DataNodes. A proxy client only needs the ability to communicate with the proxy server.

4.5.2. Configuration

The configuration options for the proxy server live inside of a properties file. At the very least, you need to supply the following properties:

protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken
port=42424
instance=test
zookeepers=localhost:2181

You can find a sample configuration file in your distribution: $ACCUMULO_HOME/proxy/proxy.properties.

This sample configuration file further demonstrates an ability to back the proxy server by MockAccumulo or the MiniAccumuloCluster.

4.5.3. Running the Proxy Server

After the properties file holding the configuration is created, the proxy server can be started using the following command in the Accumulo distribution (assuming your properties file is named config.properties):

$ACCUMULO_HOME/bin/accumulo proxy -p config.properties

4.5.4. Creating a Proxy Client

Aside from installing the Thrift compiler, you will also need the language-specific library for Thrift installed to generate client code in that language. Typically, your operating system’s package manager will be able to automatically install these for you in an expected location such as /usr/lib/python/site-packages/thrift.

You can find the thrift file for generating the client: $ACCUMULO_HOME/proxy/proxy.thrift.

After a client is generated, the port specified in the configuration properties above will be used to connect to the server.
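For instance, a Java client generated from proxy.thrift might connect as follows. This is a sketch that assumes the proxy was configured with the TCompactProtocol factory and a framed transport, as in the sample properties above; the host name is hypothetical.

TTransport transport = new TFramedTransport(new TSocket("proxy-host.example.com", 42424));
transport.open();

TProtocol protocol = new TCompactProtocol(transport);
AccumuloProxy.Client client = new AccumuloProxy.Client(protocol);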

4.5.5. Using a Proxy Client

The following examples have been written in Java, and the method signatures may be slightly different depending on the language specified when generating the client with the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift’s documentation for examples of connecting to a Thrift service), the methods on the proxy client will be available. The first thing to do is log in:

Map<String,String> password = new HashMap<String,String>();
ByteBuffer token = client.login("root", password);

Once logged in, the token returned will be used for most subsequent calls to the client. Let’s create a table, add some data, scan the table, and delete it.

First, create a table.

client.createTable(token, "myTable", true, TimeType.MILLIS);

// first, create a writer on the server
String writer = client.createWriter(token, "myTable", new WriterOptions());

//rowid
ByteBuffer rowid = ByteBuffer.wrap("UUID".getBytes());

//mutation like class
ColumnUpdate cu = new ColumnUpdate();
cu.setColFamily("MyFamily".getBytes());
cu.setColQualifier("MyQualifier".getBytes());
cu.setColVisibility("VisLabel".getBytes());
cu.setValue("Some Value.".getBytes());

Map<ByteBuffer, List<ColumnUpdate>> cellsToUpdate = new HashMap<ByteBuffer, List<ColumnUpdate>>();
cellsToUpdate.put(rowid, Collections.singletonList(cu));

// send updates to the server
client.updateAndFlush(writer, "myTable", cellsToUpdate);

client.closeWriter(writer);

Scan for the data and batch the return of the results on the server:

String scanner = client.createScanner(token, "myTable", new ScanOptions());
ScanResult results = client.nextK(scanner, 100);

for(KeyValue keyValue : results.getResultsIterator()) {
// do something with results
}

client.closeScanner(scanner);
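Since the walkthrough above also promised to delete the table, the generated deleteTable call can be used to drop it (a short sketch):

// drop the table created above
client.deleteTable(token, "myTable");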

5. Development Clients

Normally, Accumulo consists of lots of moving parts. Even a stand-alone version of Accumulo requires Hadoop, Zookeeper, the Accumulo master, a tablet server, etc. If you want to write a unit test that uses Accumulo, you need a lot of infrastructure in place before your test can run.

5.1. Mock Accumulo

Mock Accumulo supplies mock implementations for much of the client API. It presently does not enforce users, logins, permissions, etc. It does support Iterators and Combiners. Note that MockAccumulo holds all data in memory, and will not retain any data or settings between runs.

While normal interaction with the Accumulo client looks like this:

Instance instance = new ZooKeeperInstance(...);
Connector conn = instance.getConnector(user, passwordToken);

To interact with the MockAccumulo, just replace the ZooKeeperInstance with MockInstance:

Instance instance = new MockInstance();

In fact, you can use the --fake option to the Accumulo shell to interact with MockAccumulo.

root@primary> config -s replication.peer.peer=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,peer,$peer_zk_quorum

Set the authentication credentials

We want to use that special username and password that we created on the peer, so we have a means to write data to the table that we want to replicate to. The configuration key is of the form "replication.peer.user.$peer_name".

root@primary> config -s replication.peer.user.peer=peer
root@primary> config -s replication.peer.password.peer=peer
Enable replication on the table

Now that we have defined the peer on the primary and provided the authentication credentials, we need to configure our table with the implementation of ReplicaSystem we want to use to replicate to the peer. In this case, our peer is an Accumulo instance, so we want to use the AccumuloReplicaSystem.

The configuration for the AccumuloReplicaSystem is the table ID for the table on the peer instance that we want to replicate into. Be sure to use the correct value for $peer_table_id. The configuration key is of the form "table.replication.target.$peer_name".

root@primary> config -t my_table -s table.replication.target.peer=$peer_table_id

Finally, we can enable replication on this table.

root@primary> config -t my_table -s table.replication=true

12.8. Extra considerations for use

While this feature is intended for general-purpose use, its implementation does carry some baggage. Like any software, replication is a feature that operates well within some set of use cases but is not meant to support all use cases. For the benefit of users, we can enumerate these cases.

12.8.1. Latency

As previously mentioned, the replication feature uses the Write-Ahead Log files for a number of reasons, one of which is to avoid the need for data to be written to RFiles before it is available to be replicated. While this can help reduce the latency for a batch of Mutations that have been written to Accumulo, the latency is still at least seconds to tens of seconds once ingest is active. For a table on which replication has just been enabled, it is likely to take a few minutes before replication begins. Once ingest is active and flowing into the system at a regular rate, replication should occur at a similar rate, given sufficient computing resources. Replication attempts to copy data at a rate that can be considered low latency, but it is not a replacement for custom indexing code which can ensure near real-time referential integrity on secondary indexes.

12.8.2. Table-Configured Iterators

Accumulo Iterators tend to be a heavy hammer which can be used to solve a variety of problems. In general, it is highly recommended that Iterators which are applied at major compaction time be both idempotent and associative, due to the non-determinism in which some set of files for a Tablet might be compacted. In practice, this translates to common patterns, such as aggregation, being implemented in a manner resilient to duplication (such as using a Set instead of a List).

Due to the asynchronous nature of replication and the expectation that hardware failures and network partitions will occur, it is generally not recommended to configure replication on a table which has Iterators that are not idempotent. While the replication implementation can make some simple assertions to try to avoid re-replication of data, it is not presently guaranteed that all data will be sent to a peer only once. Data will be replicated at least once. Typically, this is not a problem, as the VersioningIterator will automatically deduplicate the over-replication because the duplicate entries will have the same timestamp; however, certain Combiners may result in inaccurate aggregations.

As a concrete example, consider a table which has the SummingCombiner configured to sum all values for multiple versions of the same Key. For some key, consider a set of numeric values that are written to a table on the primary: [1, 2, 3]. On the primary, all of these are successfully written, so the current value for the given key is 6 (1 + 2 + 3). Consider, however, that each of these updates to the peer was done independently (because other data was also included in the write-ahead log that needed to be replicated). The update with a value of 1 was successfully replicated, and then we attempted to replicate the update with a value of 2, but the remote server never responded. The primary does not know whether the update with a value of 2 was actually applied or not, so the only recourse is to re-send the update. After we receive confirmation that the update with a value of 2 was replicated, we then replicate the update with 3. If the peer never applied the first update of 2, the summation is accurate. If the update was applied but the acknowledgement was lost for some reason (system failure, network partition), the update will be re-sent to the peer. Because re-applying an addition changes the result, we have created an inconsistency between the primary and peer. As such, the SummingCombiner is not recommended on a table being replicated.

While there are changes that could be made to the replication implementation to mitigate this risk, presently it is not recommended to configure Iterators or Combiners which are not idempotent on replicated tables where inaccurate aggregations are not acceptable.

12.8.3. Duplicate Keys

In Accumulo, when more than one key exists that is exactly the same (equal down to the timestamp), the retained value is non-deterministic. Replication introduces another level of non-determinism in this case. For a table that is being replicated and has multiple equal keys with different values inserted into it, the final value in that table on the primary instance is not guaranteed to be the final value on all replicas. For example, if the values inserted on the primary instance were value1 and value2 and the final value was value1, it is not guaranteed that all replicas will have value1 like the primary. The final value is non-deterministic for each instance.

As is the recommendation without replication enabled, if multiple values for the same key (sans timestamp) are written to Accumulo, it is strongly recommended that the timestamp properly reflect the version intended by the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between writing updates to the same key is significant (order minutes), this concern can likely be ignored.

12.8.4. Bulk Imports

Currently, files that are bulk imported into a table configured for replication are not replicated. There is no technical reason why this could not be implemented; it was simply omitted from the initial implementation. This is considered a fair limitation because bulk importing the generated files to multiple locations is much simpler than bifurcating "live" ingest data into two instances. Given some existing bulk import process which creates files and then imports them into an Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo instance using the same process. Hadoop’s distcp command provides an easy way to copy large amounts of data to another HDFS instance, which makes the problem of duplicating bulk imports very easy to solve.

13. Implementation Details

13.1. Fault-Tolerant Executor (FATE)

Accumulo must implement a number of distributed, multi-step operations to support the client API. Creating a new table is a simple example of an atomic client call which requires multiple steps in the implementation: get a unique table ID, configure default table permissions, populate information in ZooKeeper to record the table’s existence, create directories in HDFS for the table’s data, etc. Implementing these steps in a way that is tolerant to node failure and other concurrent operations is very difficult to achieve. Accumulo includes a Fault-Tolerant Executor (FATE) which is widely used server-side to implement the client API safely and correctly.
FATE is the implementation detail which ensures that a table in the process of being created when the Master dies will still be successfully created when another Master process is started. This alleviates the need for any external tools to correct some bad state: Accumulo can undo the failure and self-heal without any external intervention.

13.2. Overview

FATE consists of a few primary components: a repeatable, persisted operation (REPO), a storage layer for REPOs, and an execution system to run REPOs. Accumulo uses ZooKeeper as the storage layer for FATE, and the Accumulo Master acts as the execution system to run REPOs.

The important characteristic of REPOs is that they are implemented in a way that is idempotent: every operation must be able to undo or replay a partial execution of itself. Requiring each operation to support this greatly simplifies the execution of these operations. This property is also what guarantees safety in the face of failure conditions.

13.3. Administration

Sometimes, it is useful to inspect the current FATE operations, both pending and executing. For example, a command that is not completing could be blocked on the execution of another operation. Accumulo provides a shell command to interact with FATE. The fate shell command accepts a number of arguments for different functionality: list/print, fail, delete.

13.3.1. List/Print

Without any additional arguments, this command will print all operations that still exist in the FATE store (ZooKeeper). This will include active, pending, and completed operations (completed operations are lazily removed from the store). Each operation includes a unique "transaction ID", the state of the operation (e.g. NEW, IN_PROGRESS, FAILED), any locks the transaction actively holds and any locks it is waiting to acquire. This option can also accept transaction IDs which will restrict the list of transactions shown.

13.3.2. Fail

This command can be used to manually fail a FATE transaction and requires a transaction ID as an argument. Failing an operation is not a normal procedure and should only be performed by an administrator who understands the implications of why they are failing the operation.

13.3.3. Delete

This command requires a transaction ID and will delete any locks that the transaction holds. Like the fail command, this command should only be used in extreme circumstances by an administrator that understands the implications of the command they are about to invoke. It is not normal to invoke this command.

14. SSL

Accumulo, through Thrift’s TSSLTransport, provides the ability to encrypt wire communication between Accumulo servers and clients using the secure sockets layer (SSL). SSL certificates signed by the same certificate authority control the "circle of trust" in which a secure connection can be established. Typically, each host running Accumulo processes would be given a certificate which identifies itself. Clients can optionally also be given a certificate, when client-auth is enabled, which prevents unwanted clients from accessing the system. The SSL integration presently provides no authentication support within Accumulo (an Accumulo username and password are still required) and is only used to establish a means for secure communication.

14.1. Server configuration

As previously mentioned, the circle of trust is established by the certificate authority which created the certificates in use.
Because of the tight coupling of certificate generation with an organization’s policies, Accumulo does not provide a method to automatically create the necessary SSL components. Administrators without existing infrastructure built on SSL are encouraged to use OpenSSL and the keytool command; examples of these commands are included in a section below.

Accumulo servers require a certificate and keystore, in the form of Java KeyStores, to enable SSL. The following configuration assumes these files already exist. In $ACCUMULO_CONF_DIR/accumulo-site.xml, the following properties are required:

• rpc.javax.net.ssl.keyStore=The path on the local filesystem to the keystore containing the server’s certificate

• rpc.javax.net.ssl.trustStore=The path on the local filesystem to the keystore containing the certificate authority’s public key

• rpc.javax.net.ssl.trustStorePassword=The password for the keystore containing the certificate authority’s public key

• instance.rpc.ssl.enabled=true

Optionally, SSL client-authentication (two-way SSL) can also be enabled by setting instance.rpc.ssl.clientAuth=true in $ACCUMULO_CONF_DIR/accumulo-site.xml. This requires that each client has access to a valid certificate to set up a secure connection to the servers. By default, Accumulo uses one-way SSL, which does not require clients to have their own certificate.

14.2. Client configuration

To establish a connection to Accumulo servers, each client must also have special configuration. This is typically accomplished through the use of the client configuration file whose default location is ~/.accumulo/config.

The following properties must be set to connect to an Accumulo instance using SSL:

• rpc.javax.net.ssl.trustStore=The path on the local filesystem to the keystore containing the certificate authority’s public key

• rpc.javax.net.ssl.trustStorePassword=The password for the keystore containing the certificate authority’s public key

• instance.rpc.ssl.enabled=true

If two-way SSL is enabled (instance.rpc.ssl.clientAuth=true) for the instance, the client must also define its own certificate and enable client authentication as well:

• rpc.javax.net.ssl.keyStore=The path on the local filesystem to the keystore containing the client’s certificate

• rpc.javax.net.ssl.keyStorePassword=The password for the keystore containing the client’s certificate

• instance.rpc.ssl.clientAuth=true

14.3. Generating SSL material using OpenSSL

The following is included as an example for generating your own SSL material (certificate authority and server/client certificates) using OpenSSL and Java’s keytool command.

14.3.1. Generate a certificate authority

# Create a private key
openssl genrsa -des3 -out root.key 4096

# Create a certificate request using the private key
openssl req -x509 -new -key root.key -days 365 -out root.pem

# Generate a Base64-encoded version of the PEM just created
openssl x509 -outform der -in root.pem -out root.der

# Import the key into a Java KeyStore
keytool -import -alias root-key -keystore truststore.jks -file root.der

# Remove the DER formatted key file (as we don't need it anymore)
rm root.der

The truststore.jks file is the Java keystore which contains the certificate authority’s public key.

14.3.2. Generate a certificate/keystore per host

It’s common that each host in the instance is issued its own certificate (notably to ensure that revocation procedures can be easily followed). The following steps can be taken for each host.

# Create the private key for our server
openssl genrsa -out server.key 4096

# Generate a certificate signing request (CSR) with our private key
openssl req -new -key server.key -out server.csr

# Use the CSR and the CA to create a certificate for the server (a reply to the CSR)
openssl x509 -req -in server.csr -CA root.pem -CAkey root.key -CAcreateserial \
    -out server.crt -days 365

# Use the certificate and the private key for our server to create a PKCS12 file
openssl pkcs12 -export -in server.crt -inkey server.key -certfile server.crt \
    -name 'server-key' -out server.p12

# Create a Java KeyStore for the server using the PKCS12 file (private key)
keytool -importkeystore -srckeystore server.p12 -srcstoretype pkcs12 -destkeystore \
    server.jks -deststoretype JKS

# Remove the PKCS12 file as we don't need it
rm server.p12

# Import the CA-signed certificate to the keystore
keytool -import -trustcacerts -alias server-crt -file server.crt -keystore server.jks

The server.jks file is the Java keystore containing the certificate for a given host.
The above methods are equivalent whether the certificate is generated for an Accumulo server or a client.

15. Kerberos

15.1. Overview

Kerberos is a network authentication protocol that provides a secure way for peers to prove their identity over an insecure network in a client-server model. A centralized key-distribution center (KDC) is the service that coordinates authentication between a client and a server. Clients and servers use "tickets", obtained from the KDC via a password or a special file called a "keytab", to communicate with the KDC and prove their identity. A KDC administrator must create the principal (the name for the client/server identity) and the password or keytab, securely passing the necessary information to the actual user/service. Properly securing the KDC and generated ticket material is central to the security model and is mentioned only as a warning to administrators running their own KDC.

To interact with Kerberos programmatically, GSSAPI and SASL are two standards which allow cross-language integration with Kerberos for authentication. GSSAPI, the generic security service application program interface, is a standard which Kerberos implements. The Java programming language itself implements GSSAPI, and this implementation is leveraged by other applications, like Apache Hadoop and Apache Thrift. SASL, the simple authentication and security layer, is a framework for authentication and security over the network. SASL provides a number of mechanisms for authentication, one of which is GSSAPI. Thus, SASL provides the transport which authenticates using the GSSAPI that Kerberos implements.

Kerberos is a very complicated software application and is deserving of much more description than can be provided here. An "explain like I’m 5" blog post is very good at distilling the basics, while MIT Kerberos’s project page contains lots of documentation for users and administrators. Various Hadoop "vendors" also provide free documentation that includes step-by-step instructions for configuring Hadoop and ZooKeeper (which will henceforth be considered prerequisites).

15.2. Within Hadoop

Out of the box, HDFS and YARN have no ability to enforce that a user is who they claim to be. Thus, any basic Hadoop installation should be treated as unsecured: any user with access to the cluster has the ability to access any data. Using Kerberos to provide authentication, users can be strongly identified, delegating to Kerberos to determine who a user is and to enforce that a user is who they claim to be. As such, Kerberos is widely used across the entire Hadoop ecosystem for strong authentication. Since server processes accessing HDFS or YARN are required to use Kerberos to authenticate with HDFS, it makes sense that they also require Kerberos authentication from their clients, in addition to the other features provided by SASL.

A typical deployment involves the creation of Kerberos principals for all server processes (Hadoop datanodes and namenode(s), ZooKeepers), the creation of a keytab file for each principal, and then proper configuration of the Hadoop site xml files. Users also need Kerberos principals created for them; however, a user typically uses a password to identify themselves instead of a keytab. Users can obtain a ticket granting ticket (TGT) from the KDC using their password, which allows them to authenticate for the lifetime of the TGT (typically one day by default) and alleviates the need for further password authentication.
For client-server applications, like web servers, a keytab can be created which allows for fully-automated Kerberos identification, removing the need to enter any password, at the cost of needing to protect the keytab file. These principals apply directly to authentication for clients accessing Accumulo and for the Accumulo processes accessing HDFS.

15.3. Delegation Tokens

MapReduce, a common way that clients interact with Accumulo, does not map well to the client-server model that Kerberos was originally designed to support. Specifically, the parallelization of tasks across many nodes introduces the problem of securely sharing the user credentials across these tasks in as safe a manner as possible. To address this problem, Hadoop introduced the notion of a delegation token to be used in distributed execution settings.

A delegation token is nothing more than a short-term, on-the-fly password generated after authenticating with the user’s credentials. In Hadoop itself, the NameNode and ResourceManager, for HDFS and YARN respectively, act as the gateway for delegation token requests. For example, before a YARN job is submitted, the implementation will request delegation tokens from the NameNode and ResourceManager so the YARN tasks can communicate with HDFS and YARN. In the same manner, support has been added in the Accumulo Master to generate delegation tokens to enable interaction with Accumulo via MapReduce, when Kerberos authentication is enabled, in a manner similar to HDFS and YARN. Generating an expiring password is, arguably, more secure than distributing the user’s credentials across the cluster, as only access to HDFS, YARN and Accumulo would be compromised if the token were compromised, as opposed to the entire Kerberos credential.

Additional details for clients and servers will be covered in subsequent sections.

15.4. Configuring Accumulo

To configure Accumulo for use with Kerberos, both client-facing and server-facing changes must be made for a functional system on secured Hadoop. As previously mentioned, numerous guidelines already exist on the subject of configuring Hadoop and ZooKeeper for use with Kerberos and won’t be covered here. It is assumed that you have functional Hadoop and ZooKeeper already installed.

15.4.1. Servers

The first step is to obtain a Kerberos identity for the Accumulo server processes. When running Accumulo with Kerberos enabled, a valid Kerberos identity will be required to initiate any RPC between Accumulo processes (e.g. Master and TabletServer) in addition to any HDFS action (e.g. client to HDFS or TabletServer to HDFS).

Generate Principal and Keytab

In the kadmin.local shell or using the -q option on kadmin.local, create a principal for Accumulo for all hosts that are running Accumulo processes. A Kerberos principal is of the form "primary/instance@REALM". "accumulo" is commonly the "primary" (although not required) and the "instance" is the fully-qualified domain name for the host that will be running the Accumulo process; the instance component is required.

kadmin.local -q "addprinc -randkey accumulo/host.domain.com"

Perform the above for each node running Accumulo processes in the instance, modifying "host.domain.com" for your network. The randkey option generates a random password because we will use a keytab for authentication, not a password, since the Accumulo server processes don’t have an interactive console to enter a password into.
kadmin.local -q "xst -k accumulo.hostname.keytab accumulo/host.domain.com" To simplify deployments, at thet cost of security, all Accumulo principals could be globbed into a single keytab kadmin.local -q "xst -k accumulo.service.keytab -glob accumulo*" To ensure that the SASL handshake can occur from clients to servers and servers to servers, all Accumulo servers must share the same instance and realm principal components as the "client" needs to know these to set up the connection with the "server". Server Configuration A number of properties need to be changed to account to properly configure servers in accumulo-site.xml. Key Default Value Description general.kerberos.keytab /etc/security/keytabs/accumulo.service.keytab The path to the keytab for Accumulo on local filesystem. Change the value to the actual path on your system. general.kerberos.principal accumulo/_HOST@REALM The Kerberos principal for Accumulo, needs to match the keytab. "_HOST" can be used instead of the actual hostname in the principal and will be automatically expanded to the current FQDN which reduces the configuration file burden. instance.rpc.sasl.enabled true Enables SASL for the Thrift Servers (supports GSSAPI) rpc.sasl.qop auth One of "auth", "auth-int", or "auth-conf". These map to the SASL defined properties for quality of protection. "auth" is authentication only. "auth-int" is authentication and data integrity. "auth-conf" is authentication, data integrity and confidentiality. instance.security.authenticator org.apache.accumulo.server.security. handler.KerberosAuthenticator Configures Accumulo to use the Kerberos principal as the Accumulo username/principal instance.security.authorizor org.apache.accumulo.server.security. handler.KerberosAuthorizor Configures Accumulo to use the Kerberos principal for authorization purposes instance.security.permissionHandler org.apache.accumulo.server.security. handler.KerberosPermissionHandler Configures Accumulo to use the Kerberos principal for permission purposes trace.token.type org.apache.accumulo.core.client. security.tokens.KerberosToken Configures the Accumulo Tracer to use the KerberosToken for authentication when serializing traces to the trace table. trace.user accumulo/_HOST@REALM The tracer process needs valid credentials to serialize traces to Accumulo. While the other server processes are creating a SystemToken from the provided keytab and principal, we can still use a normal KerberosToken and the same keytab/principal to serialize traces. Like non-Kerberized instances, the table must be created and permissions granted to the trace.user. The same _HOST replacement is performed on this value, substituted the FQDN for _HOST. general.delegation.token.lifetime 7d The length of time that the server-side secret used to create delegation tokens is valid. After a server-side secret expires, a delegation token created with that secret is no longer valid. general.delegation.token.update.interval 1d The frequency in which new server-side secrets should be generated to create delegation tokens for clients. Generating new secrets reduces the likelihood of cryptographic attacks. Although it should be a prerequisite, it is ever important that you have DNS properly configured for your nodes and that Accumulo is configured to use the FQDN. It is extremely important to use the FQDN in each of the "hosts" files for each Accumulo process: masters, monitors, slaves, tracers, and gc. Normally, no changes are needed in accumulo-env.sh to enable Kerberos. 
Typically, krb5.conf is installed on the local machine in /etc/, and the Java library implementations will look there to find the necessary configuration to communicate with the KDC. Some installations may require a different krb5.conf to be used for Accumulo; the ACCUMULO_KRB5_CONF environment variable enables this. ACCUMULO_KRB5_CONF can be set to a directory containing a file named krb5.conf or to the path of the file itself. This will be provided to all Accumulo server and client processes via the JVM system property java.security.krb5.conf. If the environment variable is not set, java.security.krb5.conf will not be set either.

KerberosAuthenticator

The KerberosAuthenticator is an implementation of the pluggable security interfaces that Accumulo provides. It builds on top of the default ZooKeeper-based implementation, but removes the need to create user accounts with passwords in Accumulo for clients. As long as a client has a valid Kerberos identity, they can connect to and interact with Accumulo, but without any permissions (e.g. cannot create tables or write data). Leveraging ZooKeeper removes the need to change the permission handler and authorizor, so other Accumulo functions regarding permissions and cell-level authorizations do not change.

It is extremely important to note that, while user operations like SecurityOperations.listLocalUsers(), SecurityOperations.dropLocalUser(), and SecurityOperations.createLocalUser() will not return errors, these methods are not equivalent to normal installations, as they will only operate on users which have, at one point in time, authenticated with Accumulo using their Kerberos identity. The KDC is still the authoritative entity for user management. The previously mentioned methods are provided as they simplify management of users within Accumulo, especially with respect to granting Authorizations and Permissions to new users.

Accumulo Initialization

Out of the box (without Kerberos enabled), Accumulo has a single user with administrative permissions: "root". This user is used to "bootstrap" other users, creating less-privileged users for applications using the system. In Kerberos, to authenticate with the system, it is required that the client presents Kerberos credentials for the principal (user) the client is trying to authenticate as. Because of this, an administrative user named "root" would be useless in an instance using Kerberos, because it is very unlikely that Kerberos credentials exist for a principal named root. When Kerberos is enabled, Accumulo will prompt for the name of a user to grant the same permissions that the root user would normally have. The name of the Accumulo user to grant administrative permissions to can also be given by the -u or --user options.

Verifying secure access

To verify that servers have correctly started with Kerberos enabled, ensure that the processes are actually running (they should exit immediately if login fails) and verify that you see something similar to the following in the application log.

2015-01-07 11:57:56,826 [security.SecurityUtil] INFO : Attempting to login with keytab as accumulo/hostname@EXAMPLE.COM
2015-01-07 11:57:56,830 [security.UserGroupInformation] INFO : Login successful for user accumulo/hostname@EXAMPLE.COM using keytab file /etc/security/keytabs/accumulo.service.keytab

Impersonation

Impersonation is functionality which allows a certain user to act as another. One direct application of this concept within Accumulo is the Thrift proxy.
The Thrift proxy is configured to accept user requests and pass them on to Accumulo, enabling client access to Accumulo via any Thrift-compatible language. When the proxy is running with SASL transports, it enforces that clients present a valid Kerberos identity to make a connection. In this situation, the Thrift proxy server does not have access to the secret key material needed to make a secure connection to Accumulo as the client; it can only connect to Accumulo as itself. Impersonation, in this context, refers to the ability of the proxy to authenticate to Accumulo as itself, but act on behalf of an Accumulo user.

Accumulo supports basic impersonation of end-users by a third party via static rules in Accumulo’s site configuration file. These two properties are semicolon-separated and are aligned by index: the first element in the user impersonation property value matches the first element in the host impersonation property value, and so on.

<property>
<name>instance.rpc.sasl.allowed.user.impersonation</name>
<value>$PROXY_USER:*</value>
</property>

<property>
<name>instance.rpc.sasl.allowed.host.impersonation</name>
<value>*</value>
</property>

Here, $PROXY_USER can impersonate any user from any host.

The following is an example of specifying a subset of users $PROXY_USER can impersonate and also limiting the hosts from which $PROXY_USER can initiate requests.

<property>
<name>instance.rpc.sasl.allowed.user.impersonation</name>
<value>$PROXY_USER:user1,user2;$PROXY_USER2:user2,user4</value>
</property>

<property>
<name>instance.rpc.sasl.allowed.host.impersonation</name>
<value>host1.domain.com,host2.domain.com;*</value>
</property>

Here, $PROXY_USER can impersonate user1 and user2 only from host1.domain.com or host2.domain.com, while $PROXY_USER2 can impersonate user2 and user4 from any host.

In these examples, $PROXY_USER is the Kerberos principal of the server which is acting on behalf of a user. Impersonation is enforced by the Kerberos principal and the host from which the RPC originated (from the perspective of the Accumulo TabletServers/Masters). An asterisk (*) can be used to specify all users or all hosts (depending on the context).

Delegation Tokens

Within Accumulo services, the primary task to implement delegation tokens is the generation and distribution of a shared secret among all Accumulo tabletservers and the master. The secret key allows for generation of delegation tokens for users and verification of delegation tokens presented by clients. If a server process is unaware of the secret key used to create a delegation token, the client cannot be authenticated. As distribution of the secret key via ZooKeeper is an asynchronous operation (typically on the order of seconds), the value for general.delegation.token.update.interval should be on the order of hours to days to reduce the likelihood of servers rejecting valid clients because the server has not yet seen a new secret key.

To support authentication with both Kerberos credentials and delegation tokens, the SASL thrift server accepts connections with either the GSSAPI or the DIGEST-MD5 mechanism. The DIGEST-MD5 mechanism enables authentication via a normal username and password exchange, which DelegationTokens leverage.

Since delegation tokens are a weaker form of authentication than Kerberos credentials, user access to obtain delegation tokens from Accumulo is protected with the DELEGATION_TOKEN system permission. Only users with the system permission are allowed to obtain delegation tokens. It is also recommended to configure confidentiality with SASL, using the rpc.sasl.qop=auth-conf configuration property, to ensure that prying eyes cannot view the DelegationToken as it passes over the network.

# Check a user's permissions

# Grant the DELEGATION_TOKEN system permission to a user
admin@REALM@accumulo> grant System.DELEGATION_TOKEN -s -u user@REALM

15.4.2. Clients

Create client principal

Like the Accumulo servers, clients must also have a Kerberos principal created for them. The primary difference from a server principal is that principals for users are created with a password and are not qualified to a specific instance (host).

kadmin.local -q "addprinc $user"

The above will prompt for a password for that user which will be used to identify that $user. The user can verify that they can authenticate with the KDC using the command kinit $user. Upon entering the correct password, a local credentials cache will be made which can be used to authenticate with Accumulo, access HDFS, etc. The user can verify the state of their local credentials cache by using the command klist.

$ klist
Ticket cache: FILE:/tmp/krb5cc_123
Default principal: user@EXAMPLE.COM

Valid starting       Expires              Service principal
01/07/2015 11:56:35  01/08/2015 11:56:35  krbtgt/EXAMPLE.COM@EXAMPLE.COM
renew until 01/14/2015 11:56:35
Configuration

The second thing clients need to do is to set up their client configuration file. By default, this file is stored in ~/.accumulo/conf, $ACCUMULO_CONF_DIR/client.conf or $ACCUMULO_HOME/conf/client.conf. Accumulo utilities also allow you to provide your own copy of this file in any location using the --config-file command line option.

• instance.rpc.sasl.enabled=true

• kerberos.server.primary=accumulo

These properties must match the configuration of the Accumulo servers; this is required to set up the SASL transport.
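
The same client settings can also be supplied programmatically. Below is a minimal, hedged sketch using the ClientConfiguration builder available in Accumulo 1.7; the instance name myinstance and ZooKeeper host zoohost:2181 are placeholder values, and it assumes the caller has already obtained Kerberos credentials via kinit or a keytab login:

ClientConfiguration clientConf = ClientConfiguration.loadDefault()  // picks up client.conf if present
    .withInstance("myinstance")
    .withZkHosts("zoohost:2181")
    .withSasl(true);                 // corresponds to instance.rpc.sasl.enabled=true

Instance instance = new ZooKeeperInstance(clientConf);
// KerberosToken() uses the currently logged-in Kerberos identity
Connector conn = instance.getConnector("user@EXAMPLE.COM", new KerberosToken());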

DelegationTokens with MapReduce

To use DelegationTokens in a custom MapReduce job, the call to the setConnectorInfo() method on AccumuloInputFormat or AccumuloOutputFormat should be the only necessary change. Instead of providing an instance of a KerberosToken, the user must call SecurityOperations.getDelegationToken using a Connector obtained with that KerberosToken, and pass the DelegationToken to setConnectorInfo instead of the KerberosToken. It is expected that the user launching the MapReduce job is already logged in to Kerberos via a keytab or a locally cached Kerberos ticket-granting ticket (TGT).

Instance instance = getInstance();
KerberosToken kt = new KerberosToken();
Connector conn = instance.getConnector(principal, kt);
DelegationToken dt = conn.securityOperations().getDelegationToken();

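// Reading from Accumulo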
AccumuloInputFormat.setConnectorInfo(job, principal, dt);

// Writing to Accumulo
AccumuloOutputFormat.setConnectorInfo(job, principal, dt);

If the user passes a KerberosToken to the setConnectorInfo method, the implementation will attempt to obtain a DelegationToken automatically, but this does have limitations based on the other MapReduce configuration methods already called and permissions granted to the calling user. It is best for the user to acquire the DelegationToken on their own and provide it directly to setConnectorInfo.

Users must have the DELEGATION_TOKEN system permission to call the getDelegationToken method. The obtained delegation token is only valid for the requesting user for a period of time dependent on Accumulo’s configuration (general.delegation.token.lifetime).
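
As a hedged illustration, the permission can also be granted and checked through the Java API; the sketch below assumes an administrative Connector named adminConn (a placeholder) and mirrors the shell grant command shown earlier:

// Allow user@REALM to request delegation tokens
adminConn.securityOperations().grantSystemPermission("user@REALM", SystemPermission.DELEGATION_TOKEN);

// Confirm the permission is in place
boolean canRequestToken = adminConn.securityOperations()
    .hasSystemPermission("user@REALM", SystemPermission.DELEGATION_TOKEN);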

It is also possible to obtain and use DelegationTokens outside of the context of MapReduce.

String principal = "user@REALM";
Instance instance = getInstance();
Connector connector = instance.getConnector(principal, new KerberosToken());
DelegationToken delegationToken = connector.securityOperations().getDelegationToken();

Connector dtConnector = instance.getConnector(principal, delegationToken);

Use of the dtConnector will perform each operation as the original user, but without their Kerberos credentials.
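
For example, a short sketch of writing through the dtConnector; the table name mytable is a placeholder and exception handling is omitted:

// Writes are performed as the original user, authenticated by the DelegationToken
BatchWriter writer = dtConnector.createBatchWriter("mytable", new BatchWriterConfig());
Mutation m = new Mutation("row1");
m.put("colFam", "colQual", new Value("value".getBytes(StandardCharsets.UTF_8)));
writer.addMutation(m);
writer.close();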

For the duration of validity of the DelegationToken, the user must take the necessary precautions to protect the DelegationToken from prying eyes as it can be used by any user on any host to impersonate the user who requested the DelegationToken. YARN ensures that passing the delegation token from the client JVM to each YARN task is secure, even in multi-tenant instances.

15.4.3. Debugging

Q: I have valid Kerberos credentials and a correct client configuration file but I still get errors like:

java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

A: When you have a valid client configuration and Kerberos TGT, it is possible that the search path for your local credentials cache is incorrect. Check the value of the KRB5CCNAME environment variable, and ensure it matches the value reported by klist.

$ echo $KRB5CCNAME

$ klist
Ticket cache: FILE:/tmp/krb5cc_123
Default principal: user@EXAMPLE.COM

Valid starting       Expires              Service principal
01/07/2015 11:56:35  01/08/2015 11:56:35  krbtgt/EXAMPLE.COM@EXAMPLE.COM
renew until 01/14/2015 11:56:35

$ export KRB5CCNAME=/tmp/krb5cc_123
$ echo $KRB5CCNAME
/tmp/krb5cc_123

Q: I thought I had everything configured correctly, but my client/server still fails to log in. I don’t know what is actually failing.

A: Add the following system property to the JVM invocation:

-Dsun.security.krb5.debug=true

This will enable lots of extra debugging at the JVM level which is often sufficient to diagnose some high-level configuration problem. Client applications can add this system property by hand to the command line; Accumulo server processes, or applications started using the accumulo script, can add it via ACCUMULO_GENERAL_OPTS in $ACCUMULO_CONF_DIR/accumulo-env.sh. Additionally, you can increase the log4j levels on org.apache.hadoop.security (which includes the Hadoop UserGroupInformation class) to get some high-level debug statements. This can be controlled in your client application, or using $ACCUMULO_CONF_DIR/generic_logger.xml.
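
For a client application, the same flag can also be set programmatically before any Kerberos login is attempted; a minimal sketch:

// Equivalent to passing -Dsun.security.krb5.debug=true on the command line;
// set this before the first connection or Kerberos login is made.
System.setProperty("sun.security.krb5.debug", "true");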

Q: All of my Accumulo processes successfully start and log in with their keytab, but they are unable to communicate with each other, showing the following errors:

2015-01-12 14:47:27,055 [transport.TSaslTransport] ERROR: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:53)
at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.accumulo.core.rpc.UGIAssumingTransport.open(UGIAssumingTransport.java:49)
at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:357)
at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:255)
at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.getTableMap(LiveTServerSet.java:106)
at org.apache.accumulo.master.Master.gatherTableInformation(Master.java:996)
at org.apache.accumulo.master.Master.access$600(Master.java:160)
at org.apache.accumulo.master.Master$StatusThread.updateStatus(Master.java:911)
at org.apache.accumulo.master.Master$StatusThread.run(Master.java:901)
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
... 16 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:203)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:309)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:115)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:454)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
... 19 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:143)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:66)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:61)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
... 25 more

or

2015-01-12 14:47:29,440 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 10 more

A: As previously mentioned, the hostname, and consequently the address each Accumulo process is bound to and listening on, is extremely important when negotiating a SASL connection. This problem commonly arises when the Accumulo servers are not configured to listen on the address denoted by their FQDN.

The values in the Accumulo "hosts" files (in $ACCUMULO_CONF_DIR: masters, monitors, slaves, tracers, and gc) should match the instance component of the Kerberos server principal (e.g. host in accumulo/host@EXAMPLE.COM).

16. Administration

16.1. Hardware

Because we are running essentially two or three systems simultaneously layered across the cluster (HDFS, Accumulo and MapReduce), it is typical for hardware to consist of 4 to 8 cores and 8 to 32 GB RAM. This is so each running process can have at least one core and 2 - 4 GB of memory.

One core running HDFS can typically keep 2 to 4 disks busy, so each machine may typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks.

It is possible to do with less than this, such as with 1U servers with 2 cores and 4GB each, but in this case it is recommended to only run up to two processes per machine, i.e. DataNode and TabletServer or DataNode and MapReduce worker, but not all three. The constraint here is having enough available heap space for all the processes on a machine.

16.2. Network

Accumulo communicates via remote procedure calls over TCP/IP for both passing data and control messages. In addition, Accumulo uses HDFS clients to communicate with HDFS. To achieve good ingest and query performance, sufficient network bandwidth must be available between any two machines.

In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will use the following default ports. Please make sure that they are open, or change their value in conf/accumulo-site.xml.

Table 1. Accumulo default ports

Port   Description                                     Property Name
4445   Shutdown Port (Accumulo MiniCluster)            n/a
4560   Accumulo monitor (for centralized log display)  monitor.port.log4j
9997   Tablet Server                                   tserver.port.client
9999   Master Server                                   master.port.client
12234  Accumulo Tracer                                 trace.port.client
42424  Accumulo Proxy Server                           n/a
50091  Accumulo GC                                     gc.port.client
50095  Accumulo HTTP monitor                           monitor.port.client
10001  Master Replication service                      master.replication.coordinator.port
10002  TabletServer Replication service                replication.receipt.service.port

In addition, the user can provide 0 and an ephemeral port will be chosen instead. This ephemeral port is likely to be unique and not already bound. Thus, configuring ports to use 0 instead of an explicit value should, in most cases, work around any issues of running multiple distinct Accumulo instances (or any other process which tries to use the same default ports) on the same hardware.

16.3. Installation

Choose a directory for the Accumulo installation. This directory will be referenced by the environment variable $ACCUMULO_HOME. Run the following:

$ tar xzf accumulo-1.6.0-bin.tar.gz    # unpack to subdirectory
$ mv accumulo-1.6.0 $ACCUMULO_HOME     # move to desired location

Repeat this step on each machine within the cluster. Usually all machines have the same $ACCUMULO_HOME.

16.4. Dependencies

Accumulo requires HDFS and ZooKeeper to be configured and running before starting. Password-less SSH should be configured between at least the Accumulo master and TabletServer machines. It is also a good idea to run Network Time Protocol (NTP) within the cluster to ensure nodes' clocks don’t get too out of sync, which can cause problems with automatically timestamped data.

16.5. Configuration

Accumulo is configured by editing several Shell and XML files found in $ACCUMULO_HOME/conf. The structure closely resembles Hadoop’s configuration files.

Logging is primarily controlled using the log4j configuration files, generic_logger.xml and monitor_logger.xml (or their corresponding .properties version if the .xml version is missing). The generic logger is used for most server types, and is typically configured to send logs to the monitor, as well as log files. The monitor logger is used by the monitor, and is typically configured to log only errors the monitor itself generates, rather than all the logs that it receives from other server types.

16.5.1. Edit conf/accumulo-env.sh

Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh and specify the following:

1. Enter the location of the installation directory of Accumulo for $ACCUMULO_HOME

2. Enter your system’s Java home for $JAVA_HOME

3. Enter the location of Hadoop for $HADOOP_PREFIX

4. Choose a location for Accumulo logs and enter it for $ACCUMULO_LOG_DIR

5. Enter the location of ZooKeeper for $ZOOKEEPER_HOME

By default Accumulo TabletServers are set to use 1GB of memory. You may change this by altering the value of $ACCUMULO_TSERVER_OPTS. Note the syntax is that of the Java JVM command line options. This value should be less than the physical memory of the machines running TabletServers.

There are similar options for the master’s memory usage and the garbage collector process. Reduce these if they exceed the physical RAM of your hardware and increase them, within the bounds of the physical RAM, if a process fails because of insufficient memory.

Note that you will be specifying the Java heap space in accumulo-env.sh. You should make sure that the total heap space used for the Accumulo tserver and the Hadoop DataNode and TaskTracker is less than the available memory on each slave node in the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate machines to allow them to use more heap space. If you are running these on the same machine on a small cluster, likewise make sure their heap space settings fit within the available memory.

16.5.2. Native Map

The tablet server uses a data structure called a MemTable to store sorted key/value pairs in memory when they are first received from the client. When a minor compaction occurs, this data structure is written to HDFS. The MemTable will default to using memory in the JVM but a JNI version, called the native map, can be used to significantly speed up performance by utilizing the memory space of the native operating system. The native map also avoids the performance implications brought on by garbage collection in the JVM by causing it to pause much less frequently.

Building

32-bit and 64-bit Linux and Mac OS X versions of the native map can be built from the Accumulo bin package by executing $ACCUMULO_HOME/bin/build_native_library.sh. If your system’s default compiler options are insufficient, you can add additional compiler options to the command line, such as options for the architecture. These will be passed to the Makefile in the environment variable USERFLAGS.

Examples:

1. $ACCUMULO_HOME/bin/build_native_library.sh

2. $ACCUMULO_HOME/bin/build_native_library.sh -m32

After building the native map from the source, you will find the artifact in $ACCUMULO_HOME/lib/native. Upon starting up, the tablet server will look in this directory for the map library. If the file is renamed or moved from its target directory, the tablet server may not be able to find it. The system can also locate the native maps shared library by setting LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on Mac OS X) in $ACCUMULO_HOME/conf/accumulo-env.sh.

Native Maps Configuration

As mentioned, Accumulo will use the native libraries if they are found in the expected location and if it is not configured to ignore them. Using the native maps over JVM maps nets a noticeable improvement in ingest rates; however, certain configuration variables are important to modify when increasing the size of the native map.

To adjust the size of the native map, increase the value of tserver.memory.maps.max. By default, the maximum size of the native map is 1GB. When increasing this value, it is also important to adjust the values of table.compaction.minor.logs.threshold and tserver.walog.max.size. table.compaction.minor.logs.threshold is the maximum number of write-ahead log files that a tablet can reference before they will be automatically minor compacted. tserver.walog.max.size is the maximum size of a write-ahead log.

The maximum size of the native maps for a server should be less than the product of the write-ahead log maximum size and the minor compaction threshold for log files:

native map maximum size < (table.compaction.minor.logs.threshold * tserver.walog.max.size)

2. Write the IP addresses or domain names of the machines that will be TabletServers in $ACCUMULO_HOME/conf/slaves, one per line.

Note that if using domain names rather than IP addresses, DNS must be configured properly for all machines participating in the cluster. DNS can be a confusing source of errors.

16.5.4. Accumulo Settings

Specify appropriate values for the following settings in $ACCUMULO_HOME/conf/accumulo-site.xml:

<property>
<name>instance.zookeeper.host</name>
<value>zooserver-one:2181,zooserver-two:2181</value>
<description>list of zookeeper servers</description>
</property>

This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate settings between processes and helps finalize TabletServer failure.

<property>
<name>instance.secret</name>
<value>DEFAULT</value>
</property>

The instance needs a secret to enable secure communication between servers. Configure your secret and make sure that the accumulo-site.xml file is not readable to other users. For alternatives to storing the instance.secret in plaintext, please read the Sensitive Configuration Values section.

Some settings can be modified via the Accumulo shell and take effect immediately, but some settings require a process restart to take effect. See the configuration documentation (available in the docs directory of the tarball and in Configuration Management) for details.

One aspect of Accumulo’s configuration which differs from the rest of the Hadoop ecosystem is that the server-process classpath is determined in part by multiple values. A bootstrap classpath is based solely on the accumulo-start.jar, Log4j and $ACCUMULO_CONF_DIR. A second classloader is used to dynamically load all of the resources specified by general.classpaths in $ACCUMULO_CONF_DIR/accumulo-site.xml. This value is a comma-separated list of regular-expression paths which are all loaded into a secondary classloader. This includes the Hadoop, Accumulo and ZooKeeper jars necessary to run Accumulo. When this value is not defined, a default value is used which attempts to load Hadoop from multiple potential locations depending on how Hadoop was installed. It is strongly recommended that general.classpaths is defined and limited to only the necessary jars to prevent extra jars from being unintentionally loaded into Accumulo processes.

16.5.5. Hostnames in configuration files

Accumulo has a number of configuration files which can contain references to other hosts in your network. All of the "host" configuration files for Accumulo (gc, masters, slaves, monitor, tracers) as well as instance.volumes in accumulo-site.xml must contain some host reference.

While IP addresses, short hostnames, and fully qualified domain names (FQDNs) are all technically valid, it is good practice to always use FQDNs for both Accumulo and other processes in your Hadoop cluster. Failing to consistently use FQDNs can have unexpected consequences in how Accumulo uses the FileSystem.

A common way this problem is observed is via applications that use bulk ingest. The Accumulo Master coordinates moving the bulk-ingest input files into an Accumulo-managed directory. However, Accumulo cannot safely move files across different Hadoop FileSystems. This is problematic because Accumulo also cannot reliably determine that two FileSystems specified with different names are actually the same FileSystem. For example, while 127.0.0.1:8020 might be a valid identifier for an HDFS instance, Accumulo identifies localhost:8020 as a different HDFS instance than 127.0.0.1:8020.

Copy the masters, slaves, accumulo-env.sh, and if necessary, accumulo-site.xml from the $ACCUMULO_HOME/conf/ directory on the master to all the machines specified in the slaves file.

16.5.7. Sensitive Configuration Values

Accumulo has a number of properties that can be specified via the accumulo-site.xml file which are sensitive in nature; instance.secret and trace.token.property.password are two common examples. Both of these properties, if compromised, have the ability to result in data being leaked to users who should not have access to that data.

In Hadoop 2.6.0, a new CredentialProvider class was introduced which serves as a common implementation to abstract away the storage and retrieval of passwords from plaintext storage in configuration files. Any Property marked with the Sensitive annotation is a candidate for use with these CredentialProviders. For versions of Hadoop which lack these classes, the feature will simply be unavailable.

A comma-separated list of CredentialProviders can be configured using the Accumulo property general.security.credential.provider.paths. Each configured URL will be consulted when the Configuration object for accumulo-site.xml is accessed.

16.5.8. Using a JavaKeyStoreCredentialProvider for storage

One of the implementations provided in Hadoop 2.6.0 is a Java KeyStore CredentialProvider. Each entry in the KeyStore is the Accumulo Property key name. For example, to store instance.secret, the following command can be used:

hadoop credential create instance.secret --provider jceks://file/etc/accumulo/conf/accumulo.jceks

The command will then prompt you to enter the secret to use and create a keystore in:

/etc/accumulo/conf/accumulo.jceks

Then, accumulo-site.xml must be configured to use this KeyStore as a CredentialProvider:

<property>
<name>general.security.credential.provider.paths</name>
<value>jceks://file/etc/accumulo/conf/accumulo.jceks</value>
</property>

This configuration will then transparently extract the instance.secret from the configured KeyStore and avoids storing the sensitive property in a human-readable form. A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server will expect the KeyStore in the same location.

16.5.9. Client Configuration

In version 1.6.0, Accumulo includes a new type of configuration file known as a client configuration file. One problem with the traditional "site.xml" file that is prevalent through Hadoop is that it is a single file used by both clients and servers. This makes it very difficult to protect secrets that are only meant for the server processes while allowing the clients to connect to the servers.

The client configuration file is a subset of the information stored in accumulo-site.xml meant only for consumption by clients of Accumulo. By default, Accumulo checks a number of locations for a client configuration:

• ${ACCUMULO_CONF_DIR}/client.conf

• /etc/accumulo/client.conf

• /etc/accumulo/conf/client.conf

• ~/.accumulo/config

These files are Java properties files. They can currently contain information about ZooKeeper servers, RPC properties (such as SSL or SASL connectors), and distributed tracing properties. Valid properties are defined by the ClientProperty enum contained in the client API.

16.5.10. Custom Table Tags

Accumulo allows users to add custom tags to tables, which lets applications set application-level metadata about a table. These tags can be anything: a table description, administrator notes, the date created, etc. This is done by naming and setting a property with the prefix table.custom.*.
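
For illustration, a hedged sketch of setting and reading a custom tag through the Java API, assuming an existing Connector named conn and a table named mytable (both placeholders):

// Tag the table with an application-level description
conn.tableOperations().setProperty("mytable", "table.custom.description", "Stores customer records");

// Read back the table properties, filtering for custom tags
for (Map.Entry<String,String> prop : conn.tableOperations().getProperties("mytable")) {
  if (prop.getKey().startsWith("table.custom."))
    System.out.println(prop.getKey() + " = " + prop.getValue());
}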

Currently, table properties are stored in ZooKeeper. This means that the number and size of custom properties should be restricted to, at most, on the order of tens of properties, with no property exceeding 1MB in size. ZooKeeper’s performance can be very sensitive to an excessive number of nodes and the sizes of those nodes. Applications which leverage the use of custom properties should take these warnings into consideration. There is no enforcement of these warnings via the API.

16.6. Initialization

Accumulo must be initialized to create the structures it uses internally to locate data across the cluster. HDFS is required to be configured and running before Accumulo can be initialized.

Once HDFS is started, initialization can be performed by executing $ACCUMULO_HOME/bin/accumulo init. This script will prompt for a name for this instance of Accumulo. The instance name is used to identify a set of tables and instance-specific settings. The script will then write some information into HDFS so Accumulo can start properly. The initialization script will prompt you to set a root password. Once Accumulo is initialized it can be started.

16.7. Running

16.7.1. Starting Accumulo

Make sure Hadoop is configured on all of the machines in the cluster, including access to a shared HDFS instance. Make sure HDFS and ZooKeeper are running; ZooKeeper must be configured and running on at least one machine in the cluster. Start Accumulo using the bin/start-all.sh script.

To verify that Accumulo is running, check the Status page as described in Monitoring. In addition, the Shell can provide some information about the status of tables via reading the metadata tables.

16.7.2. Stopping Accumulo

To shut down cleanly, run bin/stop-all.sh and the master will orchestrate the shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may take some time for particular configurations.

16.7.3. Adding a Node

Update your $ACCUMULO_HOME/conf/slaves (or $ACCUMULO_CONF_DIR/slaves) file to account for the addition.

$ACCUMULO_HOME/bin/accumulo admin start <host(s)> {<host> ...}

Alternatively, you can ssh to each of the hosts you want to add and run:

$ACCUMULO_HOME/bin/start-here.sh

Make sure the host in question has the new configuration, or else the tablet server won’t start; at a minimum this needs to be on the host(s) being added, but in practice it’s good to ensure consistent configuration across all nodes.

16.7.4. Decommissioning a Node

If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet server. Accumulo will automatically rebalance the tablets across the available tablet servers.

$ACCUMULO_HOME/bin/accumulo admin stop <host(s)> {<host> ...}

Alternatively, you can ssh to each of the hosts you want to remove and run:

$ACCUMULO_HOME/bin/stop-here.sh

16.7.6. Running multiple TabletServers on a single node

With very powerful nodes, it may be beneficial to run more than one TabletServer on a given node. This decision should be made carefully and with much deliberation as Accumulo is designed to be able to scale to using 10’s of GB of RAM and 10’s of CPU cores.

To run multiple TabletServers on a single host, it is necessary to create multiple Accumulo configuration directories. Ensuring that these properties are appropriately set (and remain consistent) is an exercise for the user.

Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from other nodes. This requires additional configuration values in accumulo-site.xml:

• tserver.port.client

• replication.receipt.service.port

Normally, setting a value of 0 for these configuration properties is sufficient. In some environments, the ports used by Accumulo must be well-known for security reasons, which requires a separate copy of the configuration files with a static port for each TabletServer instance.

It is also necessary to update the following exported variables in accumulo-env.sh.

• ACCUMULO_LOG_DIR

The values for these properties are left up to the user to define; there are no constraints other than ensuring that the directory exists and the user running Accumulo has the permission to read/write into that directory.

Accumulo’s provided scripts for stopping a cluster operate under the assumption that one process is running per host. As such, starting and stopping multiple TabletServers on one host requires more effort from the user. It is important to ensure that ACCUMULO_CONF_DIR is correctly set for the instance of the TabletServer being started.

$ ACCUMULO_CONF_DIR=$ACCUMULO_HOME/conf $ACCUMULO_HOME/bin/accumulo tserver --address <your_server_ip> &

To stop TabletServers, the normal stop-all.sh will stop all instances of TabletServers across all nodes. Using the kill command provided by your operating system is an option to stop a single instance on a single node. stop-server.sh can be used to stop all TabletServers on a single node.

16.8. Monitoring

16.8.1. Accumulo Monitor

The Accumulo Monitor provides an interface for monitoring the status and health of Accumulo components. The Accumulo Monitor provides a web UI for accessing this information at http://monitorhost:50095/. Things highlighted in yellow may be in need of attention. If anything is highlighted in red on the monitor page, it is something that definitely needs attention.

The Overview page contains some summary information about the Accumulo instance, including the version, instance name, and instance ID. There is a table labeled Accumulo Master with current status, a table listing the active Zookeeper servers, and graphs displaying various metrics over time. These include ingest and scan performance and other useful measurements.

The Master Server, Tablet Servers, and Tables pages display metrics grouped in different ways (e.g. by tablet server or by table). Metrics typically include the number of entries (key/value pairs), and ingest and query rates. The number of running scans and major and minor compactions are in the form number_running (number_queued). Another important metric is hold time, which is the amount of time a tablet has been waiting but unable to flush its memory in a minor compaction.

The Server Activity page graphically displays tablet server status, with each server represented as a circle or square. Different metrics may be assigned to the nodes' color and speed of oscillation. The Overall Avg metric is only used on the Server Activity page, and represents the average of all the other metrics (after normalization). Similarly, the Overall Max metric picks the metric with the maximum normalized value.

The Garbage Collector page displays a list of garbage collection cycles, the number of files found of each type (including deletion candidates in use and files actually deleted), and the length of the deletion cycle. The Traces page displays data for recent traces performed (see the following section for information on Tracing). The Recent Logs page displays warning and error logs forwarded to the monitor from all Accumulo processes. Also, the XML and JSON links provide metrics in XML and JSON formats, respectively.

16.8.2. SSL

SSL may be enabled for the monitor page by setting the following properties in the accumulo-site.xml file:

monitor.ssl.keyStore
monitor.ssl.keyStorePassword
monitor.ssl.trustStore
monitor.ssl.trustStorePassword

If the Accumulo conf directory has been configured (in particular, the accumulo-env.sh file must be set up), the generate_monitor_certificate.sh script in the Accumulo bin directory can be used to create the keystore and truststore files with random passwords. The script will print out the properties that need to be added to the accumulo-site.xml file. The stores can also be generated manually with the Java keytool command, whose usage can be seen in the generate_monitor_certificate.sh script.

If desired, the SSL ciphers allowed for connections can be controlled via the following properties in accumulo-site.xml:

monitor.ssl.include.ciphers
monitor.ssl.exclude.ciphers

If SSL is enabled, the monitor URL can only be accessed via https.
This also allows you to access the Accumulo shell through the monitor page. The left navigation bar will have a new link to Shell. An Accumulo user name and password must be entered for access to the shell.

16.9. Metrics

Accumulo is capable of using the Hadoop Metrics2 library and is configured by default to use it. Metrics2 is a library which allows for routing of metrics generated by registered MetricsSources to configured MetricsSinks. Examples of sinks implemented by Hadoop include file-based logging, Graphite and Ganglia. All metric sources are exposed via JMX when using Metrics2.

Prior to Accumulo 1.7.0, JMX endpoints could be exposed in addition to file-based logging of those metrics configured via the accumulo-metrics.xml file. This mechanism can still be used by setting general.legacy.metrics to true in accumulo-site.xml.

16.9.1. Metrics2 Configuration

Metrics2 is configured by examining the classpath for a file that matches hadoop-metrics2*.properties. The example configuration files that Accumulo provides include hadoop-metrics2-accumulo.properties as a template which can be used to enable file, Graphite or Ganglia sinks (some minimal configuration is required for Graphite and Ganglia). Because the Hadoop configuration is also on the Accumulo classpath, be sure that you do not have multiple Metrics2 configuration files. It is recommended to consolidate metrics in a single properties file in a central location to remove ambiguity. The contents of hadoop-metrics2-accumulo.properties can be added to a central hadoop-metrics2.properties in $HADOOP_CONF_DIR.

As a note for configuring the file sink, the provided path should be absolute. A relative path or file name will be created relative to the directory in which the Accumulo process was started. External tools, such as logrotate, can be used to prevent these files from growing without bound.

Each server process should have log messages from the Metrics2 library about the sinks that were created. Be sure to check the Accumulo processes log files when debugging missing metrics output.

For additional information on configuring Metrics2, visit the Javadoc page for Metrics2.

16.10. Tracing

It can be difficult to determine why some operations are taking longer than expected. For example, you may be looking up items with very low latency, but sometimes the lookups take much longer. Determining the cause of the delay is difficult because the system is distributed, and the typical lookup is fast.

Accumulo has been instrumented to record the time that various operations take when tracing is turned on. When tracing is enabled, that state follows all requests made on behalf of the user throughout Accumulo’s distributed infrastructure, and across all threads of execution.

These time spans will be inserted into the trace table in Accumulo. You can browse recent traces from the Accumulo monitor page. You can also read the trace table directly like any other table.
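
As a hedged sketch, the trace table can be read with the standard scanner API; this assumes the default trace table name trace and a Connector named conn whose user is authorized to read that table:

// Scan the trace table like any other Accumulo table
Scanner scanner = conn.createScanner("trace", Authorizations.EMPTY);
for (Map.Entry<Key,Value> entry : scanner) {
  // Keys encode trace and span identifiers; values hold the serialized span data
  System.out.println(entry.getKey() + " -> " + entry.getValue());
}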

The design of Accumulo’s distributed tracing follows that of Google’s Dapper.

16.12. Watcher

Accumulo includes scripts to automatically restart server processes in the case of intermittent failures. To enable this watcher, edit conf/accumulo-env.sh to include the following:

# Should process be automatically restarted
export ACCUMULO_WATCHER="true"

# What settings should we use for the watcher, if enabled
export UNEXPECTED_TIMESPAN="3600"
export UNEXPECTED_RETRIES="2"

export OOM_TIMESPAN="3600"
export OOM_RETRIES="5"

export ZKLOCK_TIMESPAN="600"
export ZKLOCK_RETRIES="5"

When an Accumulo process dies, the watcher will look at the logs and exit codes to determine how the process failed and either restart or fail depending on the recent history of failures. The restarting policy for various failure conditions is configurable through the *_TIMESPAN and *_RETRIES variables shown above.

16.13. Recovery

In the event of TabletServer failure, or an error while shutting Accumulo down, some mutations may not have been minor compacted to HDFS properly. In this case, Accumulo will automatically reapply such mutations from the write-ahead log either when the tablets from the failed server are reassigned by the Master (in the case of a single TabletServer failure) or the next time Accumulo starts (in the event of failure during shutdown).

Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing updates. The sort status of each file is displayed on the Accumulo monitor status page. Once the recovery is complete, any tablets involved should return to an “online” state. Until then, those tablets will be unavailable to clients.

The Accumulo client library is configured to retry failed mutations and in many cases clients will be able to continue processing after the recovery process without throwing an exception.
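
For reference, a hedged sketch of bounding that retry behavior from a client via BatchWriterConfig; the timeout limits how long writes to an unresponsive server are retried before an exception is thrown, and the connector, table name, and values shown are placeholders:

// Bound client-side retries so a long recovery surfaces as an exception
// rather than blocking indefinitely (values are illustrative only)
BatchWriterConfig config = new BatchWriterConfig()
    .setTimeout(5, TimeUnit.MINUTES)
    .setMaxWriteThreads(4);
BatchWriter writer = connector.createBatchWriter("mytable", config);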

16.14. Migrating Accumulo from non-HA Namenode to HA Namenode

The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL hdfs://namenode.example.com:8020 which is going to be moved to hdfs://nameservice1.

Before moving HDFS over to the HA namenode, use $ACCUMULO_HOME/bin/accumulo admin volumes to confirm that the only volume displayed is the volume from the current namenode’s HDFS URL.

Listing volumes referenced in zookeeper
Volume : hdfs://namenode.example.com:8020/accumulo
Listing volumes referenced in accumulo.root tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)
Listing volumes referenced in accumulo.metadata tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

After verifying the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice.

Edit $ACCUMULO_HOME/conf/accumulo-site.xml to notify Accumulo that a volume is being replaced. First, add the new nameservice volume to the instance.volumes property. Next, add the instance.volumes.replacements property in the form of "old new". It’s important to not include the volume that’s being replaced in instance.volumes, otherwise it’s possible Accumulo could continue to write to the volume.

<!-- instance.dfs.uri and instance.dfs.dir should not be set-->
<property>
<name>instance.volumes</name>
<value>hdfs://nameservice1/accumulo</value>
</property>
<property>
<name>instance.volumes.replacements</name>
<value>hdfs://namenode.example.com:8020/accumulo hdfs://nameservice1/accumulo</value>
</property>

Run $ACCUMULO_HOME/bin/accumulo init --add-volumes and start up the accumulo cluster. Verify that the new nameservice volume shows up with $ACCUMULO_HOME/bin/accumulo admin volumes.

Listing volumes referenced in zookeeper
Volume : hdfs://namenode.example.com:8020/accumulo
Volume : hdfs://nameservice1/accumulo
Listing volumes referenced in accumulo.root tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Volume : hdfs://nameservice1/accumulo
Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)
Listing volumes referenced in accumulo.metadata tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Volume : hdfs://nameservice1/accumulo
Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)

Some erroneous GarbageCollector messages may still be seen for a small period while data is transitioning to the new volumes. This is expected and can usually be ignored.

16.15. Achieving Stability in a VM Environment

For testing, demonstration, and even operational uses, Accumulo is often installed and run in a virtual machine (VM) environment. The majority of long-term operational uses of Accumulo are on bare-metal clusters. However, the core design of Accumulo and its dependencies do not preclude running stably for long periods within a VM. Many of Accumulo’s operational robustness features for handling failures, like periodic network partitioning in a large cluster, carry over well to VM environments. This guide covers general recommendations for maximizing stability in a VM environment, including some of the failure modes that are more common when running in VMs.

16.15.1. Known failure modes: Setup and Troubleshooting

In addition to the general failure modes of running Accumulo, VMs can introduce a couple of environmental challenges that can affect process stability. Clock drift is something that is more common in VMs, especially when VMs are suspended and resumed. Clock drift can cause Accumulo servers to assume that they have lost connectivity to the other Accumulo processes and/or lose their locks in Zookeeper. VM environments also frequently have constrained resources, such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally deals well with constrained resources from a stability perspective (optimizing performance will require additional tuning, which is not covered in this section), however there are some limits.

Physical Memory

One of those limits has to do with the Linux out-of-memory killer. A common failure mode in VM environments (and in some bare-metal installations) is when the Linux out-of-memory killer decides to kill processes in order to avoid a kernel panic when provisioning a memory page. This often happens in VMs due to the large number of processes that must run in a small memory footprint. In addition to the Linux core processes, a single-node Accumulo setup requires a Hadoop NameNode, a Hadoop Secondary NameNode, a Hadoop DataNode, a Zookeeper server, an Accumulo Master, an Accumulo GC and an Accumulo TabletServer. Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop ResourceManager, a Hadoop NodeManager, provisioning software, and client applications. Between all of these processes, it is not uncommon to over-subscribe the available RAM in a VM. We recommend setting up VMs without swap enabled, so rather than performance grinding to a halt when physical memory is exhausted, the kernel will randomly select processes to kill in order to free up memory.

Calculating the maximum possible memory usage is essential in creating a stable Accumulo VM setup. Safely engineering memory allocation for stability is then a matter of bringing the calculated maximum memory usage under the physical memory by a healthy margin. The margin is to account for operating system-level operations, such as managing processes, maintaining virtual memory pages, and file system caching. When the Linux out-of-memory killer kills one of your processes, you will probably only see evidence of that in /var/log/messages. Out-of-memory process kills do not show up in Accumulo or Hadoop logs.

To calculate the maximum memory usage of all Java virtual machine (JVM) processes, add the maximum heap size (often limited by an -Xmx argument, such as in accumulo-env.sh) and the off-heap memory usage. Off-heap memory usage includes the following:

• "Permanent Space", where the JVM stores Classes, Methods, and other code elements. This can be limited by a JVM flag such as -XX:MaxPermSize:100m, and is typically tens of megabytes.

• Code generation space, where the JVM stores just-in-time compiled code. This is typically small enough to ignore.

• Socket buffers, where the JVM stores send and receive buffers for each socket.

• Thread stacks, where the JVM allocates memory to manage each thread.

• Direct memory space and JNI code, where applications can allocate memory outside of the JVM-managed space. For Accumulo, this includes the native in-memory maps that are allocated with the memory.maps.max parameter in accumulo-site.xml.

• Garbage collection space, where the JVM stores information used for garbage collection.

You can assume that each Hadoop and Accumulo process will use ~100-150MB of off-heap memory, plus the in-memory map of the Accumulo TServer process. A simple calculation for physical memory requirements follows:

  Physical memory needed
= (per-process off-heap memory) + (heap memory) + (other processes) + (margin)
= (number of java processes * 150M + native map) + (sum of -Xmx settings for java process) + (total applications memory, provisioning memory, etc.) + (1G)
= (11*150M +500M) + (1G +1G +1G +256M +1G +256M +512M +512M +512M +512M +512M) + (2G) + (1G)
= (2150M) + (7G) + (2G) + (1G)
= ~12GB

These calculations can add up quickly with the large number of processes, especially in constrained VM environments. To reduce the physical memory requirements, it is a good idea to reduce maximum heap limits and turn off unnecessary processes. If you’re not using YARN in your application, you can turn off the ResourceManager and NodeManager. If you’re not expecting to re-provision the cluster frequently you can turn off or reduce provisioning processes such as Salt Stack minions and masters.

Disk Space

Disk space is primarily used for two operations: storing data and storing logs. While Accumulo generally stores all of its key/value data in HDFS, Accumulo, Hadoop, and Zookeeper all store a significant amount of logs in a directory on a local file system. Care should be taken to make sure that (a) limitations to the amount of logs generated are in place, and (b) enough space is available to host the generated logs on the partitions that they are assigned. When space is not available to log, processes will hang. This can cause interruptions in availability of Accumulo, as well as cascade into failures of various processes.

Hadoop, Accumulo, and Zookeeper use log4j as a logging mechanism, and each of them has a way of limiting the logs and directing them to a particular directory. Logs are generated independently for each process, so when considering the total space you need to add up the maximum logs generated by each process. Typically, a rolling log setup is instituted in which each process can generate something like ten 100MB files, resulting in a maximum file system usage of 1GB per process. Default setups for Hadoop and Zookeeper are often unbounded, so it is important to set these limits in the logging configuration files for each subsystem. Consult the user manual for each system for instructions on how to limit generated logs.

Zookeeper Interaction

Accumulo is designed to scale up to thousands of nodes. At that scale, intermittent interruptions in network service and other rare failures of compute nodes become more common. To limit the impact of node failures on overall service availability, Accumulo uses a heartbeat monitoring system that leverages Zookeeper’s ephemeral locks. There are several conditions that can occur that cause Accumulo processes to lose their Zookeeper locks, some of which are true interruptions to availability and some of which are false positives. Several of these conditions become more common in VM environments, where they can be exacerbated by resource constraints and clock drift.

Accumulo includes a mechanism to limit the impact of the false positives known as the Watcher. The watcher monitors Accumulo processes and will restart them when they fail for certain reasons. The watcher can be configured within the accumulo-env.sh file inside of Accumulo’s configuration directory. We recommend using the watcher to monitor Accumulo processes, as it will restore the system to full capacity without administrator interaction after many of the common failure modes.

16.15.2. Tested Versions

Another large consideration for Accumulo stability is to use versions of software that have been tested together in a VM environment. Any combination of component versions that has not been tested together is likely to expose running conditions that differ from the environments in which the individual components were tested. For example, Accumulo’s use of HDFS includes many short block reads, which differs from the more common full-file read used in most map/reduce applications. We have found that certain versions of Accumulo and Hadoop will include stability bugs that greatly affect overall stability. In our testing, Accumulo 1.6.2, Hadoop 2.6.0, and Zookeeper 3.4.6 resulted in stable VM clusters that did not fail during a month of testing, while Accumulo 1.6.1, Hadoop 2.5.1, and Zookeeper 3.4.5 had a mean time between failures of less than a week under heavy ingest and query load. We expect that results will vary with other configurations, and you should choose your software versions with that in mind.

17. Multi-Volume Installations

This is an advanced configuration setting for very large clusters under a lot of write pressure.

The HDFS NameNode holds all of the metadata about the files in HDFS. For fast performance, all of this information needs to be stored in memory. A single NameNode with 64G of memory can store the metadata for tens of millions of files. However, when scaling beyond a thousand nodes, an active Accumulo system can generate lots of updates to the file system, especially when data is being ingested. The large number of write transactions to the NameNode, and the speed of a single edit log, can become the limiting factor for large scale Accumulo installations.

You can see the effect of slow write transactions when the Accumulo Garbage Collector takes a long time (more than 5 minutes) to delete the files Accumulo no longer needs. If your Garbage Collector routinely runs in less than a minute, the NameNode is performing well.

However, if you do begin to experience slow-down and poor GC performance, Accumulo can be configured to use multiple NameNode servers. The configuration “instance.volumes” should be set to a comma-separated list, using full URI references to different NameNode servers:

<property>
<name>instance.volumes</name>
<value>hdfs://ns1:9001,hdfs://ns2:9001</value>
</property>

The introduction of multiple volume support in 1.6 changed the way Accumulo stores pointers to files. It now stores fully qualified URI references to files. Before 1.6, Accumulo stored paths that were relative to a table directory. After an upgrade these relative paths will still exist and are resolved using instance.dfs.dir, instance.dfs.uri, and Hadoop configuration in the same way they were before 1.6.

If the URI for a namenode changes (e.g. the namenode was running on host1 and is moved to host2), then Accumulo will no longer function. Even if the Hadoop and Accumulo configurations are changed, the fully qualified URIs stored in Accumulo will still contain the old URI. To handle this, Accumulo has the following configuration property for replacing URIs stored in its metadata. The example configuration below will replace ns1 with nsA and ns2 with nsB in Accumulo metadata. For this property to take effect, Accumulo will need to be restarted.

<property>
<name>instance.volumes.replacements</name>
<value>hdfs://ns1:9001 hdfs://nsA:9001, hdfs://ns2:9001 hdfs://nsB:9001</value>
</property>

Using viewfs or an HA namenode, both introduced in Hadoop 2, offers another option for managing the fully qualified URIs stored in Accumulo. Viewfs and HA namenodes both introduce a level of indirection in the Hadoop configuration. For example, assume viewfs://nn1 maps to hdfs://nn1 in the Hadoop configuration. If viewfs://nn1 is used by Accumulo, then it’s easy to map viewfs://nn1 to hdfs://nnA by changing the Hadoop configuration without doing anything to Accumulo. A production system should probably use an HA namenode. Viewfs may be useful on a test system with a single non-HA namenode.

You may also want to configure your cluster to use Federation, available in Hadoop 2.0, which allows DataNodes to respond to multiple NameNode servers, so you do not have to partition your DataNodes by NameNode.

18. Troubleshooting

18.1. Logs

Q: The tablet server does not seem to be running!? What happened?

Accumulo is a distributed system. It is supposed to run on remote equipment, across hundreds of computers. Each program that runs on these remote computers writes down events as they occur, into a local file. By default, this location is defined in $ACCUMULO_HOME/conf/accumulo-env.sh as ACCUMULO_LOG_DIR. A: Look in the $ACCUMULO_LOG_DIR/tserver*.log file. Specifically, check the end of the file.

Q: The tablet server did not start and the debug log does not exist! What happened?

When the individual programs are started, the stdout and stderr output of these programs are stored in .out and .err files in $ACCUMULO_LOG_DIR. Often, when there are missing configuration options, files or permissions, messages will be left in these files. A: Probably a start-up problem. Look in $ACCUMULO_LOG_DIR/tserver*.err
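For instance, a quick way to inspect both kinds of files for a tablet server on the affected host (the paths simply follow the defaults described above):

$ tail -n 50 $ACCUMULO_LOG_DIR/tserver*.err
$ tail -n 50 $ACCUMULO_LOG_DIR/tserver*.log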

18.2. Monitor

Q: Accumulo is not working, what’s wrong?

There’s a small web server that collects information about all the components that make up a running Accumulo instance. It will highlight unusual or unexpected conditions.

A: Point your browser to the monitor (typically the master host, on port 50095). Is anything red or yellow?

Q: My browser is reporting connection refused, and I cannot get to the monitor

The monitor program’s output is also written to .err and .out files in the $ACCUMULO_LOG_DIR. Look for problems in this file if the $ACCUMULO_LOG_DIR/monitor*.log file does not exist.

A: The monitor program is probably not running. Check the log files for errors.

Q: My browser hangs trying to talk to the monitor.

Your browser needs to be able to reach the monitor program. Often large clusters are firewalled, or use a VPN for internal communications. You can use SSH to proxy your browser to the cluster, or consult with your system administrator to gain access to the server from your browser.
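For example, an SSH tunnel from your workstation (the host names here are placeholders for your own gateway and monitor hosts) forwards a local port to the monitor:

$ ssh -L 50095:monitor-host:50095 user@gateway-host

While the tunnel is open, point your browser at http://localhost:50095.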

It is sometimes helpful to use a text-only browser to sanity-check the monitor while on the machine running the monitor:

$ links http://localhost:50095

A: Use SSH to proxy your browser to the cluster, or consult your system administrator for access to the monitor host.

18.3. HDFS

You can use:

$ hadoop fsck /accumulo/path/to/corrupt/file -locations -blocks -files

to locate the block references of individual corrupt files and use those references to search the name node and individual data node logs to determine which servers those blocks have been assigned to, and then try to fix any underlying file system issues on those nodes.

On a larger cluster, you may need to increase the number of Xcievers for HDFS DataNodes:

<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>

A: Verify HDFS is healthy, check the datanode logs.

18.4. Zookeeper

Q: accumulo init is hanging. It says something about talking to zookeeper.

Zookeeper is also a distributed service. You will need to ensure that it is up. You can run the zookeeper command line tool to connect to any one of the zookeeper servers:

$ zkCli.sh -server zoohost
...
[zk: zoohost:2181(CONNECTED) 0]

It is important to see the word CONNECTED! If you only see CONNECTING you will need to diagnose zookeeper errors.

A: Check to make sure that zookeeper is up, and that $ACCUMULO_HOME/conf/accumulo-site.xml has been pointed to your zookeeper server(s).

Q: Zookeeper is running, but it does not say CONNECTED

Zookeeper processes talk to each other to elect a leader. All updates go through the leader and propagate to a majority of all the other nodes. If a majority of the nodes cannot be reached, zookeeper will not allow updates. Zookeeper also limits the number of connections to a server from any other single host. By default, this limit can be as small as 10 and can be reached in some everything-on-one-machine test configurations.

You can check the election status and connection status of clients by asking the zookeeper nodes for their status. You connect to zookeeper and ask it with the four-letter stat command:

$ nc zoohost 2181
stat
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/127.0.0.1:58289[0](queued=0,recved=1,sent=0)
/127.0.0.1:60231[1](queued=0,recved=53910,sent=53915)

Latency min/avg/max: 0/5/3008
Sent: 1561592
Connections: 2
Outstanding: 0
Zxid: 0x621a3b
Mode: standalone
Node count: 22524

A: Check zookeeper status, verify that it has a quorum, and has not exceeded maxClientCnxns.
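If the per-host connection limit is the cause, it can be raised in zoo.cfg on each zookeeper server and zookeeper restarted; a minimal sketch (the value 100 is only an illustrative choice):

maxClientCnxns=100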

Q: My tablet server crashed! The logs say that it lost its zookeeper lock.

Tablet servers reserve a lock in zookeeper to maintain their ownership over the tablets that have been assigned to them. Part of their responsibility for keeping the lock is to send zookeeper a keep-alive message periodically. If the tablet server fails to send a message in a timely fashion, zookeeper will remove the lock and notify the tablet server. If the tablet server does not receive a message from zookeeper, it will assume its lock has been lost, too. If a tablet server loses its lock, it kills itself: everything assumes it is dead already.

A: Investigate why the tablet server did not send a timely message to zookeeper.

18.4.1. Keeping the tablet server lock

Q: My tablet server lost its lock. Why?

The primary reason a tablet server loses its lock is that it has been pushed into swap.

A large java program (like the tablet server) may have a large portion of its memory image unused. The operating system will favor pushing this allocated, but unused, memory into swap so that the memory can be re-used as a disk buffer. When the java virtual machine decides to access this memory, the OS will begin flushing disk buffers to return that memory to the JVM. This can cause the entire process to block long enough for the zookeeper lock to be lost.

A: Configure your system to reduce the kernel parameter swappiness from the default (60) to zero.
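For example, on most Linux systems the change can be applied immediately and made persistent across reboots as follows:

$ sudo sysctl -w vm.swappiness=0
$ echo "vm.swappiness = 0" | sudo tee -a /etc/sysctl.conf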

Q: My tablet server lost its lock, and I have already set swappiness to zero. Why?

Be careful not to over-subscribe memory. This can be easy to do if your accumulo processes run on the same nodes as hadoop’s map-reduce framework. Remember to add up:

• size of the JVM for the tablet server

• size of the in-memory map, if using the native map implementation

• size of the JVM for the data node

• size of the JVM for the task tracker

• size of the JVM times the maximum number of mappers and reducers

• size of the kernel and any support processes

If a 16G node can run 2 mappers and 2 reducers, and each can be 2G, then there is only 8G for the data node, tserver, task tracker and OS.

A: Reduce the memory footprint of each component until it fits comfortably.

Q: My tablet server lost its lock, swappiness is zero, and my node has lots of unused memory!

The JVM memory garbage collector may fall behind and cause a "stop-the-world" garbage collection. On a large memory virtual machine, this collection can take a long time. This happens more frequently when the JVM is getting low on free memory. Check the logs of the tablet server. You will see lines like this:

2013-06-20 13:43:20,607 [tabletserver.TabletServer] DEBUG: gc ParNew=0.00(+0.00) secs
ConcurrentMarkSweep=0.00(+0.00) secs freemem=1,868,325,952(+1,868,325,952) totalmem=2,040,135,680

When freemem becomes small relative to the amount of memory needed, the JVM will spend more time finding free memory than performing work. This can cause long delays in sending keep-alive messages to zookeeper.

A: Ensure the tablet server JVM is not running low on memory.
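One way to do that is to give the tserver a larger Java heap in accumulo-env.sh; a minimal sketch, assuming the stock accumulo-env.sh layout where ACCUMULO_TSERVER_OPTS carries the tserver JVM options (the 4g figure is only an example and must fit within the memory budget discussed earlier):

export ACCUMULO_TSERVER_OPTS="-Xmx4g -Xms4g"

Note that the native in-memory map sized by tserver.memory.maps.max is allocated outside this Java heap, so count it separately.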

18.5. Tools

The accumulo script can be used to run classes from the command line. This section shows how a few of the utilities work, but there are many more.

There’s a class that will examine an accumulo storage file and print out basic metadata.

$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo /accumulo/tables/1/default_tablet/A000000n.rf
2013-07-16 08:17:14,778 [util.NativeCodeLoader] INFO : Loaded the native-hadoop library
Locality group         : <DEFAULT>
        Start block       : 0
        Num blocks        : 1
        Index level 0     : 62 bytes  1 blocks
        First key         : 288be9ab4052fe9e span:34078a86a723e5d3:3da450f02108ced5 [] 1373373521623 false
        Last key          : start:13fc375709e id:615f5ee2dd822d7a [] 1373373821660 false
        Num entries       : 466
        Column families   : [waitForCommits, start, md major compactor 1, md major compactor 2, md major compactor 3, bringOnline, prep, md major compactor 4, md major compactor 5, md root major compactor 3, minorCompaction, wal, compactFiles, md root major compactor 4, md root major compactor 1, md root major compactor 2, compact, id, client:update, span, update, commit, write, majorCompaction]

Meta block     : BCFile.index
      Raw size          : 4 bytes
      Compressed size   : 12 bytes
      Compression type  : gz

Meta block     : RFile.index
      Raw size          : 780 bytes
      Compressed size   : 344 bytes
      Compression type  : gz

When trying to diagnose problems related to key size, the PrintInfo tool can provide a histogram of the individual key sizes:

$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo --histogram /accumulo/tables/1/default_tablet/A000000n.rf
...
Up to size      count      %-age
10 :        222  28.23%
100 :        244  71.77%
1000 :          0   0.00%
10000 :          0   0.00%
100000 :          0   0.00%
1000000 :          0   0.00%
10000000 :          0   0.00%
100000000 :          0   0.00%
1000000000 :          0   0.00%
10000000000 :          0   0.00%

Likewise, PrintInfo will dump the key-value pairs and show you the contents of the RFile:

$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo --dump /accumulo/tables/1/default_tablet/A000000n.rf
row columnFamily:columnQualifier [visibility] timestamp deleteFlag -> Value
...

Q: Accumulo is not showing me any data!

A: Do you have your auths set so that it matches your visibilities?

Q: What are my visibilities?

A: Use PrintInfo on a representative file to get some idea of the visibilities in the underlying data. Note that PrintInfo is an administrative tool and can only be used by someone who can access the underlying Accumulo data. It does not provide the normal access controls in Accumulo.

If you would like to backup, or otherwise examine the contents of Zookeeper, there are commands to dump and load to/from XML:

$ ./bin/accumulo org.apache.accumulo.server.util.DumpZookeeper --root /accumulo > dump.xml
$ ./bin/accumulo org.apache.accumulo.server.util.RestoreZookeeper --overwrite < dump.xml

Q: How can I get the information in the monitor page for my cluster monitoring system?

A: Use GetMasterStats:

$ ./bin/accumulo org.apache.accumulo.test.GetMasterStats | grep Load
OS Load Average: 0.27

Q: The monitor page is showing an offline tablet. How can I find out which tablet it is?

A: Use FindOfflineTablets:

$ ./bin/accumulo org.apache.accumulo.server.util.FindOfflineTablets
2<<@(null,null,localhost:9997) is UNASSIGNED  #walogs:2

Here’s what the output means:

2<<

This is the tablet from (-inf, +inf) for the table with id 2. The command tables -l in the shell will show table ids for tables.

@(null, null, localhost:9997)

Location information. The format is @(assigned, hosted, last). In this case, the tablet has not been assigned, is not hosted anywhere, and was once hosted on localhost.

#walogs:2

The number of write-ahead logs that this tablet requires for recovery. An unassigned tablet with write-ahead logs is probably waiting for logs to be sorted for efficient recovery.

Q: How can I be sure that the metadata tables are up and consistent?

A: CheckForMetadataProblems will verify that the start/end of every tablet matches, and that the start and stop for the table are empty:

$ ./bin/accumulo org.apache.accumulo.server.util.CheckForMetadataProblems -u root --password
All is well for table !0
All is well for table 1

Q: My hadoop cluster has lost a file due to a NameNode failure. How can I remove the file?

A: There’s a utility that will check every file reference and ensure that the file exists in HDFS. Optionally, it will remove the reference:

$ ./bin/accumulo org.apache.accumulo.server.util.RemoveEntriesForMissingFiles -u root --password
Enter the connection password:
2013-07-16 13:10:57,293 [util.RemoveEntriesForMissingFiles] INFO : File /accumulo/tables/2/default_tablet/F0000005.rf is missing
2013-07-16 13:10:57,296 [util.RemoveEntriesForMissingFiles] INFO : 1 files of 3 missing

Q: I have many entries in zookeeper for old instances I no longer need. How can I remove them?

A: Use CleanZookeeper:

$ ./bin/accumulo org.apache.accumulo.server.util.CleanZookeeper

This command will not delete the instance pointed to by the local conf/accumulo-site.xml file.

Q: I need to decommission a node. How do I stop the tablet server on it?

$ ./bin/accumulo admin stop hostname:9997
2013-07-16 13:15:38,403 [util.Admin] INFO : Stopping server 12.34.56.78:9997

Q: I cannot login to a tablet server host, and the tablet server will not shut down. How can I kill the server?

A: Sometimes you can kill a "stuck" tablet server by deleting its lock in zookeeper:

$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks --list
127.0.0.1:9997 TSERV_CLIENT=127.0.0.1:9997
$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -delete 127.0.0.1:9997
$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -list
127.0.0.1:9997             null

You can find the master and instance id for any accumulo instances using the same zookeeper instance:

$ ./bin/accumulo org.apache.accumulo.server.util.ListInstances
INFO : Using ZooKeepers localhost:2181

 Instance Name       | Instance ID                          | Master
---------------------+--------------------------------------+-------------------------------
              "test" | 6140b72e-edd8-4126-b2f5-e74a8bbe323b | 127.0.0.1:9999

18.6. System Metadata Tables

Accumulo tracks information about tables in metadata tables. The metadata for most tables is contained within the metadata table in the accumulo namespace, while the metadata for the metadata table itself is contained in the root table in the accumulo namespace. The root table is composed of a single tablet, which does not split, so it is also called the root tablet. Information about the root table, such as its location and write-ahead logs, is stored in ZooKeeper.

Let’s create a table and put some data into it:

shell> createtable test

shell> tables -l
accumulo.metadata    =>        !0
accumulo.root        =>        +r
test                 =>         3
trace                =>         1

shell> insert a b c d
shell> flush -w

Now let’s take a look at the metadata for this table:

shell> table accumulo.metadata
shell> scan -b 3; -e 3<

3< file:/default_tablet/F000009y.rf []    186,1
3< last:13fe86cd27101e5 []    127.0.0.1:9997
3< loc:13fe86cd27101e5 []    127.0.0.1:9997
3< log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6
3< srv:dir []    /default_tablet
3< srv:flush []    1
3< srv:lock []    tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5
3< srv:time []    M1373998392323
3< ~tab:~pr []    \x00

Let’s decode this little session:

scan -b 3; -e 3<

Every tablet gets its own row. Every row starts with the table id, followed by ; or <, and then the end row split point for that tablet.

file:/default_tablet/F000009y.rf [] 186,1

File entry for this tablet. This tablet contains a single file reference. The file is /accumulo/tables/3/default_tablet/F000009y.rf. It contains 1 key/value pair, and is 186 bytes long.

last:13fe86cd27101e5 [] 127.0.0.1:9997

Last location for this tablet. It was last held on 127.0.0.1:9997, and the unique tablet server lock data was 13fe86cd27101e5. The default balancer will tend to put tablets back on their last location.

loc:13fe86cd27101e5 [] 127.0.0.1:9997

The current location of this tablet.

log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 [] 127.0. …​

This tablet has a reference to a single write-ahead log. This file can be found in /accumulo/wal/127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995. The value of this entry could refer to multiple files. This tablet’s data is encoded as 6 within the log.

srv:dir [] /default_tablet

Files written for this tablet will be placed into /accumulo/tables/3/default_tablet.

srv:flush [] 1

Flush id. This table has successfully completed the flush with the id of 1.

srv:lock [] tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5

This is the lock information for the tablet server holding the present lock. This information is checked against zookeeper whenever this entry is updated, which prevents a metadata update from a tablet server that no longer holds its lock.

srv:time [] M1373998392323

This indicates the time type (M for milliseconds or L for logical) and the timestamp of the most recently written key in this tablet. It is used to ensure automatically assigned key timestamps are strictly increasing for the tablet, regardless of the tablet server’s system time.

~tab:~pr [] \x00

The end-row marker for the previous tablet (prev-row). The first byte indicates the presence of a prev-row. This tablet has the range (-inf, +inf), so it has no prev-row (or end row).

Besides these columns, you may see:

rowId future:zooKeeperID location

The tablet has been assigned to a tablet server, but has not yet been loaded.

~del:filename

When a tablet server is done using a file, it will create a delete marker in the appropriate metadata table, unassociated with any tablet. The garbage collector will remove the marker, and the file, when no other reference to the file exists.

~blip:txid

Bulk-Load In Progress marker.

rowId loaded:filename

A file has been bulk-loaded into this tablet; however, the bulk load has not yet completed on other tablets, so this marker prevents the file from being loaded multiple times.

rowId !cloned

A marker that indicates that this tablet has been successfully cloned.

rowId splitRatio:ratio

A marker that indicates a split is in progress, and the files are being split at the given ratio.

rowId chopped

A marker that indicates that the files in the tablet do not contain keys outside the range of the tablet.

rowId scan

A marker that prevents a file from being removed while there are still active scans using it.

18.7. Simple System Recovery

Q: One of my Accumulo processes died. How do I bring it back?

The easiest way to bring all services online for an Accumulo instance is to run the start-all.sh script:

$ bin/start-all.sh

This process will check the process listing, using jps on each host before attempting to restart a service on the given host. Typically, this check is sufficient except in the face of a hung/zombie process. For large clusters, it may be undesirable to ssh to every node in the cluster to ensure that all hosts are running the appropriate processes and start-here.sh may be of use.

$ ssh host_with_dead_process
$ bin/start-here.sh

start-here.sh should be invoked on the host which is missing a given process. Like start-all.sh, it will start all necessary processes that are not currently running, but only on the current host and not cluster-wide. Tools such as pssh or pdsh can be used to automate this process.

start-server.sh can also be used to start a process on a given host; however, it is not generally recommended for users to issue this directly as the start-all.sh and start-here.sh scripts provide the same functionality with more automation and are less prone to user error.

A: Use start-all.sh or start-here.sh.

Q: My process died again. Should I restart it via cron or tools like supervisord?

A: A repeatedly dying Accumulo process is a sign of a larger problem. Typically these problems are due to a misconfiguration of Accumulo or over-saturation of resources. Blind automation of any service restart inside of Accumulo is generally an undesirable situation as it is indicative of a problem that is being masked and ignored. Accumulo processes should be stable on the order of months and not require frequent restart.

18.8. Advanced System Recovery

18.8.1. HDFS Failure

Q: I had a disastrous HDFS failure. After bringing everything back up, several tablets refuse to go online.

Data written to tablets is written into memory before being written into indexed files. In case the server is lost before the data is saved to an indexed file, all data stored in memory is first written into a write-ahead log (WAL). When a tablet is re-assigned to a new tablet server, the write-ahead logs are read to recover any mutations that were in memory when the tablet was last hosted.

If a write-ahead log cannot be read, then the tablet is not re-assigned. All it takes is for one of the blocks in the write-ahead log to be missing. This is unlikely unless multiple data nodes in HDFS have been lost.

A: Get the WAL files online and healthy. Restore any data nodes that may be down.

Q: How do I find out which tablets are offline?

A: Use accumulo admin checkTablets

$ bin/accumulo admin checkTablets

Q: I lost three data nodes, and I’m missing blocks in a WAL. I don’t care about data loss, how can I get those tablets online?

See the discussion in System Metadata Tables, which shows a typical metadata table listing. The entries with a column family of log are references to the WAL for that tablet. If you know which WAL is bad, you can find all the references with a grep in the shell:

shell> grep 0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995
3< log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6

A: You can remove the WAL references in the metadata table.

shell> grant -u root Table.WRITE -t accumulo.metadata
shell> delete 3< log 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995

Note: the colon (:) is omitted when specifying the row cf cq for the delete command.

The master will automatically discover that the tablet no longer has a bad WAL reference and will assign the tablet. You will need to remove the reference from all the tablets to get them online.

Q: The metadata (or root) table has references to a corrupt WAL.

This is a much more serious state, since losing updates to the metadata table will result in references to old files which may not exist, or lost references to new files, resulting in tablets that cannot be read, or large amounts of data loss.

The best hope is to restore the WAL by fixing HDFS data nodes and bringing the data back online. If this is not possible, the best approach is to re-create the instance and bulk import all files from the old instance into new tables.

A complete set of instructions for doing this is outside the scope of this guide, but the basic approach is:

• Use tables -l in the shell to discover the table name to table id mapping

• Stop all accumulo processes on all nodes

• Move the accumulo directory in HDFS out of the way:

$ hadoop fs -mv /accumulo /corrupt

• Re-initialize accumulo

• Recreate tables, users and permissions

• Import the directories under /corrupt/tables/<id> into the new instance

Q: One or more HDFS Files under /accumulo/tables are corrupt

Accumulo maintains multiple references to tablet files, both in the metadata tables and within the tablet server hosting the files, which makes it difficult to reliably just remove those references.

The directory structure in HDFS for tables will follow the general structure:

/accumulo
/accumulo/tables/
/accumulo/tables/!0
/accumulo/tables/!0/default_tablet/A000001.rf
/accumulo/tables/!0/t-00001/A000002.rf
/accumulo/tables/1
/accumulo/tables/1/default_tablet/A000003.rf
/accumulo/tables/1/t-00001/A000004.rf
/accumulo/tables/1/t-00001/A000005.rf
/accumulo/tables/2/default_tablet/A000006.rf
/accumulo/tables/2/t-00001/A000007.rf

If files under /accumulo/tables are corrupt, the best course of action is to recover those files in HDFS (see the section on HDFS). Once these recovery efforts have been exhausted, the next step depends on where the missing file(s) are located. Different actions are required when the bad files are Accumulo data table files than when they are metadata table files.

Data File Corruption

When an Accumulo data file is corrupt, the most reliable way to restore Accumulo operations is to replace the missing file with an “empty” file so that references to the file in the METADATA table and within the tablet server hosting the file can be resolved by Accumulo. An empty file can be created using the CreateEmpty utility:

$ accumulo org.apache.accumulo.core.file.rfile.CreateEmpty /path/to/empty/file/empty.rf

The process is to delete the corrupt file and then move the empty file into its place. (The generated empty file can be copied and used multiple times if necessary and does not need to be regenerated each time.)

$ hadoop fs -rm /accumulo/tables/corrupt/file/thename.rf; \
hadoop fs -mv /path/to/empty/file/empty.rf /accumulo/tables/corrupt/file/thename.rf

If the corrupt files are metadata table files (see System Metadata Tables; these are stored under the path /accumulo/tables/!0), then you will need to rebuild the metadata table by initializing a new instance of Accumulo and then importing all of the existing data into the new instance. This is the same procedure as recovering from a zookeeper failure (see ZooKeeper Failure), except that you will have the benefit of having the existing user and table authorizations that are maintained in zookeeper.

You can use the DumpZookeeper utility to save this information for reference before creating the new instance. You will not be able to use RestoreZookeeper because the table names and references are likely to be different between the original and the new instances, but it can serve as a reference.

A: If the files cannot be recovered, replace corrupt data files with empty rfiles to allow references in the metadata table and in the tablet servers to be resolved. Rebuild the metadata table if the corrupt files are metadata table files.

18.8.2. ZooKeeper Failure

Q: I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. How can I recover my Accumulo instance?

ZooKeeper, in addition to its lock-service capabilities, also serves to bootstrap an Accumulo instance from some location in HDFS. It contains the pointers to the root tablet in HDFS which is then used to load the Accumulo metadata tablets, which then loads all user tables. ZooKeeper also stores all namespace and table configuration, the user database, the mapping of table IDs to table names, and more across Accumulo restarts.

Presently, the only way to recover such an instance is to initialize a new instance and import all of the old data into the new instance. The easiest way to tackle this problem is to first recreate the mapping of table ID to table name and then recreate each of those tables in the new instance. Set any necessary configuration on the new tables and add some split points to approximate the splits the old tables had, since the new tables start with no splits.

The directory structure in HDFS for tables will follow the general structure:

/accumulo
/accumulo/tables/
/accumulo/tables/1
/accumulo/tables/1/default_tablet/A000001.rf
/accumulo/tables/1/t-00001/A000002.rf
/accumulo/tables/1/t-00001/A000003.rf
/accumulo/tables/2/default_tablet/A000004.rf
/accumulo/tables/2/t-00001/A000005.rf

For each table, make a new directory that you can move (or copy if you have the HDFS space to do so) all of the rfiles for a given table into. For example, to process the table with an ID of 1, make a new directory, say /new-table-1 and then copy all files from /accumulo/tables/1/*/*.rf into that directory. Additionally, make a directory, /new-table-1-failures, for any failures during the import process. Then, issue the import command using the Accumulo shell into the new table, telling Accumulo to not re-set the timestamp:

user@instance new_table> importdirectory /new-table-1 /new-table-1-failures false

Any RFiles which failed to load will be placed in /new-table-1-failures. RFiles that were successfully imported will no longer exist in /new-table-1. For failures, move them back to the import directory and retry the importdirectory command.

It is extremely important to note that this approach may introduce stale data back into the tables. For a few reasons, RFiles may exist in the table directory which are candidates for deletion but have not yet been deleted. Additionally, deleted data which was not compacted away, but still exists in write-ahead logs if the original instance was somehow recoverable, will be re-introduced in the new instance. Table splits and merges (which also include the deleteRows API call on TableOperations) are also vulnerable to this problem. This process should not be used if these are unacceptable risks. It is possible to try to re-create a view of the accumulo.metadata table to prune out files that are candidates for deletion, but this is a difficult task that also may not be entirely accurate.

Likewise, it is also possible that data loss may occur from write-ahead log (WAL) files which existed on the old table but were not minor-compacted into an RFile. Again, it may be possible to reconstruct the state of these WAL files to replay data not yet in an RFile; however, this is a difficult task and is not implemented in any automated fashion.

A: The importdirectory shell command can be used to import RFiles from the old instance into a newly created instance, but extreme care should go into the decision to do this as it may result in reintroduction of stale data or the omission of new data.

18.9. Upgrade Issues

Q: I upgraded from 1.4 to 1.5 to 1.6 but still have some WAL files on local disk. Do I have any way to recover them?

A: Yes, you can recover them by running the LocalWALRecovery utility on each node that needs recovery performed. The utility will default to using the directory specified by logger.dir.walog in your configuration, or it can be overridden by using the --local-wal-directories option on the tool. It can be invoked as follows:

$ACCUMULO_HOME/bin/accumulo org.apache.accumulo.tserver.log.LocalWALRecovery

18.10. File Naming Conventions

Q: Why are files named like they are? Why do some start with C and others with F?

A: The file names give you a basic idea of the source of the file.

The base of the filename is a base-36 unique number. All filenames in accumulo are coordinated with a counter in zookeeper, so they are always unique, which is useful for debugging.

The leading letter gives you an idea of how the file was created:

F

Flush: entries in memory were written to a file (Minor Compaction)

M

Merging compaction: entries in memory were combined with the smallest file to create one new file

C

Several files, but not all files, were combined to produce this file (Major Compaction)

A

All files were compacted, delete entries were dropped

I

Bulk import, complete, sorted index files. Always in a directory starting with b-

This simple file naming convention allows you to see the basic structure of the files from just their filenames, and to reason about what should be happening to them next, just by scanning their entries in the metadata tables. For example, if you see multiple files with M prefixes, the tablet is, or was, up against its maximum file limit, so it began merging memory updates with files to keep the file count reasonable. This slows down ingest performance, so seeing many such files tells you that the system is struggling to keep up with ingest relative to the compaction strategy that reduces the number of files.

Appendix A: Configuration Management

A.1. Configuration Overview

All accumulo properties have a default value in the source code. Properties can also be set in accumulo-site.xml and in zookeeper on a per-table or system-wide basis. If a property is set in more than one location, accumulo will choose the value with the highest precedence. This order of precedence is described below (from highest to lowest):

A.1.1. Zookeeper table properties

Table properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. While table properties take precedence over system properties, both will override properties set in accumulo-site.xml.

Table properties consist of all properties with the table.* prefix. Table properties are configured on a per-table basis using the following shell command:

config -t TABLE -s PROPERTY=VALUE

A.1.2. Zookeeper system properties

System properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. System properties consist of all properties with a yes in the Zookeeper Mutable column in the table below. They are set with the following shell command:

config -s PROPERTY=VALUE

If a table.* property is set using this method, the value will apply to all tables except those configured on a per-table basis (which have higher precedence). While most system properties take effect immediately, some require a restart of the process, which is indicated in Zookeeper Mutable.

A.1.3. accumulo-site.xml

Accumulo processes (master, tserver, etc.) read their local accumulo-site.xml on start up. Therefore, changes made to accumulo-site.xml must be rsynced across the cluster and processes must be restarted to apply the changes. Certain properties (indicated by a no in Zookeeper Mutable) cannot be set in zookeeper and can only be set in this file. The accumulo-site.xml also allows you to configure tablet servers with different settings.

A.1.4. Default Values

All properties have a default value in the source code. This value has the lowest precedence and is overridden if set in accumulo-site.xml or zookeeper.

While the default value is usually optimal, there are cases where a change can increase query and ingest performance.

A.1.5. ZooKeeper Property Considerations

Any properties that are stored in ZooKeeper should consider the limitations of ZooKeeper itself with respect to the number of nodes and the size of the node data. Custom table properties and options for Iterators configured on tables are two areas in which there aren’t any failsafes built into the API that can prevent the user from storing too much data in ZooKeeper. While these properties add some much needed dynamic configuration tools, use cases which might run into these limits should be reconsidered.

A.2. Configuration in the Shell

The config command in the shell allows you to view the current system configuration. You can also use the -t option to view a table’s configuration as below:

$ ./bin/accumulo shell -u root
Enter current password for 'root'@'accumulo-instance': ******

Shell - Apache Accumulo Interactive Shell
-
- version: 1.6.0
- instance name: accumulo-instance
- instance id: 4f48fa03-f692-43ce-ae03-94c9ea8b7181
-
- type 'help' for a list of available commands
-
root@accumulo-instance> config -t foo
---------+---------------------------------------------+------------------------------------------------------
SCOPE    | NAME                                        | VALUE
---------+---------------------------------------------+------------------------------------------------------
default  | table.balancer ............................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default  | table.bloom.enabled ....................... | false
default  | table.bloom.error.rate .................... | 0.5%
default  | table.bloom.hash.type ..................... | murmur
default  | table.bloom.key.functor ................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default  | table.bloom.load.threshold ................ | 1
default  | table.bloom.size .......................... | 1048576
default  | table.cache.block.enable .................. | false
default  | table.cache.index.enable .................. | false
default  | table.compaction.major.everything.at ...... | 19700101000000GMT
default  | table.compaction.major.everything.idle .... | 1h
default  | table.compaction.major.ratio .............. | 1.3
site     |    @override .............................. | 1.4
system   |    @override .............................. | 1.5
table    |    @override .............................. | 1.6
default  | table.compaction.minor.idle ............... | 5m
default  | table.compaction.minor.logs.threshold ..... | 3
default  | table.failures.ignore ..................... | false

A.3. Available Properties

A.3.1. rpc.*

Properties in this category related to the configuration of SSL keys for RPC. See also instance.ssl.enabled

rpc.javax.net.ssl.keyStore

Path of the keystore file for the servers' private SSL key

Type: PATH
Zookeeper Mutable: no
Default Value: $ACCUMULO_CONF_DIR/ssl/keystore.jks

rpc.javax.net.ssl.keyStorePassword

Password used to encrypt the SSL private keystore. Leave blank to use the Accumulo instance secret

Type: STRING
Zookeeper Mutable: no
Default Value: empty

rpc.javax.net.ssl.keyStoreType

Type of SSL keystore

Type: STRING
Zookeeper Mutable: no
Default Value: jks

rpc.javax.net.ssl.trustStore

Path of the truststore file for the root cert

Type: PATH
Zookeeper Mutable: no
Default Value: $ACCUMULO_CONF_DIR/ssl/truststore.jks

rpc.javax.net.ssl.trustStorePassword

Password used to encrypt the SSL truststore. Leave blank to use no password

Type: STRING
Zookeeper Mutable: no
Default Value: empty

rpc.javax.net.ssl.trustStoreType

Type of SSL truststore

Type: STRING
Zookeeper Mutable: no
Default Value: jks

rpc.sasl.qop

The quality of protection to be used with SASL. Valid values are auth, auth-int, and auth-conf

Type: STRING
Zookeeper Mutable: no
Default Value: auth

rpc.ssl.cipher.suites

Comma separated list of cipher suites that can be used by accepted connections

Type: STRING
Zookeeper Mutable: no
Default Value: empty

rpc.ssl.client.protocol

The protocol used to connect to a secure server, must be in the list of enabled protocols on the server side (rpc.ssl.server.enabled.protocols)

Type: STRING
Zookeeper Mutable: no
Default Value: TLSv1

rpc.ssl.server.enabled.protocols

Comma separated list of protocols that can be used to accept connections

Type: STRING
Zookeeper Mutable: no
Default Value: TLSv1,TLSv1.1,TLSv1.2

rpc.useJsse

Use JSSE system properties to configure SSL rather than the rpc.javax.net.ssl.* Accumulo properties

Type: BOOLEAN
Zookeeper Mutable: no
Default Value: false

A.3.2. instance.*

Properties in this category must be consistent throughout a cloud. This is enforced and servers won’t be able to communicate if these differ.

instance.dfs.dir

Deprecated. HDFS directory in which accumulo instance will run. Do not change after accumulo is initialized.

Type: ABSOLUTEPATH
Zookeeper Mutable: no
Default Value: /accumulo

instance.dfs.uri

Deprecated. A url accumulo should use to connect to DFS. If this is empty, accumulo will obtain this information from the hadoop configuration. This property will only be used when creating new files if instance.volumes is empty. After an upgrade to 1.6.0 Accumulo will start using absolute paths to reference files. Files created before a 1.6.0 upgrade are referenced via relative paths. Relative paths will always be resolved using this config (if empty using the hadoop config).

Type: URI
Zookeeper Mutable: no
Default Value: empty

instance.rpc.sasl.allowed.host.impersonation

One-line configuration property controlling the network locations (hostnames) that are allowed to impersonate other users

Type: STRING
Zookeeper Mutable: no
Default Value: empty

instance.rpc.sasl.allowed.user.impersonation

One-line configuration property controlling what users are allowed to impersonate other users

Type: STRING
Zookeeper Mutable: no
Default Value: empty

instance.rpc.sasl.enabled

Configures Thrift RPCs to require SASL with GSSAPI which supports Kerberos authentication. Mutually exclusive with SSL RPC configuration.

Type: BOOLEAN
Zookeeper Mutable: no
Default Value: false

instance.rpc.ssl.clientAuth

Require clients to present certs signed by a trusted root

Type: BOOLEAN
Zookeeper Mutable: no
Default Value: false

instance.rpc.ssl.enabled

Use SSL for socket connections from clients and among accumulo services. Mutually exclusive with SASL RPC configuration.

Type: BOOLEAN
Zookeeper Mutable: no
Default Value: false

instance.secret

A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later, use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret [oldpasswd] [newpasswd], and then update conf/accumulo-site.xml everywhere.

Type: STRING
Zookeeper Mutable: no
Default Value: DEFAULT

instance.security.authenticator

The authenticator class that accumulo will use to determine if a user has privilege to perform an action

Type: CLASSNAME
Zookeeper Mutable: no
Default Value: org.apache.accumulo.server.security.handler.ZKAuthenticator

instance.security.authorizor

The authorizor class that accumulo will use to determine what labels a user has privilege to see

Type: CLASSNAME
Zookeeper Mutable: no
Default Value: org.apache.accumulo.server.security.handler.ZKAuthorizor

instance.security.permissionHandler

The permission handler class that accumulo will use to determine if a user has privilege to perform an action

Type: CLASSNAME
Zookeeper Mutable: no
Default Value: org.apache.accumulo.server.security.handler.ZKPermHandler

instance.volumes

A comma separated list of dfs uris to use. Files will be stored across these filesystems. If this is empty, then instance.dfs.uri will be used. After adding uris to this list, run accumulo init --add-volume and then restart tservers. If entries are removed from this list then tservers will need to be restarted. After a uri is removed from the list Accumulo will not create new files in that location, however Accumulo can still reference files created at that location before the config change. To use a comma or other reserved characters in a URI use standard URI hex encoding. For example replace commas with %2C.

Type: STRING
Zookeeper Mutable: no
Default Value: empty
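For example, the add-volume workflow described above looks like this (hdfs://ns2:9001 stands in for your new namenode): append the new URI to instance.volumes in accumulo-site.xml on every node (e.g. hdfs://ns1:9001,hdfs://ns2:9001), run

$ ./bin/accumulo init --add-volume

and then restart the tablet servers.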

instance.volumes.replacements

Since accumulo stores absolute URIs, changing the location of a namenode could prevent Accumulo from starting. This property helps deal with that situation. Provide a comma separated list of uri replacement pairs here if a namenode location changes. Each pair should be separated with a space. For example, if hdfs://nn1 was replaced with hdfs://nnA and hdfs://nn2 was replaced with hdfs://nnB, then set this property to hdfs://nn1 hdfs://nnA,hdfs://nn2 hdfs://nnB. Replacements must be configured for use. To see which volumes are currently in use, run accumulo admin volumes -l. To use a comma or other reserved characters in a URI use standard URI hex encoding. For example replace commas with %2C.

Type: STRING
Zookeeper Mutable: no
Default Value: empty

instance.zookeeper.host

Comma separated list of zookeeper servers

Type: HOSTLIST
Zookeeper Mutable: no
Default Value: localhost:2181

instance.zookeeper.timeout

Zookeeper session timeout; max value when represented as milliseconds should be no larger than 2147483647

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 30s

A.3.3. instance.rpc.sasl.impersonation.* (Deprecated)

Deprecated. Prefix that allows configuration of users that are allowed to impersonate other users

A.3.4. general.*

Properties in this category affect the behavior of accumulo overall, but do not have to be consistent throughout a cloud.

general.classpaths

A list of all of the places to look for a class. Order does matter, as it will look for the jar starting in the first location and proceeding to the last. Please note, hadoop conf and hadoop lib directories NEED to be here, along with the accumulo lib and zookeeper directories. Supports full regex on filename alone.

Type: STRING
Zookeeper Mutable: no
Default Value:

$ACCUMULO_CONF_DIR,$ACCUMULO_HOME/lib/[^.].*.jar,
$ZOOKEEPER_HOME/zookeeper[^.].*.jar,$HADOOP_CONF_DIR,
$HADOOP_PREFIX/[^.].*.jar,$HADOOP_PREFIX/lib/(?!slf4j)[^.].*.jar,
$HADOOP_PREFIX/share/hadoop/common/[^.].*.jar,$HADOOP_PREFIX/share/hadoop/common/lib/(?!slf4j)[^.].*.jar,
$HADOOP_PREFIX/share/hadoop/hdfs/[^.].*.jar,$HADOOP_PREFIX/share/hadoop/mapreduce/[^.].*.jar,
$HADOOP_PREFIX/share/hadoop/yarn/[^.].*.jar,$HADOOP_PREFIX/share/hadoop/yarn/lib/jersey.*.jar,
/usr/hdp/current/hive-client/lib/hive-accumulo-handler.jar
/usr/lib/hadoop-yarn/lib/jersey.*.jar,

general.delegation.token.lifetime

The length of time that delegation tokens and secret keys are valid

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 7d

general.delegation.token.update.interval

The length of time between generation of new secret keys

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 1d

general.dynamic.classpaths

A list of all of the places where changes in jars or classes will force a reload of the classloader.

Type: STRING
Zookeeper Mutable: no

general.vfs.classpaths

Configuration for a system level vfs classloader. Accumulo jar can be configured here and loaded out of HDFS.

Type: STRING
Zookeeper Mutable: no
Default Value: empty

A.3.5. master.*

Properties in this category affect the behavior of the master server

master.bulk.rename.threadpool.size

The number of threads to use when moving user files to bulk ingest directories under accumulo control

Type: COUNT
Zookeeper Mutable: yes
Default Value: 20

master.bulk.retries

The number of attempts to bulk-load a file before giving up.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 3

master.bulk.threadpool.size

The number of threads to use when coordinating a bulk-import.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 5

master.bulk.timeout

The time to wait for a tablet server to process a bulk import request

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

master.fate.threadpool.size

The number of threads used to run FAult-Tolerant Executions. These are primarily table operations like merge.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

master.lease.recovery.interval

The amount of time to wait after requesting a WAL file to be recovered

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5s

master.port.client

The port used for handling client connections on the master

Type: PORT
Zookeeper Mutable: yes but requires restart of the master
Default Value: 9999

master.recovery.delay

When a tablet server’s lock is deleted, it takes time for it to completely quit. This delay gives it time before log recoveries begin.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 10s

master.recovery.max.age

Recovery files older than this age will be removed.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 60m

master.recovery.time.max

The maximum time to attempt recovery before giving up

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30m

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

master.replication.coordinator.port

Port for the replication coordinator service

Type: PORT
Zookeeper Mutable: yes
Default Value: 10001

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5s

master.replication.status.scan.interval

Amount of time to sleep before scanning the status section of the replication table for new data

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30s

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1s

master.server.threads.minimum

The minimum number of threads to use to handle incoming requests.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 20

master.tablet.balancer

The balancer class that accumulo will use to make tablet assignment and migration decisions.

Type: CLASSNAME
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.server.master.balancer.TableLoadBalancer

master.walog.closer.implementation

A class that implements a mechanism to steal write access to a file

Type: CLASSNAME
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.server.master.recovery.HadoopLogCloser

A.3.6. tserver.*

Properties in this category affect the behavior of the tablet servers

tserver.archive.walogs

Keep copies of the WALOGs for debugging purposes

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

tserver.assignment.concurrent.max

The number of threads available to load tablets. Recoveries are still performed serially.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 2

tserver.assignment.duration.warning

The amount of time an assignment can run before the server will print a warning along with the current stack trace. Meant to help debug stuck assignments

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 10m

tserver.bloom.load.concurrent.max

The number of concurrent threads that will load bloom filters in the background. Setting this to zero will make bloom filters load in the foreground.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

tserver.bulk.assign.threads

The master delegates bulk file processing and assignment to tablet servers. After the bulk file has been processed, the tablet server will assign the file to the appropriate tablets on all servers. This property controls the number of threads used to communicate to the other servers.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

tserver.bulk.process.threads

The master will task a tablet server with pre-processing a bulk file prior to assigning it to the appropriate tablet servers. This configuration value controls the number of threads used to process the files.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

tserver.bulk.retry.max

The number of times the tablet server will attempt to assign a file to a tablet as it migrates and splits.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 5

tserver.bulk.timeout

The time to wait for a tablet server to process a bulk import request.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

tserver.cache.data.size

Specifies the size of the cache for file data blocks.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 128M

tserver.cache.index.size

Specifies the size of the cache for file indices.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 512M

tserver.client.timeout

Time to wait for clients to continue scans before closing a session.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 3s

tserver.compaction.major.concurrent.max

The maximum number of concurrent major compactions for a tablet server

Type: COUNT
Zookeeper Mutable: yes
Default Value: 3

tserver.compaction.major.delay

Time a tablet server will sleep between checking which tablets need compaction.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30s

Max number of files a major compaction thread can open at once.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 10

tserver.compaction.major.trace.percent

The percent of major compactions to trace

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 0.1

tserver.compaction.minor.concurrent.max

The maximum number of concurrent minor compactions for a tablet server

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

tserver.compaction.minor.trace.percent

The percent of minor compactions to trace

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 0.1

tserver.compaction.warn.time

When a compaction has not made progress for this time period, a warning will be logged

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 10m

tserver.default.blocksize

Specifies a default blocksize for the tserver caches

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1M

tserver.dir.memdump

A long running scan could possibly hold memory that has been minor compacted. To prevent this, the in memory map is dumped to a local file and the scan is switched to that local file. We can not switch to the minor compacted file because it may have been modified by iterators. The file dumped to the local dir is an exact copy of what was in memory.

Type: PATH
Zookeeper Mutable: yes
Default Value: /tmp

tserver.files.open.idle

Tablet servers leave previously used files open for future queries. This setting determines how much time an unused file should be kept open until it is closed.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1m

tserver.hold.time.max

The maximum time for a tablet server to be in the "memory full" state. If the tablet server cannot write out memory in this much time, it will assume there is some failure local to its node, and quit. A value of zero is equivalent to forever.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

tserver.memory.manager

An implementation of MemoryManager that accumulo will use.

Type: CLASSNAME
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager

tserver.memory.maps.max

Maximum amount of memory that can be used to buffer data written to a tablet server. There are two other properties that can effectively limit memory usage: table.compaction.minor.logs.threshold and tserver.walog.max.size. Ensure that table.compaction.minor.logs.threshold * tserver.walog.max.size >= this property.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1G
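As a worked check using the defaults listed in this appendix: table.compaction.minor.logs.threshold (3) times tserver.walog.max.size (1G) is 3G, which is at least the 1G default here, so the constraint holds. If you raise tserver.memory.maps.max to, say, 4G, you would also need to raise one of the other two values so that their product is at least 4G.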

tserver.memory.maps.native.enabled

An in-memory data store for accumulo implemented in c++ that increases the amount of data accumulo can hold in memory and avoids Java GC pauses.

Type: BOOLEAN
Zookeeper Mutable: yes but requires restart of the tserver
Default Value: true

Type: COUNT
Zookeeper Mutable: yes
Default Value: 8

tserver.migrations.concurrent.max

The maximum number of concurrent tablet migrations for a tablet server

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

tserver.monitor.fs

When enabled the tserver will monitor file systems and kill itself when one switches from rw to ro. This is usually an indication that Linux has detected a bad disk.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

tserver.mutation.queue.max

Deprecated. See tserver.total.mutation.queue.max. The amount of memory to use to store write-ahead-log mutations per session before flushing them. Since the buffer is per write session, consider the maximum number of concurrent writers when configuring it. When using Hadoop 2, Accumulo will call hsync() on the WAL. For a small number of concurrent writers, increasing this buffer size decreases the frequency of hsync calls. For a large number of concurrent writers a small buffer size is OK because of group commit.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1M

tserver.port.client

The port used for handling client connections on the tablet servers

Type: PORT
Zookeeper Mutable: yes but requires restart of the tserver
Default Value: 9997

tserver.port.search

If the ports above are in use, search higher ports until one is available

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

tserver.readahead.concurrent.max

The maximum number of concurrent read ahead that will execute. This effectively limits the number of long running scans that can run concurrently per tserver.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 16

tserver.recovery.concurrent.max

The maximum number of threads to use to sort logs during recovery

Type: COUNT
Zookeeper Mutable: yes
Default Value: 2

tserver.replication.batchwriter.replayer.memory

Memory to provide to batchwriter to replay mutations for replication

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 50M

tserver.replication.default.replayer

Default AccumuloReplicationReplayer implementation

Type: CLASSNAME
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.tserver.replication.BatchWriterReplicationReplayer

tserver.scan.files.open.max

Maximum total files that all tablets in a tablet server can open for scans.

Type: COUNT
Zookeeper Mutable: yes but requires restart of the tserver
Default Value: 100

tserver.server.message.size.max

The maximum size of a message that can be sent to a tablet server.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1G

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1s

tserver.server.threads.minimum

The minimum number of threads to use to handle incoming requests.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 20

tserver.session.idle.max

When a tablet server’s SimpleTimer thread triggers to check idle sessions, this configurable option will be used to evaluate scan sessions to determine if they can be closed due to inactivity

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1m

tserver.session.update.idle.max

When a tablet server’s SimpleTimer thread triggers to check idle sessions, this configurable option will be used to evaluate update sessions to determine if they can be closed due to inactivity

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1m

tserver.sort.buffer.size

The amount of memory to use when sorting logs during recovery.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 200M

tserver.tablet.split.midpoint.files.max

To find a tablet’s split points, all index files are opened. This setting determines how many index files can be opened at once. When there are more index files than this setting, multiple passes must be made, which is slower. However, opening too many files at once can cause problems.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 30

tserver.total.mutation.queue.max

The amount of memory used to store write-ahead-log mutations before flushing them.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 50M

tserver.wal.blocksize

The size of the HDFS blocks used to write to the Write-Ahead log. If zero, it will be 110% of tserver.walog.max.size (that is, try to use just one block)

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 0

tserver.wal.replication

The replication to use when writing the Write-Ahead log to HDFS. If zero, it will use the HDFS default replication setting.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 0

tserver.wal.sync

Use the SYNC_BLOCK create flag to sync WAL writes to disk. Prevents problems recovering from sudden system resets.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

tserver.wal.sync.method

Deprecated. This property is deprecated. Use table.durability instead.

Type: STRING
Zookeeper Mutable: yes
Default Value: hsync

tserver.walog.max.size

The maximum size for each write-ahead log. See comment for property tserver.memory.maps.max

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1G

tserver.walog.maximum.wait.duration

The maximum amount of time to wait after a failure to create a WAL file.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

tserver.walog.tolerated.creation.failures

The maximum number of failures tolerated when creating a new WAL file within the period specified by tserver.walog.failures.period. Exceeding this number of failures in the period causes the TabletServer to exit.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 50

tserver.walog.tolerated.wait.increment

The amount of time to wait between failures to create a WALog.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1000ms

tserver.workq.threads

The number of threads for the distributed work queue. These threads are used for copying failed bulk files.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 2

A.3.7. tserver.replication.replayer.*

Allows configuration of the implementation used to apply replicated data

A.3.8. logger.*

Properties in this category affect the behavior of the write-ahead logger servers

logger.dir.walog

This property is only needed if Accumulo was upgraded from a 1.4 or earlier version. In the upgrade to 1.5 this property is used to copy any earlier write ahead logs into DFS. In 1.6+, this property is used by the LocalWALRecovery utility in the event that something went wrong with that earlier upgrade. It is possible to specify a comma-separated list of directories.

Type: PATH
Zookeeper Mutable: yes
Default Value: walogs

A.3.9. gc.*

Properties in this category affect the behavior of the Accumulo garbage collector.

gc.cycle.delay

Time between garbage collection cycles. In each cycle, old files no longer in use are removed from the filesystem.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

gc.cycle.start

Time to wait before attempting to garbage collect any old files.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30s

gc.file.archive

Archive any files/directories instead of moving to the HDFS trash or deleting

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

gc.port.client

The listening port for the garbage collector’s monitor service

Type: PORT
Zookeeper Mutable: yes but requires restart of the gc
Default Value: 50091

gc.threads.delete

The number of threads used to delete files

Type: COUNT
Zookeeper Mutable: yes
Default Value: 16

gc.trace.percent

Percent of gc cycles to trace

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 0.01

gc.trash.ignore

Do not use the Trash, even if it is configured

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

A.3.10. monitor.*

Properties in this category affect the behavior of the monitor web server.

monitor.banner.background

The background color of the banner text displayed on the monitor page.

Type: STRING
Zookeeper Mutable: yes
Default Value: #304065

monitor.banner.color

The color of the banner text displayed on the monitor page.

Type: STRING
Zookeeper Mutable: yes
Default Value: #c4c4c4

monitor.banner.text

The banner text displayed on the monitor page.

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

monitor.lock.check.interval

The amount of time to sleep between checking for the Monitor ZooKeeper lock

Type: TIMEDURATION
Zookeeper Mutable: no
Default Value: 5s

monitor.log.date.format

The SimpleDateFormat string used to configure the date shown on the Recent Logs monitor page

Type: STRING
Zookeeper Mutable: no
Default Value: yyyy/MM/dd HH:mm:ss,SSS

monitor.port.client

The listening port for the monitor’s http service

Type: PORT
Zookeeper Mutable: no
Default Value: 50095

monitor.port.log4j

The listening port for the monitor’s log4j logging collection.

Type: PORT
Zookeeper Mutable: no
Default Value: 4560

monitor.ssl.exclude.ciphers

A comma-separated list of disallowed SSL ciphers; see monitor.ssl.include.ciphers to allow ciphers

Type: STRING
Zookeeper Mutable: no
Default Value: empty

monitor.ssl.include.ciphers

A comma-separated list of allowed SSL ciphers; see monitor.ssl.exclude.ciphers to disallow ciphers

Type: STRING
Zookeeper Mutable: no
Default Value: empty

monitor.ssl.include.protocols

A comma-separated list of allowed SSL protocols

Type: STRING
Zookeeper Mutable: no
Default Value: TLSv1,TLSv1.1,TLSv1.2

monitor.ssl.keyStore

The keystore for enabling monitor SSL.

Type: PATH
Zookeeper Mutable: no
Default Value: empty

monitor.ssl.keyStorePassword

The keystore password for enabling monitor SSL.

Type: STRING
Zookeeper Mutable: no
Default Value: empty

monitor.ssl.keyStoreType

Type of SSL keystore

Type: STRING
Zookeeper Mutable: no
Default Value: empty

monitor.ssl.trustStore

The truststore for enabling monitor SSL.

Type: PATH
Zookeeper Mutable: no
Default Value: empty

monitor.ssl.trustStorePassword

The truststore password for enabling monitor SSL.

Type: STRING
Zookeeper Mutable: no
Default Value: empty

monitor.ssl.trustStoreType

Type of SSL truststore

Type: STRING
Zookeeper Mutable: no
Default Value: empty

A.3.11. trace.*

Properties in this category affect the behavior of distributed tracing.

trace.password

The password for the user used to store distributed traces

Type: STRING
Zookeeper Mutable: no
Default Value: secret

trace.port.client

The listening port for the trace server

Type: PORT
Zookeeper Mutable: no
Default Value: 12234

trace.span.receivers

A list of span receiver classes to send trace spans

Type: CLASSNAMELIST
Zookeeper Mutable: no
Default Value: org.apache.accumulo.tracer.ZooTraceClient

trace.table

The name of the table to store distributed traces

Type: STRING
Zookeeper Mutable: no
Default Value: trace

trace.token.type

An AuthenticationToken type supported by the authorizer

Type: CLASSNAME
Zookeeper Mutable: no
Default Value: org.apache.accumulo.core.client.security.tokens.PasswordToken

trace.user

The name of the user to store distributed traces

Type: STRING
Zookeeper Mutable: no
Default Value: root

trace.zookeeper.path

The zookeeper node where tracers are registered

Type: STRING
Zookeeper Mutable: no
Default Value: /tracers

A.3.12. trace.span.receiver.*

Prefix for span receiver configuration properties

A.3.13. trace.token.property.*

The prefix used to create a token for storing distributed traces. For each property required by trace.token.type, place this prefix in front of it. For example, with the default PasswordToken, trace.token.property.password supplies the trace user's password.

A.3.14. table.*

Properties in this category affect how tablet servers treat tablets, but can be configured on a per-table basis. Setting these properties in the site file changes the default globally for all tables, not for any specific table. However, both the default and the global setting can be overridden per table using the table operations API or the shell, which stores the overriding value in zookeeper. Restarting Accumulo tablet servers after setting these properties in the site file will cause the global setting to take effect. However, you must use the API or the shell to change properties in zookeeper that are set on a table.
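
As a sketch of this per-table override path (assuming a hypothetical instance named myinstance, a ZooKeeper host zkhost1, a root user, and a table named mytable), the table operations API can set and remove a property stored in zookeeper; the shell's config command has the same effect.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class PerTablePropertyExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical instance name, ZooKeeper quorum, credentials, and table name.
    Connector conn = new ZooKeeperInstance("myinstance", "zkhost1:2181")
        .getConnector("root", new PasswordToken("secret"));

    // Override a table.* property for one table; the value is stored in zookeeper
    // and takes precedence over the site-file and compiled-in defaults.
    conn.tableOperations().setProperty("mytable", "table.durability", "flush");

    // Remove the override so the table falls back to the global/default value.
    conn.tableOperations().removeProperty("mytable", "table.durability");
  }
}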

table.balancer

This property can be set to allow the LoadBalanceByTable load balancer to change the load balancer invoked for this table

Type: STRING
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.server.master.balancer.DefaultLoadBalancer

table.bloom.enabled

Use bloom filters on this table.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

table.bloom.error.rate

Bloom filter error rate.

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 0.5%

table.bloom.hash.type

The bloom filter hash type

Type: STRING
Zookeeper Mutable: yes
Default Value: murmur

table.bloom.key.functor

A function that can transform the key prior to insertion into, and checks against, the bloom filter. org.apache.accumulo.core.file.keyfunctor.RowFunctor, org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor, and org.apache.accumulo.core.file.keyfunctor.ColumnQualifierFunctor are allowable values. One can extend any of the above mentioned classes to perform specialized parsing of the key. (A configuration sketch follows this entry.)

Type: CLASSNAME
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.core.file.keyfunctor.RowFunctor
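
The following sketch shows one way the bloom filter properties might be combined, assuming an existing Connector named conn and a table named mytable; the choice of ColumnFamilyFunctor here is only an example.

import org.apache.accumulo.core.client.Connector;

public class BloomFilterConfigExample {
  // A sketch, assuming 'conn' is an existing Connector and 'mytable' exists.
  static void enableColumnFamilyBlooms(Connector conn) throws Exception {
    // Turn bloom filters on for the table...
    conn.tableOperations().setProperty("mytable", "table.bloom.enabled", "true");
    // ...and hash only the row and column family portion of each key, so that
    // seeks which specify a row and column family can benefit from the filter.
    conn.tableOperations().setProperty("mytable", "table.bloom.key.functor",
        "org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor");
  }
}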

table.bloom.load.threshold

This many seeks that would actually use a bloom filter must occur before a file's bloom filter is loaded. Set this to zero to initiate loading of bloom filters when a file is opened.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

table.bloom.size

Bloom filter size, as number of keys.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1048576

table.cache.block.enable

Determines whether file block cache is enabled.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

table.cache.index.enable

Determines whether index cache is enabled.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

table.classpath.context

Per table classpath context

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

table.compaction.major.everything.idle

After a tablet has been idle (no mutations) for this time period it may have all of its files compacted into one. There is no guarantee an idle tablet will be compacted. Compactions of idle tablets are only started when regular compactions are not running. Idle compactions only take place for tablets that have one or more files.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 1h

table.compaction.major.ratio

Minimum ratio of total input size to maximum input file size for running a major compaction. For example, with the default ratio of 3, a set of files is compacted together only when their combined size is at least three times the size of the largest file in the set. When adjusting this property you may want to also adjust table.file.max. You want to avoid the situation where only merging minor compactions occur.

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 3

table.compaction.minor.idle

After a tablet has been idle (no mutations) for this time period it may have its in-memory map flushed to disk in a minor compaction. There is no guarantee an idle tablet will be compacted.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 5m

table.compaction.minor.logs.threshold

When there are more than this many write-ahead logs against a tablet, it will be minor compacted. See comment for property tserver.memory.maps.max

Type: COUNT
Zookeeper Mutable: yes
Default Value: 3

table.durability

The durability used to write to the write-ahead log. Legal values are: none, which skips the write-ahead log; log, which sends the data to the write-ahead log, but does nothing to make it durable; flush, which pushes data to the file system; and sync, which ensures the data is written to disk. A client-side sketch of overriding durability per writer follows this entry.

Type: DURABILITY
Zookeeper Mutable: yes
Default Value: sync
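
As a sketch of how a client might relax durability for a particular workload (assuming an Accumulo 1.7-style client API, an existing Connector conn, and a table named mytable), a durability can also be requested per BatchWriter:

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Durability;

public class DurabilityExample {
  // A sketch, assuming an existing Connector 'conn' and a table named 'mytable'.
  static BatchWriter writerWithRelaxedDurability(Connector conn) throws Exception {
    BatchWriterConfig cfg = new BatchWriterConfig();
    // Request flush-level durability for mutations written through this writer,
    // rather than the table's table.durability setting.
    cfg.setDurability(Durability.FLUSH);
    return conn.createBatchWriter("mytable", cfg);
  }
}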

table.failures.ignore

If you want queries for your table to hang or fail when data is missing from the system, then set this to false. When this is set to true, missing data will be reported, but queries will still run, possibly returning a subset of the data.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

table.file.blocksize

Overrides the hadoop dfs.block.size setting so that files have better query performance. The maximum value for this is 2147483647

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 0B

table.file.compress.blocksize

Similar to the hadoop io.seqfile.compress.blocksize setting, so that files have better query performance. The maximum value for this is 2147483647. (This setting is the size threshold prior to compression, and applies even if compression is disabled.)

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 100K

table.file.compress.blocksize.index

Determines how large index blocks can be in files that support multilevel indexes. The maximum value for this is 2147483647. (This setting is the size threshold prior to compression, and applies even if compression is disabled.)

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 128K

table.file.compress.type

One of gz,lzo,none

Type: STRING
Zookeeper Mutable: yes
Default Value: gz

table.file.max

Determines the maximum number of files each tablet in a table can have. When adjusting this property you may want to consider adjusting table.compaction.major.ratio also. Setting this property to 0 will make it default to tserver.scan.files.open.max-1; this will prevent a tablet from having more files than can be opened. Setting this property low may throttle ingest and increase query performance.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 15

table.file.replication

Determines how many replicas to keep of a table's files in HDFS. When this value is less than or equal to 0, HDFS defaults are used.

Type: COUNT
Zookeeper Mutable: yes
Default Value: 0

table.file.type

Change the type of file a table writes

Type: STRING
Zookeeper Mutable: yes
Default Value: rf

table.formatter

The Formatter class to apply on results in the shell

Type: STRING
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.core.util.format.DefaultFormatter

table.groups.enabled

A comma separated list of locality group names to enable for this table.

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

table.interepreter

The ScanInterpreter class to apply on scan arguments in the shell

Type: STRING
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.core.util.interpret.DefaultScanInterpreter

table.majc.compaction.strategy

A customizable major compaction strategy.

Type: CLASSNAME
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy

table.replication

Is replication enabled for the given table

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: false

table.scan.max.memory

The maximum amount of memory that will be used to cache results of a client query/scan. Once this limit is reached, the buffered data is sent to the client.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 512K

table.security.scan.visibility.default

The security label that will be assumed at scan time if an entry does not have a visibility set. Note: An empty security label is displayed as []. The scan results will show an empty visibility even if the visibility from this setting is applied to the entry. CAUTION: If a particular key has an empty security label AND its table’s default visibility is also empty, access will ALWAYS be granted for users with permission to that table. Additionally, if this field is changed, all existing data with an empty visibility label will be interpreted with the new label on the next scan.

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

table.split.endrow.size.max

Maximum size of end row

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 10K

table.split.threshold

When combined size of files exceeds this amount a tablet is split.

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 1G

table.walog.enabled

Deprecated. This setting is deprecated. Use table.durability=none instead.

Type: BOOLEAN
Zookeeper Mutable: yes
Default Value: true

A.3.15. table.custom.*

Prefix to be used for user defined arbitrary properties.

A.3.16. table.constraint.*

Properties in this category are per-table properties that add constraints to a table. These properties start with the category prefix, followed by a number, and their values correspond to a fully qualified Java class that implements the Constraint interface. For example:

table.constraint.1 = org.apache.accumulo.core.constraints.MyCustomConstraint
table.constraint.2 = my.package.constraints.MySecondConstraint
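
A minimal sketch of such a constraint class is shown below; the class name MaxRowSizeConstraint and the 1024-byte limit are hypothetical, and a real constraint must be on the tablet servers' classpath (or in a classpath context).

import java.util.Collections;
import java.util.List;

import org.apache.accumulo.core.constraints.Constraint;
import org.apache.accumulo.core.data.Mutation;

public class MaxRowSizeConstraint implements Constraint {

  private static final short ROW_TOO_LONG = 1;

  @Override
  public String getViolationDescription(short violationCode) {
    if (violationCode == ROW_TOO_LONG) {
      return "Row id longer than 1024 bytes";
    }
    return null;
  }

  @Override
  public List<Short> check(Environment env, Mutation mutation) {
    // Returning null (or an empty list) signals that the mutation is accepted.
    if (mutation.getRow().length <= 1024) {
      return null;
    }
    return Collections.singletonList(ROW_TOO_LONG);
  }
}

Once deployed, the class can be registered either by setting table.constraint.<n> as above or through the table operations API (tableOperations().addConstraint).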

A.3.17. table.iterator.*

Properties in this category specify iterators that are applied at various stages (scopes) of interaction with a table. These properties start with the category prefix, followed by a scope (minc, majc, scan, etc.), followed by a period, followed by a name, as in table.iterator.scan.vers or table.iterator.scan.custom. The values for these properties are a number indicating the order in which they are applied, and a class name such as:

table.iterator.scan.vers = 10,org.apache.accumulo.core.iterators.VersioningIterator

These iterators can take options if additional properties are set that look like this property, but are suffixed with a period, followed by opt, followed by another period, and a property name. For example:

table.iterator.minc.vers.opt.maxVersions = 3
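
The sketch below configures an age-off filter through this mechanism, assuming an existing Connector conn and a table named mytable; attaching the IteratorSetting writes the corresponding table.iterator.* properties to zookeeper.

import java.util.EnumSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
import org.apache.accumulo.core.iterators.user.AgeOffFilter;

public class IteratorConfigExample {
  // A sketch, assuming an existing Connector 'conn' and a table named 'mytable'.
  static void ageOffOldEntries(Connector conn) throws Exception {
    // Priority 30 and the name "ageoff" are arbitrary; attaching this setting
    // writes the table.iterator.<scope>.ageoff and
    // table.iterator.<scope>.ageoff.opt.ttl properties described above.
    IteratorSetting setting = new IteratorSetting(30, "ageoff", AgeOffFilter.class);
    setting.addOption("ttl", "3600000"); // suppress entries older than one hour (ms)
    conn.tableOperations().attachIterator("mytable", setting,
        EnumSet.of(IteratorScope.scan, IteratorScope.minc, IteratorScope.majc));
  }
}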

A.3.18. table.iterator.scan.*

Convenience prefix to find options for the scan iterator scope

A.3.19. table.iterator.minc.*

Convenience prefix to find options for the minc iterator scope

A.3.20. table.iterator.majc.*

Convenience prefix to find options for the majc iterator scope

A.3.21. table.group.*

Properties in this category are per-table properties that define locality groups in a table. These properties start with the category prefix, followed by a name, followed by a period, and followed by a property for that group. For example, table.group.group1=x,y,z sets the column families for a group called group1. Once configured, group1 can be enabled by adding it to the list of groups in the table.groups.enabled property. Additional group options may be specified for a named group by setting table.group.<name>.opt.<key>=<value>.
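
A sketch of defining the group1 example above through the client API, assuming an existing Connector conn and a table named mytable:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class LocalityGroupExample {
  // A sketch, assuming an existing Connector 'conn' and a table named 'mytable'.
  static void defineGroup1(Connector conn) throws Exception {
    // Equivalent to setting table.group.group1=x,y,z and adding group1 to
    // table.groups.enabled: column families x, y and z will be stored together.
    Set<Text> families = new HashSet<>();
    families.add(new Text("x"));
    families.add(new Text("y"));
    families.add(new Text("z"));

    Map<String, Set<Text>> groups = new HashMap<>();
    groups.put("group1", families);
    conn.tableOperations().setLocalityGroups("mytable", groups);
  }
}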

A.3.22. table.majc.compaction.strategy.opts.*

Properties in this category are used to configure the compaction strategy.

A.3.23. table.replication.target.*

Enumerates a mapping of other systems to which this table should replicate its data. The key suffix is the identifying cluster name and the value is an identifier for a location on the target system, e.g. the ID of the table on the target to replicate to

A.3.24. general.vfs.context.classpath.*

Properties in this category define a classpath. These properties start with the category prefix, followed by a context name. The value is a comma-separated list of URIs. Supports full regex on the filename alone. For example, general.vfs.context.classpath.cx1=hdfs://nn1:9902/mylibdir/*.jar. You can enable post-delegation for a context, which will load classes from the context first instead of the parent first. Do this by setting general.vfs.context.classpath.<name>.delegation=post, where <name> is your context name. If delegation is not specified, it defaults to loading from the parent classloader first.
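
A sketch of wiring a context to a table from client code, assuming an existing Connector conn with permission to alter system and table properties and a table named mytable; the HDFS URI reuses the example value above.

import org.apache.accumulo.core.client.Connector;

public class ClasspathContextExample {
  // A sketch, assuming an existing Connector 'conn' and a table named 'mytable'.
  static void useContext(Connector conn) throws Exception {
    // Define a named classloader context at the instance level...
    conn.instanceOperations().setProperty(
        "general.vfs.context.classpath.cx1", "hdfs://nn1:9902/mylibdir/*.jar");
    // ...then point the table at that context so its iterators and constraints
    // are loaded from the jars in that directory.
    conn.tableOperations().setProperty("mytable", "table.classpath.context", "cx1");
  }
}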

A.3.25. replication.*

Properties in this category affect the replication of data to other Accumulo instances.

replication.driver.delay

Amount of time to wait before the replication work loop begins in the master.

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 0s

replication.max.unit.size

Maximum size of data to send in a replication message

Type: MEMORY
Zookeeper Mutable: yes
Default Value: 64M

replication.max.work.queue

Upper bound of the number of files queued for replication

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1000

replication.name

Name of this cluster with respect to replication. Used to identify this instance from other peers

Type: STRING
Zookeeper Mutable: yes
Default Value: empty

replication.receipt.service.port

Listen port used by thrift service in tserver listening for replication

Type: PORT
Zookeeper Mutable: yes
Default Value: 10002

replication.receiver.min.threads

Minimum number of threads for replication

Type: COUNT
Zookeeper Mutable: yes
Default Value: 1

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30s

replication.trace.percent

The sampling percentage to use for replication traces

Type: FRACTION
Zookeeper Mutable: yes
Default Value: 0.1

replication.work.assigner

Replication WorkAssigner implementation to use

Type: CLASSNAME
Zookeeper Mutable: yes
Default Value: org.apache.accumulo.master.replication.UnorderedWorkAssigner

replication.work.assignment.sleep

Amount of time to sleep between replication work assignment

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 30s

replication.work.attempts

Number of attempts to try to replicate some data before giving up and letting it naturally be retried later

Type: COUNT
Zookeeper Mutable: yes
Default Value: 10

replication.work.processor.delay

Amount of time to wait before first checking for replication work, not useful outside of tests

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 0s

replication.work.processor.period

Amount of time to wait before re-checking for replication work, not useful outside of tests

Type: TIMEDURATION
Zookeeper Mutable: yes
Default Value: 0s

Type: COUNT
Zookeeper Mutable: yes
Default Value: 4

A.3.26. replication.peer.*

Properties in this category control what systems data can be replicated to

A.3.27. replication.peer.user.*

The username to provide when authenticating with the given peer

A.3.28. replication.peer.password.*

The password to provide when authenticating with the given peer

A.3.29. replication.peer.keytab.*

The keytab to use when authenticating with the given peer

A.4. Property Types

A.4.1. duration

A non-negative integer optionally followed by a unit of time (whitespace disallowed), as in 30s. If no unit of time is specified, seconds are assumed. Valid units are ms, s, m, h, and d for milliseconds, seconds, minutes, hours, and days. Examples of valid durations are 600, 30s, 45m, 30000ms, 3d, and 1h. Examples of invalid durations are 1w, 1h30m, 1s 200ms, ms, the empty string, and a. Unless otherwise stated, the max value for the duration represented in milliseconds is 9223372036854775807

A.4.2. date/time

A date/time string in the format: YYYYMMDDhhmmssTTT where TTT is the 3 character time zone

A.4.3. memory

A positive integer optionally followed by a unit of memory (whitespace disallowed), as in 2G. If no unit is specified, bytes are assumed. Valid units are B, K, M, G, for bytes, kilobytes, megabytes, and gigabytes. Examples of valid memories are 1024, 20B, 100K, 1500M, 2G. Examples of invalid memories are 1M500K, 1M 2K, 1MB, 1.5G, 1,024K, the empty string, and a. Unless otherwise stated, the max value for the memory represented in bytes is 9223372036854775807

A.4.4. host list

A comma-separated list of hostnames or ip addresses, with optional port numbers. Examples of valid host lists are localhost:2000,www.example.com,10.10.1.1:500 and localhost. Examples of invalid host lists are the empty string, :1000, and localhost:80000

A.4.5. port

A positive integer in the range 1024-65535, not already in use or specified elsewhere in the configuration

A.4.6. count

A non-negative integer in the range of 0-2147483647

A.4.7. fraction/percentage

A floating point number that represents either a fraction or, if suffixed with the % character, a percentage. Examples of valid fractions/percentages are 10, 1000%, 0.05, 5%, 0.2%, 0.0005. Examples of invalid fractions/percentages are the empty string, 10 percent, and Hulk Hogan

A.4.8. path

A string that represents a filesystem path, which can be either relative or absolute to some directory. The filesystem depends on the property. The following environment variables will be substituted: [ACCUMULO_HOME, ACCUMULO_CONF_DIR]

A.4.9. absolute path

An absolute filesystem path. The filesystem depends on the property. This is the same as path, but enforces that its root is explicitly specified.

A.4.10. java class

A fully qualified java class name representing a class on the classpath. An example is java.lang.String, rather than String

A.4.11. java class list

A list of fully qualified java class names representing classes on the classpath. An example is java.lang.String, rather than String

A.4.12. durability

One of none, log, flush or sync.

A.4.13. string

An arbitrary string of characters whose format is unspecified and interpreted based on the context of the property to which it applies.

A.4.14. boolean

Has a value of either true or false

A.4.15. uri

A valid URI