Quick Start (Accumulo 2.x)
This quick start provides basic instructions for installing and running Accumulo. For detailed instructions, see the in-depth installation guide.
Consider using automated tools
If you are setting up Accumulo for testing or development, consider using the following tools:
- Uno sets up Accumulo on a single machine for development
- Muchos sets up Accumulo on a cluster (optionally launched in Amazon EC2)
If you are setting up Accumulo for a production environment, follow the instructions in this quick start.
Download a binary distribution of Accumulo and extract it:
tar xzf /path/to/accumulo-2.0.0-alpha-2-bin.tar.gz
cd accumulo-2.0.0-alpha-2
There are four scripts in the bin directory of the tarball distribution that are used to manage Accumulo:
- accumulo : Runs Accumulo command-line tools and starts Accumulo processes
- accumulo-service : Runs Accumulo processes as services
- accumulo-cluster : Manages an Accumulo cluster on a single node or several nodes
- accumulo-util : Accumulo utilities for building native libraries, running jars, etc.
These scripts will be used in the remaining instructions to configure and run Accumulo.
For convenience, consider adding accumulo-2.0.0-alpha-2/bin/ to your shell's path.
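Here is a minimal sketch for the current shell session. It assumes you are still inside the extracted accumulo-2.0.0-alpha-2 directory, as you are after the tar and cd commands above. To keep the setting across sessions, add an equivalent line with the absolute path to your shell profile.

```shell
# Assumption: the current directory is the extracted accumulo-2.0.0-alpha-2 directory.
export PATH="$PWD/bin:$PATH"

# Report where each management script resolves, or warn if it is not found.
for script in accumulo accumulo-service accumulo-cluster accumulo-util; do
  command -v "$script" || echo "not found: $script" >&2
done
```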
The accumulo.properties file configures Accumulo server processes (e.g., tablet server, master, monitor). Follow these steps to set it up:
- Run accumulo-util build-native to build native code. If this command fails, disable native maps by setting tserver.memory.maps.native.enabled to false.
- Set instance.volumes to the HDFS location where Accumulo will store data. If your namenode is running at 192.168.1.9:8020 and you want to store data in /accumulo in HDFS, set instance.volumes to hdfs://192.168.1.9:8020/accumulo.
- Set instance.zookeeper.host to the location of your Zookeeper servers.
- (Optional) Change instance.secret (which Accumulo processes use to communicate) from the default. This value should match on all servers.
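Each step above edits key=value lines in accumulo.properties, so you can also apply them from the shell. The sketch below uses set_prop, a hypothetical helper that is not part of Accumulo. It makes three assumptions: you run it from the Accumulo install directory (so the file is conf/accumulo.properties), entries are written as key=value with no spaces around "=", and localhost:2181 stands in for your real ZooKeeper address.

```shell
# Hypothetical helper (not part of Accumulo): set key=value in a properties file.
# Replaces every uncommented line that starts with "key=" (no spaces around "="),
# or appends the setting if no such line exists.
set_prop() {
  local file=$1 key=$2 value=$3 tmp
  mkdir -p "$(dirname "$file")" && touch "$file" || return 1
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Assumption: run from the Accumulo install directory.
props=conf/accumulo.properties
set_prop "$props" instance.volumes hdfs://192.168.1.9:8020/accumulo
set_prop "$props" instance.zookeeper.host localhost:2181   # placeholder ZooKeeper address
# Only if `accumulo-util build-native` failed:
# set_prop "$props" tserver.memory.maps.native.enabled false
# Optional: replace the default secret ("change-me" is a placeholder; use the same value on every server):
# set_prop "$props" instance.secret change-me
```

The helper writes through a temporary file and then copies the result back into place, so the original file keeps its permissions.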
The accumulo-env.sh file sets up environment variables needed by Accumulo:
Set the Hadoop and Zookeeper home variables (ZOOKEEPER_HOME for Zookeeper) to the locations of your Hadoop and Zookeeper installations. Accumulo uses these locations to find the Hadoop and Zookeeper jars and add them to your CLASSPATH variable. If you are running a vendor-specific release of Hadoop or Zookeeper, you may need to modify how the CLASSPATH variable is built in accumulo-env.sh. If Accumulo has problems loading classes when you start it, run accumulo classpath to print Accumulo's classpath.
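Here is a sketch of what these settings could look like in accumulo-env.sh. Two parts are assumptions. First, the Hadoop variable name HADOOP_HOME is the conventional Hadoop name, so check the variables already defined in your copy of accumulo-env.sh. Second, /opt/hadoop and /opt/zookeeper are placeholder paths. The loop only prints a warning when a directory is missing.

```shell
# Placeholder paths (assumptions): substitute your own installation directories.
# HADOOP_HOME is assumed to be the Hadoop variable name used by accumulo-env.sh.
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
export ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/opt/zookeeper}"

# Sanity check: warn (but do not fail) if either directory does not exist.
for dir in "$HADOOP_HOME" "$ZOOKEEPER_HOME"; do
  [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
done
```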
Accumulo tablet servers are configured by default to use 1GB of memory: 768MB for the JVM heap and 256MB for native maps. Native maps are allocated memory equal to 33% of the tserver JVM heap. To change tserver memory usage, use the table below to set the tserver heap options in the JAVA_OPTS section of accumulo-env.sh:
| Native maps? | 512MB | 1GB | 2GB | 3GB |
| --- | --- | --- | --- | --- |
| Yes | -Xmx384m -Xms384m | -Xmx768m -Xms768m | -Xmx1536m -Xms1536m | -Xmx2g -Xms2g |
| No | -Xmx512m -Xms512m | -Xmx1g -Xms1g | -Xmx2g -Xms2g | -Xmx3g -Xms3g |
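If you script your deployment, you can encode the table as a lookup. tserver_heap_opts is a hypothetical helper, not part of Accumulo. It only prints the flags from the table; you would still place them in the tserver portion of the JAVA_OPTS section of accumulo-env.sh yourself.

```shell
# Hypothetical helper: print the tserver heap flags from the table above.
# Usage: tserver_heap_opts <yes|no> <512MB|1GB|2GB|3GB>
tserver_heap_opts() {
  case "$1:$2" in
    yes:512MB) printf '%s\n' "-Xmx384m -Xms384m" ;;
    yes:1GB)   printf '%s\n' "-Xmx768m -Xms768m" ;;
    yes:2GB)   printf '%s\n' "-Xmx1536m -Xms1536m" ;;
    yes:3GB)   printf '%s\n' "-Xmx2g -Xms2g" ;;
    no:512MB)  printf '%s\n' "-Xmx512m -Xms512m" ;;
    no:1GB)    printf '%s\n' "-Xmx1g -Xms1g" ;;
    no:2GB)    printf '%s\n' "-Xmx2g -Xms2g" ;;
    no:3GB)    printf '%s\n' "-Xmx3g -Xms3g" ;;
    *) echo "unsupported combination: native=$1 memory=$2" >&2; return 1 ;;
  esac
}
```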
(Optional) Review the memory settings for the Accumulo master, garbage collector, and monitor in the JAVA_OPTS section of accumulo-env.sh.
The accumulo-client.properties file is used by the Accumulo shell and can be passed to Accumulo clients to simplify connecting to Accumulo. Below are steps to configure it.
Pick an authentication type and set auth.type accordingly. The most common is password, which requires auth.principal to be set and auth.token to be set to the password of auth.principal. For the Accumulo shell, auth.token can be commented out and the shell will prompt you for the password of auth.principal.
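Here is a sketch of the password setup for the Accumulo shell. It assumes you run it from the Accumulo install directory (so the file is conf/accumulo-client.properties) and that the principal is root. It keeps every other line, drops any existing auth.type, auth.principal, and auth.token settings, and then appends new ones with auth.token commented out so the shell prompts for the password.

```shell
# Assumptions: run from the Accumulo install directory; the principal is root.
client_props=conf/accumulo-client.properties
mkdir -p "$(dirname "$client_props")"
tmp=$(mktemp)

# Keep every line except existing auth.type / auth.principal / auth.token settings.
grep -v -E '^auth\.(type|principal|token)[[:space:]]*=' "$client_props" > "$tmp" 2>/dev/null

cat >> "$tmp" <<'EOF'
auth.type=password
auth.principal=root
# auth.token is commented out so the Accumulo shell prompts for the password
#auth.token=
EOF

# Copy back into place (keeps the original file's permissions if it existed).
cat "$tmp" > "$client_props"
rm -f "$tmp"
```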
Accumulo needs to initialize the locations where it stores data in Zookeeper and HDFS. The following command will do this.
The initialization command will prompt for the following information.
- Instance name : The name of the Accumulo instance. Accumulo clients need to know it in order to connect.
- Root password : Initialization sets up an initial Accumulo root user and prompts for its password. You will need this password later to connect to Accumulo.
There are several methods for running Accumulo:
- Run Accumulo processes using the accumulo command, which runs processes in the foreground and does not redirect stderr/stdout. This is useful for creating init.d scripts that run Accumulo.
- Run Accumulo processes as services using accumulo-service, which uses the accumulo command but backgrounds processes, redirects stderr/stdout, and manages pid files. This is useful if you are using a cluster management tool (e.g., Ansible, Salt).
- Run an Accumulo cluster on one or more nodes using accumulo-cluster (which uses accumulo-service to run services). This is useful for local development and testing, or if you are not using a cluster management tool in production.
Each method above has instructions below.
Run Accumulo processes
Start Accumulo processes (e.g., tserver, master, monitor) using the command below:
The process will run in the foreground. Use ctrl-c to quit.
Run Accumulo services
Start Accumulo services (e.g., tserver, master, monitor) using the command below:
accumulo-service tserver start
Run an Accumulo cluster
Before using the accumulo-cluster script, additional configuration files need to be created. Use the command below to create them:
This creates five files (masters, gc, monitor, tservers, and tracers) in the conf/ directory that contain the node names where Accumulo services run on your cluster. By default, all files are configured for the local node, so if you are running a single-node Accumulo cluster, these files do not need to be changed and the next section can be skipped.
If you are running an Accumulo cluster on multiple nodes, the following files in conf/ should be configured with a newline-separated list of node names:
- masters : Accumulo primary coordinating process. Must specify one node. Can specify a few for fault tolerance.
- gc : Accumulo garbage collector. Must specify one node. Can specify a few for fault tolerance.
- monitor : Node where the Accumulo monitoring web server runs.
- tservers : Accumulo worker processes. List all of the nodes where tablet servers should run in this file.
- tracers : Optional capability. Can specify zero or more nodes.
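As an illustration, a four-node layout could be written as follows. The node names node1 through node4 are hypothetical. The script assumes it runs from the Accumulo install directory, so the files are in conf/. In this layout node1 runs the master, garbage collector, and monitor, node2 to node4 run tablet servers, and no tracers are configured.

```shell
# Hypothetical node names; run from the Accumulo install directory (assumption).
conf_dir=conf
mkdir -p "$conf_dir"

printf '%s\n' node1 > "$conf_dir/masters"               # one master (add more for fault tolerance)
printf '%s\n' node1 > "$conf_dir/gc"                    # one garbage collector
printf '%s\n' node1 > "$conf_dir/monitor"               # monitoring web server
printf '%s\n' node2 node3 node4 > "$conf_dir/tservers"  # one tablet server per line
: > "$conf_dir/tracers"                                 # zero tracers: leave the file empty
```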
The Accumulo, Hadoop, and Zookeeper software should be present at the same location on every node. The files in the conf directory must also be copied to every node. There are many ways to replicate the software and configuration; tools such as pdcp can help.
The accumulo-cluster script uses ssh to start processes on remote nodes. Before attempting to start Accumulo, passwordless ssh must be set up on the cluster.
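Before starting the cluster, you can check that passwordless ssh works for every node listed in the files above. Both functions below are hypothetical helpers, not part of Accumulo. They assume the node files live in conf/, or in a directory passed as the first argument. The ssh option BatchMode=yes disables password prompts, so a node that would ask for a password is reported as a failure instead of hanging.

```shell
# Hypothetical helpers (not part of Accumulo).

# Print each unique, non-blank node name from the accumulo-cluster files.
cluster_hosts() {
  local conf_dir=${1:-conf}
  cat "$conf_dir/masters" "$conf_dir/gc" "$conf_dir/monitor" \
      "$conf_dir/tservers" "$conf_dir/tracers" 2>/dev/null |
    grep -v '^[[:space:]]*$' | sort -u
}

# Run a no-op command on every node without prompting; returns 1 if any node fails.
check_passwordless_ssh() {
  local host failed=0
  for host in $(cluster_hosts "$@"); do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "ok: $host"
    else
      echo "FAILED: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage (reads conf/ by default):
# check_passwordless_ssh
```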
After configuring and initializing Accumulo, use the following command to start the cluster:
Once you have started Accumulo, use the following command to run the Accumulo shell:
accumulo shell -u root
Use your web browser to connect to the Accumulo monitor page on port 9995:
http://<hostname in conf/monitor>:9995/
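To check the monitor from a script, one option is to build the URL from the first node listed in conf/monitor and request it with curl. monitor_url is a hypothetical helper. It assumes the default port 9995 shown above and falls back to localhost if conf/monitor is missing or empty.

```shell
# Hypothetical helper: print the monitor URL for the first node in <conf_dir>/monitor.
monitor_url() {
  local conf_dir=${1:-conf} host
  # First non-blank line of the monitor file; a missing file yields an empty host.
  host=$(grep -v '^[[:space:]]*$' "$conf_dir/monitor" 2>/dev/null | head -n 1)
  printf 'http://%s:9995/\n' "${host:-localhost}"
}

# Example (requires curl and a running monitor):
# curl -fsS -o /dev/null "$(monitor_url)" && echo "monitor is up"
```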
When finished, use the following commands to stop Accumulo:
- Stop Accumulo service:
accumulo-service tserver stop
- Stop Accumulo cluster: