Class AccumuloRowInputFormat
public class AccumuloRowInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.accumulo.core.util.PeekingIterator<Map.Entry<Key,Value>>>
This class allows MapReduce jobs to use Accumulo as the source of data. This
InputFormat provides row names as Text keys and a corresponding PeekingIterator
as the value, which in turn makes the Key/Value pairs for that row available to
the Map function. Configure the job using the configure() method, which provides
a fluent API. For example:
AccumuloRowInputFormat.configure().clientProperties(props).table(name) // required
.auths(auths).addIterator(iter1).ranges(ranges).fetchColumns(columns).executionHints(hints)
.samplerConfiguration(sampleConf).autoAdjustRanges(false) // enabled by default
.scanIsolation(true) // not available with batchScan()
.offlineScan(true) // not available with batchScan()
.store(job);
For descriptions of all options see InputFormatBuilder.InputFormatOptions.
- Since:
- 2.0.0
-
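The row-oriented contract described above (one row name as the key, one iterator over that row's entries as the value) can be illustrated with a minimal sketch. It deliberately uses only JDK types: `RowGroupingSketch`, `Peeking`, `RowMapper`, and `forEachRow` are hypothetical names invented for this illustration, `String` stands in for Text, Key, and Value, and none of it is the Accumulo implementation or part of its API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Illustration only: plain JDK types stand in for Text, Key, Value and
// Accumulo's PeekingIterator. Nothing here is part of the Accumulo API.
final class RowGroupingSketch {

  /** Hypothetical wrapper that can inspect the next element without consuming it. */
  static final class Peeking<E> implements Iterator<E> {
    private final Iterator<E> source;
    private E head;
    private boolean hasHead;

    Peeking(Iterator<E> source) {
      this.source = source;
      advance();
    }

    // Pulls the next element from the underlying iterator, if there is one.
    private void advance() {
      hasHead = source.hasNext();
      head = hasHead ? source.next() : null;
    }

    /** Returns the next element without advancing. */
    E peek() {
      if (!hasHead) {
        throw new NoSuchElementException();
      }
      return head;
    }

    @Override
    public boolean hasNext() {
      return hasHead;
    }

    @Override
    public E next() {
      if (!hasHead) {
        throw new NoSuchElementException();
      }
      E result = head;
      advance();
      return result;
    }
  }

  /** Hypothetical callback shaped like map(rowName, entriesOfThatRow). */
  interface RowMapper {
    void map(String row, Peeking<Map.Entry<String, String>> entries);
  }

  /** Calls the mapper once per distinct row; the input must already be sorted by row. */
  static void forEachRow(List<Map.Entry<String, String>> sorted, RowMapper mapper) {
    int start = 0;
    while (start < sorted.size()) {
      String row = sorted.get(start).getKey();
      int end = start;
      // Extend the window while entries still belong to the same row.
      while (end < sorted.size() && sorted.get(end).getKey().equals(row)) {
        end++;
      }
      mapper.map(row, new Peeking<>(sorted.subList(start, end).iterator()));
      start = end;
    }
  }
}
```

In this sketch each entry's key is the row name and its value is one cell of that row. A caller passes a lambda to `forEachRow` and can call `peek()` to inspect the first cell of a row before deciding how to consume the rest, which is the kind of access a Map function gets from the per-row iterator.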
Constructor Summary
Constructors
AccumuloRowInputFormat()
-
Method Summary
Modifier and Type | Method | Description
static InputFormatBuilder.ClientParams<org.apache.hadoop.mapreduce.Job> | configure() | Sets all the information required for this map reduce job.
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.accumulo.core.util.PeekingIterator<Map.Entry<Key, Value>>> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) |
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context) | Gets the splits of the tables that have been set on the job by reading the metadata table for the specified ranges.
-
Constructor Details
-
AccumuloRowInputFormat
public AccumuloRowInputFormat()
-
-
Method Details
-
createRecordReader
-
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Gets the splits of the tables that have been set on the job by reading the metadata table for the specified ranges.
- Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.accumulo.core.util.PeekingIterator<Map.Entry<Key, Value>>>
- Returns:
- the splits from the tables based on the ranges.
- Throws:
IOException - if a table set on the job doesn't exist or an error occurs initializing the tablet locator
-
configure
public static InputFormatBuilder.ClientParams<org.apache.hadoop.mapreduce.Job> configure()
Sets all the information required for this map reduce job.
-