Class AccumuloInputFormat
This class allows MapReduce jobs to use Accumulo as the source of data. This InputFormat provides keys and values of type Key and Value to the Map function. Configure the job using the configure() method, which provides a fluent API. For example:
    AccumuloInputFormat.configure().clientProperties(props).table(name) // required
        .auths(auths).addIterator(iter1).ranges(ranges).fetchColumns(columns).executionHints(hints)
        .samplerConfiguration(sampleConf).autoAdjustRanges(false) // enabled by default
        .scanIsolation(true) // not available with batchScan()
        .offlineScan(true) // not available with batchScan()
        .store(job);

Multiple tables can be set by configuring clientProperties once and then calling table() for each table. The methods following a call to table() apply only to that table. For example:
    AccumuloInputFormat.configure().clientProperties(props) // set client props once
        .table(table1).auths(auths1).fetchColumns(cols1).batchScan(true) // options for table1
        .table(table2).ranges(range2).auths(auths2).addIterator(iter2) // options for table2
        .table(table3).ranges(range3).auths(auths3).addIterator(iter3) // options for table3
        .store(job); // store all tables in the job when finished

For descriptions of all options see InputFormatBuilder.InputFormatOptions.
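As a hedged end-to-end sketch of how this InputFormat plugs into a job driver: the mapper class, table name, output path, and client.properties location below are all illustrative placeholders, not part of this class's API; only the configure() builder calls shown above are taken from the documentation.

```java
import java.util.Properties;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.hadoop.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class RowDumpDriver {

  // Illustrative mapper: receives the Key/Value pairs this InputFormat provides
  // and writes one line per entry.
  public static class RowMapper extends Mapper<Key,Value,Text,Text> {
    @Override
    protected void map(Key key, Value value, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(new Text(key.getRow().toString()), new Text(value.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumed: a client.properties file describing the Accumulo instance.
    Properties props = Accumulo.newClientProperties()
        .from("/path/to/client.properties").build();

    Job job = Job.getInstance();
    job.setJarByClass(RowDumpDriver.class);
    job.setMapperClass(RowMapper.class);
    job.setInputFormatClass(AccumuloInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));

    // Minimal required configuration: client properties plus a table name.
    AccumuloInputFormat.configure()
        .clientProperties(props)
        .table("mytable") // placeholder table name
        .store(job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The store(job) call must come last: it writes the accumulated configuration into the job before submission.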
Since:
    2.0
Constructor Summary
Method Summary
static InputFormatBuilder.ClientParams<org.apache.hadoop.mapreduce.Job> configure()
    Sets all the information required for this map reduce job.

org.apache.hadoop.mapreduce.RecordReader<Key,Value> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)

List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
    Gets the splits of the tables that have been set on the job by reading the metadata table for the specified ranges.
Constructor Details

AccumuloInputFormat
public AccumuloInputFormat()

Method Details
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException

Gets the splits of the tables that have been set on the job by reading the metadata table for the specified ranges.

Specified by:
    getSplits in class org.apache.hadoop.mapreduce.InputFormat<Key,Value>
Returns:
    the splits from the tables based on the ranges.
Throws:
    IOException - if a table set on the job doesn't exist or an error occurs initializing the tablet locator
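Although the framework normally calls getSplits itself, it can also be invoked directly to inspect how many mapper tasks a configured job would launch. A sketch under assumed setup (the table name is a placeholder and props is assumed to hold valid client properties):

```java
import java.io.IOException;
import java.util.List;
import java.util.Properties;

import org.apache.accumulo.hadoop.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;

public class SplitInspector {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties(); // assumed: populated Accumulo client properties

    Job job = Job.getInstance();
    AccumuloInputFormat.configure()
        .clientProperties(props)
        .table("mytable") // placeholder table name
        .store(job);

    try {
      // Job implements JobContext; one InputSplit generally becomes one mapper task.
      List<InputSplit> splits = new AccumuloInputFormat().getSplits(job);
      System.out.println("Job would launch " + splits.size() + " mappers");
    } catch (IOException e) {
      // Thrown if a table set on the job doesn't exist or the tablet locator
      // fails to initialize.
      System.err.println("Could not compute splits: " + e.getMessage());
    }
  }
}
```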
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<Key,Value> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)

configure
public static InputFormatBuilder.ClientParams<org.apache.hadoop.mapreduce.Job> configure()

Sets all the information required for this map reduce job.
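To illustrate configure() with some of the optional settings beyond the required clientProperties and table: the sketch below adds scan authorizations and a server-side iterator. The table name, authorization label, iterator name, and priority are all illustrative choices, not defaults.

```java
import java.util.Properties;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.iterators.user.RegExFilter;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.hadoop.mapreduce.AccumuloInputFormat;
import org.apache.hadoop.mapreduce.Job;

public class ConfigureExample {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties(); // assumed: populated Accumulo client properties
    Job job = Job.getInstance();

    // Server-side filter applied during the scan; name and priority are illustrative.
    IteratorSetting regex = new IteratorSetting(30, "rowFilter", RegExFilter.class);
    RegExFilter.setRegexs(regex, "row.*", null, null, null, false);

    AccumuloInputFormat.configure()
        .clientProperties(props)               // required
        .table("mytable")                      // required; placeholder name
        .auths(new Authorizations("public"))   // scan authorizations
        .addIterator(regex)                    // push filtering to the tablet servers
        .store(job);                           // write the configuration into the job
  }
}
```

Applying iterators via addIterator() filters data on the tablet servers, so only matching entries cross the network to the mappers.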