Class AccumuloFileOutputFormat
This class allows MapReduce jobs to write output in the Accumulo data file format.
Care should be taken to write only sorted data (sorted by Key), as this is an important requirement of Accumulo data files.

The output path to be created must be specified via FileOutputFormat.setOutputPath(JobConf, Path). This is inherited from FileOutputFormat. Other methods from FileOutputFormat are not supported and may be ignored or cause failures. Using other Hadoop configuration options that affect the behavior of the underlying files directly in the Job's configuration may work, but are not directly supported at this time.
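For orientation, here is a minimal driver sketch showing how this format is typically wired into an old-API (org.apache.hadoop.mapred) job. It assumes the org.apache.accumulo.core.client.mapred package; the driver class name and output path are illustrative, and mapper/reducer setup is elided. Note that the only supported FileOutputFormat call is setOutputPath, per the note above.

import org.apache.accumulo.core.client.mapred.AccumuloFileOutputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RFileWriterDriver { // hypothetical driver class
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(RFileWriterDriver.class);
    job.setJobName("write-accumulo-rfiles");

    // Output types must be Accumulo's Key and Value, and the job's
    // output must reach the writer sorted by Key.
    job.setOutputKeyClass(Key.class);
    job.setOutputValueClass(Value.class);
    job.setOutputFormat(AccumuloFileOutputFormat.class);

    // The only FileOutputFormat method this class supports:
    FileOutputFormat.setOutputPath(job, new Path("/tmp/rfile-output"));

    JobClient.runJob(job);
  }
}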
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileOutputFormat
org.apache.hadoop.mapred.FileOutputFormat.Counter
-
Field Summary
-
Constructor Summary
-
Method Summary
protected static org.apache.accumulo.core.conf.AccumuloConfiguration getAccumuloConfiguration(org.apache.hadoop.mapred.JobConf job)
    Deprecated. since 1.7.0 This method returns a type that is not part of the public API and is not guaranteed to be stable.

org.apache.hadoop.mapred.RecordWriter<Key,Value> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)

static void setCompressionType(org.apache.hadoop.mapred.JobConf job, String compressionType)
    Sets the compression type to use for data blocks.

static void setDataBlockSize(org.apache.hadoop.mapred.JobConf job, long dataBlockSize)
    Sets the size for data blocks within each file. Data blocks are a span of key/value pairs stored in the file that are compressed and indexed as a group.

static void setFileBlockSize(org.apache.hadoop.mapred.JobConf job, long fileBlockSize)
    Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.

static void setIndexBlockSize(org.apache.hadoop.mapred.JobConf job, long indexBlockSize)
    Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy within the file.

static void setReplication(org.apache.hadoop.mapred.JobConf job, int replication)
    Sets the file system replication factor for the resulting file, overriding the file system default.

static void setSampler(org.apache.hadoop.mapred.JobConf job, SamplerConfiguration samplerConfig)
    Specify a sampler to be used when writing out data.

Methods inherited from class org.apache.hadoop.mapred.FileOutputFormat
checkOutputSpecs, getCompressOutput, getOutputCompressorClass, getOutputPath, getPathForCustomFile, getTaskOutputPath, getUniqueName, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath, setWorkOutputPath
-
Field Details
-
log
protected static final org.apache.log4j.Logger log
-
-
Constructor Details
-
AccumuloFileOutputFormat
public AccumuloFileOutputFormat()
-
-
Method Details
-
getAccumuloConfiguration
@Deprecated
protected static org.apache.accumulo.core.conf.AccumuloConfiguration getAccumuloConfiguration(org.apache.hadoop.mapred.JobConf job)
Deprecated. since 1.7.0 This method returns a type that is not part of the public API and is not guaranteed to be stable. The method was deprecated to discourage its use.
This helper method provides an AccumuloConfiguration object constructed from the Accumulo defaults, and overridden with Accumulo properties that have been stored in the Job's configuration.
- Parameters:
job - the Hadoop context for the configured job
- Since:
1.5.0
-
setCompressionType
public static void setCompressionType(org.apache.hadoop.mapred.JobConf job, String compressionType)
Sets the compression type to use for data blocks. Specifying a compression may require additional libraries to be available to your Job.
- Parameters:
job - the Hadoop job instance to be configured
compressionType - one of "none", "gz", "lzo", or "snappy"
- Since:
1.5.0
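A one-line usage sketch; "gz" is one of the values listed above, while "snappy" or "lzo" would additionally need their codec libraries on the task classpath:

// Compress data blocks in the produced files with gzip.
AccumuloFileOutputFormat.setCompressionType(job, "gz");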
-
setDataBlockSize
public static void setDataBlockSize(org.apache.hadoop.mapred.JobConf job, long dataBlockSize)
Sets the size for data blocks within each file. Data blocks are a span of key/value pairs stored in the file that are compressed and indexed as a group.
Making this value smaller may increase seek performance, but at the cost of increasing the size of the indexes (which can also affect seek performance).
- Parameters:
job - the Hadoop job instance to be configured
dataBlockSize - the block size, in bytes
- Since:
1.5.0
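An illustrative call; 64 KB is an arbitrary example value, not a recommendation from this documentation:

// Smaller data blocks: finer-grained seeks, larger in-file indexes.
AccumuloFileOutputFormat.setDataBlockSize(job, 64 * 1024L);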
-
setFileBlockSize
public static void setFileBlockSize(org.apache.hadoop.mapred.JobConf job, long fileBlockSize)
Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
- Parameters:
job - the Hadoop job instance to be configured
fileBlockSize - the block size, in bytes
- Since:
1.5.0
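An illustrative call; the value (512 MB here) is handed to the underlying file system, e.g. as the block size HDFS uses for the produced files:

// Ask the file system for 512 MB blocks when writing output files.
AccumuloFileOutputFormat.setFileBlockSize(job, 512L * 1024 * 1024);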
-
setIndexBlockSize
public static void setIndexBlockSize(org.apache.hadoop.mapred.JobConf job, long indexBlockSize)
Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy within the file. This can affect the performance of queries.
- Parameters:
job - the Hadoop job instance to be configured
indexBlockSize - the block size, in bytes
- Since:
1.5.0
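An illustrative call; 256 KB is an arbitrary example value:

// Larger index blocks flatten the in-file index hierarchy.
AccumuloFileOutputFormat.setIndexBlockSize(job, 256 * 1024L);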
-
setReplication
public static void setReplication(org.apache.hadoop.mapred.JobConf job, int replication)
Sets the file system replication factor for the resulting file, overriding the file system default.
- Parameters:
job - the Hadoop job instance to be configured
replication - the number of replicas for produced files
- Since:
1.5.0
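An illustrative call; the replica count here is an example, chosen above the common HDFS default of 3:

// Keep 5 copies of each produced file instead of the file system default.
AccumuloFileOutputFormat.setReplication(job, 5);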
-
setSampler
public static void setSampler(org.apache.hadoop.mapred.JobConf job, SamplerConfiguration samplerConfig)
Specify a sampler to be used when writing out data. This will result in the output file having sample data.
- Parameters:
job - the Hadoop job instance to be configured
samplerConfig - the configuration for creating sample data in the output file
- Since:
1.8.0
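A sketch assuming Accumulo's RowSampler from org.apache.accumulo.core.client.sample; the "hasher" and "modulus" option values are illustrative:

import org.apache.accumulo.core.client.sample.RowSampler;
import org.apache.accumulo.core.client.sample.SamplerConfiguration;

// Embed sample data: roughly one row in 1009, selected by murmur3 row hash.
SamplerConfiguration samplerConfig =
    new SamplerConfiguration(RowSampler.class.getName());
samplerConfig.addOption("hasher", "murmur3_32");
samplerConfig.addOption("modulus", "1009");
AccumuloFileOutputFormat.setSampler(job, samplerConfig);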
-
getRecordWriter
public org.apache.hadoop.mapred.RecordWriter<Key,Value> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress) throws IOException
- Specified by:
getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<Key,Value>
- Specified by:
getRecordWriter in class org.apache.hadoop.mapred.FileOutputFormat<Key,Value>
- Throws:
IOException
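The record writer is normally driven by the framework rather than called directly; a reducer sketch (class name illustrative) shows the shape of output it expects, with keys arriving in shuffle-sorted order:

import java.io.IOException;
import java.util.Iterator;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class IdentityRFileReducer extends MapReduceBase
    implements Reducer<Key,Value,Key,Value> {
  @Override
  public void reduce(Key key, Iterator<Value> values,
      OutputCollector<Key,Value> output, Reporter reporter) throws IOException {
    // Keys reach this point sorted by the shuffle, satisfying the
    // sorted-by-Key requirement noted in the class description.
    while (values.hasNext()) {
      output.collect(key, values.next());
    }
  }
}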
-