AccumuloFileOutputFormat (Apache Accumulo Project 1.5.4 API)

java.lang.Object
- org.apache.hadoop.mapreduce.OutputFormat<K,V>
  - org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<Key,Value>
    - org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat

```
public class AccumuloFileOutputFormat
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<Key,Value>
```
This class allows MapReduce jobs to write output in the Accumulo data file format.
Care should be taken to write only sorted data (sorted by Key), as this is an important requirement of Accumulo data files.
The output path to be created must be specified via FileOutputFormat.setOutputPath(Job, Path), which this class inherits. Other methods from FileOutputFormat are not supported and may be ignored or cause failures. Setting other Hadoop configuration options that affect the behavior of the underlying files directly in the Job's configuration may work, but is not directly supported at this time.
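
The sorted-by-Key requirement can be illustrated without any Accumulo dependency. The sketch below is only a model: `SimpleKey` and `SortedWriteSketch` are hypothetical names, not part of the Accumulo API, and the ordering (row, column family, and column qualifier ascending, then timestamp descending) is an assumption that leaves out column visibility. It shows the order in which entries would need to reach the record writer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an Accumulo key; this is NOT org.apache.accumulo.core.data.Key.
final class SimpleKey implements Comparable<SimpleKey> {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;

    SimpleKey(String row, String family, String qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    // Assumed ordering: row, family, qualifier ascending; timestamp descending (newest first).
    @Override
    public int compareTo(SimpleKey other) {
        int c = row.compareTo(other.row);
        if (c == 0) c = family.compareTo(other.family);
        if (c == 0) c = qualifier.compareTo(other.qualifier);
        if (c == 0) c = Long.compare(other.timestamp, timestamp);
        return c;
    }
}

final class SortedWriteSketch {
    // Returns a sorted copy: the order in which entries would have to reach the writer.
    static List<SimpleKey> inWriteOrder(List<SimpleKey> keys) {
        List<SimpleKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    // True when no key sorts before its predecessor.
    static boolean isSorted(List<SimpleKey> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

In a MapReduce job the shuffle usually delivers reducer input already sorted by the map output key, so a reducer that emits pairs in the order received often satisfies the requirement without an explicit sort.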

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
  org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter

Field Summary

Fields
Modifier and Type	Field and Description
`protected static org.apache.log4j.Logger`	`log`

- Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
  BASE_OUTPUT_NAME, PART

Constructor Summary

Constructors
Constructor and Description
`AccumuloFileOutputFormat()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected static org.apache.accumulo.core.conf.AccumuloConfiguration`	`getAccumuloConfiguration(org.apache.hadoop.mapreduce.JobContext context)` This helper method provides an AccumuloConfiguration object constructed from the Accumulo defaults, and overridden with Accumulo properties that have been stored in the Job's configuration.
`protected static Instance`	`getInstance(org.apache.hadoop.conf.Configuration conf)` Deprecated. since 1.5.0; This `OutputFormat` does not communicate with Accumulo. If this is needed, subclasses must implement their own configuration.
`org.apache.hadoop.mapreduce.RecordWriter<Key,Value>`	`getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)`
`protected static void`	`handleBlockSize(org.apache.hadoop.conf.Configuration conf)` Deprecated. since 1.5.0; Retrieve the relevant block size from `getAccumuloConfiguration(JobContext)` and configure hadoop's io.seqfile.compress.blocksize with the same value. No longer needed, as `RFile` does not use this field.
`static void`	`setBlockSize(org.apache.hadoop.conf.Configuration conf, int blockSize)` Deprecated. since 1.5.0; Use `setFileBlockSize(Job, long)`, `setDataBlockSize(Job, long)`, or `setIndexBlockSize(Job, long)` instead.
`static void`	`setCompressionType(org.apache.hadoop.mapreduce.Job job, String compressionType)` Sets the compression type to use for data blocks.
`static void`	`setDataBlockSize(org.apache.hadoop.mapreduce.Job job, long dataBlockSize)` Sets the size for data blocks within each file. Data blocks are a span of key/value pairs stored in the file that are compressed and indexed as a group.
`static void`	`setFileBlockSize(org.apache.hadoop.mapreduce.Job job, long fileBlockSize)` Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
`static void`	`setFileType(org.apache.hadoop.conf.Configuration conf, String type)` Deprecated. since 1.5.0; This method does nothing. Only 'rf' type is supported.
`static void`	`setIndexBlockSize(org.apache.hadoop.mapreduce.Job job, long indexBlockSize)` Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy.
`static void`	`setReplication(org.apache.hadoop.mapreduce.Job job, int replication)` Sets the file system replication factor for the resulting file, overriding the file system default.
`static void`	`setZooKeeperInstance(org.apache.hadoop.conf.Configuration conf, String instanceName, String zooKeepers)` Deprecated. since 1.5.0; This `OutputFormat` does not communicate with Accumulo. If this is needed, subclasses must implement their own configuration.

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - log
```
protected static final org.apache.log4j.Logger log
```
- Constructor Detail
  - AccumuloFileOutputFormat
```
public AccumuloFileOutputFormat()
```
- Method Detail
  - getAccumuloConfiguration
```
protected static org.apache.accumulo.core.conf.AccumuloConfiguration getAccumuloConfiguration(org.apache.hadoop.mapreduce.JobContext context)
```
    This helper method provides an AccumuloConfiguration object constructed from the Accumulo defaults, and overridden with Accumulo properties that have been stored in the Job's configuration.
    
    Parameters:
    context - the Hadoop context for the configured job
    Since:
    1.5.0
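
    The "defaults, overridden by job properties" behavior described above can be pictured as two layered maps. This is only a conceptual sketch in plain Java: `LayeredConfigSketch` and its method are hypothetical, and it does not reproduce how AccumuloConfiguration is actually built.

```java
import java.util.HashMap;
import java.util.Map;

final class LayeredConfigSketch {
    // Starts from the defaults, then lets any property stored for the job win.
    static Map<String, String> effective(Map<String, String> defaults,
                                         Map<String, String> jobOverrides) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(jobOverrides);
        return merged;
    }
}
```

    A property that the job never set keeps its default value; a property set on the job replaces the default.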
  - setCompressionType
```
public static void setCompressionType(org.apache.hadoop.mapreduce.Job job,
                      String compressionType)
```
    Sets the compression type to use for data blocks. Specifying a compression may require additional libraries to be available to your Job.
    
    Parameters:
    job - the Hadoop job instance to be configured
    compressionType - one of "none", "gz", "lzo", or "snappy"
    Since:
    1.5.0
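
    Since compressionType is passed as a plain String, a job driver may want to check it against the documented values before configuring the job. The helper below is a hypothetical pre-check, not part of this class; it assumes an exact, case-sensitive match, which may be stricter than what the method itself accepts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class CompressionTypeSketch {
    // The four values listed in the parameter description above.
    static final Set<String> DOCUMENTED_TYPES =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("none", "gz", "lzo", "snappy")));

    // Hypothetical pre-check: returns the value unchanged, or throws if it is not documented.
    static String requireDocumentedType(String compressionType) {
        if (compressionType == null || !DOCUMENTED_TYPES.contains(compressionType)) {
            throw new IllegalArgumentException("Unsupported compression type: " + compressionType);
        }
        return compressionType;
    }
}
```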
  - setDataBlockSize
```
public static void setDataBlockSize(org.apache.hadoop.mapreduce.Job job,
                    long dataBlockSize)
```
    Sets the size for data blocks within each file.
    Data blocks are a span of key/value pairs stored in the file that are compressed and indexed as a group.
    Making this value smaller may increase seek performance, but at the cost of increasing the size of the indexes (which can also affect seek performance).
    
    Parameters:
    job - the Hadoop job instance to be configured
    dataBlockSize - the block size, in bytes
    Since:
    1.5.0
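
    The trade-off described above can be made concrete with rough arithmetic. The sketch assumes roughly one index entry per data block and ignores compression and per-entry overhead, so it is an estimate only; `DataBlockSizeSketch` is a hypothetical helper.

```java
final class DataBlockSizeSketch {
    // Roughly one index entry per data block: ceil(dataBytes / dataBlockSize).
    static long estimatedIndexEntries(long dataBytes, long dataBlockSize) {
        if (dataBytes < 0 || dataBlockSize <= 0) {
            throw new IllegalArgumentException("dataBytes must be >= 0 and dataBlockSize > 0");
        }
        return (dataBytes + dataBlockSize - 1) / dataBlockSize; // ceiling division
    }
}
```

    Under these assumptions, 1 GiB of data with 64 KiB data blocks needs about 16384 index entries, while 1 MiB data blocks need about 1024: smaller blocks narrow each seek but enlarge the index.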
  - setFileBlockSize
```
public static void setFileBlockSize(org.apache.hadoop.mapreduce.Job job,
                    long fileBlockSize)
```
    Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
    
    Parameters:
    job - the Hadoop job instance to be configured
    fileBlockSize - the block size, in bytes
    Since:
    1.5.0
  - setIndexBlockSize
```
public static void setIndexBlockSize(org.apache.hadoop.mapreduce.Job job,
                     long indexBlockSize)
```
    Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy. This can affect the performance of queries.
    
    Parameters:
    job - the Hadoop job instance to be configured
    indexBlockSize - the block size, in bytes
    Since:
    1.5.0
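
    To see why smaller index blocks deepen the hierarchy, the following sketch models the index as a tree whose fan-out is the number of entries that fit in one index block. The model, the average entry size parameter, and the `IndexDepthSketch` name are all assumptions for illustration; the real file layout may differ.

```java
final class IndexDepthSketch {
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Counts index levels in a simple tree model: each block holds about
    // indexBlockSize / avgEntryBytes entries, and levels are added until one block remains.
    static int estimatedLevels(long indexEntries, long indexBlockSize, long avgEntryBytes) {
        if (indexEntries < 0 || indexBlockSize <= 0 || avgEntryBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive and entries non-negative");
        }
        long fanout = Math.max(2, indexBlockSize / avgEntryBytes);
        int levels = 1;
        long blocks = ceilDiv(indexEntries, fanout);
        while (blocks > 1) {
            levels++;
            blocks = ceilDiv(blocks, fanout);
        }
        return levels;
    }
}
```

    In this model, 16384 entries of about 64 bytes each need two levels with 128 KiB index blocks and three levels with 4 KiB index blocks.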
  - setReplication
```
public static void setReplication(org.apache.hadoop.mapreduce.Job job,
                  int replication)
```
    Sets the file system replication factor for the resulting file, overriding the file system default.
    
    Parameters:
    job - the Hadoop job instance to be configured
    replication - the number of replicas for produced files
    Since:
    1.5.0
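
    The configuration methods above are typically called together from a job driver. The following is a minimal sketch, not a complete job: `RFileJobSketch` and `configure` are hypothetical names, mapper, reducer, and input setup are omitted, every numeric value is illustrative rather than recommended, and it assumes a Hadoop release that provides `Job.getInstance(Configuration)` (older releases would use the `Job` constructor instead). It needs the Hadoop and Accumulo libraries on the classpath.

```java
import java.io.IOException;

import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public final class RFileJobSketch {
    // Hypothetical helper; mapper, reducer, and input configuration are left out.
    static Job configure(Configuration conf, Path outputDir) throws IOException {
        Job job = Job.getInstance(conf); // assumption: this factory method is available
        job.setOutputFormatClass(AccumuloFileOutputFormat.class);
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Value.class);

        // Required: the output directory to be created.
        FileOutputFormat.setOutputPath(job, outputDir);

        // Optional tuning; all values below are illustrative only.
        AccumuloFileOutputFormat.setCompressionType(job, "gz");
        AccumuloFileOutputFormat.setDataBlockSize(job, 64L * 1024);
        AccumuloFileOutputFormat.setIndexBlockSize(job, 128L * 1024);
        AccumuloFileOutputFormat.setFileBlockSize(job, 128L * 1024 * 1024);
        AccumuloFileOutputFormat.setReplication(job, 3);
        return job;
    }
}
```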
  - getRecordWriter
```
public org.apache.hadoop.mapreduce.RecordWriter<Key,Value> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                    throws IOException
```
    Specified by:
    getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<Key,Value>
    Throws:
    IOException
  - handleBlockSize
```
@Deprecated
protected static void handleBlockSize(org.apache.hadoop.conf.Configuration conf)
```
    Deprecated. since 1.5.0; Retrieve the relevant block size from getAccumuloConfiguration(JobContext) and configure hadoop's io.seqfile.compress.blocksize with the same value. No longer needed, as RFile does not use this field.
  - setFileType
```
@Deprecated
public static void setFileType(org.apache.hadoop.conf.Configuration conf,
                          String type)
```
    Deprecated. since 1.5.0; This method does nothing. Only 'rf' type is supported.
  - setBlockSize
```
@Deprecated
public static void setBlockSize(org.apache.hadoop.conf.Configuration conf,
                           int blockSize)
```
    Deprecated. since 1.5.0; Use setFileBlockSize(Job, long), setDataBlockSize(Job, long), or setIndexBlockSize(Job, long) instead.
  - setZooKeeperInstance
```
@Deprecated
public static void setZooKeeperInstance(org.apache.hadoop.conf.Configuration conf,
                                   String instanceName,
                                   String zooKeepers)
```
    Deprecated. since 1.5.0; This OutputFormat does not communicate with Accumulo. If this is needed, subclasses must implement their own configuration.
  - getInstance
```
@Deprecated
protected static Instance getInstance(org.apache.hadoop.conf.Configuration conf)
```
    Deprecated. since 1.5.0; This OutputFormat does not communicate with Accumulo. If this is needed, subclasses must implement their own configuration.
