public class AccumuloFileOutputFormat extends org.apache.hadoop.mapred.FileOutputFormat<Key,Value>
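This class allows MapReduce jobs to write output in the Accumulo data file format. Care should be taken to write only sorted data (sorted by Key), as this is an important requirement of Accumulo data files.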
The output path to be created must be specified via FileOutputFormat.setOutputPath(JobConf, Path), which this class inherits from FileOutputFormat. Other methods from FileOutputFormat are not supported and may be ignored or cause failures.
Other Hadoop configuration options that affect the behavior of the underlying files may work when set directly in the Job's configuration, but they are not directly supported at this time.
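The sorted-data requirement described above can be illustrated with a minimal sketch. It does not use the Accumulo API: `SimpleKey` is a hypothetical, simplified stand-in for Accumulo's Key that omits column visibility and compares strings rather than raw bytes. The ordering assumed here is row, column family, and column qualifier ascending, then timestamp descending (newest first).

```java
import java.util.ArrayList;
import java.util.List;

final class SortedOutputSketch {

    /** Hypothetical, simplified stand-in for an Accumulo Key (not the real class). */
    static final class SimpleKey {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;

        SimpleKey(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Row, family, qualifier ascending; timestamp descending (newest first). */
    static int compare(SimpleKey a, SimpleKey b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<SimpleKey> sortForOutput(List<SimpleKey> keys) {
        List<SimpleKey> sorted = new ArrayList<>(keys);
        sorted.sort(SortedOutputSketch::compare);
        return sorted;
    }

    /** Checks that every key is not smaller than its predecessor. */
    static boolean isSorted(List<SimpleKey> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (compare(keys.get(i - 1), keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

In a real job the ordering is normally provided by the MapReduce shuffle or by the way the input is produced; a check such as `isSorted` is only a debugging aid for verifying that assumption before records reach the record writer.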
Modifier and Type | Field and Description |
---|---|
protected static org.apache.log4j.Logger | log |

Constructor and Description |
---|
AccumuloFileOutputFormat() |

Modifier and Type | Method and Description |
---|---|
protected static org.apache.accumulo.core.conf.AccumuloConfiguration | getAccumuloConfiguration(org.apache.hadoop.mapred.JobConf job): Deprecated since 1.7.0. This method returns a type that is not part of the public API and is not guaranteed to be stable. The method was deprecated to discourage its use. |
org.apache.hadoop.mapred.RecordWriter<Key,Value> | getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress) |
static void | setCompressionType(org.apache.hadoop.mapred.JobConf job, String compressionType): Sets the compression type to use for data blocks. |
static void | setDataBlockSize(org.apache.hadoop.mapred.JobConf job, long dataBlockSize): Sets the size for data blocks within each file. Data blocks are a span of key/value pairs stored in the file that are compressed and indexed as a group. |
static void | setFileBlockSize(org.apache.hadoop.mapred.JobConf job, long fileBlockSize): Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system. |
static void | setIndexBlockSize(org.apache.hadoop.mapred.JobConf job, long indexBlockSize): Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy within the file. |
static void | setReplication(org.apache.hadoop.mapred.JobConf job, int replication): Sets the file system replication factor for the resulting file, overriding the file system default. |
static void | setSampler(org.apache.hadoop.mapred.JobConf job, SamplerConfiguration samplerConfig): Specifies a sampler to be used when writing out data. |
Methods inherited from class org.apache.hadoop.mapred.FileOutputFormat: checkOutputSpecs, getCompressOutput, getOutputCompressorClass, getOutputPath, getPathForCustomFile, getTaskOutputPath, getUniqueName, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath, setWorkOutputPath
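The relationship between index block size and index depth noted for setIndexBlockSize can be made concrete with a rough back-of-the-envelope model. The sketch below is not Accumulo's actual file layout: it assumes every index entry occupies a fixed number of bytes (`entryBytes`, a hypothetical parameter) and that each index block is filled completely, so the result is only an order-of-magnitude estimate.

```java
final class IndexDepthSketch {

    /**
     * Estimates how many index levels are needed to address the given number
     * of data blocks, assuming fixed-size index entries (a simplification).
     */
    static int estimateLevels(long dataBlocks, long indexBlockSize, long entryBytes) {
        if (dataBlocks < 1 || entryBytes < 1) {
            throw new IllegalArgumentException("dataBlocks and entryBytes must be positive");
        }
        // Number of entries that fit into one index block under this model.
        long fanout = indexBlockSize / entryBytes;
        if (fanout < 2) {
            throw new IllegalArgumentException("an index block must hold at least two entries");
        }
        int levels = 1;
        long reachable = fanout; // data blocks addressable with the current depth
        while (reachable < dataBlocks) {
            levels++;
            if (reachable > Long.MAX_VALUE / fanout) {
                break; // the next level already covers any long value
            }
            reachable *= fanout;
        }
        return levels;
    }
}
```

Under these assumptions, one million data blocks with 64-byte entries need two levels with 128 KiB index blocks (`estimateLevels(1_000_000L, 131_072L, 64L)`) and four levels with 4 KiB index blocks (`estimateLevels(1_000_000L, 4_096L, 64L)`), which matches the qualitative statement that smaller index blocks lead to a deeper hierarchy.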
@Deprecated protected static org.apache.accumulo.core.conf.AccumuloConfiguration getAccumuloConfiguration(org.apache.hadoop.mapred.JobConf job)
job - the Hadoop context for the configured job

public static void setCompressionType(org.apache.hadoop.mapred.JobConf job, String compressionType)
job - the Hadoop job instance to be configured
compressionType - one of "none", "gz", "lzo", or "snappy"

public static void setDataBlockSize(org.apache.hadoop.mapred.JobConf job, long dataBlockSize)
Making this value smaller may increase seek performance, but at the cost of increasing the size of the indexes (which can also affect seek performance).
job - the Hadoop job instance to be configured
dataBlockSize - the block size, in bytes

public static void setFileBlockSize(org.apache.hadoop.mapred.JobConf job, long fileBlockSize)
job - the Hadoop job instance to be configured
fileBlockSize - the block size, in bytes

public static void setIndexBlockSize(org.apache.hadoop.mapred.JobConf job, long indexBlockSize)
job - the Hadoop job instance to be configured
indexBlockSize - the block size, in bytes

public static void setReplication(org.apache.hadoop.mapred.JobConf job, int replication)
job - the Hadoop job instance to be configured
replication - the number of replicas for produced files

public static void setSampler(org.apache.hadoop.mapred.JobConf job, SamplerConfiguration samplerConfig)
job - the Hadoop job instance to be configured
samplerConfig - the configuration for creating sample data in the output file

public org.apache.hadoop.mapred.RecordWriter<Key,Value> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress) throws IOException
Specified by: getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<Key,Value>
Specified by: getRecordWriter in class org.apache.hadoop.mapred.FileOutputFormat<Key,Value>
Throws: IOException
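Because setCompressionType takes a plain String, a misspelled value may only surface later in the job. One way to catch this earlier is to check the value against the four names listed above before configuring the job. The helper below is a hypothetical sketch, not part of Accumulo; it assumes an exact, case-sensitive match, which may be stricter than what the library itself accepts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class CompressionTypeCheck {

    /** The compression type names listed in the setCompressionType documentation. */
    static final Set<String> ALLOWED = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList("none", "gz", "lzo", "snappy")));

    /** Returns the value unchanged if it is a documented name; otherwise throws. */
    static String requireSupported(String compressionType) {
        if (compressionType == null || !ALLOWED.contains(compressionType)) {
            throw new IllegalArgumentException(
                    "Unsupported compression type '" + compressionType
                            + "', expected one of " + ALLOWED);
        }
        return compressionType;
    }
}
```

A caller could then write `AccumuloFileOutputFormat.setCompressionType(job, CompressionTypeCheck.requireSupported(userValue))`, where `userValue` is whatever string the application received from its own configuration.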
Copyright © 2011–2017 The Apache Software Foundation. All rights reserved.