Class AccumuloFileOutputFormat

java.lang.Object
org.apache.hadoop.mapred.FileOutputFormat<Key,Value>
org.apache.accumulo.core.client.mapred.AccumuloFileOutputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.OutputFormat<Key,Value>

@Deprecated(since="2.0.0") public class AccumuloFileOutputFormat extends org.apache.hadoop.mapred.FileOutputFormat<Key,Value>
Deprecated.
Since 2.0.0; use the org.apache.accumulo.hadoop.mapred package from accumulo-hadoop-mapreduce.jar instead.
This class allows MapReduce jobs to write output in the Accumulo data file format.
Care should be taken to write only sorted data (sorted by Key), as this is an important requirement of Accumulo data files.
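
As a rough illustration of the sorting requirement, the sketch below checks that a batch of entries is in order before it is written. SimpleKey and SortCheck are hypothetical stand-ins, not Accumulo classes. The ordering assumed here (row, column family, column qualifier, column visibility, then newest timestamp first) is a simplification of Key's ordering: it compares Java strings rather than raw bytes and ignores the delete flag.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an Accumulo Key; only the fields used for ordering.
class SimpleKey {
  final String row;
  final String family;
  final String qualifier;
  final String visibility;
  final long timestamp;

  SimpleKey(String row, String family, String qualifier, String visibility, long timestamp) {
    this.row = row;
    this.family = family;
    this.qualifier = qualifier;
    this.visibility = visibility;
    this.timestamp = timestamp;
  }
}

class SortCheck {
  // Assumed ordering: ascending on the string fields, descending on timestamp.
  static final Comparator<SimpleKey> ORDER =
      Comparator.comparing((SimpleKey k) -> k.row)
          .thenComparing((SimpleKey k) -> k.family)
          .thenComparing((SimpleKey k) -> k.qualifier)
          .thenComparing((SimpleKey k) -> k.visibility)
          .thenComparing(Comparator.comparingLong((SimpleKey k) -> k.timestamp).reversed());

  // Returns true when no entry sorts before its predecessor.
  static boolean isSorted(List<SimpleKey> keys) {
    for (int i = 1; i < keys.size(); i++) {
      if (ORDER.compare(keys.get(i - 1), keys.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }
}
```

In a real job the sort is normally provided by the MapReduce shuffle when Key is the map output key; a check like this is mainly useful in tests.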

The output path to be created must be specified via FileOutputFormat.setOutputPath(JobConf, Path), which this class inherits from FileOutputFormat. Other methods from FileOutputFormat are not supported and may be ignored or cause failures. Setting other Hadoop configuration options that affect the behavior of the underlying files directly in the Job's configuration may work, but doing so is not directly supported at this time.
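
A minimal configuration sketch, assuming the Accumulo core and Hadoop mapred classes are on the classpath. RFileJobSetup and the specific values (gz compression, 100 KB data blocks, replication of 3) are illustrative choices, not recommendations from this class's documentation.

```java
import org.apache.accumulo.core.client.mapred.AccumuloFileOutputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper class; only the calls on JobConf and
// AccumuloFileOutputFormat come from the documented APIs.
class RFileJobSetup {
  static JobConf configure(String outputDir) {
    JobConf job = new JobConf();

    // Write Key/Value pairs in the Accumulo data file format.
    job.setOutputFormat(AccumuloFileOutputFormat.class);
    job.setOutputKeyClass(Key.class);
    job.setOutputValueClass(Value.class);

    // The output path is required; setOutputPath is inherited from FileOutputFormat.
    AccumuloFileOutputFormat.setOutputPath(job, new Path(outputDir));

    // Optional tuning; the values below are example assumptions.
    AccumuloFileOutputFormat.setCompressionType(job, "gz");
    AccumuloFileOutputFormat.setDataBlockSize(job, 100 * 1024L);
    AccumuloFileOutputFormat.setReplication(job, 3);
    return job;
  }
}
```

Because the class is deprecated, compiling this sketch produces deprecation warnings; new code would use the replacement package named in the deprecation note.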

  • Nested Class Summary

    Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileOutputFormat

    org.apache.hadoop.mapred.FileOutputFormat.Counter
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected static final org.apache.log4j.Logger
    log
    Deprecated.
  • Constructor Summary

    Constructors
    Constructor
    Description
    AccumuloFileOutputFormat()
    Deprecated.
  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.hadoop.mapred.RecordWriter<Key,Value>
    getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)
    Deprecated.
    static void
    setCompressionType(org.apache.hadoop.mapred.JobConf job, String compressionType)
    Deprecated.
    Sets the compression type to use for data blocks.
    static void
    setDataBlockSize(org.apache.hadoop.mapred.JobConf job, long dataBlockSize)
    Deprecated.
    Sets the size for data blocks within each file.
    A data block is a span of key/value pairs stored in the file that is compressed and indexed as a group.
    static void
    setFileBlockSize(org.apache.hadoop.mapred.JobConf job, long fileBlockSize)
    Deprecated.
    Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
    static void
    setIndexBlockSize(org.apache.hadoop.mapred.JobConf job, long indexBlockSize)
    Deprecated.
    Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy within the file.
    static void
    setReplication(org.apache.hadoop.mapred.JobConf job, int replication)
    Deprecated.
    Sets the file system replication factor for the resulting file, overriding the file system default.
    static void
    setSampler(org.apache.hadoop.mapred.JobConf job, SamplerConfiguration samplerConfig)
    Deprecated.
    Specify a sampler to be used when writing out data.

    Methods inherited from class org.apache.hadoop.mapred.FileOutputFormat

    checkOutputSpecs, getCompressOutput, getOutputCompressorClass, getOutputPath, getPathForCustomFile, getTaskOutputPath, getUniqueName, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath, setWorkOutputPath

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • log

      protected static final org.apache.log4j.Logger log
      Deprecated.
  • Constructor Details

    • AccumuloFileOutputFormat

      public AccumuloFileOutputFormat()
      Deprecated.
  • Method Details

    • setCompressionType

      public static void setCompressionType(org.apache.hadoop.mapred.JobConf job, String compressionType)
      Deprecated.
      Sets the compression type to use for data blocks. Specifying a compression type may require additional libraries to be available to your Job.
      Parameters:
      job - the Hadoop job instance to be configured
      compressionType - one of "none", "gz", "bzip2", "lzo", "lz4", "snappy", or "zstd"
      Since:
      1.5.0
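
Since the compression type is passed as a plain string, a job driver may want to reject typos before submitting the job. The sketch below is one way to do that; CompressionTypes is a hypothetical helper, and the exact, case-sensitive match against the names listed above is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; the allowed names are copied from the parameter description.
class CompressionTypes {
  static final Set<String> KNOWN = Collections.unmodifiableSet(
      new LinkedHashSet<>(Arrays.asList("none", "gz", "bzip2", "lzo", "lz4", "snappy", "zstd")));

  // Returns the given type unchanged, or throws if it is not one of the listed names.
  static String requireKnown(String compressionType) {
    if (compressionType == null || !KNOWN.contains(compressionType)) {
      throw new IllegalArgumentException(
          "Unknown compression type: " + compressionType + "; expected one of " + KNOWN);
    }
    return compressionType;
  }
}
```

A driver could then call setCompressionType(job, CompressionTypes.requireKnown(userValue)).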
    • setDataBlockSize

      public static void setDataBlockSize(org.apache.hadoop.mapred.JobConf job, long dataBlockSize)
      Deprecated.
      Sets the size for data blocks within each file.
      A data block is a span of key/value pairs stored in the file that is compressed and indexed as a group.

      Making this value smaller may improve seek performance, but at the cost of increasing the size of the indexes (which can also affect seek performance).

      Parameters:
      job - the Hadoop job instance to be configured
      dataBlockSize - the block size, in bytes
      Since:
      1.5.0
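
To make the trade-off concrete, the sketch below estimates how many data blocks (and therefore roughly how many index entries) a file would contain for a given block size. It assumes about one index entry per data block and ignores compression and per-block overhead; BlockMath is a hypothetical helper, not part of Accumulo.

```java
// Hypothetical helper for back-of-the-envelope sizing.
class BlockMath {
  // Number of data blocks needed to hold totalBytes, rounded up.
  static long estimateDataBlocks(long totalBytes, long dataBlockSize) {
    if (totalBytes < 0 || dataBlockSize <= 0) {
      throw new IllegalArgumentException("totalBytes must be >= 0 and dataBlockSize must be > 0");
    }
    return (totalBytes + dataBlockSize - 1) / dataBlockSize;
  }
}
```

Under these assumptions, 1 GiB of data with 100 KiB blocks gives 10486 blocks, while halving the block size to 50 KiB gives 20972, about twice as many index entries.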
    • setFileBlockSize

      public static void setFileBlockSize(org.apache.hadoop.mapred.JobConf job, long fileBlockSize)
      Deprecated.
      Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
      Parameters:
      job - the Hadoop job instance to be configured
      fileBlockSize - the block size, in bytes
      Since:
      1.5.0
    • setIndexBlockSize

      public static void setIndexBlockSize(org.apache.hadoop.mapred.JobConf job, long indexBlockSize)
      Deprecated.
      Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy within the file. This can affect the performance of queries.
      Parameters:
      job - the Hadoop job instance to be configured
      indexBlockSize - the block size, in bytes
      Since:
      1.5.0
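
The relationship between index block size and index depth can be sketched with a generic multi-level index model. This is not Accumulo's actual layout: IndexDepth is a hypothetical helper, and avgEntryBytes (the average size of one index entry) is an assumed input that in practice depends on key length.

```java
// Hypothetical model of a multi-level index; not Accumulo's implementation.
class IndexDepth {
  // Estimates how many index levels are needed to address indexEntries entries.
  static int estimateLevels(long indexEntries, long indexBlockSize, long avgEntryBytes) {
    if (indexEntries < 0 || indexBlockSize <= 0 || avgEntryBytes <= 0) {
      throw new IllegalArgumentException("arguments must be positive (indexEntries may be 0)");
    }
    // Entries that fit in one index block; at least 2 so each level shrinks.
    long fanOut = Math.max(2, indexBlockSize / avgEntryBytes);
    int levels = 1;
    long blocks = (indexEntries + fanOut - 1) / fanOut; // blocks at the lowest level
    while (blocks > 1) {
      // Each higher level holds one entry per block of the level below it.
      blocks = (blocks + fanOut - 1) / fanOut;
      levels++;
    }
    return levels;
  }
}
```

With one million entries of about 64 bytes each, this model yields 2 levels for 128 KiB index blocks and 4 levels for 4 KiB index blocks, which matches the direction described above.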
    • setReplication

      public static void setReplication(org.apache.hadoop.mapred.JobConf job, int replication)
      Deprecated.
      Sets the file system replication factor for the resulting file, overriding the file system default.
      Parameters:
      job - the Hadoop job instance to be configured
      replication - the number of replicas for produced files
      Since:
      1.5.0
    • setSampler

      public static void setSampler(org.apache.hadoop.mapred.JobConf job, SamplerConfiguration samplerConfig)
      Deprecated.
      Specify a sampler to be used when writing out data. This will result in the output file having sample data.
      Parameters:
      job - the Hadoop job instance to be configured
      samplerConfig - the configuration for creating sample data in the output file
      Since:
      1.8.0
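
A possible way to build the samplerConfig argument, assuming Accumulo's RowSampler is available on the classpath. SamplerSetup is a hypothetical helper, and the hasher and modulus values are example settings chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.accumulo.core.client.mapred.AccumuloFileOutputFormat;
import org.apache.accumulo.core.client.sample.RowSampler;
import org.apache.accumulo.core.client.sample.SamplerConfiguration;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper class wrapping the setSampler call.
class SamplerSetup {
  static void configureSampling(JobConf job) {
    // Example options: sample rows whose hash is divisible by the modulus.
    Map<String, String> options = new HashMap<>();
    options.put("hasher", "murmur3_32");
    options.put("modulus", "1009");

    SamplerConfiguration samplerConfig = new SamplerConfiguration(RowSampler.class.getName());
    samplerConfig.setOptions(options);

    AccumuloFileOutputFormat.setSampler(job, samplerConfig);
  }
}
```

A larger modulus keeps a smaller fraction of rows in the sample, so the value is a trade-off between sample size and how representative the sample is.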
    • getRecordWriter

      public org.apache.hadoop.mapred.RecordWriter<Key,Value> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress) throws IOException
      Deprecated.
      Specified by:
      getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<Key,Value>
      Specified by:
      getRecordWriter in class org.apache.hadoop.mapred.FileOutputFormat<Key,Value>
      Throws:
      IOException