Interface FileOutputFormatBuilder.OutputOptions<T>

Enclosing interface:
FileOutputFormatBuilder

public static interface FileOutputFormatBuilder.OutputOptions<T>
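Options for the builder. Each setter returns this OutputOptions so calls can be chained; store(T) finishes the configuration.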
Since:
2.0
  • Method Details

    • compression

      FileOutputFormatBuilder.OutputOptions<T> compression(String compressionType)
      Sets the compression type to use for data blocks, overriding the default. Specifying a compression type may require additional libraries to be available to your job.
      Parameters:
      compressionType - one of "none", "gz", "bzip2", "lzo", "lz4", "snappy", or "zstd"
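      Because the compression type is passed as a plain string, a misspelled value might only surface once the job runs. The following is a minimal sketch of a pre-flight check against the names listed above. CompressionTypes, isSupported, and requireSupported are hypothetical helpers, not part of this API, and the sketch assumes exact, case-sensitive matching, which the documentation above does not confirm.

```java
import java.util.Set;

// Hypothetical helper for illustration; not part of FileOutputFormatBuilder.
final class CompressionTypes {
  // The names documented for compression(String compressionType).
  static final Set<String> SUPPORTED =
      Set.of("none", "gz", "bzip2", "lzo", "lz4", "snappy", "zstd");

  // Returns true only for an exact match with one of the documented names.
  static boolean isSupported(String compressionType) {
    return compressionType != null && SUPPORTED.contains(compressionType);
  }

  // Returns the value unchanged, or fails fast with a descriptive error.
  static String requireSupported(String compressionType) {
    if (!isSupported(compressionType)) {
      throw new IllegalArgumentException(
          "Unsupported compression type: " + compressionType);
    }
    return compressionType;
  }
}
```

      A caller could then write compression(CompressionTypes.requireSupported(name)) to reject a bad name before any job is submitted.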
    • dataBlockSize

      FileOutputFormatBuilder.OutputOptions<T> dataBlockSize(long dataBlockSize)
      Sets the size for data blocks within each file.
      A data block is a span of key/value pairs stored in the file that is compressed and indexed as a group.

      Making this value smaller may increase seek performance, but at the cost of increasing the size of the indexes (which can also affect seek performance).

      Parameters:
      dataBlockSize - the block size, in bytes
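      The trade-off between block size and index size can be made concrete with rough arithmetic. The sketch below assumes one index entry per data block and ignores compression; both are simplifications, since the on-disk layout is not described here. DataBlockMath and estimatedIndexEntries are hypothetical names used only for illustration.

```java
// Hypothetical back-of-the-envelope helper; not part of this API.
final class DataBlockMath {
  // Estimates how many data blocks (and, under the one-entry-per-block
  // assumption, how many index entries) a file of the given size needs.
  static long estimatedIndexEntries(long fileSizeBytes, long dataBlockSizeBytes) {
    if (fileSizeBytes < 0 || dataBlockSizeBytes <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    long fullBlocks = fileSizeBytes / dataBlockSizeBytes;
    // A trailing partial block still needs its own index entry.
    long partialBlock = (fileSizeBytes % dataBlockSizeBytes == 0) ? 0 : 1;
    return fullBlocks + partialBlock;
  }
}
```

      Under these assumptions, a 1 GiB file with 100 KiB data blocks needs about 10,486 index entries, while 10 KiB data blocks raise that to about 104,858: roughly ten times as many entries to store and search.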
    • fileBlockSize

      FileOutputFormatBuilder.OutputOptions<T> fileBlockSize(long fileBlockSize)
      Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
      Parameters:
      fileBlockSize - the block size, in bytes
    • indexBlockSize

      FileOutputFormatBuilder.OutputOptions<T> indexBlockSize(long indexBlockSize)
      Sets the size for index blocks within each file. Smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower one. This can affect the performance of queries.
      Parameters:
      indexBlockSize - the block size, in bytes
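      To illustrate how index block size relates to index depth, the sketch below models the index as a tree whose fan-out is the number of entries that fit in one index block. The fixed average entry size and the tree model itself are assumptions made for illustration, and IndexDepthMath and estimatedIndexLevels are hypothetical names, not part of this API.

```java
// Hypothetical estimate of index depth; not part of this API.
final class IndexDepthMath {
  // Counts how many index levels are needed when each index block can hold
  // (indexBlockSizeBytes / avgEntryBytes) entries of the level below it.
  static int estimatedIndexLevels(long indexEntries, long indexBlockSizeBytes,
      long avgEntryBytes) {
    if (indexEntries < 0 || indexBlockSizeBytes <= 0 || avgEntryBytes <= 0) {
      throw new IllegalArgumentException("arguments must be positive");
    }
    // Assume at least two entries per block so the loop always makes progress.
    long fanOut = Math.max(2, indexBlockSizeBytes / avgEntryBytes);
    int levels = 1;
    long remaining = indexEntries;
    // While the current level does not fit in one block, add a level above it.
    while (remaining > fanOut) {
      remaining = remaining / fanOut + (remaining % fanOut == 0 ? 0 : 1);
      levels++;
    }
    return levels;
  }
}
```

      With 104,858 index entries of 64 bytes each, this model gives two levels for 128 KiB index blocks and three levels for 4 KiB index blocks, which matches the direction of the trade-off described above.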
    • replication

      FileOutputFormatBuilder.OutputOptions<T> replication(int replication)
      Sets the file system replication factor for the resulting file, overriding the file system default.
      Parameters:
      replication - the number of replicas for produced files
    • sampler

      Specifies a sampler to be used when writing out data. The resulting output file will contain sample data.
      Parameters:
      samplerConfig - The configuration for creating sample data in the output file.
    • summarizers

      Specifies a list of summarizer configurations to create summary data in the output file. Each key/value pair written will be passed to the configured Summarizers.
      Parameters:
      summarizerConfigs - summarizer configurations
    • store

      void store(T job)
      Finishes configuring, then verifies and serializes the options into the Job or JobConf.
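      The chained-setter pattern that ends in store can be sketched without any external dependencies. Everything below is a hypothetical stand-in: SketchOutputOptions mirrors only a subset of the documented methods, the "job" is a plain Map rather than a Job or JobConf, and the property keys are invented for this example. It is meant to show the call shape, not how the real builder serializes its options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface mirroring the shape of a few documented methods.
interface SketchOutputOptions<T> {
  SketchOutputOptions<T> compression(String compressionType);
  SketchOutputOptions<T> dataBlockSize(long dataBlockSize);
  SketchOutputOptions<T> replication(int replication);
  void store(T job);
}

// Toy implementation: options are collected first and written on store().
final class MapBackedOptions implements SketchOutputOptions<Map<String, String>> {
  private final Map<String, String> pending = new LinkedHashMap<>();

  @Override
  public SketchOutputOptions<Map<String, String>> compression(String compressionType) {
    pending.put("compression", compressionType);
    return this; // returning this is what makes chaining possible
  }

  @Override
  public SketchOutputOptions<Map<String, String>> dataBlockSize(long dataBlockSize) {
    pending.put("dataBlockSize", Long.toString(dataBlockSize));
    return this;
  }

  @Override
  public SketchOutputOptions<Map<String, String>> replication(int replication) {
    pending.put("replication", Integer.toString(replication));
    return this;
  }

  @Override
  public void store(Map<String, String> job) {
    // Verify before serializing; options never set are simply not written,
    // so the defaults stay in effect for them.
    String replication = pending.get("replication");
    if (replication != null && Integer.parseInt(replication) < 1) {
      throw new IllegalArgumentException("replication must be at least 1");
    }
    job.putAll(pending);
  }
}

final class FluentSketch {
  // Configures a stand-in "job" with three options and returns it.
  static Map<String, String> configure() {
    Map<String, String> job = new LinkedHashMap<>();
    new MapBackedOptions()
        .compression("snappy")
        .dataBlockSize(64 * 1024)
        .replication(3)
        .store(job);
    return job;
  }
}
```

      Because store returns void, it naturally ends the chain; in this sketch nothing reaches the job until it is called.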