Interface FileOutputFormatBuilder.OutputOptions<T>
- Enclosing interface:
- FileOutputFormatBuilder
public static interface FileOutputFormatBuilder.OutputOptions<T>
Options for builder
- Since:
- 2.0
-
Method Summary
Modifier and Type / Method / Description
compression(String compressionType)
    Sets the compression type to use for data blocks, overriding the default.
dataBlockSize(long dataBlockSize)
    Sets the size for data blocks within each file. Data blocks are a span of key/value pairs stored in the file that are compressed and indexed as a group.
fileBlockSize(long fileBlockSize)
    Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
indexBlockSize(long indexBlockSize)
    Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy within the file.
replication(int replication)
    Sets the file system replication factor for the resulting file, overriding the file system default.
sampler(SamplerConfiguration samplerConfig)
    Specifies a sampler to be used when writing out data.
void store
    Finishes configuring, then verifies and serializes the options into the Job or JobConf.
summarizers(SummarizerConfiguration... summarizerConfigs)
    Specifies a list of summarizer configurations to create summary data in the output file.
-
Method Details
-
compression
Sets the compression type to use for data blocks, overriding the default. Specifying a compression may require additional libraries to be available to your Job.
- Parameters:
compressionType - one of "none", "gz", "bzip2", "lzo", "lz4", "snappy", or "zstd"
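A minimal sketch of checking a compression name against the values listed above before passing it to compression(). The class and method names (CompressionTypes, isDocumented, requireDocumented) are hypothetical helpers, not part of the documented API. The check is an exact string match; whether the library accepts other spellings or casings is not stated here.

```java
import java.util.Set;

// Hypothetical helper: validates a compression name against the documented list.
final class CompressionTypes {
    // The names listed in the compressionType parameter description.
    private static final Set<String> DOCUMENTED =
        Set.of("none", "gz", "bzip2", "lzo", "lz4", "snappy", "zstd");

    private CompressionTypes() {}

    // Returns true only for an exact match with one of the documented names.
    static boolean isDocumented(String compressionType) {
        return compressionType != null && DOCUMENTED.contains(compressionType);
    }

    // Returns the name unchanged, or fails fast with a descriptive message.
    static String requireDocumented(String compressionType) {
        if (!isDocumented(compressionType)) {
            throw new IllegalArgumentException(
                "Unsupported compression type: " + compressionType
                    + "; expected one of " + DOCUMENTED);
        }
        return compressionType;
    }
}
```

Failing fast on the client side can surface a typo such as "gzip" before the job is submitted. It does not check that the codec's libraries are actually available to the Job.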
-
dataBlockSize
Sets the size for data blocks within each file.
Data blocks are a span of key/value pairs stored in the file that are compressed and indexed as a group. Making this value smaller may increase seek performance, but at the cost of increasing the size of the indexes (which can also affect seek performance).
- Parameters:
dataBlockSize - the block size, in bytes
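The trade-off described above can be made concrete with a rough estimate. The sketch below assumes, for illustration only, that every data block fills to exactly the configured size and that the index holds one entry per data block. DataBlockEstimate and estimateBlocks are hypothetical names, not part of the documented API.

```java
// Hypothetical helper: rough count of data blocks (and so of index entries,
// under the one-entry-per-block assumption) for a given amount of data.
final class DataBlockEstimate {
    private DataBlockEstimate() {}

    // Ceiling division of totalBytes by dataBlockSize, written to avoid overflow.
    static long estimateBlocks(long totalBytes, long dataBlockSize) {
        if (totalBytes < 0 || dataBlockSize <= 0) {
            throw new IllegalArgumentException(
                "totalBytes must be >= 0 and dataBlockSize must be > 0");
        }
        long fullBlocks = totalBytes / dataBlockSize;
        return totalBytes % dataBlockSize == 0 ? fullBlocks : fullBlocks + 1;
    }
}
```

Under these assumptions, halving the data block size roughly doubles the number of blocks that must be indexed, which is the index growth the description warns about.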
-
fileBlockSize
Sets the size for file blocks in the file system; file blocks are managed, and replicated, by the underlying file system.
- Parameters:
fileBlockSize - the block size, in bytes
-
indexBlockSize
Sets the size for index blocks within each file; smaller blocks mean a deeper index hierarchy within the file, while larger blocks mean a shallower index hierarchy within the file. This can affect the performance of queries.
- Parameters:
indexBlockSize - the block size, in bytes
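To illustrate how index block size relates to index depth, here is a simplified model, not the file format's actual algorithm. It assumes fixed-size index entries, index blocks that fill completely, and one parent entry per child block. IndexDepthEstimate and its parameters are hypothetical.

```java
// Hypothetical model: number of index levels needed so that the top level
// fits in a single index block.
final class IndexDepthEstimate {
    private IndexDepthEstimate() {}

    static int estimateDepth(long entries, long avgEntryBytes, long indexBlockSize) {
        if (entries <= 0 || avgEntryBytes <= 0 || indexBlockSize <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Entries that fit in one index block; at least 2 so each level shrinks.
        long fanout = Math.max(2L, indexBlockSize / avgEntryBytes);
        int depth = 1;
        long blocks = ceilDiv(entries, fanout);
        // Each extra level indexes the blocks of the level below it.
        while (blocks > 1) {
            blocks = ceilDiv(blocks, fanout);
            depth++;
        }
        return depth;
    }

    private static long ceilDiv(long a, long b) {
        return a / b + (a % b == 0 ? 0 : 1);
    }
}
```

In this model, 1000 entries of 100 bytes each need three levels with 1000-byte index blocks but only two levels with 10000-byte index blocks, matching the deeper-versus-shallower behavior described above.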
-
replication
Sets the file system replication factor for the resulting file, overriding the file system default.
- Parameters:
replication - the number of replicas for produced files
-
sampler
Specifies a sampler to be used when writing out data. This will result in the output file having sample data.
- Parameters:
samplerConfig - the configuration for creating sample data in the output file
-
summarizers
Specifies a list of summarizer configurations to create summary data in the output file. Each key/value pair written will be passed to the configured Summarizers.
- Parameters:
summarizerConfigs - summarizer configurations
-
store
Finishes configuring, then verifies and serializes the options into the Job or JobConf.
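The configure-then-store flow can be sketched with a self-contained stand-in. Everything below is hypothetical: SketchOutputOptions mirrors only a subset of the method names on this page, the chaining return types and the store(T job) parameter are assumptions, the "sketch.*" keys are invented, and the checks performed in store are illustrative rather than the library's actual verification.

```java
import java.util.Map;

// Hypothetical stand-in mirroring a subset of the documented method names.
// It is NOT the real interface; return types and store's parameter are assumed.
interface SketchOutputOptions<T> {
    SketchOutputOptions<T> compression(String compressionType);
    SketchOutputOptions<T> dataBlockSize(long dataBlockSize);
    SketchOutputOptions<T> replication(int replication);
    void store(T job);
}

// Toy implementation whose "job" is a plain map of string settings.
final class MapBackedOptions implements SketchOutputOptions<Map<String, String>> {
    private String compressionType;
    private Long dataBlockSize;
    private Integer replication;

    @Override
    public MapBackedOptions compression(String compressionType) {
        this.compressionType = compressionType;
        return this;
    }

    @Override
    public MapBackedOptions dataBlockSize(long dataBlockSize) {
        this.dataBlockSize = dataBlockSize;
        return this;
    }

    @Override
    public MapBackedOptions replication(int replication) {
        this.replication = replication;
        return this;
    }

    @Override
    public void store(Map<String, String> job) {
        // Verify: illustrative checks only.
        if (dataBlockSize != null && dataBlockSize <= 0) {
            throw new IllegalArgumentException("dataBlockSize must be positive");
        }
        if (replication != null && replication <= 0) {
            throw new IllegalArgumentException("replication must be positive");
        }
        // Serialize: write only the options that were set, so unset ones
        // keep whatever default the job already has. Keys are invented.
        if (compressionType != null) {
            job.put("sketch.compression", compressionType);
        }
        if (dataBlockSize != null) {
            job.put("sketch.dataBlockSize", Long.toString(dataBlockSize));
        }
        if (replication != null) {
            job.put("sketch.replication", Integer.toString(replication));
        }
    }
}
```

With this stand-in, new MapBackedOptions().compression("gz").dataBlockSize(65536L).store(job) leaves two entries in job and no replication entry. Nothing reaches the job until store is called, which is the point of ending the chain with it.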
-