public class InputTableConfig extends Object implements org.apache.hadoop.io.Writable
Constructor | Description |
---|---|
InputTableConfig() | |
InputTableConfig(DataInput input) | Creates a batch scan config object out of a previously serialized batch scan config object. |
Modifier and Type | Method | Description |
---|---|---|
boolean | equals(Object o) | |
InputTableConfig | fetchColumns(Collection<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columns) | Restricts the columns that will be mapped over for this job for the default input table. |
Collection<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> | getFetchedColumns() | Returns the columns to be fetched for this configuration. |
List<IteratorSetting> | getIterators() | Returns the iterators to be set on this configuration. |
List<Range> | getRanges() | Returns the ranges to be queried in the configuration. |
SamplerConfiguration | getSamplerConfiguration() | |
int | hashCode() | |
boolean | isOfflineScan() | Determines whether a configuration has the offline table scan feature enabled. |
void | readFields(DataInput dataInput) | |
InputTableConfig | setAutoAdjustRanges(boolean autoAdjustRanges) | Controls the automatic adjustment of ranges for this job. |
InputTableConfig | setIterators(List<IteratorSetting> iterators) | Sets the iterators to be used in the query. |
InputTableConfig | setOfflineScan(boolean offlineScan) | Enables reading offline tables. |
InputTableConfig | setRanges(List<Range> ranges) | Sets the input ranges to scan for all tables associated with this job. |
void | setSamplerConfiguration(SamplerConfiguration samplerConfiguration) | Sets the sampler configuration to use when reading the data. |
InputTableConfig | setUseIsolatedScanners(boolean useIsolatedScanners) | Controls the use of the IsolatedScanner in this job. |
InputTableConfig | setUseLocalIterators(boolean useLocalIterators) | Controls the use of the ClientSideIteratorScanner in this job. |
boolean | shouldAutoAdjustRanges() | Determines whether a configuration has auto-adjust ranges enabled. |
boolean | shouldUseIsolatedScanners() | Determines whether a configuration has isolation enabled. |
boolean | shouldUseLocalIterators() | Determines whether a configuration uses local iterators. |
void | write(DataOutput dataOutput) | |
public InputTableConfig()
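public InputTableConfig(DataInput input) throws IOException
Creates a batch scan config object out of a previously serialized batch scan config object.
Parameters: input - the data input of the serialized batch scan config
Throws: IOException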
public InputTableConfig setRanges(List<Range> ranges)
Sets the input ranges to scan for all tables associated with this job.
Parameters: ranges - the ranges that will be mapped over
public InputTableConfig fetchColumns(Collection<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columns)
Restricts the columns that will be mapped over for this job for the default input table.
Parameters: columns - a collection of pairs of Text objects, each corresponding to a column family and column qualifier. If the column qualifier is null, the entire column family is selected. An empty set is the default and is equivalent to scanning all columns.
public Collection<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getFetchedColumns()
Returns the columns to be fetched for this configuration.
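The column-selection rules of fetchColumns can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Accumulo's implementation: the class ColumnSelector and its isFetched method are illustrative names, and Map.Entry<String, String> stands in for Pair<Text,Text>. It encodes the two documented rules: a null qualifier selects the whole family, and an empty collection selects every column.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical model of fetchColumns semantics; not Accumulo code.
// Each entry is (column family, column qualifier); Map.Entry stands in for Pair<Text,Text>.
final class ColumnSelector {
  private final List<Map.Entry<String, String>> fetched;

  ColumnSelector(Collection<Map.Entry<String, String>> fetched) {
    this.fetched = List.copyOf(fetched);
  }

  // Convenience factory; a null qualifier means "the entire column family".
  static Map.Entry<String, String> column(String family, String qualifier) {
    return new SimpleImmutableEntry<>(family, qualifier);
  }

  boolean isFetched(String family, String qualifier) {
    // An empty collection (the default) is equivalent to scanning all columns.
    if (fetched.isEmpty()) {
      return true;
    }
    for (Map.Entry<String, String> col : fetched) {
      boolean familyMatches = col.getKey().equals(family);
      boolean qualifierMatches = col.getValue() == null || col.getValue().equals(qualifier);
      if (familyMatches && qualifierMatches) {
        return true;
      }
    }
    return false;
  }
}
```

For instance, a selector built from ("meta", null) and ("data", "body") accepts every qualifier in the meta family but only the body qualifier in the data family.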
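public InputTableConfig setIterators(List<IteratorSetting> iterators)
Sets the iterators to be used in the query.
Parameters: iterators - the configurations for the iterators
public List<IteratorSetting> getIterators()
Returns the iterators to be set on this configuration.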
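public InputTableConfig setAutoAdjustRanges(boolean autoAdjustRanges)
Controls the automatic adjustment of ranges for this job. This feature merges overlapping ranges, then splits them to align with tablet boundaries. Disabling this feature will cause exactly one Map task to be created for each specified range.
By default, this feature is enabled.
Parameters: autoAdjustRanges - the feature is enabled if true, disabled otherwise
See Also: setRanges(java.util.List)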
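When auto-adjustment is enabled, overlapping ranges are merged and the result is split along tablet boundaries. The sketch below models those two steps with closed integer intervals instead of Accumulo Range objects; RangeAdjuster, Interval, and the list of tablet end keys are hypothetical illustrations, not Accumulo classes or its actual algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of range auto-adjustment; not Accumulo's implementation.
// Keys are longs and every range is a closed interval [start, end].
final class RangeAdjuster {
  static final class Interval {
    final long start;
    final long end;

    Interval(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + "," + end + "]";
    }
  }

  // Step 1: sort by start key and merge intervals that share at least one key.
  static List<Interval> mergeOverlapping(List<Interval> input) {
    List<Interval> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Interval i) -> i.start));
    List<Interval> merged = new ArrayList<>();
    for (Interval current : sorted) {
      if (!merged.isEmpty() && current.start <= merged.get(merged.size() - 1).end) {
        Interval last = merged.remove(merged.size() - 1);
        merged.add(new Interval(last.start, Math.max(last.end, current.end)));
      } else {
        merged.add(current);
      }
    }
    return merged;
  }

  // Step 2: cut each merged interval at tablet end keys (assumed sorted ascending),
  // so every resulting piece falls inside a single tablet.
  static List<Interval> splitAtTablets(List<Interval> merged, List<Long> tabletEndKeys) {
    List<Interval> pieces = new ArrayList<>();
    for (Interval range : merged) {
      long start = range.start;
      for (long split : tabletEndKeys) {
        if (split >= start && split < range.end) {
          pieces.add(new Interval(start, split));
          start = split + 1;
        }
      }
      pieces.add(new Interval(start, range.end));
    }
    return pieces;
  }
}
```

For example, merging [5,8], [1,3], [2,4], and [10,12] yields [1,4], [5,8], [10,12]; splitting at a tablet that ends at key 6 then turns [5,8] into [5,6] and [7,8]. With auto-adjustment disabled, each of the four original ranges would instead become its own map task.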
public boolean shouldAutoAdjustRanges()
Determines whether a configuration has auto-adjust ranges enabled.
See Also: setAutoAdjustRanges(boolean)
public InputTableConfig setUseLocalIterators(boolean useLocalIterators)
Controls the use of the ClientSideIteratorScanner in this job. Enabling this feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task.
By default, this feature is disabled.
Parameters: useLocalIterators - the feature is enabled if true, disabled otherwise
public boolean shouldUseLocalIterators()
Determines whether a configuration uses local iterators.
See Also: setUseLocalIterators(boolean)
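Because local iterators run inside the map task, a missing iterator class only surfaces once the task starts. One cheap pre-flight check is to try loading each iterator class before enabling the feature. The helper below is a hypothetical sketch that relies only on Class.forName; LocalIteratorCheck and missingClasses are illustrative names, not part of Accumulo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for local iterators; not part of Accumulo.
final class LocalIteratorCheck {
  // Returns the class names that the given loader cannot load.
  static List<String> missingClasses(List<String> iteratorClassNames, ClassLoader loader) {
    List<String> missing = new ArrayList<>();
    for (String name : iteratorClassNames) {
      try {
        // initialize = false: only locate and load the class, do not run static initializers.
        Class.forName(name, false, loader);
      } catch (ClassNotFoundException | LinkageError e) {
        missing.add(name);
      }
    }
    return missing;
  }
}
```

In a job, the names could come from IteratorSetting.getIteratorClass() for each configured iterator, checked against Thread.currentThread().getContextClassLoader(); a non-empty result suggests the task classpath is incomplete.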
public InputTableConfig setOfflineScan(boolean offlineScan)
Enables reading offline tables, in which case the job reads the table's files directly out of HDFS.
To use this option, the MapReduce user needs access to read the Accumulo directory in HDFS.
Reading the offline table creates the scan-time iterator stack in the map process, so any iterators configured for the table must be on the mapper's classpath. The accumulo-site.xml may also need to be on the mapper's classpath if HDFS or the Accumulo directory in HDFS is non-standard.
One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a MapReduce job. If you plan to run MapReduce over the data many times, it may be better to compact the table, clone it, take the clone offline, and use that clone for all MapReduce jobs. Compaction reduces each tablet in the table to one file, and reading from one file is faster.
There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it supports speculative execution better: when reading an online table, speculative execution can put more load on an already slow tablet server.
By default, this feature is disabled.
Parameters: offlineScan - the feature is enabled if true, disabled otherwise
public boolean isOfflineScan()
Determines whether a configuration has the offline table scan feature enabled.
See Also: setOfflineScan(boolean)
public InputTableConfig setUseIsolatedScanners(boolean useIsolatedScanners)
Controls the use of the IsolatedScanner in this job.
By default, this feature is disabled.
Parameters: useIsolatedScanners - the feature is enabled if true, disabled otherwise
public boolean shouldUseIsolatedScanners()
Determines whether a configuration has isolation enabled.
See Also: setUseIsolatedScanners(boolean)
public void setSamplerConfiguration(SamplerConfiguration samplerConfiguration)
Sets the sampler configuration to use when reading the data.
public SamplerConfiguration getSamplerConfiguration()
public void write(DataOutput dataOutput) throws IOException
Specified by: write in interface org.apache.hadoop.io.Writable
Throws: IOException
public void readFields(DataInput dataInput) throws IOException
Specified by: readFields in interface org.apache.hadoop.io.Writable
Throws: IOException
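Since InputTableConfig implements Writable, a configuration is transferred by calling write on one side and readFields (or the InputTableConfig(DataInput) constructor) on the other. The sketch below shows that round-trip pattern on a hypothetical MiniTableConfig class with a made-up field layout; it illustrates the Writable contract and does not reproduce InputTableConfig's actual serialized format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Writable-style config; the field layout is invented for illustration.
final class MiniTableConfig {
  List<String> ranges = new ArrayList<>();
  boolean offlineScan;
  boolean useLocalIterators;

  MiniTableConfig() {}

  // Mirrors the InputTableConfig(DataInput) constructor: rebuild from serialized bytes.
  MiniTableConfig(DataInput in) throws IOException {
    readFields(in);
  }

  // Fields must be written in exactly the order readFields reads them.
  void write(DataOutput out) throws IOException {
    out.writeInt(ranges.size());
    for (String range : ranges) {
      out.writeUTF(range);
    }
    out.writeBoolean(offlineScan);
    out.writeBoolean(useLocalIterators);
  }

  void readFields(DataInput in) throws IOException {
    int count = in.readInt();
    ranges = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      ranges.add(in.readUTF());
    }
    offlineScan = in.readBoolean();
    useLocalIterators = in.readBoolean();
  }

  // Serializes with write(), then deserializes through the DataInput constructor.
  static MiniTableConfig roundTrip(MiniTableConfig config) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        config.write(out);
      }
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      return new MiniTableConfig(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With the real class the same two calls apply: config.write(new DataOutputStream(bytes)) on the sending side and new InputTableConfig(new DataInputStream(...)) on the receiving side.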
Copyright © 2011–2017 The Apache Software Foundation. All rights reserved.