Interface ScannerBase
- All Superinterfaces:
AutoCloseable, Iterable<Map.Entry<Key,Value>>
- All Known Subinterfaces:
BatchDeleter, BatchScanner, Scanner
- All Known Implementing Classes:
ClientSideIteratorScanner, IsolatedScanner, org.apache.accumulo.core.clientImpl.ScannerOptions
-
Nested Class Summary
Modifier and Type | Interface | Description
static enum | ScannerBase.ConsistencyLevel | Consistency level for the scanner.
-
Method Summary
Modifier and Type | Method | Description
void | addScanIterator(IteratorSetting cfg) | Add a server-side scan iterator.
void | clearClassLoaderContext() | Clears the current classloader context set on this scanner.
void | clearColumns() | Clears the columns to be fetched (useful for resetting the scanner for reuse).
void | clearSamplerConfiguration() | Clears sampler configuration making a scanner read all data.
void | clearScanIterators() | Clears scan iterators prior to returning a scanner to the pool.
void | close() | Closes any underlying connections on the scanner.
default void | fetchColumn(CharSequence colFam, CharSequence colQual) | Adds a column to the list of columns that will be fetched by this scanner.
void | fetchColumn(IteratorSetting.Column column) | Adds a column to the list of columns that will be fetched by this scanner.
void | fetchColumn(org.apache.hadoop.io.Text colFam, org.apache.hadoop.io.Text colQual) | Adds a column to the list of columns that will be fetched by this scanner.
default void | fetchColumnFamily(CharSequence colFam) | Adds a column family to the list of columns that will be fetched by this scanner.
void | fetchColumnFamily(org.apache.hadoop.io.Text col) | Adds a column family to the list of columns that will be fetched by this scanner.
default void | forEach(BiConsumer<? super Key,? super Value> keyValueConsumer) | Iterates through Scanner results.
Authorizations | getAuthorizations() | Returns the authorizations that have been set on the scanner.
long | getBatchTimeout(TimeUnit timeUnit) | Returns the timeout to fill a batch in the given TimeUnit.
String | getClassLoaderContext() | Returns the name of the current classloader context set on this scanner.
ScannerBase.ConsistencyLevel | getConsistencyLevel() | Get the configured consistency level.
SamplerConfiguration | getSamplerConfiguration() |
long | getTimeout(TimeUnit timeUnit) | Returns the setting for how long a scanner will automatically retry when a failure occurs.
Iterator<Map.Entry<Key,Value>> | iterator() | Returns an iterator over an Accumulo table.
void | removeScanIterator(String iteratorName) | Remove an iterator from the list of iterators.
void | setBatchTimeout(long timeOut, TimeUnit timeUnit) | This setting determines how long a scanner will wait to fill the returned batch.
void | setClassLoaderContext(String classLoaderContext) | Sets the name of the classloader context on this scanner.
void | setConsistencyLevel(ScannerBase.ConsistencyLevel level) | Set the desired consistency level for this scanner.
default void | setExecutionHints(Map<String,String> hints) | Set hints for the configured ScanPrioritizer and ScanDispatcher.
void | setSamplerConfiguration(SamplerConfiguration samplerConfig) | Setting this will cause the scanner to read sample data, as long as that sample data was generated with the given configuration.
void | setTimeout(long timeOut, TimeUnit timeUnit) | This setting determines how long a scanner will automatically retry when a failure occurs.
default Stream<Map.Entry<Key,Value>> | stream() | Stream the Scanner results sequentially from this scanner's iterator.
void | updateScanIteratorOption(String iteratorName, String key, String value) | Update the options for an iterator.
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Method Details
-
addScanIterator
void addScanIterator(IteratorSetting cfg)
Add a server-side scan iterator.
- Parameters:
cfg - fully specified scan-time iterator, including all options for the iterator. Any changes to the iterator setting after this call are not propagated to the stored iterator.
- Throws:
IllegalArgumentException - if the setting conflicts with existing iterators
-
removeScanIterator
void removeScanIterator(String iteratorName)
Remove an iterator from the list of iterators.
- Parameters:
iteratorName - nickname used for the iterator
-
updateScanIteratorOption
void updateScanIteratorOption(String iteratorName, String key, String value)
Update the options for an iterator. Note that this does not change the iterator options during a scan; it only replaces the given option on a configured iterator before a scan is started.
- Parameters:
iteratorName - the name of the iterator to change
key - the name of the option
value - the new value for the named option
-
fetchColumnFamily
void fetchColumnFamily(org.apache.hadoop.io.Text col)
Adds a column family to the list of columns that will be fetched by this scanner. By default, when no columns have been added, the scanner fetches all columns. To fetch multiple column families, call this function multiple times.
This can help limit which locality groups are read on the server side.
When used in conjunction with custom iterators, the set of column families fetched is passed to the top iterator's seek method. Custom iterators may change this set of column families when calling seek on their source.
- Parameters:
col - the column family to be fetched
-
fetchColumnFamily
default void fetchColumnFamily(CharSequence colFam)
Adds a column family to the list of columns that will be fetched by this scanner. By default, when no columns have been added, the scanner fetches all columns. To fetch multiple column families, call this function multiple times.
This can help limit which locality groups are read on the server side.
When used in conjunction with custom iterators, the set of column families fetched is passed to the top iterator's seek method. Custom iterators may change this set of column families when calling seek on their source.
- Parameters:
colFam - the column family to be fetched
- Since:
2.0.0
-
fetchColumn
void fetchColumn(org.apache.hadoop.io.Text colFam, org.apache.hadoop.io.Text colQual)
Adds a column to the list of columns that will be fetched by this scanner. The column is identified by family and qualifier. By default, when no columns have been added, the scanner fetches all columns.
WARNING. Using this method with custom iterators may have unexpected results. Iterators have control over which column families are fetched. However, iterators have no control over which column qualifiers are fetched. When this method is called, it activates a system iterator that only allows the requested family/qualifier pairs through. This low-level filtering prevents custom iterators from requesting additional column families when calling seek.
For example, assume fetchColumn(A, Q1) and fetchColumn(B, Q1) are called on a scanner and a custom iterator is configured. The families (A,B) will be passed to the seek method of the custom iterator. If the custom iterator seeks its source iterator using the families (A,B,C), it will never see any data from C, because the system iterator filtering A:Q1 and B:Q1 will prevent the C family from getting through. ACCUMULO-3905 also has an example of the type of problem this method can cause.
tl;dr: if you use a custom iterator whose seek method adds column families, you may want to avoid this method.
- Parameters:
colFam - the column family of the column to be fetched
colQual - the column qualifier of the column to be fetched
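The interaction described in the warning can be modeled without a cluster. The following self-contained sketch uses a sorted map keyed by "family:qualifier" as a stand-in for a table. The class `ColumnFilterModel` and its methods are hypothetical and are not part of the Accumulo API; the sketch only illustrates why widening the families in a custom iterator's seek cannot bring back data that the family/qualifier filter has already dropped.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model, not Accumulo code. Keys have the form "family:qualifier".
final class ColumnFilterModel {

  // Models the system iterator activated by fetchColumn: only the requested
  // family/qualifier pairs pass through.
  static SortedMap<String, String> pairFilter(SortedMap<String, String> source, Set<String> pairs) {
    SortedMap<String, String> out = new TreeMap<>();
    source.forEach((k, v) -> {
      if (pairs.contains(k)) {
        out.put(k, v);
      }
    });
    return out;
  }

  // Models a custom iterator that seeks its source with a set of families.
  static SortedMap<String, String> seekFamilies(SortedMap<String, String> source, Set<String> families) {
    SortedMap<String, String> out = new TreeMap<>();
    source.forEach((k, v) -> {
      String family = k.substring(0, k.indexOf(':'));
      if (families.contains(family)) {
        out.put(k, v);
      }
    });
    return out;
  }

  static SortedMap<String, String> demo() {
    SortedMap<String, String> table = new TreeMap<>();
    table.put("A:Q1", "a1");
    table.put("B:Q1", "b1");
    table.put("C:Q1", "c1");
    // fetchColumn(A, Q1) and fetchColumn(B, Q1): the pair filter sits below the custom iterator.
    SortedMap<String, String> visible = pairFilter(table, Set.of("A:Q1", "B:Q1"));
    // The custom iterator widens the families to (A, B, C), but C was already filtered out.
    return seekFamilies(visible, Set.of("A", "B", "C"));
  }
}
```

In this model `demo()` yields only the A:Q1 and B:Q1 entries, which mirrors the behavior the warning describes.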
-
fetchColumn
default void fetchColumn(CharSequence colFam, CharSequence colQual)
Adds a column to the list of columns that will be fetched by this scanner. The column is identified by family and qualifier. By default, when no columns have been added, the scanner fetches all columns. See the warning on fetchColumn(Text, Text).
- Parameters:
colFam - the column family of the column to be fetched
colQual - the column qualifier of the column to be fetched
- Since:
2.0.0
-
fetchColumn
void fetchColumn(IteratorSetting.Column column)
Adds a column to the list of columns that will be fetched by this scanner.
- Parameters:
column - the IteratorSetting.Column to fetch
- Since:
1.7.0
-
clearColumns
void clearColumns()
Clears the columns to be fetched (useful for resetting the scanner for reuse). Once cleared, the scanner will fetch all columns.
-
clearScanIterators
void clearScanIterators()
Clears scan iterators prior to returning a scanner to the pool.
-
iterator
Iterator<Map.Entry<Key,Value>> iterator()
Returns an iterator over an Accumulo table. This iterator uses the options that are currently set for its lifetime, so setting options will have no effect on existing iterators.
Keys returned by the iterator are not guaranteed to be in sorted order.
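The "options are fixed for the iterator's lifetime" rule can be pictured with a small stand-in. The sketch below is a hypothetical model (the class `SnapshotScannerModel` and its `prefix` option are invented for illustration and are not Accumulo types); it shows one way an iterable might capture its current options when `iterator()` is called, so that later changes affect only iterators created afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a scanner; not an Accumulo type.
final class SnapshotScannerModel implements Iterable<String> {
  private final List<String> rows;
  private String prefix = ""; // a mutable "option" on the scanner

  SnapshotScannerModel(List<String> rows) {
    this.rows = new ArrayList<>(rows);
  }

  void setPrefix(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Iterator<String> iterator() {
    // The option is copied here, so this iterator keeps it for its lifetime.
    final String snapshot = prefix;
    return rows.stream().filter(r -> r.startsWith(snapshot)).iterator();
  }

  // Collects the remaining elements of an iterator into a list.
  static List<String> drain(Iterator<String> it) {
    List<String> out = new ArrayList<>();
    it.forEachRemaining(out::add);
    return out;
  }
}
```

With this model, an iterator obtained before `setPrefix("b")` still filters by the earlier prefix, while a new call to `iterator()` picks up the new value.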
-
setTimeout
void setTimeout(long timeOut, TimeUnit timeUnit)
This setting determines how long a scanner will automatically retry when a failure occurs. By default, a scanner will retry forever.
Setting the timeout to zero (with any time unit) or Long.MAX_VALUE (with TimeUnit.MILLISECONDS) means no timeout.
- Parameters:
timeOut - the length of the timeout
timeUnit - the units of the timeout
- Since:
1.5.0
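Since "no timeout" has two spellings (zero in any unit, or Long.MAX_VALUE milliseconds), client code that compares or logs timeouts may want to normalize them first. The helper below is a minimal sketch of that idea using only `java.util.concurrent.TimeUnit`. The class `TimeoutModel` and its methods are hypothetical, and rejecting negative values is an assumption of this sketch, not documented behavior.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the Accumulo API.
final class TimeoutModel {
  static final long NO_TIMEOUT = Long.MAX_VALUE;

  // Normalizes a timeout to milliseconds; zero in any unit means "no timeout".
  // Rejecting negative values is an assumption made for this sketch.
  static long normalizeMillis(long timeOut, TimeUnit unit) {
    if (timeOut < 0) {
      throw new IllegalArgumentException("timeout must not be negative: " + timeOut);
    }
    // TimeUnit.toMillis saturates at Long.MAX_VALUE instead of overflowing.
    return timeOut == 0 ? NO_TIMEOUT : unit.toMillis(timeOut);
  }

  // True when the given setting means "retry forever" under the rule above.
  static boolean isUnbounded(long timeOut, TimeUnit unit) {
    return normalizeMillis(timeOut, unit) == NO_TIMEOUT;
  }
}
```

Under this sketch, `isUnbounded(0, TimeUnit.SECONDS)` and `isUnbounded(Long.MAX_VALUE, TimeUnit.MILLISECONDS)` are both true, while a 30 second timeout normalizes to 30000 milliseconds.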
-
getTimeout
long getTimeout(TimeUnit timeUnit)
Returns the setting for how long a scanner will automatically retry when a failure occurs.
- Returns:
the timeout configured for this scanner
- Since:
1.5.0
-
close
void close()
Closes any underlying connections on the scanner. This may invalidate any iterators derived from the Scanner, causing them to throw exceptions.
- Specified by:
close in interface AutoCloseable
- Since:
1.5.0
-
getAuthorizations
Authorizations getAuthorizations()
Returns the authorizations that have been set on the scanner.
- Returns:
The authorizations set on the scanner instance
- Since:
1.7.0
-
setSamplerConfiguration
void setSamplerConfiguration(SamplerConfiguration samplerConfig)
Setting this will cause the scanner to read sample data, as long as that sample data was generated with the given configuration. By default this is not set and all data is read.
One way to use this method is as follows, where the sampler configuration is obtained from the table configuration. Sample data can be generated in many different ways, so it's important to verify that the sample data configuration meets expectations.

    // could cache this if creating many scanners to avoid RPCs.
    SamplerConfiguration samplerConfig = client.tableOperations().getSamplerConfiguration(table);
    // verify table's sample data is generated in an expected way before using
    userCode.verifySamplerConfig(samplerConfig);
    scanner.setSamplerConfiguration(samplerConfig);

Of course this is not the only way to obtain a SamplerConfiguration; it could be a constant, configuration, etc.
If sample data is not present or sample data was generated with a different configuration, then the scanner iterator will throw a SampleNotPresentException. Also, if a table's sampler configuration is changed while a scanner is iterating over a table, a SampleNotPresentException may be thrown.
- Since:
1.8.0
-
getSamplerConfiguration
SamplerConfiguration getSamplerConfiguration()
- Returns:
currently set sampler configuration. Returns null if no sampler configuration is set.
- Since:
1.8.0
-
clearSamplerConfiguration
void clearSamplerConfiguration()
Clears sampler configuration making a scanner read all data. After calling this, getSamplerConfiguration() should return null.
- Since:
1.8.0
-
setBatchTimeout
void setBatchTimeout(long timeOut, TimeUnit timeUnit)
This setting determines how long a scanner will wait to fill the returned batch. By default, a scanner waits until the batch is full.
Setting the timeout to zero (with any time unit) or Long.MAX_VALUE (with TimeUnit.MILLISECONDS) means no timeout.
- Parameters:
timeOut - the length of the timeout
timeUnit - the units of the timeout
- Since:
1.8.0
-
getBatchTimeout
long getBatchTimeout(TimeUnit timeUnit)
Returns the timeout to fill a batch in the given TimeUnit.
- Returns:
the batch timeout configured for this scanner
- Since:
1.8.0
-
setClassLoaderContext
void setClassLoaderContext(String classLoaderContext)
Sets the name of the classloader context on this scanner. See the administration chapter of the user manual for details on how to configure and use classloader contexts.
- Parameters:
classLoaderContext - name of the classloader context
- Throws:
NullPointerException - if context is null
- Since:
1.8.0
-
clearClassLoaderContext
void clearClassLoaderContext()
Clears the current classloader context set on this scanner.
- Since:
1.8.0
-
getClassLoaderContext
String getClassLoaderContext()
Returns the name of the current classloader context set on this scanner.
- Returns:
name of the current context
- Since:
1.8.0
-
setExecutionHints
default void setExecutionHints(Map<String,String> hints)
Set hints for the configured ScanPrioritizer and ScanDispatcher. These hints are available on the server side via ScanInfo.getExecutionHints(). Depending on the configuration, these hints may be ignored. Hints will never impact what data is returned by a scan, only how quickly it is returned.
Using the hint scan_type=<type> and documenting all of the types for your application is one strategy to consider. This allows administrators to adjust executor and prioritizer config for your application scan types without having to change the application source code.
The default configuration for Accumulo will ignore hints. See HintScanPrioritizer and SimpleScanDispatcher for examples of classes that can react to hints.
- Since:
2.0.0
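As a sketch of the scan_type strategy, an application could keep its scan types in one enum and build the hints map from it, which also gives administrators a single place to look up the documented types. The enum, its constants, and the helper below are hypothetical; only the `scan_type` key and the `Map<String,String>` argument type come from the text above. The actual call to `setExecutionHints` appears only in a comment so that the block compiles without Accumulo on the classpath.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; the scan types are invented for illustration.
final class ScanHints {

  // Application scan types, documented for administrators.
  enum ScanType { INTERACTIVE, BATCH_EXPORT }

  // Builds the hints map for one scan type, using the lower-cased enum name as the value.
  static Map<String, String> forType(ScanType type) {
    return Map.of("scan_type", type.name().toLowerCase(Locale.ROOT));
  }
}

// With a real scanner in scope, the map would be passed as:
//   scanner.setExecutionHints(ScanHints.forType(ScanHints.ScanType.INTERACTIVE));
```

Whether such a hint has any effect depends on the server-side prioritizer and dispatcher configuration, as noted above.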
-
forEach
default void forEach(BiConsumer<? super Key,? super Value> keyValueConsumer)
Iterates through Scanner results.
- Parameters:
keyValueConsumer - user-defined BiConsumer
- Since:
2.1.0
-
getConsistencyLevel
ScannerBase.ConsistencyLevel getConsistencyLevel()
Get the configured consistency level.
- Returns:
consistency level
- Since:
2.1.0
-
setConsistencyLevel
void setConsistencyLevel(ScannerBase.ConsistencyLevel level)
Set the desired consistency level for this scanner.
- Parameters:
level - consistency level
- Since:
2.1.0
-
stream
default Stream<Map.Entry<Key,Value>> stream()
Stream the Scanner results sequentially from this scanner's iterator.
- Returns:
a Stream of the returned key-value pairs
- Since:
2.1.0
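Because a scanner is an Iterable of Map.Entry pairs, standard Java idioms apply to it. The sketch below uses a plain list of entries as a stand-in for a scanner, with String keys and values in place of Key and Value. It shows one way a sequential stream and a BiConsumer-style loop can be derived from any such Iterable; the class `EntryIteration` and its methods are hypothetical, and this is not presented as Accumulo's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper working on any Iterable of entries; not an Accumulo type.
final class EntryIteration {

  // Sequential (non-parallel) stream backed by the iterable's spliterator.
  static <K, V> Stream<Map.Entry<K, V>> stream(Iterable<Map.Entry<K, V>> source) {
    return StreamSupport.stream(source.spliterator(), false);
  }

  // BiConsumer-style iteration: unpack each entry into its key and value.
  static <K, V> void forEach(Iterable<Map.Entry<K, V>> source,
      BiConsumer<? super K, ? super V> consumer) {
    for (Map.Entry<K, V> e : source) {
      consumer.accept(e.getKey(), e.getValue());
    }
  }

  static List<String> demo() {
    // Stand-in for scan results.
    List<Map.Entry<String, String>> rows = List.of(
        Map.entry("row1", "a"),
        Map.entry("row2", "b"));
    List<String> out = new ArrayList<>();
    forEach(rows, (k, v) -> out.add(k + "=" + v));
    out.addAll(stream(rows).map(Map.Entry::getValue).collect(Collectors.toList()));
    return out;
  }
}
```

With a real scanner, the equivalent calls would be `scanner.forEach((k, v) -> ...)` and `scanner.stream()`, ideally inside a try-with-resources block since the scanner is AutoCloseable.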
-