Interface ScannerBase

All Superinterfaces:
AutoCloseable, Iterable<Map.Entry<Key,Value>>
All Known Subinterfaces:
BatchDeleter, BatchScanner, Scanner
All Known Implementing Classes:
ClientSideIteratorScanner, IsolatedScanner, MockBatchDeleter, MockBatchScanner, MockScanner, MockScannerBase, org.apache.accumulo.core.client.impl.ScannerOptions

public interface ScannerBase extends Iterable<Map.Entry<Key,Value>>, AutoCloseable
This interface hosts configuration methods that are shared between different types of scanners.
  • Method Details

    • addScanIterator

      void addScanIterator(IteratorSetting cfg)
      Add a server-side scan iterator.
      Parameters:
      cfg - fully specified scan-time iterator, including all options for the iterator. Any changes to the iterator setting after this call are not propagated to the stored iterator.
      Throws:
      IllegalArgumentException - if the setting conflicts with existing iterators
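
      The copy-on-add behavior described above can be illustrated with a small, self-contained model. This is only a sketch: Setting and Config are hypothetical stand-ins (not Accumulo classes), and the conflict rule used here (duplicate name or duplicate priority) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScanIteratorSketch {

    /** Hypothetical stand-in for IteratorSetting: priority, name, and options. */
    static final class Setting {
        final int priority;
        final String name;
        final Map<String, String> options = new HashMap<>();

        Setting(int priority, String name) {
            this.priority = priority;
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the scanner's stored list of iterators. */
    static final class Config {
        private final List<Setting> stored = new ArrayList<>();

        void addScanIterator(Setting cfg) {
            for (Setting s : stored) {
                // Assumed conflict rule: duplicate name or duplicate priority.
                if (s.name.equals(cfg.name) || s.priority == cfg.priority) {
                    throw new IllegalArgumentException("conflicts with existing iterator: " + s.name);
                }
            }
            // Store a copy so that later changes to cfg are not propagated.
            Setting copy = new Setting(cfg.priority, cfg.name);
            copy.options.putAll(cfg.options);
            stored.add(copy);
        }

        /** Returns the stored value of an option, or null if the iterator or option is unknown. */
        String storedOption(String iteratorName, String key) {
            for (Setting s : stored) {
                if (s.name.equals(iteratorName)) {
                    return s.options.get(key);
                }
            }
            return null;
        }
    }

    static String demo() {
        Setting cfg = new Setting(30, "filter");
        cfg.options.put("ttl", "1000");
        Config scanner = new Config();
        scanner.addScanIterator(cfg);
        cfg.options.put("ttl", "5000"); // changed after the call; the stored copy is unaffected
        return scanner.storedOption("filter", "ttl");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1000
    }
}
```

      In this model, the practical consequence is that an iterator should be fully configured before it is added; changing the caller's object afterwards has no effect on what was stored.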
    • removeScanIterator

      void removeScanIterator(String iteratorName)
      Remove an iterator from the list of iterators.
      Parameters:
      iteratorName - nickname used for the iterator
    • updateScanIteratorOption

      void updateScanIteratorOption(String iteratorName, String key, String value)
      Update the options for an iterator. Note that this does not change the iterator options during a scan; it only replaces the given option on a configured iterator before a scan is started.
      Parameters:
      iteratorName - the name of the iterator to change
      key - the name of the option
      value - the new value for the named option
    • fetchColumnFamily

      void fetchColumnFamily(org.apache.hadoop.io.Text col)
      Adds a column family to the list of columns that will be fetched by this scanner. By default when no columns have been added the scanner fetches all columns. To fetch multiple column families call this function multiple times.

      This can help limit which locality groups are read on the server side.

      When used in conjunction with custom iterators, the set of column families fetched is passed to the top iterator's seek method. Custom iterators may change this set of column families when calling seek on their source.

      Parameters:
      col - the column family to be fetched
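
      The default described above (no families added means all families are fetched) can be modeled in a few lines. This is a sketch only: the class and the wouldFetch method are hypothetical, families are plain strings rather than Text, and the effect of custom iterators is ignored.

```java
import java.util.HashSet;
import java.util.Set;

class ColumnFamilySelectionSketch {
    // An empty set stands for "fetch all column families".
    private final Set<String> fetchedFamilies = new HashSet<>();

    /** Call once per family to fetch several families. */
    void fetchColumnFamily(String family) {
        fetchedFamilies.add(family);
    }

    /** Resets the selection, so every family is fetched again. */
    void clearColumns() {
        fetchedFamilies.clear();
    }

    /** Hypothetical helper: would a cell in this family be returned? */
    boolean wouldFetch(String family) {
        return fetchedFamilies.isEmpty() || fetchedFamilies.contains(family);
    }

    public static void main(String[] args) {
        ColumnFamilySelectionSketch s = new ColumnFamilySelectionSketch();
        System.out.println(s.wouldFetch("meta")); // true: nothing selected yet
        s.fetchColumnFamily("data");
        s.fetchColumnFamily("index");
        System.out.println(s.wouldFetch("meta")); // false: only data and index are selected
        s.clearColumns();
        System.out.println(s.wouldFetch("meta")); // true again after clearing
    }
}
```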
    • fetchColumn

      void fetchColumn(org.apache.hadoop.io.Text colFam, org.apache.hadoop.io.Text colQual)
      Adds a column to the list of columns that will be fetched by this scanner. The column is identified by family and qualifier. By default when no columns have been added the scanner fetches all columns.

      WARNING. Using this method with custom iterators may have unexpected results. Iterators have control over which column families are fetched. However iterators have no control over which column qualifiers are fetched. When this method is called it activates a system iterator that only allows the requested family/qualifier pairs through. This low level filtering prevents custom iterators from requesting additional column families when calling seek.

      For example, assume fetchColumn(A, Q1) and fetchColumn(B, Q1) are called on a scanner and a custom iterator is configured. The families (A,B) will be passed to the seek method of the custom iterator. If the custom iterator seeks its source iterator using the families (A,B,C), it will never see any data from C because the system iterator filtering A:Q1 and B:Q1 will prevent the C family from getting through. ACCUMULO-3905 also has an example of the type of problem this method can cause.

      In short: if a custom iterator's seek method adds column families, you may want to avoid using this method.
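
      The A, B, C scenario above can be traced with a small model. This is a sketch under stated assumptions, not Accumulo code: cells are plain "family:qualifier" strings, and systemColumnFilter and seekFamilies are hypothetical stand-ins for the system iterator and for a custom iterator seeking its source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FetchColumnSketch {

    /** Stand-in for the system iterator: passes only the requested family:qualifier pairs. */
    static List<String> systemColumnFilter(List<String> cells, Set<String> fetchedPairs) {
        List<String> out = new ArrayList<>();
        for (String cell : cells) {
            if (fetchedPairs.contains(cell)) {
                out.add(cell);
            }
        }
        return out;
    }

    /** Stand-in for a custom iterator seeking its source with a set of column families. */
    static List<String> seekFamilies(List<String> source, Set<String> families) {
        List<String> out = new ArrayList<>();
        for (String cell : source) {
            String family = cell.substring(0, cell.indexOf(':'));
            if (families.contains(family)) {
                out.add(cell);
            }
        }
        return out;
    }

    static List<String> demo() {
        List<String> table = Arrays.asList("A:Q1", "A:Q2", "B:Q1", "C:Q1");
        // Models fetchColumn(A, Q1) and fetchColumn(B, Q1).
        Set<String> fetched = new HashSet<>(Arrays.asList("A:Q1", "B:Q1"));
        // The system filter sits below the custom iterator, so it runs first.
        List<String> source = systemColumnFilter(table, fetched);
        // The custom iterator widens the families to (A, B, C) when seeking its source.
        return seekFamilies(source, new HashSet<>(Arrays.asList("A", "B", "C")));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [A:Q1, B:Q1]; C:Q1 never gets through
    }
}
```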

      Parameters:
      colFam - the column family of the column to be fetched
      colQual - the column qualifier of the column to be fetched
    • fetchColumn

      void fetchColumn(IteratorSetting.Column column)
      Adds a column to the list of columns that will be fetched by this scanner.
      Parameters:
      column - the IteratorSetting.Column to fetch
      Since:
      1.7.0
    • clearColumns

      void clearColumns()
      Clears the columns to be fetched (useful for resetting the scanner for reuse). Once cleared, the scanner will fetch all columns.
    • clearScanIterators

      void clearScanIterators()
      Clears scan iterators prior to returning a scanner to the pool.
    • iterator

      Iterator<Map.Entry<Key,Value>> iterator()
      Returns an iterator over an Accumulo table. For its lifetime, this iterator uses the options that were set when it was created, so changing options later has no effect on existing iterators.

      Keys returned by the iterator are not guaranteed to be in sorted order.

      Specified by:
      iterator in interface Iterable<Map.Entry<Key,Value>>
      Returns:
      an iterator over Key,Value pairs which meet the restrictions set on the scanner
    • setTimeout

      void setTimeout(long timeOut, TimeUnit timeUnit)
      This setting determines how long a scanner will automatically retry when a failure occurs. By default, a scanner will retry forever.

      Setting the timeout to zero (with any time unit) or Long.MAX_VALUE (with TimeUnit.MILLISECONDS) means no timeout.
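
      One way to read the rule above is that both "no timeout" spellings collapse to a single stored value. The following is a sketch under that assumption, not the library's implementation: TimeoutSketch and hasTimeout are hypothetical, and rejecting negative values is an assumption not stated in this documentation. The getter mirrors the unit conversion that getTimeout(TimeUnit) performs.

```java
import java.util.concurrent.TimeUnit;

class TimeoutSketch {
    // Stored in milliseconds; Long.MAX_VALUE stands for "no timeout" (retry forever).
    private long timeOutMillis = Long.MAX_VALUE;

    void setTimeout(long timeOut, TimeUnit timeUnit) {
        if (timeOut < 0) {
            // Assumption: negative timeouts are rejected.
            throw new IllegalArgumentException("timeout must not be negative: " + timeOut);
        }
        // Zero in any unit means no timeout; other values are converted to milliseconds.
        timeOutMillis = (timeOut == 0) ? Long.MAX_VALUE : timeUnit.toMillis(timeOut);
    }

    /** Returns the stored timeout converted to the requested unit. */
    long getTimeout(TimeUnit timeUnit) {
        return timeUnit.convert(timeOutMillis, TimeUnit.MILLISECONDS);
    }

    /** Hypothetical helper: true if a finite timeout is configured. */
    boolean hasTimeout() {
        return timeOutMillis != Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        TimeoutSketch scanner = new TimeoutSketch();
        scanner.setTimeout(30, TimeUnit.SECONDS);
        System.out.println(scanner.getTimeout(TimeUnit.MILLISECONDS)); // prints 30000
        scanner.setTimeout(0, TimeUnit.HOURS);
        System.out.println(scanner.hasTimeout()); // prints false
    }
}
```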

      Parameters:
      timeOut - the length of the timeout
      timeUnit - the units of the timeout
      Since:
      1.5.0
    • getTimeout

      long getTimeout(TimeUnit timeUnit)
      Returns the setting for how long a scanner will automatically retry when a failure occurs.
      Returns:
      the timeout configured for this scanner
      Since:
      1.5.0
    • close

      void close()
      Closes any underlying connections on the scanner. This may invalidate any iterators derived from the Scanner, causing them to throw exceptions.
      Specified by:
      close in interface AutoCloseable
      Since:
      1.5.0
    • getAuthorizations

      Authorizations getAuthorizations()
      Returns the authorizations that have been set on the scanner.
      Returns:
      The authorizations set on the scanner instance
      Since:
      1.7.0
    • setSamplerConfiguration

      void setSamplerConfiguration(SamplerConfiguration samplerConfig)
      Setting this will cause the scanner to read sample data, as long as that sample data was generated with the given configuration. By default this is not set and all data is read.

      One way to use this method is as follows, where the sampler configuration is obtained from the table configuration. Sample data can be generated in many different ways, so it's important to verify the sample data configuration meets expectations.

       
         // could cache this if creating many scanners to avoid RPCs.
         SamplerConfiguration samplerConfig =
           connector.tableOperations().getSamplerConfiguration(table);
         // verify table's sample data is generated in an expected way before using
         userCode.verifySamplerConfig(samplerConfig);
         scanner.setSamplerConfiguration(samplerConfig);
       
       

      Of course, this is not the only way to obtain a SamplerConfiguration; it could also come from a constant, a configuration file, etc.

      If sample data is not present or sample data was generated with a different configuration, then the scanner iterator will throw a SampleNotPresentException. Also if a table's sampler configuration is changed while a scanner is iterating over a table, a SampleNotPresentException may be thrown.

      Since:
      1.8.0
    • getSamplerConfiguration

      SamplerConfiguration getSamplerConfiguration()
      Returns:
      currently set sampler configuration. Returns null if no sampler configuration is set.
      Since:
      1.8.0
    • clearSamplerConfiguration

      void clearSamplerConfiguration()
      Clears sampler configuration making a scanner read all data. After calling this, getSamplerConfiguration() should return null.
      Since:
      1.8.0
    • setBatchTimeout

      void setBatchTimeout(long timeOut, TimeUnit timeUnit)
      This setting determines how long a scanner will wait to fill the returned batch. By default, a scanner waits until the batch is full.

      Setting the timeout to zero (with any time unit) or Long.MAX_VALUE (with TimeUnit.MILLISECONDS) means no timeout.

      Parameters:
      timeOut - the length of the timeout
      timeUnit - the units of the timeout
      Since:
      1.8.0
    • getBatchTimeout

      long getBatchTimeout(TimeUnit timeUnit)
      Returns the timeout to fill a batch in the given TimeUnit.
      Returns:
      the batch timeout configured for this scanner
      Since:
      1.8.0
    • setClassLoaderContext

      void setClassLoaderContext(String classLoaderContext)
      Sets the name of the classloader context on this scanner. See the administration chapter of the user manual for details on how to configure and use classloader contexts.
      Parameters:
      classLoaderContext - name of the classloader context
      Throws:
      NullPointerException - if classLoaderContext is null
      Since:
      1.8.0
    • clearClassLoaderContext

      void clearClassLoaderContext()
      Clears the current classloader context set on this scanner.
      Since:
      1.8.0
    • getClassLoaderContext

      String getClassLoaderContext()
      Returns the name of the current classloader context set on this scanner.
      Returns:
      name of the current context
      Since:
      1.8.0