Interface BatchScanner

All Superinterfaces:
AutoCloseable, Iterable<Map.Entry<Key,Value>>, ScannerBase
All Known Implementing Classes:
MockBatchDeleter, MockBatchScanner

public interface BatchScanner extends ScannerBase
In exchange for possibly returning scanned entries out of order, BatchScanner implementations may scan an Accumulo table more efficiently by
  • Looking up multiple ranges in parallel. Parallelism is constrained by the number of threads available to the BatchScanner, set in its constructor.
  • Breaking up large ranges into subranges. Often the number and boundaries of subranges are determined by a table's split points.
  • Combining multiple ranges into a single RPC call to a tablet server.
The above techniques lead to better performance than a Scanner in use cases such as
  • Retrieving many small ranges
  • Scanning a large range that returns many entries
  • Running server-side iterators that perform computation, even if few entries are returned from the scan itself
To re-emphasize, only use a BatchScanner when you do not care whether returned data is in sorted order. Use a Scanner instead when sorted order is important.

A BatchScanner instance will use no more threads than were provided when the BatchScanner implementation was constructed. Multiple invocations of iterator() all share the same resources of the instance. Create a new BatchScanner instance to allocate additional threads.
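
The trade-off described above (parallel range lookups on a bounded number of threads, with results arriving out of order) can be illustrated with a toy model. This is only a sketch, not how Accumulo implements it: the class name ParallelRangeLookupSketch, the in-memory TreeMap standing in for a table, and the String[] {start, end} range encoding are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: a sorted in-memory map stands in for a table, and a fixed
// thread pool stands in for the query threads given to a BatchScanner.
class ParallelRangeLookupSketch {

    // Looks up each [start, end) key range as its own task and returns the
    // entries in task-completion order, which need not match key order.
    static List<Map.Entry<String, String>> lookup(TreeMap<String, String> table,
            List<String[]> ranges, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            ExecutorCompletionService<List<Map.Entry<String, String>>> ecs =
                    new ExecutorCompletionService<>(pool);
            for (String[] r : ranges) {
                // The table is only read here, so concurrent access is safe.
                ecs.submit(() -> new ArrayList<>(table.subMap(r[0], r[1]).entrySet()));
            }
            List<Map.Entry<String, String>> out = new ArrayList<>();
            for (int i = 0; i < ranges.size(); i++) {
                out.addAll(ecs.take().get()); // whichever range finishes first
            }
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("lookup failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller who needs key order after such a lookup has to sort the collected entries, for example with out.sort(Map.Entry.comparingByKey()), which is the work a Scanner avoids by returning entries in sorted order to begin with.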

  • Method Details

    • setRanges

      void setRanges(Collection<Range> ranges)
      Allows scanning over multiple ranges efficiently.
      Parameters:
      ranges - specifies the non-overlapping ranges to query
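
      Because the ranges are expected to be non-overlapping, callers that build ranges from arbitrary input may want to merge them first. Accumulo's Range class provides Range.mergeOverlapping(Collection<Range>) for real Range objects. The sketch below shows the same idea on plain closed integer intervals; MergeRangesSketch and the int[] {start, end} encoding are hypothetical stand-ins, not part of the Accumulo API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for range merging; intervals are closed {start, end}.
class MergeRangesSketch {

    // Returns a new list in which no two intervals overlap.
    static List<int[]> merge(List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(r -> r[0])); // order by start
        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[0] <= last[1]) {
                // Overlaps the previous interval: extend it instead of adding.
                last[1] = Math.max(last[1], r[1]);
            } else {
                merged.add(new int[] {r[0], r[1]}); // copy so inputs stay untouched
            }
        }
        return merged;
    }
}
```

      With this sketch, the intervals {1,5}, {3,8} and {10,12} merge into {1,8} and {10,12}.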
    • close

      void close()
      Description copied from interface: ScannerBase
      Closes any underlying connections on the scanner. This may invalidate any iterators derived from the Scanner, causing them to throw exceptions.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface ScannerBase
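
      The interaction between close() and previously obtained iterators can be sketched with a toy class. ClosableScannerSketch is hypothetical and holds its entries in memory; the choice of IllegalStateException is an assumption made for illustration, since the text above only says that iterators may throw exceptions.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical toy scanner: iterators stop working once close() is called.
class ClosableScannerSketch implements AutoCloseable, Iterable<String> {
    private final List<String> entries;
    private volatile boolean closed = false;

    ClosableScannerSketch(List<String> entries) {
        this.entries = entries;
    }

    @Override
    public Iterator<String> iterator() {
        Iterator<String> it = entries.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                ensureOpen();
                return it.hasNext();
            }

            @Override
            public String next() {
                ensureOpen();
                return it.next();
            }
        };
    }

    // Assumed behavior for this sketch: fail fast after close().
    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("scanner closed");
        }
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

      Since BatchScanner is AutoCloseable, a try-with-resources block that consumes the iterator fully inside the block avoids touching an iterator after close().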
    • setTimeout

      void setTimeout(long timeout, TimeUnit timeUnit)
      This setting determines how long a scanner will automatically retry when a failure occurs. By default, a scanner will retry forever.

      Setting the timeout to zero (with any time unit) or Long.MAX_VALUE (with TimeUnit.MILLISECONDS) means no timeout.

      The batch scanner will accomplish as much work as possible before throwing an exception. BatchScanner iterators will throw a TimedOutException when all needed servers time out.

      Specified by:
      setTimeout in interface ScannerBase
      Parameters:
      timeout - the length of the timeout
      timeUnit - the units of the timeout
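
      The "no timeout" rule above can be expressed as a small normalization step. This is a sketch under assumptions, not Accumulo's code: ScanTimeoutSketch, NO_TIMEOUT and toEffectiveMillis are hypothetical names, and rejecting negative values is an assumption the text above does not state.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper modelling the documented "no timeout" values.
class ScanTimeoutSketch {
    static final long NO_TIMEOUT = Long.MAX_VALUE;

    // Converts a (timeout, unit) pair to milliseconds, mapping zero in any
    // unit, and Long.MAX_VALUE milliseconds, to NO_TIMEOUT.
    static long toEffectiveMillis(long timeout, TimeUnit unit) {
        if (timeout < 0) {
            // Assumption for this sketch only.
            throw new IllegalArgumentException("timeout must be non-negative");
        }
        if (timeout == 0) {
            return NO_TIMEOUT;
        }
        // TimeUnit.toMillis saturates at Long.MAX_VALUE on overflow.
        return unit.toMillis(timeout);
    }

    static boolean hasTimeout(long timeout, TimeUnit unit) {
        return toEffectiveMillis(timeout, unit) != NO_TIMEOUT;
    }
}
```

      Under this model, setTimeout(0, TimeUnit.SECONDS) and setTimeout(Long.MAX_VALUE, TimeUnit.MILLISECONDS) both mean retry forever, while setTimeout(5, TimeUnit.SECONDS) corresponds to 5000 milliseconds.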