Interface Sampler

All Known Implementing Classes:
AbstractHashSampler, RowColumnSampler, RowSampler

public interface Sampler
A function that decides which key/value pairs are stored in a table's sample. As Accumulo compacts data and creates rfiles, it uses a Sampler to decide what to store in each rfile's sample section. The class name of the Sampler and the Sampler's configuration are stored in each rfile. A scan of a table's sample will only succeed if all rfiles were created with the same sampler and sampler configuration.
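
The consistency rule in the last sentence can be sketched as a simple equality check over per-file metadata. This is a minimal illustration, not Accumulo's implementation: SampleMetadata and SampleScanCheck are hypothetical stand-ins for the sampler class name and options recorded in each rfile, and the option values in the usage are only examples.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the sampler metadata recorded in one rfile:
// the Sampler class name plus its configuration options.
final class SampleMetadata {
    final String samplerClassName;
    final Map<String, String> options;

    SampleMetadata(String samplerClassName, Map<String, String> options) {
        this.samplerClassName = samplerClassName;
        this.options = Map.copyOf(options);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SampleMetadata)) {
            return false;
        }
        SampleMetadata other = (SampleMetadata) o;
        return samplerClassName.equals(other.samplerClassName) && options.equals(other.options);
    }

    @Override
    public int hashCode() {
        return Objects.hash(samplerClassName, options);
    }
}

final class SampleScanCheck {
    // A sample scan is allowed only if every rfile records the same sampler
    // class and the same options. An empty list is rejected here (a choice of
    // this sketch), and a file with no sample metadata (null) also fails.
    static boolean canScanSample(List<SampleMetadata> files) {
        if (files.isEmpty() || files.get(0) == null) {
            return false;
        }
        SampleMetadata first = files.get(0);
        for (SampleMetadata m : files) {
            if (!first.equals(m)) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing class name and options together reflects why changing either one invalidates sample scans until the affected rfiles are rewritten.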

Since the decisions a Sampler makes are persisted, a Sampler should always behave the same way for a given configuration. One way to offer new behavior is to add new options, while still supporting the old behavior under the Sampler's existing options.

Ideally, a sampler that selects a Key k1 would also select updates to k1. For example, if a Sampler selects row='000989' family='name' qualifier='last' visibility='ADMIN' time=9 value='Doe', it should ideally also select row='000989' family='name' qualifier='last' visibility='ADMIN' time=20 value='Dough'. Hashing the key fields that stay the same across updates (such as the row and column, but not the timestamp) and taking the result modulo some number is a good way to accomplish this, and AbstractHashSampler provides a good basis for an implementation.
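
A minimal sketch of the hash-and-modulo idea follows. It does not use Accumulo's classes: SimpleKey is a hypothetical simplified stand-in for Key, and CRC32 is swapped in as the hash purely for illustration (it is not necessarily what AbstractHashSampler uses). Because the timestamp is left out of the hash, the two example keys above always get the same decision.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical, simplified stand-in for Accumulo's Key (not the real class).
final class SimpleKey {
    final String row, family, qualifier, visibility;
    final long timestamp;

    SimpleKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }
}

// Sketch of a hash-and-modulo sampler. Only fields that stay the same across
// updates are hashed, so the timestamp never influences the decision.
final class HashModSampler {
    private final long modulus;

    HashModSampler(long modulus) {
        if (modulus <= 0) {
            throw new IllegalArgumentException("modulus must be positive: " + modulus);
        }
        this.modulus = modulus;
    }

    boolean accept(SimpleKey k) {
        CRC32 crc = new CRC32();
        for (String field : new String[] {k.row, k.family, k.qualifier, k.visibility}) {
            crc.update(field.getBytes(StandardCharsets.UTF_8));
            // Separator byte so ("ab", "c") and ("a", "bc") feed different bytes.
            crc.update(0);
        }
        // getValue() is non-negative, so this keeps roughly 1 in `modulus` keys.
        return crc.getValue() % modulus == 0;
    }
}
```

Hashing only the row would instead keep or drop whole rows together, which is a different trade-off over which updates stay grouped in the sample.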

Since:
1.8.0
  • Method Details

    • init

      void init(SamplerConfiguration config)
      An implementation of Sampler must have a no-argument constructor. After construction, this method is called once to initialize the sampler before it is used.
      Parameters:
      config - Configuration options for a sampler.
    • accept

      boolean accept(Key k)
      Parameters:
      k - A key that was written to an rfile.
      Returns:
      true if the key (and its associated value) should be stored in the rfile's sample; false if it should not be included.
    • validateOptions

      default void validateOptions(Map<String,String> config)
      Parameters:
      config - Sampler options configuration to validate. Both the option names and their values are validated.
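
The lifecycle described by these methods (construct with no arguments, initialize once, then call accept per key) can be sketched without Accumulo on the classpath. Everything here is hypothetical: MiniSampler mirrors the contract but takes a plain Map instead of SamplerConfiguration and a row String instead of Key, the "modulus" option is invented for illustration, and calling validateOptions from init is a choice of this sketch rather than documented Accumulo behavior.

```java
import java.util.Map;

// Hypothetical mirror of the Sampler contract (simplified types).
interface MiniSampler {
    void init(Map<String, String> options);

    boolean accept(String row);

    default void validateOptions(Map<String, String> options) {}
}

// Hypothetical sampler keeping rows whose hash is divisible by "modulus".
final class RowModSampler implements MiniSampler {
    private int modulus;

    // No-arg constructor: the sampler is created from its stored class name.
    public RowModSampler() {}

    @Override
    public void validateOptions(Map<String, String> options) {
        for (String name : options.keySet()) {
            if (!name.equals("modulus")) {
                throw new IllegalArgumentException("unknown option: " + name);
            }
        }
        String m = options.get("modulus");
        if (m == null) {
            throw new IllegalArgumentException("missing option: modulus");
        }
        int parsed;
        try {
            parsed = Integer.parseInt(m);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("modulus is not an integer: " + m, e);
        }
        if (parsed <= 0) {
            throw new IllegalArgumentException("modulus must be positive: " + m);
        }
    }

    @Override
    public void init(Map<String, String> options) {
        validateOptions(options); // fail fast on bad options (sketch's choice)
        modulus = Integer.parseInt(options.get("modulus"));
    }

    @Override
    public boolean accept(String row) {
        // floorMod keeps the remainder non-negative for negative hash codes.
        return Math.floorMod(row.hashCode(), modulus) == 0;
    }
}

final class SamplerLoader {
    // Instantiate through the no-arg constructor, then call init exactly once.
    static MiniSampler load(String className, Map<String, String> options) {
        try {
            Object o = Class.forName(className).getDeclaredConstructor().newInstance();
            MiniSampler sampler = (MiniSampler) o;
            sampler.init(options);
            return sampler;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate sampler " + className, e);
        }
    }
}
```

Reflective construction explains why the no-argument constructor is required: only the class name and options are persisted, so nothing else is available to build the sampler.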