Interface Sampler
- All Known Implementing Classes: AbstractHashSampler, RowColumnSampler, RowSampler
public interface Sampler
A function that decides which key values are stored in a table's sample. As Accumulo compacts data
and creates rfiles, it uses a Sampler to decide what to store in each rfile's sample section. The
class name of the Sampler and the Sampler's configuration are stored in each rfile. A scan of a
table's sample will only succeed if all rfiles were created with the same sampler and sampler
configuration.
Because the decisions a Sampler makes are persisted, the behavior of a Sampler for a given configuration should always be the same. One way to offer new behavior is to add new options while still supporting the old behavior under the Sampler's existing options.
Ideally, a sampler that selects a Key k1 would also select updates for k1. For example, if a Sampler selects:
row='000989' family='name' qualifier='last' visibility='ADMIN' time=9 value='Doe'
it would be nice if it also selected:
row='000989' family='name' qualifier='last' visibility='ADMIN' time=20 value='Dough'
Using hash and modulo on the key fields is a good way to accomplish this, and AbstractHashSampler provides a good basis for implementation.
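The hash-and-modulo idea can be illustrated with a minimal, self-contained sketch. It is not Accumulo code: the class `HashModuloSketch`, its method signature, and the choice of CRC32 are assumptions made for illustration, and key fields are passed as plain strings instead of a Key. The point is that the decision depends only on row, family, and qualifier, so an update with a newer timestamp or a different value gets the same answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical stand-in for the hash-and-modulo technique; not part of Accumulo.
final class HashModuloSketch {
    private final int modulus;

    HashModuloSketch(int modulus) {
        if (modulus < 1) {
            throw new IllegalArgumentException("modulus must be >= 1");
        }
        this.modulus = modulus;
    }

    // The decision uses row, family, and qualifier only. The timestamp is
    // deliberately ignored so that every version of a key is treated alike.
    boolean accept(String row, String family, String qualifier, long timestamp) {
        CRC32 crc = new CRC32();
        update(crc, row);
        update(crc, family);
        update(crc, qualifier);
        // Roughly one key in every `modulus` keys is selected.
        return crc.getValue() % modulus == 0;
    }

    private static void update(CRC32 crc, String field) {
        byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
        crc.update(bytes, 0, bytes.length);
        crc.update(0); // separator byte so ("ab", "c") and ("a", "bc") hash differently
    }
}
```

With this sketch, `accept("000989", "name", "last", 9L)` and `accept("000989", "name", "last", 20L)` always return the same result, whatever that result is for the chosen modulus.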
- Since: 1.8.0
-
Method Summary
Modifier and Type    Method                                         Description
boolean              accept(Key k)
void                 init(SamplerConfiguration config)              An implementation of Sampler must have a noarg constructor.
default void         validateOptions(Map<String, String> config)
-
Method Details
-
init
An implementation of Sampler must have a noarg constructor. After construction this method is called once to initialize a sampler before it is used.
- Parameters:
config - Configuration options for a sampler.
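The construct-then-init lifecycle can be sketched as follows. This is an illustration under assumptions, not how Accumulo itself is implemented: `SketchSampler`, `ModuloRowSampler`, and `SamplerLoader` are hypothetical names, the options are a plain `Map<String, String>` instead of a SamplerConfiguration, and the sampler sees only a row string. The sketch shows why a noarg constructor matters when an instance has to be created from a stored class name and only then handed its options.

```java
import java.util.Map;

// Hypothetical stand-in for the Sampler contract described above.
interface SketchSampler {
    void init(Map<String, String> options);

    boolean accept(String row);
}

// Declares no constructor, so only the implicit no-argument one exists.
final class ModuloRowSampler implements SketchSampler {
    private int modulus;

    @Override
    public void init(Map<String, String> options) {
        // "modulus" is an option name assumed for this sketch.
        modulus = Integer.parseInt(options.get("modulus"));
    }

    @Override
    public boolean accept(String row) {
        return Math.floorMod(row.hashCode(), modulus) == 0;
    }
}

final class SamplerLoader {
    // Creates the sampler from its class name, then calls init exactly once.
    static SketchSampler load(String className, Map<String, String> options) {
        try {
            SketchSampler sampler = Class.forName(className)
                    .asSubclass(SketchSampler.class)
                    .getDeclaredConstructor()
                    .newInstance();
            sampler.init(options);
            return sampler;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create sampler " + className, e);
        }
    }
}
```

A caller would write something like `SamplerLoader.load(ModuloRowSampler.class.getName(), Map.of("modulus", "1009"))` and use the returned sampler only after `load` returns.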
-
accept
- Parameters:
k - A key that was written to an rfile.
- Returns:
True if the key (and its associated value) should be stored in the rfile's sample. Return false if it should not be included.
-
validateOptions
- Parameters:
config - Sampler options configuration to validate. Both the option names and their values are validated.
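As a rough sketch of what validating both option names and values could look like, the following self-contained example checks a `Map<String, String>` and rejects anything it does not recognize. The class `OptionValidationSketch`, the option names `hasher` and `modulus`, and the accepted hasher values are assumptions for illustration, and throwing `IllegalArgumentException` is a design choice of this sketch, not a documented requirement.

```java
import java.util.Map;
import java.util.Set;

final class OptionValidationSketch {
    // Hypothetical option names and values, chosen only for this example.
    private static final Set<String> KNOWN_OPTIONS = Set.of("hasher", "modulus");
    private static final Set<String> KNOWN_HASHERS = Set.of("crc32", "md5");

    static void validateOptions(Map<String, String> config) {
        // Reject option names this sampler does not understand.
        for (String name : config.keySet()) {
            if (!KNOWN_OPTIONS.contains(name)) {
                throw new IllegalArgumentException("Unknown option: " + name);
            }
        }

        // "modulus" is required and must be a positive integer.
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException)
        // for null or non-numeric input.
        int modulus = Integer.parseInt(config.get("modulus"));
        if (modulus < 1) {
            throw new IllegalArgumentException("modulus must be >= 1, got " + modulus);
        }

        // "hasher" is optional, but when present it must be a known value.
        String hasher = config.get("hasher");
        if (hasher != null && !KNOWN_HASHERS.contains(hasher)) {
            throw new IllegalArgumentException("Unknown hasher: " + hasher);
        }
    }
}
```

Validating early in this way surfaces a misconfigured sampler when the options are set, before any sample data has been written with them.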
-