Class AbstractHashSampler

All Implemented Interfaces:
Sampler
Direct Known Subclasses:
RowColumnSampler, RowSampler

public abstract class AbstractHashSampler extends Object implements Sampler
A base class that can be used to create Samplers based on hashing. This class offers consistent options for configuring the hash function. The subclass decides which parts of the key to hash.

This class supports two options passed into init(SamplerConfiguration). The first option is hasher, which specifies a hashing algorithm. Valid values for this option are md5, sha1, and murmur3_32. If you are not sure, choose murmur3_32.

The second option is modulus, which accepts any positive integer as its value.

Any data where hash(data) % modulus == 0 will be selected for the sample.
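
The selection rule above can be illustrated with a small, standalone sketch. It is not Accumulo code: the class and method names are hypothetical, it uses the JDK's MD5 implementation, and reducing the digest to an int by taking its first four bytes is an assumption made only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: a hypothetical helper, not part of Accumulo.
final class ModulusSelectionSketch {

  // Reduce an MD5 digest to an int using its first four bytes
  // (an assumption of this sketch, not a documented behavior).
  static int hashToInt(byte[] data) {
    try {
      byte[] d = MessageDigest.getInstance("MD5").digest(data);
      return ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
          | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 should be available in every JDK", e);
    }
  }

  // The rule from the text: data is selected when hash(data) % modulus == 0.
  static boolean selected(byte[] data, int modulus) {
    if (modulus <= 0) {
      throw new IllegalArgumentException("modulus must be a positive integer");
    }
    return hashToInt(data) % modulus == 0;
  }

  // Count how many of n synthetic keys ("row0", "row1", ...) are selected.
  static int countSelected(int n, int modulus) {
    int count = 0;
    for (int i = 0; i < n; i++) {
      if (selected(("row" + i).getBytes(StandardCharsets.UTF_8), modulus)) {
        count++;
      }
    }
    return count;
  }
}
```

With a well-distributed hash, a modulus of m selects roughly one key in m, so a larger modulus yields a smaller sample. A modulus of 1 selects everything.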

  • Constructor Details

    • AbstractHashSampler

      public AbstractHashSampler()
  • Method Details

    • isValidOption

      protected boolean isValidOption(String option)
      Subclasses with options should override this method and return true if the option is valid for the subclass or if super.isValidOption(option) returns true.
    • init

      public void init(SamplerConfiguration config)
      Subclasses with options should override this method and call super.init(config).
      Specified by:
      init in interface Sampler
      Parameters:
      config - Configuration options for a sampler.
    • hash

      protected abstract void hash(DataOutput hasher, Key k) throws IOException
      Subclasses must override this method and hash some portion of the key.
      Parameters:
      hasher - Data written to this will be used to compute the hash for the key.
    • accept

      public boolean accept(Key k)
      Specified by:
      accept in interface Sampler
      Parameters:
      k - A key that was written to an rfile.
      Returns:
      true if the key (and its associated value) should be stored in the rfile's sample; false if it should not be included.
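
The override pattern described in the method details above (extend isValidOption, call the superclass init, and write part of the key in hash) can be sketched with a self-contained stand-in. Everything here is hypothetical and exists only to show the shape: SketchKey and SketchHashSampler are not Accumulo classes, the includeFamily option is invented, options arrive as a plain Map rather than a SamplerConfiguration, and CRC32 from the JDK replaces the real hash functions. In this sketch the hasher option is accepted but ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical stand-in for a key: only a row and a column family.
final class SketchKey {
  final String row;
  final String family;

  SketchKey(String row, String family) {
    this.row = row;
    this.family = family;
  }
}

// Hypothetical stand-in that mirrors the shape of the base class.
abstract class SketchHashSampler {
  private int modulus = 1;

  protected boolean isValidOption(String option) {
    return option.equals("hasher") || option.equals("modulus");
  }

  public void init(Map<String, String> options) {
    for (String name : options.keySet()) {
      if (!isValidOption(name)) {
        throw new IllegalArgumentException("Unknown option: " + name);
      }
    }
    if (!options.containsKey("modulus")) {
      throw new IllegalArgumentException("modulus is required");
    }
    modulus = Integer.parseInt(options.get("modulus"));
    if (modulus <= 0) {
      throw new IllegalArgumentException("modulus must be positive");
    }
  }

  // The subclass decides which parts of the key are written to the hasher.
  protected abstract void hash(DataOutput hasher, SketchKey k) throws IOException;

  public boolean accept(SketchKey k) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      hash(new DataOutputStream(bytes), k);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    CRC32 crc = new CRC32(); // stand-in hash; always used in this sketch
    crc.update(bytes.toByteArray());
    return crc.getValue() % modulus == 0;
  }
}

// A subclass with its own (invented) option, following the override pattern.
final class RowFamilySketchSampler extends SketchHashSampler {
  private boolean includeFamily = false;

  @Override
  protected boolean isValidOption(String option) {
    return super.isValidOption(option) || option.equals("includeFamily");
  }

  @Override
  public void init(Map<String, String> options) {
    super.init(options); // let the base class validate and read its own options
    includeFamily = Boolean.parseBoolean(options.getOrDefault("includeFamily", "false"));
  }

  @Override
  protected void hash(DataOutput hasher, SketchKey k) throws IOException {
    hasher.write(k.row.getBytes(StandardCharsets.UTF_8));
    if (includeFamily) {
      hasher.write(k.family.getBytes(StandardCharsets.UTF_8));
    }
  }
}
```

Because only the row is hashed by default, every key in the same row receives the same accept decision, so a row is sampled either as a whole or not at all. Hashing more of the key gives finer-grained sampling at the cost of splitting rows across the sample boundary.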