Class TooManyDeletesSelector

All Implemented Interfaces:
CompactionSelector

public class TooManyDeletesSelector extends Object implements CompactionSelector
This compaction selector works in concert with the DeletesSummarizer. Using the statistics from DeletesSummarizer, this strategy will compact all files in a table when the ratio of deletes to non-deletes exceeds a threshold.
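
The trigger condition can be pictured as a simple ratio test. The following is a minimal standalone sketch, not Accumulo's implementation: the class and method names are hypothetical, reading "exceeds" as a strict comparison is an assumption, and so is the handling of the case where every entry is a delete.

```java
// Hypothetical illustration of the delete ratio check; not Accumulo's actual code.
final class DeleteRatioSketch {

  // Returns true when deletes / non-deletes is greater than the threshold.
  // Assumption: "exceeds" is read as a strict comparison.
  static boolean exceedsThreshold(long totalEntries, long deletes, double threshold) {
    if (totalEntries < 0 || deletes < 0 || deletes > totalEntries) {
      throw new IllegalArgumentException("inconsistent counts");
    }
    long nonDeletes = totalEntries - deletes;
    if (nonDeletes == 0) {
      // Assumption: with no live entries, compact only if there are deletes to drop.
      return deletes > 0;
    }
    double ratio = deletes / (double) nonDeletes;
    return ratio > threshold;
  }

  public static void main(String[] args) {
    // 300 deletes against 700 non-deletes gives a ratio of about 0.43.
    System.out.println(exceedsThreshold(1000, 300, 0.25));
    // 100 deletes against 900 non-deletes gives a ratio of about 0.11.
    System.out.println(exceedsThreshold(1000, 100, 0.25));
  }
}
```

Note that the ratio is taken against non-deletes rather than against all entries, so a table where half of the entries are deletes has a ratio of 1.0.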

This strategy has two options. First, the "threshold" option sets the point at which a compaction will be triggered. This option defaults to ".25" and must be in the range (0.0, 1.0]. The second option is "proceed_zero_no_summary", which determines if the strategy should proceed when a bulk imported file has no summary information.
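
As an illustration of the defaults and the allowed range, the sketch below parses the two options from a plain map of strings. The class SelectorOptionsSketch and its fields are hypothetical and are not part of Accumulo; only the option names, defaults, and range come from the description above.

```java
import java.util.Map;

// Hypothetical holder for the two options described above; not part of Accumulo.
final class SelectorOptionsSketch {
  static final String THRESHOLD_OPT = "threshold";
  static final String PROCEED_ZERO_NO_SUMMARY_OPT = "proceed_zero_no_summary";

  final double threshold;
  final boolean proceedZeroNoSummary;

  SelectorOptionsSketch(Map<String, String> options) {
    // Default of ".25" when the option is absent.
    double t = Double.parseDouble(options.getOrDefault(THRESHOLD_OPT, ".25"));
    // Valid range is (0.0, 1.0]: zero is excluded, one is included.
    if (!(t > 0.0 && t <= 1.0)) {
      throw new IllegalArgumentException("threshold must be in (0.0, 1.0], got " + t);
    }
    this.threshold = t;
    // Default of "false" when the option is absent.
    this.proceedZeroNoSummary =
        Boolean.parseBoolean(options.getOrDefault(PROCEED_ZERO_NO_SUMMARY_OPT, "false"));
  }
}
```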

If the delete summarizer was configured on a table that already had files, then those files will have no summary information. This strategy can still proceed in this situation. It will fall back to using Accumulo's estimated entries per file in this case. For the files without summary information, the estimated number of deletes will be zero. This fallback method will underestimate deletes, which will not lead to false positives, except for the case of bulk imported files. Accumulo estimates that bulk imported files have zero entries. The second option, "proceed_zero_no_summary", determines if this strategy should proceed when it sees bulk imported files that do not have summary data. This option defaults to "false".
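
The fallback described above can be sketched as follows. This is a simplified model rather than Accumulo's code: FallbackEstimateSketch and FileStats are hypothetical names, a file with no summary and an estimate of zero entries is assumed to stand for a bulk imported file, and the strict comparison against the threshold is also an assumption.

```java
import java.util.List;

// Hypothetical model of the fallback estimate; not Accumulo's actual code.
final class FallbackEstimateSketch {

  // Hypothetical per-file statistics.
  static final class FileStats {
    final boolean hasSummary;
    final long entries; // summary total, or Accumulo's estimate when there is no summary
    final long deletes; // only meaningful when hasSummary is true

    FileStats(boolean hasSummary, long entries, long deletes) {
      this.hasSummary = hasSummary;
      this.entries = entries;
      this.deletes = deletes;
    }
  }

  static boolean shouldCompact(List<FileStats> files, double threshold,
      boolean proceedZeroNoSummary) {
    long total = 0;
    long deletes = 0;
    for (FileStats f : files) {
      if (f.hasSummary) {
        total += f.entries;
        deletes += f.deletes;
      } else {
        // Assumption: no summary plus a zero estimate marks a bulk imported file.
        if (f.entries == 0 && !proceedZeroNoSummary) {
          return false; // not enough information to decide
        }
        // Files without a summary contribute their estimated entries and zero deletes.
        total += f.entries;
      }
    }
    long nonDeletes = total - deletes;
    if (nonDeletes <= 0) {
      return deletes > 0;
    }
    return deletes / (double) nonDeletes > threshold;
  }
}
```

Counting zero deletes for files without summaries only lowers the ratio, which is why the fallback cannot trigger a compaction that full statistics would not. A bulk imported file is the exception: with an estimate of zero entries it contributes nothing to the non-delete count, so the ratio may be overstated.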

Bulk files can be generated with summary information by calling RFile.WriterOptions.withSummarizers(SummarizerConfiguration...).

When using this feature, it's important to ensure the summary cache is on and the summaries fit in the cache.