Interface Summarizer
All Known Implementing Classes:
AuthorizationSummarizer, CountingSummarizer, DeletesSummarizer, EntryLengthSummarizer, FamilySummarizer, VisibilitySummarizer
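 Instances of this interface can be configured for Accumulo tables. When Accumulo compacts files,
 it will use this factory to create Summarizer.Collector and Summarizer.Combiner objects to generate
 summary information about the data in the file.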
 
 In order to merge summary information from multiple files, Accumulo will use this factory to
 create a Summarizer.Combiner object.
 
Below is an example of a very simple summarizer that will compute the number of deletes, total number of keys, min timestamp and max timestamp.
 
   public class BasicSummarizer implements Summarizer {
     public static final String DELETES_STAT = "deletes";
     public static final String MIN_STAMP_STAT = "minStamp";
     public static final String MAX_STAMP_STAT = "maxStamp";
     public static final String TOTAL_STAT = "total";
     @Override
     public Collector collector(SummarizerConfiguration sc) {
       return new Collector() {
         private long minStamp = Long.MAX_VALUE;
         private long maxStamp = Long.MIN_VALUE;
         private long deletes = 0;
         private long total = 0;
         @Override
         public void accept(Key k, Value v) {
           if (k.getTimestamp() < minStamp) {
             minStamp = k.getTimestamp();
           }
           if (k.getTimestamp() > maxStamp) {
             maxStamp = k.getTimestamp();
           }
           if (k.isDeleted()) {
             deletes++;
           }
           total++;
         }
         @Override
         public void summarize(StatisticConsumer sc) {
           sc.accept(MIN_STAMP_STAT, minStamp);
           sc.accept(MAX_STAMP_STAT, maxStamp);
           sc.accept(DELETES_STAT, deletes);
           sc.accept(TOTAL_STAT, total);
         }
       };
     }
     @Override
     public Combiner combiner(SummarizerConfiguration sc) {
       return (stats1, stats2) -> {
         stats1.merge(DELETES_STAT, stats2.get(DELETES_STAT), Long::sum);
         stats1.merge(TOTAL_STAT, stats2.get(TOTAL_STAT), Long::sum);
         stats1.merge(MIN_STAMP_STAT, stats2.get(MIN_STAMP_STAT), Long::min);
         stats1.merge(MAX_STAMP_STAT, stats2.get(MAX_STAMP_STAT), Long::max);
       };
     }
   }
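
 The merge performed by the combiner above can be tried outside Accumulo. The following is a minimal sketch that applies the same four merge calls to plain maps. The class name CombinerSketch and the helper stats are hypothetical and exist only for illustration; the statistic names are copied from BasicSummarizer.

```java
import java.util.HashMap;
import java.util.Map;

public class CombinerSketch {
  // Same merge calls as BasicSummarizer's combiner, applied to plain maps.
  public static void combine(Map<String, Long> stats1, Map<String, Long> stats2) {
    stats1.merge("deletes", stats2.get("deletes"), Long::sum);
    stats1.merge("total", stats2.get("total"), Long::sum);
    stats1.merge("minStamp", stats2.get("minStamp"), Long::min);
    stats1.merge("maxStamp", stats2.get("maxStamp"), Long::max);
  }

  // Hypothetical helper: the statistics one file's collector might emit.
  public static Map<String, Long> stats(long deletes, long total, long min, long max) {
    Map<String, Long> m = new HashMap<>();
    m.put("deletes", deletes);
    m.put("total", total);
    m.put("minStamp", min);
    m.put("maxStamp", max);
    return m;
  }

  public static void main(String[] args) {
    Map<String, Long> fileA = stats(2, 10, 100, 200);
    Map<String, Long> fileB = stats(1, 5, 50, 150);
    combine(fileA, fileB); // fileA now holds the merged statistics
    System.out.println(fileA);
  }
}
```

 Map.merge throws a NullPointerException when the value argument is null, so a combiner written this way relies on every collector emitting all four statistics, as the collector in BasicSummarizer does.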
 
 
 Below is an example summarizer that counts values by the base-2 logarithm of their length, rounded up.
 
 public class ValueLogLengthSummarizer implements Summarizer {
  @Override
  public Collector collector(SummarizerConfiguration sc) {
    return new Collector(){
      long[] counts = new long[32];
      @Override
      public void accept(Key k, Value v) {
        int idx;
        if(v.getSize() == 0)
          idx = 0;
        else
          idx = IntMath.log2(v.getSize(), RoundingMode.UP);  //IntMath is from Guava
        counts[idx]++;
      }
      @Override
      public void summarize(StatisticConsumer sc) {
        for (int i = 0; i < counts.length; i++) {
          if(counts[i] > 0) {
             sc.accept(""+(1L<<i), counts[i]);  // long shift keeps the label positive when i == 31
          }
        }
      }
    };
  }
  @Override
  public Combiner combiner(SummarizerConfiguration sc) {
    return (m1, m2) -> m2.forEach((k,v) -> m1.merge(k, v, Long::sum));
  }
 }
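
 To see which bucket a given value length falls into, the index computation can be reproduced on its own. This is a sketch only: it swaps Guava's IntMath.log2 for a JDK bit trick that gives the same rounded-up result for positive sizes, and it uses a long shift so that index 31 yields a positive label. The class and method names are hypothetical.

```java
public class BucketSketch {
  // Ceiling of log2(size) for size >= 1, and 0 for an empty value.
  // JDK-only replacement for IntMath.log2(size, RoundingMode.UP).
  public static int bucketIndex(int size) {
    if (size < 0) {
      throw new IllegalArgumentException("size must be non-negative");
    }
    if (size == 0) {
      return 0;
    }
    return 32 - Integer.numberOfLeadingZeros(size - 1);
  }

  // Statistic name for a bucket: the largest value length that maps to it.
  public static String bucketLabel(int idx) {
    return Long.toString(1L << idx);
  }

  public static void main(String[] args) {
    for (int size : new int[] {0, 1, 2, 3, 4, 5, 1000}) {
      System.out.println(size + " -> " + bucketLabel(bucketIndex(size)));
    }
  }
}
```

 Lengths 0 and 1 share the bucket labeled "1", lengths 3 and 4 share the bucket labeled "4", and in general a bucket labeled N holds lengths greater than N/2 and at most N.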
 
 
 
 The reason a Summarizer is a factory for a Collector and Combiner is to make it very clear in the
 API that Accumulo uses them independently at different times. Therefore it's not advisable to
 share internal state between the Collector and Combiner. The example implementations show that
 the Collector's design allows for very efficient collection of specialized summary information.
 Creating String + Long pairs is deferred until the summarize method is called.
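
 The collector life cycle described here (accept zero or more times, then summarize once, with each collector holding its own state) can be sketched without Accumulo on the classpath. LengthCollector below is a hypothetical stand-in for Summarizer.Collector that only sees value lengths, and a BiConsumer stands in for Summarizer.StatisticConsumer; neither is part of the Accumulo API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

public class CollectorLifecycleSketch {
  // Hypothetical stand-in for Summarizer.Collector, reduced to value lengths.
  public interface LengthCollector {
    void accept(int valueLength);
    void summarize(BiConsumer<String, Long> consumer);
  }

  // Factory method: every call returns a collector with independent state.
  public static LengthCollector newCollector() {
    return new LengthCollector() {
      // State is kept in primitives while data streams through.
      private long count = 0;
      private long totalBytes = 0;

      @Override
      public void accept(int valueLength) {
        count++;
        totalBytes += valueLength;
      }

      @Override
      public void summarize(BiConsumer<String, Long> consumer) {
        // String + Long pairs are only created here, once per file.
        consumer.accept("count", count);
        consumer.accept("totalBytes", totalBytes);
      }
    };
  }

  // Drives one collector the way the text describes: accept*, then summarize.
  public static Map<String, Long> summarizeFile(int... valueLengths) {
    LengthCollector collector = newCollector();
    for (int len : valueLengths) {
      collector.accept(len);
    }
    Map<String, Long> stats = new TreeMap<>();
    collector.summarize(stats::put);
    return stats;
  }

  public static void main(String[] args) {
    System.out.println(summarizeFile(3, 5, 0));
    System.out.println(summarizeFile()); // accept may be called zero times
  }
}
```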
 
Summary data can be used by Compaction Strategies to decide which files to compact.
Summary data is persisted, so ideally the same summarizer class with the same options should always produce the same results. If you need to change the behavior of a summarizer, consider doing so by adding a new option. If the same summarizer is configured twice with different options, then Accumulo will store and merge each one separately. This allows old and new behavior to coexist.
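
One way to add behavior through an option without disturbing existing summaries is to give the option a default that reproduces the old behavior. The sketch below assumes options arrive as a String-to-String map; the option name countEmptyValues and the class OptionSketch are hypothetical and not part of Accumulo.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class OptionSketch {
  // Hypothetical option name, used only for this illustration.
  public static final String COUNT_EMPTY_OPT = "countEmptyValues";

  // Decides whether a value of the given length is counted.
  // When the option is absent it defaults to true, which is the behavior
  // the summarizer is assumed to have had before the option was introduced.
  public static boolean isCounted(Map<String, String> options, int valueLength) {
    boolean countEmpty =
        Boolean.parseBoolean(options.getOrDefault(COUNT_EMPTY_OPT, "true"));
    return countEmpty || valueLength > 0;
  }

  public static void main(String[] args) {
    Map<String, String> oldConfig = Collections.emptyMap();
    Map<String, String> newConfig = new HashMap<>();
    newConfig.put(COUNT_EMPTY_OPT, "false");

    System.out.println(isCounted(oldConfig, 0)); // old behavior: counted
    System.out.println(isCounted(newConfig, 0)); // new behavior: skipped
  }
}
```

Because the two configurations differ in their options, their summaries would be stored and merged separately, so results produced under the old behavior are never mixed with results produced under the new one.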

Nested Class Summary
- static interface Summarizer.Collector: When Accumulo calls methods in this interface, it will call Summarizer.Collector.accept(Key, Value) zero or more times and then call Summarizer.Collector.summarize(Summarizer.StatisticConsumer) once.
- static interface Summarizer.Combiner: A Combiner is used to merge statistics emitted from Summarizer.Collector.summarize(Summarizer.StatisticConsumer) and from previous invocations of itself.
- static interface Summarizer.StatisticConsumer

Method Summary
- Summarizer.Collector collector(SummarizerConfiguration sc): Factory method that creates a Summarizer.Collector based on configuration.
- Summarizer.Combiner combiner(SummarizerConfiguration sc): Factory method that creates a Summarizer.Combiner.

Method Details
- collector: Factory method that creates a Summarizer.Collector based on configuration. Each Summarizer.Collector created by this method should be independent and have its own internal state. Accumulo uses a Collector to generate summary statistics about a sequence of key values written to a file.
- combiner: Factory method that creates a Summarizer.Combiner. Accumulo will only use the created Combiner to merge data from Summarizer.Collectors created using the same SummarizerConfiguration.