Accumulo Tour: Using Iterators

Suppose that at the end of each day we record the number of villains a hero has caught that day. Iterators can help with this. They are a server-side programming mechanism for filtering and aggregating data during scans and compactions.

Let’s begin by adding some data that records our heroes’ recent crime-stopping statistics.

jshell> client.tableOperations().create("GothamCrimeStats");

jshell> Mutation mutation1 = new Mutation("id0001");
mutation1 ==> org.apache.accumulo.core.data.Mutation@1
jshell> mutation1.put("hero", "alias", "Batman");

// last three days of Batman's statistics
jshell> mutation1.put("hero", "villainsCaptured", "2");
jshell> mutation1.put("hero", "villainsCaptured", "1");
jshell> mutation1.put("hero", "villainsCaptured", "5");

jshell> Mutation mutation2 = new Mutation("id0002");
mutation2 ==> org.apache.accumulo.core.data.Mutation@1
jshell> mutation2.put("hero", "alias", "Robin");

// last three days of Robin's statistics
jshell> mutation2.put("hero", "villainsCaptured", "1");
jshell> mutation2.put("hero", "villainsCaptured", "0");
jshell> mutation2.put("hero", "villainsCaptured", "2");

jshell> try (BatchWriter writer = client.createBatchWriter("GothamCrimeStats")) {
   ...>   writer.addMutation(mutation1);
   ...>   writer.addMutation(mutation2);
   ...> }

Let’s scan to see the data.

jshell> try (ScannerBase scan = client.createScanner("GothamCrimeStats", Authorizations.EMPTY)) {
   ...>   System.out.println("Gotham Police Department Crime Statistics:");
   ...>   for(Map.Entry<Key, Value> entry : scan) {
   ...>     System.out.printf("Key : %-52s  Value : %s\n", entry.getKey(), entry.getValue());
   ...>   }
   ...> }

You may notice a problem: only the latest villainsCaptured entry appears for each hero. That is due to the VersioningIterator, which is applied to all tables by default. It filters out all but the latest version of each entry when scanning a table.
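
To make the filtering concrete, here is a minimal plain-Java sketch of the idea. It is a simplified model, not Accumulo's actual iterator code: the `Entry` class, the `keepNewest` method, and the timestamps 1 to 3 are all hypothetical, chosen only so that "newest" is unambiguous. It assumes entries arrive sorted with the newest version of each cell first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of version filtering; NOT Accumulo's VersioningIterator code.
class VersioningSketch {

    // Hypothetical stand-in for a table entry (row, column, timestamp, value).
    static final class Entry {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Entry(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Assumes entries arrive sorted by row and column, newest version first,
    // and keeps at most maxVersions entries for each (row, column) pair.
    static List<Entry> keepNewest(List<Entry> sorted, int maxVersions) {
        Map<List<String>, Integer> seenPerCell = new HashMap<>();
        List<Entry> kept = new ArrayList<>();
        for (Entry e : sorted) {
            int count = seenPerCell.merge(List.of(e.row, e.column), 1, Integer::sum);
            if (count <= maxVersions) {
                kept.add(e);
            }
        }
        return kept;
    }

    // Sample data mirroring the tour; the timestamps are made up.
    static List<Entry> sampleEntries() {
        return List.of(
            new Entry("id0001", "hero:alias", 1, "Batman"),
            new Entry("id0001", "hero:villainsCaptured", 3, "5"),
            new Entry("id0001", "hero:villainsCaptured", 2, "1"),
            new Entry("id0001", "hero:villainsCaptured", 1, "2"),
            new Entry("id0002", "hero:alias", 1, "Robin"),
            new Entry("id0002", "hero:villainsCaptured", 3, "2"),
            new Entry("id0002", "hero:villainsCaptured", 2, "0"),
            new Entry("id0002", "hero:villainsCaptured", 1, "1"));
    }

    public static void main(String[] args) {
        // With a limit of 1, only one entry per cell survives.
        for (Entry e : keepNewest(sampleEntries(), 1)) {
            System.out.println(e.row + " " + e.column + " -> " + e.value);
        }
    }
}
```

With a limit of 1 the sketch keeps four entries, one per cell. With a limit of 3 all eight entries survive.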

To see a record of past days, we can remove the VersioningIterator, which is configured on the table under the name vers. (Alternatively, you could configure it to keep a set number of past entries.)

To save some typing, we first import the IteratorScope class.

jshell> import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
jshell> client.tableOperations().removeIterator("GothamCrimeStats", "vers", EnumSet.allOf(IteratorScope.class));

Now let’s scan again.

jshell> try (ScannerBase scan = client.createScanner("GothamCrimeStats", Authorizations.EMPTY)) {
   ...>   System.out.println("Gotham Police Department Crime Statistics:");
   ...>   for(Map.Entry<Key, Value> entry : scan) {
   ...>     System.out.printf("Key : %-52s  Value : %s\n", entry.getKey(), entry.getValue());
   ...>   }
   ...> }
Gotham Police Department Crime Statistics:
Key : id0001 hero:alias [] 1654697915769 false              Value : Batman
Key : id0001 hero:villainsCaptured [] 1654697915769 false   Value : 5
Key : id0001 hero:villainsCaptured [] 1654697915769 false   Value : 1
Key : id0001 hero:villainsCaptured [] 1654697915769 false   Value : 2
Key : id0002 hero:alias [] 1654697915769 false              Value : Robin
Key : id0002 hero:villainsCaptured [] 1654697915769 false   Value : 2
Key : id0002 hero:villainsCaptured [] 1654697915769 false   Value : 0
Key : id0002 hero:villainsCaptured [] 1654697915769 false   Value : 1

You will now see ALL entries added to the table.

Rather than a daily history, let’s keep a running total of captured villains. A SummingCombiner can be used to accomplish this.

jshell> import org.apache.accumulo.core.iterators.user.SummingCombiner;
jshell> import org.apache.accumulo.core.iterators.LongCombiner;

Create an IteratorSetting object, set the encoding type, and indicate the columns to be summed. It is also a good idea to check for iterator conflicts before attaching the iterator to the table.

jshell> IteratorSetting scSetting = new IteratorSetting(30, "sum", SummingCombiner.class);
jshell> LongCombiner.setEncodingType(scSetting, LongCombiner.Type.STRING);
jshell> scSetting.addOption("columns", "hero:villainsCaptured");
jshell> client.tableOperations().checkIteratorConflicts("GothamCrimeStats", scSetting, EnumSet.allOf(IteratorScope.class));
jshell> client.tableOperations().attachIterator("GothamCrimeStats", scSetting);

Let’s scan again and see what results we get.

jshell> try (org.apache.accumulo.core.client.Scanner scan = client.createScanner("GothamCrimeStats", Authorizations.EMPTY)) {
   ...>   for(Map.Entry<Key, Value> entry : scan) {
   ...>     System.out.printf("Key : %-52s  Value : %s\n", entry.getKey(), entry.getValue());
   ...>   }
   ...> }
Key : id0001 hero:alias [] 1654699186182 false              Value : Batman
Key : id0001 hero:villainsCaptured [] 1654699186182 false   Value : 8
Key : id0002 hero:alias [] 1654699186182 false              Value : Robin
Key : id0002 hero:villainsCaptured [] 1654699186182 false   Value : 3
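
The totals above can be reproduced with a small plain-Java sketch of what summing string-encoded values amounts to. It is a simplified model under stated assumptions, not Accumulo's combiner code: the class name `StringSumSketch` and the method `sum` are hypothetical, and the sketch assumes each value is a decimal string, which is how the values were written in this tour.

```java
import java.util.List;

// Simplified model of summing string-encoded values; NOT Accumulo's SummingCombiner code.
class StringSumSketch {

    // Decodes every value as a base-10 long, adds them up, and encodes the
    // total back into a decimal string. Overflow handling is omitted.
    static String sum(List<String> values) {
        long total = 0;
        for (String v : values) {
            total += Long.parseLong(v);
        }
        return Long.toString(total);
    }

    public static void main(String[] args) {
        // Daily values taken from the mutations written earlier in the tour.
        System.out.println("Batman: " + sum(List.of("2", "1", "5")));
        System.out.println("Robin: " + sum(List.of("1", "0", "2")));
    }
}
```

Batman's three daily values (2, 1, 5) add up to 8 and Robin's (1, 0, 2) add up to 3, matching the scan output.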

As new daily statistics are added, the running total for each hero is updated accordingly.

jshell> mutation1 = new Mutation("id0001");
jshell> mutation1.put("hero", "villainsCaptured", "4");
jshell> mutation2 = new Mutation("id0002");
jshell> mutation2.put("hero", "villainsCaptured", "2");

jshell> try (BatchWriter writer = client.createBatchWriter("GothamCrimeStats")) {
   ...>   writer.addMutation(mutation1);
   ...>   writer.addMutation(mutation2);
   ...> }

jshell> try (org.apache.accumulo.core.client.Scanner scan = client.createScanner("GothamCrimeStats", Authorizations.EMPTY)) {
   ...>   for(Map.Entry<Key, Value> entry : scan) {
   ...>     System.out.printf("Key : %-52s  Value : %s\n", entry.getKey(), entry.getValue());
   ...>   }
   ...> }
Key : id0001 hero:alias [] 1654779673027 false              Value : Batman
Key : id0001 hero:villainsCaptured [] 1654780041402 false   Value : 12
Key : id0002 hero:alias [] 1654779673027 false              Value : Robin
Key : id0002 hero:villainsCaptured [] 1654780041402 false   Value : 5
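
One way to see why the running total stays correct: addition is associative, so it should not matter whether earlier values have already been collapsed into a partial sum (as may happen when iterators run during compactions) or are still stored individually. The plain-Java sketch below illustrates this with the numbers from this page. The class `RunningTotalSketch` and the method `combine` are hypothetical, and the sketch models only the arithmetic, not Accumulo's internals.

```java
import java.util.List;

// Illustrates that a partial sum plus new values equals the sum of everything.
// A model of the arithmetic only; NOT Accumulo's SummingCombiner code.
class RunningTotalSketch {

    // Sums decimal-string values and returns the total as a decimal string.
    static String combine(List<String> values) {
        long total = 0;
        for (String v : values) {
            total += Long.parseLong(v);
        }
        return Long.toString(total);
    }

    public static void main(String[] args) {
        // Batman: the first three days are collapsed into a partial total,
        // then the new day's value is combined with that total.
        String partial = combine(List.of("2", "1", "5"));
        String incremental = combine(List.of(partial, "4"));

        // Summing all four daily values at once gives the same answer.
        String allAtOnce = combine(List.of("2", "1", "5", "4"));

        System.out.println(incremental + " == " + allAtOnce);
    }
}
```

Both routes give 12 for Batman. The same reasoning gives 5 for Robin (3 plus the new value 2).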
