Shard Example

Accumulo has an iterator called the intersecting iterator which supports querying a term index that is partitioned by document, or “sharded”. This example shows how to use the intersecting iterator through these four programs:

  • - Indexes a set of text files into an Accumulo table
  • - Finds documents containing a given set of terms.
  • - Reads the index table and writes a map of documents to terms into another table.
  • Uses the table populated by to select N random terms per document. Then it continuously and randomly queries those terms.

To run these example programs, create two tables like below.

username@instance> createtable shard
username@instance shard> createtable doc2term

After creating the tables, index some files. The following command indexes all of the java files in the Accumulo source code.

$ cd /local/username/workspace/accumulo/
$ find src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.simple.shard.Index instance zookeepers shard username password 30

The following command queries the index to find all files containing ‘foo’ and ‘bar’.

$ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query instance zookeepers shard username password foo bar

In order to run ContinuousQuery, we need to run to populate doc2term.

$ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Reverse instance zookeepers shard doc2term username password

Below ContinuousQuery is run using 5 terms. So it selects 5 random terms from each document, then it continually randomly selects one set of 5 terms and queries. It prints the number of matching documents and the time in seconds.

$ ./bin/accumulo org.apache.accumulo.examples.simple.shard.ContinuousQuery instance zookeepers shard doc2term username password 5
[public, core, class, binarycomparable, b] 2  0.081
[wordtodelete, unindexdocument, doctablename, putdelete, insert] 1  0.041
[import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1  0.049
[getpackage, testversion, util, version, 55] 1  0.048
[for, static, println, public, the] 55  0.211
[sleeptime, wrappingiterator, options, long, utilwaitthread] 1  0.057
[string, public, long, 0, wait] 12  0.132