Apache Accumulo Export/Import Example

Accumulo provides a mechanism to export and import tables. This README shows how to use this feature.

The shell session below shows creating a table, inserting data, and exporting the table. A table must be offline to export it, and it should remain offline for the duration of the distcp. An easy way to take a table offline without interuppting access to it is to clone it and take the clone offline.

root@test15> createtable table1
root@test15 table1> insert a cf1 cq1 v1
root@test15 table1> insert h cf1 cq1 v2
root@test15 table1> insert z cf1 cq1 v3
root@test15 table1> insert z cf1 cq2 v4
root@test15 table1> addsplits -t table1 b r
root@test15 table1> scan
a cf1:cq1 []    v1
h cf1:cq1 []    v2
z cf1:cq1 []    v3
z cf1:cq2 []    v4
root@test15> config -t table1 -s table.split.threshold=100M
root@test15 table1> clonetable table1 table1_exp
root@test15 table1> offline table1_exp
root@test15 table1> exporttable -t table1_exp /tmp/table1_export
root@test15 table1> quit

After executing the export command, a few files are created in the hdfs dir. One of the files is a list of files to distcp as shown below.

$ hadoop fs -ls /tmp/table1_export
Found 2 items
-rw-r--r--   3 user supergroup        162 2012-07-25 09:56 /tmp/table1_export/distcp.txt
-rw-r--r--   3 user supergroup        821 2012-07-25 09:56 /tmp/table1_export/exportMetadata.zip
$ hadoop fs -cat /tmp/table1_export/distcp.txt
hdfs://n1.example.com:6093/accumulo/tables/3/default_tablet/F0000000.rf
hdfs://n1.example.com:6093/tmp/table1_export/exportMetadata.zip

Before the table can be imported, it must be copied using distcp. After the discp completed, the cloned table may be deleted.

$ hadoop distcp -f /tmp/table1_export/distcp.txt /tmp/table1_export_dest

The Accumulo shell session below shows importing the table and inspecting it. The data, splits, config, and logical time information for the table were preserved.

root@test15> importtable table1_copy /tmp/table1_export_dest
root@test15> table table1_copy
root@test15 table1_copy> scan
a cf1:cq1 []    v1
h cf1:cq1 []    v2
z cf1:cq1 []    v3
z cf1:cq2 []    v4
root@test15 table1_copy> getsplits -t table1_copy
b
r
root@test15> config -t table1_copy -f split
---------+--------------------------+-------------------------------------------
SCOPE    | NAME                     | VALUE
---------+--------------------------+-------------------------------------------
default  | table.split.threshold .. | 1G
table    |    @override ........... | 100M
---------+--------------------------+-------------------------------------------
root@test15> tables -l
accumulo.metadata    =>        !0
accumulo.root        =>        +r
table1_copy          =>         5
trace                =>         1
root@test15 table1_copy> scan -t accumulo.metadata -b 5 -c srv:time
5;b srv:time []    M1343224500467
5;r srv:time []    M1343224500467
5< srv:time []    M1343224500467