Export/Import Example
Accumulo provides a mechanism to export and import tables. This README shows how to use this feature.
The shell session below shows creating a table, inserting data, and exporting the table. A table must be offline before it can be exported, and it should remain offline until the distcp finishes. An easy way to take a table offline without interrupting access to it is to clone it and take the clone offline.
root@test17> createtable table1
root@test17 table1> insert a cf1 cq1 v1
root@test17 table1> insert h cf1 cq1 v2
root@test17 table1> insert z cf1 cq1 v3
root@test17 table1> insert z cf1 cq2 v4
root@test17 table1> addsplits -t table1 b r
root@test17 table1> scan
a cf1:cq1 [] v1
h cf1:cq1 [] v2
z cf1:cq1 [] v3
z cf1:cq2 [] v4
root@test17 table1> config -t table1 -s table.split.threshold=100M
root@test17 table1> clonetable table1 table1_exp
root@test17 table1> offline table1_exp
root@test17 table1> exporttable -t table1_exp /tmp/table1_export
root@test17 table1> quit
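The clone, offline, and export steps above can also be scripted instead of typed interactively. The following is a minimal sketch, not a definitive implementation. It assumes the `accumulo` launcher is on the PATH and that `accumulo shell` accepts `-u`, `-p`, and `-e` (run a single command and exit). The function names and the `ACCUMULO_USER` and `ACCUMULO_PASSWORD` variables are hypothetical and exist only for this illustration.

```shell
# Run one Accumulo shell command non-interactively.
# ACCUMULO_USER and ACCUMULO_PASSWORD are hypothetical environment variables;
# passing a password with -p exposes it to other local users, so adapt as needed.
run_shell_cmd() {
  accumulo shell -u "${ACCUMULO_USER:-root}" -p "$ACCUMULO_PASSWORD" -e "$1"
}

# Export a table through an offline clone so the source table stays online.
#   $1 = source table name
#   $2 = HDFS export directory
# The clone is named <table>_exp, mirroring the session above.
export_via_clone() {
  local table="$1"
  local export_dir="$2"
  local clone="${table}_exp"

  # Stop at the first step that fails.
  run_shell_cmd "clonetable ${table} ${clone}" &&
    run_shell_cmd "offline ${clone}" &&
    run_shell_cmd "exporttable -t ${clone} ${export_dir}"
}

# Example (matches the session above):
#   export_via_clone table1 /tmp/table1_export
```

Running each command in its own shell invocation keeps the sketch simple and lets the `&&` chain stop as soon as one step fails. The clone is left in place, since it should stay offline until the distcp has finished.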
After executing the export command, a few files are created in the HDFS directory. One of these files, distcp.txt, lists the files to copy with distcp, as shown below.
$ hadoop fs -ls /tmp/table1_export
Found 2 items
-rw-r--r-- 3 user supergroup 162 2012-07-25 09:56 /tmp/table1_export/distcp.txt
-rw-r--r-- 3 user supergroup 821 2012-07-25 09:56 /tmp/table1_export/exportMetadata.zip
$ hadoop fs -cat /tmp/table1_export/distcp.txt
hdfs://n1.example.com:6093/accumulo/tables/3/default_tablet/F0000000.rf
hdfs://n1.example.com:6093/tmp/table1_export/exportMetadata.zip
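Before starting the copy, it can be useful to confirm that the list looks sane. The sketch below reads a distcp.txt-style list on standard input, reports how many entries it holds, and succeeds only if exportMetadata.zip is among them. The function name `check_distcp_list` and its output format are hypothetical and not part of Accumulo.

```shell
# Summarize a distcp.txt-style list read from stdin.
# Prints "entries=<count> metadata=<0|1>" and returns success only when
# an entry ending in /exportMetadata.zip is present.
check_distcp_list() {
  local line
  local total=0
  local has_meta=0

  # The "|| [ -n ... ]" clause also handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue          # skip blank lines
    total=$((total + 1))
    case "$line" in
      */exportMetadata.zip) has_meta=1 ;;
    esac
  done

  echo "entries=${total} metadata=${has_meta}"
  [ "$has_meta" -eq 1 ]
}

# Example:
#   hadoop fs -cat /tmp/table1_export/distcp.txt | check_distcp_list
```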
Before the table can be imported, its files must be copied using distcp. After the distcp completes, the cloned table may be deleted.
$ hadoop distcp -f /tmp/table1_export/distcp.txt /tmp/table1_export_dest
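Since the clone should only be deleted once the copy has succeeded, a quick comparison of what was listed against what arrived can help. The sketch below assumes distcp places each listed file directly in the destination directory under its base name. The function name `missing_after_copy` and the local file names in the example are hypothetical.

```shell
# Print the base name of every path listed in file $1 that has no
# same-named entry in file $2. No output means nothing is missing.
#   $1 = local copy of distcp.txt (one source path per line)
#   $2 = local file with one destination path per line
missing_after_copy() {
  expected="$1"
  copied="$2"

  # Strip everything up to the last "/" to compare base names only.
  sed 's|.*/||' "$expected" | sort -u | while IFS= read -r name; do
    [ -z "$name" ] && continue
    sed 's|.*/||' "$copied" | grep -qxF -- "$name" || echo "$name"
  done
}

# Example (the awk step skips the "Found N items" line and keeps the path column):
#   hadoop fs -cat /tmp/table1_export/distcp.txt > expected.txt
#   hadoop fs -ls /tmp/table1_export_dest | awk 'NR > 1 {print $NF}' > copied.txt
#   missing_after_copy expected.txt copied.txt
```

Comparing names only does not detect truncated or corrupted files. It is a cheap first check, not a substitute for the exit status and counters reported by distcp itself.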
The Accumulo shell session below shows importing the table and inspecting it. The data, splits, config, and logical time information for the table were preserved.
root@test17> importtable table1_copy /tmp/table1_export_dest
root@test17> table table1_copy
root@test17 table1_copy> scan
a cf1:cq1 [] v1
h cf1:cq1 [] v2
z cf1:cq1 [] v3
z cf1:cq2 [] v4
root@test17 table1_copy> getsplits -t table1_copy
b
r
root@test17 table1_copy> config -t table1_copy -f split
---------+--------------------------+-------------------------------------------
SCOPE    | NAME                     | VALUE
---------+--------------------------+-------------------------------------------
default  | table.split.threshold .. | 1G
table    |    @override ........... | 100M
---------+--------------------------+-------------------------------------------
root@test17 table1_copy> tables -l
accumulo.metadata => !0
accumulo.root => +r
table1_copy => 5
trace => 1
root@test17 table1_copy> scan -t accumulo.metadata -b 5 -c srv:time
5;b srv:time [] M1343224500467
5;r srv:time [] M1343224500467
5< srv:time [] M1343224500467