Class AccumuloOutputFormat
java.lang.Object
org.apache.accumulo.core.client.mapred.AccumuloOutputFormat
- All Implemented Interfaces:
  org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
public class AccumuloOutputFormat
extends Object
implements org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
This class allows MapReduce jobs to use Accumulo as the sink for data. This OutputFormat accepts keys and values of type Text (for a table name) and Mutation from the Map and Reduce functions.
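The user must specify the following via static configurator methods:
- setConnectorInfo(JobConf, String, AuthenticationToken) or setConnectorInfo(JobConf, String, String)
- setZooKeeperInstance(JobConf, ClientConfiguration) or setMockInstance(JobConf, String)
Other static methods are optional.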
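The split between required and optional settings can be pictured as a pre-flight check. The sketch below is a hypothetical model, not part of Accumulo: it uses a plain Map with made-up keys (not the keys AccumuloOutputFormat actually writes to the JobConf), treats interchangeable alternatives as a group (either setConnectorInfo overload; setZooKeeperInstance or the deprecated setMockInstance), and reports every group for which nothing was set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "required vs. optional" rule. The setting names are
// illustrative stand-ins, NOT the keys AccumuloOutputFormat stores in the JobConf.
final class RequiredSettingsCheck {
    // Each group lists interchangeable alternatives; at least one key per group must be set.
    static final List<List<String>> REQUIRED_GROUPS = List.of(
            List.of("principal"),                           // supplied by setConnectorInfo
            List.of("token", "tokenFile"),                  // either setConnectorInfo overload
            List.of("zooKeeperInstance", "mockInstance"));  // where to find the instance

    /** Returns every required group for which none of the alternatives is present. */
    static List<List<String>> missingGroups(Map<String, String> settings) {
        List<List<String>> missing = new ArrayList<>();
        for (List<String> group : REQUIRED_GROUPS) {
            if (group.stream().noneMatch(settings::containsKey)) {
                missing.add(group);
            }
        }
        return missing;
    }
}
```

Optional settings (default table, batch writer options, create-tables, simulation mode, log level) simply do not appear in the required groups.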
Nested Class Summary
- protected static class AccumuloOutputFormat.AccumuloRecordWriter
  A base class to be used to create RecordWriter instances that write to Accumulo.
Field Summary
- protected static final org.apache.log4j.Logger log
Constructor Summary
- AccumuloOutputFormat()
Method Summary
- protected static Boolean canCreateTables(org.apache.hadoop.mapred.JobConf job)
  Determines whether tables are permitted to be created as needed.
- void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job)
- protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapred.JobConf job)
  Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
- protected static BatchWriterConfig getBatchWriterOptions(org.apache.hadoop.mapred.JobConf job)
  Gets the BatchWriterConfig settings.
- protected static String getDefaultTableName(org.apache.hadoop.mapred.JobConf job)
  Gets the default table name from the configuration.
- protected static Instance getInstance(org.apache.hadoop.mapred.JobConf job)
  Initializes an Accumulo Instance based on the configuration.
- protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
  Gets the log level from this configuration.
- protected static String getPrincipal(org.apache.hadoop.mapred.JobConf job)
  Gets the principal from the configuration.
- org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Mutation> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)
- protected static Boolean getSimulationMode(org.apache.hadoop.mapred.JobConf job)
  Determines whether simulation mode is enabled.
- protected static byte[] getToken(org.apache.hadoop.mapred.JobConf job)
  Deprecated. Since 1.6.0; use getAuthenticationToken(JobConf) instead.
- protected static String getTokenClass(org.apache.hadoop.mapred.JobConf job)
  Deprecated. Since 1.6.0; use getAuthenticationToken(JobConf) instead.
- protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
  Determines if the connector has been configured.
- static void setBatchWriterOptions(org.apache.hadoop.mapred.JobConf job, BatchWriterConfig bwConfig)
  Sets the configuration for the job's BatchWriter instances.
- static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, String tokenFile)
  Sets the connector information needed to communicate with Accumulo in this job.
- static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token)
  Sets the connector information needed to communicate with Accumulo in this job.
- static void setCreateTables(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
  Sets the directive to create new tables, as necessary.
- static void setDefaultTableName(org.apache.hadoop.mapred.JobConf job, String tableName)
  Sets the default table name to use if one emits a null in place of a table name for a given mutation.
- static void setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)
  Sets the log level for this job.
- static void setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)
  Deprecated. Since 1.8.0; use MiniAccumuloCluster or a standard mock framework.
- static void setSimulationMode(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
  Sets the directive to use simulation mode for this job.
- static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)
  Deprecated. Since 1.6.0; use setZooKeeperInstance(JobConf, ClientConfiguration) instead.
- static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, ClientConfiguration clientConfig)
  Configures a ZooKeeperInstance for this job.
Field Details
- log
  protected static final org.apache.log4j.Logger log
Constructor Details
- AccumuloOutputFormat
  public AccumuloOutputFormat()
Method Details
- setConnectorInfo
  public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token) throws AccumuloSecurityException
  Sets the connector information needed to communicate with Accumulo in this job.
  WARNING: Some tokens, when serialized, divulge sensitive information in the configuration as a means to pass the token to MapReduce tasks. This information is BASE64-encoded to provide a charset-safe conversion to a string, but this conversion is not intended to be secure. PasswordToken is one example that is insecure in this way; however, DelegationTokens, acquired using SecurityOperations.getDelegationToken(DelegationTokenConfig), are not subject to this concern.
  - Parameters:
    job - the Hadoop job instance to be configured
    principal - a valid Accumulo user name (the user must have Table.CREATE permission if setCreateTables(JobConf, boolean) is set to true)
    token - the user's password
  - Throws:
    AccumuloSecurityException
  - Since:
    1.5.0
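The warning rests on the fact that Base64 is a reversible encoding, not encryption: anyone who can read the encoded string can recover the original bytes. The JDK-only sketch below illustrates that property with an example password; it does not reproduce Accumulo's actual token serialization format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Demonstrates that Base64 only changes the representation of bytes; decoding
// recovers the secret exactly. Not Accumulo's serialization format.
final class Base64IsNotEncryption {
    static String encode(String secret) {
        return Base64.getEncoder().encodeToString(secret.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = encode("s3cret");
        System.out.println(stored);          // prints czNjcmV0
        System.out.println(decode(stored));  // prints s3cret
    }
}
```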
- setConnectorInfo
  public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, String tokenFile) throws AccumuloSecurityException
  Sets the connector information needed to communicate with Accumulo in this job.
  Stores the password in a file in HDFS and pulls that file into the Distributed Cache, in an attempt to be more secure than storing the password in the Configuration.
  - Parameters:
    job - the Hadoop job instance to be configured
    principal - a valid Accumulo user name (the user must have Table.CREATE permission if setCreateTables(JobConf, boolean) is set to true)
    tokenFile - the path to the password file
  - Throws:
    AccumuloSecurityException
  - Since:
    1.6.0
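The practical difference between the two setConnectorInfo overloads is what ends up in the job configuration. The sketch below is a hypothetical model, not Accumulo code: java.util.Properties stands in for the JobConf, a local temporary file stands in for the HDFS token file and the Distributed Cache, the property names are made up, and the file holds plain text purely for illustration. Only the file's path is recorded in the configuration; the secret is read back from the file when it is needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical model: the configuration records only the token file's path,
// never the secret itself. A local temp file stands in for HDFS.
final class TokenFileModel {
    /** Client side: write the secret to its own file. */
    static Path writeTokenFile(String secret) {
        try {
            Path file = Files.createTempFile("token-model", ".txt");
            file.toFile().deleteOnExit();
            Files.writeString(file, secret, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Job setup: store the principal and the path (illustrative keys), not the secret. */
    static Properties configureWithTokenFile(String principal, Path tokenFile) {
        Properties conf = new Properties();
        conf.setProperty("principal", principal);
        conf.setProperty("tokenFile", tokenFile.toString());
        return conf;
    }

    /** Task side: resolve the path from the configuration and read the secret. */
    static String readSecret(Properties conf) {
        try {
            return Files.readString(Path.of(conf.getProperty("tokenFile")), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```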
- isConnectorInfoSet
  Determines if the connector has been configured.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    true if the connector has been configured, false otherwise
  - Since:
    1.5.0
  - See Also:
    setConnectorInfo(JobConf, String, AuthenticationToken)
- getPrincipal
  Gets the principal from the configuration.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    the user name
  - Since:
    1.5.0
  - See Also:
    setConnectorInfo(JobConf, String, AuthenticationToken)
- getTokenClass
  Deprecated. Since 1.6.0; use getAuthenticationToken(JobConf) instead.
  Gets the serialized token class from either the configuration or the token file.
  - Since:
    1.5.0
- getToken
  Deprecated. Since 1.6.0; use getAuthenticationToken(JobConf) instead.
  Gets the serialized token from either the configuration or the token file.
  - Since:
    1.5.0
- getAuthenticationToken
  Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    the principal's authentication token
  - Since:
    1.6.0
  - See Also:
    setConnectorInfo(JobConf, String, AuthenticationToken), setConnectorInfo(JobConf, String, String)
- setZooKeeperInstance
  @Deprecated public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)
  Deprecated. Since 1.6.0; use setZooKeeperInstance(JobConf, ClientConfiguration) instead.
  Configures a ZooKeeperInstance for this job.
  - Parameters:
    job - the Hadoop job instance to be configured
    instanceName - the Accumulo instance name
    zooKeepers - a comma-separated list of ZooKeeper servers
  - Since:
    1.5.0
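The zooKeepers argument is a single comma-separated string. The hypothetical helper below is not part of Accumulo; it only illustrates the expected format by normalizing such a string into host:port entries, filling in ZooKeeper's conventional client port, 2181, wherever a port is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a comma-separated ZooKeeper list such as
// "zk1:2181,zk2,zk3:2182" into host:port entries. Not an Accumulo API.
final class ZooKeeperHosts {
    static final int DEFAULT_CLIENT_PORT = 2181;

    static List<String> parse(String zooKeepers) {
        List<String> hosts = new ArrayList<>();
        for (String raw : zooKeepers.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            // Append the conventional client port when the entry has none.
            hosts.add(entry.contains(":") ? entry : entry + ":" + DEFAULT_CLIENT_PORT);
        }
        return hosts;
    }
}
```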
- setZooKeeperInstance
  public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, ClientConfiguration clientConfig)
  Configures a ZooKeeperInstance for this job.
  - Parameters:
    job - the Hadoop job instance to be configured
    clientConfig - client configuration for specifying connection timeouts, SSL connection options, etc.
  - Since:
    1.6.0
- setMockInstance
  @Deprecated public static void setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)
  Deprecated. Since 1.8.0; use MiniAccumuloCluster or a standard mock framework.
  Configures a MockInstance for this job.
  - Parameters:
    job - the Hadoop job instance to be configured
    instanceName - the Accumulo instance name
  - Since:
    1.5.0
- getInstance
  Initializes an Accumulo Instance based on the configuration.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    an Accumulo instance
  - Since:
    1.5.0
  - See Also:
    setZooKeeperInstance(JobConf, ClientConfiguration), setMockInstance(JobConf, String)
- setLogLevel
  public static void setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)
  Sets the log level for this job.
  - Parameters:
    job - the Hadoop job instance to be configured
    level - the logging level
  - Since:
    1.5.0
- getLogLevel
  protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
  Gets the log level from this configuration.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    the log level
  - Since:
    1.5.0
  - See Also:
    setLogLevel(JobConf, Level)
- setDefaultTableName
  Sets the default table name to use if one emits a null in place of a table name for a given mutation. Table names may contain only alphanumeric characters and underscores.
  - Parameters:
    job - the Hadoop job instance to be configured
    tableName - the table to use when the table name is null in the write call
  - Since:
    1.5.0
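The default-table rule amounts to a fallback applied to each output key. In the hypothetical sketch below, Strings stand in for the Text keys and Mutation values and a map of lists stands in for Accumulo; throwing when neither a key nor a default table is available is this model's own choice, not documented behavior of the real record writer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the default-table rule: a null output key means
// "use the configured default table". Not Accumulo's record writer.
final class DefaultTableRouting {
    private final String defaultTable; // null when no default table was configured
    private final Map<String, List<String>> written = new HashMap<>();

    DefaultTableRouting(String defaultTable) {
        this.defaultTable = defaultTable;
    }

    void write(String table, String mutation) {
        String target = (table != null) ? table : defaultTable;
        if (target == null) {
            // Model's choice: refuse writes that have nowhere to go.
            throw new IllegalStateException("null table name and no default table configured");
        }
        written.computeIfAbsent(target, t -> new ArrayList<>()).add(mutation);
    }

    List<String> writtenTo(String table) {
        return written.getOrDefault(table, List.of());
    }
}
```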
- getDefaultTableName
  Gets the default table name from the configuration.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    the default table name
  - Since:
    1.5.0
  - See Also:
    setDefaultTableName(JobConf, String)
- setBatchWriterOptions
  public static void setBatchWriterOptions(org.apache.hadoop.mapred.JobConf job, BatchWriterConfig bwConfig)
  Sets the configuration for the job's BatchWriter instances. If not set, a new BatchWriterConfig with sensible built-in defaults is used. Setting the configuration multiple times overwrites any previous configuration.
  - Parameters:
    job - the Hadoop job instance to be configured
    bwConfig - the configuration for the BatchWriter
  - Since:
    1.5.0
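Since each call stores a complete BatchWriterConfig, a second call replaces the first wholesale rather than merging with it, so any field not set on the new object falls back to that object's defaults. The hypothetical model below illustrates this; WriterSettings stands in for BatchWriterConfig, a map stands in for the JobConf, and the default values are placeholders, not Accumulo's real defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "setting again overwrites, never merges".
// WriterSettings stands in for BatchWriterConfig; its defaults are placeholders.
final class BatchWriterOptionsModel {
    static final class WriterSettings {
        long maxMemoryBytes = 1_000_000L; // placeholder default
        int maxWriteThreads = 1;          // placeholder default

        WriterSettings withMaxMemoryBytes(long bytes) {
            this.maxMemoryBytes = bytes;
            return this;
        }

        WriterSettings withMaxWriteThreads(int threads) {
            this.maxWriteThreads = threads;
            return this;
        }
    }

    // Stands in for the JobConf: one slot holds the whole settings object.
    private final Map<String, WriterSettings> jobConf = new HashMap<>();

    void setBatchWriterOptions(WriterSettings settings) {
        jobConf.put("batchWriterOptions", settings); // replaces any earlier object wholesale
    }

    WriterSettings getBatchWriterOptions() {
        // When nothing was set, fall back to a fresh object carrying the defaults.
        return jobConf.getOrDefault("batchWriterOptions", new WriterSettings());
    }
}
```

In practice this means building one fully populated configuration object and setting it once, rather than calling the setter repeatedly with partial changes.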
- getBatchWriterOptions
  Gets the BatchWriterConfig settings.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    the configuration object
  - Since:
    1.5.0
  - See Also:
    setBatchWriterOptions(JobConf, BatchWriterConfig)
- setCreateTables
  public static void setCreateTables(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
  Sets the directive to create new tables, as necessary. Table names may contain only alphanumeric characters and underscores.
  By default, this feature is disabled.
  - Parameters:
    job - the Hadoop job instance to be configured
    enableFeature - the feature is enabled if true, disabled otherwise
  - Since:
    1.5.0
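The hypothetical model below shows how the create-tables directive and the naming rule interact when a mutation targets a table that does not exist yet. It is not Accumulo code: the regular expression encodes the rule as stated here, reading "alphanumeric" as ASCII letters and digits (an assumption), and the model ignores the Table.CREATE permission check entirely.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical model of setCreateTables: a write to an unknown table either
// creates it (feature enabled) or fails (the default, feature disabled).
final class CreateTablesModel {
    // Rule as stated in this documentation; ASCII-only is an assumption.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9_]+");

    private final Set<String> existingTables = new HashSet<>();
    private final boolean createTables;

    CreateTablesModel(boolean createTables) {
        this.createTables = createTables;
    }

    static boolean isValidTableName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    void ensureTable(String name) {
        if (!isValidTableName(name)) {
            throw new IllegalArgumentException("invalid table name: " + name);
        }
        if (existingTables.contains(name)) {
            return;
        }
        if (!createTables) {
            throw new IllegalStateException("table does not exist and table creation is disabled: " + name);
        }
        existingTables.add(name);
    }

    boolean exists(String name) {
        return existingTables.contains(name);
    }
}
```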
- canCreateTables
  Determines whether tables are permitted to be created as needed.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    true if the feature is enabled, false otherwise
  - Since:
    1.5.0
  - See Also:
    setCreateTables(JobConf, boolean)
- setSimulationMode
  public static void setSimulationMode(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
  Sets the directive to use simulation mode for this job. In simulation mode, no output is produced. This is useful for testing.
  By default, this feature is disabled.
  - Parameters:
    job - the Hadoop job instance to be configured
    enableFeature - the feature is enabled if true, disabled otherwise
  - Since:
    1.5.0
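Simulation mode can be pictured as a writer that still accepts and counts every record but never forwards it. The sketch below is a hypothetical model, not Accumulo's record writer: a list stands in for the Accumulo tables, and Strings stand in for table names and mutations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of simulation mode: every (table, mutation) pair is
// accepted and counted, but nothing reaches the sink while simulating.
final class SimulationModeModel {
    private final boolean simulate;
    private final List<String> sink = new ArrayList<>(); // stands in for Accumulo
    private long accepted;

    SimulationModeModel(boolean simulate) {
        this.simulate = simulate;
    }

    void write(String table, String mutation) {
        accepted++;
        if (!simulate) {
            sink.add(table + ":" + mutation);
        }
    }

    long accepted() {
        return accepted;
    }

    int persisted() {
        return sink.size();
    }
}
```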
- getSimulationMode
  Determines whether simulation mode is enabled.
  - Parameters:
    job - the Hadoop context for the configured job
  - Returns:
    true if the feature is enabled, false otherwise
  - Since:
    1.5.0
  - See Also:
    setSimulationMode(JobConf, boolean)
- checkOutputSpecs
  public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job) throws IOException
  - Specified by:
    checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
  - Throws:
    IOException
- getRecordWriter
  public org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Mutation> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress) throws IOException
  - Specified by:
    getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
  - Throws:
    IOException