org.apache.accumulo.core.client.mapred.AccumuloOutputFormat

All Implemented Interfaces:: org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>

public class AccumuloOutputFormat extends Object implements org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>

This class allows MapReduce jobs to use Accumulo as the sink for data. This OutputFormat accepts keys and values of type Text (for a table name) and Mutation from the Map and Reduce functions. The user must specify the following via static configurator methods:

setConnectorInfo(JobConf, String, AuthenticationToken)
setConnectorInfo(JobConf, String, String)
setZooKeeperInstance(JobConf, ClientConfiguration)

Other static methods are optional.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

protected static class

AccumuloOutputFormat.AccumuloRecordWriter

A base class to be used to create RecordWriter instances that write to Accumulo.
Field Summary

Fields

Modifier and Type

Field

Description

protected static final org.apache.log4j.Logger

log
Constructor Summary

Constructors

Constructor

Description

AccumuloOutputFormat()
Method Summary

Modifier and Type

Method

Description

protected static Boolean

canCreateTables(org.apache.hadoop.mapred.JobConf job)

Determines whether tables are permitted to be created as needed.

void

checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job)

protected static AuthenticationToken

getAuthenticationToken(org.apache.hadoop.mapred.JobConf job)

Gets the authenticated token from either the specified token file or directly from the configuration, whichever was used when the job was configured.

protected static BatchWriterConfig

getBatchWriterOptions(org.apache.hadoop.mapred.JobConf job)

Gets the BatchWriterConfig settings.

protected static String

getDefaultTableName(org.apache.hadoop.mapred.JobConf job)

Gets the default table name from the configuration.

protected static Instance

getInstance(org.apache.hadoop.mapred.JobConf job)

Initializes an Accumulo Instance based on the configuration.

protected static org.apache.log4j.Level

getLogLevel(org.apache.hadoop.mapred.JobConf job)

Gets the log level from this configuration.

protected static String

getPrincipal(org.apache.hadoop.mapred.JobConf job)

Gets the principal from the configuration.

org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Mutation>

getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)

protected static Boolean

getSimulationMode(org.apache.hadoop.mapred.JobConf job)

Determines whether this feature is enabled.

protected static byte[]

getToken(org.apache.hadoop.mapred.JobConf job)

Deprecated.
since 1.6.0; Use getAuthenticationToken(JobConf) instead.

protected static String

getTokenClass(org.apache.hadoop.mapred.JobConf job)

Deprecated.
since 1.6.0; Use getAuthenticationToken(JobConf) instead.

protected static Boolean

isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)

Determines if the connector has been configured.

static void

setBatchWriterOptions(org.apache.hadoop.mapred.JobConf job, BatchWriterConfig bwConfig)

Sets the configuration for for the job's BatchWriter instances.

static void

setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, String tokenFile)

Sets the connector information needed to communicate with Accumulo in this job.

static void

setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token)

Sets the connector information needed to communicate with Accumulo in this job.

static void

setCreateTables(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)

Sets the directive to create new tables, as necessary.

static void

setDefaultTableName(org.apache.hadoop.mapred.JobConf job, String tableName)

Sets the default table name to use if one emits a null in place of a table name for a given mutation.

static void

setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)

Sets the log level for this job.

static void

setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)

Deprecated.
since 1.8.0; use MiniAccumuloCluster or a standard mock framework

static void

setSimulationMode(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)

Sets the directive to use simulation mode for this job.

static void

setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)

Deprecated.
since 1.6.0; Use setZooKeeperInstance(JobConf, ClientConfiguration) instead.

static void

setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, ClientConfiguration clientConfig)

Configures a ZooKeeperInstance for this job.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- log
  
  protected static final org.apache.log4j.Logger log
Constructor Details
- AccumuloOutputFormat
  
  public AccumuloOutputFormat()
Method Details
- setConnectorInfo
  
  public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token) throws AccumuloSecurityException
  
  Sets the connector information needed to communicate with Accumulo in this job.
  WARNING: Some tokens, when serialized, divulge sensitive information in the configuration as a means to pass the token to MapReduce tasks. This information is BASE64 encoded to provide a charset safe conversion to a string, but this conversion is not intended to be secure. PasswordToken is one example that is insecure in this way; however DelegationTokens, acquired using SecurityOperations.getDelegationToken(DelegationTokenConfig), is not subject to this concern.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  principal - a valid Accumulo user name (user must have Table.CREATE permission if setCreateTables(JobConf, boolean) is set to true)
  
  token - the user's password
  
  Throws:
  
  AccumuloSecurityException
  
  Since:
  
  1.5.0
- setConnectorInfo
  
  public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, String tokenFile) throws AccumuloSecurityException
  
  Sets the connector information needed to communicate with Accumulo in this job.
  Stores the password in a file in HDFS and pulls that into the Distributed Cache in an attempt to be more secure than storing it in the Configuration.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  principal - a valid Accumulo user name (user must have Table.CREATE permission if setCreateTables(JobConf, boolean) is set to true)
  
  tokenFile - the path to the password file
  
  Throws:
  
  AccumuloSecurityException
  
  Since:
  
  1.6.0
- isConnectorInfoSet
  
  protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
  
  Determines if the connector has been configured.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  true if the connector has been configured, false otherwise
  
  Since:
  
  1.5.0
  
  See Also:
  
  setConnectorInfo(JobConf, String, AuthenticationToken)
- getPrincipal
  
  protected static String getPrincipal(org.apache.hadoop.mapred.JobConf job)
  
  Gets the principal from the configuration.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  the user name
  
  Since:
  
  1.5.0
  
  See Also:
  
  setConnectorInfo(JobConf, String, AuthenticationToken)
- getTokenClass
  
  @Deprecated protected static String getTokenClass(org.apache.hadoop.mapred.JobConf job)
  
  Deprecated.
  since 1.6.0; Use getAuthenticationToken(JobConf) instead.
  
  Gets the serialized token class from either the configuration or the token file.
  
  Since:
  
  1.5.0
- getToken
  
  @Deprecated protected static byte[] getToken(org.apache.hadoop.mapred.JobConf job)
  
  Deprecated.
  since 1.6.0; Use getAuthenticationToken(JobConf) instead.
  
  Gets the serialized token from either the configuration or the token file.
  
  Since:
  
  1.5.0
- getAuthenticationToken
  
  protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapred.JobConf job)
  
  Gets the authenticated token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  Returns:
  
  the principal's authentication token
  
  Since:
  
  1.6.0
  
  See Also:
  
  setConnectorInfo(JobConf, String, AuthenticationToken)
  
  setConnectorInfo(JobConf, String, String)
- setZooKeeperInstance
  
  @Deprecated public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)
  
  Deprecated.
  since 1.6.0; Use setZooKeeperInstance(JobConf, ClientConfiguration) instead.
  
  Configures a ZooKeeperInstance for this job.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  instanceName - the Accumulo instance name
  
  zooKeepers - a comma-separated list of zookeeper servers
  
  Since:
  
  1.5.0
- setZooKeeperInstance
  
  public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, ClientConfiguration clientConfig)
  
  Configures a ZooKeeperInstance for this job.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  clientConfig - client configuration for specifying connection timeouts, SSL connection options, etc.
  
  Since:
  
  1.6.0
- setMockInstance
  
  @Deprecated public static void setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)
  
  Deprecated.
  since 1.8.0; use MiniAccumuloCluster or a standard mock framework
  
  Configures a MockInstance for this job.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  instanceName - the Accumulo instance name
  
  Since:
  
  1.5.0
- getInstance
  
  protected static Instance getInstance(org.apache.hadoop.mapred.JobConf job)
  
  Initializes an Accumulo Instance based on the configuration.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  an Accumulo instance
  
  Since:
  
  1.5.0
  
  See Also:
  
  setZooKeeperInstance(JobConf, ClientConfiguration)
- setLogLevel
  
  public static void setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)
  
  Sets the log level for this job.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  level - the logging level
  
  Since:
  
  1.5.0
- getLogLevel
  
  protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
  
  Gets the log level from this configuration.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  the log level
  
  Since:
  
  1.5.0
  
  See Also:
  
  setLogLevel(JobConf, Level)
- setDefaultTableName
  
  public static void setDefaultTableName(org.apache.hadoop.mapred.JobConf job, String tableName)
  
  Sets the default table name to use if one emits a null in place of a table name for a given mutation. Table names can only be alpha-numeric and underscores.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  tableName - the table to use when the tablename is null in the write call
  
  Since:
  
  1.5.0
- getDefaultTableName
  
  protected static String getDefaultTableName(org.apache.hadoop.mapred.JobConf job)
  
  Gets the default table name from the configuration.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  the default table name
  
  Since:
  
  1.5.0
  
  See Also:
  
  setDefaultTableName(JobConf, String)
- setBatchWriterOptions
  
  public static void setBatchWriterOptions(org.apache.hadoop.mapred.JobConf job, BatchWriterConfig bwConfig)
  
  Sets the configuration for for the job's BatchWriter instances. If not set, a new BatchWriterConfig, with sensible built-in defaults is used. Setting the configuration multiple times overwrites any previous configuration.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  bwConfig - the configuration for the BatchWriter
  
  Since:
  
  1.5.0
- getBatchWriterOptions
  
  protected static BatchWriterConfig getBatchWriterOptions(org.apache.hadoop.mapred.JobConf job)
  
  Gets the BatchWriterConfig settings.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  the configuration object
  
  Since:
  
  1.5.0
  
  See Also:
  
  setBatchWriterOptions(JobConf, BatchWriterConfig)
- setCreateTables
  
  public static void setCreateTables(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
  
  Sets the directive to create new tables, as necessary. Table names can only be alpha-numeric and underscores.
  By default, this feature is disabled.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  enableFeature - the feature is enabled if true, disabled otherwise
  
  Since:
  
  1.5.0
- canCreateTables
  
  protected static Boolean canCreateTables(org.apache.hadoop.mapred.JobConf job)
  
  Determines whether tables are permitted to be created as needed.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  true if the feature is disabled, false otherwise
  
  Since:
  
  1.5.0
  
  See Also:
  
  setCreateTables(JobConf, boolean)
- setSimulationMode
  
  public static void setSimulationMode(org.apache.hadoop.mapred.JobConf job, boolean enableFeature)
  
  Sets the directive to use simulation mode for this job. In simulation mode, no output is produced. This is useful for testing.
  By default, this feature is disabled.
  
  Parameters:
  
  job - the Hadoop job instance to be configured
  
  enableFeature - the feature is enabled if true, disabled otherwise
  
  Since:
  
  1.5.0
- getSimulationMode
  
  protected static Boolean getSimulationMode(org.apache.hadoop.mapred.JobConf job)
  
  Determines whether this feature is enabled.
  Parameters:
  
  job - the Hadoop context for the configured job
  
  Returns:
  
  true if the feature is enabled, false otherwise
  
  Since:
  
  1.5.0
  
  See Also:
  
  setSimulationMode(JobConf, boolean)
- checkOutputSpecs
  
  public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job) throws IOException
  
  Specified by:
  
  checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
  
  Throws:
  
  IOException
- getRecordWriter
  
  public org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.Text,Mutation> getRecordWriter(org.apache.hadoop.fs.FileSystem ignored, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress) throws IOException
  
  Specified by:
  
  getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.Text,Mutation>
  
  Throws:
  
  IOException

Class AccumuloOutputFormat

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

log

Constructor Details

AccumuloOutputFormat

Method Details

setConnectorInfo

setConnectorInfo

isConnectorInfoSet

getPrincipal

getTokenClass

getToken

getAuthenticationToken

setZooKeeperInstance

setZooKeeperInstance

setMockInstance

getInstance

setLogLevel

getLogLevel

setDefaultTableName

getDefaultTableName

setBatchWriterOptions

getBatchWriterOptions

setCreateTables

canCreateTables

setSimulationMode

getSimulationMode

checkOutputSpecs

getRecordWriter