Class AbstractInputFormat<K,V>

java.lang.Object
org.apache.accumulo.core.client.mapred.AbstractInputFormat<K,V>
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<K,V>
Direct Known Subclasses:
AccumuloMultiTableInputFormat, InputFormatBase

public abstract class AbstractInputFormat<K,V> extends Object implements org.apache.hadoop.mapred.InputFormat<K,V>
An abstract input format to provide shared methods common to all other input format classes. At the very least, any classes inheriting from this class will need to define their own RecordReader.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    protected static class 
    An abstract base class to be used to create RecordReader instances that convert from Accumulo Key/Value pairs to the user's K/V types.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected static final Class<?>
    CLASS
     
    protected static final org.apache.log4j.Logger
    log
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    AbstractInputFormat()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    protected static AuthenticationToken
    getAuthenticationToken(org.apache.hadoop.mapred.JobConf job)
    Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
    static String
    getClassLoaderContext(org.apache.hadoop.mapred.JobConf job)
    Returns the name of the classloader context currently set on this job.
    protected static ClientConfiguration
    getClientConfiguration(org.apache.hadoop.mapred.JobConf job)
    Fetch the client configuration from the job.
    static InputTableConfig
    getInputTableConfig(org.apache.hadoop.mapred.JobConf job, String tableName)
    Fetches an InputTableConfig that has been set on the configuration for a specific table.
    static Map<String,InputTableConfig>
    getInputTableConfigs(org.apache.hadoop.mapred.JobConf job)
    Fetches all InputTableConfigs that have been set on the given Hadoop job.
    protected static Instance
    getInstance(org.apache.hadoop.mapred.JobConf job)
    Initializes an Accumulo Instance based on the configuration.
    protected static org.apache.log4j.Level
    getLogLevel(org.apache.hadoop.mapred.JobConf job)
    Gets the log level from this configuration.
    protected static String
    getPrincipal(org.apache.hadoop.mapred.JobConf job)
    Gets the user name from the configuration.
    protected static Authorizations
    getScanAuthorizations(org.apache.hadoop.mapred.JobConf job)
    Gets the authorizations to set for the scans from the configuration.
    org.apache.hadoop.mapred.InputSplit[]
    getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
    Gets the splits of the tables that have been set on the job by reading the metadata table for the specified ranges.
    protected static org.apache.accumulo.core.client.impl.TabletLocator
    getTabletLocator(org.apache.hadoop.mapred.JobConf job, String tableId)
    Deprecated.
    since 1.7.0 This method returns a type that is not part of the public API and is not guaranteed to be stable.
    protected static Boolean
    isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
    Determines if the connector has been configured.
    static void
    setClassLoaderContext(org.apache.hadoop.mapred.JobConf job, String context)
    Sets the name of the classloader context on this job.
    static void
    setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, String tokenFile)
    Sets the connector information needed to communicate with Accumulo in this job.
    static void
    setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token)
    Sets the connector information needed to communicate with Accumulo in this job.
    static void
    setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)
    Sets the log level for this job.
    static void
    setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)
    Deprecated.
    since 1.8.0; use MiniAccumuloCluster or a standard mock framework
    static void
    setScanAuthorizations(org.apache.hadoop.mapred.JobConf job, Authorizations auths)
    Sets the Authorizations used to scan.
    static void
    setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)
    Deprecated.
    static void
    setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, ClientConfiguration clientConfig)
    Configures a ZooKeeperInstance for this job.
    protected static void
    validateOptions(org.apache.hadoop.mapred.JobConf job)
    Check whether a configuration is fully configured to be used with an Accumulo InputFormat.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.hadoop.mapred.InputFormat

    getRecordReader
  • Field Details

    • CLASS

      protected static final Class<?> CLASS
    • log

      protected static final org.apache.log4j.Logger log
  • Constructor Details

    • AbstractInputFormat

      public AbstractInputFormat()
  • Method Details

    • setClassLoaderContext

      public static void setClassLoaderContext(org.apache.hadoop.mapred.JobConf job, String context)
      Sets the name of the classloader context on this job.
      Parameters:
      job - the Hadoop job instance to be configured
      context - name of the classloader context
      Since:
      1.8.0
    • getClassLoaderContext

      public static String getClassLoaderContext(org.apache.hadoop.mapred.JobConf job)
      Returns the name of the classloader context currently set on this job.
      Parameters:
      job - the Hadoop job instance to be configured
      Returns:
      name of the current context
      Since:
      1.8.0
    • setConnectorInfo

      public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token) throws AccumuloSecurityException
      Sets the connector information needed to communicate with Accumulo in this job.

      WARNING: Some tokens, when serialized, divulge sensitive information in the configuration as a means to pass the token to MapReduce tasks. This information is BASE64 encoded to provide a charset-safe conversion to a string, but this conversion is not intended to be secure. PasswordToken is one example that is insecure in this way; however, DelegationTokens, acquired using SecurityOperations.getDelegationToken(DelegationTokenConfig), are not subject to this concern.

      Parameters:
      job - the Hadoop job instance to be configured
      principal - a valid Accumulo user name (user must have Table.CREATE permission)
      token - the user's authentication token (for example, a PasswordToken holding the user's password)
      Throws:
      AccumuloSecurityException
      Since:
      1.5.0
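      The warning above can be illustrated with the JDK alone. The following is a minimal sketch, not Accumulo's actual token serialization: the class name, method names, and the sample password are hypothetical. It only shows that a Base64-encoded value can be reversed by anyone who can read it, with no key involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; it does not use or reproduce any Accumulo code.
class Base64IsNotEncryption {

    // Encodes a secret into a charset-safe string, as a configuration value might be.
    public static String encode(String secret) {
        return Base64.getEncoder().encodeToString(secret.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses the encoding. No key is required, so this offers no confidentiality.
    public static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("s3cret"); // placeholder password
        System.out.println("stored in configuration: " + encoded);
        System.out.println("recovered by any reader: " + decode(encoded));
    }
}
```

      Because the round trip needs no secret, anyone with read access to the job configuration should be assumed able to recover such a token.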
    • setConnectorInfo

      public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, String tokenFile) throws AccumuloSecurityException
      Sets the connector information needed to communicate with Accumulo in this job.

      Stores the password in a file in HDFS and pulls that into the Distributed Cache in an attempt to be more secure than storing it in the Configuration.

      Parameters:
      job - the Hadoop job instance to be configured
      principal - a valid Accumulo user name (user must have Table.CREATE permission)
      tokenFile - the path to the token file
      Throws:
      AccumuloSecurityException
      Since:
      1.6.0
    • isConnectorInfoSet

      protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
      Determines if the connector has been configured.
      Parameters:
      job - the Hadoop context for the configured job
      Returns:
      true if the connector has been configured, false otherwise
      Since:
      1.5.0
    • getPrincipal

      protected static String getPrincipal(org.apache.hadoop.mapred.JobConf job)
      Gets the user name from the configuration.
      Parameters:
      job - the Hadoop context for the configured job
      Returns:
      the user name
      Since:
      1.5.0
    • getAuthenticationToken

      protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapred.JobConf job)
      Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
      Parameters:
      job - the Hadoop context for the configured job
      Returns:
      the principal's authentication token
      Since:
      1.6.0
    • setZooKeeperInstance

      @Deprecated public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)
      Deprecated.
      use setZooKeeperInstance(JobConf, ClientConfiguration) instead
      Configures a ZooKeeperInstance for this job.
      Parameters:
      job - the Hadoop job instance to be configured
      instanceName - the Accumulo instance name
      zooKeepers - a comma-separated list of zookeeper servers
      Since:
      1.5.0
    • setZooKeeperInstance

      public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, ClientConfiguration clientConfig)
      Configures a ZooKeeperInstance for this job.
      Parameters:
      job - the Hadoop job instance to be configured
      clientConfig - client configuration containing connection options
      Since:
      1.6.0
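      The setters documented so far can be combined to prepare a job. The following is a minimal sketch, assuming accumulo-core (1.8.0 or later, for setClassLoaderContext) and the Hadoop client libraries are on the classpath. The class name ConfigureAccumuloInput and every literal value (instance name, ZooKeeper hosts, principal, password, context name) are placeholders, not values taken from this documentation.

```java
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapred.AbstractInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper class; all literal values below are placeholders.
class ConfigureAccumuloInput {

    public static JobConf configure(JobConf job) throws AccumuloSecurityException {
        // Which instance to read from: instance name plus a comma-separated ZooKeeper list.
        ClientConfiguration clientConfig = ClientConfiguration.loadDefault()
            .withInstance("myInstance")
            .withZkHosts("zk1:2181,zk2:2181");
        AbstractInputFormat.setZooKeeperInstance(job, clientConfig);

        // Who to connect as. A PasswordToken is written into the job configuration,
        // so the WARNING on setConnectorInfo applies to this call.
        AbstractInputFormat.setConnectorInfo(job, "reader", new PasswordToken("changeit"));

        // Optional: name of the classloader context for the job.
        AbstractInputFormat.setClassLoaderContext(job, "myContext");
        return job;
    }

    public static void main(String[] args) throws AccumuloSecurityException {
        JobConf job = configure(new JobConf());
        System.out.println(AbstractInputFormat.getClassLoaderContext(job));
    }
}
```

      In practice these static methods are usually called through a concrete subclass; the sketch calls them on AbstractInputFormat only because that is the class documented here.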
    • setMockInstance

      @Deprecated public static void setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)
      Deprecated.
      since 1.8.0; use MiniAccumuloCluster or a standard mock framework
      Configures a MockInstance for this job.
      Parameters:
      job - the Hadoop job instance to be configured
      instanceName - the Accumulo instance name
      Since:
      1.5.0
    • getInstance

      protected static Instance getInstance(org.apache.hadoop.mapred.JobConf job)
      Initializes an Accumulo Instance based on the configuration.
      Parameters:
      job - the Hadoop context for the configured job
      Returns:
      an Accumulo instance
      Since:
      1.5.0
    • setLogLevel

      public static void setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)
      Sets the log level for this job.
      Parameters:
      job - the Hadoop job instance to be configured
      level - the logging level
      Since:
      1.5.0
    • getLogLevel

      protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
      Gets the log level from this configuration.
      Parameters:
      job - the Hadoop context for the configured job
      Returns:
      the log level
      Since:
      1.5.0
    • setScanAuthorizations

      public static void setScanAuthorizations(org.apache.hadoop.mapred.JobConf job, Authorizations auths)
      Sets the Authorizations used to scan. Must be a subset of the user's authorizations. Defaults to the empty set.
      Parameters:
      job - the Hadoop job instance to be configured
      auths - the user's authorizations
      Since:
      1.5.0
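      The subset requirement on scan authorizations can be pictured with plain sets. The following is a minimal sketch that uses Set<String> as a stand-in for Authorizations; it is not how Accumulo enforces the rule, and the class name, method name, and labels are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; plain string sets stand in for Accumulo Authorizations.
class ScanAuthSubsetCheck {

    // Returns true when every requested label is also held by the user.
    public static boolean isSubset(Set<String> requested, Set<String> granted) {
        return granted.containsAll(requested);
    }

    public static void main(String[] args) {
        Set<String> granted = new HashSet<>(Arrays.asList("public", "internal"));

        // A label the user holds is acceptable.
        System.out.println(isSubset(new HashSet<>(Arrays.asList("public")), granted)); // true

        // A label the user does not hold is not.
        System.out.println(isSubset(new HashSet<>(Arrays.asList("secret")), granted)); // false

        // The empty set, which is the documented default, is a subset of anything.
        System.out.println(isSubset(new HashSet<String>(), granted)); // true
    }
}
```

      The last case shows why the empty default is always valid: it requests no labels, so there is nothing the user could be missing.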
    • getScanAuthorizations

      protected static Authorizations getScanAuthorizations(org.apache.hadoop.mapred.JobConf job)
      Gets the authorizations to set for the scans from the configuration.
      Parameters:
      job - the Hadoop context for the configured job
      Returns:
      the Accumulo scan authorizations
      Since:
      1.5.0
    • getTabletLocator

      @Deprecated protected static org.apache.accumulo.core.client.impl.TabletLocator getTabletLocator(org.apache.hadoop.mapred.JobConf job, String tableId) throws TableNotFoundException
      Deprecated.
      since 1.7.0 This method returns a type that is not part of the public API and is not guaranteed to be stable. The method was deprecated to discourage its use.
      Initializes an Accumulo TabletLocator based on the configuration.
      Parameters:
      job - the Hadoop context for the configured job
      tableId - the ID of the table for which to initialize the locator
      Returns:
      an Accumulo tablet locator
      Throws:
      TableNotFoundException - if the table name set on the configuration doesn't exist
      Since:
      1.6.0
    • getClientConfiguration

      protected static ClientConfiguration getClientConfiguration(org.apache.hadoop.mapred.JobConf job)
      Fetch the client configuration from the job.
      Parameters:
      job - The job
      Returns:
      The client configuration for the job
      Since:
      1.7.0
    • validateOptions

      protected static void validateOptions(org.apache.hadoop.mapred.JobConf job) throws IOException
      Check whether a configuration is fully configured to be used with an Accumulo InputFormat.
      Parameters:
      job - the Hadoop context for the configured job
      Throws:
      IOException - if the context is improperly configured
      Since:
      1.5.0
    • getInputTableConfigs

      public static Map<String,InputTableConfig> getInputTableConfigs(org.apache.hadoop.mapred.JobConf job)
      Fetches all InputTableConfigs that have been set on the given Hadoop job.
      Parameters:
      job - the Hadoop job instance to be configured
      Returns:
      the InputTableConfig objects set on the job
      Since:
      1.6.0
    • getInputTableConfig

      public static InputTableConfig getInputTableConfig(org.apache.hadoop.mapred.JobConf job, String tableName)
      Fetches an InputTableConfig that has been set on the configuration for a specific table.

      null is returned in the event that the table doesn't exist.

      Parameters:
      job - the Hadoop job instance to be configured
      tableName - the table name for which to grab the config object
      Returns:
      the InputTableConfig for the given table
      Since:
      1.6.0
    • getSplits

      public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) throws IOException
      Gets the splits of the tables that have been set on the job by reading the metadata table for the specified ranges.
      Specified by:
      getSplits in interface org.apache.hadoop.mapred.InputFormat<K,V>
      Returns:
      the splits from the tables based on the ranges.
      Throws:
      IOException - if a table set on the job doesn't exist or an error occurs initializing the tablet locator