Class ConfigurableScanServerSelector

java.lang.Object
org.apache.accumulo.core.spi.scan.ConfigurableScanServerSelector
All Implemented Interfaces:
ScanServerSelector

public class ConfigurableScanServerSelector extends Object implements ScanServerSelector
The default Accumulo selector for scan servers. This selector will:
  • Hash each tablet to a per-attempt configurable number of scan servers and then randomly choose one of those scan servers. Hashing allows different clients to select the same scan servers for a given tablet.
  • Use a per-attempt configurable busy timeout.

This class accepts a single configuration option whose value is JSON. To configure it, set scan.server.selector.opts.profiles=<json> in the Accumulo client configuration, along with the configuration that selects this class. The following is the default configuration value.

"[{'isDefault':true,'maxBusyTimeout':'5m','busyTimeoutMultiplier':8, 'scanTypeActivations':[], 'attemptPlans':[{'servers':'3', 'busyTimeout':'33ms', 'salt':'one'},{'servers':'13', 'busyTimeout':'33ms', 'salt':'two'},{'servers':'100%', 'busyTimeout':'33ms'}]}]"

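As a minimal sketch of the configuration step described above, the following builds client properties that carry the profiles JSON. The key scan.server.selector.opts.profiles comes from the text; the key scan.server.selector.impl used to select this class is an assumption and should be checked against the client properties of your Accumulo version. The class and method names are hypothetical, and the single-quoted JSON assumes the selector's parser accepts the same quoting as the default value.

```java
import java.util.Properties;

class ScanServerSelectorConfigSketch {
  // Property key named in the class documentation.
  static final String PROFILES_KEY = "scan.server.selector.opts.profiles";
  // ASSUMPTION: key that selects the selector implementation; verify for your version.
  static final String SELECTOR_IMPL_KEY = "scan.server.selector.impl";

  // Builds client properties that select this class and pass it the profiles JSON.
  static Properties selectorProperties(String profilesJson) {
    Properties props = new Properties();
    props.setProperty(SELECTOR_IMPL_KEY,
        "org.apache.accumulo.core.spi.scan.ConfigurableScanServerSelector");
    props.setProperty(PROFILES_KEY, profilesJson);
    return props;
  }

  public static void main(String[] args) {
    // A single default profile with two attempt plans.
    String json = "[{'isDefault':true,'maxBusyTimeout':'5m','busyTimeoutMultiplier':4,"
        + "'attemptPlans':[{'servers':'3','busyTimeout':'33ms'},"
        + "{'servers':'100%','busyTimeout':'100ms'}]}]";
    selectorProperties(json).forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

The resulting properties would be merged with the rest of the client configuration (instance name, credentials, and so on), which this sketch omits.
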
The JSON is structured as a list of profiles, with each profile having the following fields.
  • isDefault : A boolean that specifies whether this is the default profile. One and only one profile must set this to true.
  • maxBusyTimeout : The maximum busy timeout to use. The busy timeout from the last attempt configuration grows exponentially up to this max.
  • scanTypeActivations : A list of scan types that will activate this profile. Scan types are specified by setting scan_type=<scan_type> as an execution hint on the scanner. See ScannerBase.setExecutionHints(Map).
  • group : Scan servers can be started with an optional group. If specified, this option limits the scan servers used to those that were started with this group name. If not specified, the set of scan servers that did not specify a group will be used. Grouping scan servers supports at least two use cases. First, groups can be used to dedicate resources to certain scans. Second, groups can be used to run scans on different hardware or VM types; for example, some scans could use expensive high-memory VMs while others use cheaper burstable VMs.
  • attemptPlans : A list of configurations to use for each scan attempt. Each list object has the following fields:
    • servers : The number of servers to randomly choose from for this attempt.
    • busyTimeout : The busy timeout to use for this attempt.
    • salt : An optional string to append when hashing the tablet. When attempts use different salts, the sets of servers they choose from may be disjoint. When the salt is unset or the same across attempts, the smaller set of servers is a subset of the larger one.
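
The effect of servers and salt can be illustrated with a simplified model. This is only a sketch under stated assumptions, not the selector's actual code: it uses String.hashCode() as a stand-in hash, treats the candidate set as consecutive entries in a sorted server list, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class TabletHashSketch {
  // Returns the candidate servers for one attempt: `count` consecutive servers
  // (wrapping around) starting at an index derived from hashing tablet + salt.
  // ASSUMPTION: String.hashCode() stands in for the real hash function.
  static List<String> candidates(String tabletId, String salt, int count,
      List<String> sortedServers) {
    int n = sortedServers.size();
    int start = Math.floorMod((tabletId + salt).hashCode(), n);
    List<String> out = new ArrayList<>();
    for (int i = 0; i < Math.min(count, n); i++) {
      out.add(sortedServers.get((start + i) % n));
    }
    return out;
  }

  // Randomly picks one server from the candidate set.
  static String choose(List<String> candidates, Random random) {
    return candidates.get(random.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    List<String> servers = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      servers.add("sserver" + i + ":9996");
    }
    // Same (empty) salt: the 1-server set is contained in the 3-server set.
    System.out.println(candidates("tablet-1", "", 1, servers));
    System.out.println(candidates("tablet-1", "", 3, servers));
    // Different salt: the start index usually moves, so the set may not overlap.
    System.out.println(candidates("tablet-1", "42", 3, servers));
    System.out.println(choose(candidates("tablet-1", "42", 3, servers), new Random()));
  }
}
```

Because the start index depends only on the tablet and the salt, every client computes the same candidate set for a given tablet, while the random pick spreads load across that set.
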

Below is an example configuration with two profiles: one is the default, and the other is used when the scan execution hint scan_type=slow is set.

    [
     {
       "isDefault":true,
       "maxBusyTimeout":"5m",
       "busyTimeoutMultiplier":4,
       "attemptPlans":[
         {"servers":"3", "busyTimeout":"33ms"},
         {"servers":"100%", "busyTimeout":"100ms"}
       ]
     },
     {
       "scanTypeActivations":["slow"],
       "maxBusyTimeout":"20m",
       "busyTimeoutMultiplier":8,
       "group":"lowcost",
       "attemptPlans":[
         {"servers":"1", "busyTimeout":"10s"},
         {"servers":"3", "busyTimeout":"30s","salt":"42"},
         {"servers":"9", "busyTimeout":"60s","salt":"84"}
       ]
     }
    ]
 

For the default profile in the example, the selector starts by choosing randomly from 3 scan servers based on a hash of the tablet with no salt, using a busy timeout of 33 milliseconds for the first attempt. If the first attempt returns busy, the second attempt chooses randomly from 100% of the servers (that is, all of them) and uses a busy timeout of 100ms. Subsequent attempts keep choosing from all servers and multiply the busy timeout by 4 each time until the max busy timeout of 5 minutes is reached.
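
The timeout progression can be worked through with a small sketch. It assumes that attempts beyond the last plan take the last plan's busy timeout, multiply it by busyTimeoutMultiplier once per extra attempt, and cap the result at maxBusyTimeout; the class and method names are hypothetical, and the values come from the example configuration (maxBusyTimeout of 5m, multiplier of 4).

```java
class BusyTimeoutSketch {
  // Returns the busy timeout in milliseconds for a zero-based attempt number.
  // Attempts covered by a plan use that plan's timeout; later attempts grow the
  // last plan's timeout by `multiplier` per extra attempt, capped at `maxMillis`.
  static long busyTimeoutMillis(long[] planTimeoutsMillis, long multiplier,
      long maxMillis, int attempt) {
    int last = planTimeoutsMillis.length - 1;
    long timeout = planTimeoutsMillis[Math.min(attempt, last)];
    // Stop multiplying once the cap is reached to avoid needless growth.
    for (int i = last; i < attempt && timeout < maxMillis; i++) {
      timeout *= multiplier;
    }
    return Math.min(timeout, maxMillis);
  }

  public static void main(String[] args) {
    long[] defaultPlans = {33, 100}; // 33ms, 100ms
    long maxMillis = 5 * 60 * 1000;  // maxBusyTimeout of 5m
    for (int attempt = 0; attempt < 9; attempt++) {
      System.out.println("attempt " + (attempt + 1) + ": "
          + busyTimeoutMillis(defaultPlans, 4, maxMillis, attempt) + "ms");
    }
  }
}
```

Under these assumptions the default profile yields 33ms, 100ms, 400ms, 1600ms, 6400ms, 25600ms, and 102400ms, and then stays at the 300000ms cap from the eighth attempt onward.
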

For the profile activated by scan_type=slow, the selector starts by choosing from 1 scan server based on a hash of the tablet with no salt and a busy timeout of 10s. The second attempt chooses from 3 scan servers based on a hash of the tablet plus the salt 42, with a busy timeout of 30s. Without the salt, the single scan server from the first attempt would always be included in the set of 3. With the salt, the single scan server from the first attempt may not be included. The third attempt chooses a scan server from 9 using the salt 84 and a busy timeout of 60s. The different salt means the sets of servers that attempts 2 and 3 choose from may be disjoint. Attempt 4 and later continue to choose from the same 9 servers as attempt 3 and keep multiplying the busy timeout by 8 until the maximum of 20 minutes is reached. All attempts for this profile choose from scan servers in the group lowcost.
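
How a profile gets picked from the scan_type execution hint can be sketched as a simple lookup. This is an illustration only: the class and method names are hypothetical, and falling back to the default profile when a scan type matches no scanTypeActivations entry is an assumption rather than documented behavior.

```java
import java.util.Map;

class ProfileSelectionSketch {
  // Picks a profile name from the scanner's execution hints.
  // `profileByScanType` maps each scanTypeActivations entry to its profile.
  // ASSUMPTION: an unknown or missing scan_type falls back to the default profile.
  static String selectProfile(Map<String, String> hints,
      Map<String, String> profileByScanType, String defaultProfile) {
    String scanType = hints.get("scan_type");
    if (scanType == null) {
      return defaultProfile;
    }
    return profileByScanType.getOrDefault(scanType, defaultProfile);
  }

  public static void main(String[] args) {
    // Mirrors the example: one default profile and one activated by "slow".
    Map<String, String> activations = Map.of("slow", "lowcost-profile");
    System.out.println(selectProfile(Map.of("scan_type", "slow"), activations, "default-profile"));
    System.out.println(selectProfile(Map.of(), activations, "default-profile"));
  }
}
```

On a real scanner the hint map would be supplied through ScannerBase.setExecutionHints(Map), for example with a map containing the entry scan_type=slow.
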

Since:
2.1.0