Hadoop Job
The Hadoop job connects to the Hadoop framework, which enables the distributed processing of large data sets across clusters of commodity servers. From Control-M, you can expand your enterprise business workflows to include tasks that run in your Big Data Hadoop cluster, using the different Hadoop-supported tools, including Pig, Hive, HDFS File Watcher, Map Reduce jobs, and Sqoop.
The following table describes the Hadoop job attributes:
Attribute |
Description |
---|---|
Connection Profile |
Defines the connection profile for the job. Rules:
Variable Name: %%HDP-ACCOUNT |
Execution Type |
Determines the execution type for Hadoop job execution, as follows:
Variable Name: %%HDP-EXEC_TYPE |
Pre Commands |
Defines the Pre commands performed before job execution (not for HDFS Commands jobs and Oozie Extractor jobs), and the argument for each command. |
Fail the job if the command fails |
Determines whether the entire job fails if any of the Pre commands fail (not for HDFS Commands jobs and Oozie Extractor jobs). |
Post Commands |
Defines the Post commands performed after job execution (not for HDFS Commands jobs and Oozie Extractor jobs), and the argument for each command. |
Fail the job if the command fails |
Determines whether the entire job fails if any of the Post commands fail (not for HDFS Commands jobs and Oozie Extractor jobs). |
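The way the Pre commands, the Post commands, and the two Fail the job if the command fails options interact can be sketched in Python. This is an illustration only, not Control-M code: the `run_job` function, the callables that stand in for commands, the "OK"/"NOTOK" labels, and the choices to skip the main execution after a failed Pre command and to run Post commands regardless of the main result are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical stand-in for a command: returns an exit code, 0 means success.
Command = Callable[[], int]


def run_job(pre: List[Command], main: Command, post: List[Command],
            fail_on_pre: bool, fail_on_post: bool) -> str:
    """Return "OK" or "NOTOK" for a job that has Pre and Post commands."""
    # Pre commands run before the main execution.
    for cmd in pre:
        if cmd() != 0 and fail_on_pre:
            # Assumption: a failed Pre command stops the job before main runs.
            return "NOTOK"

    status = "OK" if main() == 0 else "NOTOK"

    # Post commands run after the main execution (assumed to run regardless).
    for cmd in post:
        if cmd() != 0 and fail_on_post:
            status = "NOTOK"
    return status
```

With the option cleared, a failing Pre or Post command is tolerated and the job status depends only on the main execution.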
DistCp Job Attributes
The following table describes the DistCp job attributes:
Attribute |
Description |
---|---|
Target Path |
Defines the absolute destination path. Variable Name: %%HDP-DISTCP_TARGET_PATH |
Source Path |
Defines the source paths. Variable Name: %%HDP-DISTCP_SOURCE_PATH-Nxxx_ARG |
Command Line Options |
Defines the sets of attributes and values that are added to the command line. Variable Names:
|
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
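As a rough illustration of how the DistCp attributes relate to a command line, the following Python sketch assembles an argument list in the standard `hadoop distcp [options] <source>... <target>` form. The `build_distcp_command` helper and the example paths are hypothetical, and the exact command that Control-M generates may differ.

```python
from typing import Dict, List, Optional


def build_distcp_command(source_paths: List[str], target_path: str,
                         options: Optional[Dict[str, Optional[str]]] = None) -> List[str]:
    """Assemble argv in the form: hadoop distcp [options] <src>... <target>."""
    if not source_paths:
        raise ValueError("at least one source path is required")
    argv = ["hadoop", "distcp"]
    # Command Line Options: name/value pairs; a value of None marks a bare flag.
    for name, value in (options or {}).items():
        argv.append(name)
        if value is not None:
            argv.append(value)
    argv.extend(source_paths)   # Source Path (one or more)
    argv.append(target_path)    # Target Path always comes last
    return argv


# Example paths are made up; -update and -m are standard DistCp options.
print(build_distcp_command(
    ["hdfs://nn1/data/a", "hdfs://nn1/data/b"],
    "hdfs://nn2/backup",
    {"-update": None, "-m": "20"},
))
```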
Distributed Shell Job Attributes
The following table describes the Distributed Shell job attributes:
Attribute |
Description |
---|---|
Shell Type |
Determines what the Distributed Shell job runs, as follows:
Variable Name: %%HDP-SHELL_TYPE |
Command |
Defines the shell command entry to run for the job execution. Variable Name: %%HDP-SHELL_COMMAND |
Script Full Path |
Defines the full path to the script file to execute. The script file is located in HDFS. Variable Name: %%HDP-SHELL_SCRIPT_FULL_PATH |
Shell Script Arguments |
Defines the shell script arguments. Variable Name: %%HDP-SHELL-Nxxx-ARG |
More Options |
Opens more attributes. |
Files/Archives |
Defines the full path to the file or archive to upload as a dependency to the HDFS working directory. Variable Names:
|
Options |
Defines the additional option (Name and Value) to set when executing the job. Variable Names:
|
Environment Variables |
Defines the environment variables for the shell script/command. Variable Name: %%HDP-SHELL_ENV_VARIABLE-Nxxx-ARG |
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
HDFS Commands Job Attributes
The following table describes the HDFS Commands job attributes:
Attribute |
Description |
---|---|
Command |
Defines the command that is performed with job execution. Variable Name: %%HDP-HDFS_CMD_ACTION-Nxxx-CMD |
Arguments |
Defines the argument used by the command. Variable Name: %%HDP-HDFS_CMD_ACTION-Nxxx-ARG |
HDFS File Watcher Job Attributes
The following table describes the HDFS File Watcher job attributes:
Attribute |
Description |
---|---|
File name full path |
Defines the full path of the file being watched. Variable Name: %%HDP-HDFS_FILE_PATH |
Min detected size |
Determines the minimum file size, in bytes, that the file must reach for the job to end OK. If the file arrives but has not reached this size, the job continues to watch the file. Variable Name: %%HDP-MIN_DETECTED_SIZE |
Max time to wait |
Determines the maximum number of minutes to wait for the file to meet the watching criteria. If criteria are not met (file did not arrive, or minimum size was not reached) the job fails after this maximum number of minutes. Variable Name: %%HDP-MAX_WAIT_TIME |
File Name Variable |
Defines the variable name that is used in succeeding jobs. Variable Name: %%HDP-FW_DETECTED_FILE_NAME_VAR |
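The watching criteria described by Min detected size and Max time to wait can be sketched as a polling loop. This Python example is only a model of the described behavior: the `watch_file` function, the `get_size` callable (which stands in for an HDFS size lookup), and the polling interval are assumptions, not part of the product.

```python
import time
from typing import Callable, Optional


def watch_file(get_size: Callable[[], Optional[int]], min_size: int,
               max_wait_minutes: float, poll_seconds: float = 30.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the file reaches min_size bytes before the deadline."""
    deadline = clock() + max_wait_minutes * 60
    while True:
        size = get_size()  # None means the file has not arrived yet
        if size is not None and size >= min_size:
            return True    # criteria met: the job ends OK
        if clock() >= deadline:
            return False   # file missing or too small in time: the job fails
        sleep(poll_seconds)
```

The `clock` and `sleep` parameters exist only so that the loop can be exercised without real waiting.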
Impala Job Attributes
The following table describes the Impala job attributes:
Attribute |
Description |
---|---|
Source |
Determines the source type to run the queries, as follows:
Variable Name: %%HDP-IMPALA_QUERY_SOURCE |
Query File Full Path |
Defines the location of the file used to run the queries. Variable Name: %%HDP-IMPALA_QUERY_FILE_PATH |
Query |
Defines the query command used to run the queries. Variable Name: %%HDP-IMPALA_OPEN_QUERY |
Command Line Options |
Defines the sets of attributes and values that are added to the command line. Variable Name: %%HDP-HDP-IMPALA_CMD_OPTION-Nxxx-ARG |
Hive Job Attributes
The following table describes the Hive job attributes:
Attribute |
Description |
---|---|
Full path to Hive script |
Defines the full path to the Hive script on the Hadoop host. Variable Name: %%HDP-HIVE_SCRIPT_NAME |
Script Parameters |
Defines the list of parameters for the script. Variable Names:
|
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
Java-Map-Reduce Job Attributes
The following table describes the Java Map-Reduce job attributes:
Attribute |
Description |
---|---|
Full path to Jar |
Defines the full path to the jar containing the Map Reduce Java program on the Hadoop host. Variable Name: %%HDP-JAVA_JAR_NAME |
Main Class |
Defines the class in the jar that contains the main function and the Map Reduce implementation. Variable Name: %%HDP-JAVA_MAIN_CLASS |
Arguments |
Defines the argument used by the command. Variable Name: %%HDP-JAVA_Nxxx_ARG |
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
Oozie Job Attributes
The following table describes the Oozie job attributes:
Attribute |
Description |
---|---|
Job Properties File |
Defines the job properties file path. Variable Name: %%HDP-OOZIE_JOB_PROPERTIES_FILE |
Job Properties (Add/Overwrite) |
Defines the Oozie job properties. A set of properties consists of the following:
You can add new properties or override property values defined in the Job Properties File. |
Rerun from point of failure |
Determines whether to rerun an Oozie job from the point of its failure. |
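The add-or-override relationship between the Job Properties File and the Job Properties (Add/Overwrite) attribute can be illustrated with a small merge. This Python sketch assumes a plain `key=value` properties file; the function names and the sample property values are hypothetical.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def effective_properties(file_text: str, job_properties: Dict[str, str]) -> Dict[str, str]:
    """Properties from the file, with job-level entries added or overriding."""
    merged = parse_properties(file_text)
    merged.update(job_properties)  # job-level values win; new keys are added
    return merged


file_text = "nameNode=hdfs://nn:8020\n# scheduler queue\nqueueName=default\n"
print(effective_properties(file_text, {"queueName": "etl",
                                       "oozie.use.system.libpath": "true"}))
```

Here `queueName` is overridden and `oozie.use.system.libpath` is added, while `nameNode` keeps the value from the file.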
Pig Job Attributes
The following table describes the Pig job attributes:
Attribute |
Description |
---|---|
Full Path to Pig Program |
Defines the full path to the Pig program on the Hadoop host. Variable Name: %%HDP-PIG_PROG_NAME |
Pig Program Parameters |
Defines the list of program parameters. |
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
Properties |
Defines a list of properties (Name and Value) that are applied to the job. These properties override the Hadoop defaults. |
Archives |
Defines the location of the Hadoop archives. |
Files |
Defines the location of the Hadoop files. |
Spark Job Attributes
The following table describes the Spark job attributes:
Attribute |
Description |
---|---|
Program Type |
Determines the Spark program type, as follows:
Variable Name: %%HDP-SPARK_PROG_TYPE |
Full Path to Script |
Defines the full path to the Python script to execute. Variable Name: %%HDP-SPARK_FULL_PATH_TO_PYTHON_SCRIPT |
Application Jar File |
Defines the path to the jar that includes your application and all of its dependencies. Variable Name: %%HDP-SPARK_APP_JAR_FULL_PATH |
Main Class to Run |
Defines the main class of the application. Variable Name: %%HDP-SPARK_MAIN_CLASS_TO_RUN |
Application Arguments |
Defines the arguments that are added at the end of the Spark command line, either after the main class for Java/Scala applications or after the script for Python applications. Variable Name: %%HDP-SPARK_Nxxx_ARG |
Command Line Options |
Defines the sets of attributes and values that are added to the command line. Variable Names:
|
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
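To show how the Spark attributes relate to one another, the following Python sketch builds a `spark-submit` argument list in the standard `spark-submit [options] <application jar or Python file> [application arguments]` form. The `build_spark_submit` helper, the `"python"` and `"java_scala"` labels for Program Type, and the example paths are assumptions made for illustration; the command that Control-M actually generates may differ.

```python
from typing import Dict, List, Optional


def build_spark_submit(program_type: str, app_path: str, app_args: List[str],
                       main_class: Optional[str] = None,
                       options: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble argv: spark-submit [options] <jar | script> [app arguments]."""
    argv = ["spark-submit"]
    # Command Line Options: name/value pairs placed before the application.
    for name, value in (options or {}).items():
        argv += [name, value]
    if program_type == "java_scala":      # hypothetical label
        if not main_class:
            raise ValueError("a Java/Scala application needs a main class")
        argv += ["--class", main_class]   # Main Class to Run
    elif program_type != "python":        # hypothetical label
        raise ValueError(f"unknown program type: {program_type}")
    argv.append(app_path)                 # Application Jar File or script path
    argv += app_args                      # Application Arguments go last
    return argv


print(build_spark_submit("python", "/apps/etl.py", ["2024-01-01"],
                         options={"--master": "yarn"}))
```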
Sqoop Job Attributes
The following table describes the Sqoop job attributes:
Attribute |
Description |
---|---|
Command Editor |
Defines any valid Sqoop command necessary for job execution. Sqoop can only be used for job execution if it is defined in the Sqoop connection attributes. Variable Name: %%HDP-SQOOP_COMMAND |
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
Properties |
Defines a list of properties (Name and Value) that are applied to the job. These properties override the Hadoop defaults. |
Archives |
Defines the location of the Hadoop archives. |
Files |
Defines the location of the Hadoop files. |
Streaming Job Attributes
The following table describes the Streaming job attributes:
Attribute |
Description |
---|---|
Input Path |
Defines the input file for the Mapper step. Variable Name: %%HDP-INPUT_PATH |
Output Path |
Defines the HDFS output path for the Reducer step. Variable Name: %%HDP-OUTPUT_PATH |
Mapper Command |
Defines the command that runs as a mapper. Variable Name: %%HDP-MAPPER_COMMAND |
Reducer Command |
Defines the command that runs as a reducer. Variable Name: %%HDP-REDUCER_COMMAND |
Streaming Options |
Defines the sets of attributes (Name and Value) that are added to the end of the Streaming command line. Variable Names:
|
Generic Options |
Defines the sets of attributes (Name and Value) that are added to the Streaming command line. Variable Names:
|
Append Yarn aggregated logs to output |
Determines whether to add Yarn aggregated logs to the job output. |
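The placement of Generic Options and Streaming Options matters because Hadoop Streaming expects generic options before streaming options on the command line. The Python sketch below illustrates that ordering. The `build_streaming_command` helper and the streaming jar path are hypothetical, and generic options are modeled only as `-D name=value` properties, which is a simplification.

```python
from typing import Dict, List, Optional


def build_streaming_command(streaming_jar: str, input_path: str, output_path: str,
                            mapper: str, reducer: str,
                            generic_options: Optional[Dict[str, str]] = None,
                            streaming_options: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble argv: hadoop jar <jar> [generic options] [streaming options]."""
    argv = ["hadoop", "jar", streaming_jar]
    # Generic Options come first, modeled here as -D name=value properties.
    for name, value in (generic_options or {}).items():
        argv += ["-D", f"{name}={value}"]
    argv += ["-input", input_path,    # Input Path
             "-output", output_path,  # Output Path
             "-mapper", mapper,       # Mapper Command
             "-reducer", reducer]     # Reducer Command
    # Streaming Options are appended to the end of the command line.
    for name, value in (streaming_options or {}).items():
        argv += [name, value]
    return argv


print(build_streaming_command(
    "/opt/hadoop/hadoop-streaming.jar",  # hypothetical jar location
    "/data/in", "/data/out", "/bin/cat", "/usr/bin/wc",
    generic_options={"mapreduce.job.queuename": "etl"},
    streaming_options={"-numReduceTasks": "2"},
))
```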
Tajo Job Attributes
The following table describes the Tajo job attributes:
Attribute |
Description |
---|---|
Command Source |
Determines the source of the Tajo command, as follows:
|
Full File Path |
Defines the file path of the input file that runs the Tajo command. |
Open Query |
Defines the query. Variable Name: %%HDP-TAJO_OPEN_QUERY |