Big Data Analytics Jobs
The following sections provide information about the parameters and attributes of jobs that work with Big Data Analytics platforms and services.
Azure Databricks Job
Azure Databricks is a cloud-based data analytics platform that enables you to process large workloads of data.
The following table describes Azure Databricks job attributes.
Attribute | Description |
---|---|
Connection Profile | Defines the connection profile for the job. Rules: |
Databricks Job ID | Determines the ID of the job created in your Databricks workspace. |
Parameters | Defines task parameters to override when the job runs, according to the Databricks convention. The list of parameters must begin with the name of the parameter type (see the sketch after this table). For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided through the Azure Databricks documentation. For no parameters, specify the following value: |
Idempotency Token | (Optional) Defines a token to use to rerun job runs that timed out in Databricks. Values: |
Status Polling Frequency | (Optional) Determines the number of seconds to wait between status checks of the job. Default: 30 |
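For illustration, a minimal sketch of a Parameters value for a notebook task. The parameter names and values are placeholders, not values required by the integration; notebook_params is one of the RunParameters properties (others include jar_params, python_params, and spark_submit_params):

```
"notebook_params": {"source_table": "raw_events", "run_date": "2023-01-01"}
```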
Azure HDInsight Job
Azure HDInsight enables you to run an Apache Spark batch job for big data analytics.
The following table describes Azure HDInsight job parameters:
Attribute | Description |
---|---|
Connection Profile | Defines the name of a connection profile to use to connect to the Azure HDInsight workspace. |
Parameters | Determines which parameters are passed to the Apache Spark application during job execution, in JSON format (name:value pairs). This JSON must include the file and className elements (see the sketch after this table). |
Status Polling Interval | Determines the number of seconds to wait before the status of the Apache Spark batch job is verified. Default: 10 seconds |
Bring Job Logs to Output | Determines whether logs from Apache Spark appear in the job output. |
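A minimal sketch of the Parameters JSON, using placeholder storage paths, class names, and arguments. The file and className elements are the required ones named above; the args element shown here is an assumed optional addition for passing application arguments:

```json
{
  "file": "wasbs://container@account.blob.core.windows.net/jars/spark-app.jar",
  "className": "com.example.SparkApp",
  "args": ["--input", "/data/input", "--output", "/data/output"]
}
```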
Databricks Job
The Databricks job enables you to integrate jobs created in the Databricks environment with your existing Control-M workflows. The following table describes Databricks job parameters:
Attribute | Description |
---|---|
Connection Profile | Determines which connection profile to use to connect to the Databricks workspace. |
Databricks Job ID | Determines the ID of the job created in your Databricks workspace. |
Parameters | Defines task parameters to override when the job runs, according to the Databricks convention. The list of parameters must begin with the name of the parameter type (see the sketch after this table). For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided through the Databricks documentation. For no parameters, specify the following value: |
Idempotency Token | (Optional) Defines a token to use to rerun job runs that timed out in Databricks. Values: |
Status Polling Frequency | (Optional) Determines the number of seconds to wait between status checks of the job. Default: 30 |
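As in the Azure Databricks job, the Parameters value begins with the name of the parameter type. A sketch for a JAR task, with placeholder arguments; jar_params is another RunParameters property and takes a list of strings rather than name:value pairs:

```
"jar_params": ["--run-date", "2023-01-01"]
```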
Snowflake Job
Snowflake is a cloud computing platform that you can use for data storage, processing, and analysis.
The following table describes the Snowflake job type attributes.
Attribute | Action | Description |
---|---|---|
Connection Profile | N/A | Defines the connection profile for the job. Rules: |
Database | N/A | Determines the database that the job uses. |
Schema | N/A | Determines the schema that the job uses. A schema is an organizational model that describes the layout and definition of fields and tables, and their relationships to each other, in a database. |
Action | N/A | Determines which of the following Snowflake actions to perform: SQL Statement, Copy from Query, Copy from Table, Create Table and Query, Create Snowpipe, Start or Pause Snowpipe, Stored Procedure, or Snowpipe Load Status. |
Snowflake SQL Statement | SQL Statement | Determines one or more Snowflake-supported SQL commands. Rule: Must be written in a single line, with strings separated by one space only. |
Statement Timeout | All Actions | Determines the maximum number of seconds to run the job in Snowflake. |
Show More Options | All Actions | Determines whether the following job-defining attributes are displayed: |
Parameters | All Actions | Defines Snowflake-provided parameters that let you control how data is presented (see the sketch after this table). |
Role | All Actions | Determines the Snowflake role used for this Snowflake job. A role is an entity that can be assigned privileges on secure objects. You can be assigned one or more roles from a limited selection. |
Bindings | All Actions | Defines the values to bind to the variables used in the Snowflake job, in JSON format (see the sketch after this table, which defines two binding variables). For more information about bindings, see the Snowflake documentation. |
Warehouse | All Actions | Determines the warehouse used in the Snowflake job. A warehouse is a cluster of virtual machines that processes a Snowflake job. |
Show Output | All Actions | Determines whether to show a full JSON response in the log output. |
Status Polling Frequency | All Actions | Determines the number of seconds to wait before checking the status of the job. Default: 20 |
Query to Location | Copy from Query | Defines the cloud storage location. |
Query Input | Copy from Query | Defines the query used for copying the data. |
Storage Integration | | Defines the storage integration object. |
Overwrite | | Determines whether to overwrite an existing file in the cloud storage, as follows: |
File Format | | Determines the file format for the saved file: JSON or CSV. |
Copy Destination | Copy from Table | Defines where the JSON or CSV file is saved. You can save to Amazon Web Services, Google Cloud Platform, or Microsoft Azure. For example: s3://<bucket name>/ |
From Table | Copy from Table | Defines the name of the copied table. |
Create Table Name | Create Table and Query | Defines the name of the new or existing table where the data is queried. |
Query | Create Table and Query | Defines the query used for the copied data. |
Snowpipe Name | | Defines the name of the Snowpipe. A Snowpipe loads data from files when they are ready, or staged. |
Copy into Table | Create Snowpipe | Defines the table that the data is copied into. |
Copy Data from Stage | Create Snowpipe | Defines the stage from where the data is copied. |
Start or Pause Snowpipe | Start or Pause Snowpipe | Determines whether to start or pause the Snowpipe. |
Stored Procedure Name | Stored Procedure | Defines the name of the stored procedure. |
Procedure Argument | Stored Procedure | Defines the value of the argument in the stored procedure. |
Table Name | Snowpipe Load Status | Defines the table that is monitored when loaded by the Snowpipe. |
Stage Location | Snowpipe Load Status | Defines the cloud storage location. A stage is a pointer that indicates where data is stored, or staged. For example: s3://CloudStorageLocation/ |
Days Back | Snowpipe Load Status | Determines the number of days back to monitor the Snowpipe load status. |
Status File Cloud Location Path | Snowpipe Load Status | Defines the cloud storage location where a CSV file log is created. The CSV file log details the load status for each Snowpipe. |
Storage Integration | Snowpipe Load Status | Defines the Snowflake configuration for the cloud storage location defined in the previous attribute, Status File Cloud Location Path. For example: S3_INT |
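For Parameters, a minimal sketch that sets one presentation-related option; DATE_OUTPUT_FORMAT is a standard Snowflake session parameter, but the exact shape expected by the Parameters field is an assumption:

```json
{"DATE_OUTPUT_FORMAT": "YYYY-MM-DD"}
```

For Bindings, a sketch that defines two placeholder binding variables, following the binding format described in the Snowflake documentation (each key is the variable position, paired with a Snowflake data type and a value):

```json
{
  "1": {"type": "FIXED", "value": "123"},
  "2": {"type": "TEXT", "value": "test-string"}
}
```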