Big Data Analytics Jobs
The following sections provide information about the parameters and attributes of jobs that work with Big Data Analytics platforms and services.
Azure Databricks Job
Azure Databricks is a cloud-based data analytics platform that enables you to process large workloads of data.
The following table describes Azure Databricks job attributes.
Attribute | Description |
---|---|
Connection Profile | Defines the connection profile for the job. Rules: |
Databricks Job ID | Determines the ID of the job created in your Databricks workspace. |
Parameters | Defines task parameters to override when the job runs, according to the Databricks convention. The list of parameters must begin with the name of the parameter type (see the sketch after this table). For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided through the Azure Databricks documentation. For no parameters, specify the following value: |
Idempotency Token | (Optional) Defines a token to use to rerun job runs that timed out in Databricks. Values: |
Status Polling Frequency | (Optional) Determines the number of seconds to wait between status checks of the job. Default: 30 |
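For illustration, a minimal sketch of a Parameters value for a notebook task. The parameter names and values are placeholders, not values required by the integration; notebook_params is one of the RunParameters properties (others include jar_params, python_params, and spark_submit_params):

```
"notebook_params": {"source_table": "raw_events", "run_date": "2023-01-01"}
```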
Azure HDInsight Job
Azure HDInsight enables you to run an Apache Spark batch job for big data analytics.
The following table describes Azure HDInsight job parameters:
Attribute | Description |
---|---|
Connection Profile | Defines the name of a connection profile to use to connect to the Azure HDInsight workspace. |
Parameters | Determines which parameters are passed to the Apache Spark application during job execution, in JSON format (name:value pairs). This JSON must include the file and className elements (see the sketch after this table). |
Status Polling Interval | Determines the number of seconds to wait before the status of the Apache Spark batch job is verified. Default: 10 seconds |
Bring Job Logs to Output | Determines whether logs from Apache Spark appear in the job output. |
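A minimal sketch of the Parameters JSON, using placeholder storage paths, class names, and arguments. The file and className elements are the required ones named above; the args element shown here is an assumed optional addition for passing application arguments:

```json
{
  "file": "wasbs://container@account.blob.core.windows.net/jars/spark-app.jar",
  "className": "com.example.SparkApp",
  "args": ["--input", "/data/input", "--output", "/data/output"]
}
```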
Databricks Job
The Databricks job enables you to integrate jobs created in the Databricks environment with your existing Control-M workflows. The following table describes Databricks job parameters:
Attribute | Description |
---|---|
Connection Profile | Determines which connection profile to use to connect to the Databricks workspace. |
Databricks Job ID | Determines the ID of the job created in your Databricks workspace. |
Parameters | Defines task parameters to override when the job runs, according to the Databricks convention. The list of parameters must begin with the name of the parameter type (see the sketch after this table). For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided through the Databricks documentation. For no parameters, specify the following value: |
Idempotency Token | (Optional) Defines a token to use to rerun job runs that timed out in Databricks. Values: |
Status Polling Frequency | (Optional) Determines the number of seconds to wait between status checks of the job. Default: 30 |
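As in the Azure Databricks job, the Parameters value begins with the name of the parameter type. A sketch for a JAR task, with placeholder arguments; jar_params is another RunParameters property and takes a list of strings rather than name:value pairs:

```
"jar_params": ["--run-date", "2023-01-01"]
```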
Snowflake Job
Snowflake is a cloud computing platform that you can use for data storage, processing, and analysis.
The following table describes the Snowflake job type attributes.
Attribute | Action | Description |
---|---|---|
Connection Profile | N/A | Defines the connection profile for the job. Rules: |
Database | N/A | Determines the database that the job uses. |
Schema | N/A | Determines the schema that the job uses. A schema is an organizational model that describes the layout and definition of fields and tables, and their relationships to each other, in a database. |
Action | N/A | Determines which of the following Snowflake actions to perform: SQL Statement, Copy from Query, Copy from Table, Create Table and Query, Create Snowpipe, Start or Pause Snowpipe, Stored Procedure, or Snowpipe Load Status. |
Snowflake SQL Statement | SQL Statement | Determines one or more Snowflake-supported SQL commands. Rule: Must be written in a single line, with strings separated by one space only. |
Statement Timeout | All Actions | Determines the maximum number of seconds to run the job in Snowflake. |
Show More Options | All Actions | Determines whether the following job-defining attributes are displayed: |
Parameters | All Actions | Defines Snowflake-provided parameters that let you control how data is presented (see the sketch after this table). |
Role | All Actions | Determines the Snowflake role used for this Snowflake job. A role is an entity that can be assigned privileges on secure objects. You can be assigned one or more roles from a limited selection. |
Bindings | All Actions | Defines the values to bind to the variables used in the Snowflake job, in JSON format (see the sketch after this table, which defines two binding variables). For more information about bindings, see the Snowflake documentation. |
Warehouse | All Actions | Determines the warehouse used in the Snowflake job. A warehouse is a cluster of virtual machines that processes a Snowflake job. |
Show Output | All Actions | Determines whether to show a full JSON response in the log output. |
Status Polling Frequency | All Actions | Determines the number of seconds to wait before checking the status of the job. Default: 20 |
Query to Location | Copy from Query | Defines the cloud storage location. |
Query Input | Copy from Query | Defines the query used for copying the data. |
Storage Integration | | Defines the storage integration object. |
Overwrite | | Determines whether to overwrite an existing file in the cloud storage, as follows: |
File Format | | Determines the file format for the saved file: JSON or CSV. |
Copy Destination | Copy from Table | Defines where the JSON or CSV file is saved. You can save to Amazon Web Services, Google Cloud Platform, or Microsoft Azure. For example: s3://<bucket name>/ |
From Table | Copy from Table | Defines the name of the copied table. |
Create Table Name | Create Table and Query | Defines the name of the new or existing table where the data is queried. |
Query | Create Table and Query | Defines the query used for the copied data. |
Snowpipe Name | | Defines the name of the Snowpipe. A Snowpipe loads data from files when they are ready, or staged. |
Copy into Table | Create Snowpipe | Defines the table that the data is copied into. |
Copy Data from Stage | Create Snowpipe | Defines the stage from where the data is copied. |
Start or Pause Snowpipe | Start or Pause Snowpipe | Determines whether to start or pause the Snowpipe. |
Stored Procedure Name | Stored Procedure | Defines the name of the stored procedure. |
Procedure Argument | Stored Procedure | Defines the value of the argument in the stored procedure. |
Table Name | Snowpipe Load Status | Defines the table that is monitored when loaded by the Snowpipe. |
Stage Location | Snowpipe Load Status | Defines the cloud storage location. A stage is a pointer that indicates where data is stored, or staged. For example: s3://CloudStorageLocation/ |
Days Back | Snowpipe Load Status | Determines the number of days back to monitor the Snowpipe load status. |
Status File Cloud Location Path | Snowpipe Load Status | Defines the cloud storage location where a CSV file log is created. The CSV file log details the load status for each Snowpipe. |
Storage Integration | Snowpipe Load Status | Defines the Snowflake configuration for the cloud storage location defined in the previous attribute, Status File Cloud Location Path. For example: S3_INT |
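For Parameters, a minimal sketch that sets one presentation-related option; DATE_OUTPUT_FORMAT is a standard Snowflake session parameter, but the exact shape expected by the Parameters field is an assumption:

```json
{"DATE_OUTPUT_FORMAT": "YYYY-MM-DD"}
```

For Bindings, a sketch that defines two placeholder binding variables, following the binding format described in the Snowflake documentation (each key is the variable position, paired with a Snowflake data type and a value):

```json
{
  "1": {"type": "FIXED", "value": "123"},
  "2": {"type": "TEXT", "value": "test-string"}
}
```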