Control-M/Agent Cluster Configuration
The following procedures describe how to configure Control-M/Agent clusters to work with Control-M/Server:
Control-M with Active/Active (Load Balancing) Clusters
Control-M does not support the use of network load balancers or broadcast IP addressing to address an active/active cluster. Control-M/Server must be able to connect to a definitive address on the Control-M/Agent computer that runs the job. For this reason, the following configuration is recommended for an active/active cluster:
- Each node in the cluster should have a Control-M/Agent installed that listens on an address that is neither load balanced nor a broadcast IP. The Server-to-Agent port must be reachable without going through a network load balancer or port address translation.
- Discover each Agent through Control-M/Server.
- Create a node group for the application (see the sketch after this list). This is the name that should be used when scheduling jobs for this application. We recommend using the virtual name or the application name so that it is familiar to schedulers.
- Update or create your job definitions to refer to the node group created in the previous step.
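How the node group is created depends on your tooling and Control-M version. As a minimal sketch, assuming the Control-M Automation API CLI (ctm) is available and the agents on the cluster nodes are already registered with a Control-M/Server named CTMSRV, a node group (host group) named appgroup could be populated as follows (all names here are placeholders):
ctm config server:hostgroup:agent::add CTMSRV appgroup node1host
ctm config server:hostgroup:agent::add CTMSRV appgroup node2host
ctm config server:hostgroups::get CTMSRV
Jobs whose definitions reference appgroup as the host are then distributed by Control-M/Server across the member Agents.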
Control-M with Active/Passive (High Availability) Clusters
When you implement Control-M/Agent on a UNIX cluster, a dedicated Control-M/Agent is installed within each resource group to which Control-M should submit jobs. When a single application is running on the cluster, a single Control-M/Agent should be installed. When multiple applications are running on the cluster, Control-M submits jobs to those applications using different Control-M/Agents.
The file system on which Control-M/Agent is installed should be located on the shared disk. This file system should always be mounted to the same node as the application to which Control-M submits jobs. This file system can be:
- the same file system as the application file system
- a different file system, as long as both file systems are always active on the same node (if they are not members of the same application resource group)
Each Agent should be configured to use the application virtual host name for communication with Control-M/Server. When submitting jobs to this Agent, the NODEID parameter value for the jobs should be the virtual host name.
Before implementing Control-M/Agent on a UNIX cluster, identify the file system on which the Agent should be installed and determine the resource group to which it should belong.
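How the co-location of the Agent file system with the application is enforced depends on your cluster manager. As an illustrative sketch only, on a Pacemaker cluster administered with pcs, the Agent file system could be defined as a Filesystem resource inside the application's resource group so that it always fails over together with the application (the device, directory, and group names below are placeholders, not taken from this procedure):
pcs resource create ctmagent_fs ocf:heartbeat:Filesystem \
    device=/dev/vg_shared/lv_ctmagent directory=/export2 fstype=ext4 \
    --group app_rg
With other cluster managers, use the equivalent mechanism for keeping the Agent file system in the application's resource group.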
Creating Control-M/Agent UNIX Accounts
This procedure describes how to create the UNIX accounts for a Control-M/Agent that is installed in the same file system as Control-M/Server (/export2 in the example) and uses the same virtual network name as Control-M/Server (vhctmxxx in the example). The same procedure can be used if Control-M/Agent is installed for any other external application.
Begin
- Create two user accounts, one on each node, as shown in the following example:
  useradd -g controlm -s /bin/tcsh -m -d /export2/agxxxctm agxxxctm
  This command should be invoked by a user with administrative permissions.
- Both users must have identical names (agxxxctm in the example) and identical user IDs (UIDs); see the check after this list.
- Both user home directories should point to the same location on the shared disk (/export2/agxxxctm in the example).
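A quick way to confirm that the accounts match is to compare the UID, GID, and home directory on both nodes (a minimal sketch; node1 and node2 stand for the physical host names, and agxxxctm follows the example above):
ssh node1 "id agxxxctm; getent passwd agxxxctm"
ssh node2 "id agxxxctm; getent passwd agxxxctm"
Both nodes should report the same uid and gid values and the same home directory (/export2/agxxxctm).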
Installing Control-M/Agent
Begin
- Install Control-M/Agent on the relevant file system on the shared disk according to the instructions provided in Agent Installation.
- Install the latest Fix Pack to apply the most recent software updates.
- Run the Control-M/Agent configuration utility (either ctmag or ctmagcfg) to configure the logical Agent name: in the configuration utility, select Logical Agent Name from the Advanced menu. The logical Agent name should contain the virtual network name.
- In the Control-M/Agent configuration menu, define the Control-M/Server host name as authorized to submit jobs to this Control-M/Agent. If Control-M/Server is installed on a cluster, specify only the virtual network name of Control-M/Server (vhctmxxx in the example). See the verification sketch after this list.
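To verify the configuration, you can run the ag_diag_comm diagnostics utility that is shipped with Control-M/Agent; its exact output varies by version (a minimal sketch, run as the Agent account):
su - agxxxctm
ag_diag_comm
In the output, confirm that the logical Agent name contains the virtual network name and that the authorized Control-M/Server host is the expected name (vhctmxxx in the example).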
Missing Jobs
Each time a job is submitted, a process is created that monitors the job and reports its completion. This process is called the Agent Monitor (AM). When the AM starts for a job, two files are created for that job: a status file and a "procid" file.
In a normal scenario, the AM detects the job completion, updates the "procid" file, and triggers the Agent Tracker (AT) about the completion. The AT then sends the update to Control-M/Server.
In a failover scenario, the agent process is stopped and the agent file system is unmounted from the first host while the job is still executing. The job can keep running, but the "procid" file is not updated when the job completes, because the agent file system is by then mounted on the backup node. Therefore, when the agent is started on the backup node and the next AT track time arrives, it finds the original "procid" file but not the actual process. This is why the job is marked as disappeared.
As an optional workaround, you can define a JLOST ON statement for the jobs that run on the clustered agent (Statement=*, Code=JLOST) with a DO RERUN action. In this case, the jobs are automatically restarted (rerun) on the backup node when Control-M/Server determines that they have disappeared.
You must enter a value greater than 0 in the MAX RERUN parameter for the job to be resubmitted.
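For example, the workaround corresponds roughly to the following job processing definition fragment (a sketch using the parameter names above; the exact field names depend on the client you use to define jobs):
ON   Statement=*   Code=JLOST
  DO RERUN
MAX RERUN  1
With this definition, a job that Control-M/Server marks as disappeared after a failover is automatically resubmitted, up to the MAX RERUN limit.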
Monitoring Control-M/Agent Processes
When monitoring Control-M/Agent processes on a cluster, use the following process names for cluster monitoring definitions:
| Control-M/Agent component | Process name |
|---|---|
| Control-M/Agent Listener | p_ctmag |
| Control-M/Agent Tracker | p_ctmat |
| Control-M/Agent Router | p_ctmar |
| Control-M/Agent Tracker-Worker | p_ctmatw |
| Control-M/Agent Remote Utilities Listener | p_ctmru |
| Control-M/Agent SSH connection pool | sshcourier.jar |
| Control-M/Agent Recovery (Windows only) | p_ctmam |
The Control-M/Agent Router (p_ctmar) is only active when working in persistent connection mode. When working in transient connection mode, only the Control-M/Agent Listener (p_ctmag) and Tracker (p_ctmat) are active.
On UNIX, you might see more than one p_ctmag (one for each job).
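For a cluster monitoring probe, a simple presence check can be built around ps and grep (a minimal sketch for persistent connection mode; adjust the process list to the connection mode and components you actually run):
#!/bin/sh
# Fail if any required Control-M/Agent process is not running
for p in p_ctmag p_ctmat p_ctmar; do
    if ! ps -ef | grep -v grep | grep -q "$p"; then
        echo "Control-M/Agent process missing: $p"
        exit 1
    fi
done
exit 0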
Control-M/Agent Cluster Environment on Windows
Note the following:
- Install Control-M/Agent, as described in Installing Control-M/Agent on Windows. The Control-M/Agent and File Watcher cluster resources are installed and online.
- Multiple Agents can be installed in the same virtual server group or in separate virtual server groups.
- Control-M/Agents that share the same IP and Network Name resources must be associated with separate Control-M/Servers.
- Disk, IP, and Network Name resources must be online in the virtual server group where Control-M/Agent is installed.
- Automatic installation and automatic upgrade of Control-M/Agent are not supported for Microsoft Windows cluster environments.