I had a request this morning to monitor a bespoke application, the Tyrell Control Messaging application.

So I thought I’d document the application monitoring process for you all to see.

The server names and IP addresses have been sanitised to protect the innocent.

Monitoring is required so that the engineers are told when there has been an application failure and so that an automated fix is attempted.

So here is how I did it.

Find the node that the service runs on. Click “Service Control Manager”.

Note down each of the services you want to monitor.

Hovering over the service gives you the name of the service.

To monitor the Tyrell Guardian click on “Start monitoring this service”.

Add the rest of the services to the application template, which is created automatically when you add the first component monitor.

Be sure to rename the components to something useful that matches the services being monitored.

Once all the component monitors are in place, enter a suitable template description to let users know what the service application monitor is monitoring.

Click “Submit”.

Now go back to the node; the application template has been applied to it.

Click on the TYRELL – CONTROL MESSAGING application to see the application details summary.

This application monitoring template can now be used in an alert that restarts the service based on the threshold status. That saves an engineer from receiving a call at midnight to say that the control messaging application has failed.

Creating a Control Messaging Alert to notify engineers when the control messaging application goes down and to restart the service.

Go into “Manage Alerts”.

Click on “Add New Alert”.

Enter the alert properties.

Do not enable the alert yet.

The trigger condition targets one specific application and is met when the node is UP and the application is DOWN. When it is met, the trigger fires the actions specified.

The reset condition is set to reset the alert when the trigger condition is no longer true.
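As a rough sketch of what the trigger and reset conditions amount to, here is a small Python model of one evaluation cycle. The status strings, the `AlertState` class, and the function names are my own assumptions for illustration; this is not how SolarWinds implements it internally.

```python
from dataclasses import dataclass


@dataclass
class AlertState:
    """Tracks whether the alert is currently active (triggered, not yet reset)."""
    active: bool = False


def trigger_condition(node_status: str, app_availability: str) -> bool:
    """True when the node is not Down and the application is Down."""
    return node_status != "Down" and app_availability == "Down"


def evaluate(state: AlertState, node_status: str, app_availability: str) -> str:
    """Return 'trigger', 'reset', or 'none' for one evaluation cycle."""
    condition = trigger_condition(node_status, app_availability)
    if condition and not state.active:
        state.active = True   # condition just became true: fire trigger actions
        return "trigger"
    if not condition and state.active:
        state.active = False  # condition no longer true: fire reset actions
        return "reset"
    return "none"
```

Note that a node going down while the application is down counts as a reset in this model, because the trigger condition is no longer true.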

Time of day is set to always enabled.

Then the trigger actions are set.

  1. EMAIL the infrastructure team.
  2. Log an event in SolarWinds.
  3. Attempt to restart the stopped services.

APM\APMServiceControl.exe ${N=SwisEntity;M=ComponentAlert.ComponentID} -c=RESTART
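The `${N=...;M=...}` part of that command is an alert variable that gets replaced with the real component ID when the action runs. To make the substitution concrete, here is a minimal Python sketch of that kind of expansion. The `expand` helper and the component ID 1234 are made up for illustration, and the regular expression only covers the macro shapes that appear in this post.

```python
import re

# Matches ${N=<namespace>;M=<member>} with an optional ;F=<format> part.
MACRO = re.compile(
    r"\$\{N=(?P<ns>[^;}]+);M=(?P<member>[^;}]+)(?:;F=(?P<fmt>[^}]+))?\}"
)


def expand(template: str, values: dict) -> str:
    """Replace each macro with values[(namespace, member)].

    Macros with no matching value are left untouched.
    """
    def substitute(match):
        key = (match.group("ns"), match.group("member"))
        return str(values[key]) if key in values else match.group(0)

    return MACRO.sub(substitute, template)


command = r"APM\APMServiceControl.exe ${N=SwisEntity;M=ComponentAlert.ComponentID} -c=RESTART"

# 1234 is a hypothetical component ID, used only for this example.
print(expand(command, {("SwisEntity", "ComponentAlert.ComponentID"): 1234}))
```

The same idea applies to the variables used in the email and event log messages of the alert.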

A second escalation level has been added to email a manager if the application does not come back up within an hour.
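The escalation timing can be pictured with a short Python sketch. The function name and the exact "one hour or more" comparison are assumptions on my part; the alert engine handles this for you once the escalation level is configured.

```python
from datetime import datetime, timedelta

# "Within an hour" from the text above.
ESCALATION_DELAY = timedelta(hours=1)


def levels_reached(triggered_at: datetime, now: datetime, still_down: bool) -> list:
    """Return the escalation levels reached while the alert is still active."""
    if not still_down:
        return []  # alert has reset, nothing to escalate
    levels = [1]   # level 1 fires as soon as the alert triggers
    if now - triggered_at >= ESCALATION_DELAY:
        levels.append(2)  # level 2: email the manager
    return levels
```

For example, an application that is still down 59 minutes after the trigger is at level 1 only, and at 60 minutes it reaches level 2.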

Email and event log actions are specified. The email is sent to the infrastructure team and the event log action records the event in SolarWinds.

The reset action uses the same email and event log actions as the trigger action, but does not include the service restart.

Finally, enable the alert.

Summary of Alert Configuration

Name of alert: CONTROL MESSAGING ALERT

Description of alert: This alert will write to the event log and email the alerts mailbox when the CONTROL MESSAGING application goes down and when the application comes back up.

Type of Property to monitor: Application

Enabled (On/Off): ON

Evaluation Frequency of alert: Every minute

Severity of alert: Critical

Alert Custom Properties: (1)

ResponsibleTeam:

Alert owner (user who created this alert): emt\atimberley

Alert Limitation Category: No Limitation

Trigger Condition:

Alert on all objects where:
Application – Instance of Application – is – swis://SOLARWINDS/Orion/Orion.Nodes/NodeID=164/Applications/ApplicationID=28
The actual trigger condition:
All child conditions must be satisfied (AND) 
Node – Status – is not equal to – Down 
Application Alerting Properties – Application Availability – is equal to – Down
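The `swis://` URI in the trigger scope encodes which node and application the alert is tied to. If you ever need to pull those IDs out in a script, a minimal Python sketch could look like this. The `parse_swis_uri` helper is my own invention and assumes the `Key=Number` path segments seen in the URI above.

```python
import re


def parse_swis_uri(uri: str) -> dict:
    """Collect every Key=Number path segment of a swis:// URI into a dict."""
    if not uri.startswith("swis://"):
        raise ValueError(f"not a swis URI: {uri!r}")
    return {key: int(value) for key, value in re.findall(r"/(\w+)=(\d+)", uri)}


uri = "swis://SOLARWINDS/Orion/Orion.Nodes/NodeID=164/Applications/ApplicationID=28"
print(parse_swis_uri(uri))  # {'NodeID': 164, 'ApplicationID': 28}
```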

Reset Condition:

When the trigger condition is no longer true

Time of Day schedule:

Alert is always enabled

Trigger Action:

Escalation Level 1

1. CONTROL MESSAGING Email/Page (Application “${N=SwisEntity;M=ApplicationAlert.ApplicationName}” on “${N=SwisEntity;M=Node.Caption}” is currently ${N=SwisEntity;M=ApplicationAlert.ApplicationAvailability})

2. CONTROL MESSAGING : Application ${N=SwisEntity;M=ApplicationAlert.ApplicationName} on Node ${N=SwisEntity;M=Node.Caption} is ${N=SwisEntity;M=ApplicationAlert.ApplicationAvailability}

3. RESTART CONTROL MESSAGING

Escalation Level 2

1. Send an Email/Page (CONTROL MESSAGING Alert ${N=Alerting;M=AlertName} at ${N=Alerting;M=AlertTriggerTime;F=DateTime})

Reset Action:

1. CONTROL MESSAGING Email/Page (Application “${N=SwisEntity;M=ApplicationAlert.ApplicationName}” on “${N=SwisEntity;M=Node.Caption}” is currently ${N=SwisEntity;M=ApplicationAlert.ApplicationAvailability})

2. CONTROL MESSAGING NetPerfMon Event Log : Application ${N=SwisEntity;M=ApplicationAlert.ApplicationName} on Node ${N=SwisEntity;M=Node.Caption} is ${N=SwisEntity;M=ApplicationAlert.ApplicationAvailability}