How To Be An Infrastructure Monitoring Detective.
The bad old days.
Before infrastructure monitoring existed, it was all about logs, and everything was manual. Support processes were tedious and long-winded, often resulting in site visits or lengthy remote desktop sessions.
The “last to know” support desk was a common occurrence. Technical support teams fought fires as problems occurred, which left them exposed and often prompted their core customers to look elsewhere for support.
The Art of the Alert.
Today we are more efficient. We know what is happening as it happens, often in real time.
Notifications sent to NOC teams start with relevant, concise alerts.
Multiple alerts can sometimes be sent out for the same issue, which can lead to confusion. Recognising that a group of alerts is part of a larger issue takes real detective work, and it is a skill that stops a monitoring engineer from jumping to conclusions.
Piece the alerts together, look at the thresholds, examine the events and a picture starts to form.
An application has gone down, network latency is high, a MySQL database's key read/write efficiency is low, and an IP address is in a state of conflict.
Look at the alerts received and think about how they relate to the node, the thresholds that triggered them and the environment being monitored. Compare dates/times, nodes, applications and reported metrics to make your diagnosis.
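As a rough illustration of comparing dates and times across alerts, the sketch below groups alerts that arrive close together so they can be examined as one possible incident. It is not a SolarWinds feature or API: the `Alert` class, the `group_alerts` function, the five-minute window and the sample node names are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert record; the field names are illustrative only.
    node: str
    metric: str
    raised_at: datetime


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts raised within `window` of the previous alert in the group."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.raised_at):
        if groups and alert.raised_at - groups[-1][-1].raised_at <= window:
            # Close in time to the last alert seen: treat as the same incident.
            groups[-1].append(alert)
        else:
            # Too far apart: start a new candidate incident.
            groups.append([alert])
    return groups


# Made-up sample data echoing the symptoms described above.
alerts = [
    Alert("app01", "application down", datetime(2024, 1, 1, 9, 2)),
    Alert("db01", "MySQL key efficiency low", datetime(2024, 1, 1, 9, 0)),
    Alert("sw01", "high network latency", datetime(2024, 1, 1, 9, 1)),
    Alert("app02", "disk utilisation high", datetime(2024, 1, 1, 14, 30)),
]

incidents = group_alerts(alerts)
for number, incident in enumerate(incidents, start=1):
    print(number, [f"{a.node}: {a.metric}" for a in incident])
```

Time proximity alone is only a starting point. The engineer still has to check whether the grouped nodes and applications are actually related before calling it a single issue.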
SolarWinds provides great tools, such as the real-time process explorer/service control manager and the real-time event log viewer, all available to the infrastructure monitoring engineer if required.
These tools are a last resort, as SolarWinds monitoring contains predefined views that list vital statistics such as CPU, memory and disk utilisation.
With relevant information at your fingertips, there is no need to log in to a server or a network node. All the information is readily accessible through the SolarWinds web console.
What makes a dashboard useful is subjective and often controversial, and it depends mainly upon the responsibilities assigned to the team or individuals monitoring an environment.
A matrix listing the names of the nodes and their functions is the most common way to display an environment. This kind of matrix can easily show hundreds of nodes.
Custom static dashboards are useful for targeting a specific environmental factor.
A static dashboard can be useful for those who need to know that a set environment is performing well, or who want to monitor specific metrics and need those metrics displayed at all times.
There are many different types of custom static dashboard, ranging from simple traffic lights to advanced network diagrams detailing connection and node information.
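To make the traffic-light idea concrete, here is a minimal sketch of how a metric reading could be mapped to a colour using warning and critical thresholds. The `traffic_light` function, the threshold values and the sample readings are assumptions for illustration and do not reflect how any particular monitoring product is configured.

```python
def traffic_light(value, warning, critical):
    """Map a reading to a colour, assuming higher values are worse."""
    if value >= critical:
        return "red"
    if value >= warning:
        return "amber"
    return "green"


# Hypothetical utilisation readings, in percent.
readings = {"cpu": 42.0, "memory": 83.5, "disk": 96.0}

# Hypothetical (warning, critical) thresholds per metric, in percent.
thresholds = {"cpu": (80, 90), "memory": (80, 90), "disk": (85, 95)}

statuses = {
    name: traffic_light(value, *thresholds[name])
    for name, value in readings.items()
}

for name, colour in statuses.items():
    print(f"{name}: {colour}")
```

Keeping the thresholds in one place, separate from the mapping logic, makes it easier to adjust them for each environment without touching the display.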
The AppStack environment view within SolarWinds is a good way to filter and sort problems within an environment without configuring any extra dashboard views. AppStack can be filtered easily, making it manageable, dynamic and useful for those with limited responsibilities.
With a self-explanatory dashboard, relevant alerting and an experienced infrastructure monitoring team, it is possible to pick up an alert, see a dashboard status change and quickly triage an application or node issue before the end users notice there is an issue.
Monitoring gives you power and control. It's a safety net that frees up your time for project work and development.
No More Firefighting…