Designing a SolarWinds Monitoring System

I was asked to write a document of 2000 words to provide a proposal to an organisation in need of a new monitoring system.

Here is my solution:

Proposed Approach

The new SolarWinds monitoring system will replace all other monitoring systems and give the Service team full visibility and control of all required environments.

Configuration and training for all aspects of the modular SolarWinds monitoring environment will be provided.

  1. Examination of the current monitoring platforms.
  2. Examination of hardware, operating systems, SQL software and SolarWinds software, allowing us to identify whether an in-place upgrade or a new server build will be required.
  3. Discuss and agree customer priorities, defining objectives for monitoring. The questions listed in the requirement specification should form the basis for defining the objectives, allowing us to create a hardware/software specification that achieves them.
  4. Establish communication update procedure in accordance with customer standards:
    1. Daily standup meeting or daily email updates to update customer staff about progress and any upcoming changes.
    2. Familiarise with customer change process.
  5. Establish and agree a timeline to complete each line item.
  6. Assign tollgate meetings to each phase of the project.
  7. Plan meeting with service and ITOC Design teams to discuss process / changes for specific technical work:
    1. Upgrading/installing operating system.
    2. SQL.
    3. Hardware requirement.
    4. SolarWinds products.
    5. Configuration.
  8. Implementation of technical solution.
  9. Implementation of configuration plan.
  10. Disaster recovery plan.
  11. Training

Requirement Specification

Examine existing customer monitoring solutions and plan to replace the current monitoring systems with a fully resilient, upgraded SolarWinds monitoring platform.

We will start by asking the key questions below.

Existing monitoring systems questions:

  1. Who is using the monitoring solution, and how many simultaneous users are there? This affects the performance of the monitoring environment and provides the contact list for change control. Expected users: Networking Team, customer services teams, ES ITOC users.
  2. Infrastructure size / how many elements? Additional pollers will be required if over 12000 elements.
  3. What is currently being monitored? (Applications, databases, servers, network devices, traffic, connectivity/supernets/subnets, transactions, logs, ip addresses, client devices and users.) This will define the type of pollers required.
  4. Polling frequency? System requirements increase if polling frequency increases.
  5. What SolarWinds software is the customer currently licensed for?
  6. What versions of Microsoft Software and Solarwinds Software are running on the pollers and the SQL server?
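
As a rough illustration of question 2, the sketch below estimates how many additional pollers an environment might need. It assumes the 12000-element figure above is applied per polling engine; the function name and constant are my own, and the real threshold depends on polling frequency and the modules installed.

```python
import math

# Assumption taken from the question list above: one polling engine covers
# up to 12,000 elements. Treat this as a planning threshold, not a hard limit.
ELEMENTS_PER_POLLER = 12_000


def additional_pollers_needed(total_elements: int,
                              elements_per_poller: int = ELEMENTS_PER_POLLER) -> int:
    """Return how many pollers are needed beyond the main polling engine."""
    if total_elements < 0:
        raise ValueError("total_elements must not be negative")
    # Always count at least the main poller, then round up for the rest.
    pollers_total = max(1, math.ceil(total_elements / elements_per_poller))
    return pollers_total - 1


if __name__ == "__main__":
    for elements in (8_000, 12_000, 12_001, 30_000):
        print(elements, "elements ->", additional_pollers_needed(elements), "additional poller(s)")
```

An estimate like this is only a starting point for the sizing discussion; a higher polling frequency would lower the number of elements each poller can comfortably handle.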

Hardware and Software requirements:

SolarWinds polling server running Orion services

  • Physical server or virtual machine (VM).
  • CPU: 4 cores + 1 core for each additional module.
  • RAM: 8 GB + 2 GB for each additional module.
  • Storage: 150 GB, 15,000 RPM.
  • 1 x 1 Gb dedicated NIC.
  • Microsoft Windows Server 2019 or 2016, Standard or Datacenter Edition.
  • The SolarWinds Orion installer installs IIS and .NET 4.6.2 or later if they are not already on your server.
  • Additional pollers should also be configured as above.

SolarWinds Orion SQL database server

  • Physical server recommended.
  • Dual/quad core processor or better.
  • 128 GB RAM.
  • Hardware RAID controller (software RAID not supported).
  • Disk subsystem arrays configured as per standard SolarWinds install: OS RAID 1, DATA & PAGEFILE RAID 1, DATABASE RAID 1+0.
  • 1 x 1 Gb dedicated NIC.
  • SQL Server 2017, 2016, or 2014, Standard or Enterprise Edition.
  • SolarWinds recommends SQL Server 2016 SP1 or later.

SolarWinds additional web server (IIS)

  • Physical server or virtual machine (VM).
  • CPU: 4 cores.
  • RAM: 8 GB.
  • Storage: 40 GB, 15,000 RPM.
  • 1 x 1 Gb dedicated NIC.
  • Hardware RAID controller (software RAID not supported).
  • Microsoft Windows Server 2019 or 2016, Standard or Datacenter Edition.
  • Disk subsystem arrays configured as per standard SolarWinds install: OS RAID 1, DATA & PAGEFILE RAID 1.

All specifications detailed in this document are based upon a large monitoring environment using SolarWinds Success Center best practices.

https://support.solarwinds.com/Success_Center/Orion_Platform/Orion_Documentation/Orion_Platform_Administrator_Guide/Orion_multi-module_system_guidelines
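
The polling server rule of thumb above (4 cores and 8 GB RAM, plus 1 core and 2 GB for each additional module) is simple arithmetic. Here is a minimal sketch of it, assuming "additional module" means every module beyond the base install; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollerSpec:
    cpu_cores: int
    ram_gb: int


def poller_spec(additional_modules: int) -> PollerSpec:
    """Apply the sizing rule: 4 cores + 1 per module, 8 GB + 2 GB per module."""
    if additional_modules < 0:
        raise ValueError("additional_modules must not be negative")
    return PollerSpec(
        cpu_cores=4 + additional_modules,
        ram_gb=8 + 2 * additional_modules,
    )


if __name__ == "__main__":
    # Example: a poller running three additional modules.
    print(poller_spec(3))
```

For three additional modules this gives 7 cores and 14 GB RAM, which in practice would be rounded up to the next sensible VM or server size.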

Design Specification (Proposal)

Fully Resilient, scalable, HA Solarwinds Production Monitoring Environment

The diagram above shows a scalable, fully resilient, highly available (HA) SolarWinds environment. I have included optional features within the environment that may not be required:

  • Additional poller and remote poller for environments over 12000 elements, each paired with its own HA poller.
  • Log and Event Manager appliance to incorporate SIEM functionality into SolarWinds.
  • Extra WPM servers for additional synthetic transaction testing.
  • Database performance analyzer to allow performance analysis of key databases.

To scale this environment, add additional polling engines and respective HA pollers. Additional polling engines will be installed at each remote office if required.

SQL databases, pollers and additional function servers should be backed up as part of an organizational backup service.

Solarwinds Development Environment

A single virtualised SolarWinds test environment, appropriately licensed, will be used for functional testing of SolarWinds monitoring procedures. No additional pollers or HA are required, as this is a development environment and can be restored from a snapshot at any time. Hardware and software specifications are based upon SolarWinds best practices for modular deployments.

Install and Upgrade Technical Solution

  1. Hardware and Software including respective licensing will be specified based upon monitoring requirements, agreed and ordered. Access to Solarwinds customer portal is required to download latest software versions for each module.
  2. Identify the most cost-effective method for SQL licensing, for each Microsoft Windows Server license to be deployed and for SolarWinds module licensing. Confirm the SolarWinds MVP discount.
  3. Diagrams and documentation drawn up for change control.
  4. If using a new SolarWinds database (not importing the old one), then before the old database is disconnected, examine the Orion database for network auto-discoveries, custom pollers, bespoke alerts and custom dashboards that may be lost during the upgrade process.
  5. Define connectivity requirements from pollers to customer environment.
  6. Define connectivity requirements from main poller to SQL server. (Network change required).
  7. Check connectivity is working between all pollers, SQL and customer environment. Liaise with the networking team to resolve possible issues.
  8. Install physical or virtual servers based upon design specification diagram. (Changes, diagrams, documentation required).
  9. Upgrade operating systems or build new server. (Change required).
  10. Download and Install latest Solarwinds software as per customer requirements from customer portal.
  11. Run Configuration Wizard, document process. (Change required).
  12. Assign virtual templates if not physical and build development Solarwinds environment.
  13. Define and document the user accounts and connection profiles to be used with SolarWinds: network device credentials, the SQL database account and the SolarWinds polling account.
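
The connectivity checks in steps 5 to 7 can be partly scripted before any software is installed. The sketch below is one way to do it, not part of any SolarWinds tooling. It tests TCP reachability only, so it suits the SQL port (1433 is the SQL Server default; the customer may use another) but it cannot confirm SNMP, which runs over UDP. The host names in the usage comment are hypothetical.

```python
import socket
from typing import Dict, Iterable, Tuple

Target = Tuple[str, int]


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_targets(targets: Iterable[Target], timeout: float = 3.0) -> Dict[Target, bool]:
    """Check every (host, port) pair and report which ones are reachable."""
    return {(host, port): tcp_port_open(host, port, timeout) for host, port in targets}


# Example usage, run from the main poller (host names are placeholders):
#
#   results = check_targets([
#       ("sql01.example.local", 1433),   # default SQL Server port
#       ("app01.example.local", 135),    # RPC endpoint mapper, used by WMI
#   ])
#   for (host, port), ok in results.items():
#       print(f"{host}:{port} {'reachable' if ok else 'BLOCKED'}")
```

Any target reported as blocked becomes an item for the network change raised in step 6, or a conversation with the networking team in step 7.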

SolarWinds Monitoring Configuration Plan

  1. Check Login to Solarwinds.
  2. Add licensed modules to Solarwinds.
  3. Set up and confirm auto upgrade for future SolarWinds editions.
  4. Set up users (obtain user list). Specify permissions for each user dependent upon job role. Identify NOC team / customer Services permissions.
  5. Set up a default email address for alerting (catch-all default monitoring address).
  6. Configure HA and confirm that HA is functioning correctly.
  7. Configure distributed and remote workload between additional Solarwinds pollers based upon poller and node location.
  8. Run network discovery tool using subnet list within Solarwinds to bulk add and import nodes.
  9. Define a weekly network discovery scan to detect functional changes to nodes.
  10. Add database and application templates in SAM.
  11. Confirm that node thresholds are set correctly with customer Services and NPM is polling statistics correctly.
  12. Define IPAM IP ranges. Check that data received is accurate.
  13. Set up UDT, DHCP and DNS server links if required.
  14. Set up WPM transactions for synthetic user testing of applications and user journey performance analysis. Create views and reports to show where bottlenecks and problems occur.
  15. Define netflows and create netflow dashboard view. May require network device changes.
  16. Configure additional modules if required:
    1. SRM for storage array monitoring using SMI-S.
    2. Network Configuration Monitor and Server Configuration Monitor, along with Access Rights Manager, if audit control and switch / router configuration management are required.
    3. DPA for database performance analysis.
    4. VMAN for virtualised VMware or Hyper-V host monitoring.
    5. Log Manager or Log and Event Manager to detect server problems via logs.
    6. Patch Manager to assist in deployment of software packages.
    7. Web Help Desk if ServiceNow is not being used as a help desk system.
    8. APM for application performance enhancement.
  17. Create groups of nodes to identify key areas. Use these groups for initial traffic light (RAG) dashboard view providing visibility and access to all key areas through one dashboard.
  18. Define views for teams.
  19. Define alerting requirements based upon events and resulting trigger actions including automated fixes for incidents.
  20. Define proactive capacity management reports and alerting.
  21. Define alerting responsibility and distribution lists for alerting.
  22. Set up availability reports / service level monitoring for groups, nodes, applications and databases.
  23. Set up performance dashboards and reporting for applications.
  24. Set up alerting workshop/training sessions with customer Services.
  25. Provide modular training for each implemented Solarwinds product. Demonstrate Engineers toolset that enables fault finding within Solarwinds. Demonstrate scenarios and possible ways that Solarwinds can help increase efficiency and proactively prevent failure of a production environment.
  26. Implement alert, report and monitoring request templates for future monitoring requests.
  27. ServiceNow or other service desk integration for alerting / automated raising / closing of incidents via Solarwinds events.
  28. Establish a maintenance plan / window (daily, weekly, monthly) for SolarWinds poller and database health and performance, including an Orion deployment health checklist.
  29. Disaster recovery plan.
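
To make the traffic light (RAG) dashboard in step 17 concrete, here is a minimal sketch of how a group status could be rolled up from its nodes. It assumes a worst-status-wins rule, which is my choice for illustration rather than the only option, and the group names are made up.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Rag(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2


def group_status(node_statuses: Iterable[Rag]) -> Rag:
    """Worst status wins; an empty group is reported as GREEN (an assumption)."""
    return max(node_statuses, default=Rag.GREEN)


def dashboard(groups: Dict[str, Iterable[Rag]]) -> Dict[str, Rag]:
    """Roll every group of nodes up to a single traffic light."""
    return {name: group_status(statuses) for name, statuses in groups.items()}


if __name__ == "__main__":
    # Hypothetical groups, for illustration only.
    view = dashboard({
        "Core network": [Rag.GREEN, Rag.AMBER],
        "Databases": [Rag.GREEN, Rag.RED, Rag.AMBER],
        "Web servers": [Rag.GREEN, Rag.GREEN],
    })
    for name, status in view.items():
        print(f"{name}: {status.name}")
```

With worst-status-wins, a single failing node turns its whole group red, which is usually what a NOC wants from a top-level view. The rule should be agreed with the customer when the groups are defined.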

Responsibilities and Team Structure

  1. Adam will work with Solarwinds and the service team directly to plan, design and implement a fully HA resilient, future proof monitoring system.
  2. Adam will evaluate current monitoring solutions, meeting with the service and ITOC design teams to discuss requirements.
  3. The design and plan for the environment will be based on Solarwinds best practice specifications.
  4. Adam will provide training and a suitably knowledgeable Solarwinds engineer will assist in the development of the monitoring environment and ongoing customer requirements. Adam has many standard operating procedure documents for common tasks that can be provided to staff to help with understanding of Solarwinds processes.
  5. Adam will put together a disaster recovery document describing how to recover the SolarWinds environment from backups.

Estimated Timeframe (3 Months)

Each step of the project is given a timeframe within the three-month period.

Value for Money

Key factors keeping costs down when designing a monitoring environment:

  1. SolarWinds licensing audit. Examine the number of licenses required for each product and evaluate the number of elements each license must cover. As an MVP, Adam has access to discounts for SolarWinds products.
  2. Type of SQL licensing (Core or User CALs). Provide enough licenses for everyone to log in to SolarWinds.
  3. Hardware costs. Confirm that servers are suitably specified to provide future proofing and a high quality service to users.
  4. Microsoft Windows Server licensing cost.
  5. Adam will be onsite, providing assistance to the service team and carrying out the work. Adam has MVP support with Solarwinds and has access to key support teams at Solarwinds to resolve any technical problems much faster than using ordinary support procedures.
  6. Phasing out existing disparate non-Solarwinds monitoring systems will reduce licensing and hardware costs and decrease the time taken to fix issues.
  7. Increased efficiency by giving engineers better visibility of key systems, reducing overall fix times.
  8. Consolidating SolarWinds licensing and monitoring, together with better performance, will save time and money.
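
The Core versus User CAL decision in point 2 comes down to a break-even on user numbers. The sketch below compares the two models in their simplest form. All prices are placeholders, not Microsoft list prices, and real licensing adds minimum core counts and edition restrictions that should be checked with the reseller.

```python
def core_licensing_cost(cores: int, price_per_core: float) -> float:
    """Per-core model: every core in the SQL server is licensed."""
    return cores * price_per_core


def server_cal_cost(users: int, server_price: float, price_per_cal: float) -> float:
    """Server + CAL model: one server license plus a CAL for every user."""
    return server_price + users * price_per_cal


def cheaper_model(cores: int, users: int, price_per_core: float,
                  server_price: float, price_per_cal: float) -> str:
    """Return 'core' or 'server+cal', preferring 'core' on a tie."""
    core_cost = core_licensing_cost(cores, price_per_core)
    cal_cost = server_cal_cost(users, server_price, price_per_cal)
    return "core" if core_cost <= cal_cost else "server+cal"


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    for users in (10, 35, 50):
        print(users, "users ->", cheaper_model(
            cores=4, users=users, price_per_core=1000.0,
            server_price=500.0, price_per_cal=100.0))
```

The user count from the first requirement question (who is using the monitoring solution) feeds directly into this comparison.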

Risks and Dependencies

Risks

  1. Downtime of monitoring systems during install of operating systems / SQL database software if in-place upgrades are performed. (1 day)
  2. If new hardware and software are purchased the above risk is eliminated.
  3. Solarwinds software downtime during upgrade. (3 hours)
  4. Phasing out old monitoring systems. SolarWinds can be set up to monitor before old systems are phased out. No Risk. (Time to be agreed)
  5. Changes and work being carried out must be communicated to all departments who may be affected by new monitoring systems, especially when phasing out old monitoring systems.
  6. Training and access will be provided when replacement of old monitoring systems goes ahead. The first phase of the project will enable evaluation of old monitoring software and give us an awareness of who may be affected and if there are any further risks.

Dependencies

  1. Solarwinds pollers require access to all network devices and all servers via SNMP or WMI ports.
  2. Solarwinds SQL server must have access via SQL port to main Solarwinds poller.
  3. Hardware and software must be ordered in a timely fashion so that project is not delayed.

There is so much more we could add to a document like this, but remember I was given a word limit of 2000 words.

So if you know of a company that is struggling with their monitoring infrastructure, maybe Acmtix combined with Solarwinds can help.