Neverfail Engine For System Center Operations Manager - Technical Manual

Neverfail Engine Management Pack

Neverfail offers a Neverfail Engine Management Pack to provide the required configuration information for monitoring and reporting on Neverfail Engine and protected application operations. When used in conjunction with Microsoft System Center Operations Manager (SCOM) and the Neverfail SCOM Plug-in, users can monitor the performance of Neverfail Engine and use the data from SCOM reports to optimize both server and Neverfail Engine performance.

Introduction to the Neverfail Engine Management Pack

The Neverfail Engine Management Pack provides the configuration information necessary to monitor the health, performance, and availability of Neverfail Engine servers running on Windows Server 2008 R2, 2012, 2012 R2, 2016, and 2019. It also allows SCOM to collect server performance data that can be used for routine management and for configuration changes that optimize servers.

 The Neverfail Engine Management Pack download supports the following operating systems:

  • Windows Server® 2019
  • Windows Server® 2016
  • Windows Server® 2012 R2
  • Windows Server® 2012
  • Windows Server® 2008 R2

This diagram provides an overview of the SCOM Agent communications for monitoring Neverfail Engine and protected applications.

scom_diag.png

 

Figure 1: SCOM Agent Communications

This diagram provides an overview of the relationships between the classes and identifies the hosting objects, contained objects, containing objects and inheritance characteristics of the Neverfail Engine Management Pack.

fig_2.png

 Figure 2: The Neverfail Engine Management Pack Service Model

The Neverfail SCOM Plug-in supports the following versions of SCOM:

  • System Center Operations Manager 2007
  • System Center Operations Manager 2007 R2
  • System Center Operations Manager 2012
  • System Center Operations Manager 2012 R2
  • System Center Operations Manager 2016
  • System Center Operations Manager 2019

Contents of the Neverfail SCOM Pack

The SCOM solution pack is bundled with Neverfail Continuity Engine and it contains the following components:

  • Neverfail.Engine.mp - management pack file to be imported on the SCOM server
  • SCOMNFPlugin.dll - plug-in file, automatically deployed on the SCOM managed server

The Neverfail SCOM solution pack is named SCOMNFPlugin.201.5.<build-number> and is available on the Neverfail CE Management Service (EMS) node, in the C:\ProgramData\Neverfail\VAD\catalog\plugins\ folder.

Changes in this version

  • EN-4102 : Management pack was renamed from Neverfail.Heartbeat.mp to Neverfail.Engine.mp
  • EN-4102 : Everything was renamed from Neverfail.Heartbeat to Neverfail.Engine
  • EN-4102 : Neverfail.Engine.Discover.All discovery interval decreased from 4 hours to 15 minutes
  • EN-4102 : Multiple Display Alerts strings were corrected
  • EN-705 : Queue or WAN Smart Performance data is now collected in SCOM

Supported Configurations

The Neverfail Engine Management Pack provides support for Neverfail Engine v8.5 Update 5 and later in the following configurations:

  • High Availability (Pair) in a LAN
  • Disaster Recovery (Pair) in a WAN
  • High Availability (LAN) + Disaster Recovery (WAN) (Tertiary)

 

Getting Started

This section describes the prerequisites for importing the Management Pack, the steps to perform after importing it, and information about customization.

Import Neverfail SCOM Management Pack on the SCOM server

The Neverfail SCOM Management Pack (Neverfail.Engine.mp) can be imported on the SCOM server at any time after installing EMS, either before or after installing Engine on the managed server.

Procedure

  • Prerequisite: In the SCOM Operations Console, the managed server (where Engine is or will be installed) must have the option "Allow this agent to act as a proxy and discover managed objects on other computers" enabled.
  • Extract Neverfail.Engine.mp from the EMS node (location specified above).
  • On the SCOM server, import the Neverfail SCOM Management Pack.

The Neverfail SCOM Plug-in works as a component of the Neverfail Engine Management Pack. When Microsoft System Center Operations Manager is deployed to monitor the health of Neverfail Engine, the Neverfail SCOM Plug-in manages the Microsoft Monitoring Agent (HealthService) service. Whenever the active server identity changes within the Neverfail Engine cluster, the plug-in automatically stops the HealthService service on the node that was previously active and starts it on the node that becomes active. The plug-in also sets the HealthService Startup type to Manual on the passive node and to Automatic on the active node. This ensures that the HealthService service is always running on the active server after the active node is restarted, regardless of whether Neverfail Engine itself is running.
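The HealthService handling described above can be pictured as a small state model. The following Python sketch is illustrative only: it does not call the plug-in or any Windows service API, and the names ServiceConfig, desired_health_service_config, and after_switchover are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ServiceConfig:
    """Desired HealthService settings for one node (illustrative model only)."""
    running: bool
    startup_type: str  # "Automatic" or "Manual"


def desired_health_service_config(role: str) -> ServiceConfig:
    """Return the HealthService settings the text describes for a node role."""
    if role == "active":
        return ServiceConfig(running=True, startup_type="Automatic")
    if role == "passive":
        return ServiceConfig(running=False, startup_type="Manual")
    raise ValueError(f"unknown role: {role!r}")


def after_switchover(nodes: Iterable[str], new_active: str) -> Dict[str, ServiceConfig]:
    """Model a switchover: new_active becomes active, every other node passive."""
    nodes = list(nodes)
    if new_active not in nodes:
        raise KeyError(new_active)
    return {
        node: desired_health_service_config("active" if node == new_active else "passive")
        for node in nodes
    }


# Hypothetical node names, used only to show the shape of the result.
configs = after_switchover(["node-a", "node-b"], new_active="node-b")
for node, config in configs.items():
    print(node, config)
```

After the modelled switchover, only the new active node has the service running with an Automatic startup type, which matches the behavior the paragraph describes.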

After You Import the Management Pack

 After the Neverfail Engine Management Pack has been imported, perform the following tasks:

  1. Review the Performance Monitors for enablement.
  2. Set any overrides for the frequency of discovery.

 Low-Privilege Environments

By default, the Neverfail Engine Management Pack uses the agent action account to perform discoveries and to run monitors, rules, and tasks. The agent action account can run as Local System or as a named account. When it runs as Local System, the account has the privileges required to run the tasks, rules, and discovery actions provided by the Neverfail Engine Management Pack.

 Note: If you configure the agent to use a low-privilege account, Neverfail discoveries, monitors, rules, and tasks will not run successfully.

 

Understanding Management Pack Operations

 Neverfail Engine operations are divided into two areas, Discovery and Monitoring:

Discovery

Each object in the Neverfail Engine Management Pack must be discovered by SCOM before it can be viewed by the user. To enable this, the Neverfail Engine Management Pack contains two discoveries: Neverfail.Engine.Computer.Discover and Neverfail.Engine.Discover.All.

  • The Neverfail.Engine.Computer.Discover discovery finds instances of the Neverfail.Engine.Computer class. This is the first discovery that runs. It queries the registry of each computer in the network to determine whether Neverfail Engine is installed. If Neverfail Engine is found, the discovery checks the version, and if the version matches the version listed in the Management Pack, an object is created.
  • The Neverfail.Engine.Discover.All discovery targets each Neverfail.Engine.Computer object found by the previous discovery. For each computer, it runs a single script to retrieve all data on the servers within the Neverfail cluster.

The following objects are created:

  • Engine.ClusterGroup
  • Engine.Server
  • Engine.ProtectedApplication
  • Engine.NFChannel
  • Engine.NFChannel.Link
  • Engine.WebServices
  • Engine.Server.FileSystemSyncStatus

Note: The Neverfail Engine Management Pack schedules both discoveries to run once every 15 minutes.

The Neverfail.Engine.Computer.Discover discovery identifies new Neverfail Engine installations and does not need to run frequently. The Neverfail.Engine.Discover.All discovery identifies changes in Neverfail Engine configuration, for example the addition or deletion of Neverfail Engine servers or channels, or changes to Neverfail Engine server roles (active/passive). This discovery requires running a single script on the SCOM Agent. When determining how often to run discovery, balance the additional memory consumption required by SCOM against the need for more frequent discovery. For example, with the default configuration (once every 15 minutes), when a failover or switchover occurs, the monitors in the Neverfail Engine Management Pack update the overall status of the Neverfail Engine objects promptly and display any Alerts or Events that have been generated. However, it can take up to 15 minutes before the SCOM Operations Console accurately reflects the new roles of the Neverfail Engine servers.
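To make the trade-off concrete, the following Python sketch compares the current 15-minute interval with the previous 4-hour interval. It is simple arithmetic only; the function name discovery_tradeoff is hypothetical, and the sketch says nothing about actual memory use, which depends on the environment.

```python
def discovery_tradeoff(interval_minutes: int) -> dict:
    """Return discovery runs per day and worst-case role staleness for an interval."""
    if interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return {
        # How many times the discovery script runs in 24 hours.
        "runs_per_day": (24 * 60) // interval_minutes,
        # Longest time the console can show outdated server roles.
        "worst_case_staleness_minutes": interval_minutes,
    }


for minutes in (15, 240):
    print(minutes, discovery_tradeoff(minutes))
```

A 15-minute interval means 96 script runs per day with roles at most 15 minutes out of date, while a 4-hour interval means 6 runs per day with roles up to 240 minutes out of date.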

 

Additionally, instances of the following relationships are created:

  • Engine.ClusterGroup.Contains.ProtectedApplications
  • Engine.ClusterGroup.Contains.Neverfail.Engine.NFChannel
  • Engine.ClusterGroup.Contains.Engine.Servers
  • Engine.ClusterGroup.Contains.WebServices
  • Engine.Server.Hosts.Neverfail.Engine.ProtectedApplications
  • Engine.Server.Hosts.Neverfail.Engine.FileSystemSyncStatus
  • Engine.NFChannel.Hosts.Neverfail.Engine.NFChannel.Link

Monitoring

Using a combination of Availability and Performance monitors, the Neverfail Engine Management Pack monitors all discovered objects for health and performance issues. The monitors run at intervals of between 3 and 5 minutes. Unit monitors roll up in state views to produce near real-time health state views that can be used to determine the current health of all Neverfail services. Event views provide a historical view of component health for situations where historical context is needed to gain insight into Neverfail-related problems, or where the user wants to see the health status over an extended period of time.

 

Neverfail Engine Management Pack Monitors

Unit Monitors

 Unit monitors gather state information from contained objects and report the overall state of the object to the SCOM Console.

Table 1: Unit Monitors

Monitor

Type

Alert

Description

State Monitor for Engine Server

Availability

Yes

Monitors the status of the Engine Server. The SCOM state indicates Warning/Degraded for the following controller states:

  • Service Shutdown
  • Not Replicating
  • Stopping Replication
  • Service Shutting Down
  • Switching Active Server
  • Disconnecting from Peer Server
  • Not Participating
  • Server Not Responding
  • Previously Active Awaiting Peer Server (following an unclean shutdown)

The SCOM state will indicate Critical for the following:

  • Lost Active Server
  • Active Following Failover

Note: For all other events, the SCOM state will indicate OK

State Monitor for Protected Applications

Availability

Yes

Monitors the availability state of the protected application by evaluating the Health and State values of the protected application.

  • If the Health is OK and the State is Stopped or Stopping, the monitor indicates Warning/Degraded.
  • If the Health is OK and the State is Starting or Started, the monitor indicates OK.
  • If the Health is Critical, the monitor indicates Critical.
  • For any other Health value (for example Potential problem, Warning, Unknown, or Unmonitored), the monitor indicates Warning/Degraded.

State Monitor for Neverfail Channel's Physical Link

Availability

Yes

Monitors whether the Neverfail Channel physical link is connected.

Web Service Monitor

Availability

Yes

Checks whether the WebServices service is running. If it is not running, a Critical alert is generated. A manual recovery action that restarts the service is also provided. The user can run this recovery action from the SCOM Health Explorer.
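The state mappings listed in Table 1 for the Engine Server and Protected Application monitors can be written out as a lookup. The following Python sketch is a reading of the table, not the management pack's own implementation. The function names are hypothetical, and the handling of a Health of OK combined with a State the table does not list is an assumption.

```python
# Controller states that Table 1 maps to Warning/Degraded.
WARNING_CONTROLLER_STATES = {
    "Service Shutdown",
    "Not Replicating",
    "Stopping Replication",
    "Service Shutting Down",
    "Switching Active Server",
    "Disconnecting from Peer Server",
    "Not Participating",
    "Server Not Responding",
    "Previously Active Awaiting Peer Server",
}

# Controller states that Table 1 maps to Critical.
CRITICAL_CONTROLLER_STATES = {"Lost Active Server", "Active Following Failover"}


def engine_server_state(controller_state: str) -> str:
    """Map an Engine Server controller state to a SCOM state per Table 1."""
    if controller_state in CRITICAL_CONTROLLER_STATES:
        return "Critical"
    if controller_state in WARNING_CONTROLLER_STATES:
        return "Warning"
    return "OK"  # Table 1: all other events indicate OK.


def protected_application_state(health: str, state: str) -> str:
    """Map a protected application's Health and State to a SCOM state per Table 1."""
    if health == "Critical":
        return "Critical"
    if health == "OK":
        if state in ("Starting", "Started"):
            return "OK"
        if state in ("Stopped", "Stopping"):
            return "Warning"
        # Assumption: Table 1 does not cover other State values with Health OK.
        return "Warning"
    # Any other Health value (Potential problem, Warning, Unknown, Unmonitored).
    return "Warning"


print(engine_server_state("Not Replicating"))
print(protected_application_state("OK", "Started"))
```

Keeping the state names in sets makes it easy to compare the sketch against the lists in Table 1 line by line.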

 

 

Dependency Monitors

Dependency monitors allow the state of an object to affect the state of the object that contains or hosts it, thereby allowing for a roll-up of states.

Table 2: Dependency Monitors

Monitor

Type

Alert

Description

 Neverfail.Engine.CGdependsPA_DependencyMonitor

 Dependency

 No

This monitor updates the state of the Cluster Group to reflect the state of the Protected Application. The health state is determined by the worst state of any protected application in the cluster.

Neverfail.Engine.CGdependsHB_DependencyMonitor

Dependency

No

This monitor updates the state of the Cluster Group to reflect the state of the Engine Servers. The health state is determined by the worst state of any Engine Server in the cluster.

Neverfail.Engine.NFChannel.depends_Link.Monitor

Dependency

No

This monitor updates the state of the Neverfail Channel to reflect the state of the Channel Links. The health state is determined by the best state of any link within the Neverfail Channel.

 

Neverfail.Engine.CGdependsWebServices_DependencyMonitor

Dependency

No

This monitor updates the state of the Cluster Group to reflect the state of the Web Services.
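Table 2 describes two roll-up policies: worst state (Cluster Group from Protected Applications and Engine Servers) and best state (Neverfail Channel from its Links). The following Python sketch shows the difference between the two. The state names, their ordering, and the function names are assumptions made for illustration; the table does not state a policy for the Web Services monitor, so none is modelled here.

```python
from typing import Sequence

# Assumed severity ordering: a higher number is a worse state.
SEVERITY = {"OK": 0, "Warning": 1, "Critical": 2}


def worst_of(states: Sequence[str]) -> str:
    """Roll up to the worst child state (Cluster Group policy in Table 2)."""
    return max(states, key=SEVERITY.__getitem__)


def best_of(states: Sequence[str]) -> str:
    """Roll up to the best child state (Neverfail Channel policy in Table 2)."""
    return min(states, key=SEVERITY.__getitem__)


# Hypothetical child states for illustration.
protected_applications = ["OK", "Warning", "OK"]
channel_links = ["Critical", "OK"]

print(worst_of(protected_applications))  # one degraded application degrades the group
print(best_of(channel_links))            # one healthy link keeps the channel healthy
```

With a best-state policy, a channel with redundant links stays healthy as long as at least one link is connected, whereas a worst-state policy surfaces any single failing application or server at the Cluster Group level.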

 

Health Roll-ups 

The following diagram illustrates how health states of components roll-up.

 fig3.png

Figure 3: Neverfail Health Roll-ups

 

Performance Monitors

The following is a list of performance monitors that target the Neverfail.Engine.Server class. These monitors are disabled by default. To enable these monitors, you must override the threshold values with values that are suitable for your environment.

 Table 3: Performance Monitors

Monitor

Description

 Neverfail.Engine.Server.Monitor.CurrentThroughPut

A three-state performance monitor which monitors the current throughput of the Engine Server. 

 Neverfail.Engine.Server.Monitor.MaxThroughPut

 A three-state performance monitor which monitors the maximum throughput of the Engine Server.

Neverfail.Engine.Server.Monitor.RecoveryPointMS

A three-state performance monitor which monitors the recovery point interval for the Engine Server.

Neverfail.Engine.Server.WinMonitor.AvgDiskQLen

A two-state performance monitor which monitors the Logical Disk: Avg. Disk Queue Length performance counter.

Neverfail.Engine.Server.WinMonitor.AvgDiskSecRead

A two-state performance monitor which monitors the Logical Disk: Avg. Disk sec/Read performance counter.

Neverfail.Engine.Server.WinMonitor.AvgDiskSecTransfer

A two-state performance monitor which monitors the Logical Disk: Avg. Disk sec/Transfer performance counter.

Neverfail.Engine.Server.WinMonitor.CurrentDiskQueueLength

A two-state performance monitor which monitors the Logical Disk: Current Disk Queue Length performance counter.

Neverfail.Engine.Server.WinMonitor.LogicalDisk.AvgDiskSecWrite

A two-state performance monitor which monitors the Logical Disk: Avg. Disk sec/Write performance counter.

Neverfail.Engine.Server.WinMonitor.LogicalDisk.DiskReadSec

A two-state performance monitor which monitors the Logical Disk: Disk Reads/sec performance counter.

Neverfail.Engine.Server.WinMonitor.LogicalDisk.DiskWritesSec

A two-state performance monitor which monitors the Logical Disk: Disk Writes/sec performance counter.

Neverfail.Engine.Server.WinMonitor.LogicalDisk.PercetFreeSpace

A three-state performance monitor which monitors the Logical Disk: % Free Space performance counter.

Neverfail.Engine.Server.WinMonitor.Processor.PercentProcessorTime

A two-state performance monitor which monitors the Processor: % Processor Time performance counter.
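The difference between the two-state and three-state monitors in Table 3 comes down to how many thresholds are evaluated. The following Python sketch illustrates the idea under stated assumptions: the threshold values are invented, the state names are placeholders, and the functions are hypothetical rather than part of the management pack. Suitable thresholds must be chosen for each environment.

```python
def two_state(value: float, threshold: float, higher_is_worse: bool = True,
              unhealthy_state: str = "Critical") -> str:
    """Evaluate one threshold: the result is either OK or the unhealthy state."""
    breached = value >= threshold if higher_is_worse else value <= threshold
    return unhealthy_state if breached else "OK"


def three_state(value: float, warning: float, critical: float,
                higher_is_worse: bool = True) -> str:
    """Evaluate two thresholds: the result is OK, Warning, or Critical."""
    if higher_is_worse:
        if value >= critical:
            return "Critical"
        if value >= warning:
            return "Warning"
    else:
        if value <= critical:
            return "Critical"
        if value <= warning:
            return "Warning"
    return "OK"


# Invented sample values: a disk queue length and a percentage of free space.
print(two_state(3.0, threshold=2.0))
print(three_state(8.0, warning=10.0, critical=5.0, higher_is_worse=False))
```

The higher_is_worse flag covers counters such as % Free Space, where a low value is the problem, as well as counters such as queue length, where a high value is the problem.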

 

Collection Rules

Rules collect historical data from sources such as Event Logs, Log Files, and Perfmon. The data is stored in the Operations Manager database.

Note: Rules from different data sources are run at different times to minimize impact on system resources during the run. Additionally, rules that use the same data source are "cooked-down" to ensure that the collection script is run only once for all related rules.
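The cook-down idea in the note above can be illustrated with a shared data source that runs its collection step once and serves every rule from the same result. The following Python sketch is a conceptual model only. In SCOM, cook-down is performed by the agent when workflows share a data source with identical configuration; the class, function, and sample values below are hypothetical.

```python
_NOT_COLLECTED = object()


class SharedDataSource:
    """Runs a collection function at most once per cycle and shares the result."""

    def __init__(self, collect):
        self._collect = collect
        self._cache = _NOT_COLLECTED
        self.run_count = 0

    def read(self):
        # Only the first reader in a cycle triggers the collection.
        if self._cache is _NOT_COLLECTED:
            self._cache = self._collect()
            self.run_count += 1
        return self._cache

    def new_cycle(self):
        """Discard the cached result so the next read collects again."""
        self._cache = _NOT_COLLECTED


def collect_channel_metrics():
    # Hypothetical stand-in for a collection script; the values are made up.
    return {"SendQSizeBytes": 1024, "ReceiveQSizeBytes": 2048}


source = SharedDataSource(collect_channel_metrics)
rules = ["SendQSizeBytes", "ReceiveQSizeBytes"]
samples = {rule: source.read()[rule] for rule in rules}
print(samples, "collections run:", source.run_count)
```

Both rules receive their values, but the collection function runs only once per cycle, which is the resource saving the note refers to.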

Neverfail Channel Performance Data Collection Rules

The Neverfail Channel Performance Data rules target Neverfail.Engine.NFChannel and use a script to retrieve the information.

Table 4: Neverfail Channel Performance Data Collection Rules

Name

Description

 Neverfail.Engine.CollectChannel.Q.ReceiveQOldestEntry

Collects the Channel's Receive Queue-oldest entry performance data.

Neverfail.Engine.CollectChannel.Q.ReceiveQSizeBytes

Collects the Receive Queue Size in bytes

Neverfail.Engine.CollectChannel.Q.SendQOldestEntry

Collects the Send Queue oldest entry performance counter.

 Neverfail.Engine.CollectChannel.Q.SendQSizeBytes

Collects the Send Queue Size in Bytes performance counter.

 Neverfail.Engine.CollectChannel.WAX.AvgCompressionFactor

Collects the WAX Average Compression Factor

Neverfail.Engine.CollectChannel.WAX.AvgThruPut

Collects the WAX Average Throughput

Neverfail.Engine.CollectChannel.WAX.CurrentCompressionFactor

Collects the WAX Current Compression Factor

Neverfail.Engine.CollectChannel.WAX.CurrentThruPut

Collects the WAX Current Throughput

Neverfail.Engine.CollectChannel.WAX.CurVolDataProcessed

Collects the WAX Current Volume Data Processed

Neverfail.Engine.CollectChannel.WAX.VolDataProcessed

Collects the WAX Average Volume Data Processed

 

Physical Link Performance Data Collection Rules

The Physical Link Performance Data Collection rules provide information about the throughput of the physical link and use a script to retrieve the information.

 Table 5: Physical Link Performance Data Collection Rules

Name

Description

Neverfail.Engine.CollectLink.BytesReceived

Collects the Bytes Received on the physical link.

Neverfail.Engine.CollectLink.BytesSent

Collects the Bytes Sent on the physical link.

 

Engine Server Performance Data Collection Rules

The Engine Server Performance Data Rules provide information about the physical Engine server.

 Table 6: Engine Server Performance Data Collection Rules

Name

Description

Neverfail.Engine.CollectServer.CurrentThroughPut

Collects Current Throughput performance data

Neverfail.Engine.CollectServer.MaxThroughPut

Collects Max Throughput performance data

Neverfail.Engine.CollectServer.LogicalDiskC.AvgDiskQLen

Collects the AvgDiskQLen for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.AvgDiskSecRead

Collects the AvgDiskSecRead for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.AvgDiskSecTransfer

Collects the AvgDiskSecTransfer for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.AvgDiskSecWrite

Collects the AvgDiskSecWrite for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.CurrentDiskQueueLength

Collects the CurrentDiskQueueLength for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.DiskBytesSec

Collects the DiskBytesSec for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.DiskReadsSec

Collects the DiskReadsSec for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.DiskWritesSec

Collects the DiskWritesSec for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.FreeMegabytes

Collects the FreeMegabytes for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.LogicalDiskC.PercentFreeSpace

Collects the %FreeSpace for the Logical Disk Windows performance counter.

Neverfail.Engine.CollectServer.Processor_Total.PercentProcessorTime

Collects the %ProcessorTime for the Processor Windows performance counter.

Neverfail.Engine.CollectServer.RecoveryPointMS

Collects the Recovery Point M/s performance counter from Engine.

 

 

Event Collection Rules

The administrator must configure the frequency at which the rules run based on the configuration and environment. Administrators should balance the consumption of system resources against the need for immediate collection of events. If the configuration and environment allow, Neverfail recommends configuring Event Collection Rules to run every 5 minutes.

Table 7: Event Collection Rules

Name

Target

Description

Neverfail.Engine.ServerEvents.Collection

Neverfail.Engine.Server

A script-based rule that retrieves all events associated with the Neverfail Server object.

Neverfail.Engine.NFChannelEvents.Collection

Neverfail.Engine.NFChannel

A script-based rule that retrieves all events associated with the Neverfail Channel objects.

Neverfail.Engine.PAEvents.Collection

Neverfail.Engine.ProtectedApplications

A script-based rule that retrieves all events associated with the Protected Applications.

Neverfail.Engine.ServerShuttingDown.Collection

Neverfail.Engine.Server

A rule that checks for an Engine Shutting Down event and generates a corresponding alert.

Neverfail.Engine.ImpendingLicenseExpiry.Collection

Neverfail.Engine.Server

A rule that checks for an Impending License Expiry event and generates a corresponding alert.

Neverfail.Engine.FailoverEvent.Collection

Neverfail.Engine.Server

A rule that checks for Failover events and generates a corresponding alert.

Neverfail.Engine.AutoSwitchoverEvent.Collection

Neverfail.Engine.Server

A rule that checks for Auto-switchover events and generates a corresponding alert.

 

Linked Reports

The following linked reports are provided.

Table 8: Linked Reports

Name

Target

Description

Neverfail Channel Performance Report

Neverfail.Engine.NFChannel

Reports Channel Performance over a configurable period of time.

WAN Access Performance Report

Neverfail.Engine.NFChannel

Reports WAN Smart Performance over a configurable period of time.

Protected Application Availability Report

Neverfail.Engine.ProtectedApplication

Reports availability of Protected Applications over a configurable period of time.

Engine Server Availability Report

Neverfail.Engine.Server

Reports availability of Neverfail Engine over a configurable period of time.

 

Neverfail Recovery

The Neverfail Recovery task is used to perform an action on the Neverfail server, such as restarting a service.

Table 9: Neverfail Recovery

Name

Target

Description

Neverfail.Engine.WebServicesRecovery

Neverfail.Engine.Computer

Restarts the Neverfail WebServices service. This recovery is not run automatically on failure of the monitor, but must be run manually by the user.

 

Troubleshooting

Use of Management IP addresses (Backdoor IP addresses) or Alternate IP addresses on a passive server may result in erroneous or duplicate Alerts being raised and may cause problems in communications between the SCOM Operations Console and the SCOM Agent.

 

Applies To

Neverfail SCOM Plug-in v201.5.2

Neverfail Continuity Engine 8.5 Update 5 and later