Rules Associated With Neverfail for SQL Server Plug-in


Summary

This Knowledgebase article provides information about the rules associated with Neverfail for SQL Server Plug-in.


More Information

Rules are implemented by plug-ins and cannot be created by users. Each plug-in contains a default set of rules with options that the user may modify. Users may enable or disable any rule, regardless of whether it is enabled by default.

Typically a rule has the following parameters:

  • Condition - specifies the cases in which the rule is triggered (for example if a performance counter is below a certain threshold).
  • Duration - the amount of time during which the condition must be satisfied, at every check, before the rule is triggered.
  • Interval - the length of time between rule checks.
  • Failure Actions - specify what Neverfail Heartbeat should do when the rule is triggered (that is, when the condition evaluates to true). There are three such actions: Log Warning, Restart Applications and Switchover.
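To make the interplay of Condition, Duration and Interval concrete, here is a minimal Python sketch of how such a rule could be evaluated. The `Rule` and `FailureAction` names are hypothetical and are not the Neverfail Heartbeat API. The sketch assumes one counter sample per interval and treats "duration D checked every I" as D // I consecutive checks; the product may count checks differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class FailureAction(Enum):
    LOG_WARNING = "Log Warning"
    RESTART_APPLICATIONS = "Restart Applications"
    SWITCHOVER = "Switchover"


@dataclass
class Rule:
    """Hypothetical model of a rule; not the Neverfail Heartbeat API."""
    name: str
    condition: Callable[[float], bool]  # returns True when the rule should fire
    duration_s: int                     # how long the condition must hold
    interval_s: int                     # time between checks
    actions: List[FailureAction] = field(
        default_factory=lambda: [FailureAction.LOG_WARNING])

    def checks_required(self) -> int:
        # Assumption: a duration of D seconds checked every I seconds means
        # D // I consecutive checks (at least one).
        return max(1, self.duration_s // self.interval_s)

    def is_triggered(self, samples: List[float]) -> bool:
        """samples holds one counter value per check, oldest first."""
        needed = self.checks_required()
        if len(samples) < needed:
            return False  # not enough history to cover the duration yet
        # Fire only if the condition held at every check in the window.
        return all(self.condition(v) for v in samples[-needed:])


# Example: fire if the buffer cache hit ratio stays below 90% for 30 minutes,
# checked every 5 minutes.
hit_ratio_rule = Rule("Buffer Cache Hit Ratio", lambda v: v < 90.0,
                      duration_s=30 * 60, interval_s=5 * 60)
```

A single healthy sample inside the window resets the outcome, which matches the "at every check" wording above.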

Available rules

  • DB Files Allocated Space

This rule warns the user if database log and data files may run out of allocated space. The warning is issued if the rule finds a database that does not auto-grow and uses more than a specified percentage (80% by default) of its allocated space. The rule uses the following counters:

Database: Log File(s) Size (KB) - Returns the cumulative size of all the log files in the database. This size can grow if you have not set a maximum size for the log in ‘tempdb’. This counter is specific to each SQL Server instance.

Database: Log File(s) Used Size (KB) - Returns the cumulative used size of all log files in the database. A large active portion of the log in ‘tempdb’ can be a warning sign that a long transaction is preventing log cleanup. This counter is specific to each SQL Server instance.
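The space check described above reduces to a percentage comparison. The Python sketch below shows it under the assumption that the rule compares used space against allocated size from the two counters; the function names are hypothetical.

```python
def percent_used(size_kb: float, used_kb: float) -> float:
    """Share of the allocated space that is in use, as a percentage."""
    if size_kb <= 0:
        raise ValueError("size_kb must be positive")
    return 100.0 * used_kb / size_kb


def allocated_space_warning(size_kb: float, used_kb: float, autogrow: bool,
                            threshold_percent: float = 80.0) -> bool:
    """True if a non-auto-growing database uses more than threshold_percent."""
    if autogrow:
        return False  # the rule only warns about databases that do not auto-grow
    return percent_used(size_kb, used_kb) > threshold_percent
```

For example, a 1,024 KB log with 900 KB used is about 87.9% full and would trigger the default 80% warning, unless the database auto-grows.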

  • Databases Availability

This rule sequentially checks that all databases with an ONLINE status respond to queries in a timely fashion. The rule is triggered if one or more databases time out, or if it cannot connect to any of the SQL Server instances installed on the machine. The rule returns a list of the databases that timed out or, if none timed out, the duration of the check in milliseconds.

This rule handles multiple SQL Server instances and all of their databases except each instance's system 'tempdb' database. In case of failure, the same action is triggered for all instances.
If the rule is unable to log on to one or more SQL Server instances, the databases of those instances are not checked, but the rule warns that some instances were skipped and lists them.

This rule connects to the SQL Server instances using Windows Authentication and runs the “sp_tables” system stored procedure for every database listed in master.mdf. By default, the timeout is 10 seconds, the check interval is 60 seconds and the failure action is Log Warning. All these settings can be changed by selecting the rule and clicking the Edit button.
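The sequential availability check can be sketched in Python as follows. The `probe` callable is a hypothetical stand-in for running `sp_tables` against one database; in practice it might be implemented with an ODBC driver over a Windows-authenticated connection, but that part is an assumption and is left out so the sketch stays self-contained.

```python
import time
from typing import Callable, Dict, List, Tuple


def check_databases_availability(
    databases: Dict[str, str],
    probe: Callable[[str, float], None],
    timeout_s: float = 10.0,
) -> Tuple[List[str], int]:
    """Return (timed_out_databases, elapsed_ms) for one availability check.

    databases maps each database name to its state, e.g. "ONLINE".
    probe(name, timeout_s) is a hypothetical callable that runs a cheap query
    such as sp_tables against one database and raises TimeoutError if the
    database does not answer within timeout_s seconds.
    """
    start = time.monotonic()
    timed_out: List[str] = []
    for name, state in databases.items():  # databases are checked one at a time
        if state != "ONLINE" or name.lower() == "tempdb":
            continue  # only ONLINE databases are probed; tempdb is excluded
        try:
            probe(name, timeout_s)
        except TimeoutError:
            timed_out.append(name)
    elapsed_ms = int((time.monotonic() - start) * 1000)
    return timed_out, elapsed_ms
```

Returning both values lets the caller report the timed-out list when it is non-empty and the elapsed milliseconds otherwise, mirroring the rule's output.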

  • Databases Online Status

This rule checks the status of every database in SQL Server. If any database does not have an ONLINE status, the database is considered inaccessible and a warning is issued. This rule connects to the SQL Server instances using Windows Authentication.
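A minimal Python sketch of the status check, assuming the state of each database is read from the `sys.databases` catalog view (available in SQL Server 2005 and later). How the plug-in actually retrieves the status is not stated here, so the query and function name are illustrative only.

```python
from typing import Iterable, List, Tuple

# Catalog query an implementation might run over a Windows-authenticated
# connection (assumes SQL Server 2005 or later, where sys.databases exists).
STATUS_QUERY = "SELECT name, state_desc FROM sys.databases"


def databases_not_online(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Given (name, state_desc) rows, return the databases that are not ONLINE."""
    return [name for name, state in rows if state != "ONLINE"]
```

Any non-empty result (for example a database in the OFFLINE or SUSPECT state) would correspond to the warning described above.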

  • Default Instance Buffer Cache Hit Ratio

This rule monitors the SQL Server: Buffer Manager – Buffer Cache Hit Ratio performance counter. The counter shows the percentage of pages that were found in the buffer pool without having to incur a read from disk. By default, the rule is enabled and checks that the counter is above 90% at every check during a 30-minute period. The condition check is performed every 5 minutes and the failure action is Log Warning. This rule connects to the SQL Server instances using Windows Authentication.
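If you read this counter yourself from `sys.dm_os_performance_counters` rather than from Performance Monitor, the hit ratio is stored as a raw value plus a separate "Buffer cache hit ratio base" row, and the percentage is value / base. The Python sketch below shows that arithmetic; the function name is hypothetical, and returning `None` when the base is zero is a design choice, not documented behavior.

```python
from typing import Optional


def ratio_percent(value: int, base: int) -> Optional[float]:
    """Turn a raw ratio counter and its matching 'base' counter into a percentage.

    In sys.dm_os_performance_counters, 'Buffer cache hit ratio' is stored as a
    raw value with a separate 'Buffer cache hit ratio base' row; the percentage
    is value / base.
    """
    if base == 0:
        return None  # no data yet; the caller decides how to treat this
    return 100.0 * value / base
```

A result of 95.0 would satisfy the default 90% check, while 85.0 would count as a failing check.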

  • Default Instance Free Pages

This rule monitors the SQL Server: Buffer Manager – Free pages performance counter. This counter shows the total number of pages on all free lists. By default, the rule is enabled and checks that the counter is above 640 at every check during a 30-minute period. The condition check is performed every 5 hours and the failure action is Log Warning.
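Because SQL Server pages are 8 KB, the default threshold of 640 free pages corresponds to 5,120 KB (5 MB) of free buffer memory. A small Python sketch of the conversion and the check; the names are hypothetical.

```python
SQL_SERVER_PAGE_KB = 8  # SQL Server data pages are 8 KB


def free_pages_to_kb(pages: int) -> int:
    """Convert a Free pages counter value to kilobytes."""
    return pages * SQL_SERVER_PAGE_KB


def free_pages_ok(pages: int, threshold_pages: int = 640) -> bool:
    """True while the Free pages counter stays above the rule's default threshold."""
    return pages > threshold_pages
```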

  • Default Instance Total Server Memory

This rule monitors the SQL Server: Memory Manager – Total Server Memory performance counter. This counter shows the total amount of dynamic memory the server is currently consuming. By default, the rule is disabled because the threshold is specific to the machine it runs on. The rule checks that the counter is below the threshold at every check during a 30-minute period. The condition check is performed every minute and the failure action is Log Warning.

  • Default Instance Batch Requests/sec

This rule monitors the SQL Server: SQL Statistics – Batch Requests/sec performance counter. This counter shows the number of SQL batch requests received by the server per second. Generally speaking, more than 1000 batch requests per second indicates a very busy SQL Server and may mean that, if you are not already experiencing a CPU bottleneck, you soon will. By default, the rule is enabled. The rule checks that the counter is below 1000 requests per second. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.
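Performance Monitor reports "/sec" counters such as Batch Requests/sec as rates, but `sys.dm_os_performance_counters` exposes them as cumulative totals since the SQL Server service started. If you sample that view instead, the rate is the difference between two samples divided by the elapsed seconds, as in this Python sketch (function name hypothetical). The same calculation applies to Number of Deadlocks/sec and Full Scans/sec.

```python
def per_second_rate(first_value: int, second_value: int, elapsed_s: float) -> float:
    """Per-second rate from two samples of a cumulative '/sec' counter."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if second_value < first_value:
        # The cumulative value restarts from zero when the SQL Server service restarts.
        raise ValueError("counter was reset between samples")
    return (second_value - first_value) / elapsed_s
```

For example, 300,000 additional batches over a 5-minute (300-second) interval gives 1,000 batch requests per second, which would fail the "below 1000" check.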

  • Default Instance Average Wait Time (ms)

This rule monitors the SQL Server: Locks – Average Wait Time (ms) performance counter. The counter shows the average wait time, in milliseconds, for each lock request that resulted in a wait. By default, the rule is enabled. This rule checks that the counter is below 500 milliseconds. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.
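In `sys.dm_os_performance_counters`, Average Wait Time (ms) is, to our understanding, an average-type counter paired with an "Average Wait Time Base" row, and both values accumulate over time. The average for an interval is then the change in total wait time divided by the change in the number of waits. The Python sketch below assumes that layout; the function name is hypothetical.

```python
def average_wait_ms(wait_ms_before: int, waits_before: int,
                    wait_ms_after: int, waits_after: int) -> float:
    """Average lock wait (ms) over an interval from two samples of
    'Average Wait Time (ms)' and its 'Average Wait Time Base' counter."""
    waits = waits_after - waits_before
    if waits <= 0:
        return 0.0  # no lock request had to wait during the interval
    return (wait_ms_after - wait_ms_before) / waits
```

For example, 30,000 ms of additional waiting spread over 60 new lock waits averages 500 ms, which is not below the default 500 ms threshold.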

  • Default Instance Number of Deadlocks/sec

This rule monitors the SQL Server: Locks – Number of Deadlocks/sec performance counter. The counter shows the number of lock requests per second that resulted in a deadlock. By default, the rule is enabled. This rule checks that the counter is below 1. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.

  • Default Instance Full Scans/sec

This rule monitors the SQL Server: Access Methods – Full Scans/sec performance counter. The counter shows the number of unrestricted full scans per second. These can be either base-table scans or full index scans. By default, the rule is enabled. This rule checks that the counter is below 100. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.

  • Default Instance User connections

This rule monitors the SQL Server: General Statistics – User Connections performance counter. The counter shows the number of users connected to the system. By default, the rule is disabled because there is no established threshold for this counter. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.

  • Default Instance Cache Hit Ratio

This rule monitors the SQL Server: Catalog Metadata – Cache Hit Ratio performance counter. For SQL Server 2000 instances, the counter is in the SQL Server: Cache Manager category. The counter shows the ratio between catalog metadata cache hits and lookups. By default, the rule is enabled. This rule checks that the counter is below 85%. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.

  • First Instance Working Set

This rule monitors the Process – Working Set performance counter for instance ‘sqlservr’. The counter shows the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed, they are then soft-faulted back into the Working Set before leaving main memory.

By default, the rule is disabled because the threshold is specific to the machine it runs on. This rule checks that the counter is above the threshold at every check during a 30-minute period. The condition check is performed every 5 minutes and the failure action is Log Warning.

  • First Instance Processor Time

This rule monitors the Process – % Processor Time performance counter for instance ‘sqlservr’. The counter shows the percentage of elapsed time that all of the process's threads used the processor to execute instructions. An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count. Note that the counter shows the sum of the percentages across all processors: on an 8-processor machine where the process uses 60% of each processor, the counter value is 480%. For a single processor, usage above 80% sustained over a long period indicates a problem. The rule is disabled by default because the threshold depends on the number of processors. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.
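Because the counter is summed across processors, a threshold only makes sense once the value is normalized by the processor count. The Python sketch below divides by the number of logical processors (using `os.cpu_count()` when no count is given, which is an assumption about how processors should be counted) and applies the 80% per-processor guideline. The function names are hypothetical.

```python
import os
from typing import Optional


def per_processor_percent(process_time_percent: float,
                          cpu_count: Optional[int] = None) -> float:
    """Divide the summed % Processor Time by the number of logical processors."""
    n = cpu_count or os.cpu_count() or 1
    return process_time_percent / n


def cpu_pressure(process_time_percent: float, cpu_count: Optional[int] = None,
                 threshold_percent: float = 80.0) -> bool:
    """True if the per-processor share exceeds the threshold."""
    return per_processor_percent(process_time_percent, cpu_count) > threshold_percent
```

With the example above, 480% on 8 processors normalizes to 60% per processor, below the 80% guideline.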

  • First Instance Page File Bytes

This rule monitors the Process – Page File Bytes performance counter for instance ‘sqlservr’. The counter shows the current amount of virtual memory, in bytes, that this process has reserved for use in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and the lack of space in paging files can prevent other processes from allocating memory. If there is no paging file, this counter reflects the current amount of virtual memory that the process has reserved for use in physical memory. This rule is disabled by default because it uses server-specific values. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.

  • First Instance Virtual Bytes

This rule monitors the Process – Virtual Bytes performance counter for instance ‘sqlservr’. The counter shows the current size, in bytes, of the virtual address space the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual address space is finite, and by using too much of it the process can limit its ability to load libraries. Virtual Bytes alerts can be safely ignored because it is normal for SQL Server to consume as much virtual memory as it can. The rule is disabled by default because it uses server-specific values. The condition check is performed every 5 minutes for a 30-minute period and the failure action is Log Warning.

  • The following rules are the same as their default instance counterparts, but can be applied to a named SQL Server instance present on the machine.

Named Instance Buffer Cache Hit Ratio
Named Instance Free Pages
Named Instance Total Server Memory
Named Instance Batch Requests/sec
Named Instance Average Wait Time (ms)
Named Instance Number of Deadlocks/sec
Named Instance Full Scans/sec
Named Instance User connections
Named Instance Cache Hit Ratio

  • The following rules are the same as their first instance counterparts, but can be applied to a second SQL Server instance present on the machine.

Second Instance Working Set
Second Instance Processor Time
Second Instance Page File Bytes
Second Instance Virtual Bytes


Applies To

All versions


Related Information

None

KBID-2514

