This Knowledgebase article provides information about Neverfail Heartbeat Out of Disk behavior.
Active Server CommsMgr Logs
The queue on the active server stores intercepted data before it is sent across the channel to the passive server. Queue build-ups here indicate communication problems with the passive server, or insufficient bandwidth for the data being replicated. The queue stats are displayed in the Neverfail Heartbeat Management Client on the System -> Status & Control tab.
These updates are stored in memory or on disk in the default location (c:\neverfail\r2\log). The maximum size on disk is set by MaxDiskUsage, which defaults to 1 GB on v6.0 and earlier and 10 GB on v6.2 and later. Both the log location and MaxDiskUsage can be configured via the Configure Server wizard when Neverfail Heartbeat has been stopped.
The queue will be written out to disk if the active server is replicating and:
- The passive server was never connected.
- The channel suddenly disconnects and the configured number of heartbeats is very large.
In either case, the following could happen:
- The MaxDiskUsage will be reached; an NFChannelExceededMaxDiskUsageException alert will be logged.
"Exception in CommsMgr [L9] Exceeded the maximum disk usage(NFChannelExceededMaxDiskUsageException)"
- If the available space on the drive is less than the MaxDiskUsage, an NFChannelIOException will be logged.
"Exception in CommsMgr [M4] Cannot open log file 2004-09-07-203.log(NFChannelCannotOpenIOException) because there is not enough space on the disk (IOException)"
In both these situations, Neverfail Heartbeat will:
- Cease to log updates to the data.
- Discard all existing logs.
- Generate an NFChannelLostMessageEvent upon channel reconnection.
- Initiate a Full System Check to bring the system back in sync.
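The two failure cases above differ only in which limit the growing queue reaches first. The following is a minimal sketch of that comparison, not Neverfail code: the function name first_limit_hit, the RECOVERY_STEPS tuple, and the treatment of 1 GB as 1024^3 bytes are all assumptions made for illustration.

```python
GIB = 1024 ** 3  # assumption: the article's "GB" is treated as 2**30 bytes

# The recovery actions listed in the article, in the order given there.
RECOVERY_STEPS = (
    "cease logging updates",
    "discard existing logs",
    "generate NFChannelLostMessageEvent on channel reconnection",
    "initiate Full System Check",
)


def first_limit_hit(free_bytes, max_disk_usage):
    """Name the exception expected first if the on-disk queue keeps growing.

    free_bytes     -- free space on the drive holding the log directory
    max_disk_usage -- the configured MaxDiskUsage, in bytes
    """
    if free_bytes < max_disk_usage:
        # The drive fills before the configured cap is reached.
        return "NFChannelIOException"
    # The configured cap is reached while the drive still has room.
    return "NFChannelExceededMaxDiskUsageException"
```

For instance, with 4 GB free and the 10 GB default, first_limit_hit(4 * GIB, 10 * GIB) returns "NFChannelIOException". Either result leads to the same RECOVERY_STEPS.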
Passive Server CommsMgr Logs
When data is received on the passive server, it is stored in the passive server (safe) queue until Apply is ready to update the protected file. Depending upon system load, the queue is held either in memory or on disk. Under normal operating conditions, this queue should remain small. The passive server (safe) queue stats are displayed in the Management Client on the System -> Status & Control tab.
A build-up in the queue may indicate a problem applying updates to the protected files. Common causes are:
- Hardware / software problems with the disk subsystem.
- Under-specified equipment, for example disk drives on the passive server that are far slower than the active server disks.
- Applications running on the passive server blocking updates.
When this queue gets very large, the protected application will begin to slow down to avoid overloading the passive server with updates. If the configured limit is reached, Neverfail will raise an NFChannelExceededMaxDiskUsageException on the passive server. The Neverfail server should then be shut down, opting to leave the application running on the active server. The application's performance will return to normal.
The passive server's hardware should be investigated for problems.
- Start by checking the Windows Application and System logs for Errors and Warnings regarding impending hardware failure, or other problems.
- Device Manager may show problems with drivers or RAID controllers malfunctioning.
- Alternatively, run system diagnostic checks that are supplied with the hardware.
Passive Server Lacks Protected Disk Space
This occurs when the active server has more disk space than the passive server. Protected data cannot be written to the passive server, so updates will fail. Apply will raise a Disk Full or Quota Exceeded exception in the log saying that it cannot create files. The system will attempt to stop.
"Error","Disk Full Or Quota Exceeded","[N27]Failed to write information for the file: D:\protected\some file.txt to the disk. Either the disk is full or the quota (for the SYSTEM account) has been exceeded."
Interception Will Not Start
Note: This applies to Neverfail Heartbeat V4.4 and earlier only. If there is no available space on the C:\ drive for NFPagfil.sys to write to, then the following error will be written to the Event Log during startup:
Exception in Controller: [U11]Could not start replication. (ControlException)
because Error starting Interceptor (ControlException)
because DRIVER ERROR: Driver.allocBuffer failed: There is not enough space on the disk.
Neverfail Heartbeat will not be able to start up until there is free space on the C:\ drive.
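The error above is a chain of "because" clauses, with the disk-space cause at the end. As a rough illustration of how such an entry could be checked once its text has been exported, the sketch below splits the chain and tests the final cause. The helper names root_cause and is_out_of_disk are hypothetical, and the one-clause-per-line layout is assumed from the sample shown in this article.

```python
# Sample text copied from the article's Event Log example.
SAMPLE_EVENT = """\
Exception in Controller: [U11]Could not start replication. (ControlException)
because Error starting Interceptor (ControlException)
because DRIVER ERROR: Driver.allocBuffer failed: There is not enough space on the disk.
"""

PREFIX = "because "


def root_cause(event_text):
    """Return the last clause of a 'because' chain, without its prefix."""
    lines = [ln.strip() for ln in event_text.splitlines() if ln.strip()]
    if not lines:
        return ""
    last = lines[-1]
    return last[len(PREFIX):] if last.startswith(PREFIX) else last


def is_out_of_disk(event_text):
    """True when the final cause mentions the out-of-disk wording."""
    return "not enough space on the disk" in root_cause(event_text).lower()
```

Because the check is a plain substring match on the final clause, it also matches the single-line CommsMgr [M4] message quoted earlier, which contains the same wording.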
Employ the following to prevent Out of Disk Behavior:
- Ensure that the available disk space on the drive holding the Neverfail logs exceeds MaxDiskUsage. The default configuration requires 1 GB of disk space for v6.0 and earlier and 10 GB for v6.2 and later on the Neverfail installation drive.
- The passive server disk space should at least equal that of the active server.
Consider hosting the Neverfail log directory on its own disk, or one that does not host:
- Protected Application data.
- The Windows System folder.
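The first prevention step can be checked with a few lines of script. The sketch below uses Python's standard shutil.disk_usage to compare free space with a MaxDiskUsage value supplied by the caller. The function name free_space_exceeds is hypothetical, the script does not read Neverfail's configuration, and treating 1 GB as 1024^3 bytes is an assumption.

```python
import shutil

GIB = 1024 ** 3  # assumption: the article's "GB" is treated as 2**30 bytes


def free_space_exceeds(path, max_disk_usage_bytes, disk_usage=shutil.disk_usage):
    """Return True when free space on the drive holding `path` exceeds the limit.

    disk_usage is injectable so the check can be exercised without a real drive;
    by default it is shutil.disk_usage, which reports total, used and free bytes.
    """
    return disk_usage(path).free > max_disk_usage_bytes


# Example call on a Windows server, using the default log location from this
# article and the 10 GB default for v6.2 and later:
#   free_space_exceeds(r"c:\neverfail\r2\log", 10 * GIB)
```

The comparison is strict, matching the requirement that free space exceed MaxDiskUsage rather than merely equal it.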