This Knowledgebase article provides information about the effects of rebooting a Neverfail Heartbeat active server.
Under normal operating conditions, Neverfail Heartbeat exits gracefully during a reboot, and the server restarts as the active server. This is because the OS requests the service to stop during the shutdown. The OS waits a configured time for applications and services to exit gracefully and, provided this timeout is not exceeded, the Neverfail service shuts down cleanly.
If shutting down Neverfail and the protected application takes longer than the configured timeout, the OS forces the Neverfail service to terminate, resulting in an unclean shutdown. Because it did not exit cleanly, the server reboots and returns as a passive server, leaving two passive servers. For information about how to resolve two passive servers, please see Knowledgebase article #984 - 'Resolve Two Passive Servers'.
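The rule described above can be sketched as a simple comparison. This is an illustrative sketch only: the function name and the sample figures are hypothetical. On Windows, the wait applied to services at shutdown is commonly governed by the WaitToKillServiceTimeout registry value (in milliseconds) under HKLM\SYSTEM\CurrentControlSet\Control; that this is the timeout in play on your server is an assumption you should verify.

```python
def role_after_reboot(stop_seconds, timeout_ms):
    """Predict the server role after a reboot.

    stop_seconds: measured time for Neverfail and the protected
                  application to stop (hypothetical input).
    timeout_ms:   the OS wait for graceful service exit, in milliseconds
                  (assumed to come from WaitToKillServiceTimeout).
    """
    if stop_seconds * 1000 <= timeout_ms:
        # Timeout not exceeded: clean exit, server returns as active.
        return "active"
    # Timeout exceeded: the OS terminates the service, unclean exit.
    return "passive"


if __name__ == "__main__":
    # Sample figures only; substitute your own measurements.
    print(role_after_reboot(stop_seconds=15, timeout_ms=20000))
    print(role_after_reboot(stop_seconds=300, timeout_ms=20000))
```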
Investigating the Cause
Check the NFLog for the exit sequence by looking for the "Beginning op CtrlOpExit" or "EXITING" entry. Below that entry you should see the applications stopping service by service, and any errors or delays in services stopping should be visible. Note that this is only available in Neverfail Heartbeat 4.1 or later; with earlier versions you must check the application logs themselves. The cause of these delays should be investigated separately, as they are outside the scope of Neverfail. For example, a likely candidate in Exchange is the SMTP service, which may take a very long time to stop when the queues are very large.
These problems often manifest themselves during a switch as an application timeout. Again, check the Windows Application logs, the Neverfail Heartbeat logs, or both. Record exactly how long the application takes to stop, in minutes. This should be the same with and without Neverfail running.
Another possible cause is a large amount of data in the active server (unsafe queue)/interceptor buffer. If a large amount of replication traffic was created shortly before the scheduled reboot, it may still be waiting to be written to the passive server across the channel, and this must complete before Neverfail can exit. Look for lines like:
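The log check described above can be partly automated. The following is a minimal sketch, assuming the log has already been read into a list of text lines; the two marker strings come from this article, while the function names and the "error"/"exception" keywords used to flag suspect lines are assumptions, not documented Neverfail behavior.

```python
# Marker strings taken from this article.
EXIT_MARKERS = ("Beginning op CtrlOpExit", "EXITING")
# Assumed keywords; adjust to match what your log actually contains.
SUSPECT_WORDS = ("error", "exception")


def exit_sequence(log_lines):
    """Return the lines from the first exit marker to the end of the log."""
    for index, line in enumerate(log_lines):
        if any(marker in line for marker in EXIT_MARKERS):
            return list(log_lines[index:])
    return []  # no exit marker found


def suspect_lines(lines):
    """Return lines containing any of the assumed suspect keywords."""
    return [
        line for line in lines
        if any(word in line.lower() for word in SUSPECT_WORDS)
    ]
```

If the log covers more than one shutdown, trim it to the period of interest first, since this sketch starts at the first marker it finds.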
DEFERRING STOP nnnn BUFFERS AWAITING RELEASE
Where nnnn is the number of buffers. A very large number indicates that a substantial amount of information is being held in the buffer. If maintenance and/or housekeeping activity is scheduled to start before the reboot, reschedule either the Exchange/SQL Server maintenance or the reboot so that the system stabilizes prior to reboot. Look for other repetitive messages from the interceptor or the comms manager that may indicate a system load that needs to be processed prior to system exit.
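To see how the buffer count changes during the exit, the DEFERRING STOP lines can be extracted with a regular expression. This is a sketch under the assumption that the message appears in the log exactly as quoted above, with nnnn as a decimal number; the function name is hypothetical.

```python
import re

# Pattern built from the message quoted in this article.
DEFER_RE = re.compile(r"DEFERRING STOP (\d+) BUFFERS AWAITING RELEASE")


def deferred_buffer_counts(log_lines):
    """Return the buffer count from each DEFERRING STOP line, in log order."""
    counts = []
    for line in log_lines:
        match = DEFER_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts
```

A sequence that starts high and falls slowly suggests the channel was still draining replication traffic when the reboot began.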
If Neverfail Heartbeat is experiencing problems, then errors/exceptions will be logged in NFLog.txt during the shutdown of the Neverfail Heartbeat server.
The final course of action is to schedule the Neverfail service to shut down prior to the reboot, to ensure that the shutdown completes successfully.
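One way to do this is a small script, run from the scheduler in place of a bare reboot, that stops the service first and reboots only if the stop succeeded. The sketch below uses the standard Windows commands net stop and shutdown; the function name and the 30 minute default are hypothetical, and the service name is deliberately left as a parameter because it is not given in this article.

```python
import subprocess


def stop_then_reboot(service_name, timeout_s=1800, run=subprocess.run):
    """Stop a Windows service, then reboot only if the stop succeeded.

    service_name: the Neverfail service name on your server (not given
                  in this article; look it up in the Services console).
    timeout_s:    hypothetical upper limit for the stop, in seconds.
                  subprocess.run raises TimeoutExpired if it is exceeded,
                  so no reboot is issued in that case.
    """
    # /y answers the prompt about stopping dependent services.
    stop = run(["net", "stop", service_name, "/y"], timeout=timeout_s)
    if stop.returncode != 0:
        return False  # stop failed; leave the server up for investigation
    run(["shutdown", "/r", "/t", "0"])  # restart immediately
    return True
```

The run parameter exists only so that the function can be exercised without actually stopping anything; in normal use it is left at its default.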