This Knowledgebase article provides information about what to do if a server crashes during a switchover.
In the unlikely event that one server in a Neverfail pair fails during a switchover, the surviving server will always end up active. Depending on where the switchover command originated, and on which server failed, the surviving server may show its system status as 'Replicating' or 'Not Replicating' in the System > Status & Control panel of the Neverfail Heartbeat Management Client.
The protected application will continue to run UNLESS the switchover was triggered by failure of the protected application AND the surviving server is the one where the application failure occurred. This is true for user-initiated switchovers and for switchovers driven by a failure of a protected application.
Note: In the description which follows, the server that failed or crashed is referred to as the 'failed' server, while the server which continued to operate is referred to as the 'surviving' server, even if the protected application is not operable on that server.
- BEFORE attempting to reconnect the server pair, determine the reason for the crash and take corrective action. Do not simply reboot the failed server and attempt to reconnect it. It is also not recommended to reconnect the failed server before the surviving server has completed its transition to the 'active' role.
- When the failed server is *initially* rebooted, Neverfail Heartbeat will be in one of two states:
  - The Neverfail Server R2 service is running
  - The Neverfail Server R2 service is not running
If the Neverfail Server R2 service is running, and you log in to Windows locally on the failed server, you will see a warning message:
'Cannot start replication because previous run did not shutdown properly. Check configuration'
Click OK to dismiss the warning message. This action will automatically stop the Neverfail Server R2 service.
If the Neverfail Server R2 service is running, and you log in to Windows remotely via Terminal Services (Remote Desktop in Windows 2003) on the failed server, you will not see the warning message, because it is only displayed locally. In this case, stop the Neverfail Server R2 service via the Windows Service Control Manager or with the 'net stop' command.
- Once you are sure the Neverfail Server R2 service is not running on the failed server, shut down Neverfail Heartbeat on the surviving server, leaving any protected applications running. This prevents the build-up of large on-disk update queues on the surviving server, while allowing continuity of service from the protected applications. To ensure any protected applications continue to run when Neverfail Heartbeat is shut down, connect a Neverfail Heartbeat Management Client to the server pair, and click Shutdown on the 'Status & Control' panel, choosing the option 'Exit Neverfail Heartbeat without stopping Application'.
- If the protected application is not running after the failure occurs (for example, because the original switchover was triggered by an application failure), the reason for the application failure must be determined. Before attempting to diagnose the problem, ensure that Neverfail Heartbeat is shut down on the surviving server and that the Neverfail Application Module Monitor service is stopped. It is safe to stop the Neverfail Application Module Monitor service manually for diagnostic purposes. Once the problem has been identified and corrected, continue with the recovery procedure.
- On the failed server, run the Configure Server wizard by right-clicking the Neverfail Heartbeat system tray icon and choosing 'Configure Server Wizard' from the menu. Run through the wizard and set the server role to 'passive'. Click Finish to save the changes.
- Start Neverfail Heartbeat on the passive server by right-clicking the Neverfail Heartbeat system tray icon and choosing 'Start Neverfail Heartbeat'. Repeat on the active server. The server pair will connect and synchronize data.
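
For the remote-login case described in the procedure above, where the warning dialog is not shown and the Neverfail Server R2 service has to be stopped by hand, the check-then-stop step could be scripted. The following Python sketch is only an illustration and is not a Neverfail-supplied tool. It assumes that the service appears under the display name 'Neverfail Server R2' in the output of the Windows 'net start' command (which lists running services), and that the script is run with administrative rights on the failed server. The function names is_running, stop_command and ensure_stopped are hypothetical.

```python
import subprocess

# Display name as used in this article; assumed to match what 'net start' lists.
SERVICE_NAME = "Neverfail Server R2"


def is_running(net_start_output, service_name=SERVICE_NAME):
    """Return True if service_name appears as a line in 'net start' output."""
    wanted = service_name.strip().lower()
    return any(
        line.strip().lower() == wanted
        for line in net_start_output.splitlines()
    )


def stop_command(service_name=SERVICE_NAME):
    """Build the 'net stop' command line for the given service."""
    return ["net", "stop", service_name]


def ensure_stopped(run=subprocess.run):
    """Stop the service if it is running.

    Returns True if a stop was issued, False if the service was already
    stopped. 'run' is injectable so the logic can be exercised off Windows.
    """
    listing = run(
        ["net", "start"], capture_output=True, text=True, check=True
    ).stdout
    if not is_running(listing):
        return False
    # check=True raises CalledProcessError if 'net stop' reports a failure.
    run(stop_command(), check=True)
    return True
```

Calling ensure_stopped() from an elevated command prompt on the failed server issues 'net stop' only when the service is listed as running. Confirm in the Windows Service Control Manager that the service is stopped before continuing with the remaining steps.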