This Knowledgebase article provides information about the purpose and operation of the UPS Auto Switchover Script.
A shutdown command that Windows receives from the UPS management software is usually passed to the Neverfail R2 Service as a graceful user shutdown. If this shutdown command is received locally on the active server, the Neverfail service stops without switching roles between the two servers. In pre-6.x versions of Neverfail Heartbeat, it also shuts down Heartbeat on the remote server.
If the active server is powered through a UPS and the UPS management software allows a script to be executed in case of power failure, Neverfail Heartbeat can be instructed to switch over to the other server before shutting down.
This solution checks whether the current server is the active server and, if so, attempts an auto-switchover to the remote server in a Pair deployment.
In a Trio deployment where the Primary and Secondary servers are on the same LAN and the Tertiary server is remote, the script skips the check of whether the local server is active and attempts to make the Tertiary server active. For more information about using this script in a Trio configuration, see the Trio Deployment section below.
The difference between a switchover and an auto-switchover is that, after an auto-switchover, the newly active server stops replication once it finishes its startup procedure. This ensures that the server roles do not change again after the script has executed successfully. The newly active server continues servicing clients, and the UPS management software gracefully shuts down the previously active server. Beginning with Neverfail Heartbeat version 6.x, the previously active server also stops the heartbeat service automatically during the auto-switchover, and auto-switchover is only possible between the Primary and Secondary servers.
Once power is restored, user intervention is required to resynchronize the server cluster. If the UPS script completed successfully, the previously active server comes up as passive. If Heartbeat was shut down on the remote server (the default in pre-6.x versions), it must be restarted manually.
The UPS Auto Switchover Script
The script is located in the Neverfail Extranet -> Utilities -> Products/Downloads section. See Knowledgebase article #1005 - Versions of the UPS Auto Switchover Script for details of the available versions of the script.
The script relies on the error level returned by each command it issues. It first asks the Neverfail engine whether the local server is active. If the server responds that it is not active, the script exits with exit code “1”; if it responds that it is active, execution continues.
Next, the script issues an auto-switchover command locally on the server. One advantage of the auto-switchover command is that it triggers an alert highlighting the situation; if email alerts are configured, the auto-switchover message is also sent by email.
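The first two steps can be sketched as follows. This is a minimal Python illustration of the decision logic, not the batch script itself: `ups_auto_switchover`, `query_is_active` and `issue_auto_switchover` are hypothetical names, and the sketch assumes that an error level of 0 from the active query means the local server is active. It covers a Pair deployment only and leaves out the timeout described later.

```python
from typing import Callable


def ups_auto_switchover(
    query_is_active: Callable[[], int],
    issue_auto_switchover: Callable[[], int],
) -> int:
    """Return the overall exit code for a Pair deployment: 0 = success, 1 = failure.

    Both callables are hypothetical stand-ins for the commands the batch
    script sends to the Neverfail engine; each returns an error level.
    Assumption: error level 0 from the active query means "this server is active".
    """
    # Step 1: ask the engine whether the local server is active.
    if query_is_active() != 0:
        return 1  # not active: exit without issuing anything
    # Step 2: issue the auto-switchover locally. A vetoed or failed
    # command returns a non-zero error level, so the script fails.
    return 0 if issue_auto_switchover() == 0 else 1
```

Note that a passive server never reaches the second step, which is why running the script there is harmless.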
The script will send a message, depending on the version of Heartbeat installed:
Pre-6.x versions:
Power Failure - UPS initiated auto-switchover. The %remote server% server will become active and replication will be stopped.
%remote server% is replaced with the name of the server that should become active following the auto-switchover.
For example, if the Primary server is currently active and it loses power, the message will state: Power Failure - UPS initiated auto-switchover. The Secondary server will become active and replication will be stopped.
Version 6.x and later:
Auto Switchover Started. Power Failure - UPS initiated auto-switchover resulting in %remote server% being made active and heartbeat service shutting down on %local server%
%remote server% and %local server% indicate the newly active and the previously active server, respectively.
For example, if the Primary server was active, the message will state: Auto Switchover Started. Power Failure - UPS initiated auto-switchover resulting in SECONDARY being made active and heartbeat service shutting down on PRIMARY.
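To illustrate how the placeholders are filled in, the following hypothetical Python helper (`ups_alert_message` is not part of the script) reproduces the two message templates quoted above for a given major version of Heartbeat.

```python
def ups_alert_message(major_version: int, local: str, remote: str) -> str:
    """Fill in the alert templates quoted in this article (illustrative helper only)."""
    if major_version < 6:
        # Pre-6.x template: only the server that becomes active is named.
        return ("Power Failure - UPS initiated auto-switchover. "
                f"The {remote} server will become active and replication will be stopped.")
    # 6.x and later: both the newly active and the previously active server are named.
    return ("Auto Switchover Started. Power Failure - UPS initiated auto-switchover "
            f"resulting in {remote} being made active and heartbeat service "
            f"shutting down on {local}")
```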
Note: For Trio Deployments, the only message sent is that the Tertiary server was made active. There will be no reference to UPS auto-switchover.
If the auto-switchover command is possible and succeeds, it returns an error level of “0” and the script has an overall exit code of “0” (success). This means that the newly active server is operational, the local server is passive, replication is stopped, the local server can be shut down safely, and clients continue to be serviced by the remote server.
If the auto-switchover or MakeActive command is not possible, the File State Manager will “veto” it. The message Power Failure - UPS initiated auto-switchover… is still shown, but the auto-switchover is not completed because the File State Manager cancels it. The cancellation is shown as an error in the Server Overview and on the Logs page of the Neverfail Heartbeat Management Client as Switchover vetoed by the File State Manager… In this case, the script exits with exit code “1”, meaning that it was unsuccessful.
Note: For auto-switchovers triggered from an active Secondary server in versions 6.x and later, the script does not return exit code “0”, even if it was successful.
The following provides the information necessary to configure the Auto Switchover Script.
For a switchover to be possible, the servers must be “Connected” and the Neverfail cluster must be synchronized. If these conditions are not met at the moment Neverfail receives the command, the auto-switchover is canceled. In particular, if the status of the pair is Out of Synch, Unchecked, Unchecked and Busy Processing, or Synchronized and Busy Processing, or if the servers are not connected, the auto-switchover command fails and the script returns exit code “1”.
The script applies a timeout within which the command must succeed. The timeout should be long enough for all protected services to stop on the formerly active server and start on the newly active server, so it depends on the implementation. Note that this timeout determines the overall success of the script: if a successful auto-switchover event has not been received within the timeout period (either because the switchover was not possible or because the time was too short for the services to stop and start on the other server), the script fails with exit code “1”.
Before implementing the script, open the script file for editing and set the default timeout (600000 milliseconds, i.e. 10 minutes) to a value greater than the time needed for a complete switchover in your implementation. Choose this value with the UPS battery runtime of your hardware in mind as well. The ideal value equals the sum of the start and stop script timeouts for all protected applications.
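The preconditions above can be expressed as a simple check. This is a hypothetical sketch: Neverfail Heartbeat makes the real decision when it receives the command, and passing the pair status around as a plain string is an assumption made only for illustration.

```python
# Pair statuses that, according to this article, cause the auto-switchover to fail.
BLOCKING_STATUSES = frozenset({
    "Out of Synch",
    "Unchecked",
    "Unchecked and Busy Processing",
    "Synchronized and Busy Processing",
})


def auto_switchover_possible(pair_status: str, connected: bool) -> bool:
    """Hypothetical pre-check; Neverfail Heartbeat makes the real decision.

    The status strings mirror the names used in this article; treating them
    as plain strings is an assumption for illustration only.
    """
    # The servers must be connected and the pair must be plainly "Synchronized",
    # without any "Busy Processing" qualifier.
    return connected and pair_status == "Synchronized"


# Every status listed in this article is rejected, even when the servers are connected.
assert not any(auto_switchover_possible(s, True) for s in BLOCKING_STATUSES)
```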
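The timeout behaviour can be sketched as a polling loop. This assumes a hypothetical `switchover_succeeded` callable that reports whether the successful auto-switchover event has been received; the actual script's waiting mechanism may differ. The point it illustrates is that a success arriving after the deadline still yields exit code “1”.

```python
import time
from typing import Callable

DEFAULT_TIMEOUT_MS = 600_000  # the script's default: 10 minutes


def wait_for_auto_switchover(
    switchover_succeeded: Callable[[], bool],
    timeout_ms: int = DEFAULT_TIMEOUT_MS,
    poll_interval_s: float = 1.0,
) -> int:
    """Return 0 if success is observed before the timeout expires, else 1.

    switchover_succeeded is a hypothetical check for the success event.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while time.monotonic() < deadline:
        if switchover_succeeded():
            return 0
        time.sleep(poll_interval_s)
    # Timed out: the script reports failure even if the switchover
    # completes afterwards.
    return 1
```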
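A rough way to size the timeout is sketched below. The application names and timeout values are hypothetical, and the shutdown margin is an assumption added to leave battery time for the UPS management software to shut the server down after the script finishes.

```python
from typing import Dict, Tuple


def recommended_timeout_ms(script_timeouts_s: Dict[str, Tuple[int, int]]) -> int:
    """Sum the (start, stop) script timeouts, in seconds, of all protected
    applications and convert the total to milliseconds."""
    total_s = sum(start + stop for start, stop in script_timeouts_s.values())
    return total_s * 1000


def fits_on_battery(timeout_ms: int, battery_runtime_s: int, shutdown_margin_s: int) -> bool:
    """Assumption: the switchover wait plus a margin for the final OS shutdown
    must fit inside the UPS battery runtime."""
    return timeout_ms / 1000 + shutdown_margin_s <= battery_runtime_s


# Hypothetical protected applications: name -> (start timeout, stop timeout) in seconds.
apps = {"AppA": (300, 180), "AppB": (120, 120)}
timeout_ms = recommended_timeout_ms(apps)  # 720 s in total, i.e. 720000 ms
```

In this example the computed 720000 ms exceeds the 600000 ms default, so the default would need to be raised; with a hypothetical 900-second battery runtime and a 120-second shutdown margin, it still fits.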
To configure the script, follow the steps below:
- Download and extract the contents of the .zip file to a local destination on the server.
- From the folder matching your installed version, copy the script batch file into the <Neverfail install dir>\R2\Bin folder.
- Edit the script's default timeout to a value that allows a complete switchover in your scenario (the default is 600000 ms).
- In the UPS management software select the correct path to the script and set the Working Dir to point to the <Neverfail install dir>\R2\Bin folder.
- Configure the script to run under the User Account Neverfail Heartbeat was installed with.
For Neverfail Heartbeat version 6.x Pair deployments only:
Note: Do not alter any other variables on any other versions.
- Edit the Identity variable within the script to match the current server's identity: if the script is installed on the Primary server, leave the value as Primary; otherwise, change it to Secondary.
If the UPS management software offers a *monitoring function* for the success of the script, you may use it to re-execute the script in case of failure (exit code “1”).
- If the server is passive, the script returns exit code “1” with no harm done.
- If the status of the server pair has changed to “Synchronized” in the meantime, the auto-switchover is now possible and will be issued.
- If the auto-switchover is still not possible, the script returns exit code “1” with no harm done.
- If the MakeActive for Tertiary was not possible, the script will also return exit code “0”.
- If the auto-switchover is not possible for any of the reasons listed above, the active server remains active and *will* be shut down. This script is intended to work only if the server is active and the server pair is “Synchronized”.
- If the timeout period is configured too short, and the wait for the auto-switchover command to return successfully expires, the overall status of the script will be failed and the exit code will be “1” even if the auto-switchover was possible and succeeded afterwards.
- Administrator intervention is required to resynchronize the server pair. The previously active server is shut down and the newly active server has Neverfail Heartbeat stopped.
- This script is recommended for WAN implementations, where a local UPS powers the currently active server.
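If the UPS management software can re-run the script, the retry behaviour described above amounts to a bounded loop like the following. `run_with_retries` and `run_script` are hypothetical; a real setup would rely on the UPS software's own monitoring function. Because re-running the script on a passive server simply returns “1”, retrying is harmless, but every attempt consumes battery time, so the number of attempts and the delay should fit within the UPS runtime. In the cases where the script does not report exit code “0” even after a successful switchover (see the note on the active Secondary in version 6.x and later), such a loop would keep retrying until its attempt limit is reached.

```python
import time
from typing import Callable


def run_with_retries(
    run_script: Callable[[], int],
    max_attempts: int = 3,
    delay_s: float = 30.0,
) -> int:
    """Re-run a (hypothetical) script callable while it returns exit code 1.

    Returns the exit code of the last attempt.
    """
    exit_code = 1
    for attempt in range(1, max_attempts + 1):
        exit_code = run_script()
        if exit_code == 0:
            break  # auto-switchover succeeded
        if attempt < max_attempts:
            # Wait before retrying, e.g. to give the pair time to reach "Synchronized".
            time.sleep(delay_s)
    return exit_code
```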