SolarWinds - Resolve Two Active Servers

Summary

This Knowledge Base article provides information about the symptoms, causes, and resolutions of two active servers.


More Information

Two active servers is never an intended state and, when detected, should be resolved immediately. SolarWinds refers to the condition in which two identical active servers are live on the same network as split-brain syndrome.

Symptoms

Split-brain syndrome can be identified by the following symptoms:

  1. Both servers in the pair are running and in an active state. The Taskbar icons show P / A (Primary and active) and S / A (Secondary and active).
  2. An IP address conflict may be detected on the principal IP address of a server pair running SolarWinds Orion Failover Engine.
  3. A name conflict may be detected on a server pair running Orion Failover Engine. In a WAN environment, the Primary and Secondary servers typically connect to the network using different IP addresses, so no IP address conflict occurs. However, if both servers are running with the same name, a name conflict may result. This happens only if the servers are visible to each other across the WAN.
  4. Clients cannot connect to the server running Orion Failover Engine.

Cause

Two active servers (split-brain syndrome) can have several causes. The most common are:

  • Loss of the SolarWinds Channel connection (most common in a WAN environment).
  • The active server being too busy to respond to heartbeats.
  • Misconfiguration of the Orion Failover Engine software.

It is important to determine what caused the split-brain syndrome and resolve that underlying issue so that the condition does not recur.
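The first two causes can be illustrated with a generic heartbeat-timeout model. The sketch below is not Orion Failover Engine's actual logic. The `Server` class, the `passive_should_promote` function, and the 30-second timeout are hypothetical names and values chosen for illustration. It shows how a passive server that stops hearing heartbeats may promote itself while the original active server is still running.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active: bool
    last_heartbeat: float  # time (seconds) the peer's last heartbeat arrived


def passive_should_promote(server: Server, now: float, timeout: float) -> bool:
    """True when a passive server has heard nothing for longer than the timeout."""
    return (not server.active) and (now - server.last_heartbeat) > timeout


# Hypothetical scenario: the channel drops at t=10 but the primary keeps running.
primary = Server("primary", active=True, last_heartbeat=10.0)
secondary = Server("secondary", active=False, last_heartbeat=10.0)

# At t=45 the secondary has missed heartbeats for 35 s (assumed timeout: 30 s).
if passive_should_promote(secondary, now=45.0, timeout=30.0):
    secondary.active = True  # secondary assumes the primary has failed

# Both servers now consider themselves active: split-brain.
split_brain = primary.active and secondary.active
print(split_brain)
```

In this model, the same outcome follows whether the heartbeats were lost on the channel or simply never sent by an overloaded active server, which is why both causes produce the same symptom.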

Resolution

Once split-brain syndrome has occurred, the server with the most up-to-date data must be identified.

Note: If the wrong server is identified at this point, it can result in data loss. Care should be taken to reinstate the correct server.

The following can help identify the server with the most up-to-date data:

  1. Check the date and time of files on both servers. The most up-to-date server should be made the active server.
  2. From a client PC on a LAN, run 'nbtstat -A 192.168.1.1', replacing 192.168.1.1 with the principal IP address of your server. The output can help identify the MAC address of the server currently visible to client machines.
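Both checks above can be scripted. The following is a minimal sketch, not a SolarWinds tool. The function names `extract_mac` and `newest_file` are hypothetical, and the script assumes that the nbtstat output contains a line of the form "MAC Address = XX-XX-XX-XX-XX-XX". Run `newest_file` against the same data directory on each server and compare the results by hand.

```python
import os
import re
from typing import Optional, Tuple

# Assumed format of the relevant nbtstat line: "MAC Address = 00-1A-2B-3C-4D-5E"
MAC_PATTERN = re.compile(
    r"MAC Address\s*=\s*((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})"
)


def extract_mac(nbtstat_output: str) -> Optional[str]:
    """Return the MAC address found in nbtstat output, or None if absent."""
    match = MAC_PATTERN.search(nbtstat_output)
    return match.group(1).upper() if match else None


def newest_file(root: str) -> Optional[Tuple[float, str]]:
    """Return (modification time, path) of the newest file under root."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            if newest is None or mtime > newest[0]:
                newest = (mtime, path)
    return newest
```

To capture the nbtstat output, the command can be run with `subprocess.run(["nbtstat", "-A", ip], capture_output=True, text=True)` and its `stdout` passed to `extract_mac`. Comparing the returned MAC address with the network adapters of each server shows which one clients are currently reaching.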

Note: If both active servers have been servicing clients, perhaps at different WAN locations, only one of them can remain active. Both servers will contain recent data, which cannot be merged using Orion Failover Engine. One server must be made active and the other passive in order to restart replication. Once replication restarts, ALL data on the passive server is overwritten by the data on the active server. It may be possible to extract the up-to-date data manually from the passive server before restarting replication. Please consult the Microsoft Knowledge Base for information about tools that may be used for this purpose. For further information, please contact your SolarWinds Support Representative.

How to resolve two active servers (split-brain syndrome):

  1. Identify the server with the most up-to-date data or the server you would most like to make active.
  2. Shut down Orion Failover Engine on both servers (if it is running).
  3. On the server you would like to make passive, right-click the Taskbar icon, and select Server Configuration wizard.
  4. Click the Machine tab and set the server role to 'passive'. Note: Do not change the Identity of the server (Primary/Secondary).
  5. Click Finish to accept the changes. Reboot this server.
  6. Start Orion Failover Engine (if required) and check that the Taskbar icon now reflects the changes by showing P / - (Primary and Passive) or S / - (Secondary and Passive).
  7. On the active server, right-click the Taskbar icon and select Server Configuration wizard.
  8. Click the Machine tab and check that the server role is set to 'active'. Note: Do not change the Identity of the server (Primary/Secondary).
  9. Click Finish to accept the changes. Reboot this server. Note: As the server restarts, it will connect to the passive server and start replication. Once this happens, data on the passive server will be overwritten by the data on the active server. Please see above for further information on how to check which server contains the most up-to-date data.
  10. Start Orion Failover Engine (if required) and check that the Taskbar icon now reflects the changes by showing P / A (Primary and active) or S / A (Secondary and active).
  11. Log in to the SolarWinds Orion Failover Manager.
  12. Check that the servers have connected and replication has started.
  12. Check that the servers have connected and replication has started.
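The expected end state of the procedure can be expressed as a small check on the two Taskbar icon labels. This is a sketch for illustration only. `parse_label` and `pair_state` are hypothetical helpers, and the labels must be read from the Taskbar and typed in by hand, since no Orion Failover Engine API is assumed here.

```python
from typing import Tuple


def parse_label(label: str) -> Tuple[str, bool]:
    """Split a label such as 'P / A' into (identity, is_active)."""
    identity, role = (part.strip() for part in label.split("/"))
    if identity not in ("P", "S") or role not in ("A", "-"):
        raise ValueError(f"unrecognized label: {label!r}")
    return identity, role == "A"


def pair_state(label_one: str, label_two: str) -> str:
    """Classify a server pair from its two Taskbar icon labels."""
    id_one, active_one = parse_label(label_one)
    id_two, active_two = parse_label(label_two)
    if id_one == id_two:
        return "identity conflict"  # both Primary or both Secondary
    if active_one and active_two:
        return "split-brain"
    if not active_one and not active_two:
        return "no active server"
    return "ok"  # exactly one active, one passive


print(pair_state("P / A", "S / A"))  # state before the procedure
print(pair_state("P / A", "S / -"))  # state after the procedure
```

A result of "ok" corresponds to one active and one passive server with distinct identities, which is the condition the steps above are meant to restore.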


Applies To

All Versions


Related Information

SWREFID - 1935

KBID-1935
