Continuity Engine Troubleshooting - Channel Drops

This article discusses unexpected channel drops. Under normal operation, Neverfail Continuity Engine maintains continuous communications between servers using the Neverfail Channel. When communications between servers fail, the condition is referred to as a channel drop.

Channel Drops

Learning objectives 

At the completion of this session you should be able to:
  1. Identify the symptoms of a channel drop.
  2. Identify the causes of a channel drop.
  3. Recall how to correct a channel drop.

Overview

The Neverfail Channel can operate on a private network segment, separate from the public network, or on the same network segment. Continuity Engine isolates replication traffic from public traffic and allows for continuous communications between servers, which is essential for Continuity Engine to function properly.

Symptoms

If the Neverfail Channel loses communications between servers for longer than the configured Failover timeout, a passive server will attempt to assume the active role. This is by design, as Continuity Engine assumes that the current active server has failed.
However, if the active server is in fact still operational and only the Channel connection itself has failed, the passive server may believe that the active server is no longer in operation and wrongly attempt to assume the active role.
This condition, known as False Failover, can cause Split-Brain Syndrome, where a passive server becomes active in an attempt to provide services to users while the original active server is still functioning.
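
The failover decision described above can be illustrated with a small Python sketch. It is not Continuity Engine code: the class name PassiveView, the function classify, and the timeout values are hypothetical. They only show why a passive server, which sees nothing but silence on the channel, cannot tell a dropped channel from a failed active server.

```python
from dataclasses import dataclass


@dataclass
class PassiveView:
    """What a passive server can observe (hypothetical model, not product code)."""
    seconds_since_last_heartbeat: float
    failover_timeout: float  # stands in for the configured Failover timeout

    def will_attempt_failover(self) -> bool:
        # The passive server only knows how long the channel has been silent.
        return self.seconds_since_last_heartbeat > self.failover_timeout


def classify(view: PassiveView, active_still_running: bool) -> str:
    """Label the outcome, using knowledge the passive server does not have."""
    if not view.will_attempt_failover():
        return "no failover"
    if active_still_running:
        # Only the channel failed: two servers may now be active at once.
        return "false failover (split-brain risk)"
    return "genuine failover"
```

With an assumed 60-second timeout, classify(PassiveView(90.0, 60.0), True) describes the False Failover case: the silence exceeds the timeout, yet the active server is still running.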

Causes

Channel drops are generally the result of communications problems. These can be caused by:
  1. Faulty crossover cables, if the servers are connected directly, or a faulty switch, if one is used.
  2. A faulty network card.
  3. Improper configuration of Neverfail Continuity Engine.
  4. Firewalls that incorrectly identify the channel replication traffic as an attack and subsequently block traffic on ports used by Continuity Engine.
  5. The active server being too busy to respond to the Heartbeat message from a passive server. 

Resolutions

To correct the Channel drop, you should first identify the cause. Check each of the following for faults:
  1. Check that patch cables are not faulty. Use different patch cables to perform this test. 
  2. Check that there are no other applications using Neverfail Continuity Engine ports, and that firewalls have not been manually or automatically configured to block traffic on those ports. 
  3. Ensure that the latest network card drivers are installed on channel NICs.
  4. Make sure the channels are configured with the correct IP addressing. 
  5. Make sure that the channel IP addresses are set correctly within the Neverfail Configure Server Wizard.
  6. Confirm that routing is working correctly. Use the Windows diagnostic tools pathping and tracert to identify excessive packet loss or inefficient routing. These tools can also be used in LAN environments to confirm channel routes.
  7. In a WAN environment, check for sudden bursts in network traffic from other applications. If Quality of Service (QoS) is operating on the network, check that Neverfail Channel communications are not restricted. 
  8. If the channels connect across a VPN link in a WAN, the VPN may have dropped, timed out, or failed to reconnect automatically. Verify that a valid VPN connection exists.
More information about channel drops is located in the Neverfail Knowledge Base.
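
The port checks in step 2 can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a Neverfail tool: the function names are hypothetical, and the host and port you pass in must come from your own configuration, because this article does not list the ports that Continuity Engine uses.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def local_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing on this machine prevents binding host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
            return True
        except OSError:
            # Usually means another application holds the port; it can also
            # mean the account lacks permission to bind it.
            return False
```

Run port_reachable from one server against the channel IP address of the other to see whether a firewall or routing fault blocks the connection. Run local_port_free on a server while Continuity Engine is stopped to see whether another application already holds the port.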

Wrap Up

This article discussed details about unexpected channel drops. Remember these key points:
  1. The symptoms of channel drops:
    1. False Failovers.
    2. Split-Brain Syndrome. 
  2. The causes of channel drops:
    1. Faulty patch cables or switches.
    2. Faulty NICs.
    3. Improper configuration.
    4. Firewalls blocking channel traffic.
    5. Excessive loads on the active server. 
  3. How to troubleshoot and resolve a channel drop:
    1. Replacing patch cables and switches.
    2. Checking Neverfail port use.
    3. Ensuring NIC drivers are current.
    4. Verifying the IP addressing.
    5. Reviewing routing configuration.
    6. Ensuring adequate bandwidth.

Related Articles

    • Continuity Engine Troubleshooting - Synchronization Failures

      Neverfail Continuity Engine provides protection to your applications by replicating data to a passive server. Continuity Engine attempts to synchronize protected data on all servers and continually replicates changes to that data. This article ...
    • Continuity Engine Troubleshooting - MaxDiskUsage Errors

      This article introduces you to MaxDiskUsage errors when using Neverfail Continuity Engine. Continuity Engine generates MaxDiskUsage errors when either the send or receive queues are full.  MaxDiskUsage Errors Learning objectives  At the completion of ...
    • Continuity Engine Troubleshooting - Application Slowdowns

      This article discusses application slowdowns that you may encounter under routine operations. Neverfail Continuity Engine is designed to provide robust continuous application support and a slowdown of protected applications is considered an abnormal ...
    • Continuity Engine Troubleshooting - Invalid License

      This article introduces you to license-related problems that you may encounter during routine operations. Invalid License Learning objectives  At the completion of this session you should be able to: identify the symptoms of an invalid Neverfail ...
    • Neverfail Continuity Engine Cloning and Recloning limitations: disconnected Engine cluster

      Summary This Knowledgebase article provides details and workaround procedure for the following situation: after the cloning (initial Secondary or Tertiary deployment) or passive servers recloning the Neverfail Engine cluster is not connected via the ...