Neverfail Heartbeat Cascade Operational Guidelines

Summary

This Knowledgebase article provides operational guidelines for deploying Neverfail Heartbeat Cascade.


More Information

To ensure the successful operation of Neverfail Cascade, please consider the following factors before installing and running Cascade on a live server:

  • Cascade Bandwidth - This is the bandwidth between the Secondary and Tertiary sites, and it determines the Cascade Queue Drain Rate. If the bandwidth is 10Mbps then you can expect a drain rate of approximately 5Mbps. The higher the drain rate, the slower the buildup of application data in the Cascade queue.
  • Cascade Queue Growth Rate - This is the rate at which data is sent across the Neverfail Channel between the Neverfail Heartbeat server pair.

The following factors influence the load on the channel, and therefore Cascade:

  • Application load - As the application load increases, the Neverfail Channel bandwidth usage (Mbps) will increase (refer to the statistics in the Neverfail Heartbeat Management Client).
  • Overhead of additional sync/verify traffic resulting from Full System Checks, etc.
  • Max Cascade Queue Size (Secondary server) - Data is written here and then copied to the Tertiary server. If the Cascade Queue Growth rate exceeds the Cascade Queue Drain Rate (Mbps) then this queue will grow over time.

Note: Sustained periods of application load or Full System Checks will result in large passive server (unsafe) queues on the Secondary server. The Cascade queue represents the amount of application data that could be lost during a Primary site failure, so the larger the queue, the greater the potential data loss.
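
The way the queue grows and shrinks as the load changes can be sketched in a few lines of Python. This is only an illustration and not part of Neverfail Cascade: the function name simulate_queue and the hourly load profile are hypothetical, and the sketch assumes the queue changes by (growth rate - drain rate) / 8 MB per second and can never fall below zero.

```python
def simulate_queue(load_profile_mbps, drain_rate_mbps, step_s=3600):
    """Return the Cascade queue size (MB) after each step.

    load_profile_mbps: channel load (queue growth rate) for each step, in Mbps.
    drain_rate_mbps:   Cascade Queue Drain Rate, in Mbps.
    step_s:            length of each step in seconds (default: one hour).
    """
    queue_mb = 0.0
    history = []
    for load in load_profile_mbps:
        # Net change in megabits, converted to megabytes (8 bits per byte).
        delta_mb = (load - drain_rate_mbps) / 8 * step_s
        # The queue cannot hold a negative amount of data.
        queue_mb = max(0.0, queue_mb + delta_mb)
        history.append(queue_mb)
    return history


# Hypothetical profile: 4 hours at 10 Mbps, then 10 hours at 3 Mbps,
# with a 5 Mbps drain rate.
profile = [10] * 4 + [3] * 10
sizes = simulate_queue(profile, drain_rate_mbps=5)
print(sizes[3])   # queue size (MB) at the end of the 4-hour peak
print(sizes[-1])  # queue size (MB) after 10 further hours at 3 Mbps
```

With these assumed figures the queue peaks at 9000 MB and is empty again 10 hours after the load drops, which matches the worked examples in the Guidelines section.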

Guidelines

  1. Average Application load < Cascade bandwidth
    1. Cascade Queue

      Note that the physical disk space required does not equal the size of the queue entered in the Cascade GUI. The queue size reported in the Cascade GUI on the Secondary server represents the amount of application data in the queue. If compression is enabled, the size of the queue on disk will be considerably smaller. Compression rates vary depending upon the application; however, a factor of 4 to 5 can be expected with SQL Server and Exchange. The Cascade Tertiary Log records the squish factor (compression ratio).
    2. Full System Check (FSC)

      If FSCs need to be performed while the application is running, the size of the required Cascade queue needs to be estimated to ensure that the Full System Check completes successfully. Failure to do so will result in Cascade queue overflows. Clearing this event requires Neverfail Heartbeat replication to be stopped and restarted, which causes the Full System Check to be performed again.

      Cq = Cascade Q Size (MB)
      Ds = Size of protected data set on the Primary (MB)
      T = Time to empty Cq at a load of X Mbps (s).
      Refer to Guideline 4 to estimate this time if the load is not 0Mbps.
      Cl = Channel Load (Mbps) from the application, Full System Checks, etc.
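  2. Cq = (Cl * T) / 8 + Ds
     (Cl is in Mbps and T in seconds, so dividing by 8 converts megabits to megabytes, as in Guidelines 3 and 4.)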
    1. Application Load

      If the application load is known to peak above the average channel data replication rate at predictable times of day, it is advisable to estimate the maximum queue size required to accommodate this load. When calculating the required queue size, use the worst-case scenario.

      CqDR = Cascade Queue Drain Rate (Mbps)
      CqGR = Cascade Queue Growth Rate (Mbps)
      RCqS = Required Cascade Queue Size (MB)
      DAs = Duration of Application spike (s)
  3. RCqS = (CqGR - CqDR)/8 * DAs
    1. Example: If the application causes replicated traffic to spike for 4 hrs at 10Mbps and the Cascade Queue Drain Rate is 5Mbps, then the required Cascade queue size would be:

      RCqS = (10 - 5)/8 * (4 * 60 * 60)
      = 9000 MB

      Once the peak load has finished, the effective drain rate (Cascade Queue Drain Rate minus Cascade Queue Growth Rate) determines how long it will take to deplete the queue. The queue drains fastest if the load drops to zero. If the Cascade Queue Growth Rate equals the Cascade Queue Drain Rate, the effective drain rate is zero and the queue will not shrink.

      CqDR = Cascade Queue Drain Rate (Mbps)
      CqGR = Cascade Queue Growth Rate (Mbps)
      CqS = Cascade Queue Size (MB)
      TECq = Time to Empty Cascade Queue (s)
  4. TECq = CqS * 8 / (CqDR - CqGR)
    1. Example: If a queue built up to 9000MB and then the application load dropped to 3Mbps and the Cascade Queue Drain rate was 5Mbps:

      TECq = 9000 * 8 / (5 - 3)
      = 36000 s

      It will take approximately 10hrs to drain the queue at this load.

      If, however, the application load dropped to zero because users stopped using the system:

      TECq = 9000 * 8 / (5 - 0)
      = 14400 s

      It would take about 4 hrs for the queue to empty.
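
The two calculations in Guidelines 3 and 4 can be reproduced with a short Python sketch. It is offered only as a convenience for trying other figures: the function names are hypothetical, and the handling of a zero or negative effective drain rate (returning infinity) is an assumption based on the statement that the queue does not drain when the growth rate equals the drain rate.

```python
def required_queue_size_mb(growth_mbps, drain_mbps, spike_s):
    """Guideline 3: RCqS = (CqGR - CqDR) / 8 * DAs, in MB."""
    # If the drain rate keeps up with the load, no queue builds up.
    return max(0.0, (growth_mbps - drain_mbps) / 8 * spike_s)


def time_to_empty_s(queue_mb, drain_mbps, growth_mbps):
    """Guideline 4: TECq = CqS * 8 / (CqDR - CqGR), in seconds."""
    net_drain = drain_mbps - growth_mbps
    if net_drain <= 0:
        # Effective drain rate is zero or negative: the queue never empties.
        return float("inf")
    return queue_mb * 8 / net_drain


# Figures from the examples above: a 4-hour spike at 10 Mbps with a
# 5 Mbps drain rate, followed by a load of 3 Mbps or 0 Mbps.
peak_mb = required_queue_size_mb(10, 5, 4 * 60 * 60)
print(peak_mb)                                # 9000.0 (MB)
print(time_to_empty_s(peak_mb, 5, 3) / 3600)  # 10.0 (hours)
print(time_to_empty_s(peak_mb, 5, 0) / 3600)  # 4.0 (hours)
```

Using the worst-case growth rate and spike duration as inputs follows the advice given under Application Load.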

Preventing 'Out-of-Disk Space' Problems

Please carefully consider the location of the Cascade queue folder, the Neverfail Send/Rcv queues, and the maximum sizes of these queues.

Neverfail Heartbeat and Cascade

  1. Installed on the same disk:

    Ensure that there is adequate space available to accommodate Neverfail's MaxDiskUsage [default = 1GB] and the Cascade queue [default = 2GB]. During operation, the Cascade queue is likely to consume significant amounts of disk space in the scenarios detailed in this document. It is essential to ensure that the 1GB [default] Max Disk size is available to Neverfail Heartbeat at all times. Failure to observe this will result in Neverfail Heartbeat replication stopping, which will in turn cause Cascade operations to stop once the Cascade queue has emptied.
  2. Queues installed on Application Data disks:

    If the Cascade queues are on a different disk from the Neverfail Send\Rcv queues, ensure that there are no other log or application data directories on that disk that can grow unchecked and cause the system to run out of disk space. For example, Exchange log files grow according to application load and are only deleted during backups.
  3. Installed on the OS Partition:

    Always ensure that there is adequate space for the operating system to perform operations. If Cascade or Neverfail disk requirements exceed the available disk space on the system partition, the OS will fail to operate correctly.
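
For the case where both queues share one disk, a quick free-space check can be sketched with Python's standard library. This is an illustrative sketch only and not a Neverfail tool: the function name has_room_for_queues is hypothetical, the path "." merely stands in for the real queue folder, and the sketch assumes 1GB means 1024^3 bytes. It reports free space at the moment it runs, so it does not replace ongoing monitoring.

```python
import shutil

GB = 1024 ** 3  # bytes per gigabyte (assumed binary definition)


def has_room_for_queues(path, max_disk_usage_gb=1, cascade_queue_gb=2):
    """Return True if the disk holding `path` currently has enough free
    space for both queues at their configured maximum sizes.

    The defaults mirror the documented defaults: MaxDiskUsage = 1GB and
    Cascade queue = 2GB.
    """
    required_bytes = (max_disk_usage_gb + cascade_queue_gb) * GB
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


# "." is a placeholder; substitute the folder that holds the queues.
print(has_room_for_queues("."))
```

If the Cascade queue size has been raised to cover a Full System Check or an application spike, pass that larger figure as cascade_queue_gb.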


Applies To

Neverfail Heartbeat V5.0.2 and prior


Related Information

None

KBID-503
