This knowledgebase article provides additional troubleshooting information for cases where the VMware vCenter Server Heartbeat Channel experiences channel disconnects. Before applying the corrective actions described in this article, confirm that all troubleshooting steps in Knowledgebase article #992 (for Neverfail) / 1008551 (for VMware) have been completed.
- The protected pair experiences intermittent, seemingly random channel disconnects.
- During the channel disconnect, the NFServerR2 service stops listening on both the channel and management ports (defaults: 57348 / 52267) on the active server. This can be verified with the netstat -ano | find "<PID>" command.
- The VMware vCenter Server Heartbeat logs show gaps on the active server during channel disconnects.
- Service Control Manager reports 7011 errors in the system logs on the active server during the channel disconnect: "A timeout (30000 milliseconds) was reached while waiting for a transaction response from the NFServerR2 service."
- Disabling the monitoring rules is sufficient to recover from this condition.
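The port check described in the symptoms above can be scripted. The following is a minimal sketch, not a VMware or Neverfail tool: it assumes the output of netstat -ano has been saved as text and that the PID of the NFServerR2 service is already known. The function names, the sample text, and the PID 1234 are hypothetical and only illustrate the parsing.

```python
from typing import Iterable, Set

# Default channel and management ports named in this article.
DEFAULT_PORTS = (57348, 52267)


def listening_ports(netstat_output: str, pid: int) -> Set[int]:
    """Return the TCP ports on which the given PID is in the LISTENING state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five columns: Proto, Local Address, Foreign Address, State, PID.
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        if fields[4] != str(pid):
            continue
        # The port is the text after the last colon (also works for [::]:port).
        ports.add(int(fields[1].rsplit(":", 1)[1]))
    return ports


def missing_ports(netstat_output: str, pid: int,
                  expected: Iterable[int] = DEFAULT_PORTS) -> Set[int]:
    """Return the expected ports on which the PID is NOT listening."""
    return set(expected) - listening_ports(netstat_output, pid)


# Illustrative text only, not captured from a real server.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:57348          0.0.0.0:0              LISTENING       1234
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       800
  UDP    0.0.0.0:500            *:*                                    900
"""

if __name__ == "__main__":
    print(missing_ports(SAMPLE, 1234))  # {52267}
```

An empty result means the service is listening on both ports. If both ports are reported missing on the active server during a disconnect, that matches the symptom described above.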
This issue is caused by a deadlock that Microsoft introduced in the Advapi32.dll file in a previous system change. Advapi32.dll is loaded indirectly into the VMware vCenter Server Heartbeat address space the first time Heartbeat attempts to load the performance counters (PerfLib) library, which is implemented in Advapi32.dll. The following are the affected versions of the Advapi32.dll file:
Applies to Windows Server 2008 R2 Service Pack 1.
If the version of the Advapi32.dll present on the protected server is one of the versions listed above, see Microsoft KB article http://support.microsoft.com/kb/2878378 and apply the recommended hotfix to correct the issue.
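The version comparison can be sketched as follows. This is a minimal illustration, assuming the installed file version has already been read (for example, from the Details tab of the file's Properties dialog) and that the affected versions are supplied by the reader from the list in this article. The function names are hypothetical, and no affected version numbers are hard-coded here.

```python
from typing import Iterable, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Convert a dotted file version string into a tuple of integers."""
    # Keep only the dotted numeric part and drop any trailing text,
    # such as a build-branch suffix in parentheses.
    numeric = text.strip().split()[0]
    return tuple(int(part) for part in numeric.split("."))


def is_affected(installed: str, affected: Iterable[str]) -> bool:
    """Return True if the installed version matches one of the affected versions."""
    installed_version = parse_version(installed)
    return any(installed_version == parse_version(v) for v in affected)
```

Comparing parsed tuples rather than raw strings avoids false mismatches caused by surrounding whitespace or a trailing build suffix in the reported version.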
vCenter Server Heartbeat – multi ID installs.