Synchronization failure due to hung Lotus Domino processes


Summary

This knowledgebase article describes a known issue with Lotus Domino servers in which synchronization of the pair can fail because of a hung Lotus Domino process.


More Information

Symptom

The Lotus Domino self-diagnostic tool (NSD) might not stop in a timely fashion, causing the File System Status to be reported as out of sync.

Cause

Lotus Domino systems have a self-diagnostic tool called NSD, which is launched automatically when the Domino server stops responding. In some cases, the process that coordinates diagnosis of the system state prior to the failure hangs and keeps some Domino-specific files locked, preventing pair synchronization.

Resolution

After the Lotus Domino server stops responding, NSD.EXE starts collecting information about the last state of the server before the incident that caused the malfunction. Sometimes NSD hangs and locks a few Java files and Domino-specific files, which prevents synchronization of the pair from completing. A switch-back is possible only after the Domino-specific processes have been stopped manually, releasing the locked files.

NSD gives you current information about the state of the server (call stacks for all threads, memory information, and so on). When a problem hinders Lotus Domino functionality, the Domino server automatically generates an NSD log file and stores it in the data\IBM_TECHNICAL_SUPPORT directory. Each NSD log has a file name containing a time stamp that shows when the NSD was generated. For example, Nsd_W32I_KIRANTP_2006_01_17@17_17_18.log indicates that this NSD was created on January 17, 2006, at 17:17:18. When NSD runs, it attaches to each process and thread to dump their call stacks. This can help you determine the cause of a server or workstation malfunction.
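To find the most recent NSD log, you can list the IBM_TECHNICAL_SUPPORT directory sorted by date. The following is a minimal sketch to save as a .bat file. It assumes that the Domino data path is stored in the DataPath registry value under HKEY_LOCAL_MACHINE\Software\lotus\domino (the same value the stop script in this article reads) and that NSD log names begin with "nsd_", as in the example above. Adjust both assumptions if your installation differs.

```bat
@ECHO OFF
REM Sketch: list NSD logs in the Domino data directory, newest first.
REM Assumption: the data path is in the DataPath value under
REM HKEY_LOCAL_MACHINE\Software\lotus\domino (the key the stop script uses).
REM With default delimiters (space, tab), tokens=2,* puts the value type in %%A
REM and the rest of the line, i.e. the full path including any spaces, in %%B.
FOR /F "tokens=2,*" %%A IN ('REG QUERY "HKEY_LOCAL_MACHINE\Software\lotus\domino" /V DataPath ^| FIND /I "DataPath"') DO SET "data=%%B"
REM Assumption: NSD log names match nsd_*.log (DIR matching is case-insensitive).
REM /B prints bare file names, /O-D sorts by date with the newest first.
DIR /B /O-D "%data%\IBM_TECHNICAL_SUPPORT\nsd_*.log"
```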

The "heart" of an NSD file is the stack trace section. This section breaks down the code path that each thread in a running process traversed to reach its current state, which makes it very helpful for examining hangs and failures on a server. By examining the NSD file, you can also find any core files generated in the Domino data directory, along with a base-level analysis that traces the final stack of calls made by the process that died and left the core behind. Note that in a complex product such as Domino, a stack trace of the same type of action on two different servers can produce different results.

In the NSD file, you can identify the executable in the failing process by searching for the words "fatal," "panic," or "segmentation." Once you have found the process, you can see what preceded it and, ideally, determine how the failure occurred. When neither "panic" nor "fatal" is found, a core dump will sometimes contain a reference to a "segmentation fault" in a function. This indicates that the process tried to access a shared memory segment that had been corrupted for some reason, and failed without calling "fatal_error" or "panic".
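The keyword search described above can be run from a command prompt with FINDSTR instead of a text editor. This is a minimal sketch to save as a .bat file (the name findnsd.bat used below is hypothetical). It prints every line containing any of the three keywords, ignoring case, with its line number so that you can jump to the surrounding stack trace. For example: findnsd.bat "C:\Lotus\Domino\Data\IBM_TECHNICAL_SUPPORT\Nsd_W32I_KIRANTP_2006_01_17@17_17_18.log", where the directory shown is only an assumed example location.

```bat
@ECHO OFF
REM Sketch: search an NSD log for the keywords "fatal", "panic" and "segmentation".
REM The log file path is passed as the first argument.
IF "%~1"=="" (
    ECHO Usage: %~nx0 path\to\nsd_log.log
    EXIT /B 1
)
REM FINDSTR treats space-separated strings as alternatives (any one may match).
REM /I ignores case; /N prefixes each matching line with its line number.
FINDSTR /I /N "fatal panic segmentation" "%~1"
```

Because "fatal" also matches "fatal_error", a single search covers both spellings mentioned in the article.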

For more information about troubleshooting Lotus Domino hangs and crashes, see:
http://www-128.ibm.com/developerworks/lotus/library/domino-server-crashes/

For All Versions

For systems where these symptoms occur only once or infrequently, manually terminate the hung Domino processes (such as nsd.exe, nservice.exe, and nserver.exe) using Task Manager. This releases the locked files and allows them to be synchronized.
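Before terminating anything in Task Manager, you can check from a command prompt which of the Domino processes are still running. This is a minimal sketch to save as a .bat file. It assumes the three process names used in the stop script in this article (nsd.exe, nservice.exe, nserver.exe). It only reports their status and does not stop anything.

```bat
@ECHO OFF
REM Sketch: report whether each Domino process named in this article is running.
REM TASKLIST /FI filters by image name; /NH suppresses the column header.
REM When nothing matches, TASKLIST prints an INFO message without the image name,
REM so FIND fails and the "is not running" branch is taken.
FOR %%P IN (nsd.exe nservice.exe nserver.exe) DO (
    TASKLIST /NH /FI "IMAGENAME eq %%P" | FIND /I "%%P" >NUL && ECHO %%P is running || ECHO %%P is not running
)
```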

    Note: Implement the following procedure only if this event occurs repeatedly or frequently.

    Note: This workaround is valid only for servers running Windows Server 2003. For servers running Windows 2000, contact Neverfail Support for details.

For V5.2.2 and Earlier

  1. To avoid synchronization problems caused by locked files, add the following lines at the end of the Lotus Domino stop script, located in the Neverfail installation folder under R2\Scripts\LotusDominoServer\. These commands ensure that the Domino processes are cleaned up and that synchronization of the pair completes successfully. The commands are designed to work with Windows Server 2003 and Lotus Domino versions later than 6.0.3. NOTE: The following lines must be added below the last line of the existing stop script; otherwise, a clean Lotus Domino shutdown might be prevented.

Short description of the commands:

  • taskkill.exe forcefully stops the specified process and all of its child processes; output is hidden from the console
  • the first FOR reads the Domino installation path from the registry
  • the second FOR reads the Lotus Domino data folder from the registry
  • the third FOR gets the drive letter of the Domino data folder
  • the last 3 lines switch to that drive, change to the data folder, and run the nsd.exe -kill command to make sure all Domino-related services are flushed

@taskkill.exe /F /T /IM nsd.exe >NUL 2>&1

@taskkill.exe /F /T /IM nservice.exe >NUL 2>&1

@taskkill.exe /F /T /IM nserver.exe >NUL 2>&1

@FOR /F "Tokens=1* Delims=REG_SZ, " %%A IN ('REG QUERY "HKEY_LOCAL_MACHINE\Software\lotus\domino" /V Path') DO @SET zpath=%%B

@FOR /F "Tokens=1* Delims=REG_SZ, " %%A IN ('REG QUERY "HKEY_LOCAL_MACHINE\Software\lotus\domino" /V DataPath') DO @SET data=%%B

@FOR /F "Tokens=2* Delims=REG_SZ,\" %%A IN ('REG QUERY "HKEY_LOCAL_MACHINE\Software\lotus\domino" /V DataPath') DO @SET DL=%%A

@%DL% >NUL 2>&1

@CD %data% >NUL 2>&1

@%zpath%\nsd.exe -kill >NUL 2>&1

For V5.3 and Later

  1. Create a single .bat file consisting of the following commands:

@taskkill.exe /F /T /IM nsd.exe >NUL 2>&1

@taskkill.exe /F /T /IM nservice.exe >NUL 2>&1

@taskkill.exe /F /T /IM nserver.exe >NUL 2>&1

@FOR /F "Tokens=1* Delims=REG_SZ, " %%A IN ('REG QUERY "HKEY_LOCAL_MACHINE\Software\lotus\domino" /V Path') DO @SET zpath=%%B

@FOR /F "Tokens=1* Delims=REG_SZ, " %%A IN ('REG QUERY "HKEY_LOCAL_MACHINE\Software\lotus\domino" /V DataPath') DO @SET data=%%B

@FOR /F "Tokens=2* Delims=REG_SZ,\" %%A IN ('REG QUERY "HKEY_LOCAL_MACHINE\Software\lotus\domino" /V DataPath') DO @SET DL=%%A

@%DL% >NUL 2>&1

@CD %data% >NUL 2>&1

@%zpath%\nsd.exe -kill >NUL 2>&1

  2. Using the Neverfail Heartbeat Management Client, click Application -> Tasks.
  3. Click the Add button.
  4. Enter a 'Task Name' and, under 'Task Type', select PostStop.
  5. Navigate to the location of the newly created .bat file and, when complete, click OK.


Applies To

All Versions


Related Information

None

KBID-497
