During a Switchover of Microsoft Exchange, the SMTP Service Fails to Stop, Resulting in an Application Manager Timeout Exception. Replication does not Start after the Switchover has Completed

Summary

This Knowledgebase article describes a situation in which, during a switchover of Microsoft Exchange, the SMTP service fails to stop, causing an Application Manager timeout exception, and replication fails to start after the switchover has completed.


More Information

Impact of service failure:

If the SMTP service, or any other service, fails to stop within the timeout period, the Neverfail Application Manager times out (the default timeout period is 10 minutes) and raises an exception. This behavior is by design, to protect your application data.

Symptoms

  1. The application will start on the new active server.
  2. Replication will be stopped because part of the application is still running on the new passive server. This is by design, because files may still be kept open by the application on the passive server.
  3. Replication should NOT be started until the application has been stopped manually on the passive server via the Service Control Manager.
  4. In some cases the passive server may require a reboot if the service cannot be stopped.
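
Before restarting replication, it can help to confirm that the service has actually stopped on the passive server. The following is a minimal sketch, not part of Neverfail Heartbeat, that reads the state reported by the standard Windows command sc query. The function names are hypothetical, and the parser assumes the English-language output format of sc.

```python
import re
import subprocess
from typing import Optional


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Return the state name (for example 'STOPPED') from `sc query` output."""
    # The relevant line looks like: "STATE              : 4  RUNNING"
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def service_is_stopped(service: str = "SMTPSVC") -> bool:
    """Run `sc query <service>` locally and report whether it is stopped."""
    result = subprocess.run(
        ["sc", "query", service], capture_output=True, text=True, check=False
    )
    return parse_sc_state(result.stdout) == "STOPPED"
```

Run on the passive server, service_is_stopped() returning False (for example while the state is STOP_PENDING) indicates that replication should not yet be started.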

Causes

There can be several reasons why the SMTP service may fail to stop within the allocated time period, all of which need to be investigated before changing any configuration within Neverfail Heartbeat.

  1. SMTP mail queued for delivery.

    There may simply be a large number of items that need to be delivered prior to the service stopping. This is normal; check the state of the SMTP queues in the Exchange System Manager or the number of items in the on-disk queue.

    Similar issues can be experienced in a Microsoft Cluster environment - see Microsoft Knowledge Base article http://support.microsoft.com/default.aspx?kbid=821833 , which states:

    "In Exchange 2000 clusters, the SMTP resource and the information store resource frequently take the longest time to go offline or come online. In many cases, this delay occurs because of large SMTP queues or because large databases require more time to mount or dismount. This delay can lead to longer failover times while the information store resource waits for the SMTP resource before it can try to go offline or come online."

  2. Bad mail.

    Problems may be experienced with very large numbers of mail items in the badmail queue. This can be a result of spam. Refer to this Microsoft Knowledge Base article to ensure you are not an open spam relay:

    http://support.microsoft.com/default.aspx?scid=kb;en-us;310380

    The default location of the folder is C:\program files\exchsrvr\mailroot\badmail. If this folder contains thousands of items, then problems will occur when stopping the service. Refer to this article for advice:

    http://www.msexchange.org/tutorials/SMTP_Virtual_Server_Uncovered.html .

  3. Corrupt mail message.

    If the folder C:\program files\exchsrvr\mailroot\vsi 1\queue contains a corrupt mail message, then this can cause SMTP to stop delivering mail or fail to stop in a timely manner. Refer to these Microsoft Knowledge Base articles:

http://support.microsoft.com/?id=304166

http://support.microsoft.com/?id=314327

http://support.microsoft.com/default.aspx?scid=kb;en-us;831572&Product=exch2k


Alternatively, you can stop the Exchange Services, delete the oldest mail message from the queue manually, and then restart Exchange. It is advisable to back the file up first in case it was not the problem item.
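
The folder checks and the manual clean-up described above can be sketched in Python. This is an illustration only, under stated assumptions: the function names are hypothetical, "oldest" is taken to mean the earliest file modification time, and the message is moved to a backup folder rather than deleted so that it can be restored if it was not the problem item. The Exchange services should be stopped before anything is moved out of the queue.

```python
import shutil
from pathlib import Path
from typing import Optional


def count_items(folder: Path) -> int:
    """Count the regular files directly inside a mailroot folder."""
    return sum(1 for entry in folder.iterdir() if entry.is_file())


def oldest_item(folder: Path) -> Optional[Path]:
    """Return the file with the earliest modification time, or None if empty."""
    files = [entry for entry in folder.iterdir() if entry.is_file()]
    return min(files, key=lambda f: f.stat().st_mtime) if files else None


def move_oldest_to_backup(queue: Path, backup: Path) -> Optional[Path]:
    """Move the oldest queued message into a backup folder; return its new path."""
    target = oldest_item(queue)
    if target is None:
        return None
    backup.mkdir(parents=True, exist_ok=True)
    destination = backup / target.name
    shutil.move(str(target), str(destination))
    return destination
```

For example, count_items(Path(r"C:\program files\exchsrvr\mailroot\badmail")) reports how many items are in the badmail folder, and move_oldest_to_backup can be pointed at the queue folder named above together with a backup location of your choosing.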

Resolution

If the service consistently takes a long time to stop, then simply increase the Application Timeout value for the application. This will increase the time taken to perform a switchover; however, it will safely ensure that all mail is processed during shutdown, and that normal service will continue after the switchover completes.

In the Neverfail Heartbeat Management Client ( Start -> All Programs -> Neverfail -> Manage Server ):

  1. Select System -> Status & Control , and click Stop Replicating . Confirm that you wish to stop the protected application as well.
  2. Select Application Manager -> Configuration , expand the Neverfail Server node, and select 'Exchange Stop script'.
  3. Set the timeout (in seconds) to a larger value.
  4. Select System -> Status & Control , and click Start Replicating .

As a last resort, and if a fast switchover time is a priority, the following change can be made to the Exchange Application Stop script (\r2\scripts\stop.bat).

CAUTION: Changing the behavior of the script in this manner will not guarantee SMTP mail queues are emptied prior to the switchover completing. The timeout value should be set as large as possible to ensure adequate time for normal SMTP processing to occur and this time is organization-specific. These changes are made at your own risk and Neverfail cannot guarantee SMTP mail delivery under these conditions.

The SMTP mail queue is not protected with the default set of filters, so you should provide sufficient time for mail queues to empty. Failure to do so will leave some items in the mail queue on the passive server, and these items will not be delivered until the server becomes active or they are manually copied to the active server's mail queue.

  1. Locate the following line: NfNet Stop "SMTPSVC" /R || set FAILED=1
  2. Change the line to (where x is the number of seconds you are prepared to wait for the service to stop):

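rem Clear the flag first so that the comparison below is well-formed when the stop succeeds.
set SMTPFAILED=0
NfNet Stop "SMTPSVC" x /F /R || set SMTPFAILED=1
if "%SMTPFAILED%" == "1" (
echo "SMTP Failed to stop within timeout period."
echo "Forcing SMTP to stop."
rem iisreset /stop stops all Internet services, not only SMTP.
iisreset /stop || set FAILED=1
)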

If SMTP fails to stop in the allocated time period, then the following output will be seen in the Application Status log:

24/08/04 09:32:17 [ Exchange : stop ] ERROR>
24/08/04 09:32:18 [ Exchange : stop ] ERROR>Failed to STOP SMTPSVC : Service 'SMTPSVC' exceeded 1 second time limit.
24/08/04 09:32:18 [ Exchange : stop ] >Stopping 'SMTPSVC'. Failed
24/08/04 09:32:18 [ Exchange : stop ] >"SMTP Failed to stop within timeout period."
24/08/04 09:32:18 [ Exchange : stop ] >"Forcing SMTP to stop."
24/08/04 09:32:21 [ Exchange : stop ] >
24/08/04 09:32:21 [ Exchange : stop ] >Attempting stop...
24/08/04 09:32:21 [ Exchange : stop ] >
24/08/04 09:32:21 [ Exchange : stop ] >Internet services successfully stopped
24/08/04 09:32:21 [ Exchange : stop ] >
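
To check an Application Status log for this condition, the messages shown above can be matched with a regular expression. The sketch below is an illustration, not a Neverfail tool: the function names are hypothetical, and the patterns assume the message wording is exactly as in the sample output above.

```python
import re
from typing import List, Tuple

# Matches e.g. "Failed to STOP SMTPSVC : Service 'SMTPSVC' exceeded 1 second time limit."
TIMEOUT_PATTERN = re.compile(
    r"Failed to STOP (?P<service>\S+) : Service '[^']+' "
    r"exceeded (?P<seconds>\d+) seconds? time limit"
)


def find_stop_timeouts(log_text: str) -> List[Tuple[str, int]]:
    """Return (service name, time limit in seconds) for each stop timeout logged."""
    return [
        (m.group("service"), int(m.group("seconds")))
        for m in TIMEOUT_PATTERN.finditer(log_text)
    ]


def forced_stop_succeeded(log_text: str) -> bool:
    """True if the script forced a stop and the Internet services then stopped."""
    return (
        "Forcing SMTP to stop." in log_text
        and "Internet services successfully stopped" in log_text
    )
```

Applied to the sample output above, find_stop_timeouts reports a single timeout for SMTPSVC with a 1 second limit, and forced_stop_succeeded returns True.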


Applies To

Neverfail for Exchange


Related Information

None

KBID-241
