LostXXXXException: A XXXX entry has been lost because the PagedPool shared buffer was full or an error occurred


Summary



This Knowledgebase article provides details regarding the exception below:

'LostXXXXException: A XXXX entry has been lost because the PagedPool shared buffer was full or an error occurred'


More Information


Exception in the NFLog file:

[Interceptor$ExceptionsFetcher](com.neverfail.interceptor.Interceptor) - INTERCEPTOR** RECEIVED exception com.neverfail.interceptor.LostCleanupException: A Cleanup entry has been lost because the PagedPool shared buffer was full or an error occurred

Note: This is a sample from the NFLogs of the LostXXXXException, which can also be any of the following:

LostCleanupException
LostCloseException
LostCreateException
LostFilterCleanupException
LostFiltersException
LostFilterStartupException
LostFlushBuffersException
LostSetAttributesException
LostSetFileCompressionException
LostSetFileDeleteException
LostSetFileEofException
LostSetFileRenameException
LostSetReparseException
LostSetSecurityException
LostStatusException
LostWriteException

Note: In Neverfail Heartbeat v6.2 and later, the error message appears as:

A Write entry has been lost because the Unspecified shared buffer was full or an error occurred.


This does not mean that the system PagedPool is depleted. It means that the pageable buffer shared between the Interceptor and channel components has become full or, more precisely, that the Interceptor component's filter driver has been unable to allocate a chunk of that buffer to record an XXXX entry, which almost always means that the fixed-size buffer is full.

A LostXXXXException is raised when the filter driver has been unable to write one of these data items into the buffer to be sent for apply. This can happen because the buffer is entirely in use, because part of the buffer is temporarily unavailable, or because of an internal error in the software.
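The mechanism can be pictured as a fixed-size pool of bytes shared between the filter driver and the channel: an entry is lost when no chunk can be reserved for it. The sketch below is purely illustrative and assumes a simple byte-counting buffer; the class and method names are hypothetical and are not part of the Neverfail driver or its Java components.

// Conceptual sketch only: shows why a Lost*Exception-style condition arises
// when a fixed-size shared buffer cannot allocate a chunk for a new entry.
public class SharedBufferSketch {
    private final long capacityBytes;   // e.g. the 256 MB default
    private long usedBytes;

    public SharedBufferSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Try to reserve a chunk for one intercepted entry (write, cleanup, ...).
    // Returns false when the buffer is full, i.e. the entry would be lost.
    public synchronized boolean tryAllocate(long chunkBytes) {
        if (usedBytes + chunkBytes > capacityBytes) {
            return false;
        }
        usedBytes += chunkBytes;
        return true;
    }

    // Called once the channel (or, in LBM, the log writer) has consumed a chunk,
    // making that part of the buffer available for re-use.
    public synchronized void release(long chunkBytes) {
        usedBytes -= chunkBytes;
    }
}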

The Controller component of Heartbeat on the active server handles these exceptions by stopping and then restarting replication. This forces data in the buffer to be flushed to the passive server as part of stopping replication, and schedules a full system check (verification/synchronization) as part of starting replication.
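As a rough illustration of that recovery sequence, the sketch below shows how a controller-style component might react to a lost entry. Replication and FullSystemCheck are placeholder interfaces invented for this example; they are not the actual Heartbeat API.

// Illustrative only: placeholder types standing in for the real Controller logic.
interface Replication {
    void stop();   // stopping replication flushes buffered data to the passive server
    void start();  // resumes interception and replication
}

interface FullSystemCheck {
    void schedule(); // verification/synchronization queued to run when replication starts
}

class ControllerSketch {
    private final Replication replication;
    private final FullSystemCheck fullCheck;

    ControllerSketch(Replication replication, FullSystemCheck fullCheck) {
        this.replication = replication;
        this.fullCheck = fullCheck;
    }

    // Handle any LostXXXXException reported by the Interceptor.
    void onLostEntry(Exception lost) {
        replication.stop();
        fullCheck.schedule();
        replication.start();
    }
}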

The "buffer" is currently the same as the unsafe update queue in a LAN (non-LBM) installation. In a WAN (LBM) installation, it has no particular end-user meaning.


In these cases, you should always look back in the logs for lines like this …

[Interceptor$StatisticsFetcher](com.neverfail.interceptor.Interceptor) - Counters: BufferPagedBytes 268435456 BufferNonPagedBytes 0 BufferRequestCount 10510 BufferRequestTimer 0 FileNameCount 658 FileNameBytes 680372 WorkerCount 1 ProtectedCount 0 UnprotectedCount 1 CompromisedCount 0

… before the LostXXXXException is recorded. The important field is BufferPagedBytes, which here has a value of 268435456 bytes and shows the amount of the pageable buffer in use at the time. The default configured size of this buffer is 256 MB (268435456 bytes), which matches the value in this report from the Interceptor and indicates that the cause of the LostXXXXException here is indeed the buffer being full.
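When reviewing a large NFLog, it can help to pull the Counters lines out automatically. The helper below is a hypothetical example (it is not shipped with the product) that extracts BufferPagedBytes from each statistics line and flags any sample that has reached the 256 MB default buffer size.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-scanning helper: prints BufferPagedBytes from each Counters
// line in an NFLog file and marks samples where the default buffer is full.
public class BufferUsageScan {
    private static final long DEFAULT_BUFFER_BYTES = 256L * 1024 * 1024; // 268435456
    private static final Pattern PAGED = Pattern.compile("BufferPagedBytes (\\d+)");

    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Path.of(args[0]))) {
            Matcher m = PAGED.matcher(line);
            if (m.find()) {
                long used = Long.parseLong(m.group(1));
                String flag = used >= DEFAULT_BUFFER_BYTES ? "  <-- buffer full" : "";
                System.out.printf("BufferPagedBytes=%d (%.0f%% of default)%s%n",
                        used, 100.0 * used / DEFAULT_BUFFER_BYTES, flag);
            }
        }
    }
}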

In a non-LBM installation, this almost always means that there has been a period during which the update I/O rate on the Heartbeat-protected files was faster than Heartbeat could ship the update data across the channel connection.

In an LBM installation, the reasons are less clear, since Heartbeat copies the update data from the shared buffer into its own log files; once a chunk is copied it is considered to have been emptied and soon becomes available for re-use by the filter driver.


Applies To

All Versions


Related Information

None

KBID-359
