Reading Neverfail Heartbeat Statistics from the NFLog

Summary

This internal Knowledgebase article explains how to read the statistics that Neverfail Heartbeat periodically writes to the NFLog.


More Information

The following is an example of what is dumped in NFLogs:

Neverfail Heartbeat v5.5.1 and Earlier:

[2007-09-24 16:33:59,863] INFO204942[Controller Scheduler](com.neverfail.manager.Manager) - ============== Neverfail Heartbeat Statistics ============
Manager: EventMeanBytes=916 MinEventTime=0 MinEventHandler=Alerter while NewFileStateMgr.ControlStatus=ControlStatus: [Uninitialized]
EQMax=14 MaxEventHandler=Session 0 while CommandCompleteEvent #(0,37) EQ=0 MinCmdHandler=Manager while GetMultiReadablePropertyValues MaxCmdTime=656 CQMax=22 Sess={} MaxEventTime=409 MinCmdHandlerTime=0 MaxCmdHandler=Manager while GetReadablePropertyValues MaxEvent=CommandCompleteEvent #(1,61) CQ=0 EventMaxBytes=12456
Apply: FLTFL=0 BLKSQ=-1 RTA=0 BLKR=RRC_JustStarted BLKTM=-1 VSFID=-1 SQNO=1 HSQNOC=0 RTF=0 NOENT=0 STENT=0 DELBLK=-1 HSQNO=0 BLKMXTM=-1 BLKFID=-1 RTS=0 VSFN=null
CommsMgr: HashSize=0 USQAms=0 SQAms=16194786 SF=0 UnreplayedLogfiles=-1 MsgRx=0 TP=0.0 USQDR=0.0 BTx=0 USQBD=0 USQB=28886046 DU=33554432 BRx=0 SQBD=0 DF=-1 SQDR=-1.0 MsgTx=0 RF=0 SQB=0
NFLogManagerThread: LogsInQueue=7 BytesWritten=0 WriteSpeed(bpms)=-1.0 BytesInMemory=33554432 LogsOpen=1 Threshold=10
NewFileStateMgr: VSWaitRequestsOut=0 BadStateWorkIn=0 VSResponsesOut=0 VSRequestsIn=0 VSBuffersIn=0 TasksIn=0 VSBuffersOut=0 VSRequestsOut=0 TasksOut=0 VSResponsesIn=0 SystemWork=0 BadStateSize=0 BadStateWorkOut=0 VSWaitRequestsIn=0 TaskWorkSizeInBytes=0
Controller: Status=CtrlStatusStopped Active=true Primary=true V=Neverfail Heartbeat V5.1 (1314)

Neverfail Heartbeat V6.0.0 and Later:

[2009-07-05 22:32:59,249] INFO261646[ClusterControllerScheduler](com.neverfail.manager.Manager) - ============== Neverfail Heartbeat Statistics ============
CompressionManager: LicensedCompressionType=ADVANCED CompressionRateCurrent=0 MemoryUsed=0 Uptime=0 AvailableCompressionType=ADVANCED DataVolCurrent=0 MemoryAllocated=0 DataVolLZero=0 CompressionRate=0 ActiveCompressionType=NONE DiskUsed=0
Manager: MaxEventTime=16 MaxEventHandler=Alerter while RuleStatusChangeEvent Sess={} MinEventHandler=Alerter while NewFileStateMgr.ControlStatus=ControlStatus: [Uninitialized]
CQMax=0 MinCmdHandler= MaxCmdTime=0 EventMaxBytes=44637 EQMax=117 MaxCmdHandler= MaxEvent=ActivePassiveConfigMismatch EQ=0 CQ=0 EventMeanBytes=1072 MinEventTime=0 MinCmdHandlerTime=9223372036854775807
Apply: HSQNOC=0 BLKR=RRC_JustStarted BLKFID=-1 FLTFL=0 BLKTM=-1 HSQNO=0 BLKSQ=-1 DELBLK=-1 RTF=0 SQNO=1 RTA=0 BLKMXTM=-1 RTS=0 VSFN=null VSFID=-1 STENT=0 NOENT=0
NewFileStateMgr: BadStateSize=0 BadStateWorkIn=976 SystemWork=0 BadStateWorkOut=976 VSWaitRequestsIn=489 VSWaitRequestsOut=489 VSResponsesIn=489 TasksOut=2 VSRequestsIn=489 TasksIn=2 RenameCount=2 TaskWorkSizeInBytes=0 VSResponsesOut=489 VSBuffersOut=0 VSBuffersIn=0 VSRequestsOut=489
CommsMgr: SF=0 MsgTx=237091 RF=1 UnreplayedLogfiles={SECONDARY=0, TERTIARY=0, PRIMARY=0} MsgRx=122794 HashSize={SECONDARY=0, TERTIARY=0, PRIMARY=0} SQB=0 USQB=0 DU={SECONDARY=4194304, TERTIARY=4194304, PRIMARY=4194304} USQAms=0 TP=0 USQBD=54412043 SQDR=0 BTx=1684749909 DF={SECONDARY=0, TERTIARY=-1, PRIMARY=-1} SQBD=42689679 USQDR=0 MCRS=0 SQAms=0 BRx=47520267
NFLogManagerThread: BytesInMemory=4194304 Threshold=10 LogsOpen=1 BytesWritten=0 LogsInQueue=0 WriteSpeed(bpms)=-1.0
Interceptor: OprsHashSize=0
Controller: V=Heartbeat V6.0 (2748) Host=PRIMARY Group=PRIMARY==>SECONDARY
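The component lines in the dumps above follow a "Component: KEY=VALUE KEY=VALUE ..." layout, so they can be split into a dictionary for easier reading. The following is a minimal Python sketch, not a Neverfail tool: the function name parse_stats_line is hypothetical, and the parsing rules (a key starts after whitespace, values may contain spaces, {...} maps are kept as one value) are assumptions inferred from the samples. It does not handle the Manager entry, which wraps onto a second line and embeds "=" inside handler descriptions.

```python
import re

# Assumption: a key starts at the beginning of the body or after whitespace
# and runs up to the next "=".
KEY = re.compile(r"(?<!\S)([A-Za-z][\w.()]*)=")

def parse_stats_line(line):
    """Split 'Component: K1=V1 K2=V2 ...' into (component, {key: value})."""
    component, sep, body = line.partition(": ")
    if not sep:
        raise ValueError("no 'Component: ' prefix found")
    # Skip keys nested inside {...} maps such as DU={SECONDARY=..., ...}.
    keys = [m for m in KEY.finditer(body)
            if body.count("{", 0, m.start()) == body.count("}", 0, m.start())]
    stats = {}
    for i, m in enumerate(keys):
        # A value runs until the next top-level key or the end of the line.
        end = keys[i + 1].start() if i + 1 < len(keys) else len(body)
        stats[m.group(1)] = body[m.end():end].strip()
    return component, stats

# Shortened copy of the v6.0 CommsMgr line shown above.
sample = ("CommsMgr: SF=0 MsgTx=237091 RF=1 "
          "UnreplayedLogfiles={SECONDARY=0, TERTIARY=0, PRIMARY=0} MsgRx=122794")
component, stats = parse_stats_line(sample)
print(component, stats["MsgTx"], stats["UnreplayedLogfiles"])
```

All values are kept as strings, because some counters hold names (BLKR), nulls (VSFN) or per-server maps rather than numbers.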

Explanation of current parameters

The Manager component reports information from two main areas:

  • Command Queue (where incoming commands from clients such as the GUI and nfclient wait until they are dispatched for execution)
  • Event Queue (where events fired by internal components wait until they are dispatched to all clients and other components that listen for those events)

CQMax - Command Queue: the largest size it has ever reached
CQ - Command Queue: current size
EQMax - Event Queue: the largest size it has ever reached
EQ - Event Queue: current size

CommsMgr reports various channel statistics:

USQAms - Unsafe Q Age millisecs
SQAms - Safe Q Age millisecs

Note: Prior to v5.3 the counters are switched: USQAms is Safe Q Age and SQAms is Unsafe Q Age
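The version-dependent labelling in the note above is easy to get wrong when comparing logs from different releases. The following Python sketch shows one way to normalise the two counters. The function name queue_ages_ms and the (major, minor) version tuple are assumptions made for illustration; the input is a plain dictionary of counter strings taken from the CommsMgr line.

```python
def queue_ages_ms(stats, version):
    """Return (unsafe_age_ms, safe_age_ms) from CommsMgr counters.

    `version` is a (major, minor) tuple such as (5, 1) or (6, 0).
    """
    usq_label = int(stats["USQAms"])
    sq_label = int(stats["SQAms"])
    if version < (5, 3):
        # Before v5.3 the labels are switched: USQAms holds the Safe Q age
        # and SQAms holds the Unsafe Q age.
        return sq_label, usq_label
    return usq_label, sq_label

# Values from the v5.1 sample dump above (USQAms=0 SQAms=16194786).
unsafe_ms, safe_ms = queue_ages_ms({"USQAms": "0", "SQAms": "16194786"}, (5, 1))
print(unsafe_ms, safe_ms)
```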

BRx - Bytes Received

  • Non-LBM Environment:

    BRx is the total number of uncompressed bytes received by the Comms since Neverfail started. It covers all the raw data received by the Comms on the passive server: data for Apply, Controller, Manager, Proxies etc., plus message headers, other control bytes and Ack messages.
  • LBM Environment:

    BRx is the total number of compressed bytes received by the Comms since the startup of Neverfail.
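Because BRx is a running total since startup, a single dump says little about current traffic; the difference between two dumps is more informative. The sketch below is an illustration only: the function names are hypothetical, the timestamp format is taken from the sample log headers, and the second sample is invented for the example. The same arithmetic would apply to other cumulative counters, on the assumption that they are also totals since startup.

```python
from datetime import datetime

def parse_log_time(text):
    """Parse a header timestamp such as '2009-07-05 22:32:59,249'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")

def bytes_per_second(earlier, later):
    """Average receive rate between two (timestamp, BRx) samples."""
    (t0, b0), (t1, b1) = earlier, later
    elapsed = (parse_log_time(t1) - parse_log_time(t0)).total_seconds()
    if elapsed <= 0 or b1 < b0:
        # Samples out of order, or the counter restarted with Neverfail.
        return None
    return (b1 - b0) / elapsed

first = ("2009-07-05 22:32:59,249", 47520267)   # BRx from the v6.0 sample above
second = ("2009-07-05 22:33:59,249", 47580267)  # hypothetical later dump
rate = bytes_per_second(first, second)
print(rate)
```

In an LBM environment the result is a rate of compressed bytes, so it is not directly comparable with a figure taken from a non-LBM environment.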

MsgRx - Messages received
TP - Throughput (units unconfirmed; possibly Mbit/sec)
SQDR - Safe Q dispatch rate
MsgTx - Messages sent
USQDR - Unsafe Q Dispatch Rate (units unconfirmed; possibly bytes/millisec)
BTx - Bytes sent
USQBD - Unsafe Q bytes dispatched
USQB - Unsafe Q bytes (current size)
UnreplayedLogfiles - The number of log files in the safe update queue that have not yet been dispatched.
DU - The number of bytes used by the logger on this machine.
SQBD - Safe Q Bytes Dispatched:

  • Prior to v6.5.0 - the total number of safe uncompressed bytes dispatched to all our components (Apply, Controller, Manager, Proxies etc.), excluding headers and other control bytes
  • v6.5.0 and later - the total number of safe compressed bytes dispatched to all our components (Apply, Controller, Manager, Proxies etc.), excluding headers and other control bytes

DF - Delay Factor applied to the receive queue.
RF - For debugging: Referenced files - files that are in use.
SQB - Safe Queue Bytes: the current number of uncompressed bytes in the safe queue waiting to be dispatched to the required components (an instantaneous value).
SF - For debugging: Stubborn files - files we wish to delete, but currently can't.

BufferPagedBytes is the instantaneous value of number of buffer paged bytes in use at the sample point.
BufferNonPagedBytes is the instantaneous value of number of buffer non paged bytes in use at the sample point.
BufferRequestCount is the number of buffer requests that have completed in the sample interval (between the previous sample point and the current sample point).
BufferRequestTime is the average time in milliseconds spent in the buffer request for the requests that have completed in the sample interval.
FileNameCount is the instantaneous value of the number of file names in the file name store at the sample point.
FileNameBytes is the instantaneous value of the number of bytes in the file name store at the sample point.

Apply:

VSFID - The FileId that is currently being Verified/Synced (-1 if none)
SQNO - The current sequence number that Apply is processing or waiting for.
HSQNOC - The highest sequence number that Apply has walked contiguously.
NOENT - Number of entries received in this session.
HSQNO - The highest sequence no that the walker has ever seen this session.
BLKMXTM - The length of time that the longest block lasted for (-1 if no last block).
BLKFID - FileId Apply is blocked on, -1 if not blocked.
RTS - Number of entries for which Apply has exited the retry code having succeeded in a retry attempt.
VSFN - The Filename that is currently being Verified/Synced (null if none)
RTA - Number of entries for which Apply has dropped into the retry code.
RTF - Number of entries for which Apply has exited the retry code having failed in retry attempts.
BLKR - The reason for the current block.
DELBLK - The number of deliveries received since Apply blocked.
BLKTM - The length of time that the current block has lasted (so far).
BLKSQ - The sequence number that is causing the current block
STENT - Number of entries that Apply has stored waiting for processing.
FLTFL - Number of files that Apply currently 'knows' about.

NewFileStateMgr (FSM):

VSRequestsIn - Number of files / sections / directories the FSM has been asked to verify/sync
VSWaitRequestsIn - Number of files / sections / directories the FSM has actually processed and is waiting for Apply responses for
VSWaitRequestsOut - Number of files / sections / directories the FSM has seen responses for
VSResponsesOut - Number of responses from Apply that the FSM has actually been able to process
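
The Apply block counters (BLKFID, BLKR, BLKSQ, BLKTM, DELBLK) are most useful when read together. The following Python sketch collects them into one summary, using the rule stated above that BLKFID is -1 when Apply is not blocked. The function name apply_block_summary is hypothetical, values are kept as raw strings because the time units are not documented here, and the "blocked" input is an invented combination for illustration.

```python
def apply_block_summary(stats):
    """Return a summary of the current Apply block, or None if not blocked."""
    if stats.get("BLKFID", "-1") == "-1":
        return None  # BLKFID is -1 when Apply is not blocked
    return {
        "file_id": stats["BLKFID"],
        "reason": stats.get("BLKR"),
        "sequence_number": stats.get("BLKSQ"),
        "duration_so_far": stats.get("BLKTM"),
        "deliveries_since_block": stats.get("DELBLK"),
    }

# Counters from the sample dumps above: Apply is not blocked.
idle = {"BLKFID": "-1", "BLKR": "RRC_JustStarted", "BLKSQ": "-1",
        "BLKTM": "-1", "DELBLK": "-1"}
print(apply_block_summary(idle))

# Hypothetical blocked state; these values are made up for the example.
blocked = {"BLKFID": "42", "BLKR": "RRC_JustStarted", "BLKSQ": "1007",
           "BLKTM": "1500", "DELBLK": "3"}
print(apply_block_summary(blocked))
```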

Controller:

For v5.5.1 and earlier, the Controller fields are self-explanatory. For v6.0.0 and later, the section shows:

  • Host as PRIMARY, SECONDARY or TERTIARY. This is the active server of the group.
  • Group as:

    PRIMARY==>SECONDARY==>TERTIARY
    for normal operation

    PRIMARY=/=>SECONDARY=/=>TERTIARY
    Out of synch

    PRIMARY--SECONDARY--TERTIARY
    Not replicating

    {PRIMARY,SECONDARY,TERTIARY}
    No active

    PRIMARY==>SECONDARY   TERTIARY:SHUTDOWN
    if partitioned.
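
The Group notations listed above can be mapped to a state by looking for their separators. The following Python sketch does this under stated assumptions: the function name describe_group is hypothetical, only the five patterns listed above are recognised, and a string mixing several separators is classified by the first rule that matches, which may not reflect how Heartbeat itself treats such a group.

```python
def describe_group(group):
    """Classify a Controller 'Group=' value using the notations listed above."""
    if ":SHUTDOWN" in group:
        return "partitioned"
    if group.startswith("{"):
        return "no active server"
    if "=/=>" in group:
        return "out of sync"
    if "==>" in group:
        return "normal operation"
    if "--" in group:
        return "not replicating"
    return "unknown"

for value in ("PRIMARY==>SECONDARY",
              "PRIMARY=/=>SECONDARY=/=>TERTIARY",
              "PRIMARY--SECONDARY--TERTIARY",
              "{PRIMARY,SECONDARY,TERTIARY}",
              "PRIMARY==>SECONDARY   TERTIARY:SHUTDOWN"):
    print(value, "->", describe_group(value))
```

The partitioned check comes first because that notation also contains "==>" for the servers that are still replicating.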


Applies To

All versions


Related Information

None

KBID-416
