Summary
This internal Knowledgebase article explains how to read the statistics that Neverfail Heartbeat periodically logs in the NFLogs.
More Information
The following is an example of what is dumped in NFLogs:
Neverfail Heartbeat v5.5.1 and Earlier:
[2007-09-24 16:33:59,863] INFO204942[Controller Scheduler](com.neverfail.manager.Manager) - ============== Neverfail Heartbeat Statistics ============
Manager: EventMeanBytes=916 MinEventTime=0 MinEventHandler=Alerter while NewFileStateMgr.ControlStatus=ControlStatus: [Uninitialized]
EQMax=14 MaxEventHandler=Session 0 while CommandCompleteEvent #(0,37) EQ=0 MinCmdHandler=Manager while GetMultiReadablePropertyValues MaxCmdTime=656 CQMax=22 Sess={} MaxEventTime=409 MinCmdHandlerTime=0 MaxCmdHandler=Manager while GetReadablePropertyValues MaxEvent=CommandCompleteEvent #(1,61) CQ=0 EventMaxBytes=12456
Apply: FLTFL=0 BLKSQ=-1 RTA=0 BLKR=RRC_JustStarted BLKTM=-1 VSFID=-1 SQNO=1 HSQNOC=0 RTF=0 NOENT=0 STENT=0 DELBLK=-1 HSQNO=0 BLKMXTM=-1 BLKFID=-1 RTS=0 VSFN=null
CommsMgr: HashSize=0 USQAms=0 SQAms=16194786 SF=0 UnreplayedLogfiles=-1 MsgRx=0 TP=0.0 USQDR=0.0 BTx=0 USQBD=0 USQB=28886046 DU=33554432 BRx=0 SQBD=0 DF=-1 SQDR=-1.0 MsgTx=0 RF=0 SQB=0
NFLogManagerThread: LogsInQueue=7 BytesWritten=0 WriteSpeed(bpms)=-1.0 BytesInMemory=33554432 LogsOpen=1 Threshold=10
NewFileStateMgr: VSWaitRequestsOut=0 BadStateWorkIn=0 VSResponsesOut=0 VSRequestsIn=0 VSBuffersIn=0 TasksIn=0 VSBuffersOut=0 VSRequestsOut=0 TasksOut=0 VSResponsesIn=0 SystemWork=0 BadStateSize=0 BadStateWorkOut=0 VSWaitRequestsIn=0 TaskWorkSizeInBytes=0
Controller: Status=CtrlStatusStopped Active=true Primary=true V=Neverfail Heartbeat V5.1 (1314)
Neverfail Heartbeat V6.0.0 and Later:
[2009-07-05 22:32:59,249] INFO261646[ClusterControllerScheduler](com.neverfail.manager.Manager) - ============== Neverfail Heartbeat Statistics ============
CompressionManager: LicensedCompressionType=ADVANCED CompressionRateCurrent=0 MemoryUsed=0 Uptime=0 AvailableCompressionType=ADVANCED DataVolCurrent=0 MemoryAllocated=0 DataVolLZero=0 CompressionRate=0 ActiveCompressionType=NONE DiskUsed=0
Manager: MaxEventTime=16 MaxEventHandler=Alerter while RuleStatusChangeEvent Sess={} MinEventHandler=Alerter while NewFileStateMgr.ControlStatus=ControlStatus: [Uninitialized]
CQMax=0 MinCmdHandler= MaxCmdTime=0 EventMaxBytes=44637 EQMax=117 MaxCmdHandler= MaxEvent=ActivePassiveConfigMismatch EQ=0 CQ=0 EventMeanBytes=1072 MinEventTime=0 MinCmdHandlerTime=9223372036854775807
Apply: HSQNOC=0 BLKR=RRC_JustStarted BLKFID=-1 FLTFL=0 BLKTM=-1 HSQNO=0 BLKSQ=-1 DELBLK=-1 RTF=0 SQNO=1 RTA=0 BLKMXTM=-1 RTS=0 VSFN=null VSFID=-1 STENT=0 NOENT=0
NewFileStateMgr: BadStateSize=0 BadStateWorkIn=976 SystemWork=0 BadStateWorkOut=976 VSWaitRequestsIn=489 VSWaitRequestsOut=489 VSResponsesIn=489 TasksOut=2 VSRequestsIn=489 TasksIn=2 RenameCount=2 TaskWorkSizeInBytes=0 VSResponsesOut=489 VSBuffersOut=0 VSBuffersIn=0 VSRequestsOut=489
CommsMgr: SF=0 MsgTx=237091 RF=1 UnreplayedLogfiles={SECONDARY=0, TERTIARY=0, PRIMARY=0} MsgRx=122794 HashSize={SECONDARY=0, TERTIARY=0, PRIMARY=0} SQB=0 USQB=0 DU={SECONDARY=4194304, TERTIARY=4194304, PRIMARY=4194304} USQAms=0 TP=0 USQBD=54412043 SQDR=0 BTx=1684749909 DF={SECONDARY=0, TERTIARY=-1, PRIMARY=-1} SQBD=42689679 USQDR=0 MCRS=0 SQAms=0 BRx=47520267
NFLogManagerThread: BytesInMemory=4194304 Threshold=10 LogsOpen=1 BytesWritten=0 LogsInQueue=0 WriteSpeed(bpms)=-1.0
Interceptor: OprsHashSize=0
Controller: V=Heartbeat V6.0 (2748) Host=PRIMARY Group=PRIMARY==>SECONDARY
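Most lines in the dumps above have the shape "Component: key=value key=value ...", and in the V6.0.0 sample some CommsMgr values are reported per server inside braces. The following is a minimal Python sketch for pulling such a line apart. The function names parse_component_line and parse_per_server are hypothetical helpers, not part of Neverfail Heartbeat, and the sketch assumes values contain no spaces outside braces, so it does not suit the free-text Manager fields.

```python
import re

# key=value pairs; a value is either a {...} group or a run of non-space characters.
# Assumption: free-text values with spaces (e.g. "Alerter while ...") are cut at
# the first space, so this suits the simpler lines rather than the Manager line.
_PAIR = re.compile(r"([\w()]+)=(\{[^}]*\}|\S*)")


def parse_component_line(line):
    """Split 'Component: k1=v1 k2=v2' into (component, {key: value-as-string})."""
    component, _, rest = line.partition(": ")
    return component, dict(_PAIR.findall(rest))


def parse_per_server(value):
    """Turn '{SECONDARY=0, TERTIARY=-1}' into {'SECONDARY': 0, 'TERTIARY': -1}."""
    result = {}
    for item in value.strip("{}").split(","):
        name, sep, number = item.strip().partition("=")
        if sep:  # skip empty items such as the inside of '{}'
            result[name] = int(number)
    return result


component, stats = parse_component_line(
    "NFLogManagerThread: BytesInMemory=4194304 Threshold=10 LogsOpen=1 "
    "BytesWritten=0 LogsInQueue=0 WriteSpeed(bpms)=-1.0"
)
print(component, stats["LogsInQueue"], stats["WriteSpeed(bpms)"])
print(parse_per_server("{SECONDARY=0, TERTIARY=-1, PRIMARY=-1}"))
```

Values are kept as strings by parse_component_line so that fields such as BLKR or VSFN, which are not numeric, pass through unchanged.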
Explanation of current parameters
The Manager component reports information from two main areas:
- Command Queue (where incoming commands from clients such as the GUI, nfclient, etc. live until they are dispatched for execution)
- Event Queue (where events fired by internal components live until they are dispatched to all clients and other components that listen for those events)
CQMax
- Command Queue: the largest size it has ever reached
CQ
- Command Queue: current size
EQMax
- Event Queue: the largest size it has ever reached
EQ
- Event Queue: current size
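Because the Manager section mixes these counters with free-text fields that contain spaces, it can be easier to search for the four queue keys directly than to split the whole line. A minimal Python sketch, in which manager_queue_stats is a hypothetical helper name and the input is assumed to be the raw text of the Manager section:

```python
import re


def manager_queue_stats(manager_text):
    """Return the queue counters found in the Manager text as integers."""
    stats = {}
    for key in ("CQ", "CQMax", "EQ", "EQMax"):
        # \b plus the trailing '=' stops 'CQ' from matching inside 'CQMax'.
        match = re.search(rf"\b{key}=(\d+)", manager_text)
        if match:
            stats[key] = int(match.group(1))
    return stats


sample = (
    "EQMax=14 MaxEventHandler=Session 0 while CommandCompleteEvent #(0,37) "
    "EQ=0 MaxCmdTime=656 CQMax=22 Sess={} MaxEventTime=409 CQ=0 EventMaxBytes=12456"
)
print(manager_queue_stats(sample))
```

Comparing CQ with CQMax (and EQ with EQMax) shows how close each queue currently is to its historical peak.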
CommsMgr reports on various channel stats. These are:
USQAms
- Unsafe Queue age, in milliseconds
SQAms
- Safe Queue age, in milliseconds
Note: Prior to v5.3 the counters are switched: USQAms is the Safe Queue age and SQAms is the Unsafe Queue age.
BRx
- Bytes Received
- Non-LBM Environment: BRx is the total number of uncompressed bytes received by the Comms since Neverfail started. It includes all the data for Apply, Controller, Manager, Proxies, etc., including message headers, other control bytes, and Ack messages (that is, all the RAW data received by the Comms on the passive server).
- LBM Environment: BRx is the total number of compressed bytes received by the Comms since Neverfail started.
MsgRx
- Messages received
TP
- Throughput (unit unconfirmed, possibly Mbit/sec)
SQDR
- Safe Q dispatch rate
MsgTx
- Messages sent
USQDR
- Unsafe Q Dispatch Rate (unit unconfirmed, possibly bytes/millisec)
BTx
- Bytes sent
USQBD
- Unsafe Q bytes dispatched
USQB
- Unsafe Q bytes (current size)
UnreplayedLogfiles
- The number of log files in the safe update queue that haven't yet been dispatched.
DU
- The number of bytes used by the logger on this machine.
SQBD
- Safe Q Bytes Dispatched:
- Prior to v6.5.0: the total number of safe uncompressed bytes dispatched to all our components (Apply, Controller, Manager, Proxies, etc.), without the headers or other control bytes
- v6.5.0 and later: the total number of safe compressed bytes dispatched to all our components (Apply, Controller, Manager, Proxies, etc.), without the headers or other control bytes
DF
- Delay Factor applied to the receive queue.
RF
- For debugging: Referenced files - files that are in use.
SQB
- Safe Queue Bytes: the current number of uncompressed bytes in the safe queue waiting to be dispatched to the required components (this is an instantaneous value).
SF
- For debugging: Stubborn files - files we wish to delete, but currently can't.
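The note on USQAms and SQAms above means the two age counters have to be read differently depending on the product version. A minimal Python sketch of that rule follows; queue_ages is a hypothetical helper, and passing the version as a tuple of integers is an assumption made for the example.

```python
def queue_ages(stats, version):
    """Return (safe_age_ms, unsafe_age_ms) from CommsMgr statistics.

    stats   -- mapping with the 'SQAms' and 'USQAms' values as logged
    version -- product version as a tuple of ints, e.g. (5, 5, 1)
    """
    sq_ams = int(stats["SQAms"])
    usq_ams = int(stats["USQAms"])
    if version < (5, 3):
        # Prior to v5.3 the counters are switched:
        # USQAms holds the Safe Queue age and SQAms the Unsafe Queue age.
        return usq_ams, sq_ams
    return sq_ams, usq_ams


logged = {"USQAms": "0", "SQAms": "16194786"}
print(queue_ages(logged, (5, 5, 1)))
print(queue_ages(logged, (5, 2)))
```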
BufferPagedBytes
- The instantaneous number of buffer paged bytes in use at the sample point.
BufferNonPagedBytes
- The instantaneous number of buffer non-paged bytes in use at the sample point.
BufferRequestCount
- The number of buffer requests that completed in the sample interval (between the previous sample point and the current sample point).
BufferRequestTime
- The average time, in milliseconds, spent in a buffer request, for the requests that completed in the sample interval.
FileNameCount
- The instantaneous number of file names in the file name store at the sample point.
FileNameBytes
- The instantaneous number of bytes in the file name store at the sample point.
Apply:
VSFID
- The FileId that is currently being Verified/Synced (-1 if none)
SQNO
- The current sequence number that Apply is processing or waiting for.
HSQNOC
- The highest sequence number that Apply has walked contiguously.
NOENT
- Number of entries received in this session.
HSQNO
- The highest sequence number that the walker has seen in this session.
BLKMXTM
- The length of time that the longest block lasted for (-1 if no last block).
BLKFID
- FileId Apply is blocked on, -1 if not blocked.
RTS
- Number of entries for which Apply has exited the retry code having succeeded in a retry attempt.
VSFN
- The Filename that is currently being Verified/Synced (null if none)
RTA
- Number of entries for which Apply has dropped into the retry code.
RTF
- Number of entries for which Apply has exited the retry code having failed in retry attempts.
BLKR
- The reason for the current block.
DELBLK
- The number of deliveries received since Apply blocked.
BLKTM
- The length of time that the current block has lasted (so far).
BLKSQ
- The sequence number that is causing the current block
STENT
- Number of entries that Apply has stored waiting for processing.
FLTFL
- Number of files that Apply currently 'knows' about.
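Several of the Apply parameters above use sentinel values (-1 or null) to mean "nothing to report", which makes them easy to check mechanically. A minimal Python sketch, assuming the Apply line has already been split into a dictionary of strings; describe_apply is a hypothetical helper and the wording of its messages is illustrative only.

```python
def describe_apply(stats):
    """Summarise block, verify/sync and retry state from Apply statistics."""
    notes = []
    # BLKFID is -1 when Apply is not blocked.
    if stats.get("BLKFID", "-1") != "-1":
        notes.append(
            "blocked on FileId %s (reason %s)" % (stats["BLKFID"], stats.get("BLKR"))
        )
    # VSFID is -1 (and VSFN is null) when nothing is being verified/synced.
    if stats.get("VSFID", "-1") != "-1":
        notes.append(
            "verifying/syncing FileId %s (%s)" % (stats["VSFID"], stats.get("VSFN"))
        )
    # RTF counts entries that left the retry code after failing.
    if int(stats.get("RTF", "0")) > 0:
        notes.append("%s entries failed their retry attempts" % stats["RTF"])
    return notes or ["no block, verify/sync or failed retries reported"]


idle = {"BLKFID": "-1", "BLKR": "RRC_JustStarted", "VSFID": "-1", "VSFN": "null", "RTF": "0"}
print(describe_apply(idle))
```

Note that in the sample dumps BLKR carries a value (RRC_JustStarted) even while BLKFID is -1, so the sketch keys on BLKFID rather than on BLKR.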
NewFileStateMgr:
VSRequestsIn
- Number of files / sections / directories the FSM has been asked to verify/sync
VSWaitRequestsIn
- Number of files / sections / directories the FSM has actually processed and is waiting for Apply responses for
VSWaitRequestsOut
- Number of files / sections / directories the FSM has seen responses for
VSResponsesOut
- Number of responses from Apply that the FSM has actually been able to process
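If the four verify/sync counters above are read as successive stages (asked, processed, response seen, response processed), the differences between neighbouring counters give a rough view of where work is waiting. This reading is an assumption drawn from the descriptions, not a documented formula. The sketch below is in Python, uses the key spellings from the sample dumps, and fsm_backlog is a hypothetical helper.

```python
def fsm_backlog(stats):
    """Estimate work waiting at each verify/sync stage.

    Assumption: each counter is cumulative and never exceeds the one before it.
    """
    requested = int(stats["VSRequestsIn"])
    processed = int(stats["VSWaitRequestsIn"])
    responses_seen = int(stats["VSWaitRequestsOut"])
    responses_processed = int(stats["VSResponsesOut"])
    return {
        "not_yet_processed": requested - processed,
        "awaiting_apply_response": processed - responses_seen,
        "responses_not_yet_processed": responses_seen - responses_processed,
    }


sample = {
    "VSRequestsIn": "489",
    "VSWaitRequestsIn": "489",
    "VSWaitRequestsOut": "489",
    "VSResponsesOut": "489",
}
print(fsm_backlog(sample))
```

With the V6.0.0 sample values all four counters are equal, so every stage reports nothing outstanding.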
Controller:
The Controller information is self-explanatory for v5.5.1 and earlier. In v6.0.0 and later, the section shows:
- Host as PRIMARY, SECONDARY or TERTIARY. This is the Active server of the group.
- Group as one of:
  - PRIMARY==>SECONDARY==>TERTIARY for normal operation
  - PRIMARY=/=>SECONDARY=/=>TERTIARY if out of sync
  - PRIMARY--SECONDARY--TERTIARY if not replicating
  - {PRIMARY,SECONDARY,TERTIARY} if there is no active server
  - PRIMARY==>SECONDARY TERTIARY:SHUTDOWN if partitioned
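The Group notations listed above can be told apart by the separators they use. A minimal Python sketch of that mapping follows; group_state is a hypothetical helper, and the order of the checks (partitioned first, then no active server, then the arrow styles) is an assumption about how to treat strings that mix notations.

```python
def group_state(group):
    """Map a Controller 'Group=' value to the state it denotes."""
    if ":SHUTDOWN" in group:
        return "partitioned"
    if group.startswith("{"):
        return "no active server"
    if "=/=>" in group:
        return "out of sync"
    if "==>" in group:
        return "normal operation"
    if "--" in group:
        return "not replicating"
    return "unknown"


for value in (
    "PRIMARY==>SECONDARY",
    "PRIMARY=/=>SECONDARY=/=>TERTIARY",
    "PRIMARY--SECONDARY--TERTIARY",
    "{PRIMARY,SECONDARY,TERTIARY}",
    "PRIMARY==>SECONDARY TERTIARY:SHUTDOWN",
):
    print(value, "->", group_state(value))
```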
Applies To
All versions
Related Information
None
KBID-416