AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<John.Keiffer@lsi.com>,<dl-qa@lsi.com>,<dl-engineering@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	85A1D09038E3C1438820EF7A7FAFDD3001077EA433@cosmail01.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 15 Oct 2009 13:26:31 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Keiffer, John" <John.Keiffer@lsi.com>
Cc: DL-ONStor-QA <dl-qa@lsi.com>, DL-ONStor-Engineering
 <dl-engineering@lsi.com>
Subject: Re: Help stopping my filer from crashing
Message-ID: <20091015132631.1052f078@ripper.onstor.net>
In-Reply-To: <85A1D09038E3C1438820EF7A7FAFDD3001077EA433@cosmail01.lsi.com>
References: <85A1D09038E3C1438820EF7A7FAFDD3001077EA433@cosmail01.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

It looks to me like the FP is already down pretty early in the
initialization sequence.  There are tape-driver errors in there, too.
Can that be a coincidence?
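
The quoted log got rewrapped by the mail client, so several entries
share each line and it is hard to see what happened first.  Here is a
rough Python sketch for putting it back to one entry per line and
finding the first FP-down and tape-driver messages.  It assumes every
entry starts with a "Mon DD HH:MM:SS host" stamp, which is only a
heuristic.  The names split_entries, first_matching, and SAMPLE are made
up for illustration, and SAMPLE is abridged from the log below.

```python
import re

# Heuristic (an assumption): every entry begins with "Mon DD HH:MM:SS host ".
ENTRY_START = re.compile(r"(?=[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ )")


def split_entries(quoted_text):
    """Strip '> ' quote markers, rejoin wrapped lines, split at each stamp."""
    lines = [re.sub(r"^>\s?", "", ln) for ln in quoted_text.splitlines()]
    flat = " ".join(ln.strip() for ln in lines if ln.strip())
    # Splitting on a zero-width lookahead keeps the stamp with its entry
    # (needs Python 3.7 or later).
    return [e.strip() for e in ENTRY_START.split(flat) if e.strip()]


def first_matching(entries, needle):
    """Return (index, entry) for the first entry containing needle, else None."""
    for i, entry in enumerate(entries):
        if needle in entry:
            return i, entry
    return None


# Abridged sample, wrapped the way the mail client wrapped the quoted log.
SAMPLE = """\
> Oct 15 11:14:56 g9r62 : 0:0:pm:INFO: /onstor/bin/sdm_cfgd: finished
> initialization. Oct 15 11:14:56 g9r62 : 0:0:sdm:NOTICE: FP CPU Down
> Event Oct 15 11:15:09 g9r62 : 0:0:tape-driver:ERROR: tape_rpc: rmc_rpc
> app sdm cpu 0 slot 0 failed - -19
"""

if __name__ == "__main__":
    entries = split_entries(SAMPLE)
    for needle in ("FP CPU Down", "tape-driver:ERROR"):
        print(needle, "->", first_matching(entries, needle))
```

Running the same two searches over the full quoted log would show
whether the FP-down events come before the first tape-driver error,
which is the ordering question here.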

On Thu, 15 Oct 2009 12:19:18 -0600 "Keiffer, John"
<John.Keiffer@lsi.com> wrote:

> It's stuck and crashes right after reboot. Can anyone see what the
> problem is or know what to do?
> 
> Oct 15 11:14:55 g9r62 : 0:0:cluster2:NOTICE: Using 10.2.62.9 as my
> primary address Oct 15 11:14:55 g9r62 : 0:0:cluster2:INFO:
> ClusterCtrl_iUpdateState: post PCC up pccname g9r62 Oct 15 11:14:55
> g9r62 : 0:0:eventd:DEBUG: > ems_logEvent() Oct 15 11:14:55 g9r62 :
> 0:0:eventd:WARNING: Process-EVENT 0.0.0.0: Mgmt Port 0.0.0.0 PCC,
> State Up Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> Oct 15 11:14:55 g9r62 : 1:4:scsi:INFO: 2: ispfc:sp2.2:
> ISPFC_CS_PDB_CHANGED,[8014] Global Event Oct 15 11:14:55 g9r62 :
> 0:0:cluster2:NOTICE: ubik init with buff size 64 Oct 15 11:14:55
> g9r62 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc: Error sending
> rpc to clusterrpc, flags 41a, name cluster2, rc -19, retrying... Oct
> 15 11:14:55 g9r62 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc:
> Retry worked to clusterrpc, flags 472, name cluster2 Oct 15 11:14:55
> g9r62 : 0:0:eventd:DEBUG: > ems_logEvent() Oct 15 11:14:55 g9r62 :
> 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 0, State Down Oct
> 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent() Oct 15
> 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent() Oct 15 11:14:55
> g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 1, State
> Down Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent() Oct
> 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent() Oct 15
> 11:14:55 g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU
> 2, State Down Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: <
> ems_logEvent() Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: >
> ems_logEvent() Oct 15 11:14:55 g9r62 : 0:0:eventd:WARNING:
> Process-EVENT CPU: Slot 1, CPU 3, State Down Oct 15 11:14:55 g9r62 :
> 0:0:eventd:DEBUG: < ems_logEvent() Oct 15 11:14:55 g9r62 :
> 0:0:eventd:DEBUG: > ems_logEvent() Oct 15 11:14:55 g9r62 :
> 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 4, State Down Oct
> 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent() Oct 15
> 11:14:56 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent() Oct 15 11:14:56
> g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 5, State
> Down Oct 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent() Oct
> 15 11:14:56 g9r62 : 0:0:cluster2:INFO: ClusterCtrl_iSyncFilerInDb:
> remove old information from cluster started Oct 15 11:14:56 g9r62 :
> 0:0:cluster2:INFO: ClusterCtrl_iSyncFilerInDb: remove old information
> from cluster finished, code 0 Oct 15 11:14:56 g9r62 :
> 0:0:cluster2:INFO: ClusterCtrl_readyRestoreFiler: data base ready to
> restore Oct 15 11:14:56 g9r62 : 0:0:cluster2:WARNING:
> ClusterCtrl_iUpdateState: PCC down pccname g9r62 Oct 15 11:14:56
> g9r62 : 0:0:pm:INFO: /onstor/bin/cluster_server: finished
> initialization. Oct 15 11:14:56 g9r62 : 0:0:cluster2:INFO:
> ClusterCtrl_iUpdateState: post PCC up pccname g9r62 Oct 15 11:14:56
> g9r62 : 0:0:eventd:DEBUG: > ems_logEvent() Oct 15 11:14:56 g9r62 :
> 0:0:eventd:WARNING: Process-EVENT 0.0.0.0: Mgmt Port 0.0.0.0 PCC,
> State Up Oct 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> Oct 15 11:14:56 g9r62 : 0:0:sdm:INFO: ONStor Storage Device Manager
> (c)2009: Started Oct 15 11:14:56 g9r62 :
> 0:0:pm:INFO: /onstor/bin/sdm_cfgd: finished initialization. Oct 15
> 11:14:56 g9r62 : 0:0:sdm:NOTICE: FP CPU Down Event Oct 15 11:14:56
> g9r62 : 0:0:evm:DEBUG: EVM daemon is (re)starting... Oct 15 11:14:56
> g9r62 : 0:0:evm:DEBUG: EVM myNfx: cluster[g9r62] node[g9r62]
> chassisId[2158] Oct 15 11:14:57 g9r62 :
> 0:0:pm:INFO: /onstor/bin/evm_cfgd: finished initialization. Oct 15
> 11:14:57 g9r62 : 0:0:pm:INFO: /onstor/bin/ea: finished
> initialization. Oct 15 11:14:58 g9r62 : 0:0:evm:INFO: main: Cluster
> version[30330] Oct 15 11:14:59 g9r62 : 0:0:evm:INFO: evm_sendScanReq:
> Requesting SPM scan Oct 15 11:14:59 g9r62 : 0:0:evm:DEBUG:
> evm_procCpuEvent : Received NFP cpu UP event. Oct 15 11:14:59 g9r62 :
> 0:0:evm:DEBUG: evm_procCpuEvent : Received NFP cpu down event. Oct 15
> 11:14:59 g9r62 : 0:0:spm:INFO: main: Cluster version[30330] Oct 15
> 11:14:59 g9r62 : 0:0:pm:INFO: /onstor/bin/spm: finished
> initialization. Oct 15 11:14:59 g9r62 :
> 0:0:pm:INFO: /onstor/bin/ipmd: finished initialization. Oct 15
> 11:14:59 g9r62 : 0:0:pm:INFO: /onstor/bin/tape-driver: finished
> initialization. Oct 15 11:15:00 g9r62 : 0:0:ssc_ndmp:INFO:
> ndmp_GetFPState:  fp card is DOWN Oct 15 11:15:00 g9r62 :
> 0:0:pm:INFO: /onstor/bin/ndmp_cfgd: finished initialization. Oct 15
> 11:15:01 g9r62 : 0:0:spm:ERROR: spm_processPccEvent : Duplicate PCC
> event Oct 15 11:15:01 g9r62 : 0:0:spm:ERROR: spm_processPccEvent :
> Duplicate PCC event Oct 15 11:15:01 g9r62 : 0:0:auth_agent:NOTICE:
> main: auth-agent started Oct 15 11:15:01 g9r62 :
> 0:0:pm:INFO: /onstor/bin/auth-agent: finished initialization. Oct 15
> 11:15:03 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] :
> vs=1, file=71, #masterRec=0 Oct 15 11:15:03 g9r62 : 0:0:vsd:INFO:
> vsd_ensureNisFileCoherence[1230] : vs=1, file=72, #masterRec=0 Oct 15
> 11:15:03 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] :
> vs=1, file=74, #masterRec=0 Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO:
> vsd_ensureNisFileCoherence[1230] : vs=2, file=71, #masterRec=0 Oct 15
> 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] :
> vs=2, file=72, #masterRec=0 Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO:
> vsd_ensureNisFileCoherence[1230] : vs=2, file=74, #masterRec=0 Oct 15
> 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] :
> vs=3, file=71, #masterRec=0 Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO:
> vsd_ensureNisFileCoherence[1230] : vs=3, file=72, #masterRec=0 Oct 15
> 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] :
> vs=3, file=74, #masterRec=0 Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO:
> vsd_ensureNisFileCoherence[1230] : vs=4, file=71, #masterRec=0 Oct 15
> 11:15:05 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] :
> vs=4, file=72, #masterRec=0 Oct 15 11:15:05 g9r62 : 0:0:vsd:INFO:
> vsd_ensureNisFileCoherence[1230] : vs=4, file=74, #masterRec=0 Oct 15
> 11:15:05 g9r62 : 0:0:nfxsh:DEBUG: cmd[0]: autosupport emrs show
> config : status[0] Oct 15 11:15:05 g9r62 :
> 0:0:pm:INFO: /onstor/bin/vsd: finished initialization. Oct 15
> 11:15:05 g9r62 : 0:0:vsd:ERROR: vsd_procTxnEvent : Event 1 detected
> for txn, state 1 Oct 15 11:15:05 g9r62 : 0:0:vsd:ERROR:
> vsd_sendIOTimeoutCfg : Cannot set the IO timeout values on FP, status
> 24 Oct 15 11:15:05 g9r62 : 0:0:vsd:ERROR: vsd_initFpCardProc : Cannot
> set the IO timeout values on FP, status 24 Oct 15 11:15:05 g9r62 :
> 0:0:cluster2:INFO: ClusterCtrl_ReleaseFiler: called by vtm Oct 15
> 11:15:05 g9r62 : 0:0:vtm:DEBUG:
> vtm_get_filer_config_and_start_vsvr_trans: start collecting failover
> vsvr, post event count 0 Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG:
> vtm_get_filer_config_and_start_vsvr_trans: end collecting failover
> vsvr Oct 15 11:15:05 g9r62 : 0:0:vtm:INFO: vtm_sendCardStateMsg:
> Sending card DOWN to g9r62 Oct 15 11:15:05 g9r62 :
> 0:0:pm:INFO: /onstor/bin/vtmd: finished initialization. Oct 15
> 11:15:05 g9r62 : 0:0:vtm:INFO: Vtm_ProcCardStateMsg: card state DOWN
> message from g9r62 Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG:
> vtm_create_transaction: transaction created tag 1255630505, type 4
> Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_request_node_info: node
> info request send at 0x4ad766a9 Oct 15 11:15:05 g9r62 :
> 0:0:vtm:DEBUG: vtm_create_transaction: transaction created tag
> 1255630506, type 4 Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG:
> vtm_create_transaction: transaction created tag 1255630507, type 4
> Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: Vtm_ProcNodeInfoReqMsg:
> runtime info request received from g9r62, tag 0x4ad766a9 Oct 15
> 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_create_transaction: transaction
> created tag 1255630508, type 3 Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG:
> Vtm_ProcVolMsg: vol message response message received ta_tag
> 1255630508 Oct 15 11:15:06 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg:
> NFX_EVENT_CPU, state 1, slot 1, cpu 0 Oct 15 11:15:06 g9r62 :
> 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu
> 1 Oct 15 11:15:07 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg:
> NFX_EVENT_CPU, state 1, slot 1, cpu 2 Oct 15 11:15:07 g9r62 :
> 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu
> 3 Oct 15 11:15:07 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg:
> NFX_EVENT_CPU, state 1, slot 1, cpu 4 Oct 15 11:15:08 g9r62 :
> 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu
> 5 Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg:
> NFX_EVENT_CPU, state 2, slot 1, cpu 0 Oct 15 11:15:08 g9r62 :
> 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu
> 1 Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg:
> NFX_EVENT_CPU, state 2, slot 1, cpu 2 Oct 15 11:15:08 g9r62 :
> 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu
> 3 Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg:
> NFX_EVENT_CPU, state 2, slot 1, cpu 4 Oct 15 11:15:08 g9r62 :
> 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu
> 5 Oct 15 11:15:08 g9r62 : 0:0:sanm:NOTICE: SANM: ONStor Data Mirror
> (c)2006: Started Oct 15 11:15:08 g9r62 : 0:0:sanm:ERROR: SANM: FP NIM
> down. Aborting all mirror sessions. Oct 15 11:15:08 g9r62 :
> 0:0:sanm:ERROR: SANM: FP NIM down. Aborting all mirror sessions. Oct
> 15 11:15:08 g9r62 : 0:0:pm:INFO: /onstor/bin/sanmd: finished
> initialization. Oct 15 11:15:08 g9r62 :
> 0:0:pm:INFO: /onstor/bin/cluster_relay: finished initialization. Oct
> 15 11:15:09 g9r62 : 0:0:pm:INFO: /onstor/bin/snmpd: finished
> initialization. Oct 15 11:15:09 g9r62 : 0:0:tape-driver:ERROR:
> tape_rpc: rmc_rpc app sdm cpu 0 slot 0 failed - -19 Oct 15 11:15:09
> g9r62 : 0:0:snmpd:INFO: UCD-SNMP version 4.2.2 Oct 15 11:15:09
> g9r62 : 0:0:nfxsh:DEBUG: cmd[0]: autosupport emrs show config :
> status[0] Oct 15 11:15:09 g9r62 : 0:0:asd:INFO: asd_main: auto
> support conf is TO , NOTETO , FROM g9r62@onstor.com, server 0.0.0.0,
> enable 0 Oct 15 11:15:09 g9r62 : 0:0:pm:INFO: /onstor/bin/asd:
> finished initialization. Oct 15 11:15:10 g9r62 :
> 0:0:pm:INFO: /onstor/bin/sscccc: finished initialization. Oct 15
> 11:15:10 g9r62 : 0:0:pm:INFO: /onstor/bin/crashsaved: finished
> initialization. Oct 15 11:15:10 g9r62 : 0:0:sscccc:INFO:
> no /onstor/etc/sscccc_hosts_deny file Oct 15 11:15:10 g9r62 :
> 0:0:sscccc:INFO: soft nofile resource limit = 1024 Oct 15 11:15:10
> g9r62 : 0:0:nfxsh:DEBUG: cmd[0]: system show chassis : status[0] Oct
> 15 11:15:10 g9r62 : 0:0:nfxsh:NOTICE: cmd[0]: -> EMRS: Not gathering
> h_res_stats: not all processors are up. : status[2] Oct 15 11:15:11
> g9r62 : 0:0:vtm:ERROR: vtm_gatherInfo_proc: gather info failed for
> request from g9r62 Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG:
> vtm_gatherInfo_proc: node info reply send at 0x4ad766ae Oct 15
> 11:15:11 g9r62 : 0:0:vtm:DEBUG: vtm_delete_transaction: transaction
> delete tag  1255630508, type 3 Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG:
> Vtm_ProcNodeInfoRplyMsg: runtime info response received from g9r62
> Oct 15 11:15:11 g9r62 : 0:0:vtm:ERROR: vtm_failover_vsvr_proc: no
> alternative filer found for vs 2 failover Oct 15 11:15:11 g9r62 :
> 0:0:vtm:DEBUG: vtm_delete_transaction: transaction delete tag
> 1255630505, type 4 Oct 15 11:15:11 g9r62 : 0:0:vtm:ERROR:
> vtm_failover_vsvr_proc: no alternative filer found for vs 3 failover
> Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG: vtm_delete_transaction:
> transaction delete tag  1255630506, type 4 Oct 15 11:15:11 g9r62 :
> 0:0:vtm:ERROR: vtm_failover_vsvr_proc: no alternative filer found for
> vs 4 failover Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG:
> vtm_delete_transaction: transaction delete tag  1255630507, type 4
> Oct 15 11:15:12 g9r62 : 0:0:eventd:WARNING: event_rpc_complete:
> failed to deliver event 4 to 1/1 Oct 15 11:15:12 g9r62 last message
> repeated 5 times Oct 15 11:15:12 g9r62 : 0:0:eventd:WARNING:
> event_rpc_complete: failed to deliver event 1 to 1/1 Oct 15 11:15:12
> g9r62 : 0:0:eventd:WARNING: event_rpc_complete: failed to deliver
> event 4 to 1/7 Oct 15 11:15:12 g9r62 last message repeated 5 times
> Oct 15 11:15:12 g9r62 : 0:0:eventd:WARNING: event_rpc_complete:
> failed to deliver event 1 to 1/7
> 
> OnStor GNU/Linux 4.0 g9r62 duart0
> 
> g9r62 login: root
> Password:
> Last login: Thu Oct 15 11:10:41 2009 on duart0
> Linux g9r62 2.6.22-cg #1 Thu Oct 15 05:03:08 PDT 2009 mips64
> 
> Welcome to the ONStor NAS Gateway.
> g9r62:~# nfx
> 
> Welcome to the ONStor NAS Gateway.
> 
> 10/15/09 11:15:18 g9r62 diag> system show chassis
> Oct 15 11:15:19 g9r62 : 0:0:tape-driver:ERROR: tape_rpc: rmc_rpc app
> sdm cpu 0 slot 0 failed - -19
> 
>  module     cpu         state
> ----------------------------------------------
>  SSC        SSC         UP
>  NFPNIM     TXRX0       DOWN
>             TXRX1       DOWN
>             FP0         DOWN
>             FP1         DOWN
>             FP2         DOWN
>             FP3         DOWN
> ----------------------------------------------
> 10/15/09 11:15:23 g9r62 diag> Oct 15 11:15:23 g9r62 :
> 0:0:nfxsh:NOTICE: cmd[0]: system show chassis  : status[0] Oct 15
> 11:15:25 g9r62 : 0:0:evm:DEBUG: evm_processTimeout: Timeout
> processing txn[0x2bb39158]  state[1] function[0x43e0c0]
> tag[0xa0020001] Oct 15 11:15:25 g9r62 : 0:0:evm:ERROR:
> evm_unfreezeAllLvReq: FP unfreeze all volumes failed, rc[-5627] Oct
> 15 11:15:26 g9r62 : 0:0:evm:DEBUG: Error string[Request timeout
> error. Operation failed.] len[40] Oct 15 11:15:29 g9r62 :
> 0:0:tape-driver:ERROR: tape_rpc: rmc_rpc app sdm cpu 0 slot 0 failed
> - -19
> 
> Thank you,
> John Keiffer
> 408-376-3106
> LSI - QA Engineer
> john.keiffer@lsi.com
> 
