X-Sylpheed-Account-Id:2
S:andy.sharp@lsi.com
SCF:#mh/Mailbox/sent
X-Sylpheed-Sign:0
X-Sylpheed-Encrypt:0
X-Sylpheed-Privacy-System:
RMID:#imap/LSI/INBOX	0	D7A889C980962746B30DE07864593C02CC69E3C3@cosmail02.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 15 Oct 2009 13:42:04 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Limato, Dave" <Dave.Limato@lsi.com>
Cc: "Keiffer, John" <John.Keiffer@lsi.com>, DL-ONStor-QA <dl-qa@lsi.com>,
 DL-ONStor-Engineering <dl-engineering@lsi.com>
Subject: Re: Help stopping my filer from crashing
Message-ID: <20091015134204.11546417@ripper.onstor.net>
References: <85A1D09038E3C1438820EF7A7FAFDD3001077EA433@cosmail01.lsi.com>
	<20091015132631.1052f078@ripper.onstor.net>
	<D7A889C980962746B30DE07864593C02CC69E3C3@cosmail02.lsi.com>
Organization: LSI
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 15 Oct 2009 14:38:51 -0600 "Limato, Dave" <Dave.Limato@lsi.com>
wrote:

> Disconnect the FP ports, bring the box up, and connect one at a time.

My gods, man, you are a true professional.

> -----Original Message-----
> From: Andrew Sharp [mailto:andy.sharp@lsi.com]
> Sent: Thursday, October 15, 2009 1:27 PM
> To: Keiffer, John
> Cc: DL-ONStor-QA; DL-ONStor-Engineering
> Subject: Re: Help stopping my filer from crashing
> 
> Looks to me like the FP is down pretty early in the initialization
> sequence.  Tape driver messages in there.  Can that be a coincidence?
> 
> On Thu, 15 Oct 2009 12:19:18 -0600 "Keiffer, John"
> <John.Keiffer@lsi.com> wrote:
> 
> > It's stuck and crashes right after reboot. Can anyone see what the
> > problem is, or does anyone know what to do?
> >
> > Oct 15 11:14:55 g9r62 : 0:0:cluster2:NOTICE: Using 10.2.62.9 as my primary address
> > Oct 15 11:14:55 g9r62 : 0:0:cluster2:INFO: ClusterCtrl_iUpdateState: post PCC up pccname g9r62
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:WARNING: Process-EVENT 0.0.0.0: Mgmt Port 0.0.0.0 PCC, State Up
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 1:4:scsi:INFO: 2: ispfc:sp2.2: ISPFC_CS_PDB_CHANGED,[8014] Global Event
> > Oct 15 11:14:55 g9r62 : 0:0:cluster2:NOTICE: ubik init with buff size 64
> > Oct 15 11:14:55 g9r62 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc: Error sending rpc to clusterrpc, flags 41a, name cluster2, rc -19, retrying...
> > Oct 15 11:14:55 g9r62 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc: Retry worked to clusterrpc, flags 472, name cluster2
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 0, State Down
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 1, State Down
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 2, State Down
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 3, State Down
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:55 g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 4, State Down
> > Oct 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:56 g9r62 : 0:0:eventd:WARNING: Process-EVENT CPU: Slot 1, CPU 5, State Down
> > Oct 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:56 g9r62 : 0:0:cluster2:INFO: ClusterCtrl_iSyncFilerInDb: remove old information from cluster started
> > Oct 15 11:14:56 g9r62 : 0:0:cluster2:INFO: ClusterCtrl_iSyncFilerInDb: remove old information from cluster finished, code 0
> > Oct 15 11:14:56 g9r62 : 0:0:cluster2:INFO: ClusterCtrl_readyRestoreFiler: data base ready to restore
> > Oct 15 11:14:56 g9r62 : 0:0:cluster2:WARNING: ClusterCtrl_iUpdateState: PCC down pccname g9r62
> > Oct 15 11:14:56 g9r62 : 0:0:pm:INFO: /onstor/bin/cluster_server: finished initialization.
> > Oct 15 11:14:56 g9r62 : 0:0:cluster2:INFO: ClusterCtrl_iUpdateState: post PCC up pccname g9r62
> > Oct 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: > ems_logEvent()
> > Oct 15 11:14:56 g9r62 : 0:0:eventd:WARNING: Process-EVENT 0.0.0.0: Mgmt Port 0.0.0.0 PCC, State Up
> > Oct 15 11:14:56 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
> > Oct 15 11:14:56 g9r62 : 0:0:sdm:INFO: ONStor Storage Device Manager (c)2009: Started
> > Oct 15 11:14:56 g9r62 : 0:0:pm:INFO: /onstor/bin/sdm_cfgd: finished initialization.
> > Oct 15 11:14:56 g9r62 : 0:0:sdm:NOTICE: FP CPU Down Event
> > Oct 15 11:14:56 g9r62 : 0:0:evm:DEBUG: EVM daemon is (re)starting...
> > Oct 15 11:14:56 g9r62 : 0:0:evm:DEBUG: EVM myNfx: cluster[g9r62] node[g9r62] chassisId[2158]
> > Oct 15 11:14:57 g9r62 : 0:0:pm:INFO: /onstor/bin/evm_cfgd: finished initialization.
> > Oct 15 11:14:57 g9r62 : 0:0:pm:INFO: /onstor/bin/ea: finished initialization.
> > Oct 15 11:14:58 g9r62 : 0:0:evm:INFO: main: Cluster version[30330]
> > Oct 15 11:14:59 g9r62 : 0:0:evm:INFO: evm_sendScanReq: Requesting SPM scan
> > Oct 15 11:14:59 g9r62 : 0:0:evm:DEBUG: evm_procCpuEvent : Received NFP cpu UP event.
> > Oct 15 11:14:59 g9r62 : 0:0:evm:DEBUG: evm_procCpuEvent : Received NFP cpu down event.
> > Oct 15 11:14:59 g9r62 : 0:0:spm:INFO: main: Cluster version[30330]
> > Oct 15 11:14:59 g9r62 : 0:0:pm:INFO: /onstor/bin/spm: finished initialization.
> > Oct 15 11:14:59 g9r62 : 0:0:pm:INFO: /onstor/bin/ipmd: finished initialization.
> > Oct 15 11:14:59 g9r62 : 0:0:pm:INFO: /onstor/bin/tape-driver: finished initialization.
> > Oct 15 11:15:00 g9r62 : 0:0:ssc_ndmp:INFO: ndmp_GetFPState:  fp card is DOWN
> > Oct 15 11:15:00 g9r62 : 0:0:pm:INFO: /onstor/bin/ndmp_cfgd: finished initialization.
> > Oct 15 11:15:01 g9r62 : 0:0:spm:ERROR: spm_processPccEvent : Duplicate PCC event
> > Oct 15 11:15:01 g9r62 : 0:0:spm:ERROR: spm_processPccEvent : Duplicate PCC event
> > Oct 15 11:15:01 g9r62 : 0:0:auth_agent:NOTICE: main: auth-agent started
> > Oct 15 11:15:01 g9r62 : 0:0:pm:INFO: /onstor/bin/auth-agent: finished initialization.
> > Oct 15 11:15:03 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=1, file=71, #masterRec=0
> > Oct 15 11:15:03 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=1, file=72, #masterRec=0
> > Oct 15 11:15:03 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=1, file=74, #masterRec=0
> > Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=2, file=71, #masterRec=0
> > Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=2, file=72, #masterRec=0
> > Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=2, file=74, #masterRec=0
> > Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=3, file=71, #masterRec=0
> > Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=3, file=72, #masterRec=0
> > Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=3, file=74, #masterRec=0
> > Oct 15 11:15:04 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=4, file=71, #masterRec=0
> > Oct 15 11:15:05 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=4, file=72, #masterRec=0
> > Oct 15 11:15:05 g9r62 : 0:0:vsd:INFO: vsd_ensureNisFileCoherence[1230] : vs=4, file=74, #masterRec=0
> > Oct 15 11:15:05 g9r62 : 0:0:nfxsh:DEBUG: cmd[0]: autosupport emrs show config : status[0]
> > Oct 15 11:15:05 g9r62 : 0:0:pm:INFO: /onstor/bin/vsd: finished initialization.
> > Oct 15 11:15:05 g9r62 : 0:0:vsd:ERROR: vsd_procTxnEvent : Event 1 detected for txn, state 1
> > Oct 15 11:15:05 g9r62 : 0:0:vsd:ERROR: vsd_sendIOTimeoutCfg : Cannot set the IO timeout values on FP, status 24
> > Oct 15 11:15:05 g9r62 : 0:0:vsd:ERROR: vsd_initFpCardProc : Cannot set the IO timeout values on FP, status 24
> > Oct 15 11:15:05 g9r62 : 0:0:cluster2:INFO: ClusterCtrl_ReleaseFiler: called by vtm
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_get_filer_config_and_start_vsvr_trans: start collecting failover vsvr, post event count 0
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_get_filer_config_and_start_vsvr_trans: end collecting failover vsvr
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:INFO: vtm_sendCardStateMsg: Sending card DOWN to g9r62
> > Oct 15 11:15:05 g9r62 : 0:0:pm:INFO: /onstor/bin/vtmd: finished initialization.
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:INFO: Vtm_ProcCardStateMsg: card state DOWN message from g9r62
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_create_transaction: transaction created tag 1255630505, type 4
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_request_node_info: node info request send at 0x4ad766a9
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_create_transaction: transaction created tag 1255630506, type 4
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_create_transaction: transaction created tag 1255630507, type 4
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: Vtm_ProcNodeInfoReqMsg: runtime info request received from g9r62, tag 0x4ad766a9
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: vtm_create_transaction: transaction created tag 1255630508, type 3
> > Oct 15 11:15:05 g9r62 : 0:0:vtm:DEBUG: Vtm_ProcVolMsg: vol message response message received ta_tag 1255630508
> > Oct 15 11:15:06 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu 0
> > Oct 15 11:15:06 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu 1
> > Oct 15 11:15:07 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu 2
> > Oct 15 11:15:07 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu 3
> > Oct 15 11:15:07 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu 4
> > Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 1, slot 1, cpu 5
> > Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu 0
> > Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu 1
> > Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu 2
> > Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu 3
> > Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu 4
> > Oct 15 11:15:08 g9r62 : 0:0:vtm:NOTICE: Vtm_ProcEventMsg: NFX_EVENT_CPU, state 2, slot 1, cpu 5
> > Oct 15 11:15:08 g9r62 : 0:0:sanm:NOTICE: SANM: ONStor Data Mirror (c)2006: Started
> > Oct 15 11:15:08 g9r62 : 0:0:sanm:ERROR: SANM: FP NIM down. Aborting all mirror sessions.
> > Oct 15 11:15:08 g9r62 : 0:0:sanm:ERROR: SANM: FP NIM down. Aborting all mirror sessions.
> > Oct 15 11:15:08 g9r62 : 0:0:pm:INFO: /onstor/bin/sanmd: finished initialization.
> > Oct 15 11:15:08 g9r62 : 0:0:pm:INFO: /onstor/bin/cluster_relay: finished initialization.
> > Oct 15 11:15:09 g9r62 : 0:0:pm:INFO: /onstor/bin/snmpd: finished initialization.
> > Oct 15 11:15:09 g9r62 : 0:0:tape-driver:ERROR: tape_rpc: rmc_rpc app sdm cpu 0 slot 0 failed - -19
> > Oct 15 11:15:09 g9r62 : 0:0:snmpd:INFO: UCD-SNMP version 4.2.2
> > Oct 15 11:15:09 g9r62 : 0:0:nfxsh:DEBUG: cmd[0]: autosupport emrs show config : status[0]
> > Oct 15 11:15:09 g9r62 : 0:0:asd:INFO: asd_main: auto support conf is TO , NOTETO , FROM g9r62@onstor.com, server 0.0.0.0, enable 0
> > Oct 15 11:15:09 g9r62 : 0:0:pm:INFO: /onstor/bin/asd: finished initialization.
> > Oct 15 11:15:10 g9r62 : 0:0:pm:INFO: /onstor/bin/sscccc: finished initialization.
> > Oct 15 11:15:10 g9r62 : 0:0:pm:INFO: /onstor/bin/crashsaved: finished initialization.
> > Oct 15 11:15:10 g9r62 : 0:0:sscccc:INFO: no /onstor/etc/sscccc_hosts_deny file
> > Oct 15 11:15:10 g9r62 : 0:0:sscccc:INFO: soft nofile resource limit = 1024
> > Oct 15 11:15:10 g9r62 : 0:0:nfxsh:DEBUG: cmd[0]: system show chassis : status[0]
> > Oct 15 11:15:10 g9r62 : 0:0:nfxsh:NOTICE: cmd[0]: -> EMRS: Not gathering h_res_stats: not all processors are up. : status[2]
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:ERROR: vtm_gatherInfo_proc: gather info failed for request from g9r62
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG: vtm_gatherInfo_proc: node info reply send at 0x4ad766ae
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG: vtm_delete_transaction: transaction delete tag  1255630508, type 3
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG: Vtm_ProcNodeInfoRplyMsg: runtime info response received from g9r62
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:ERROR: vtm_failover_vsvr_proc: no alternative filer found for vs 2 failover
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG: vtm_delete_transaction: transaction delete tag 1255630505, type 4
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:ERROR: vtm_failover_vsvr_proc: no alternative filer found for vs 3 failover
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG: vtm_delete_transaction: transaction delete tag 1255630506, type 4
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:ERROR: vtm_failover_vsvr_proc: no alternative filer found for vs 4 failover
> > Oct 15 11:15:11 g9r62 : 0:0:vtm:DEBUG: vtm_delete_transaction: transaction delete tag  1255630507, type 4
> > Oct 15 11:15:12 g9r62 : 0:0:eventd:WARNING: event_rpc_complete: failed to deliver event 4 to 1/1
> > Oct 15 11:15:12 g9r62 last message repeated 5 times
> > Oct 15 11:15:12 g9r62 : 0:0:eventd:WARNING: event_rpc_complete: failed to deliver event 1 to 1/1
> > Oct 15 11:15:12 g9r62 : 0:0:eventd:WARNING: event_rpc_complete: failed to deliver event 4 to 1/7
> > Oct 15 11:15:12 g9r62 last message repeated 5 times
> > Oct 15 11:15:12 g9r62 : 0:0:eventd:WARNING: event_rpc_complete: failed to deliver event 1 to 1/7
> >
> > OnStor GNU/Linux 4.0 g9r62 duart0
> >
> > g9r62 login: root
> > Password:
> > Last login: Thu Oct 15 11:10:41 2009 on duart0
> > Linux g9r62 2.6.22-cg #1 Thu Oct 15 05:03:08 PDT 2009 mips64
> >
> > Welcome to the ONStor NAS Gateway.
> > g9r62:~# nfx
> >
> > Welcome to the ONStor NAS Gateway.
> >
> > 10/15/09 11:15:18 g9r62 diag> systOct 15 11:15:19 g9r62 :
> > 0:0:tape-driver:ERROR: tape_rpc: rmc_rpc app sdm cpu 0 slot 0 failed
> > - -19 em show chassis
> >
> >  module     cpu         state
> > ----------------------------------------------
> >  SSC        SSC         UP
> >  NFPNIM     TXRX0       DOWN
> >             TXRX1       DOWN
> >             FP0         DOWN
> >             FP1         DOWN
> >             FP2         DOWN
> >             FP3         DOWN
> > ----------------------------------------------
> > 10/15/09 11:15:23 g9r62 diag>
> > Oct 15 11:15:23 g9r62 : 0:0:nfxsh:NOTICE: cmd[0]: system show chassis  : status[0]
> > Oct 15 11:15:25 g9r62 : 0:0:evm:DEBUG: evm_processTimeout: Timeout processing txn[0x2bb39158]  state[1] function[0x43e0c0] tag[0xa0020001]
> > Oct 15 11:15:25 g9r62 : 0:0:evm:ERROR: evm_unfreezeAllLvReq: FP unfreeze all volumes failed, rc[-5627]
> > Oct 15 11:15:26 g9r62 : 0:0:evm:DEBUG: Error string[Request timeout error. Operation failed.] len[40]
> > Oct 15 11:15:29 g9r62 : 0:0:tape-driver:ERROR: tape_rpc: rmc_rpc app sdm cpu 0 slot 0 failed - -19
> >
> > Thank you,
> > John Keiffer
> > 408-376-3106
> > LSI - QA Engineer
> > john.keiffer@lsi.com
> >
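
For anyone who ends up reading one of these boot logs after a mail
client has rewrapped it: below is a rough Python sketch (mine, not an
ONStor tool) that strips the quote markers, splits the text back into
one entry per timestamp, and keeps only the WARNING and ERROR entries.
The function names, the default quote depth of 2, and the
"Mon DD HH:MM:SS host" prefix are assumptions taken from the log quoted
in this thread. It needs Python 3.7 or later, since it splits on a
zero-width match.

```python
import re

# Zero-width match just before a syslog prefix such as "Oct 15 11:14:55 g9r62 ".
# Assumption: every entry starts with "Mon DD HH:MM:SS host ".
ENTRY_START = re.compile(r"(?=\b[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \S+ )")

# Severity field as it appears in these logs, e.g. "0:0:eventd:WARNING:".
SEVERITY = re.compile(r":(DEBUG|INFO|NOTICE|WARNING|ERROR):")


def split_entries(text, depth=2):
    """Undo mail rewrapping: return one string per log entry.

    depth is the number of leading '>' quote markers to strip from each
    line. A fixed depth is used so that a '>' belonging to the log text
    itself (as in "> ems_logEvent()") is not eaten.
    """
    quote = re.compile(r"^(?:> ?){%d}" % depth)
    lines = [quote.sub("", line).strip() for line in text.splitlines()]
    blob = " ".join(line for line in lines if line)
    return [part.strip() for part in ENTRY_START.split(blob) if part.strip()]


def by_severity(entries, levels=("WARNING", "ERROR")):
    """Keep only the entries whose severity field is in levels."""
    kept = []
    for entry in entries:
        match = SEVERITY.search(entry)
        if match and match.group(1) in levels:
            kept.append(entry)
    return kept


# A few wrapped lines in the same shape as the quoted log.
SAMPLE = """\
> > Oct 15 11:14:55 g9r62 : 0:0:cluster2:NOTICE: Using 10.2.62.9 as my
> > primary address Oct 15 11:14:55 g9r62 : 0:0:cluster2:INFO:
> > ClusterCtrl_iUpdateState: post PCC up pccname g9r62 Oct 15 11:14:55
> > g9r62 : 0:0:eventd:DEBUG: > ems_logEvent() Oct 15 11:14:55 g9r62 :
> > 0:0:eventd:WARNING: Process-EVENT 0.0.0.0: Mgmt Port 0.0.0.0 PCC,
> > State Up Oct 15 11:14:55 g9r62 : 0:0:eventd:DEBUG: < ems_logEvent()
"""

if __name__ == "__main__":
    for entry in by_severity(split_entries(SAMPLE)):
        print(entry)
```

It will also split on a date that appears inside a message body (the
"Last login" banner on the console, for instance), so feed it only the
syslog portion.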
