AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<Raj.Kumar@lsi.com>,<Dave.Limato@lsi.com>,<Larry.Scheer@lsi.com>,<maxim.kozlovsky@lsi.com>,<Ed.Kwan@lsi.com>,<Danqing.Jin@lsi.com>,<Richard.Hardiman@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	0BAA09DBFAD04A4DBB6CE240807CB3B9010C7FE3AE@cosmail03.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 25 Mar 2010 14:58:48 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Kumar, Raj" <Raj.Kumar@lsi.com>
Cc: "Limato, Dave" <Dave.Limato@lsi.com>, "Scheer, Larry"
 <Larry.Scheer@lsi.com>, Maxim Kozlovsky <maxim.kozlovsky@lsi.com>, "Kwan,
 Ed" <Ed.Kwan@lsi.com>, "Jin, Danqing" <Danqing.Jin@lsi.com>, "Hardiman,
 Richard" <Richard.Hardiman@lsi.com>
Subject: Re: s_home backing up.
Message-ID: <20100325145848.001dfea0@ripper.onstor.net>
In-Reply-To: <0BAA09DBFAD04A4DBB6CE240807CB3B9010C7FE3AE@cosmail03.lsi.com>
References: <D7A889C980962746B30DE07864593C02CF29B25A@cosmail02.lsi.com>
	<0BAA09DBFAD04A4DBB6CE240807CB3B9010C7FE3A7@cosmail03.lsi.com>
	<D7A889C980962746B30DE07864593C02CF29B2E0@cosmail02.lsi.com>
	<0BAA09DBFAD04A4DBB6CE240807CB3B9010C7FE3AE@cosmail03.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Yeah.  The age-old issue is that RMC "reliable" send isn't actually reliable.

[sorry my pos email client garbled the log text]
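
If a mail client wraps syslog output like this again, the records can be re-split mechanically. The following is a minimal Python sketch. It assumes every record begins with a prefix like "Mar 25 13:48:17 Dogfood :", as in the quoted log below. The names split_entries and summarize are hypothetical helpers, not part of any ONStor tool.

```python
import re
from collections import Counter

# Assumption: each record starts with a syslog-style prefix such as
# "Mar 25 13:48:17 Dogfood :" (month, day, time, host, colon).
# The zero-width lookahead keeps that prefix attached to its record.
ENTRY_START = re.compile(r"(?=\b[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ : )")
# Matches e.g. " : 1:2:efs:ERROR:" and captures ("efs", "ERROR").
SOURCE_LEVEL = re.compile(r" : \d+:\d+:(\w+):([A-Z]+):")
# Matches the RMC header sequence number, e.g. "seq[0xedb8]".
SEQ = re.compile(r"seq\[(0x[0-9a-f]+)\]")


def split_entries(blob):
    """Undo mail quoting and line wrapping, then return one string per record."""
    lines = (re.sub(r"^>\s?", "", ln).strip() for ln in blob.splitlines())
    text = " ".join(ln for ln in lines if ln)
    # Splitting on a zero-width pattern requires Python 3.7 or later.
    return [e.strip() for e in ENTRY_START.split(text) if e.strip()]


def summarize(entries):
    """Tally records per (source, level) and collect RMC failure details."""
    levels = Counter()
    seqs = set()
    for e in entries:
        m = SOURCE_LEVEL.search(e)
        if m:
            levels[m.groups()] += 1
        s = SEQ.search(e)
        if s:
            seqs.add(s.group(1))
    undelivered = sum("RMC Reliable send was not delivered" in e for e in entries)
    return {
        "levels": levels,
        "undelivered": undelivered,
        "seqs": sorted(seqs, key=lambda h: int(h, 16)),
    }
```

Run over the quoted log below, it should count eight undelivered RMC reliable sends, with message headers for seq 0xedb4 through 0xedbb.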

On Thu, 25 Mar 2010 15:52:41 -0600 "Kumar, Raj" <Raj.Kumar@lsi.com>
wrote:

> The failure is due to the age-old RMC-related issue.
> 
> Mar 25 13:43:43 Dogfood : 1:2:efs:INFO: 7834: FS: s_home       0x10200000187 - dumpStart - dump_restore - dump progress[3]: total records: 65924712; dumped: 177462; remaining: 65747250; estimated time remaining: 268:34:32; percent complete: 0%; throughput: 4386816 bytes sec
> Mar 25 13:43:43 Dogfood : 1:2:efs:INFO: 7835: FS: s_home       0x10200000187 - dumpStart - dump_restore - [3] dump LOG message: Thu Mar 25 13:43:43 2010 dump progress[3]: total records: 65924712; dumped: 177462; remaining: 65747250; estimated time remaining: 268:34:32; percent complete: 0%; throughput: 4386816 bytes sec
> Mar 25 13:46:22 Dogfood : 0:0:nfxsh:NOTICE: cmd[1]: ndmp stat : status[2]
> Mar 25 13:46:31 Dogfood : 0:0:nfxsh:NOTICE: cmd[2]: ndmp sho stat : status[13]
> Mar 25 13:46:41 Dogfood : 0:0:nfxsh:NOTICE: cmd[3]: vsvr set mightydog : status[0]
> Mar 25 13:46:46 Dogfood : 0:0:nfxsh:NOTICE: cmd[4]: ndmp sho stat : status[0]
> Mar 25 13:48:16 Dogfood : 1:3:ndmp:INFO: 7836: ndmp_rmcCloseCallBack:172: Close call back called for 0x10568c3000
> Mar 25 13:48:16 Dogfood : 1:3:ndmp:INFO: 7837: ndmp_rmcCloseCallBack:207: Setting RMCSession of servp (0x1034664000) to NULL.
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7838: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e0aa0]
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7839: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedb8] len[24536d]
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7840: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e1220]
> Mar 25 13:48:17 Dogfood : 1:4:efs:ERROR: 7841: FS: s_home       0x10200000187 - dumpStart - dump_restore - fs_dr_rmcSendMessage: rmc send failed. Looks like session has gone
> Mar 25 13:48:17 Dogfood : 1:4:efs:ERROR: 7842: FS: s_home       0x10200000187 - dumpStart - dump_restore - fs_dumpSendFileHistory: rmc send message failed
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7843: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedb9] len[24536d]
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7844: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e0000]
> Mar 25 13:48:17 Dogfood : 1:4:efs:INFO: 7845: FS: s_home       0x10200000187 - dumpStart - dump_restore - dump [3]: Pass III complete with dirinodes 4841499 ndumped 2710610 ino: 41071320
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7846: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedba] len[24568d]
> Mar 25 13:48:17 Dogfood : 1:4:efs:INFO: 7847: FS: s_home       0x10200000187 - dumpStart - dump_restore - [3] dump LOG message: Thu Mar 25 13:48:17 2010 DUMP ABORTED due to i/o error
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7848: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e9cc0]
> Mar 25 13:48:17 Dogfood : 1:4:efs:ERROR: 7849: FS: s_home       0x10200000187 - dumpStart - dump_restore - fs_dr_rmcSendMessage: rmc send failed. Looks like session has gone
> Mar 25 13:48:17 Dogfood : 1:2:efs:ERROR: 7850: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedbb] len[24568d]
> Mar 25 13:48:17 Dogfood : 1:4:efs:ERROR: 7851: FS: s_home       0x10200000187 - dumpStart - dump_restore - fs_dumpSendLog: rmc send message failed
> Mar 25 13:48:18 Dogfood : 1:2:efs:ERROR: 7852: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e9e00]
> Mar 25 13:48:18 Dogfood : 1:2:efs:ERROR: 7853: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedb4] len[24568d]
> Mar 25 13:48:18 Dogfood : 1:2:efs:ERROR: 7854: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e0640]
> Mar 25 13:48:18 Dogfood : 1:2:efs:ERROR: 7855: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedb5] len[24536d]
> Mar 25 13:48:18 Dogfood : 1:2:efs:ERROR: 7856: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e1900]
> Mar 25 13:48:18 Dogfood : 1:2:efs:ERROR: 7857: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedb6] len[24568d]
> Mar 25 13:48:18 Dogfood : 1:4:efs:INFO: 7858: FS: s_home       0x10200000187 - dumpStart - dump_restore - [3] io output stats: dump paused due to ndmp/tape flow control 45992 (usec)
> Mar 25 13:48:18 Dogfood : 1:2:efs:ERROR: 7859: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Reliable send was not delivered: Context[0x1044f18000] msg[0x10029e1360]
> Mar 25 13:48:18 Dogfood : 1:4:efs:INFO: 7860: FS: s_home       0x10200000187 - dumpStart - dump_restore - [3] io output stats: recs written 193128 bytes written 12459073536
> Mar 25 13:48:19 Dogfood : 1:2:efs:ERROR: 7861: FS: s_home       0x10200000187 - dumpStart - dump_restore - RMC Msg Header Contents: flags[0x1004] stat[-19] sess[0x0] seq[0xedb7] len[24568d]
> Mar 25 13:48:19 Dogfood : 1:4:efs:INFO: 7862: FS: s_home       0x10200000187 - dumpStart - dump_restore - [3] Pass III stats: nDirs 4841499 blks 2764936 compact 1812036
> Mar 25 13:48:19 Dogfood : 1:5:efs:INFO: 7863: FS: s_home       0x10200000187 - dumpStart - snp - snapshot dump_3 removal initiated
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: sendRequestToFp:  RMC message to FP failed: mid:9611 rc:-3
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: ndmp_procSession: Sess 1269399515: Error communicating with FP; aborting session
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: sendRequestToFp:  RMC message to FP failed: mid:9605 rc:-3
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: ndmp_sendCloseSessToNNIM: Message send to FP to CLOSE SESSION failed, sessId 1269399515
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: tape_send_multi: waitfor_tape_msg failed - status:-19
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: tape_rpc_multi: tape_send_multi failed - app tape-driver cpu 0 slot 0 - -19
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: sndReqToTapeDrv: tape_rpc_multi failed
> Mar 25 13:48:37 Dogfood : 0:0:ssc_ndmp:ERROR: ndmp_closeTapeDev: Message send to close tape failed
> 
> From: Limato, Dave
> Sent: Thursday, March 25, 2010 2:46 PM
> To: Kumar, Raj; Scheer, Larry; Sharp, Andy; Kwan, Ed; Jin, Danqing; Hardiman, Richard
> Subject: RE: s_home backing up.
> 
> I checked with Mai; the last backup failed. I am going to monitor this
> session today and see what really happened on the last backup. We
> have IT folks in the building; I wonder if they are messing with the
> network?
>
> Backup job ID #6 failed, right? Fail code 99 is a network connection
> failure. I just restarted the backup; it is still running.
> Mai
> 
> 
> From: Kumar, Raj
> Sent: Thursday, March 25, 2010 2:41 PM
> To: Limato, Dave; Scheer, Larry; Sharp, Andy; Kwan, Ed; Jin, Danqing; Hardiman, Richard
> Subject: RE: s_home backing up.
> 
> On the filer, the KPI stats for SCSI reads and writes don't look as
> bad as soak's. I suspect we will not see a big queue on the array side.
> 
> From: Limato, Dave
> Sent: Thursday, March 25, 2010 1:12 PM
> To: Scheer, Larry; Sharp, Andy; Kwan, Ed; Jin, Danqing; Hardiman, Richard; Kumar, Raj
> Subject: s_home backing up.
> 
> Let's see if I can keep this one going. DMA config issues prevented
> this before (ran out of space). Let's get the IP address of the array
> so we can look in and run the infamous luall.
> 
> I think there might be a rush to judgment with that command, but let's
> follow it through and see what the Storage teams say.
> 
> Session Status:
> ==============
> 
> SessId: 1269399515
> ======================
>  Virtual Server      Ver  Client Address   Start Time         Elapsed Time
>  ------------------  ---  ---------------  -----------------  ----------------
>  MIGHTYDOG           4    10.2.57.200      12:56:05 03-25-10  00 days 00:10:15
> 
>  Device                                                Mode
>  ----------------------------------------------------  ----------
>  NRNU526h                                              Read/Write
> 
>  Mover State  Data State  Transferred(MB)  Throughput(MB/s)
>  -----------  ----------  ---------------  ----------------
>  Active       Active      861              6.1
> 
>  Operation  Status     Est Size(MB)     Level  Est Time Remain  Completed(%)
>  ---------  ---------  ---------------  -----  ---------------  ------------
>  BACKUP     Active     4055914          0      162:01:21        0.02
> 
>  Path
>  --------------------------------------
>  /s_home/
> 
> 
> 
> 
> Dave Limato - Sr. QA Engineer - LSI Corporation - ONStor Product Test
> desk 408-433-8742  - cell 510.329.9994 -- dave.limato@lsi.com
> 
