AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<Brian.Stark@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	E1EC65251D4B3D46BBC0AAA3C0629222B2392CF9@cosmail02.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 20 Jan 2010 09:53:42 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Stark, Brian" <Brian.Stark@lsi.com>
Subject: Re: CF Cards from Migration
Message-ID: <20100120095342.6b080ed1@ripper.onstor.net>
In-Reply-To: <E1EC65251D4B3D46BBC0AAA3C0629222B2392CF9@cosmail02.lsi.com>
References: <B50ED5C0A7967343B07C1764EE1D7BD1F4A0962D@cosmail01.lsi.com>
	<20100114185615.25e76b9a@ripper.onstor.net>
	<E1EC65251D4B3D46BBC0AAA3C0629222B2392CF9@cosmail02.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 19 Jan 2010 22:07:17 -0700 "Stark, Brian" <Brian.Stark@lsi.com>
wrote:

> Have you received any feedback on this?  I think you hit the nail on
> the head...


You're the first.

> 
> -----Original Message-----
> From: Andrew Sharp [mailto:andy.sharp@lsi.com] 
> Sent: Thursday, January 14, 2010 6:56 PM
> To: Duffy, Bill
> Cc: Kwan, Ed; Jin, Danqing; Limato, Dave; Thiessen, Joachim; Collins,
> Caeli; Keiffer, John; Scheer, Larry; Boulanger, Sandrine; Currin,
> Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris
> Subject: Re: CF Cards from Migration
> 
> Keep in mind that we're all LSI.  4.0.2.10 is our official release
> right now, no?  We don't have the resources to support a half-dozen
> different releases; it's already hurting us with just the small number
> of different releases involved in this schemozzle.
> 
> Given that, I would say that, being sensitive to [internal] customer
> fears, we need to make it clear that there isn't necessarily a
> "choice" for them to make, beyond the choice to migrate or not.  Just
> as in LSI storage arrays, the NAS gateway isn't 100% bug free, and we
> have to be able to advance the releases without hindrance from the
> [internal] customer.  Put another way, it's really our decision, we
> are the experts.  If problems crop up, we support, fix, or work around
> them as usual.  If we the experts believe 4.0.2.10 is the correct
> thing to do, then that's it.
> 
> Thoughts?
> 
> 
> On Thu, 14 Jan 2010 18:10:12 -0700 "Duffy, Bill" <Bill.Duffy@lsi.com>
> wrote:
> 
> > LSI said no way
> > 
> > ________________________________
> > From: Kwan, Ed
> > To: Duffy, Bill; Jin, Danqing; Limato, Dave; Thiessen, Joachim;
> > Collins, Caeli; Keiffer, John; Scheer, Larry; Sharp, Andy;
> > Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris
> > Sent: Thu Jan 14 18:09:43 2010
> > Subject: RE: CF Cards from Migration
> > 
> > Why wasn't 4.0.2.10 installed in the first place?
> > 
> > From: Duffy, Bill
> > Sent: Thursday, January 14, 2010 5:07 PM
> > To: Jin, Danqing; Limato, Dave; Thiessen, Joachim; Kwan, Ed;
> > Collins, Caeli; Keiffer, John; Scheer, Larry; Sharp, Andy;
> > Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris
> > Subject: Re: CF Cards from Migration
> > 
> > 
> > So 4.0.2.10 logged in as root should be the only release used for
> > transition going forward. Correct?
> > 
> > ________________________________
> > From: Jin, Danqing
> > To: Limato, Dave; Thiessen, Joachim; Kwan, Ed; Collins, Caeli;
> > Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> > Bill
> > Sent: Thu Jan 14 17:58:30 2010
> > Subject: RE: CF Cards from Migration
> > 
> > Yes, "system version -s" and "system copy all" both failed on
> > minonstor1.
> > 
> > From: Limato, Dave
> > Sent: Thursday, January 14, 2010 4:58 PM
> > To: Thiessen, Joachim; Kwan, Ed; Collins, Caeli; Jin, Danqing;
> > Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> > Bill
> > Subject: RE: CF Cards from Migration
> > 
> > It looks like "system version -s" and "system copy all -I" failed
> > around 14:23, which I think indicates trouble mounting the other
> > flash.
> > 
> > From: Thiessen, Joachim
> > Sent: Thursday, January 14, 2010 4:56 PM
> > To: Kwan, Ed; Collins, Caeli; Jin, Danqing; Keiffer, John; Limato,
> > Dave; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> > Bill
> > Subject: RE: CF Cards from Migration
> > 
> > Would a "system copy all -i"  or "system config copy" expose the
> > problem that we booted from the wrong CF card?
> > 
> > From: Kwan, Ed
> > Sent: Thursday, January 14, 2010 4:39 PM
> > To: Collins, Caeli; Jin, Danqing; Keiffer, John; Limato, Dave;
> > Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> > Bill; Thiessen, Joachim
> > Subject: RE: CF Cards from Migration
> > 
> > 4.0.2.8 and 4.0.2.4
> > 
> > From: Collins, Caeli
> > Sent: Thursday, January 14, 2010 4:37 PM
> > To: Kwan, Ed; Jin, Danqing; Keiffer, John; Limato, Dave; Scheer,
> > Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> > Bill; Thiessen, Joachim
> > Subject: RE: CF Cards from Migration
> > 
> > What release were they running?
> > 
> > Caeli
> > 
> > From: Kwan, Ed
> > Sent: Thursday, January 14, 2010 4:27 PM
> > To: Jin, Danqing; Keiffer, John; Limato, Dave; Scheer, Larry; Sharp,
> > Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris;
> > Collins, Caeli
> > Subject: RE: CF Cards from Migration
> > 
> > CC'ing Caeli.
> > 
> > From: Jin, Danqing
> > Sent: Thursday, January 14, 2010 4:21 PM
> > To: Keiffer, John; Limato, Dave; Scheer, Larry; Sharp, Andy;
> > Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> > Ed
> > Subject: RE: CF Cards from Migration
> > 
> > John pointed out that the filer booted off the wrong flash card. The
> > log also shows the following messages:
> > 
> > Jan  9 13:55:22 minonstor1 kernel: prom_init: env[9] = 'bootdev=/dev/sda1' ...
> > Jan  9 13:55:22 minonstor1 kernel: irq 56: nobody cared (try booting with the "irqpoll" option)
> > Jan  9 13:55:22 minonstor1 kernel: Call Trace:
> > Jan  9 13:55:22 minonstor1 kernel: [<ffffffff82007888>] dump_stack+0x8/0x38
> > Jan  9 13:55:22 minonstor1 kernel: [<ffffffff82050f90>] __report_bad_irq+0x40/0xd8 ...
> > 
> > So this really smells like defect 27788 which Andy already fixed
> > (included in patch 4.0.2.10 and later)?
> > 
> > From: Keiffer, John
> > Sent: Thursday, January 14, 2010 2:57 PM
> > To: Keiffer, John; Limato, Dave; Scheer, Larry; Sharp, Andy;
> > Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> > Ed; Jin, Danqing
> > Subject: RE: CF Cards from Migration
> > 
> > From minonstor1:
> > 
> > Jan  9 16:39:34 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52ca48] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x569b78] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x5136c0] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521e80] flags[1004] sz[1816] len[1816] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521fe8] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x5680b0] flags[1004] sz[960] len[960] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521b60] flags[1004] sz[8048] len[8048] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521da0] flags[1004] sz[8048] len[8048] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52c808] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52c8c8] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x56ad60] flags[1004] sz[8048] len[8048] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x567e28] flags[4] sz[8048] len[8048] dest_appid[39] status[-19] failed
> > Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52c988] flags[4] sz[960] len[960] dest_appid[39] status[-19] failed
> > Jan  9 16:39:36 minonstor1 : 0:0:nfxsh:NOTICE: cmd[0]: vsvr set "VS_MGMT_67686" : status[0]
> > Jan  9 16:39:36 minonstor1 : 0:0:nfxsh:NOTICE: cmd[1]: interface show interface : status[0]
> > Jan  9 16:39:36 minonstor1 : 0:0:snmpd:INFO: getVolumeSummary: got rsp status error (0)
> > Jan  9 16:39:36 minonstor1 : 0:0:snmpd:INFO: read_volume_info: Can't get vol summary info (rc=-4)
> > 
> > From: Keiffer, John
> > Sent: Thursday, January 14, 2010 2:55 PM
> > To: Limato, Dave; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> > Ed; Jin, Danqing
> > Subject: RE: CF Cards from Migration
> > 
> > This is hard to sift through.
> > 
> > Anybody else agree that it appears to have gone wonky around 16:39
> > on 1/9?
> > 
> > At that time I believe we were on Top-CF0 and Bottom-CF1...
> > 
> > Seems to me like after they added the trunk1 interface things got
> > messed up? Seems to have led to ncm warnings and ea issues etc...
> > 
> > Jan  9 16:38:45 minonstor2 : 0:0:nfxsh:NOTICE: cmd[0]: vsvr set "MINFSV06" : status[0]
> > Jan  9 16:38:46 minonstor2 : 0:0:nfxsh:NOTICE: cmd[1]: vsvr stats -i 1 -c 1 : status[0]
> > Jan  9 16:38:49 minonstor2 : 0:0:ea:INFO: nfxnis_resRcv[3050]: DNS[192.19.189.10] closed connection, VS=6. err=0
> > Jan  9 16:39:00 minonstor2 : 0:0:nfxsh:NOTICE: cmd[6]: interface create trunk1 -l trunk1 : status[0]
> > Jan  9 16:39:11 minonstor2 : 0:0:ncm:WARNING: ncmd : ncm_local_rpc_received: ncm_forward_to_filer failed - -9
> > Jan  9 16:39:15 minonstor2 last message repeated 3 times
> > Jan  9 16:39:17 minonstor2 : 0:0:nfxsh:NOTICE: cmd[7]: interface show  : status[4]
> > Jan  9 16:39:18 minonstor2 : 0:0:ncm:WARNING: ncmd : ncm_local_rpc_received: ncm_forward_to_filer failed - -9
> > Jan  9 16:39:21 minonstor2 last message repeated 2 times
> > Jan  9 16:39:23 minonstor2 : 0:0:vsd:INFO: vsd_ipStack_initCtxt : There is no IP interface configured for vs 1
> > 
> > From: Limato, Dave
> > Sent: Thursday, January 14, 2010 1:14 PM
> > To: Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> > Ed; Jin, Danqing
> > Subject: RE: CF Cards from Migration
> > 
> > I hear Ed asked Danqing to construct a timeline of what happened. In
> > the meantime, can you figure out which flash/node was the PCC and
> > which was the second node of the cluster? This will help others with
> > diagnosis.
> > 
> > From: Keiffer, John
> > Sent: Thursday, January 14, 2010 1:11 PM
> > To: Limato, Dave; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> > Ed; Jin, Danqing
> > Subject: RE: CF Cards from Migration
> > 
> > This will take a while to sift through. Would be nice if we had a
> > timeline for when it supposedly went bad, and on which system it
> > first was reported against.
> > 
> > I see that when the top blade initially booted it booted to CF1,
> > which it is not supposed to... but that's probably nothing at this
> > point.
> > 
> > Jan  9 13:23:55 localhost kernel: prom_init: env[9] = 'bootdev=/dev/sdb1'
> > 
> > From: Limato, Dave
> > Sent: Thursday, January 14, 2010 1:00 PM
> > To: Limato, Dave; Scheer, Larry; Sharp, Andy; Keiffer, John;
> > Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> > Ed; Jin, Danqing
> > Subject: RE: CF Cards from Migration
> > 
> > I have copied all the data from the flash cards to
> > 
> > 10.0.0.222:/nx_corevol/defect_27946
> > 
> > From: Limato, Dave
> > Sent: Thursday, January 14, 2010 11:23 AM
> > To: Scheer, Larry; Sharp, Andy; Keiffer, John; Boulanger, Sandrine
> > Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> > Ed; Jin, Danqing
> > Subject: CF Cards from Migration
> > 
> > I have the CF Cards from that migration. I am going to pull all
> > of /var and /onstor/conf. Does anyone think we need anything else to
> > debug this issue?  Let me know. I will also try to copy all of /,
> > but I'm not sure how long that will take.
> > 
> > 
> > 
> > Dave Limato - Sr. QA Engineer - LSI Corporation - ONStor Product Test
> > - desk 408-433-8742  - cell 510.329.9994 -- dave.limato@lsi.com
> > 
