AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<Bill.Duffy@lsi.com>,<Ed.Kwan@lsi.com>,<Danqing.Jin@lsi.com>,<Dave.Limato@lsi.com>,<Joachim.Thiessen@lsi.com>,<Caeli.Collins@lsi.com>,<John.Keiffer@lsi.com>,<Larry.Scheer@lsi.com>,<Sandrine.Boulanger@lsi.com>,<Shawn.Currin@lsi.com>,<Raj.Kumar@lsi.com>,<Brian.Stark@lsi.com>,<Chris.Vandever@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	B50ED5C0A7967343B07C1764EE1D7BD1F4A0962D@cosmail01.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 14 Jan 2010 18:56:15 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Duffy, Bill" <Bill.Duffy@lsi.com>
Cc: "Kwan, Ed" <Ed.Kwan@lsi.com>, "Jin, Danqing" <Danqing.Jin@lsi.com>,
 "Limato, Dave" <Dave.Limato@lsi.com>, "Thiessen, Joachim"
 <Joachim.Thiessen@lsi.com>, "Collins, Caeli" <Caeli.Collins@lsi.com>,
 "Keiffer, John" <John.Keiffer@lsi.com>, "Scheer, Larry"
 <Larry.Scheer@lsi.com>, "Boulanger, Sandrine" <Sandrine.Boulanger@lsi.com>,
 "Currin, Shawn" <Shawn.Currin@lsi.com>, "Kumar, Raj" <Raj.Kumar@lsi.com>,
 "Stark, Brian" <Brian.Stark@lsi.com>, "Vandever, Chris"
 <Chris.Vandever@lsi.com>
Subject: Re: CF Cards from Migration
Message-ID: <20100114185615.25e76b9a@ripper.onstor.net>
In-Reply-To: <B50ED5C0A7967343B07C1764EE1D7BD1F4A0962D@cosmail01.lsi.com>
References: <B50ED5C0A7967343B07C1764EE1D7BD1F4A0962D@cosmail01.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Keep in mind that we're all LSI.  4.0.2.10 is our official release
right now, no?  We don't have the resources to support a half-dozen
different releases; it's already hurting us with just the small number
of releases involved in this schemozzle.

Given that, and while being sensitive to [internal] customer fears, I
would say we need to make it clear that there isn't necessarily a
"choice" for them to make, beyond the choice to migrate or not.  Just
as with LSI storage arrays, the NAS gateway isn't 100% bug-free, and we
have to be able to advance the releases without hindrance from the
[internal] customer.  Put another way, it's really our decision: we are
the experts.  If problems crop up, we support, fix, or work around them
as usual.  If we, the experts, believe 4.0.2.10 is the correct release
to run, then that's it.

Thoughts?


On Thu, 14 Jan 2010 18:10:12 -0700 "Duffy, Bill" <Bill.Duffy@lsi.com>
wrote:

> LSI said no way
>
> ________________________________
> From: Kwan, Ed
> To: Duffy, Bill; Jin, Danqing; Limato, Dave; Thiessen, Joachim;
> Collins, Caeli; Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger,
> Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris
> Sent: Thu Jan 14 18:09:43 2010
> Subject: RE: CF Cards from Migration
>
> Why wasn’t 4.0.2.10 installed in the first place?
>
> From: Duffy, Bill
> Sent: Thursday, January 14, 2010 5:07 PM
> To: Jin, Danqing; Limato, Dave; Thiessen, Joachim; Kwan, Ed; Collins,
> Caeli; Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris
> Subject: Re: CF Cards from Migration
>
> So 4.0.2.10 logged in as root should be the only release used for
> transition going forward. Correct?
>
> ________________________________
> From: Jin, Danqing
> To: Limato, Dave; Thiessen, Joachim; Kwan, Ed; Collins, Caeli;
> Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> Bill
> Sent: Thu Jan 14 17:58:30 2010
> Subject: RE: CF Cards from Migration
>
> Yes, “system version –s” and “system copy all” both failed on
> minonstor1.
>
> From: Limato, Dave
> Sent: Thursday, January 14, 2010 4:58 PM
> To: Thiessen, Joachim; Kwan, Ed; Collins, Caeli; Jin, Danqing;
> Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> Bill
> Subject: RE: CF Cards from Migration
>
> It looks like system version –s and system copy all –I failed around
> 14:23, which I think indicates trouble mounting the other flash.
>
> From: Thiessen, Joachim
> Sent: Thursday, January 14, 2010 4:56 PM
> To: Kwan, Ed; Collins, Caeli; Jin, Danqing; Keiffer, John; Limato,
> Dave; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> Bill
> Subject: RE: CF Cards from Migration
>
> Would a “system copy all –i” or “system config copy” expose the
> problem that we booted from the wrong CF card?
>
> From: Kwan, Ed
> Sent: Thursday, January 14, 2010 4:39 PM
> To: Collins, Caeli; Jin, Danqing; Keiffer, John; Limato, Dave;
> Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> Bill; Thiessen, Joachim
> Subject: RE: CF Cards from Migration
>
> 4.0.2.8 and 4.0.2.4
>
> From: Collins, Caeli
> Sent: Thursday, January 14, 2010 4:37 PM
> To: Kwan, Ed; Jin, Danqing; Keiffer, John; Limato, Dave; Scheer,
> Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Duffy,
> Bill; Thiessen, Joachim
> Subject: RE: CF Cards from Migration
>
> What release were they running?
>
> Caeli
>
> From: Kwan, Ed
> Sent: Thursday, January 14, 2010 4:27 PM
> To: Jin, Danqing; Keiffer, John; Limato, Dave; Scheer, Larry; Sharp,
> Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris;
> Collins, Caeli
> Subject: RE: CF Cards from Migration
>
> CC’ing Caeli.
>
> From: Jin, Danqing
> Sent: Thursday, January 14, 2010 4:21 PM
> To: Keiffer, John; Limato, Dave; Scheer, Larry; Sharp, Andy;
> Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> Ed
> Subject: RE: CF Cards from Migration
>
> John pointed out that the filer booted off the wrong flash card,
> along with the following messages:
>
> Jan  9 13:55:22 minonstor1 kernel: prom_init: env[9] = 'bootdev=/dev/sda1' …
> Jan  9 13:55:22 minonstor1 kernel: irq 56: nobody cared (try booting with the "irqpoll" option)
> Jan  9 13:55:22 minonstor1 kernel: Call Trace:
> Jan  9 13:55:22 minonstor1 kernel: [<ffffffff82007888>] dump_stack+0x8/0x38
> Jan  9 13:55:22 minonstor1 kernel: [<ffffffff82050f90>] __report_bad_irq+0x40/0xd8 …
>
> So this really smells like defect 27788, which Andy already fixed
> (included in patch 4.0.2.10 and later)?
>
> From: Keiffer, John
> Sent: Thursday, January 14, 2010 2:57 PM
> To: Keiffer, John; Limato, Dave; Scheer, Larry; Sharp, Andy;
> Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> Ed; Jin, Danqing
> Subject: RE: CF Cards from Migration
>
> From minonstor1:
>
> Jan  9 16:39:34 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52ca48] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x569b78] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x5136c0] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521e80] flags[1004] sz[1816] len[1816] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521fe8] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x5680b0] flags[1004] sz[960] len[960] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521b60] flags[1004] sz[8048] len[8048] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x521da0] flags[1004] sz[8048] len[8048] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52c808] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52c8c8] flags[1004] sz[3176] len[3176] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x56ad60] flags[1004] sz[8048] len[8048] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x567e28] flags[4] sz[8048] len[8048] dest_appid[39] status[-19] failed
> Jan  9 16:39:35 minonstor1 : 0:0:ncm:WARNING: ncmd : ncm_filer_rsp_complete: rpc_rsp[0x52c988] flags[4] sz[960] len[960] dest_appid[39] status[-19] failed
> Jan  9 16:39:36 minonstor1 : 0:0:nfxsh:NOTICE: cmd[0]: vsvr set "VS_MGMT_67686" : status[0]
> Jan  9 16:39:36 minonstor1 : 0:0:nfxsh:NOTICE: cmd[1]: interface show interface : status[0]
> Jan  9 16:39:36 minonstor1 : 0:0:snmpd:INFO: getVolumeSummary: got rsp status error (0)
> Jan  9 16:39:36 minonstor1 : 0:0:snmpd:INFO: read_volume_info: Can't get vol summary info (rc=-4)
>
> From: Keiffer, John
> Sent: Thursday, January 14, 2010 2:55 PM
> To: Limato, Dave; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> Ed; Jin, Danqing
> Subject: RE: CF Cards from Migration
>
> This is hard to sift through.
>
> Anybody else agree that it appears to have gone wonky around 16:39 on
> 1/9?
>
> At that time I believe we were on Top-CF0 and Bottom-CF1…
>
> Seems to me like after they added the trunk1 interface things got
> messed up? Seems to have led to ncm warnings and ea issues etc…
>
> Jan  9 16:38:45 minonstor2 : 0:0:nfxsh:NOTICE: cmd[0]: vsvr set "MINFSV06" : status[0]
> Jan  9 16:38:46 minonstor2 : 0:0:nfxsh:NOTICE: cmd[1]: vsvr stats -i 1 -c 1 : status[0]
> Jan  9 16:38:49 minonstor2 : 0:0:ea:INFO: nfxnis_resRcv[3050]: DNS[192.19.189.10] closed connection, VS=6. err=0
> Jan  9 16:39:00 minonstor2 : 0:0:nfxsh:NOTICE: cmd[6]: interface create trunk1 -l trunk1 : status[0]
> Jan  9 16:39:11 minonstor2 : 0:0:ncm:WARNING: ncmd : ncm_local_rpc_received: ncm_forward_to_filer failed - -9
> Jan  9 16:39:15 minonstor2 last message repeated 3 times
> Jan  9 16:39:17 minonstor2 : 0:0:nfxsh:NOTICE: cmd[7]: interface show  : status[4]
> Jan  9 16:39:18 minonstor2 : 0:0:ncm:WARNING: ncmd : ncm_local_rpc_received: ncm_forward_to_filer failed - -9
> Jan  9 16:39:21 minonstor2 last message repeated 2 times
> Jan  9 16:39:23 minonstor2 : 0:0:vsd:INFO: vsd_ipStack_initCtxt : There is no IP interface configured for vs 1
>
> From: Limato, Dave
> Sent: Thursday, January 14, 2010 1:14 PM
> To: Keiffer, John; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> Ed; Jin, Danqing
> Subject: RE: CF Cards from Migration
>
> I hear Ed asked Danqing to construct a timeline of what happened. In
> the meantime, it would help others with diagnosis if you can figure
> out which flash/node was the PCC and which was the second node of the
> cluster.
>
> From: Keiffer, John
> Sent: Thursday, January 14, 2010 1:11 PM
> To: Limato, Dave; Scheer, Larry; Sharp, Andy; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> Ed; Jin, Danqing
> Subject: RE: CF Cards from Migration
>
> This will take a while to sift through. Would be nice if we had a
> timeline for when it supposedly went bad, and on which system it
> first was reported against.
>
> I see that when the top blade initially booted it booted to CF1,
> which it is not supposed to… but that’s probably nothing at this
> point.
>
> Jan  9 13:23:55 localhost kernel: prom_init: env[9] = 'bootdev=/dev/sdb1'
>
> From: Limato, Dave
> Sent: Thursday, January 14, 2010 1:00 PM
> To: Limato, Dave; Scheer, Larry; Sharp, Andy; Keiffer, John;
> Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> Ed; Jin, Danqing
> Subject: RE: CF Cards from Migration
>
> I have copied all the data from the flash cards to
>
> 10.0.0.222:/nx_corevol/defect_27946
>
> From: Limato, Dave
> Sent: Thursday, January 14, 2010 11:23 AM
> To: Scheer, Larry; Sharp, Andy; Keiffer, John; Boulanger, Sandrine
> Cc: Currin, Shawn; Kumar, Raj; Stark, Brian; Vandever, Chris; Kwan,
> Ed; Jin, Danqing
> Subject: CF Cards from Migration
>
> I have the CF Cards from that migration. I am going to pull all
> of /var and /onstor/conf. Does anyone think we need anything else to
> debug this issue?  Let me know. I will also try to copy all of /, but
> I'm not sure how long that will take.
>
> Dave Limato - Sr. QA Engineer - LSI Corporation - ONStor Product Test
> - desk 408-433-8742  - cell 510.329.9994 -- dave.limato@lsi.com
>
