AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<Ed.Kwan@lsi.com>,<Brian.Stark@lsi.com>,<Bill.Fisher@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	2B044E14371DA244B71F8BF2514563F5042123A3@cosmail03.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 10 Dec 2009 12:59:51 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Kwan, Ed" <Ed.Kwan@lsi.com>
Cc: "Stark, Brian" <Brian.Stark@lsi.com>, "Fisher, Bill"
 <Bill.Fisher@lsi.com>
Subject: Re: TED 27548 Case 13462 - LSI Logic - Linux crash on Cougar at
 4.0.2.6
Message-ID: <20091210125951.51e16f81@ripper.onstor.net>
In-Reply-To: <2B044E14371DA244B71F8BF2514563F5042123A3@cosmail03.lsi.com>
References: <2B044E14371DA244B71F8BF2514563F504212093@cosmail03.lsi.com>
	<20091209155113.070f20b8@ripper.onstor.net>
	<E1EC65251D4B3D46BBC0AAA3C0629222B007FE9B@cosmail02.lsi.com>
	<20091210113357.51c53fd1@ripper.onstor.net>
	<2B044E14371DA244B71F8BF2514563F5042123A3@cosmail03.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

As I mentioned before, we can put a delay loop in there, but it might
have a significant effect on failover times.  That's not something I
would be comfortable sending out to just any customer, or even to this
installation site, unless they are still experiencing the problem.

On Thu, 10 Dec 2009 12:38:55 -0700 "Kwan, Ed" <Ed.Kwan@lsi.com> wrote:

> Then we need to figure out how to prevent Linux from crashing so we
> get a txrx core.
> 
> -----Original Message-----
> From: Andrew Sharp [mailto:andy.sharp@lsi.com] 
> Sent: Thursday, December 10, 2009 11:34 AM
> To: Stark, Brian
> Cc: Kwan, Ed; Fisher, Bill
> Subject: Re: TED 27548 Case 13462 - LSI Logic - Linux crash on Cougar
> at 4.0.2.6
> 
> On Thu, 10 Dec 2009 12:28:06 -0700 "Stark, Brian"
> <Brian.Stark@lsi.com> wrote:
> 
> > Couple of questions:
> > 
> > - Is the issue reproducible?  Probably not, but it's worth asking.
> > - Is there any evidence of a TXRX or FP crash at the same time?
> > I've seen kernel crashes like this when the TXRX or FP has already
> > crashed, and the system is waiting to reboot.
> 
> That's what I believe is happening: the TXRX crashes and causes an
> inadvertent MBI, and the Linux mgmtbus driver dutifully goes off and
> tries to process some non-existent traffic on the mgmtbus, causing
> this page fault panic in the mgmtbus driver (kernel).
> 
> 
> > -----Original Message-----
> > From: Andrew Sharp [mailto:andy.sharp@lsi.com] 
> > Sent: Wednesday, December 09, 2009 3:51 PM
> > To: Kwan, Ed
> > Cc: Stark, Brian; Fisher, Bill
> > Subject: Re: TED 27548 Case 13462 - LSI Logic - Linux crash on
> > Cougar at 4.0.2.6
> > 
> > On Wed, 9 Dec 2009 11:33:55 -0700 "Kwan, Ed" <Ed.Kwan@lsi.com>
> > wrote:
> > 
> > > Hi Andy & Rendell,
> > > 
> > > What's the next step for this defect?
> > 
> > Mark it as limbo?
> > 
> > I'm not sure what we can do.  Possibly Bill has some ideas as he
> > knows a thing or two about the mgmt bus.  Maybe we can print out
> > the bogus address in question or something.  Not sure if anyone
> > would be able to do anything with that or not.  Maybe we can put a
> > long sleep in there which possibly would allow the TXRX to dump
> > core if it's crashing.  If it's just "off in the weeds" that
> > wouldn't do anything except delay a failover.
> > 
> > If there's a bug in the Linux kernel mgmtbus driver, possibly Bill
> > can spot it with some code inspection.  I'm hesitant to ask him to
> > do that unless you think this is hot enough.  Truth is, this was
> > back in Oct., and all it did was cause the vsvrs to fail over.  Do
> > I have that correct?  I know they want explanations, but we have to
> > trim expectations some if they really require testimony for each
> > and every imperfection in the product.
