AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<Brian.Stark@lsi.com>,<Ed.Kwan@lsi.com>,<Bill.Fisher@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	E1EC65251D4B3D46BBC0AAA3C0629222B007FE9B@cosmail02.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 10 Dec 2009 11:33:57 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Stark, Brian" <Brian.Stark@lsi.com>
Cc: "Kwan, Ed" <Ed.Kwan@lsi.com>, "Fisher, Bill" <Bill.Fisher@lsi.com>
Subject: Re: TED 27548 Case 13462 - LSI Logic - Linux crash on Cougar at
 4.0.2.6
Message-ID: <20091210113357.51c53fd1@ripper.onstor.net>
In-Reply-To: <E1EC65251D4B3D46BBC0AAA3C0629222B007FE9B@cosmail02.lsi.com>
References: <2B044E14371DA244B71F8BF2514563F504212093@cosmail03.lsi.com>
	<20091209155113.070f20b8@ripper.onstor.net>
	<E1EC65251D4B3D46BBC0AAA3C0629222B007FE9B@cosmail02.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 10 Dec 2009 12:28:06 -0700 "Stark, Brian" <Brian.Stark@lsi.com>
wrote:

> Couple of questions:
> 
> - Is the issue reproducible?  Probably not, but it's worth asking.
> - Is there any evidence of a TXRX or FP crash at the same time?  I've
> seen kernel crashes like this when the TXRX or FP has already
> crashed, and the system is waiting to reboot.

That's what I believe is happening: the TXRX crashes and causes an
inadvertent MBI. The Linux mgmtbus driver dutifully goes off and tries
to process some non-existent traffic on the mgmtbus, which causes this
page fault panic in the mgmtbus driver (kernel).
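
A rough sketch of the kind of guard that would turn this panic into a
log line with the bogus values, assuming the driver reads a descriptor
out of shared memory on each MBI. All names here (mb_ring, mb_desc,
mb_handle_irq, the ring and buffer sizes) are hypothetical and not
taken from the actual mgmtbus driver; this is user-space C for
illustration only:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout; the real mgmtbus driver may look nothing like this. */
#define RING_SLOTS 8
#define BUF_BYTES  256

struct mb_desc {
    uint32_t ready;   /* nonzero once the peer has posted a message */
    uint32_t offset;  /* payload offset within the shared buffer */
    uint32_t len;     /* payload length in bytes */
};

struct mb_ring {
    struct mb_desc desc[RING_SLOTS];
    uint8_t buf[BUF_BYTES];
    unsigned int head;  /* next slot the handler will examine */
};

enum mb_result { MB_OK, MB_SPURIOUS, MB_BOGUS };

/*
 * Handle one (simulated) mailbox interrupt.  The "processing" is just a
 * byte sum of the payload, standing in for whatever the driver really does.
 */
enum mb_result mb_handle_irq(struct mb_ring *r, unsigned int *sum_out)
{
    struct mb_desc *d = &r->desc[r->head % RING_SLOTS];

    /* Interrupt fired but nothing was posted: do not touch the buffer. */
    if (!d->ready)
        return MB_SPURIOUS;

    /* Validate before dereferencing; written to avoid integer overflow. */
    if (d->len > BUF_BYTES || d->offset > BUF_BYTES - d->len) {
        fprintf(stderr, "mgmtbus: bogus descriptor offset=0x%x len=0x%x\n",
                (unsigned int)d->offset, (unsigned int)d->len);
        d->ready = 0;
        r->head++;
        return MB_BOGUS;
    }

    unsigned int sum = 0;
    for (uint32_t i = 0; i < d->len; i++)
        sum += r->buf[d->offset + i];
    *sum_out = sum;

    d->ready = 0;
    r->head++;
    return MB_OK;
}
```

With a zeroed ring, a call returns MB_SPURIOUS without reading the
buffer; a descriptor whose offset points past the buffer is logged and
skipped instead of being followed.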


> -----Original Message-----
> From: Andrew Sharp [mailto:andy.sharp@lsi.com] 
> Sent: Wednesday, December 09, 2009 3:51 PM
> To: Kwan, Ed
> Cc: Stark, Brian; Fisher, Bill
> Subject: Re: TED 27548 Case 13462 - LSI Logic - Linux crash on Cougar
> at 4.0.2.6
> 
> On Wed, 9 Dec 2009 11:33:55 -0700 "Kwan, Ed" <Ed.Kwan@lsi.com> wrote:
> 
> > Hi Andy & Rendell,
> > 
> > What's the next step for this defect?
> 
> Mark it as limbo?
> 
> I'm not sure what we can do.  Possibly Bill has some ideas as he knows
> a thing or two about the mgmt bus.  Maybe we can print out the bogus
> address in question or something.  Not sure if anyone would be able to
> do anything with that or not.  Maybe we can put a long sleep in there
> which possibly would allow the TXRX to dump core if it's crashing.  If
> it's just "off in the weeds" that wouldn't do anything except delay a
> failover.
> 
> If there's a bug in the Linux kernel mgmtbus driver, possibly Bill can
> spot it with some code inspection.  I'm hesitant to ask him to do that
> unless you think this is hot enough.  Truth is, this was back in Oct.,
> all it did was cause the vsvrs to fail over, do I have that correct?
> I know they want explanations, but we have to trim expectations some
> if they really require testimony for each and every imperfection in
> the product.
