AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<Larry.Scheer@lsi.com>,<David.Olien@lsi.com>,<Rendell.Fong@lsi.com>,<maxim.kozlovsky@lsi.com>,<bill.fisher@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	DEC609CD0E54B2448DAF023C89AE9755EB50C54A@cosmail02.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 1 Apr 2010 17:20:05 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Scheer, Larry" <Larry.Scheer@lsi.com>
Cc: "Olien, David" <David.Olien@lsi.com>, "Fong, Rendell"
 <Rendell.Fong@lsi.com>, Maxim Kozlovsky <maxim.kozlovsky@lsi.com>, Bill
 Fisher <bill.fisher@lsi.com>
Subject: Re: Do you have a fix for this tuxrx crash?
Message-ID: <20100401172005.18845dfb@ripper.onstor.net>
In-Reply-To: <DEC609CD0E54B2448DAF023C89AE9755EB50C54A@cosmail02.lsi.com>
References: <DEC609CD0E54B2448DAF023C89AE9755EB50C54A@cosmail02.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 1 Apr 2010 17:41:34 -0600 "Scheer, Larry"
<Larry.Scheer@lsi.com> wrote:

> Hi David,
>    Andy said you might have a fix for a TXRX crash we are seeing on a
> QA system. We are booting the bottom blade and see a crash in
> sbmac_init.
> 
> I suspect this problem may only be occurring on the bottom blades.
> What filer are you using? I could check to see if it, too, is the
> bottom blade.
> 
> If you do, could you send me the diffs?
> 
> Thanks,
> 
> Larry
> 
> DBE physical address: 0010068200
                        ^^^^^^^^^^
This is a very puzzling address.  I cannot see why sbmac_init_module
would be touching it.  Something is strange; possibly something was
broken very recently in the branch.

Dave, was this what you were seeing?

It should be hitting 10064000 - 10067000, but it should never get to
10068000 as far as I can tell.  Not in that function, which is pretty
simple.
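
To spell out the arithmetic, here is a rough sketch.  It assumes the
MAC register blocks sit 0x1000 apart, which is what the 10064000 -
10067000 range suggests; the constant and function names are made up
for illustration and are not taken from the driver.

```python
# Sketch only: names and the 0x1000 stride are assumptions, not driver code.
MAC_BASE = 0x10064000    # first MAC register block (from the range above)
MAC_LAST = 0x10067000    # last MAC register block we expect to touch
MAC_STRIDE = 0x1000      # assumed spacing between MAC register blocks


def mac_slot(phys_addr):
    """Return (index, offset) of phys_addr relative to MAC_BASE."""
    return divmod(phys_addr - MAC_BASE, MAC_STRIDE)


def num_expected_macs():
    """Number of MAC blocks covered by MAC_BASE..MAC_LAST inclusive."""
    return (MAC_LAST - MAC_BASE) // MAC_STRIDE + 1


if __name__ == "__main__":
    index, offset = mac_slot(0x0010068200)   # the DBE physical address
    print("slot", index, "offset", hex(offset),
          "valid slots: 0 ..", num_expected_macs() - 1)
```

Under those assumptions the DBE address lands in slot 4 at offset
0x200, one block past the last slot (3) the function should touch.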


> Data bus error, epc == ffffffff8332ab84, ra == ffffffff8332ab7c
> Oops[#1]:
> Cpu 0
> $ 0   : 0000000000000000 0000000014001fe0 0000000001200008 00000000000000ff
> $ 4   : a80000000b05bec4 0000000000000007 0000000000000034 0000000000000007
> $ 8   : 0000000000000000 0000000000000008 0000000000000041 0000000000000008
> $12   : a80000000b05bee7 0000000000000010 0000000000000000 ffffffff83248910
> $16   : 000000000000000f a80000000b05beda a80000000b05beca 9000000010068208
> $20   : 000000000000003a 000000000000002d 0000000000000005 a80000000b05beca
> $24   : 0000000000000000 0000000000000030
> $28   : a80000000b058000 a80000000b05beb0 a80000000b05bec4 ffffffff8332ab7c
> Hi    : 000000000000000f
> Lo    : 0000000000000000
> epc   : ffffffff8332ab84 sbmac_init_module+0x26c/0x5d0     Not tainted
> ra    : ffffffff8332ab7c sbmac_init_module+0x264/0x5d0
> Status: 14001fe3    KX SX UX KERNEL EXL IE
> Cause : 0080801c
> PrId  : 01041100
> Modules linked in:
> Process swapper (pid: 1, threadinfo=a80000000b058000, task=a80000000b057870)
> Stack : 0000000014001fe1 ffffffff83304d60 0734070083307810 3a37303a3030ffff
>         46463a37303a3433 463a323846464646 0034394646464646 ffffffff832b0000
>         0000000000000000 ffffffff833334f0 0000000000000000 fffffffffffffffe
>         0000000000000000 ffffffff832b0000 ffffffff83330000 ffffffff83330000
>         ffffffff83330000 ffffffff833147d8 0000000000000000 0000000000000000
>         0000000000000000 0000000000000000 0000000000000000 0000000000000000
>         0000000000000000 0000000000000000 0000000000000000 0000000000000000
>         0000000000000000 0000000000000000 0000000000000000 0000000000000000
>         0000000000000000 0000000000000000 0000000000000000 ffffffff83004b90
>         0000000000000000 ffffffff83004b80 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
>         ...
> Call Trace:
> [<ffffffff8332ab84>] sbmac_init_module+0x26c/0x5d0
> [<ffffffff833147d8>] kernel_init+0x1d0/0x3f8
> [<ffffffff83004b90>] kernel_thread_helper+0x10/0x18
> 
> 
> Code: 26d60001  fe620000  de620000 <dfa20030> 16c2ff89  66731000  3c028338  3c038338  0000a82d
> Can't open proc stat
> Crashdump not saved, prom device open error
> primary crash already saved... crash #2 (Attempted to kill init!) will be ignored
> Kernel panic - not syncing: Attempted to kill init!
> Rebooting in 5 seconds..
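
For anyone who wants to read the Code: line without a disassembler
handy, here is a rough sketch that decodes the I-type words in it.  It
assumes standard MIPS64 I-type encoding and knows only the handful of
opcodes that show up here; the helper name is made up for
illustration, and registers are printed by number ($19 is the register
holding 9000000010068208 in the dump).

```python
# Sketch only: handles just the I-type opcodes seen in this Code: line.
OPCODES = {
    0x05: "bne",
    0x09: "addiu",
    0x0f: "lui",
    0x19: "daddiu",
    0x37: "ld",
    0x3f: "sd",
}


def decode_itype(word):
    """Return (mnemonic, rs, rt, signed_imm), or None for unknown opcodes."""
    name = OPCODES.get(word >> 26)          # top 6 bits are the opcode
    if name is None:
        return None
    rs = (word >> 21) & 0x1f                # source/base register number
    rt = (word >> 16) & 0x1f                # target register number
    imm = word & 0xffff                     # 16-bit immediate
    if imm & 0x8000:                        # sign-extend
        imm -= 0x10000
    return name, rs, rt, imm


if __name__ == "__main__":
    code = [0x26d60001, 0xfe620000, 0xde620000,
            0xdfa20030, 0x16c2ff89, 0x66731000]
    for word in code:
        print("%08x" % word, decode_itype(word))
```

If the decoding assumption holds, the words around the fault are a
store and a load through $19, followed by a loop branch with $19
bumped by 0x1000 (66731000), which would fit a per-MAC loop stepping
through the register blocks.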