AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080407133828.6bf9c912@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<sandrine.boulanger@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E05C7466B@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 7 Apr 2008 13:38:47 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Sandrine Boulanger" <sandrine.boulanger@onstor.com>
Subject: Re: how do we get more info for kernel oops?
Message-ID: <20080407133847.11f7e78f@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E05C7466B@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E05C7466B@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

That's all the information we need.  Here is the relevant part:

> tx0: Waiting for FP to start coredump local mem copy

The FP crashed and sent a bogus address, 0x120, to the SSC via the
mgmt_bus driver.  The mgmt_bus driver, having no way to know it's a
bogus address, dereferences it, and of course gets a page fault.  While
I recently changed this from a panic to an oops, when it happens in
interrupt context the kernel has no choice but to panic.
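
To illustrate the kind of check that would let the driver drop a bad
message instead of faulting, here is a rough sketch.  It is not the
mgmt_bus code: the function name, the window base and the window size
are all made up for the example, and the real limits would have to come
from the board's memory map.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical window of memory the FP is allowed to hand to the SSC.
 * Both values are placeholders chosen for illustration only.
 */
#define FP_WIN_BASE 0x10000000ULL
#define FP_WIN_SIZE 0x01000000ULL

/*
 * Return 1 if the range [addr, addr + len) lies entirely inside the
 * assumed FP window, 0 otherwise.  Offsets are compared instead of
 * computing addr + len, so a huge addr cannot wrap around.
 */
static int fp_addr_ok(uint64_t addr, uint64_t len)
{
    if (len == 0 || len > FP_WIN_SIZE)
        return 0;
    if (addr < FP_WIN_BASE)
        return 0;
    return (addr - FP_WIN_BASE) <= (FP_WIN_SIZE - len);
}
```

With a check like this in front of the dereference, an address such as
0x120 would be rejected and could be logged rather than followed.  It
would not fix the FP crash itself, only keep the SSC up.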

I thought there was already a bug filed for this, but I don't see it.
Someone may have marked it WAD or assigned it to someone else, since the
root cause was the FP crash.


On Mon, 7 Apr 2008 13:13:19 -0700 "Sandrine Boulanger"
<sandrine.boulanger@onstor.com> wrote:

> I just noticed this, don't know how it happened.
> 
> g14r10 login: Oops[#1]:
> CPU 0 Unable to handle kernel paging request at virtual address
> 0000000000000120, epc == ffffffff82006a88, ra == ffffffff82006a90
> Oops[#2]:
> Cpu 0
> $ 0   : 0000000000000000 0000000010001fe0 000000000000000d
> 0000000000000001
> $ 4   : ffffffff8226daf8 0000000000000000 ffffffffffffffff
> 0000000000004699
> $ 8   : ffffffff822b0000 ffffffff822b2890 ffffffffffff4699
> ffffffff82300000
> $12   : ffffffff82310000 ffffffff82300000 fffffffffffffffd
> ffffffff8223b508
> $16   : 0000000000000000 0000000000000000 0000000000000000
> ffffffff82270000
> $20   : ffffffff822ca680 900000f81a7084a0 0000000000000018
> a80000008b430760
> $24   : 0000000000000000 0000000000000020
> $28   : a80000008be60000 a80000008be63ab0 0000000000000000
> ffffffff82006a90
> Hi    : 0000000000000000
> Lo    : 0000000000000000
> epc   : ffffffff82006a88 show_regs+0x38/0x470     Not tainted
> ra    : ffffffff82006a90 show_regs+0x40/0x470
> Status: 10001fe2    KX SX UX KERNEL EXL
> Cause : 80809008
> BadVA : 0000000000000120
> PrId  : 00040103
> Modules linked in: autofs4
> Process ndmp_cfgd (pid: 1331, threadinfo=a80000008be60000,
> task=a80000000491b200)
> Stack : 0000000000000000 0000000000000000 a80000008e21a0e0
> a80000008eb7f200
>         ffffffff822ca680 ffffffff82006ffc ffffffff8226df70
> ffffffff820070cc
>         0000000000000000 a80000008eb7f210 ffffffff82195e60
> fffffffe000025a5
>         900000008f000000 ffffffff82195f84 a80000008e21a0e0
> ffffffff822ca680
>         0000000000000018 a80000008eb7f200 ffffffff822ca680
> a80000008be63da8
>         ffffffff821a9ea4 ffffffff821a9ea4 0000000000000022
> a80000008e21a0e0
>         0000000000000018 ffffffff82220938 0000000000000000
> 0000000000000008
>         0000000000000018 0000000000000007 0000000000000001
> 000000000000001f
>         ffffffff822ca950 0000000000000000 a80000008be63c88
> a80000008bb78cc0
>         0000000000000000 000000007feb7b20 000000007feb7bb0
> 0000000000000018
>         ...
> Call Trace:
> [<ffffffff82006a88>] show_regs+0x38/0x470
> [<ffffffff82006ffc>] show_registers+0x14/0x68
> [<ffffffff820070cc>] die+0x7c/0xe0
> [<ffffffff82195e60>] MGMTBUS_PHYS2VIRT+0xc8/0x128
> [<ffffffff82195f84>] mgmtbus_hard_start_xmit+0xc4/0x178
> [<ffffffff821a9ea4>] dev_queue_xmit+0x30c/0x458
> [<ffffffff82220938>] eee_dgram_sendmsg+0x2b8/0x440
> [<ffffffff82199540>] sock_sendmsg+0x98/0xe8
> [<ffffffff82199998>] sys_sendto+0xe8/0x138
> [<ffffffff8200fec8>] handle_sys+0x108/0x124
> 
> 
> Code: 0000882d  3c138227  6484daf8 <0c8094da> 8e540120  08801abb
> 0200282d  24050010  1200001a
> Kernel panic - not syncing: Fatal exception in interrupt
> Rebooting in 5 seconds..<6>tx0:
> tx0:
> tx0: Exception Cause = Watchdog Timeout/NMI
> tx0: ERREPC:   0xffffffff83245494
> tx0: RA:       0xffffffff8324548c
> tx0: SR:       0x200800e1
> tx0: Waiting for FP to start coredump local mem copy
> SiByte Watchdog in danger of initiating system reset in 8.1 seconds
> 
> 
> 
> PowerOn Self Test........OK
> 
> Initializing System......please wait
> 
> 
