AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080116121802.7b9ab8c2@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<vikas.saini@onstor.com>,<dl-Cougar@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E07AE236D@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 16 Jan 2008 12:18:08 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Vikas Saini" <vikas.saini@onstor.com>
Cc: "dl-Cougar" <dl-Cougar@onstor.com>
Subject: Re: kernel panic ?
Message-ID: <20080116121808.441d2f5a@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E07AE236D@onstor-exch02.onstor.net>
References: <20080116120936.1c4a80e7@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E07AE236D@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I don't see the stack trace from Linux.  Can you attach the
entire /var/log/messages file to the bug?

Max, I really hate Linux panics like this.  How badly would some
sanity-checking code around these translations affect performance?  We
should also install some range-checking code in the mipsphys driver.  By
"should" I mean "please do."
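
To be concrete about what I mean by sanity checking, here is a rough
sketch.  None of this is the actual mgmtbus or mipsphys code: the window
base, size, and function names are made up for illustration, and the real
values would have to come from the driver.  The idea is just that a bogus
address like the "phys addr 1" in the panic gets rejected with an error
the caller can handle, instead of taking the whole box down.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window for the shared region.  These numbers are
 * placeholders, not the real management bus layout. */
#define MGMTBUS_PHYS_BASE 0x10000000ULL
#define MGMTBUS_PHYS_SIZE 0x00100000ULL
#define MGMTBUS_VIRT_BASE 0xffffffff90000000ULL

/* True if [phys, phys + len) lies entirely inside the window.
 * Written with subtraction so a huge len cannot wrap around. */
static inline bool mgmtbus_phys_valid(uint64_t phys, uint64_t len)
{
    uint64_t off;

    if (phys < MGMTBUS_PHYS_BASE)
        return false;
    off = phys - MGMTBUS_PHYS_BASE;
    if (off >= MGMTBUS_PHYS_SIZE)
        return false;
    return len <= MGMTBUS_PHYS_SIZE - off;
}

/* Checked translation: returns 0 and stores the virtual address on
 * success, -1 on a bad address, so the caller decides what to do
 * rather than the translation macro panicking. */
static inline int mgmtbus_phys2virt_checked(uint64_t phys, uint64_t len,
                                            uint64_t *virt)
{
    if (virt == NULL || !mgmtbus_phys_valid(phys, len))
        return -1;
    *virt = MGMTBUS_VIRT_BASE + (phys - MGMTBUS_PHYS_BASE);
    return 0;
}
```

The check is a couple of compares and a subtract per translation, so my
guess is the cost is small, but that is a guess until somebody measures
it on the real path.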

Cheers,

a


On Wed, 16 Jan 2008 12:11:46 -0800 "Vikas Saini"
<vikas.saini@onstor.com> wrote:

> Looks like there was a FP crash around same time as kernel panic. I
> will add this to defect.
> 
> 
> Jan 16 11:46:58 g12r10 : 1:2:evm:WARNING: 1089: [][]: 2 I/O retries
> for volume[g12r10-raj-vs2-vol01].
> Jan 16 11:47:01 g12r10 kernel: fp0: 1090: [][]: 2 I/O retries for
> volume[g12r10-raj-vs2-vol01].
> Jan 16 11:47:01 g12r10 : 1:2:evm:WARNING: 1090: [][]: 2 I/O retries
> for volume[g12r10-raj-vs2-vol01].
> Jan 16 11:47:15 g12r10 kernel: fp0: 1091: [][]: 3 I/O retries for
> volume[g12r10-raj-vs2-vol01].
> Jan 16 11:47:15 g12r10 : 1:2:evm:WARNING: 1091: [][]: 3 I/O retries
> for volume[g12r10-raj-vs2-vol01].
> Jan 16 11:47:24 g12r10 kernel: tx0: passive crash handler: continuing
> in debug mode
> Jan 16 11:47:24 g12r10 kernel: tx1: passive crash handler: continuing
> in debug mode
> Jan 16 11:47:24 g12r10 kernel: fp0: passive crash handler: continuing
> in debug mode
> Jan 16 11:47:24 g12r10 kernel: fp0: cpu 5 crashed, core 3
> Jan 16 11:47:24 g12r10 kernel: fp1: passive crash handler: continuing
> in debug mode
> Jan 16 11:47:24 g12r10 kernel: fp1: cpu 5 crashed, core 3
> Jan 16 11:47:24 g12r10 kernel: tx0: cpu 5 crashed, core 3
> Jan 16 11:47:24 g12r10 kernel: tx1: cpu 5 crashed, core 3
> Jan 16 11:47:24 g12r10 kernel: tx0: Autoreboot "off", stopping in
> debug mode
> Jan 16 11:47:24 g12r10 kernel: tx1: Autoreboot "off", stopping in
> debug mode
> Jan 16 11:47:24 g12r10 kernel: fp0: Autoreboot "off", stopping in
> debug mode
> Jan 16 11:47:24 g12r10 kernel: fp1: Autoreboot "off", stopping in
> debug mode
> Jan 16 11:47:24 g12r10 kernel: fp3:
> Jan 16 11:47:24 g12r10 kernel: fp3:
> Jan 16 11:47:24 g12r10 kernel: fp3: Exception Cause = Watchdog
> Timeout/NMI
> Jan 16 11:47:24 g12r10 kernel: fp3: ERREPC:   0xffffffff8305261c
> Jan 16 11:47:24 g12r10 kernel: fp3: RA:       0xffffffff83050234
> Jan 16 11:47:24 g12r10 kernel: fp3: SR:       0x20086fa1
> Jan 16 11:47:24 g12r10 kernel: fp3: NMI : Watchdog Timeout NMI
> Jan 16 11:47:24 g12r10 kernel: fp3: Image Version : NFP_FP :
> R4.0.0.DEV-CGDBG-011408 : Mon Jan 14 12:52:22 2008
> Jan 16 11:47:24 g12r10 kernel: fp3: PROM Version  : PROM_SIBYTE_CG :
> Cougar-prom-1.0.2 : Fri Dec 14 11:53:50 2007
> Jan 16 11:47:24 g12r10 kernel: fp3: Boot Time  : Wed Jan 16 03:13:28
> GMT 2008
> Jan 16 11:47:24 g12r10 kernel: fp3: Crash Time : Wed Jan 16 11:47:20
> GMT 2008
> Jan 16 11:47:24 g12r10 kernel: fp3: core_dump_open 0 0
> Jan 16 11:47:24 g12r10 kernel: fp3: Can not open core file
> Jan 16 11:47:24 g12r10 kernel: fp3: Autoreboot "off", stopping in
> debug mode
> 
> 
> -----Original Message-----
> From: Andy Sharp 
> Sent: Wednesday, January 16, 2008 12:10 PM
> To: Vikas Saini
> Cc: dl-Cougar
> Subject: Re: kernel panic ?
> 
> The panic should have been fully logged in /var/log/messages.  But the
> error is plain to see: the management bus code has gotten ahold of a
> bogus address.  But file a bug and include the stack trace
> from /var/log/messages.  If you need assistance, you know where my
> doorbell is ~:^)
> 
> Cheers,
> 
> a
> 
> On Wed, 16 Jan 2008 12:05:00 -0800 "Vikas Saini"
> <vikas.saini@onstor.com> wrote:
> 
> > Do we have any logs which can provide more info about it?
> > 
> >  
> > 
> > Thanks
> > 
> > Vikas
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> > Jan 16 11:46:01 g12r10 : 1:2:evm:WARNING: 1079: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:02 g12r10 : 1:2:evm:WARNING: 1080: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:13 g12r10 : 1:2:evm:WARNING: 1081: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:28 g12r10 : 1:2:evm:WARNING: 1082: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:35 g12r10 : 1:2:evm:WARNING: 1083: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:40 g12r10 : 1:2:evm:WARNING: 1084: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:44 g12r10 : 1:2:evm:WARNING: 1085: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:45 g12r10 : 1:2:evm:WARNING: 1086: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:51 g12r10 : 1:2:evm:WARNING: 1087: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:55 g12r10 : 1:2:evm:WARNING: 1088: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:46:58 g12r10 : 1:2:evm:WARNING: 1089: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:47:01 g12r10 : 1:2:evm:WARNING: 1090: [][]: 2 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Jan 16 11:47:15 g12r10 : 1:2:evm:WARNING: 1091: [][]: 3 I/O retries
> > for volume[g12r10-raj-vs2-vol01].
> > 
> > Kernel panic - not syncing: MGMTBUS_PHYS2VIRT: invalid phys addr 1
> > 
> > Rebooting in 5 seconds..<2>SiByte Watchdog in danger of initiating
> > system reset in 3.6 seconds
> > 
> > SiByte Watchdog in danger of initiating system reset in 3.6 seconds
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> > PowerOn Self Test........OK
> > 
> >  
> > 
> > Initializing System......please wait
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> >  
> > 
> > PMON [SSC,EL,FP,64]
> > 
> > ONStor Inc. PROM_SIBYTE_CG : Cougar-prom-1.0.2 : Fri Dec 14 11:53:50
> > 2007
> > 
> > CPU type SB1125.  Rev 35  600 MHz
> > 
> > module: SSC, Slot 0, CPU 0
> > 
> > Memory size 512 MB.
> > 
> > Icache size  32 KB, 32/line (4 way)
> > 
> > Dcache size  32 KB, 32/line (4 way)
> > 
> > Scache size 256 KB, 32/line  (4 way)
> > 
> > debug IP addr = 10.2.10.12
> > 
> > debug IP mask = 255.255.0.0
> > 
> >  
> > 
> >  
> > 
> > Initializing Autoloader, hit control-E to bypass
> > 
> > ........................................................................
> > ........
> > 
> >  
> > 
> > Type ctrl-e to stop autoload.
> > 
> > Waiting for SSC to enter autoload init state...done.
> > 
> > tftp server addr = 10.2.0.4
> > 
> > load, ip addr = 0xa020004, fname = 10.2.10.12/vmlinux.bin
> > 
> > loading 10.2.10.12/vmlinux.bin from 0xa020004 at 0xffffffff82000000
> > 
> > tpl_findBindCb prot=17 lport=9736
> > 
> > tpl_allocBindCb
> > 
> > tftp_tplAddConnInd
> > 
> > TFTP transfer completed.
> > 
> > tftp_tplDelConnInd
> > 
> > binary load, 3079568 bytes (0x2efd90)
> > 
> > Type ctrl-e to stop autoload.
> > 
> > Waiting for TXRX to enter autoload init state...done.
> > 
> > tftp server addr = 10.2.0.4
> > 
> > load, ip addr = 0xa020004, fname = 10.2.10.12/txrx_cg.bin
> > 
> > loading 10.2.10.12/txrx_cg.bin from 0xa020004 at 0x42000000
> > 
> > tftp_tplAddConnInd
> > 
> > TFTP transfer completed.
> > 
> > tftp_tplDelConnInd
> > 
> > binary load, 10047456 bytes (0x994fe0)
> > 
