AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080116112603.1da37447@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<tim.gardner@onstor.com>,<brian.stark@onstor.com>,<vikas.saini@onstor.com>,<larry.scheer@onstor.com>,<sandrine.boulanger@onstor.com>,<dl-Cougar@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E07AE229F@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 16 Jan 2008 11:26:28 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Tim Gardner" <tim.gardner@onstor.com>
Cc: "Brian Stark" <brian.stark@onstor.com>, "Vikas Saini"
 <vikas.saini@onstor.com>, "Larry Scheer" <larry.scheer@onstor.com>,
 "Sandrine Boulanger" <sandrine.boulanger@onstor.com>, "dl-Cougar"
 <dl-Cougar@onstor.com>
Subject: Re: g12r10 DOWN...need help
Message-ID: <20080116112628.319708e3@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E07AE229F@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E07AE219F@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E07AE229F@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Wed, 16 Jan 2008 10:58:19 -0800 "Tim Gardner"
<tim.gardner@onstor.com> wrote:

> When BSD crashes on bobcat, doesn't it write some info to the prom
> that is then read when the system is rebooted and stuffed in a file?
> Perhaps there is some remnant of this remaining in cougar that is
> causing the corruption. Has this seep corruption always occurred after
> an SSC crash?

The SSC didn't crash, the FP crashed.  Perhaps chassisd or
something else on the SSC is doing something at that point that
wipes the seep.
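
Whatever turns out to be wiping it, a saved 'seep view' dump per system
(as Brian suggests below) could also be sanity-checked automatically.
Here is a minimal sketch in Python, assuming the field names match the
'seep view' console output quoted below.  parse_seep_view and
suspect_fields are made-up names for illustration, not existing tools,
and the checks only test whether a value is well-formed.

```python
import ipaddress
import re

# Six colon-separated hex octets, e.g. 00:07:34:0a:0c:00
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def parse_seep_view(text):
    """Turn 'seep view' console text into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the "SEEP Info (sig=...)" banner and PROM prompts.
        if not line or line.startswith(("SEEP Info", "DEBUG-PROM")):
            continue
        if ":" in line:
            # Most fields look like "Name:   value".
            name, value = line.split(":", 1)
        else:
            # tftp_server_ip is printed without a colon.
            parts = line.split(None, 1)
            name = parts[0]
            value = parts[1] if len(parts) > 1 else ""
        fields[name.strip()] = value.strip()
    return fields


def is_ipv4(value):
    """True if value is a well-formed dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def suspect_fields(fields):
    """Return the names of fields whose values are not well-formed."""
    bad = []
    if not MAC_RE.match(fields.get("MAC addr", "")):
        bad.append("MAC addr")
    for key in ("IP addr", "IP mask", "tftp_server_ip"):
        if not is_ipv4(fields.get(key, "")):
            bad.append(key)
    return bad
```

Run against the console text of the g12r10 dump below, this would flag
the MAC addr and IP mask fields.  A value that is well-formed but wrong
would slip through, so comparing against the saved copy is still needed.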

> > -----Original Message-----
> > From: Brian Stark
> > Sent: Wednesday, January 16, 2008 9:41 AM
> > To: Vikas Saini; Larry Scheer; Sandrine Boulanger; dl-Cougar
> > Subject: RE: g12r10 DOWN...need help
> >=20
> > We've seen this problem with a corrupted SEEP a few times before.
> > The SSC is the only piece of hardware that has control over the
> > board level SEEP. The corruption is either happening as a result of
> > the crash or the reboot. If it's happening as a result of the
> > crash, then I think we need to look at chassisd.  If it's happening
> > after the reboot, then it would be something with PROM.  However,
> > we've looked at the PROM code, and the only time writes are done to
> > the SEEP is with explicit commands like 'seep program' or 'env
> > set'.  Also, we reboot systems a lot and haven't seen this issue.
> >=20
> > In order to get the systems back up seamlessly after this happens,
> > the SEEP needs to be reprogrammed in PROM with 'seep program'.  The
> > same contents should be entered, particularly the MAC address.  I
> > would suggest that we do a 'seep view' from PROM on all the systems
> > and then store the results away somewhere that everyone can
> > access.  This will make it much easier to reconstruct the SEEP when
> > the problem is seen again.
> >=20
> >=20
> > Brian
> >=20
> >=20
> > > -----Original Message-----
> > > From: Vikas Saini
> > > Sent: Tuesday, January 15, 2008 7:23 PM
> > > To: Larry Scheer; Sandrine Boulanger; dl-Cougar
> > > Subject: RE: g12r10 DOWN...need help
> > >
> > > G12r10 is back... thanks to Brian and Tim...
> > >
> > > Thanks
> > > Vikas
> > >
> > >
> > > -----Original Message-----
> > > From: Larry Scheer
> > > Sent: Tuesday, January 15, 2008 7:13 PM
> > > To: Sandrine Boulanger; Vikas Saini; dl-Cougar
> > > Subject: RE: g12r10 DOWN...need help
> > >
> > > I can reprogram the seep for you.
> > >
> > > Send me email ASAP if you DON'T want me to reprogram the seep.
> > >
> > > BTW:
> > > I have rebooted cougars over and over, even g12r10, and never
> > > seen this happen. Can anyone tell us what was running just
> > > before the reboot, what prompted the user to reboot, or whether
> > > the reboot was done automatically by a watchdog or whatever?
> > >
> > > Thanks,
> > >
> > > Larry
> > >
> > >
> > > -----Original Message-----
> > > From: Sandrine Boulanger
> > > Sent: Tue 1/15/2008 6:54 PM
> > > To: Vikas Saini; dl-Cougar
> > > Subject: RE: g12r10 DOWN...need help
> > >
> > > It's scary that a simple reboot would mess up the seep. I
> > > think this has happened to John K as well.
> > >
> > > Can anyone explain how this can happen?
> > >
> > >
> > >
> > > ________________________________
> > >
> > > From: Vikas Saini
> > > Sent: Tuesday, January 15, 2008 6:42 PM
> > > To: dl-Cougar
> > > Subject: g12r10 DOWN...need help
> > > Importance: High
> > >
> > >
> > >
> > > I rebooted g12r10 to recover from an FP crash and g12r10 went to
> > > the SSC debug PROM. The seep output is totally messed up. Need
> > > help from Dev in fixing it.
> > >
> > >
> > >
> > >
> > >
> > > PowerOn S
> > >
> > > DEBUG-PROM-SSC:D
> > >
> > > DEBUG-PROM-SSC:B
> > >
> > > DEBUG-PROM-SSC:D
> > >
> > > DEBUG-PROM-SSC:D
> > >
> > > DEBUG-PROM-SSC:D
> > >
> > > DEBUG-PROM-SSC:
> > >
> > >
> > >
> > > DEBUG-PROM-SSC:Dreboot
> > >
> > > Rebooting ...
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > PowerOn Self Test........OK
> > >
> > >
> > >
> > > Initializing System......please wait
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > PMON [SSC,EL,FP,64]
> > >
> > > ONStor Inc. PROM_SIBYTE_CG : Cougar-prom-1.0.2 : Fri Dec 14
> > > 11:53:50 2007
> > >
> > > CPU type SB1125.  Rev 35  600 MHz
> > >
> > > module: SSC, Slot 0, CPU 0
> > >
> > > Memory size 512 MB.
> > >
> > > Icache size  32 KB, 32/line (4 way)
> > >
> > > Dcache size  32 KB, 32/line (4 way)
> > >
> > > Scache size 256 KB, 32/line  (4 way)
> > >
> > > debug IP addr =3D 10.2.10.12
> > >
> > > debug IP mask =3D 255*=C3=A02E
> > >
> > >
> > >
> > >
> > >
> > > Initializing Autoloader, hit control-E to bypass
> > >
> > > ..............................................................
> > > ..................
> > >
> > >
> > >
> > > Type ctrl-e to stop autoload.
> > >
> > > Waiting for SSC to enter autoload init state...done.
> > >
> > > tftp server addr =3D 0.209.0.2
> > >
> > > load, ip addr =3D 0xd10002, fname =3D 10.2.10.12/vmlinux.bin
> > >
> > > loading 10.2.10.12/vmlinux.bin from 0xd10002 at 0xffffffff82000000
> > >
> > > tpl_findBindCb prot=3D17 lport=3D9736
> > >
> > > tpl_allocBindCb
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > TFTP Timeout.
> > >
> > > binary load, 0 bytes (0x0)
> > >
> > > Error loading from network
> > >
> > > usage: load [ip addr] <fname>
> > >
> > > Type ctrl-e to stop autoload.
> > >
> > > Waiting for TXRX to enter autoload init state...done.
> > >
> > > tftp server addr =3D 0.209.0.2
> > >
> > > load, ip addr =3D 0xd10002, fname =3D 10.2.10.12/txrx_cg.bin
> > >
> > > loading 10.2.10.12/txrx_cg.bin from 0xd10002 at 0x42000000
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > TFTP Timeout.
> > >
> > > binary load, 0 bytes (0x0)
> > >
> > > Error loading from network
> > >
> > > usage: load [ip addr] <fname>
> > >
> > > Type ctrl-e to stop autoload.
> > >
> > > Waiting for FP to enter autoload init state...done.
> > >
> > > tftp server addr =3D 0.209.0.2
> > >
> > > load, ip addr =3D 0xd10002, fname =3D 10.2.10.12/fp_cg.bin
> > >
> > > loading 10.2.10.12/fp_cg.bin from 0xd10002 at 0x44000000
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > File: bsdsock-api.c, Line: 742
> > >
> > > sosend failed, err num 65
> > >
> > >
> > >
> > > TFTP Timeout.
> > >
> > > binary load, 0 bytes (0x0)
> > >
> > > Error loading from network
> > >
> > > usage: load [ip addr] <fname>
> > >
> > >  -- Do BSD launch (argc =3D 0)
> > >
> > >
> > >
> > > MAC address in chassis seep is invalid
> > >
> > > MAC address in chassis seep is invalid
> > >
> > > env[0] =3D 80b70970:.cpuclock=3D600000000.
> > >
> > > env[1] =3D 80b709c0:.memsize=3D512.
> > >
> > > env[2] =3D 80b70a10:.osloadoptions=3DmAt.
> > >
> > > env[3] =3D 80b70a60:.boot=3Dcold.
> > >
> > > env[4] =3D 80b70ab0:.busclock=3D600.
> > >
> > > env[5] =3D 80b70b00:.ipaddr=3D10.2.10.12.
> > >
> > > env[6] =3D 80b70b50:.netmask=3D255*=C3=A02E.
> > >
> > > env[7] =3D 80b70ba0:.macaddr0=3D.00:07:34:0a:0c:00.
> > >
> > > env[8] =3D 80b70bf0:.macaddr1=3D.00:07:34:0a:0c:01.
> > >
> > >  pointer to Prom Util routines =3D 0
> > >
> > >  Command should be  (addr)(argc, argv, env_strings,
> > > ptr_prom_util_routines)
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > PMON TLBS exception
> > >
> > > EPC:      0xffffffff84800024
> > >
> > > RA:       0xffffffff808762d0
> > >
> > > Cause:    3000800c
> > >
> > > BadVaddr: 0x0
> > >
> > > Autoreboot set "off", stopping in debug mode
> > >
> > > DEBUG-PROM-SSC:1 > seep view
> > >
> > > SEEP Info (sig=3DCB01)
> > >
> > >         Model Number:          =C2=AA*
> > >
> > >         Board Serial Number:   0746050009
> > >
> > >         Board Revision:        3.0
> > >
> > >         Deviation:
> > >
> > >         Result of ICT:
> > >
> > >         Date of ICT:
> > >
> > >         Result of FT:
> > >
> > >         IP addr:               10.2.10.12
> > >
> > >         IP mask:               255*=C3=A02E
> > >
> > >         MAC addr:              8cE
> > >
> > >         No of Reboots:         176
> > >
> > >         tftp_server_ip         0.209.0.2
> > >
> > > DEBUG-PROM-SSC:2 >
> > >
> > > DEBUG-PROM-SSC:2 >
> > >
> > >
> > >
