X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C85871.C247BFA9@onstor-exch02.onstor.net>; Wed, 16 Jan 2008 11:58:19 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: g12r10 DOWN...need help 
Date: Wed, 16 Jan 2008 11:58:19 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E07AE229F@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E07AE219F@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: g12r10 DOWN...need help 
Thread-Index: AchX6V9nBxLGefT0RFOdw9v4iG994gAAY6AAAAC1Z8IAAE85wAAdenIgAAMj7nA=
From: "Tim Gardner" <tim.gardner@onstor.com>
To: "Brian Stark" <brian.stark@onstor.com>,
	"Vikas Saini" <vikas.saini@onstor.com>,
	"Larry Scheer" <larry.scheer@onstor.com>,
	"Sandrine Boulanger" <sandrine.boulanger@onstor.com>,
	"dl-Cougar" <dl-Cougar@onstor.com>

When BSD crashes on bobcat, doesn't it write some info to the PROM that
is then read when the system is rebooted and stuffed in a file?
Perhaps there is some remnant of this remaining in cougar that is
causing the corruption. Has this SEEP corruption always occurred after
an SSC crash?
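
A rough sketch of the 'seep view' snapshot idea Brian raises below, in
case it is useful: parse the console output into a dictionary that can be
stored somewhere shared, and flag fields that fail a basic format check.
The function names (parse_seep_view, suspect_fields, to_snapshot), the
sample text (modeled on the g12r10 output quoted further down) and the
choice of which fields to check are illustrative assumptions, not anything
taken from the PROM or chassisd code.

```python
import ipaddress
import json
import re

# "Label:   value" lines; the value may be empty (e.g. "Deviation:").
LABELLED = re.compile(r"^(?P<key>[^:]+?):(?:\s+(?P<value>.*))?$")
MAC = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")

# Hypothetical capture, modeled on the 'seep view' output in this thread.
SAMPLE = """
DEBUG-PROM-SSC:1 > seep view
SEEP Info (sig=CB01)
        Model Number:          ª*
        Board Serial Number:   0746050009
        Board Revision:        3.0
        Deviation:
        IP addr:               10.2.10.12
        IP mask:               255*à2E
        MAC addr:              8cE
        No of Reboots:         176
        tftp_server_ip         0.209.0.2
DEBUG-PROM-SSC:2 >
"""


def parse_seep_view(text):
    """Turn 'seep view' console output into a {field: value} dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LABELLED.match(line)
        if m:
            fields[m.group("key").strip()] = (m.group("value") or "").strip()
        elif ":" not in line:
            # Colon-less form, e.g. "tftp_server_ip         0.209.0.2".
            parts = re.split(r"\s{2,}", line, maxsplit=1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
    return fields


def _is_ipv4(value):
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def _is_netmask(value):
    try:
        ipaddress.IPv4Network("0.0.0.0/" + value)
        return True
    except ValueError:
        return False


def _is_mac(value):
    return MAC.match(value) is not None


def suspect_fields(fields):
    """Return the names of fields whose values fail a basic format check."""
    checks = (("IP addr", _is_ipv4), ("IP mask", _is_netmask), ("MAC addr", _is_mac))
    return [key for key, ok in checks if key in fields and not ok(fields[key])]


def to_snapshot(fields):
    """Serialize the parsed fields so they can be stored and diffed later."""
    return json.dumps(fields, indent=2, sort_keys=True)


if __name__ == "__main__":
    parsed = parse_seep_view(SAMPLE)
    print(to_snapshot(parsed))
    print("suspect:", suspect_fields(parsed))
```

Format checks like these only catch obviously mangled values. A value
that is well-formed but wrong would still need to be compared against a
stored known-good snapshot for that system.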

> -----Original Message-----
> From: Brian Stark
> Sent: Wednesday, January 16, 2008 9:41 AM
> To: Vikas Saini; Larry Scheer; Sandrine Boulanger; dl-Cougar
> Subject: RE: g12r10 DOWN...need help
>
> We've seen this problem with a corrupted SEEP a few times before.  The SSC
> is the only piece of hardware that has control over the board level SEEP.
> The corruption is either happening as a result of the crash or the reboot.
> If it's happening as a result of the crash, then I think we need to look
> at chassisd.  If it's happening after the reboot, then it would be
> something with PROM.  However, we've looked at the PROM code, and the only
> time writes are done to the SEEP is with explicit commands like 'seep
> program' or 'env set'.  Also, we reboot systems a lot and haven't seen
> this issue.
>
> In order to get the systems back up seamlessly after this happens, the
> SEEP needs to be reprogrammed in PROM with 'seep program'.  The same
> contents should be entered, particularly the MAC address.  I would suggest
> that we do a 'seep view' from PROM on all the systems and then store the
> results away somewhere that everyone can access.  This will make it much
> easier to reconstruct the SEEP when the problem is seen again.
>
>
> Brian
>
>
> > -----Original Message-----
> > From: Vikas Saini
> > Sent: Tuesday, January 15, 2008 7:23 PM
> > To: Larry Scheer; Sandrine Boulanger; dl-Cougar
> > Subject: RE: g12r10 DOWN...need help
> >
> > G12r10 is back... thanks to Brian and Tim...
> >
> > Thanks
> > Vikas
> >
> >
> > -----Original Message-----
> > From: Larry Scheer
> > Sent: Tuesday, January 15, 2008 7:13 PM
> > To: Sandrine Boulanger; Vikas Saini; dl-Cougar
> > Subject: RE: g12r10 DOWN...need help
> >
> > I can reprogram the seep for you.
> >
> > Send me email ASAP if you DON'T want me to reprogram the seep.
> >
> > BTW:
> > I have rebooted cougars over and over, even g12r10, and never
> > seen this happen. Can anyone tell us what was running just
> > before the reboot, what prompted the user to reboot, or whether
> > the reboot was done automatically by a watchdog or something similar?
> >
> > Thanks,
> >
> > Larry
> >
> >
> > -----Original Message-----
> > From: Sandrine Boulanger
> > Sent: Tue 1/15/2008 6:54 PM
> > To: Vikas Saini; dl-Cougar
> > Subject: RE: g12r10 DOWN...need help
> >
> > It's scary that a simple reboot would mess up the seep. I
> > think this has happened to John K as well.
> >
> > Can anyone explain how this can happen?
> >
> >
> >
> > ________________________________
> >
> > From: Vikas Saini
> > Sent: Tuesday, January 15, 2008 6:42 PM
> > To: dl-Cougar
> > Subject: g12r10 DOWN...need help
> > Importance: High
> >
> >
> >
> > I rebooted g12r10 to recover from an FP crash and it dropped into the
> > SSC debug PROM. The seep output is totally messed up. Need help
> > from Dev in fixing it.
> >
> >
> >
> >
> >
> > PowerOn S
> >
> > DEBUG-PROM-SSC:D
> >
> > DEBUG-PROM-SSC:B
> >
> > DEBUG-PROM-SSC:D
> >
> > DEBUG-PROM-SSC:D
> >
> > DEBUG-PROM-SSC:D
> >
> > DEBUG-PROM-SSC:
> >
> >
> >
> > DEBUG-PROM-SSC:Dreboot
> >
> > Rebooting ...
> >
> >
> >
> >
> >
> >
> >
> > PowerOn Self Test........OK
> >
> >
> >
> > Initializing System......please wait
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > PMON [SSC,EL,FP,64]
> >
> > ONStor Inc. PROM_SIBYTE_CG : Cougar-prom-1.0.2 : Fri Dec 14
> > 11:53:50 2007
> >
> > CPU type SB1125.  Rev 35  600 MHz
> >
> > module: SSC, Slot 0, CPU 0
> >
> > Memory size 512 MB.
> >
> > Icache size  32 KB, 32/line (4 way)
> >
> > Dcache size  32 KB, 32/line (4 way)
> >
> > Scache size 256 KB, 32/line  (4 way)
> >
> > debug IP addr = 10.2.10.12
> >
> > debug IP mask = 255*à2E
> >
> >
> >
> >
> >
> > Initializing Autoloader, hit control-E to bypass
> >
> > ..............................................................
> > ..................
> >
> >
> >
> > Type ctrl-e to stop autoload.
> >
> > Waiting for SSC to enter autoload init state...done.
> >
> > tftp server addr = 0.209.0.2
> >
> > load, ip addr = 0xd10002, fname = 10.2.10.12/vmlinux.bin
> >
> > loading 10.2.10.12/vmlinux.bin from 0xd10002 at 0xffffffff82000000
> >
> > tpl_findBindCb prot=17 lport=9736
> >
> > tpl_allocBindCb
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > TFTP Timeout.
> >
> > binary load, 0 bytes (0x0)
> >
> > Error loading from network
> >
> > usage: load [ip addr] <fname>
> >
> > Type ctrl-e to stop autoload.
> >
> > Waiting for TXRX to enter autoload init state...done.
> >
> > tftp server addr = 0.209.0.2
> >
> > load, ip addr = 0xd10002, fname = 10.2.10.12/txrx_cg.bin
> >
> > loading 10.2.10.12/txrx_cg.bin from 0xd10002 at 0x42000000
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > TFTP Timeout.
> >
> > binary load, 0 bytes (0x0)
> >
> > Error loading from network
> >
> > usage: load [ip addr] <fname>
> >
> > Type ctrl-e to stop autoload.
> >
> > Waiting for FP to enter autoload init state...done.
> >
> > tftp server addr = 0.209.0.2
> >
> > load, ip addr = 0xd10002, fname = 10.2.10.12/fp_cg.bin
> >
> > loading 10.2.10.12/fp_cg.bin from 0xd10002 at 0x44000000
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > File: bsdsock-api.c, Line: 742
> >
> > sosend failed, err num 65
> >
> >
> >
> > TFTP Timeout.
> >
> > binary load, 0 bytes (0x0)
> >
> > Error loading from network
> >
> > usage: load [ip addr] <fname>
> >
> >  -- Do BSD launch (argc = 0)
> >
> >
> >
> > MAC address in chassis seep is invalid
> >
> > MAC address in chassis seep is invalid
> >
> > env[0] = 80b70970:.cpuclock=600000000.
> >
> > env[1] = 80b709c0:.memsize=512.
> >
> > env[2] = 80b70a10:.osloadoptions=mAt.
> >
> > env[3] = 80b70a60:.boot=cold.
> >
> > env[4] = 80b70ab0:.busclock=600.
> >
> > env[5] = 80b70b00:.ipaddr=10.2.10.12.
> >
> > env[6] = 80b70b50:.netmask=255*à2E.
> >
> > env[7] = 80b70ba0:.macaddr0=.00:07:34:0a:0c:00.
> >
> > env[8] = 80b70bf0:.macaddr1=.00:07:34:0a:0c:01.
> >
> >  pointer to Prom Util routines = 0
> >
> >  Command should be  (addr)(argc, argv, env_strings,
> > ptr_prom_util_routines)
> >
> >
> >
> >
> >
> >
> >
> > PMON TLBS exception
> >
> > EPC:      0xffffffff84800024
> >
> > RA:       0xffffffff808762d0
> >
> > Cause:    3000800c
> >
> > BadVaddr: 0x0
> >
> > Autoreboot set "off", stopping in debug mode
> >
> > DEBUG-PROM-SSC:1 > seep view
> >
> > SEEP Info (sig=CB01)
> >
> >         Model Number:          ª*
> >
> >         Board Serial Number:   0746050009
> >
> >         Board Revision:        3.0
> >
> >         Deviation:
> >
> >         Result of ICT:
> >
> >         Date of ICT:
> >
> >         Result of FT:
> >
> >         IP addr:               10.2.10.12
> >
> >         IP mask:               255*à2E
> >
> >         MAC addr:              8cE
> >
> >         No of Reboots:         176
> >
> >         tftp_server_ip         0.209.0.2
> >
> > DEBUG-PROM-SSC:2 >
> >
> > DEBUG-PROM-SSC:2 >
> >
> >
> >
