X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C85866.FB77941E@onstor-exch02.onstor.net>; Wed, 16 Jan 2008 10:41:10 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: g12r10 DOWN...need help 
Date: Wed, 16 Jan 2008 10:41:10 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E07AE219F@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E07AE1F5A@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: g12r10 DOWN...need help 
Thread-Index: AchX6V9nBxLGefT0RFOdw9v4iG994gAAY6AAAAC1Z8IAAE85wAAdenIg
References: <BB375AF679D4A34E9CA8DFA650E2B04E042F002C@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E07AE1F5A@onstor-exch02.onstor.net>
From: "Brian Stark" <brian.stark@onstor.com>
To: "Vikas Saini" <vikas.saini@onstor.com>,
	"Larry Scheer" <larry.scheer@onstor.com>,
	"Sandrine Boulanger" <sandrine.boulanger@onstor.com>,
	"dl-Cougar" <dl-Cougar@onstor.com>

We've seen this problem with a corrupted SEEP a few times before.  The =
SSC is the only piece of hardware that has control over the board-level =
SEEP.  The corruption is either happening as a result of the crash or =
the reboot.  If it's happening as a result of the crash, then I think we =
need to look at chassisd.  If it's happening after the reboot, then it =
would be something with PROM.  However, we've looked at the PROM code, =
and the only time writes are done to the SEEP is with explicit commands =
like 'seep program' or 'env set'.  Also, we reboot systems a lot and =
haven't seen this issue.

In order to get the systems back up seamlessly after this happens, the =
SEEP needs to be reprogrammed in PROM with 'seep program'.  The same =
contents should be entered, particularly the MAC address.  I would =
suggest that we do a 'seep view' from PROM on all the systems and then =
store the results away somewhere that everyone can access.  This will =
make it much easier to reconstruct the SEEP when the problem is seen =
again.
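
As a rough sketch of how the stored 'seep view' results could be parsed
and checked, something like the Python below might work. The field names
are taken from the 'seep view' output quoted further down. The function
names, the validation rules, and the assumption that the input is raw
console text (without the "> " quote prefixes) are hypothetical and not
part of any existing tool.

```python
import ipaddress
import json
import re

# Six colon-separated hex octets, e.g. 00:07:34:0a:0c:00
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def parse_seep_view(text):
    """Turn captured 'seep view' console text into a {field: value} dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, the "SEEP Info (sig=...)" banner, and prompt lines.
        if not line or line.startswith(("SEEP Info", "DEBUG-PROM")):
            continue
        if ":" in line:
            key, value = line.split(":", 1)
        else:
            # Some fields (tftp_server_ip) are printed without a colon.
            parts = re.split(r"\s{2,}", line, maxsplit=1)
            key = parts[0]
            value = parts[1] if len(parts) > 1 else ""
        fields[key.strip()] = value.strip()
    return fields


def is_ipv4(text):
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def is_netmask(text):
    """True if text is a dotted quad whose set bits are contiguous."""
    try:
        bits = int(ipaddress.IPv4Address(text))
    except ValueError:
        return False
    inverted = bits ^ 0xFFFFFFFF
    return (inverted & (inverted + 1)) == 0


# Assumed checks: only syntax is validated, not whether a value is the
# right one for a given system.
CHECKS = {
    "IP addr": is_ipv4,
    "IP mask": is_netmask,
    "MAC addr": MAC_RE.fullmatch,
    "tftp_server_ip": is_ipv4,
}


def suspect_fields(fields):
    """Return the names of fields that fail their syntax check."""
    return [name for name, check in CHECKS.items()
            if name in fields and not check(fields[name])]


def snapshot_json(host, fields):
    """Serialize one system's SEEP contents for storage in a shared place."""
    return json.dumps({"host": host, "seep": fields}, indent=2)
```

Running parse_seep_view() on a known-good capture and saving the
snapshot_json() result per system would give a reference copy to retype
into 'seep program'. suspect_fields() only catches values that are
syntactically broken, so a well-formed but wrong address would still
need to be compared against the stored copy.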


Brian
=20

> -----Original Message-----
> From: Vikas Saini=20
> Sent: Tuesday, January 15, 2008 7:23 PM
> To: Larry Scheer; Sandrine Boulanger; dl-Cougar
> Subject: RE: g12r10 DOWN...need help=20
>=20
> G12r10 is back... thanks to Brian and Tim...
>=20
> Thanks
> Vikas
>=20
>=20
> -----Original Message-----
> From: Larry Scheer
> Sent: Tuesday, January 15, 2008 7:13 PM
> To: Sandrine Boulanger; Vikas Saini; dl-Cougar
> Subject: RE: g12r10 DOWN...need help=20
>=20
> I can reprogram the seep for you.
>=20
> Send me email ASAP if you DON'T want me to reprogram the seep.
>=20
> BTW:
> I have rebooted cougars over and over, even g12r10, and never=20
> seen this happen. Can anyone tell us what was running just=20
> before the reboot, what prompted the user to reboot, or whether=20
> the reboot was triggered automatically by a watchdog or similar?
>=20
> Thanks,
>=20
> Larry=20
>=20
>=20
> -----Original Message-----
> From: Sandrine Boulanger
> Sent: Tue 1/15/2008 6:54 PM
> To: Vikas Saini; dl-Cougar
> Subject: RE: g12r10 DOWN...need help=20
> =20
> It's scary that a simple reboot would mess up the seep. I=20
> think this has happened to John K as well.=20
>=20
> Can anyone explain how this can happen?
>=20
> =20
>=20
> ________________________________
>=20
> From: Vikas Saini
> Sent: Tuesday, January 15, 2008 6:42 PM
> To: dl-Cougar
> Subject: g12r10 DOWN...need help
> Importance: High
>=20
> =20
>=20
> I rebooted g12r10 to recover from an FP crash and it dropped into=20
> the SSC debug PROM. The seep output is totally messed up. Need help=20
> from Dev in fixing it.
>=20
> =20
>=20
> =20
>=20
> PowerOn S
>=20
> DEBUG-PROM-SSC:D
>=20
> DEBUG-PROM-SSC:B
>=20
> DEBUG-PROM-SSC:D
>=20
> DEBUG-PROM-SSC:D
>=20
> DEBUG-PROM-SSC:D
>=20
> DEBUG-PROM-SSC:
>=20
> =20
>=20
> DEBUG-PROM-SSC:Dreboot
>=20
> Rebooting ...
>=20
> =20
>=20
> =20
>=20
> =20
>=20
> PowerOn Self Test........OK
>=20
> =20
>=20
> Initializing System......please wait
>=20
> =20
>=20
> =20
>=20
> =20
>=20
> =20
>=20
> =20
>=20
> PMON [SSC,EL,FP,64]
>=20
> ONStor Inc. PROM_SIBYTE_CG : Cougar-prom-1.0.2 : Fri Dec 14=20
> 11:53:50 2007
>=20
> CPU type SB1125.  Rev 35  600 MHz
>=20
> module: SSC, Slot 0, CPU 0
>=20
> Memory size 512 MB.
>=20
> Icache size  32 KB, 32/line (4 way)
>=20
> Dcache size  32 KB, 32/line (4 way)
>=20
> Scache size 256 KB, 32/line  (4 way)
>=20
> debug IP addr =3D 10.2.10.12
>=20
> debug IP mask =3D 255*=E02E
>=20
> =20
>=20
> =20
>=20
> Initializing Autoloader, hit control-E to bypass
>=20
> ..............................................................
> ..................
>=20
> =20
>=20
> Type ctrl-e to stop autoload.
>=20
> Waiting for SSC to enter autoload init state...done.
>=20
> tftp server addr =3D 0.209.0.2
>=20
> load, ip addr =3D 0xd10002, fname =3D 10.2.10.12/vmlinux.bin
>=20
> loading 10.2.10.12/vmlinux.bin from 0xd10002 at 0xffffffff82000000
>=20
> tpl_findBindCb prot=3D17 lport=3D9736
>=20
> tpl_allocBindCb
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> TFTP Timeout.
>=20
> binary load, 0 bytes (0x0)
>=20
> Error loading from network
>=20
> usage: load [ip addr] <fname>
>=20
> Type ctrl-e to stop autoload.
>=20
> Waiting for TXRX to enter autoload init state...done.
>=20
> tftp server addr =3D 0.209.0.2
>=20
> load, ip addr =3D 0xd10002, fname =3D 10.2.10.12/txrx_cg.bin
>=20
> loading 10.2.10.12/txrx_cg.bin from 0xd10002 at 0x42000000
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> TFTP Timeout.
>=20
> binary load, 0 bytes (0x0)
>=20
> Error loading from network
>=20
> usage: load [ip addr] <fname>
>=20
> Type ctrl-e to stop autoload.
>=20
> Waiting for FP to enter autoload init state...done.
>=20
> tftp server addr =3D 0.209.0.2
>=20
> load, ip addr =3D 0xd10002, fname =3D 10.2.10.12/fp_cg.bin
>=20
> loading 10.2.10.12/fp_cg.bin from 0xd10002 at 0x44000000
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> File: bsdsock-api.c, Line: 742
>=20
> sosend failed, err num 65
>=20
> =20
>=20
> TFTP Timeout.
>=20
> binary load, 0 bytes (0x0)
>=20
> Error loading from network
>=20
> usage: load [ip addr] <fname>
>=20
>  -- Do BSD launch (argc =3D 0)
>=20
> =20
>=20
> MAC address in chassis seep is invalid
>=20
> MAC address in chassis seep is invalid
>=20
> env[0] =3D 80b70970:.cpuclock=3D600000000.
>=20
> env[1] =3D 80b709c0:.memsize=3D512.
>=20
> env[2] =3D 80b70a10:.osloadoptions=3DmAt.
>=20
> env[3] =3D 80b70a60:.boot=3Dcold.
>=20
> env[4] =3D 80b70ab0:.busclock=3D600.
>=20
> env[5] =3D 80b70b00:.ipaddr=3D10.2.10.12.
>=20
> env[6] =3D 80b70b50:.netmask=3D255*=E02E.
>=20
> env[7] =3D 80b70ba0:.macaddr0=3D.00:07:34:0a:0c:00.
>=20
> env[8] =3D 80b70bf0:.macaddr1=3D.00:07:34:0a:0c:01.
>=20
>  pointer to Prom Util routines =3D 0
>=20
>  Command should be  (addr)(argc, argv, env_strings,=20
> ptr_prom_util_routines)
>=20
> =20
>=20
> =20
>=20
> =20
>=20
> PMON TLBS exception
>=20
> EPC:      0xffffffff84800024
>=20
> RA:       0xffffffff808762d0
>=20
> Cause:    3000800c
>=20
> BadVaddr: 0x0
>=20
> Autoreboot set "off", stopping in debug mode
>=20
> DEBUG-PROM-SSC:1 > seep view
>=20
> SEEP Info (sig=3DCB01)
>=20
>         Model Number:          =AA*
>=20
>         Board Serial Number:   0746050009
>=20
>         Board Revision:        3.0
>=20
>         Deviation:
>=20
>         Result of ICT:
>=20
>         Date of ICT:
>=20
>         Result of FT:
>=20
>         IP addr:               10.2.10.12
>=20
>         IP mask:               255*=E02E
>=20
>         MAC addr:              8cE
>=20
>         No of Reboots:         176
>=20
>         tftp_server_ip         0.209.0.2
>=20
> DEBUG-PROM-SSC:2 >
>=20
> DEBUG-PROM-SSC:2 >
>=20
>=20
>=20
