X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C86DE1.C5E74E30@onstor-exch02.onstor.net>; Tue, 12 Feb 2008 18:43:03 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Manohar Divate has entered a new call into the Help Desk
Date: Tue, 12 Feb 2008 18:43:03 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E050CF9D7@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E050CF9D5@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Manohar Divate has entered a new call into the Help Desk
Thread-Index: Achs7J9s+TjzSbXHQqCl9s1RAZMadQACr9/gAAX9QCAAJoofUAAGKSlgAAfOmtA=
From: "Manohar Divate" <manohar.divate@onstor.com>
To: "Manohar Divate" <manohar.divate@onstor.com>,
	"Brian Stark" <brian.stark@onstor.com>,
	"John Rogers" <john.rogers@onstor.com>,
	"John VanderWerf" <john.vanderwerf@onstor.com>,
	"Vikas Saini" <vikas.saini@onstor.com>
Cc: "Sandrine Boulanger" <sandrine.boulanger@onstor.com>,
	"dl-Cougar" <dl-Cougar@onstor.com>

I think these are enough crashes for today.

Vikas - I need a different node to continue cluster testing


g9r74:~# cat /var/log/messages|grep crashed
Feb 12 13:27:10 g9r74 kernel: tx1: cpu 0 crashed, core 0
Feb 12 13:27:10 g9r74 kernel: fp1: cpu 0 crashed, core 0
Feb 12 13:27:10 g9r74 kernel: fp2: cpu 0 crashed, core 0
Feb 12 13:27:10 g9r74 kernel: fp3: cpu 0 crashed, core 0

Feb 12 15:46:56 g9r74 kernel: tx1: cpu 0 crashed, core 0
Feb 12 15:46:56 g9r74 kernel: fp1: cpu 0 crashed, core 0
Feb 12 15:46:56 g9r74 kernel: fp2: cpu 0 crashed, core 0
Feb 12 15:46:56 g9r74 kernel: fp3: cpu 0 crashed, core 0

Feb 12 16:25:48 g9r74 kernel: tx0: cpu 1 crashed, core 1
Feb 12 16:25:48 g9r74 kernel: fp1: cpu 1 crashed, core 1
Feb 12 16:25:48 g9r74 kernel: fp2: cpu 1 crashed, core 1
Feb 12 16:25:48 g9r74 kernel: fp3: cpu 1 crashed, core 1

Feb 12 17:18:05 g9r74 kernel: tx0: cpu 1 crashed, core 1
Feb 12 17:18:05 g9r74 kernel: fp3: cpu 1 crashed, core 1
Feb 12 17:18:05 g9r74 kernel: fp1: cpu 1 crashed, core 1
Feb 12 17:18:05 g9r74 kernel: fp2: cpu 1 crashed, core 1
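
The grep output above shows each crash event as a burst of lines sharing one timestamp. As a minimal sketch (not a tool we actually run in the lab), the lines could be grouped per event like this. The function name group_crashes and the regex are assumptions for illustration; the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches syslog lines of the form seen above, e.g.
#   Feb 12 13:27:10 g9r74 kernel: tx1: cpu 0 crashed, core 0
# The pattern is an assumption based only on the lines in this thread.
CRASH_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<unit>\w+):\s+cpu\s+(?P<cpu>\d+)\s+crashed,\s+core\s+(?P<core>\d+)"
)


def group_crashes(lines):
    """Group 'crashed' lines by timestamp; returns {timestamp: [(unit, cpu), ...]}."""
    events = defaultdict(list)
    for line in lines:
        m = CRASH_RE.match(line.strip())
        if m:  # silently skip blank or unrelated lines
            events[m.group("ts")].append((m.group("unit"), int(m.group("cpu"))))
    return dict(events)


# Two of the four events from the log above, copied verbatim.
SAMPLE_LOG = """\
Feb 12 13:27:10 g9r74 kernel: tx1: cpu 0 crashed, core 0
Feb 12 13:27:10 g9r74 kernel: fp1: cpu 0 crashed, core 0
Feb 12 13:27:10 g9r74 kernel: fp2: cpu 0 crashed, core 0
Feb 12 13:27:10 g9r74 kernel: fp3: cpu 0 crashed, core 0

Feb 12 16:25:48 g9r74 kernel: tx0: cpu 1 crashed, core 1
Feb 12 16:25:48 g9r74 kernel: fp1: cpu 1 crashed, core 1
Feb 12 16:25:48 g9r74 kernel: fp2: cpu 1 crashed, core 1
Feb 12 16:25:48 g9r74 kernel: fp3: cpu 1 crashed, core 1
"""

if __name__ == "__main__":
    for ts, units in group_crashes(SAMPLE_LOG.splitlines()).items():
        print(ts, units)
```

Grouping by timestamp makes it easier to see at a glance which units went down together in each event and which cpu number was reported.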

-----Original Message-----
From: Manohar Divate
Sent: Tuesday, February 12, 2008 2:00 PM
To: Brian Stark; John Rogers; John VanderWerf; Vikas Saini
Cc: Manohar Divate
Subject: RE: Manohar Divate has entered a new call into the Help Desk
Importance: High

Another txrx crash on the same node, g9r74.

I have opened 2 defects 222286 & 22294

Can somebody help isolate whether this is a SW or HW issue?

Thanks
manny


Remote MIPS debugging using 10.2.74.9:61231
0x83003644 in eee_alloc_buf_ra (pool=0xffffffff839d6fc8,
ra=18446744071612627872) at eee-desc.c:483
483     eee-desc.c: No such file or directory.
        in eee-desc.c
(gdb) bt
#0  0x83003644 in eee_alloc_buf_ra (pool=0xffffffff839d6fc8,
ra=18446744071612627872) at eee-desc.c:483
#1  0x830035c8 in eee_alloc_buf (pool=0xffffffff839d6fc8) at
eee-desc.c:468
#2  0x83037ba0 in pkt_allocPkt () at pkt-alloc.c:72
#3  0x830267cc in tpl_allocPkt (so=0x1007189300) at tpl-alloc.c:82
#4  0x8307d87c in tcp_send_fast_ack (tp=0x1007189448,
pConn=0x1003d80000, fastpath=1) at tcp_input.c:3076
#5  0x83079658 in tcp_inputC (ptr=0x2037dbbe00, iphlen=20,
pconn=0x1003d80000, incb=0x1007189300, ip=0x2037dbbe58,
    ret=0xffffffff854b5a68, parm=0x0) at tcp_input.c:982
#6  0x83030488 in tpl_rcvPkt (pConnId=0x1003d80000, pDesc=0x2037dbbe00)
at tpl-rcv.c:635
#7  0x83063e14 in vstack_link_input (ln=0x100725fc00,
pDesc=0x2037dbbe00) at vstack_vlink.c:1801
#8  0x83063a8c in vl_input (ln=0x100725fc00, pDesc=0x2037dbbe00) at
vstack_vlink.c:1679
#9  0x83021f9c in luc_rcvPkt (pDesc=0x2037dbbe00, pktLen=1526,
dw1=9802139842918465090, dw2=1718741460562602305, linkId=0)
    at luc-rx.c:562
#10 0x83022ea0 in bmc12500Eth_rxPkt_net (rxc=0x10052afa68,
cfg=0x10052af780, loopCnt=8) at luc-rx.c:1120
#11 0x8344d878 in bmc12500Eth_rxPktPoll (cb=0x10052af780, tlimit=2) at
bmc12500-eth-rx.c:1478
#12 0x83013570 in eee_poll (num_loops=58) at eee-poll.c:551
#13 0x830ce3a0 in getchar () at serio-api.c:333
#14 0x830c3284 in get_line (p=0xffffffff854b5f08 "", usehist=1) at
hist.c:145
#15 0x830c39d0 in get_input (p=0xffffffff854b5f08 "") at hist.c:259
#16 0x830c3a08 in get_cmd (p=0xffffffff854b5f08 "") at hist.c:284
#17 0x830ccd80 in runtime_prompt () at test.c:558
#18 0x830ccca4 in _main () at test.c:541
(gdb) quit

-----Original Message-----
From: Manohar Divate
Sent: Tuesday, February 12, 2008 11:05 AM
To: Brian Stark; John Rogers; John VanderWerf; Durairaj Muthusamy
Subject: RE: Manohar Divate has entered a new call into the Help Desk

Crashdumps = Yes
We monitor rcons only after a crash.
The system is currently crashed, and the bt is:



       in eee-desc.c
(gdb) bt
#0  0x83003644 in eee_alloc_buf_ra (pool=0xffffffff839d6fc8,
ra=18446744071617034568) at eee-desc.c:483
#1  0x83003f30 in eee_allocateBufferRA (mempool=1, buf_size=1, cos=0,
ra=18446744071617034568) at eee-desc.c:917
#2  0x83003eac in eee_allocateBuffer (mempool=1, buf_size=1, cos=0) at
eee-desc.c:899
#3  0x8346b948 in mgmtBus_rxPacket (cb=0x100527ff00, tlimit=2) at
mgmt-bus-emb.c:451
#4  0x83013570 in eee_poll (num_loops=15) at eee-poll.c:551
#5  0x830ce3a0 in getchar () at serio-api.c:333
#6  0x830c3284 in get_line (p=0xffffffff854b5f08 "", usehist=1) at
hist.c:145
#7  0x830c39d0 in get_input (p=0xffffffff854b5f08 "") at hist.c:259
#8  0x830c3a08 in get_cmd (p=0xffffffff854b5f08 "") at hist.c:284
#9  0x830ccd80 in runtime_prompt () at test.c:558
#10 0x830ccca4 in _main () at test.c:541

-Manny
-----Original Message-----
From: Brian Stark
Sent: Monday, February 11, 2008 5:40 PM
To: John Rogers; Manohar Divate; John VanderWerf; Durairaj Muthusamy
Subject: RE: Manohar Divate has entered a new call into the Help Desk

Gonna need some more information here.  To start with, are there
crashdumps?  Also, what are the rcons showing for the TXRX and FP?


Brian


> -----Original Message-----
> From: John Rogers
> Sent: Monday, February 11, 2008 1:46 PM
> To: Brian Stark; Manohar Divate; John VanderWerf; Durairaj Muthusamy
> Subject: RE: Manohar Divate has entered a new call into the Help Desk
>
> Hi Brian,
>
> One of the cougar systems in the lab seems to be having some
> trouble keeping the CPUs online.
>
>
>
> -----Original Message-----
> From: manohar.divate@onstor.com [mailto:manohar.divate@onstor.com]
> Sent: Monday, February 11, 2008 12:28 PM
> To: John Rogers; John VanderWerf; Durairaj Muthusamy
> Subject: Manohar Divate has entered a new call into the Help Desk
>
> HelpDesk
> Customer = Manohar Divate
> Category = Lab related requests
> ********************************************
>
>
> g9r74 - CPUs down
>
> Rebooting does not resolve it every time
>
> 02/11/08 12:24:16 g9r74 VS_MGMT_1887 diag> syst show chassis
>
>  module     cpu         state
> ----------------------------------------------
>  SSC        SSC         UP
>  NFPNIM     TXRX0       DOWN
>             TXRX1       DOWN
>             FP0         DOWN
>             FP1         DOWN
>             FP2         DOWN
>             FP3         DOWN
> ----------------------------------------------
>
>
> ********************************************
> Reference Number: 1202761528
>
>
