Date: Mon, 1 Mar 2010 16:47:56 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Harb, Abdallah" <Abdallah.Harb@lsi.com>
Cc: "Stark, Brian" <Brian.Stark@lsi.com>
Subject: Re: SiByte Watchdog messages
Message-ID: <20100301164756.67bb91f9@ripper.onstor.net>
In-Reply-To: <27AEC73CFDE2EA41849ACAC11A0B39D5CD3013FA@cosmail03.lsi.com>
References: <27AEC73CFDE2EA41849ACAC11A0B39D504032A7F@cosmail03.lsi.com>
	<E1EC65251D4B3D46BBC0AAA3C0629222B239293E@cosmail02.lsi.com>
	<27AEC73CFDE2EA41849ACAC11A0B39D5CD3013BF@cosmail03.lsi.com>
	<2E4A140D742C3B4E911151A30C39CFE10DDA1244@cosmail03.lsi.com>
	<27AEC73CFDE2EA41849ACAC11A0B39D5CD3013FA@cosmail03.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

So what's the trick to getting it to do its thing?  I logged in and it
wasn't putting out that message, but I forgot to check whether chassisd
was running before I installed my kernel and rebooted.  So far I've
rebooted 3 times and seen nothing.


On Thu, 18 Feb 2010 12:36:05 -0700 "Harb, Abdallah"
<Abdallah.Harb@lsi.com> wrote:

> Andy,
>
> I was told that you'll be helping us debug the SiByte watchdog
> messages. The following are the connections to a Cougar unit in the
> HW lab that is constantly showing the failure.
>
> Power Sentry: 10.0.20.15 port# 2.
> Top board console: 10.0.20.11 2002
> Top board IP address: 10.0.20.102
> Bottom board console: 10.0.20.11 2001
> Bottom board IP address: 10.0.20.101
>
> I also have another Cougar 2U in the HW lab that shows the same
> failure. Let me know if you need access to this 2nd unit as well.
>
> Thanks,
> Abdallah
>
> ________________________________________
> From: Harb, Abdallah
> Sent: Friday, February 12, 2010 6:39 PM
> To: Stark, Brian; Fong, Rendell
> Subject: SiByte Watchdog messages
>
> Good evening,
>
> This is a follow-up to our conversation yesterday regarding the
> SiByte watchdog messages. Today, I tried to characterize the failure
> using a good chassis, a good Mezzanine board, and two suspected
> motherboards. At the end of the day, I had many pages of experiment
> notes, but unfortunately, it's hard to draw any meaningful conclusion
> from them. In a nutshell, neither slot location nor ejecting or
> inserting a motherboard seems to be relevant to triggering the
> failure.
>
> Next week, I will continue this experiment using another failed
> unit from Venture. I hope that I will have more meaningful results
> than those below.
>
> Here's a summary of the experiment that I conducted today:
> I used a known good chassis, a known good Mezzanine card, and two
> suspected motherboards.
>
> Experiment #1
> Top slot - board IP: 10.0.20.101
> Bottom slot - board IP: 10.0.20.102
>
> When both boards were inserted:
> • Top board came up and showed continuous SiByte messages.
> • Bottom board came up OK, but showed only one SiByte message (1.0
> sec).
>
> When the bottom board was ejected:
> • Top board came up OK, no messages.
>
> When the top board was ejected:
> • Bottom board came up OK, no messages.
>
> Experiment #2
> (Swapped slot locations)
> Top slot - board IP: 10.0.20.102
> Bottom slot - board IP: 10.0.20.101
>
> When both boards were inserted:
> • Top board came up OK, but showed only one SiByte message (1.0
> sec).
> • Bottom board came up OK, no messages.
> When this step was repeated:
> • Top board came up OK, no messages.
> • Bottom board came up and showed continuous SiByte messages (0.9
> sec).
>
> When the bottom board was ejected:
> • Top board came up OK, no messages.
>
> When the top board was ejected:
> • Bottom board came up OK, no messages.
>
> Please don't conclude that the failure follows only the board with IP
> 10.0.20.101; the other board also reported continuous SiByte
> messages during another set of experiments.
>
> Regards,
> Abdallah
>
> -----Original Message-----
> From: Harb, Abdallah
> Sent: Tuesday, January 19, 2010 7:23 PM
> To: Stark, Brian
> Subject: RE: SiByte Watchdog messages
>
> Brian,
>
> Unfortunately, I didn't get a chance to work on it back in October,
> but I worked on it today. I tried all of the following debug tests
> listed in your email, and here are my findings:
>
> Q: Do the messages go away if sysdvt is halted?
> A: No.
>
> Q: If the motherboards are swapped, do the SiByte messages follow the
> board, stay on the same slot, or go away?
> A: The SiByte messages follow the board.
>
> Q: If the mezzanine board is swapped out, do the SiByte messages go
> away?
> A: Yes, the SiByte messages go away.
>
> Q: If the motherboard showing the problem is moved to another
> chassis, do the SiByte messages go away?
> A: Yes, the SiByte messages go away.
>
> The mezzanine board seems to be the source of the failure, but the
> question is why the SiByte messages show on only one motherboard and
> not on any other board. And if they show on one motherboard, they
> always follow that specific motherboard regardless of its slot
> number in the chassis, as long as the suspect mezzanine board is
> used.
>
> Tomorrow morning, I will be at Venture, and then at ONStor in the
> afternoon. Please let me know if there's anything else that I should
> try.
>
> Regards,
> Abdallah
