AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<Abdallah.Harb@lsi.com>,<Brian.Stark@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	27AEC73CFDE2EA41849ACAC11A0B39D5CD30141D@cosmail03.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 15 Mar 2010 18:13:18 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Harb, Abdallah" <Abdallah.Harb@lsi.com>
Cc: "Stark, Brian" <Brian.Stark@lsi.com>
Subject: Re: SiByte Watchdog messages
Message-ID: <20100315181318.3ac93d29@ripper.onstor.net>
In-Reply-To: <27AEC73CFDE2EA41849ACAC11A0B39D5CD30141D@cosmail03.lsi.com>
References: <27AEC73CFDE2EA41849ACAC11A0B39D504032A7F@cosmail03.lsi.com>
	<E1EC65251D4B3D46BBC0AAA3C0629222B239293E@cosmail02.lsi.com>
	<27AEC73CFDE2EA41849ACAC11A0B39D5CD3013BF@cosmail03.lsi.com>
	<2E4A140D742C3B4E911151A30C39CFE10DDA1244@cosmail03.lsi.com>
	<27AEC73CFDE2EA41849ACAC11A0B39D5CD3013FA@cosmail03.lsi.com>
	<20100301164756.67bb91f9@ripper.onstor.net>
	<27AEC73CFDE2EA41849ACAC11A0B39D5CD30140B@cosmail03.lsi.com>
	<20100301190016.6b8edf57@ripper.onstor.net>
	<27AEC73CFDE2EA41849ACAC11A0B39D5CD301419@cosmail03.lsi.com>
	<E1EC65251D4B3D46BBC0AAA3C0629222B281D633@cosmail02.lsi.com>
	<20100303181843.38b7ae5c@ripper.onstor.net>
	<27AEC73CFDE2EA41849ACAC11A0B39D5CD30141D@cosmail03.lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

OK, it looks like I got it.  I found a bunch of crazy bugs in the
driver.  I'm afraid to go back and see how many are in the shipping
version, but obviously some of them are, because that's what causes the
repeating message on these "special" systems.

Abdallah, you can take the /boot/vmlinux.bin off either of the blades and
try it on that machine at Venture.  The one on 10.0.20.101 has all the
debug messages removed so you're not likely to get any extraneous
messages on the console if you use that one.  All the repeating sibyte
messages should be history.

Let me know if you have any questions.

I'll prepare a changelist for submitting.

Cheers,

a

On Wed, 3 Mar 2010 19:40:17 -0700 "Harb, Abdallah"
<Abdallah.Harb@lsi.com> wrote:

> Once done... I can test the fixed kernel using the other Cougar 2U in
> the HW lab and the Cougar 1U at Venture.
> ________________________________________
> From: Andrew Sharp [andy.sharp@lsi.com]
> Sent: Wednesday, March 03, 2010 6:18 PM
> To: Stark, Brian
> Cc: Harb, Abdallah
> Subject: Re: SiByte Watchdog messages
> 
> I am writing a test program to test a couple of things before
> declaring victory.
> 
> On Wed, 3 Mar 2010 18:55:51 -0700 "Stark, Brian" <Brian.Stark@lsi.com>
> wrote:
> 
> > Does this mean we may have a fix?
> >
> >
> > -----Original Message-----
> > From: Harb, Abdallah
> > Sent: Wednesday, March 03, 2010 10:56 AM
> > To: Sharp, Andy
> > Cc: Sharp, Andy; Stark, Brian
> > Subject: RE: SiByte Watchdog messages
> >
> > I booted from the bottom CF and then the SiByte messages started
> > showing up again. I released both consoles, you can re-connect to
> > it.
> > ________________________________________
> > From: Andrew Sharp [andy.sharp@lsi.com]
> > Sent: Monday, March 01, 2010 7:00 PM
> > To: Harb, Abdallah
> > Cc: Sharp, Andy; Stark, Brian
> > Subject: Re: SiByte Watchdog messages
> >
> > On Mon, 1 Mar 2010 18:07:02 -0700 "Harb, Abdallah"
> > <Abdallah.Harb@lsi.com> wrote:
> >
> > > Are you looking at both blades, or just one?
> >
> > Both.
> >
> > > I'm at Venture this afternoon.
> > > If you're unable to reproduce the failure by tomorrow morning,
> > > then I'll do the trick to get it to fail.
> >
> > I eagerly await the trick.
> >
> >
> > >
> > > ________________________________________
> > > From: Andrew Sharp [andy.sharp@lsi.com]
> > > Sent: Monday, March 01, 2010 4:47 PM
> > > To: Harb, Abdallah
> > > Cc: Stark, Brian
> > > Subject: Re: SiByte Watchdog messages
> > >
> > > So what's the trick to getting it to do its thing?  I logged in
> > > and it wasn't putting out that message, but I forgot to check if
> > > chassisd was running before I installed my kernel and rebooted.
> > > So far I've rebooted 3 times and nothing.
> > >
> > >
> > > On Thu, 18 Feb 2010 12:36:05 -0700 "Harb, Abdallah"
> > > <Abdallah.Harb@lsi.com> wrote:
> > >
> > > > Andy,
> > > >
> > > > I was told that you'll be helping us debug the SiByte
> > > > watchdog messages. The following are the connections to a
> > > > Cougar unit in the HW lab that is constantly showing the
> > > > failure.
> > > >
> > > > Power Sentry: 10.0.20.15 port# 2.
> > > > Top board console: 10.0.20.11 2002
> > > > Top board IP address: 10.0.20.102
> > > > Bottom board console: 10.0.20.11 2001
> > > > Bottom board IP address: 10.0.20.101
> > > >
> > > > I also have another Cougar 2U in the HW lab that shows the same
> > > > failure. Let me know if you need access to this 2nd unit as
> > > > well.
> > > >
> > > > Thanks,
> > > > Abdallah
> > > >
> > > > ________________________________________
> > > > From: Harb, Abdallah
> > > > Sent: Friday, February 12, 2010 6:39 PM
> > > > To: Stark, Brian; Fong, Rendell
> > > > Subject: SiByte Watchdog messages
> > > >
> > > > Good evening,
> > > >
> > > > This is a follow-up to our conversation regarding the SiByte
> > > > watchdog messages that we had yesterday. Today, I tried to
> > > > characterize the failure using a good chassis, a good Mezzanine
> > > > board, and two suspected motherboards. At the end of the day, I
> > > > had many pages of experiment notes, but unfortunately, it's
> > > > hard to draw any meaningful conclusion from them. In a
> > > > nutshell, neither slot location nor ejecting or inserting a
> > > > motherboard from the chassis seems to be relevant to
> > > > triggering the failure.
> > > >
> > > > Next week, I will continue with this experiment using another
> > > > failed unit from Venture. I hope that I will have more
> > > > meaningful results than the ones below.
> > > >
> > > > Here's a summary of my experiment that I conducted today:
> > > > I used a known good chassis, a known good Mezzanine card, and
> > > > two suspected motherboards.
> > > >
> > > > Experiment #1
> > > > Top slot - board IP: 10.0.20.101
> > > > Bottom slot - board IP: 10.0.20.102
> > > >
> > > > When both boards were inserted:
> > > > * Top board came up and showed continuous SiByte messages.
> > > > * Bottom board came up OK, but showed only one SiByte message
> > > > (1.0 sec).
> > > >
> > > > When bottom board was ejected:
> > > > * Top board came up OK, No messages.
> > > >
> > > > When top board was ejected:
> > > > * Bottom board came up OK, No messages.
> > > >
> > > > Experiment #2
> > > > (Swapped slot locations)
> > > > Top slot - board IP: 10.0.20.102
> > > > Bottom slot - board IP: 10.0.20.101
> > > >
> > > > When both boards were inserted:
> > > > * Top board came up OK, but showed only one SiByte message (1.0
> > > > sec).
> > > > * Bottom board came up OK, No messages.
> > > > When this step was repeated:
> > > > * Top board came up OK, No messages.
> > > > * Bottom board came up and showed continuous SiByte messages
> > > > (0.9 sec).
> > > >
> > > > When bottom board was ejected:
> > > > * Top board came up OK, No messages.
> > > >
> > > > When top board was ejected:
> > > > * Bottom board came up OK, No messages.
> > > >
> > > > Please don't conclude that the failure only follows the board
> > > > with IP 10.0.20.101, because the other board also reported
> > > > continuous SiByte messages during another set of experiments.
> > > >
> > > > Regards,
> > > > Abdallah
> > > >
> > > > -----Original Message-----
> > > > From: Harb, Abdallah
> > > > Sent: Tuesday, January 19, 2010 7:23 PM
> > > > To: Stark, Brian
> > > > Subject: RE: SiByte Watchdog messages
> > > >
> > > > Brian,
> > > >
> > > > Unfortunately, I didn't get a chance to work on it back in
> > > > October, but I worked on it today. I tried all of the following
> > > > debug tests listed in your email, and here are my findings:
> > > >
> > > > Q: Do the messages go away if sysdvt is halted?
> > > > A: No.
> > > >
> > > > Q: If the motherboards are swapped, do the SiByte messages
> > > > follow the board, stay on the same slot, or go away?
> > > > A: The SiByte messages follow the board.
> > > >
> > > > Q: If the mezzanine board is swapped out, do the SiByte messages
> > > > go away?
> > > > A: Yes, the SiByte messages go away.
> > > >
> > > > Q: If the motherboard showing the problem is moved to another
> > > > chassis, do the SiByte messages go away?
> > > > A: Yes, the SiByte messages go away.
> > > >
> > > > The mezzanine board seems to be the source of the failure, but
> > > > the question is why the SiByte messages would show on only one
> > > > motherboard and not on any other board. And if they show on one
> > > > motherboard, they always follow that specific motherboard
> > > > regardless of its slot number in the chassis, as long as the
> > > > suspect mezzanine board is used.
> > > >
> > > > Tomorrow morning, I will be at Venture, and then at ONStor in
> > > > the afternoon. Please let me know if there's anything else that
> > > > I should try.
> > > >
> > > > Regards,
> > > > Abdallah
> > > >
> > > >
