AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080630164518.601c5399@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<brian.stark@onstor.com>,<rendell.fong@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E0AAC6097@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 30 Jun 2008 16:45:25 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Brian Stark" <brian.stark@onstor.com>
Cc: "Rendell Fong" <rendell.fong@onstor.com>
Subject: Re: Cluster in a box reset
Message-ID: <20080630164525.14990fb3@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E0AAC6097@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E0AAC5A6F@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E09624B7B@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0AAC6097@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Let's not forget that I'm supposed to send out an email, like,
now-a-days, saying that folks are to hold any NMF defects until
the release branch is created next week....  ~:^)


On Mon, 30 Jun 2008 16:03:06 -0700 "Brian Stark"
<brian.stark@onstor.com> wrote:

> Rendell,
> 
> Excellent, thanks for getting this going.  I was just talking with
> Andy, and before this is checked in, we should get something out to
> dl-designreview.  I'll put something together for that.  We also need
> to figure out if this is a must-fix change and when it should be
> checked in, e.g. before GA or after GA.
> 
> 
> Brian
>  
> 
> > -----Original Message-----
> > From: Rendell Fong 
> > Sent: Monday, June 30, 2008 11:13 AM
> > To: Brian Stark
> > Cc: Andy Sharp
> > Subject: RE: Cluster in a box reset
> > 
> > I've fixed that define and now it's working.
> > 
> > Rendell 
> > 
> > 
> > -----Original Message-----
> > From: Brian Stark
> > Sent: Saturday, June 28, 2008 10:14 PM
> > To: Rendell Fong
> > Cc: Andy Sharp
> > Subject: RE: Cluster in a box reset
> > 
> > I've upgraded the BM-FPGA to Rev 4 on both blades, but the 
> > command is not resetting the other blade.  Looks like the 
> > problem is in cm-api-cg.h:
> > 
> > #define CM_OTHER_BLADE_RESET    0x00800000
> > 
> > This will flip bit 23, not bit 27.  
> > 
> > 
> > Brian
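
The bit arithmetic behind the define problem can be checked with a small sketch. Only CM_OTHER_BLADE_RESET and the bit numbers come from the thread; the BIT32 macro, the _OLD/_NEW suffixes, and single_bit_index() are hypothetical names used for illustration, and the actual fixed define is not shown in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask for one bit position in a 32-bit register. */
#define BIT32(n) ((uint32_t)1u << (n))

/* Bit number taken from the BM-FPGA description in this thread. */
#define GPP_OTHER_BLADE_RESET_BIT 27u

/* Value quoted from cm-api-cg.h in the thread: this selects bit 23. */
#define CM_OTHER_BLADE_RESET_OLD 0x00800000u

/* Assumed correction: derive the mask from the bit number (0x08000000). */
#define CM_OTHER_BLADE_RESET_NEW BIT32(GPP_OTHER_BLADE_RESET_BIT)

/*
 * Return the index of the single set bit in mask,
 * or -1 if mask does not have exactly one bit set.
 */
static int single_bit_index(uint32_t mask)
{
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;

    int idx = 0;
    while ((mask >>= 1) != 0)
        idx++;
    return idx;
}
```

Deriving the mask from the bit number, rather than typing the hex constant by hand, makes this kind of off-by-a-nibble mistake harder to introduce.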
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Rendell Fong
> > > Sent: Thursday, June 26, 2008 3:21 PM
> > > To: Brian Stark
> > > Cc: Andy Sharp
> > > Subject: RE: Cluster in a box reset
> > > 
> > > I've made the code changes to support this capability.
> > > It's loaded on my cluster.  All we need now is your FPGA update.
> > > You are welcome to use my filers to test this tomorrow since I'm
> > > on vacation (as you know).
> > > 
> > > G6r5:
> > > sc0 IP: 10.2.5.6
> > > Terminal Server: 10.2.5.235
> > > SSC Console: 9021
> > > 
> > > G7r5:
> > > sc0 IP: 10.2.5.7
> > > Terminal Server: 10.2.5.235
> > > SSC Console: 9026
> > > 
> > > g6r5 diag> system reset -s
> > > Are you sure ? [y|n] : y
> > > Blade reset not supported by BMFPGA Version 3.
> > > % Command failure.
> > > 
> > > 
> > > 
> > > -----Original Message-----
> > > From: Brian Stark
> > > Sent: Thursday, June 26, 2008 10:44 AM
> > > To: Andy Sharp
> > > Cc: Rendell Fong
> > > Subject: RE: Cluster in a box reset
> > > 
> > > Good idea on using 'system reset'.  It was Cheetah specific and 
> > > allowed for individual boards to be reset, e.g. system reset fp, 
> > > system reset sp.  Makes sense to use this on Cougar.
> > > 
> > > Here are the specifics on doing this with the BM-FPGA:
> > > 
> > > - GPP Register, bit 27:
> > >     0 = release other board from reset
> > >     1 = put other board into reset
> > > 
> > > The software should do a read-modify-write of the GPP register,
> > > ORing in bit 27 to set the reset.  Same goes for when clearing bit
> > > 27.  Also, we need around 1 sec delay between setting and then
> > > clearing bit 27.  I've attached a screen capture showing the
> > > memory modify commands in PROM as a reference.
> > > 
> > > This support is in Rev 4 of the BM-FPGA.  Doing this on Rev 3 or 
> > > earlier will have no effect since bit 27 in the GPP register is a 
> > > don't care.
> > > 
> > > Note that the hardware has no protection -- the command is typed,
> > > the other board is rebooted.  Because of this, maybe it makes sense
> > > to include the -y option and if it's not entered, prompt the user
> > > with the 'are you sure' string like with system reboot.
> > > 
> > > 
> > > 
> > > Brian
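
The sequence described above (read-modify-write to set bit 27, wait about a second, read-modify-write to clear it, and refuse on pre-Rev 4 FPGAs) might look roughly like the following. This is a sketch under assumptions, not the code that was checked in: the gpp_ops hooks, other_blade_reset(), and the fake_* simulation helpers are all hypothetical, and real code would go through whatever BM-FPGA register accessors the platform provides.

```c
#include <assert.h>
#include <stdint.h>

#define GPP_OTHER_BLADE_RESET (UINT32_C(1) << 27) /* bit 27, per the thread */
#define BMFPGA_MIN_RESET_REV  4                   /* bit 27 is a don't-care before Rev 4 */

/* Hypothetical register-access hooks so the sequence can run without hardware. */
struct gpp_ops {
    uint32_t (*read)(void *ctx);
    void     (*write)(void *ctx, uint32_t val);
    void     (*delay_ms)(void *ctx, unsigned ms);
    void     *ctx;
};

/* Returns 0 on success, -1 if the FPGA revision lacks reset support. */
static int other_blade_reset(const struct gpp_ops *ops, int fpga_rev)
{
    uint32_t v;

    if (fpga_rev < BMFPGA_MIN_RESET_REV)
        return -1;

    /* Read-modify-write: set bit 27, leave every other bit untouched. */
    v = ops->read(ops->ctx);
    ops->write(ops->ctx, v | GPP_OTHER_BLADE_RESET);

    /* Hold the other board in reset for about one second. */
    ops->delay_ms(ops->ctx, 1000);

    /* Read-modify-write: clear bit 27 so the other board boots. */
    v = ops->read(ops->ctx);
    ops->write(ops->ctx, v & ~GPP_OTHER_BLADE_RESET);
    return 0;
}

/* Simulated GPP register, for exercising the sequence off-target. */
struct fake_gpp {
    uint32_t reg;               /* current register contents */
    uint32_t seen_during_delay; /* register value captured while "waiting" */
    unsigned delayed_ms;        /* total delay requested */
};

static uint32_t fake_read(void *c) { return ((struct fake_gpp *)c)->reg; }
static void fake_write(void *c, uint32_t v) { ((struct fake_gpp *)c)->reg = v; }
static void fake_delay(void *c, unsigned ms)
{
    struct fake_gpp *f = c;
    f->seen_during_delay = f->reg;
    f->delayed_ms += ms;
}
```

Passing the accessors in as function pointers is only a convenience for testing the bit handling; the point of the sketch is that both the set and the clear preserve the other GPP bits.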
> > > 
> > >  
> > > 
> > > > -----Original Message-----
> > > > From: Andy Sharp
> > > > Sent: Thursday, June 26, 2008 10:11 AM
> > > > To: Brian Stark
> > > > Cc: Rendell Fong
> > > > Subject: Re: Cluster in a box reset
> > > > 
> > > > We can reuse the old system reset command, which is gut-able because
> > > > it was Cheetah-specific.  _system reset -s_ or something.
> > > > 
> > > > On Wed, 25 Jun 2008 21:14:32 -0700 "Brian Stark"
> > > > <brian.stark@onstor.com> wrote:
> > > > 
> > > > > Guys,
> > > > > 
> > > > > I've been asked by several folks about rebooting one motherboard
> > > > > from another in the cluster in a box chassis.  This would help
> > > > > reboot a board that is wedged for some reason, which has been
> > > > > seen before on Bobcat.
> > > > > 
> > > > > There are signals running across the midplane between the BM-FPGA
> > > > > on each board that can be used for this.  I've put a little state
> > > > > machine into the BM-FPGA that will put the other board into reset
> > > > > by writing a bit.  The other board will then boot when this bit is
> > > > > cleared.  The code to handle this would be pretty simple:
> > > > > 
> > > > > - Set bit in GPP register in BM-FPGA
> > > > > - Delay for 1 sec
> > > > > - Clear bit in GPP register in BM-FPGA
> > > > > 
> > > > > We would need to build this into a command, presumably at the
> > > > > 'system' level.  It could be a new command or it could be a new
> > > > > option on the 'system reboot' command.  Also, I would advocate
> > > > > that the command is only exposed at the root level.  It's not
> > > > > necessarily something we document or advertise, but Customer
> > > > > Support would be able to use it if needed.
> > > > > 
> > > > > Let me know what you think.
> > > > > 
> > > > > 
> > > > > Brian
> > > > > 
> > > > 
> > > 
> > 
