Date: Fri, 13 Jun 2008 11:33:05 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>
Cc: "dl-Design Review" <dl-designreview@onstor.com>, "Brian Stark"
 <brian.stark@onstor.com>, "Warren Gale" <warren.gale@onstor.com>
Subject: Re: Proposed design for new(ish) boot procedure for Cougar
Message-ID: <20080613113305.5342c689@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E0A6E8AB0@onstor-exch02.onstor.net>
References: <20080612182458.010d3d89@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0A6E8AB0@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Fri, 13 Jun 2008 10:47:49 -0700 "Maxim Kozlovsky"
<maxim.kozlovsky@onstor.com> wrote:

> For the independent fp/txrx reboot the main problem is reinitializing
> the mgmtbus driver. The reboot code will have to ifconfig down the
> mgmtbus, reset fp/txrx, and ifconfig the mgmtbus up after the fp/txrx
> reboot. The management bus driver will have to execute the equivalent
> of the current startup code when this happens. 

Not a big problem, really.  Rearrange the code a bit to make it truly
modular, add a couple lines of code to implement the if up/down, and
shazam.

But this is all PHASE-II.
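
For what it's worth, the down/reset/up sequence Maxim describes could be
scripted roughly as below.  This is only a sketch: the interface name
"mgmtbus" and the reset_core hook are placeholders made up for
illustration, not the real device name or reset mechanism, and the
driver-side reinitialization is assumed to happen inside the if up path.

```python
import subprocess


def reboot_core(core, reset_core, run=subprocess.check_call, ifname="mgmtbus"):
    """Take the management bus down, reset one core, bring the bus back up.

    core       -- label for the node to reset, e.g. "fp" or "txrx"
    reset_core -- hypothetical callable that performs the actual reset
    run        -- callable used to execute commands (injectable for testing)
    ifname     -- assumed interface name; the real one may differ
    """
    run(["ifconfig", ifname, "down"])   # quiesce the mgmtbus driver
    reset_core(core)                    # reset the embedded node
    run(["ifconfig", ifname, "up"])     # driver re-runs its startup code here
```

Passing run and reset_core in as arguments keeps the ordering testable
on a box that has no mgmtbus at all.
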

> >-----Original Message-----
> >From: Andy Sharp
> >Sent: Thursday, June 12, 2008 6:25 PM
> >To: dl-Design Review; Brian Stark; Warren Gale
> >Subject: Proposed design for new(ish) boot procedure for Cougar
> >
> >                       Cougar Boot Procedure Redesign
> >                       ______________________________
> >
> >Problem
> >=======
> >
> >    Booting takes far too long on Cougar, and in theory the embedded
> >    nodes should be rebootable w/o rebooting Linux on the Sibyte 1125.
> >
> >Reasons:
> >    1)    Image load from CF is intolerably slow
> >    2)    After image load, Linux boot takes the longest but is the
> >          least likely to need rebooting, resulting in an unnecessary
> >          bottleneck.
> >
> >Solution
> >========
> >
> >    Redesign the boot flow to allow the embedded cores to be
> >    independently booted if Linux is up.
> >
> >Proposal
> >========
> >
> >    Take a phased approach to implementing a redesigned boot procedure:
> >
> >	Phase I
> >	-------
> >	1)  Change SSC PROM to load and boot only Linux.
> >	2)  Change FP/TXRX PROM to write a magic cookie in a
> >	    predefined memory location indicating its readiness
> >	    for its image to be loaded.
> >	3)  Implement an early-start Linux daemon that waits for these
> >	    boot magic cookies to be set by the embedded cores, loads
> >	    their images to the correct memory locations, and signals
> >	    the FP/TXRX when finished.  The FP and TXRX could then boot
> >	    while Linux completes its boot steps.
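
The cookie handshake in steps 2 and 3 might look something like the
following on the Linux side.  This is a minimal sketch under stated
assumptions: the cookie values, the 32-bit big-endian encoding, and the
offsets are all invented here (the proposal does not define them), and a
plain bytearray stands in for the shared memory window, which on real
hardware would presumably be an mmap of the relevant region.

```python
import struct
import time

# Hypothetical values: the proposal does not define the cookie encoding.
COOKIE_READY = 0x52454459   # "REDY": the core's PROM is waiting for an image
COOKIE_LOADED = 0x4C4F4144  # "LOAD": image is in place, core may start it


def read_cookie(mem, offset):
    """Read a 32-bit big-endian cookie from the shared region."""
    return struct.unpack_from(">I", mem, offset)[0]


def write_cookie(mem, offset, value):
    """Write a 32-bit big-endian cookie into the shared region."""
    struct.pack_into(">I", mem, offset, value)


def load_when_ready(mem, cookie_off, load_off, image, timeout=5.0, poll=0.01):
    """Wait for the READY cookie, copy the image, then signal LOADED.

    Returns True if the image was loaded, False if the core never
    signalled readiness within the timeout.
    """
    if load_off + len(image) > len(mem):
        raise ValueError("image does not fit in the shared region")
    deadline = time.monotonic() + timeout
    while read_cookie(mem, cookie_off) != COOKIE_READY:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    mem[load_off:load_off + len(image)] = image   # copy image into place
    write_cookie(mem, cookie_off, COOKIE_LOADED)  # tell the core to go
    return True
```

The daemon would call load_when_ready once per core (FP and TXRX), each
with its own cookie and load offsets, while the rest of the Linux boot
carries on.
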
> >
> >	Phase II
> >	--------
> >	1)  Through testing, determine what needs to be done to allow
> >	    FP/TXRX to be rebooted independently without disturbing
> >	    the Linux kernel or each other.  Current daemons that
> >	    communicate with FP/TXRX are not expected to be much
> >	    trouble, since they had to handle this for Cheetah,
> >	    although this has not been extensively tested on Cheetah
> >	    in the last few releases.
> >
> >Expected Results
> >================
> >
> >Phase I
> >-------
> >
> >Current boot time     Predicted boot time     Predicted savings
> >-----------------     -------------------     -------------------
> >2 minutes, 57 secs    1 minute, 43.7 secs     1 minute, 13.3 secs
> >
> >41% reduction in boot time: current boot time* is 2:57, the
> >resulting boot time is estimated at 1:43.7, a savings of 1:13.3.
> >The new method would boot 1.7 times faster (2 times faster, or
> >twice as fast, would be a 50% reduction in boot time).
> >
> >These estimates are based on the difference in image load time for
> >the FP/TXRX: 86 seconds for the PROM versus 12.7 seconds for Linux
> >(cold cache).
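
The arithmetic can be reproduced from the measurements quoted in the
proposal (2:57 current boot, 86 s PROM load, 12.7 s Linux load).  The
function name below is just for illustration, and it assumes the only
thing that changes is the FP/TXRX image load time.

```python
def boot_estimate(current_s, prom_load_s, linux_load_s):
    """Return (seconds saved, new boot time, fractional reduction, speedup)."""
    saved = prom_load_s - linux_load_s   # load time no longer spent in PROM
    new = current_s - saved              # predicted boot time
    return saved, new, saved / current_s, current_s / new


saved, new, reduction, speedup = boot_estimate(177, 86, 12.7)
print(f"saved {saved:.1f} s, new boot {new:.1f} s, "
      f"{reduction:.0%} reduction, {speedup:.1f}x faster")
# prints: saved 73.3 s, new boot 103.7 s, 41% reduction, 1.7x faster
```
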
> >
> >
> >Phase II
> >--------
> >If only one or both of the FP/TXRX nodes are rebooted, boot time is
> >estimated to be under 10 seconds.  This would substantially improve
> >customer satisfaction and supportability, as well as developer
> >efficiency.
> >
> >
> >
> >
> >
> >* Boot time measured from when PROM code starts loading the first
> >boot image to when nfxsh CLI is available.
