AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080612182439.17ba428b@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<dl-designreview@onstor.com>,<brian.stark@onstor.com>,<warren.gale@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 12 Jun 2008 18:24:58 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: dl-Design Review <dl-designreview@onstor.com>, Brian Stark
 <brian.stark@onstor.com>, Warren Gale <warren.gale@onstor.com>
Subject: Proposed design for new(ish) boot procedure for Cougar
Message-ID: <20080612182458.010d3d89@ripper.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

                       Cougar Boot Procedure Redesign
                       ______________________________

Problem
=======

    Booting takes far too long on Cougar, and in theory the embedded
    nodes should be rebootable w/o rebooting Linux on the Sibyte 1125.

Reasons:
    1)    Image load from CF is intolerably slow
    2)    After image load, Linux boot takes the longest but is the
          least likely to need rebooting, resulting in an unnecessary
          bottleneck.

Solution
========

    Redesign the boot flow to allow the embedded cores to be
    independently booted if Linux is up.

Proposal
========

    Take a phased approach to implementing a redesigned boot procedure:

	Phase I
	-------
	1)  Change SSC PROM to load and boot only Linux.
	2)  Change FP/TXRX PROM to write a magic cookie in a
	    predefined memory location indicating its readiness
	    for its image to be loaded.
	3)  Implement an early start Linux daemon that waits for these
	    boot magic cookies to be set by the embedded cores, loads
	    their images to the correct memory locations, and signals
	    to the FP/TXRX when finished.  The FP and TXRX could then
	    boot while Linux completes its boot steps.
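
The cookie handshake in steps 2 and 3 could look roughly like the
sketch below.  This is only an illustration: the cookie values, the
one-word-per-core mailbox layout, the byte order, and the names
Mailbox and service_cores are all assumptions made up for the example,
and a bytearray stands in for the memory shared with the FP/TXRX.

```python
import struct

# Hypothetical cookie values; the real ones would be agreed with the PROM code.
COOKIE_READY = 0xB007C0DE    # written by the FP/TXRX PROM: "ready for my image"
COOKIE_LOADED = 0x10ADED00   # written by the daemon: "image is in place"


class Mailbox:
    """One 32-bit cookie word at a fixed offset in the shared memory."""

    def __init__(self, memory, offset):
        self.memory = memory
        self.offset = offset

    def read(self):
        # Little-endian is an arbitrary choice for this sketch.
        return struct.unpack_from("<I", self.memory, self.offset)[0]

    def write(self, value):
        struct.pack_into("<I", self.memory, self.offset, value)


def service_cores(memory, cores, images):
    """Make one polling pass and return the names of the cores serviced.

    cores maps a core name to (mailbox offset, image load offset).
    images maps a core name to the bytes of its boot image.
    """
    serviced = []
    for name, (mailbox_offset, load_offset) in cores.items():
        box = Mailbox(memory, mailbox_offset)
        if box.read() != COOKIE_READY:
            continue                      # this core has not asked yet
        image = images[name]
        if load_offset + len(image) > len(memory):
            raise ValueError("image for %s does not fit" % name)
        # Copy the image into place first, then signal the core.
        memory[load_offset:load_offset + len(image)] = image
        box.write(COOKIE_LOADED)
        serviced.append(name)
    return serviced
```

A real daemon would call something like service_cores in a loop with a
short sleep until both cores have been serviced.  Writing the image
before the cookie matters: the core must never see the "loaded" cookie
ahead of the image itself.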

	Phase II
	-------
	1)  Through testing, determine what needs to be done to allow
	    FP/TXRX to be rebooted independently without disturbing the
	    Linux kernel or each other.  Current daemons that
	    communicate with FP/TXRX are not expected to be much trouble
	    since they had to handle this for Cheetah, although this has
	    not been extensively tested on Cheetah in the last few
	    releases.

Expected Results
================

Phase I
-------

Current boot time           Predicted boot time        Predicted savings
-----------------           -------------------        -----------------
2 minutes, 57 secs          1 minute, 43.7 secs        1 minute, 13.3 secs

About a 41% reduction in boot time: current boot time* is 2:57 and the
predicted boot time is 1:43.7, a savings of 1:13.3.  Put another way,
the new method would boot about 1.7 times faster (twice as fast would
be a 50% reduction in boot time).

These estimates are based on the image load time for the FP/TXRX: 86
seconds when loaded by the PROM versus 12.7 seconds when loaded by
Linux (cold cache).
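
The arithmetic behind the estimate can be checked with a few lines of
Python.  The sketch assumes the only thing that changes in Phase I is
the FP/TXRX image load time; the function name and its parameters are
made up for the example.

```python
def phase1_estimate(current_boot_s, prom_load_s, linux_load_s):
    """Return (savings, predicted boot time, % reduction, speedup factor)."""
    savings = prom_load_s - linux_load_s
    predicted = current_boot_s - savings
    reduction_pct = 100.0 * savings / current_boot_s
    speedup = current_boot_s / predicted
    return savings, predicted, reduction_pct, speedup


# 2:57 measured boot time, 86 s PROM load, 12.7 s Linux load (cold cache).
savings, predicted, reduction_pct, speedup = phase1_estimate(
    2 * 60 + 57, 86.0, 12.7)

print("savings   %.1f s" % savings)        # savings   73.3 s
print("predicted %.1f s" % predicted)      # predicted 103.7 s
print("reduction %.0f%%" % reduction_pct)  # reduction 41%
print("speedup   %.1fx" % speedup)         # speedup   1.7x
```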


Phase II
--------
If only one or both of the FP/TXRX nodes are rebooted, boot time is
estimated to be under 10 seconds.  This would substantially improve
customer satisfaction and supportability, as well as developer
efficiency.





* Boot time measured from when PROM code starts loading the first boot
image to when nfxsh CLI is available.
