X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C8638B.5CD4A477@onstor-exch02.onstor.net>; Wed, 30 Jan 2008 14:59:18 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: "system upgrade" vs. script for Cougar
Date: Wed, 30 Jan 2008 14:59:18 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E080582F3@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E080582B5@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: "system upgrade" vs. script for Cougar
Thread-Index: AchjeLrBICsQ0QpVRuCkwfhovZLTWgADOG+wAAFhEuA=
References: <BB375AF679D4A34E9CA8DFA650E2B04E08057FE7@onstor-exch02.onstor.net><BB375AF679D4A34E9CA8DFA650E2B04E08058164@onstor-exch02.onstor.net> <20080130114556.70c017b3@ripper.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E080582B5@onstor-exch02.onstor.net>
From: "Tim Gardner" <tim.gardner@onstor.com>
To: "Rich LaReau" <rich.lareau@onstor.com>
Cc: "Larry Scheer" <larry.scheer@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>

Larry owns system upgrade so he gets to decide what goes into the spec.
He will use as input the info you have compiled as well as the vague
and ambiguous requirements from the MRD.
He will start working on the spec after finishing his current task.

-----Original Message-----
From: Rich LaReau
Sent: Wednesday, January 30, 2008 1:40 PM
To: Tim Gardner
Cc: Larry Scheer; Andy Sharp
Subject: RE: "system upgrade" vs. script for Cougar


The request for making initialization the default stemmed from two
concerns (or three, depending on how you count them).

1) Making it fast.  If the verification/full upgrade choice is a wash
anyway, I don't think anyone will care much.  We'd rather have something
rock-solid reliable and slow than anything that might require having to
re-do an upgrade.

2) Making it clean and simple.  Some think it would be good to have
upgrades start clean, without the cores, logs, etc. that have
accumulated.  I'm not sure this is a big problem; do we know whether it
has ever caused a failed upgrade?


As for the lifetime of the CF card, is that still an issue?  Eric
mentioned that failed cards almost always show the damage in the /var
partition, which gets written to over and over anyway.  If that's the
case, there might not be much to be gained by being stingy with writes
on upgrades.
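To put rough numbers on this trade-off: if ideal wear leveling spreads writes evenly, each full-rewrite upgrade costs about one card capacity's worth of writes, so N such upgrades consume a fraction f of the card's rated write budget. The symbols and values are illustrative assumptions, not figures for our cards: C is the card capacity, E the rated program/erase cycles per block, and N the number of full-rewrite upgrades over the card's life.

```latex
f = \frac{N \cdot C}{E \cdot C} = \frac{N}{E},
\qquad \text{e.g. } E = 10{,}000,\; N = 20 \;\Rightarrow\; f = \frac{20}{10{,}000} = 0.2\%
```

This holds only if wear leveling behaves as assumed. Andy's concern in the message below is precisely that a full rewrite may defeat it, in which case the real cost is higher than f suggests.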


Tim, did you mention that a new spec for this was going to be
written?  How is that done, and who decides what goes into the spec?

Thanks,
Rich


-----Original Message-----
From: Andy Sharp
Sent: Wednesday, January 30, 2008 11:46 AM
To: Tim Gardner
Cc: Rich LaReau; Larry Scheer
Subject: Re: "system upgrade" vs. script for Cougar

On Wed, 30 Jan 2008 11:22:50 -0800 "Tim Gardner"
<tim.gardner@onstor.com> wrote:

> We need to talk about #1.
> We have spent considerable time perfecting verify install so that it
> can detect the files that need to be upgraded and only upgrade those
> files. Always doing a clean install renders all this functionality
> useless and will increase the upgrade time.

There is another reason that logic was so carefully sweated over, by
several developers.  When you format the CF card and write everything
anew, you seriously take a chunk out of the CF card's write budget.
These things have only so many writes in them before they go south.
Normally, the card itself manages "write blocks" and tries to do
balancing, but if you write over the entire thing, it basically
nullifies that and shortens the overall life of the card.  So we may
want to stick to the current strategy for reliability reasons.  Perhaps
an honest-to-betsy design review meeting with bells and whatnot should
happen on this.
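The verify-install idea Tim and Andy describe (compare what is already on the flash with the new release and rewrite only what differs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual verify-install code: the manifest format (relative path mapped to an MD5 hex digest) and the names `files_needing_upgrade` and `bytes_written` are hypothetical. Comparing `bytes_written` for the stale files against the total for every file shows how much of the CF write budget a verify-style upgrade saves over a clean install.

```python
import hashlib
import os


def file_md5(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_needing_upgrade(root, manifest):
    """Return relative paths whose copy on the flash is missing or differs.

    `manifest` is a hypothetical {relative_path: md5_hexdigest} mapping
    describing the new release; `root` is the mounted target flash.
    """
    stale = []
    for rel_path, expected in sorted(manifest.items()):
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full) or file_md5(full) != expected:
            stale.append(rel_path)
    return stale


def bytes_written(sizes, paths):
    """Total bytes that would be written to the flash for the given paths."""
    return sum(sizes[p] for p in paths)
```

A clean install corresponds to `bytes_written(sizes, list(sizes))`; the verify-style upgrade writes only `bytes_written(sizes, files_needing_upgrade(root, manifest))`.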

One thing I've noticed about upgrade: it attracts a lot of armchair
designing.  Experience working on this subsystem has taught me that what
seems simple and obvious often turns out to be not so straightforward
after study and testing.

> > -----Original Message-----
> > From: Rich LaReau
> > Sent: Wednesday, January 30, 2008 9:14 AM
> > To: Larry Scheer; Tim Gardner; Andy Sharp
> > Cc: Andrew LeFebvre
> > Subject: FW: "system upgrade" vs. script for Cougar
> >
> >
> > Hi guys,
> >
> > At yesterday's Cougar meeting Tim said that he (?) is working on a
> > design doc for "system upgrade".  I just wanted to ask if there was
> > anything left for me to do here.  I filed an ECR which Larry was going
> > to review.  Here was the request:
> >
> >
> > TED 21999
> > Would like to get a few changes made to "system upgrade" behavior
> > (unclear if these should be separate ECRs)
> >
> > 1)  Combine "system copy init" into the default "system upgrade".  We
> > want the default behavior to be to initialize the secondary flash and
> > then run the upgrade, all in one step.  (Include a flag which will
> > disable the initialization.)
> >
> > 2)  Make secondary flash upgrade the default.  Include a flag which
> > upgrades the primary flash.  (Obviously, the "init" step above will
> > never run under this option.)
> >
> > 3)  "system upgrade" should be able to recognize an initialized flash
> > and skip the step where it scans for which files it needs to download.
> > The flash is empty, so everything needs to be downloaded; we should
> > skip this step to save some time.
> >
> >
> > Was there anything else?
> > Rich
> >
> >
> >
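The three ECR items above amount to a small decision procedure. The sketch below models it under stated assumptions: `-p` for the primary flash comes from Eric's suggestion in this thread, while `--no-init`, `target_is_empty`, and the step names are hypothetical placeholders, not real nfxsh options or internals.

```python
import argparse


def parse_upgrade_args(argv):
    """Parse the proposed options (hypothetical except for -p)."""
    parser = argparse.ArgumentParser(prog="system upgrade")
    # -p: target the primary flash instead of the new default, the secondary.
    parser.add_argument("-p", dest="primary", action="store_true",
                        help="upgrade the primary flash")
    # --no-init: hypothetical name for the flag that disables the init step.
    parser.add_argument("--no-init", dest="init", action="store_false",
                        help="skip initializing the target flash")
    return parser.parse_args(argv)


def plan_upgrade(args, target_is_empty=False):
    """Return the ordered steps the proposed default behavior would take.

    target_is_empty stands in for "recognize an initialized flash"
    (ECR item 3), e.g. after a separate 'system copy init'.
    """
    target = "primary" if args.primary else "secondary"
    # ECR items 1 and 2: init by default, but never on the primary flash.
    do_init = args.init and not args.primary
    steps = []
    if do_init:
        steps.append("init " + target)
    # ECR item 3: an empty flash needs every file, so skip the scan.
    if not (do_init or target_is_empty):
        steps.append("scan " + target)
    steps.append("download")
    steps.append("install " + target)
    return steps
```

Under these assumptions, a bare `system upgrade` would initialize the secondary flash and install without scanning, while `-p` never initializes.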
> > -----Original Message-----
> > From: Rich LaReau
> > Sent: Friday, January 25, 2008 1:34 PM
> > To: Eric Barrett; Andy Sharp; Dennis Arellano
> > Cc: Sandrine Boulanger; Larry Scheer; Vikas Saini; Sudheesh Nair;
> > Tim Gardner; Andrew LeFebvre; Caeli Collins; Paul Hammer; Brian Stark
> > Subject: RE: "system upgrade" vs. script for Cougar
> >
> >
> > I filed an ECR to capture the requests for the system upgrade.
> > Please have a look and let me know if this will meet our needs, and
> > if it needs more information, etc.
> >
> > TED00021999
> >
> > Thanks for all the help and valuable input!
> > Rich
> >
> >
> > -----Original Message-----
> > From: Eric Barrett
> > Sent: Thursday, January 24, 2008 5:39 PM
> > To: Andy Sharp; Dennis Arellano
> > Cc: Sandrine Boulanger; Rich LaReau; Larry Scheer; Vikas Saini;
> > Sudheesh Nair; Tim Gardner; Andrew LeFebvre; Caeli Collins; Paul
> > Hammer; Brian Stark
> > Subject: RE: "system upgrade" vs. script for Cougar
> >
> > That kind of exception makes a lot of sense for internal use, but not
> > much for customers.  Why not also make that an 'exception' flag, akin
> > to what Tim suggested about making the secondary the default target,
> > and the primary usable via -p?
> >
> >
> > -----Original Message-----
> > From: Andy Sharp
> > Sent: Thursday, January 24, 2008 5:31 PM
> > To: Dennis Arellano
> > Cc: Sandrine Boulanger; Rich LaReau; Larry Scheer; Eric Barrett;
> > Vikas Saini; Sudheesh Nair; Tim Gardner; Andrew LeFebvre; Caeli
> > Collins; Paul Hammer; Brian Stark
> > Subject: Re: "system upgrade" vs. script for Cougar
> >
> > I like all of what I'm hearing, except for the bug part, of course.
> > Nice bit of teamwork.
> >
> > As for Tim's question, I can picture instances where there are
> > mods/customizations, bashrc files, etc., that I wouldn't want wiped
> > out just to do an upgrade, so I think keeping it a two-stepper is OK.
> > Keep in mind that on Linux, the 'init' step takes 30s or less, so the
> > impact is lower and the impatience factor is also lower.  In the near
> > future. ~:^)
> >
> > Cheers,
> >
> > a
> >
> > On Thu, 24 Jan 2008 17:17:31 -0800 "Dennis Arellano"
> > <dennis.arellano@onstor.com> wrote:
> >
> > > Sounds like we are close to closure.
> > >
> > > Rich, when you have all the details worked out, we need to sit=20
> > > down and write up the new Upgrade Procedure.
> > >
> > > Thanks, Dennis
> > >
> > > -----Original Message-----
> > > From: Sandrine Boulanger
> > > Sent: Thursday, January 24, 2008 4:53 PM
> > > To: Rich LaReau; Larry Scheer; Andy Sharp; Eric Barrett
> > > Cc: Vikas Saini; Sudheesh Nair; Tim Gardner; Dennis Arellano;
> > > Andrew LeFebvre; Caeli Collins; Paul Hammer; Brian Stark
> > > Subject: RE: "system upgrade" vs. script for Cougar
> > >
> > > There is a bug where, in some cases, after a copy init, a "system
> > > upgrade -s" does not work, failing with the error "not compatible
> > > with this system" or similar. I ran into it this morning. I had to
> > > manually mount / from the secondary flash and copy /version from the
> > > primary to the secondary to be able to upgrade. This needs to be
> > > fixed if we want to use this procedure. A defect was filed months
> > > ago.
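Sandrine's manual workaround, once the secondary flash's / is mounted, reduces to copying one file. The sketch below assumes the mount has already been done and that the file sits at the root of each filesystem as `version`, matching the /version path in her message; the function name and the secondary mount point are hypothetical. It covers only the workaround, not a fix for the underlying defect.

```python
import os
import shutil


def copy_version_file(primary_root, secondary_root):
    """Copy 'version' from the primary root to the mounted secondary root.

    In practice primary_root would be "/" and secondary_root wherever the
    secondary flash's / was mounted (a hypothetical path; the message does
    not give one). Returns the destination path.
    """
    src = os.path.join(primary_root, "version")
    dst = os.path.join(secondary_root, "version")
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    # copy2 preserves the file's timestamps and permission bits.
    shutil.copy2(src, dst)
    return dst
```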
> > >
> > > -----Original Message-----
> > > From: Rich LaReau
> > > Sent: Thursday, January 24, 2008 4:49 PM
> > > To: Larry Scheer; Andy Sharp; Sandrine Boulanger; Eric Barrett
> > > Cc: Vikas Saini; Sudheesh Nair; Tim Gardner; Dennis Arellano;
> > > Andrew LeFebvre; Caeli Collins; Paul Hammer; Brian Stark
> > > Subject: RE: "system upgrade" vs. script for Cougar
> > >
> > >
> > >
> > > Hi all,
> > >
> > > After further discussion with Larry, we think that we have a
> > > simple and robust solution with the current implementation of the
> > > "system upgrade" command.  In particular, we should consider
> > > making our primary supported upgrade process the following:
> > >
> > > < get logs, ensure backups are good... etc.>
> > >
> > > Then run the two commands:
> > >
> > > > system copy init
> > > > system upgrade ...
> > >
> > > This essentially replicates what the script is doing.  The first
> > > command will make a clean CF with empty partitions.  The second
> > > will make a clean installation.  (I've tested this in 3.2, and
> > > there's still a wasted slow step of "checking to see what's there"
> > > before the install, but Larry says we can clean that up easily.
> > > I'll file the ECR.)
> > >
> > > We could consider standardizing on this method for 3.2 patch
> > > releases, and make any subsequent changes in time for Cougar, if
> > > necessary.
> > >
> > > Would this meet everyone's requirements?
> > >
> > > Rich
> > >
> > >
> > > -----Original Message-----
> > > From: Rich LaReau
> > > Sent: Thursday, January 24, 2008 9:04 AM
> > > To: Larry Scheer; Andy Sharp
> > > Cc: Vikas Saini; Sudheesh Nair; Tim Gardner; Dennis Arellano;
> > > Andrew LeFebvre; Caeli Collins; Paul Hammer; Brian Stark
> > > Subject: RE: "system upgrade" vs. script for Cougar
> > >
> > >
> > > Thanks, Andy and Larry, for the comments.  I guess it's hard for me
> > > to say one way or the other whether the current "system upgrade"
> > > process is good enough to address everyone's concerns.  I believe
> > > that customers have been using the scripts exclusively since they
> > > became available, so we have no data otherwise.  The admittedly old
> > > perceptions of the "system upgrade" commands have not been replaced
> > > by good experience in the field.
> > >
> > > We could try the non-script process for 3.2 patch releases and see
> > > how well it goes, although any feedback we get might be too late to
> > > incorporate back into Cougar, if needed.
> > >
> > > Anybody else have other suggestions?
> > >
> > > Rich
> > >
> > >
> > > -----Original Message-----
> > > From: Larry Scheer
> > > Sent: Wednesday, January 23, 2008 5:42 PM
> > > To: Andy Sharp; Rich LaReau
> > > Cc: Vikas Saini; Sudheesh Nair; Tim Gardner; Dennis Arellano;
> > > Andrew LeFebvre; Caeli Collins; Paul Hammer; Brian Stark
> > > Subject: RE: "system upgrade" vs. script for Cougar
> > >
> > > Thanks, Andy, for forwarding this message to me; shame on you, Rich,
> > > for not sending me a copy of your original message. ;-)
> > >
> > > I placed my comments in line below.
> > >
> > > Larry
> > >
> > > -----Original Message-----
> > > From: Andy Sharp
> > > Sent: Wednesday, January 23, 2008 3:32 PM
> > > To: Rich LaReau
> > > Cc: Vikas Saini; Sudheesh Nair; Tim Gardner; Dennis Arellano;
> > > Andrew LeFebvre; Larry Scheer; Caeli Collins; Paul Hammer;
> > > Brian Stark
> > > Subject: Re: "system upgrade" vs. script for Cougar
> > >
> > > On Wed, 23 Jan 2008 15:02:01 -0800 "Rich LaReau"
> > > <rich.lareau@onstor.com> wrote:
> > >
> > > >
> > > > Hi team,
> > > >
> > > > Yesterday after the core team meeting I reviewed the issues that
> > > > Support had with the "system upgrade" procedure.  The concerns
> > > > and requirements appear to come down to a few issues.  I wanted
> > > > to get them out for review, and then ask you what our next steps
> > > > need to be.
> > > >
> > > > The issues which are of CS concern are:
> > > >
> > > > 1)   Failed upgrades.  Before we used the scripts, it was common
> > > > that we would get upgrades which outright failed, or worse,
> > > > appeared to work but contained some corruption.  If we are to rely
> > > > on "system upgrade", then these issues need to be resolved.  By
> > > > comparison, the scripts completely blank the secondary flash and
> > > > then copy everything in fresh.  This avoids problems with copying
> > > > cruft (old core files in odd places, repair attempts, lost+found
> > > > files, etc.)
> > >
> > > I think this is largely an old perception.  I personally know that
> > > all the problems you mentioned were fixed by the 3.0 release, if not
> > > sooner.  At this point in time (and consider that this is the person
> > > who originally wrote the install script saying this), if it were my
> > > customer and I were upgrading from 3.x to 3.x, I would use system
> > > upgrade, or flash_install with auto-upgrade.
> > >
> > > Note that there is a workflow using system upgrade that does what
> > > you mention in terms of wiping the CF card and installing fresh:
> > > doing a _system copy init_ followed by a _system upgrade_.  Code
> > > was specifically added to make this possible.
> > >
> > > [LCS] I completely agree with Andy. Your perceptions are very old.
> > > The problems with system upgrade corrupting installations were fixed
> > > in Release 2.1. The ability to erase a flash and install (as Andy
> > > mentions) was introduced in Release 3.0, when the layout of the
> > > flash changed.
> > >
> > > > 2)  Time of upgrade.  This problem was primarily the result of
> > > > needing to download the entire tarball twice from the ftp server.
> > > > The scripts are able to work with only one download.
> > > >
> > >
> > > [LCS] The double download problem was fixed in 3.0. The cw_install
> > > script was created because the 2.X system upgrade could not
> > > reliably upgrade the system to 3.0 due to the SSC running out of
> > > memory. A side benefit of the cw_install script is that it ran in
> > > less time. But it was designed to get customers from 2.X and earlier
> > > to 3.0 in a single unmonitored step. I am working on a faster
> > > upgrade, but I can't promise anything for System W (BSD-based
> > > releases).
> > >
> > > > 3) Ease of use.  We do a lot of upgrades, so the process needs to
> > > > be simple, both in the upgrade itself and in the recovery process
> > > > in case something fails.
> > >
> > > As I understand it, many SEs make a flash with flash_install.sh
> > > before they head to the customer site, and set it to auto-upgrade
> > > when it is booted. That means that you can just insert the flash
> > > card in the secondary slot and do a reboot.  When it boots, it
> > > copies the configuration info from the other flash card.  For
> > > downtime, this is definitely the fastest way to go, but obviously
> > > there are a lot of caveats to that as well, not least the
> > > logistics, which (I'm told) prevent us from making this our default
> > > upgrade procedure.
> > >
> > > [LCS] I'm with you on this one Rich!
> > >
> > > > Some people had suggestions about how we might implement a good
> > > > system upgrade process:
> > >
> > > Excellent.  Ideas always most welcome.  Keep 'em coming.  Often.
> > >
> > > > A)  The system upgrade command should look in the downloaded
> > > > tarball for a script to execute.  If it exists, the script is run;
> > > > if not, it uses the old method.  That means if we do other flash
> > > > card layouts, or even more drastic changes, the script can manage
> > > > all of that, and it will be tied to the release bundle rather than
> > > > having to be inserted earlier in the development cycle.
> > >
> > > This is the way it actually works now.  Believe me when I say that
> > > this has long been thought of and desired.  Because of the current
> > > design of executing a script from the upgrade tarball itself, many
> > > changes can be made in one release, but not all.  For instance,
> > > support for HTTP as well as FTP is going into the next release, but
> > > it won't help when upgrading to that release.  Eventually, though,
> > > it will help.
> > >
> > > [LCS] This is the way system upgrade has been working since 2.1. A
> > > change was just checked into the dev branch to allow the System W
> > > (BSD-based) release to use HTTP. But remember, when you run system
> > > upgrade, it is the older nfxsh, without the new capability, that is
> > > doing the download.
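The dispatch described in item A and the replies (look for a script in the downloaded tarball, run it if present, otherwise fall back to the built-in method) can be sketched with Python's `tarfile` module. The script name `upgrade.sh` and the function names are hypothetical; the thread does not say how the real bundle names its script. As Larry notes, the download itself is still done by the older nfxsh, so only the steps after the download can be changed this way.

```python
import tarfile


def find_upgrade_script(tarball_path, script_name="upgrade.sh"):
    """Return the tar member name of the bundled upgrade script, or None.

    script_name is a hypothetical file name; the real release bundle's
    layout is not described in this thread.
    """
    # "r:*" lets tarfile detect gzip/bzip2/uncompressed archives itself.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            # Match a regular file with that name at any depth in the bundle.
            if member.isfile() and member.name.split("/")[-1] == script_name:
                return member.name
    return None


def choose_upgrade_method(tarball_path):
    """Prefer the script shipped in the bundle; else use the built-in method."""
    script = find_upgrade_script(tarball_path)
    return ("script", script) if script else ("builtin", None)
```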
> > >
> > > > B)  Standardize on the WebUI and eliminate the need for an FTP
> > > > server altogether.  We could specify that the tarball and md5 file
> > > > are in the same directory, pick them both up, and check them
> > > > before the install commences, thereby ruling out a corrupt tarball
> > > > as a cause of installation problems. If we also had a new upgrade
> > > > script, we could have the process look for it before commencing
> > > > the installation as well, replace the current install script, and
> > > > then start the process again. So the process would be: go to the
> > > > WebUI, open the system upgrade page, point it at the directory
> > > > containing the scripts, and click on the version to install; then
> > > > sit back, get a coffee/tea, and watch the status bar go straight up
> > > > to 90% before waiting another 30 minutes for the process to
> > > > complete (sorry, this isn't MS, is it!)
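The integrity check proposed in item B (pick up the tarball and its md5 file from the same directory and compare them before installing) might look like the sketch below. It assumes the md5 file holds md5sum-style `<hexdigest>  <filename>` content or a bare digest; the function names are hypothetical.

```python
import hashlib


def read_expected_md5(md5_path):
    """Return the digest from an md5 file.

    Assumes md5sum-style '<hexdigest>  <filename>' content or a bare digest.
    """
    with open(md5_path, "r", encoding="ascii") as f:
        fields = f.read().split()
    if not fields:
        raise ValueError("empty md5 file: " + md5_path)
    return fields[0].lower()


def tarball_is_intact(tar_path, md5_path):
    """Return True if the tarball's MD5 matches its companion md5 file."""
    h = hashlib.md5()
    with open(tar_path, "rb") as f:
        # Hash in 1 MiB chunks so large release tarballs are not read whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == read_expected_md5(md5_path)
```

Only when `tarball_is_intact` returns True would the install proceed, which is what rules out a corrupt tarball before anything is written to the flash.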
> > >
> > > Sounds good to me!  Let us know when you've got something to test!
> > > And yes, that is on my list of faves, but the earliest would be after
> > > the Cougar release, and no, it's not on any schedule.  Keep in mind
> > > that implementing this is no small project, and I'm not even
> > > including the WebUI development part.  Ease of use is not always
> > > easy to do ~:^)
> > >
> > > Cheers,
> > >
> > > a
