X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C75251.26DCF46A@onstor-exch02.onstor.net>; Fri, 16 Feb 2007 22:04:50 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C75251.26DCF46A"
References: <BB375AF679D4A34E9CA8DFA650E2B04E28F57F@onstor-exch02.onstor.net>
Content-class: urn:content-classes:message
Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
Date: Fri, 16 Feb 2007 22:03:35 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E023B313C@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: corruption and upgrade workflow for Lambo [and 1.3.3.?]
Thread-Index: AcdSOUGBTFM/IMgPRAWU1tGYFef91wAF2nxpAAATxH8=
From: "Paul Hammer" <paul.hammer@onstor.com>
To: "Eric Barrett" <eric.barrett@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>,
	"Chris Vandever" <chris.vandever@onstor.com>
Cc: "Tim Gardner" <tim.gardner@onstor.com>,
	"Caeli Collins" <caeli.collins@onstor.com>,
	"Ed Kwan" <ed.kwan@onstor.com>,
	"Jay Michlin" <jay.michlin@onstor.com>,
	"Larry Scheer" <larry.scheer@onstor.com>,
	"dl-Software" <dl-software@onstor.com>,
	"Sandrine Boulanger" <sandrine.boulanger@onstor.com>,
	"Raj Kumar" <raj.kumar@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C75251.26DCF46A
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hopefully we will never actually have to do the upgrade a second time; =
in a 4-node cluster it would add hours to the upgrade window.

________________________________

From: Eric Barrett
Sent: Fri 2/16/2007 9:01 PM
To: Andy Sharp; Chris Vandever
Cc: Tim Gardner; Caeli Collins; Ed Kwan; Jay Michlin; Larry Scheer; Paul =
Hammer; dl-Software
Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]



Andy, I think our savior here is that the old upgrade program won't =
attempt to re-install a file if it's unchanged.  So if the corruption =
chance is, say, 1/5000, then the chance of the 1 or 2 files that were =
corrupted being corrupted again on the second pass is small.

Unless I'm missing something...
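
As a rough check on that estimate, here is a minimal Python sketch.  It
assumes each file write is corrupted independently with the same
probability.  The 1/5000 rate and the count of 2 files are the
illustrative figures from above, not measurements, and the function
name is made up for this example.

```python
def chance_any_corrupted(p_file, n_files):
    """Chance that at least one of n_files writes is corrupted.

    Assumes every write is corrupted independently with
    probability p_file (an assumption made for illustration).
    """
    return 1.0 - (1.0 - p_file) ** n_files


# Second pass re-installs only the 2 files that failed the compare.
print(chance_any_corrupted(1 / 5000, 2))
```

Under those assumptions the second pass has roughly a 0.0004 chance of
corrupting either file again.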


-----Original Message-----
From: Andy Sharp
To: Chris Vandever
CC: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; =
Larry Scheer; Paul Hammer; dl-Software
Sent: Fri Feb 16 18:13:46 2007
Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]


On Fri, 16 Feb 2007 18:01:35 -0800 "Chris Vandever"
<chris.vandever@onstor.com> wrote:

> My understanding was that the number of files that fail the compare is
> small in comparison with the total number of files that need to be
> upgraded, thus the second upgrade should get everything remaining
> without any problem.

Based on what I've been seeing, I would characterize it as "the number
of files corrupted is small," but that number doesn't have any relation
to the number being upgraded.  The max number of upgrade iterations
using the method I describe below is 2.  The max number using the
corruption-prone method is ... ?

I'm just talkin' 'bout what I bin seen.
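
A sketch of why the corruption-prone loop has no fixed maximum, using a
simple geometric model that is an assumption for illustration and not a
measurement: every pass rewrites the same number of files, each write
is corrupted independently, and a pass is clean only if no write is
corrupted.  The rate and file counts are hypothetical.

```python
def expected_passes(p_file, files_per_pass):
    """Mean number of upgrade passes until one pass is clean.

    Geometric model (assumed): each of files_per_pass writes is
    corrupted independently with probability p_file, p_file < 1.
    """
    return 1.0 / ((1.0 - p_file) ** files_per_pass)


print(expected_passes(1 / 5000, 2))     # pass rewrites 2 files
print(expected_passes(1 / 5000, 5000))  # pass rewrites 5000 files
```

With a 1/5000 rate, a pass that rewrites only 2 files is almost always
clean, while a pass that rewrites 5000 files would need about 2.7
passes on average.  Either way the model gives an expected count, not a
hard upper bound.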

> ChrisV
>
> -----Original Message-----
> From: Andy Sharp
> Sent: Friday, February 16, 2007 2:28 PM
> To: Tim Gardner
> Cc: Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; Larry Scheer;
> Paul Hammer; dl-Software
> Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]
>
>
> On Fri, 16 Feb 2007 14:19:32 -0800 "Tim Gardner"
> <tim.gardner@onstor.com> wrote:
>
> > The documented procedure is to upgrade the secondary flash, run a
> > system compare, and if corrupted files are found, upgrade again.
> > Once you have a successful compare, reboot from the secondary flash.
>
> What I'm concerned about is that the 'upgrade again' is still the
> corruption-prone upgrade process.  It is quite possible, I might even
> hazard a 'likely', that a user will have to execute that loop many
> times before chancing on a lucky upgrade that doesn't corrupt
> anything.
>
> > -----Original Message-----
> > From: Andy Sharp
> > Sent: Friday, February 16, 2007 1:19 PM
> > To: Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; Tim Gardner;
> > Larry Scheer; Paul Hammer; dl-Software
> > Subject: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> >
> > Howdy,
> >
> > Since I've been messing about with the upgrade code a bunch for
> > Delorean, I've been doing a lot of upgrades in the past several days
> > in the process of doing unit testing, and one thing I've noticed is
> > that upgrades from 1.3.3 to 2.2 or later always find several files
> > that are corrupted after the upgrade.
> >
> > This is because the upgrade process has a corruption problem, as we
> > all know, which was fixed in 2.2 (and possibly some version of
> > 1.3.3?). However, when you upgrade to 2.2 you use the old,
> > corruption-prone upgrade process.
> >
> > Therefore, I believe the workflow for upgrading from a
> > non-upgrade-fixed release to a fixed release requires that you
> > actually upgrade twice.  You must be running the new version when
> > you upgrade the second time.  So, for the sake of brevity, I will
> > just mention 1.3.3 -> 2.2+ in the following:
> >
> > 1.  Upgrade from 1.3.3 or 2.1 to 2.2
> > 2.  Boot 2.2
> >     Note: you may have problems at this point, since any file
> >     could conceivably be corrupted, including one of the .bin boot
> >     images for the TXRX or FP processors.  If necessary, log in
> >     quickly after rebooting and kill pm in order to keep the system
> >     from rebooting itself before you can execute the next step.
> > 3.  Upgrade to 2.2 again.  You may use the same tarball you used in
> >     step 1.
> >
> > Please set aside a decent amount of time for this: upgrades in 2.2
> > are not fast.  It downloads the tarball twice and verifies the
> > entire system twice for each upgrade.  I am fixing these issues in
> > Delorean so we won't have to live with this for too terribly long.
> >
> > Cheers,
> >
> > a
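
The documented loop quoted above (upgrade the secondary flash, run a
system compare, upgrade again if the compare fails) can be sketched as
below.  The two callables are hypothetical stand-ins for the real
upgrade and compare commands, which this thread does not spell out, and
max_passes is an added safety limit that is not part of the documented
procedure.

```python
def upgrade_until_clean(run_upgrade, compare_is_clean, max_passes):
    """Repeat upgrade + compare until the compare passes.

    run_upgrade and compare_is_clean are caller-supplied callables
    (hypothetical stand-ins, not real product commands).  Returns
    the number of passes used, or raises once max_passes is used up.
    """
    for attempt in range(1, max_passes + 1):
        run_upgrade()
        if compare_is_clean():
            return attempt
    raise RuntimeError("compare still failing after max_passes passes")
```

If the second upgrade is run from the fixed 2.2 code, as in step 3 of
the workflow above, the expectation stated in this thread is that the
loop ends within 2 passes.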



------_=_NextPart_001_01C75251.26DCF46A--
