X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C75479.350B03C4@onstor-exch02.onstor.net>; Mon, 19 Feb 2007 15:56:36 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C75479.350B03C4"
References: <20070216142730.10a7902b@ripper.onstor.net><BB375AF679D4A34E9CA8DFA650E2B04E0138C465@onstor-exch02.onstor.net> <20070216181346.14f6cbe5@ripper.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E023B314D@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E02176022@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E0A9138@onstor-exch02.onstor.net>
Content-class: urn:content-classes:message
Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
Date: Mon, 19 Feb 2007 15:56:36 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E028D1B@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: corruption and upgrade workflow for Lambo [and 1.3.3.?]
Thread-Index: AcdSOUGBTFM/IMgPRAWU1tGYFef91wAiyNLDAAcj9qAAYYRmKAAEC4mk
From: "Shamsudeen Jeseem" <jeseem@onstor.com>
To: "Larry Scheer" <larry.scheer@onstor.com>,
	"Sandrine Boulanger" <sandrine.boulanger@onstor.com>,
	"Paul Hammer" <paul.hammer@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>,
	"Chris Vandever" <chris.vandever@onstor.com>
Cc: "Tim Gardner" <tim.gardner@onstor.com>,
	"Caeli Collins" <caeli.collins@onstor.com>,
	"Eric Barrett" <eric.barrett@onstor.com>,
	"Ed Kwan" <ed.kwan@onstor.com>,
	"Jay Michlin" <jay.michlin@onstor.com>,
	"dl-Software" <dl-software@onstor.com>,
	"Raj Kumar" <raj.kumar@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C75479.350B03C4
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


I'm seeing a lot of mail on this, and am puzzled about the swap space
limitations, so can someone clarify a few questions? (I might be asking
very silly questions :)

During the upgrade process, if we are short of swap:
1. Why not use the swap space of the secondary flash as well?
   That gives an additional 20 MB (of course, I mean adding swap
   dynamically).



2. If we are still short, we can create a file (in /var of the secondary =
flash) and use that as a swap file.
   We can safely claim at least 20 MB from the 32 MB of /var.
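
To make the arithmetic in the two suggestions concrete, here is a minimal
Python sketch. The 20 MB and 32 MB figures are the ones quoted above; the
function name is hypothetical, and the sketch only does bookkeeping. It
does not create or enable any swap.

```python
def extra_swap_budget(secondary_swap_mb, var_size_mb, var_claim_mb):
    """Return (extra swap in MB, MB left in /var) for the proposal above.

    Hypothetical helper: it only adds up the two sources of swap and
    raises if the swap file would not fit in /var.
    """
    if var_claim_mb > var_size_mb:
        raise ValueError("swap file larger than /var")
    return secondary_swap_mb + var_claim_mb, var_size_mb - var_claim_mb


extra, var_left = extra_swap_budget(20, 32, 20)
print(extra, var_left)  # 40 12
```

With those figures, the two steps together would add 40 MB of swap and
leave 12 MB in /var of the secondary flash.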

-jeseem

-----Original Message-----
From: Larry Scheer
Sent: Mon 2/19/2007 2:00 PM
To: Sandrine Boulanger; Paul Hammer; Andy Sharp; Chris Vandever
Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; =
dl-Software; Raj Kumar
Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
=20
I might as well add my 2 bits to this thread FWIW...
=20
The corruption is caused by /usr/bin/install in what appears to be a low =
memory situation when the SSC is swapping. (Andy, if you have seen the =
corruption occur outside of this scenario, I would like to know about =
it.) In all my hours of observing the "broken" install process, file =
corruption only occurred when memory use had reached the point where =
the SSC needed swap space or was already swapping. A second upgrade of =
the same release never failed to correctly upgrade the corrupted files =
on the flash.

Andy does have a point that there are no guarantees that the broken =
upgrade will always get it "right" a second time, even if the number of =
files to upgrade is small.

But I agree with Sandrine and Eric that the current procedure is fine.

Given there are no guarantees with the old upgrade, the best we can do =
is try to get the filer being upgraded into as quiescent a state as =
possible. A reboot before upgrading the flash, if it is not in the =
recommended procedures, should be done if a second upgrade is needed. =
It actually should be done before the initial upgrade. Killing PM would =
really clear up demands on memory use during upgrade, but I have heard =
that is undesirable for several reasons I can't remember at this moment.


If anyone is absolutely paranoid about the old upgrade program and =
wants to "guarantee" there are no file corruption problems, then I =
suggest extracting /usr/bin/install from the 2.2 or 2.3 distribution, =
copying it to the primary flash, and making sure the permissions and =
ownership are set correctly: chmod 0555 and chown root:bin (-r-xr-xr-x =
root bin). But I doubt this is acceptable to anyone in CS to be done in =
the field. Only developers, QE, or in-house installations should =
consider this as a work-around.
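
As a sanity check after copying the binary, a minimal Python sketch like
the one below could confirm the mode bits. The helper names are
hypothetical and this is not part of any upgrade tool; it checks only the
0555 mode, not the root:bin ownership.

```python
import os
import stat


def has_mode(path, expected=0o555):
    """Return True if the permission bits of path equal expected."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


def mode_string(path):
    """Return the ls-style mode string, e.g. '-r-xr-xr-x' for 0555."""
    return stat.filemode(os.stat(path).st_mode)
```

For example, has_mode("/usr/bin/install") should return True once
chmod 0555 has been applied to the replacement file.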

Replacing /usr/bin/install with a fixed version only addresses the file =
corruption issues. There are still three or four more deficiencies with =
the broken upgrade command. I won't bore you with the details of those =
problems. If anyone wants more information on those, drop me a note or =
review the Lamborghini TOI presentation.

Larry

-----Original Message-----
From: Sandrine Boulanger
Sent: Sat 2/17/2007 2:14 PM
To: Paul Hammer; Andy Sharp; Chris Vandever
Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; =
Larry Scheer; dl-Software; Raj Kumar
Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
=20
I think the current upgrade procedure is fine. If support finds that =
customers encounter the discrepancy too often, then we can consider a =
double upgrade. But then they completely lose their ability to go back =
since we would overwrite the original flashes.


-----Original Message-----
From: Paul Hammer
Sent: Sat 2/17/2007 10:49 AM
To: Andy Sharp; Chris Vandever
Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; =
Larry Scheer; dl-Software; Raj Kumar; Sandrine Boulanger
Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
=20
Adding Raj and Sandrine to the thread in case we want to consider this =
in Delorean.

________________________________

From: Andy Sharp
Sent: Fri 2/16/2007 6:13 PM
To: Chris Vandever
Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; =
Larry Scheer; Paul Hammer; dl-Software
Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]




On Fri, 16 Feb 2007 18:01:35 -0800 "Chris Vandever"
<chris.vandever@onstor.com> wrote:

> My understanding was that the number of files that fail the compare is
> small in comparison with the total number of files that need to be
> upgraded, thus the second upgrade should get everything remaining
> without any problem.

Based on what I've been seeing, I would characterize it as "the number
of files corrupted is small", but that number doesn't have any relation
to the number being upgraded.  The max number of upgrade iterations
using the method I describe below is 2.  The max number using the
corruption-prone method is ... ?
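
One rough way to frame that question, as a sketch only: assume each
corruption-prone upgrade pass independently leaves at least one corrupted
file with probability p. That is an assumption; nothing in this thread
measures p or shows the passes are independent. Under it, the number of
passes until a clean one is geometric, so there is no fixed maximum, only
an expected count of 1/(1-p). The function names are hypothetical.

```python
def expected_passes(p_corrupt):
    """Expected number of upgrade passes until one is clean.

    Assumes each pass is independent and fails with probability p_corrupt.
    """
    if not 0 <= p_corrupt < 1:
        raise ValueError("p_corrupt must be in [0, 1)")
    return 1.0 / (1.0 - p_corrupt)


def prob_clean_within(p_corrupt, passes):
    """Probability that at least one of the first `passes` is clean."""
    return 1.0 - p_corrupt ** passes
```

With an illustrative p of 0.5, expected_passes(0.5) is 2.0 and
prob_clean_within(0.5, 3) is 0.875.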

I'm just talkin' 'bout what I bin seen.

> ChrisV
>
> -----Original Message-----
> From: Andy Sharp
> Sent: Friday, February 16, 2007 2:28 PM
> To: Tim Gardner
> Cc: Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; Larry Scheer;
> Paul Hammer; dl-Software
> Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]
>
>
> On Fri, 16 Feb 2007 14:19:32 -0800 "Tim Gardner"
> <tim.gardner@onstor.com> wrote:
>
> > The documented procedure is to upgrade the secondary flash, run a
> > system compare, and if corrupted files are found, upgrade again.
> > Once you have a successful compare, reboot from the secondary flash.
>
> What I'm concerned about is that the 'upgrade again' is still the
> corruption-prone upgrade process.  It is quite possible, I might even
> hazard a 'likely', that a user will have to execute that loop many
> times before chancing on a lucky upgrade that doesn't corrupt
> anything.
>
> > -----Original Message-----
> > From: Andy Sharp
> > Sent: Friday, February 16, 2007 1:19 PM
> > To: Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; Tim Gardner;
> > Larry Scheer; Paul Hammer; dl-Software
> > Subject: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> >
> > Howdy,
> >
> > Since I've been messing about with the upgrade code a bunch for
> > Delorean, I've been doing a lot of upgrades in the past several days
> > in the process of doing unit testing, and one thing I've noticed is
> > that upgrades from 1.3.3 to 2.2 or later always find several files
> > that are corrupted after the upgrade.
> >
> > This is because the upgrade process has a corruption problem, as we
> > all know, which was fixed in 2.2 (and possibly some version of
> > 1.3.3?). However, when you upgrade to 2.2 you use the old,
> > corruption-prone upgrade process.
> >
> > Therefore, I believe the workflow for upgrading from a
> > non-upgrade-fixed release to a fixed release requires that you
> > actually upgrade twice.  You must be running the new version when
> > you upgrade the second time.  So, for the sake of brevity, I will
> > just mention 1.3.3 -> 2.2+ in the following:
> >
> > 1.  Upgrade from 1.3.3 or 2.1 to 2.2
> > 2.  Boot 2.2
> >     Note: you may have problems at this point, since any file
> >     could conceivably be corrupted, including one of the .bin boot
> >     images for the TXRX or FP processors.  If necessary, log in
> >     quickly after rebooting and kill pm in order to keep the system
> >     from rebooting itself before you can execute the next step.
> > 3.  Upgrade to 2.2 again.  You may use the same tarball you used in
> >     step 1.
> >
> > Please set aside a decent amount of time for this: upgrades in 2.2
> > are not fast.  It downloads the tarball twice and verifies the
> > entire system twice for each upgrade.  I am fixing these issues in
> > Delorean so we won't have to live with this for too terribly long.
> >
> > Cheers,
> >
> > a






------_=_NextPart_001_01C75479.350B03C4--
