AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20070219150037.7a1c5502@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<jeseem@onstor.com>,<larry.scheer@onstor.com>,<sandrine.boulanger@onstor.com>,<paul.hammer@onstor.com>,<chris.vandever@onstor.com>,<tim.gardner@onstor.com>,<caeli.collins@onstor.com>,<eric.barrett@onstor.com>,<ed.kwan@onstor.com>,<jay.michlin@onstor.com>,<dl-software@onstor.com>,<raj.kumar@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E028D1B@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 19 Feb 2007 15:01:12 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Shamsudeen Jeseem" <jeseem@onstor.com>
Cc: "Larry Scheer" <larry.scheer@onstor.com>, "Sandrine Boulanger"
 <sandrine.boulanger@onstor.com>, "Paul Hammer" <paul.hammer@onstor.com>,
 "Chris Vandever" <chris.vandever@onstor.com>, "Tim Gardner"
 <tim.gardner@onstor.com>, "Caeli Collins" <caeli.collins@onstor.com>, "Eric
 Barrett" <eric.barrett@onstor.com>, "Ed Kwan" <ed.kwan@onstor.com>, "Jay
 Michlin" <jay.michlin@onstor.com>, "dl-Software" <dl-software@onstor.com>,
 "Raj Kumar" <raj.kumar@onstor.com>
Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]
Message-ID: <20070219150112.417396a9@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E028D1B@onstor-exch02.onstor.net>
References: <20070216142730.10a7902b@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0138C465@onstor-exch02.onstor.net>
	<20070216181346.14f6cbe5@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E023B314D@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E02176022@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0A9138@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E028D1B@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

We've already fixed the upgrade process; the problem is how we
reliably get a fixed version of the software onto the filer.

And, yes, I've seen plenty of corruption without any memory pressure.

a


 On Mon, 19 Feb 2007 14:56:36 -0800 "Shamsudeen Jeseem"
<jeseem@onstor.com> wrote:

> 
> I'm seeing a lot of mail and am puzzled about the swap space
> limitations, so can someone clarify a few questions? (I might be
> asking very silly questions. :)
> 
> During the upgrade process, if we are short of swap:
> 1. Why not use the swap space of the secondary flash as well? That
>    gives an additional 20 MB (of course, I mean add swap dynamically).
> 
> 2. If we are still short, we can create a file (in /var of the
>    secondary flash) and use that as a swap file. We can safely claim
>    at least 20 MB from the 32 MB of /var.
> 
> -jeseem
> 
> -----Original Message-----
> From: Larry Scheer
> Sent: Mon 2/19/2007 2:00 PM
> To: Sandrine Boulanger; Paul Hammer; Andy Sharp; Chris Vandever
> Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin;
> dl-Software; Raj Kumar
> Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> 
> I might as well add my 2 bits to this thread FWIW...
>  
> The corruption is caused by /usr/bin/install in what appears to be a
> low memory situation when the SSC is swapping. (Andy, if you have seen
> the corruption occur outside of this scenario I would like to know
> about it.) In all my hours of observing the "broken" install process,
> file corruption only occurred when memory used was at the point the
> SSC needed to use swap space or was already swapping. A second
> upgrade of the same release never failed to correctly upgrade the
> corrupted files on the flash.
> 
> Andy does have a point that there are no guarantees that the broken
> upgrade will always get it "right" a second time, even if the number
> of files to upgrade is small.
> 
> But I agree with Sandrine and Eric that the current procedure is
> fine.
> 
> Given there are no guarantees with the old upgrade, the best we can do
> is try to get the filer being upgraded into as quiescent a state as
> possible. A reboot before upgrading the flash, if it is not in the
> recommended procedures, should be done if a second upgrade is
> needed. It actually should be done before the initial upgrade.
> Killing PM would really clear up demands on memory use during
> upgrade, but I have heard that is undesirable for several reasons I
> can't remember at this moment.
> 
> 
> If anyone is absolutely paranoid about the old upgrade program and
> wants to "guarantee" there are no file corruption problems, then I
> suggest extracting /usr/bin/install from the 2.2 or 2.3 distribution,
> copying it to the primary flash, and making sure the permissions
> are set correctly: chmod 0555 and chown root:bin (-r-xr-xr-x root
> bin). But I doubt this is acceptable to anyone in CS to be done in the
> field. Only developers, QE, or in-house installations should consider
> this as a work-around.
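
For anyone who does try that in-house, a quick check that the copied
binary really ended up as -r-xr-xr-x could look something like the
sketch below. This is only an illustration: the helper names are made
up, it assumes a Python interpreter is available wherever the check is
run (not necessarily on the filer itself), and it checks the mode bits
only, not the root:bin ownership.

```python
import os
import stat


def describe_mode(path):
    """Return the ls-style mode string for path, e.g. '-r-xr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)


def has_expected_mode(path, expected=0o555):
    """True if the permission bits of path are exactly `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected
```

Calling has_expected_mode("/usr/bin/install") after the copy would
flag a wrong chmod; describe_mode() gives the string to compare by eye
against the one Larry quotes.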
> 
> Replacing /usr/bin/install with a fixed version only addresses the
> file corruption issues. There are still three or four more
> deficiencies with the broken upgrade command. I won't bore you with
> the details of those problems. If anyone wants more information on
> those, drop me a note or review the Lamborghini TOI presentation.
> 
> Larry
> 
> -----Original Message-----
> From: Sandrine Boulanger
> Sent: Sat 2/17/2007 2:14 PM
> To: Paul Hammer; Andy Sharp; Chris Vandever
> Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin;
> Larry Scheer; dl-Software; Raj Kumar
> Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> 
> I think the current upgrade procedure is fine. If support finds out
> customers too often encounter the discrepancy, then we can consider a
> double upgrade. But then they completely lose their ability to go
> back since we would overwrite the original flashes.
> 
> 
> -----Original Message-----
> From: Paul Hammer
> Sent: Sat 2/17/2007 10:49 AM
> To: Andy Sharp; Chris Vandever
> Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin;
> Larry Scheer; dl-Software; Raj Kumar; Sandrine Boulanger
> Subject: RE: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> 
> Adding Raj and Sandrine to the thread in case we want to consider
> this in Delorean.
> 
> ________________________________
> 
> From: Andy Sharp
> Sent: Fri 2/16/2007 6:13 PM
> To: Chris Vandever
> Cc: Tim Gardner; Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin;
> Larry Scheer; Paul Hammer; dl-Software
> Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> 
> 
> 
> 
> On Fri, 16 Feb 2007 18:01:35 -0800 "Chris Vandever"
> <chris.vandever@onstor.com> wrote:
> 
> > My understanding was that the number of files that fail the compare
> > is small in comparison with the total number of files that need to
> > be upgraded, thus the second upgrade should get everything remaining
> > without any problem.
> 
> Based on what I've been seeing, I would characterize it as "the number
> of files corrupted is small" but it doesn't have any relation to the
> number being upgraded.  The max number of upgrade iterations using the
> method I describe below is 2.  The max number using the corruption
> prone method is ... ?
> 
> I'm just talkin' 'bout what I bin seen.
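
To put a rough shape on that "... ?": suppose, purely as an assumption
(nothing in this thread measures it), that each pass of the old upgrade
comes out clean independently with some probability p. Then the number
of passes N until the first clean one is geometric:

```latex
\[
  \Pr(N = k) = (1-p)^{k-1}\,p, \qquad
  \mathbb{E}[N] = \frac{1}{p}, \qquad
  \Pr(N > k) = (1-p)^{k}
\]
```

Under that model there is no finite maximum for any p < 1; for an
illustrative p = 0.5 the expected count is 2 passes, with a 12.5%
chance of needing more than 3. The independence assumption may well be
wrong if corruption tracks memory pressure.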
> 
> > ChrisV
> >
> > -----Original Message-----
> > From: Andy Sharp
> > Sent: Friday, February 16, 2007 2:28 PM
> > To: Tim Gardner
> > Cc: Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; Larry Scheer;
> > Paul Hammer; dl-Software
> > Subject: Re: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> >
> >
> > On Fri, 16 Feb 2007 14:19:32 -0800 "Tim Gardner"
> > <tim.gardner@onstor.com> wrote:
> >
> > > The documented procedure is to upgrade the secondary flash, run a
> > > system compare, and if corrupted files are found, upgrade again.
> > > Once you have a successful compare, reboot from the secondary
> > > flash.
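
As a sketch of what a compare step of that kind boils down to: the
snippet below checks files under a directory against a manifest of
expected SHA-256 digests and lists whatever is missing or mismatched.
This is not the filer's actual system compare command, whose internals
aren't described in this thread; the function names and the manifest
format are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_corrupted(root, manifest):
    """Return the sorted relative paths that are missing or whose
    digest differs from the manifest (relative path -> hex digest)."""
    bad = []
    for rel, expected in manifest.items():
        p = Path(root) / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return sorted(bad)
```

An empty list plays the role of the "successful compare"; anything
else is the trigger for another upgrade pass.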
> >
> > What I'm concerned about is that the 'upgrade again' is still the
> > corruption prone upgrade process.  It is quite possible, I might
> > even hazard a 'likely', that a user will have to execute that loop
> > many times before chancing on a lucky upgrade that doesn't corrupt
> > anything.
> >
> > > -----Original Message-----
> > > From: Andy Sharp
> > > Sent: Friday, February 16, 2007 1:19 PM
> > > To: Caeli Collins; Eric Barrett; Ed Kwan; Jay Michlin; Tim
> > > Gardner; Larry Scheer; Paul Hammer; dl-Software
> > > Subject: corruption and upgrade workflow for Lambo [and 1.3.3.?]
> > >
> > > Howdy,
> > >
> > > Since I've been messing about with the upgrade code a bunch for
> > > Delorean, I've been doing a lot of upgrades in the past several
> > > days in the process of doing unit testing, and one thing I've
> > > noticed is that upgrades from 1.3.3 to 2.2 or later always find
> > > several files that are corrupted after the upgrade.
> > >
> > > This is because the upgrade process has a corruption problem, as
> > > we all know, which was fixed in 2.2 (and possibly some version of
> > > 1.3.3?). However, when you upgrade to 2.2 you use the old,
> > > corruption prone, upgrade process.
> > >
> > > Therefore, I believe the workflow for upgrading from a
> > > non-upgrade-fixed release to a fixed release requires that you
> > > actually upgrade twice.  You must be running the new version when
> > > you upgrade the second time.  So, for the sake of brevity, I will
> > > just mention 1.3.3 -> 2.2+ in the following:
> > >
> > > > 1.  Upgrade from 1.3.3 or 2.1 to 2.2
> > > > 2.  Boot 2.2
> > > >     Note: you may have problems at this point, since any file
> > > >     could conceivably be corrupted, including one of the .bin
> > > >     boot images for the TXRX or FP processors.  If necessary,
> > > >     log in quickly after rebooting and kill pm in order to keep
> > > >     the system from rebooting itself before you can execute the
> > > >     next step.
> > > > 3.  Upgrade to 2.2 again.  You may use the same tar ball you did
> > > >     in step 1.
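
The three steps above reduce to one decision: does the release doing
the upgrading already have the fixed upgrade code? A minimal sketch of
that decision follows; plan_upgrade and its argument are hypothetical
names, and which releases count as "fixed" is left to the caller, since
the thread is only certain about 2.2 and later.

```python
def plan_upgrade(running_installer_fixed, target="2.2"):
    """List the upgrade steps, adding the boot-and-repeat pass only
    when the currently running release has the old upgrade code."""
    steps = [f"upgrade to {target}"]
    if not running_installer_fixed:
        steps.append(f"boot {target}")
        steps.append(f"upgrade to {target} again (same tarball)")
    return steps
```

plan_upgrade(False) gives the three-step list from the original mail;
plan_upgrade(True) gives the single pass.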
> > >
> > > Please set aside a decent amount of time for this: upgrades in 2.2
> > > are not fast.  It downloads the tarball twice and verifies the
> > > entire system twice for each upgrade.  I am fixing these issues in
> > > Delorean so we won't have to live with this for too terribly long.
> > >
> > > Cheers,
> > >
> > > a
> 
> 
> 
> 
> 
