AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20070108114031.482f2c5c@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<jay.michlin@onstor.com>,<larry.scheer@onstor.com>,<tim.gardner@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E578AEE@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 8 Jan 2007 11:52:05 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Jay Michlin" <jay.michlin@onstor.com>
Cc: Larry Scheer <larry.scheer@onstor.com>, Tim Gardner
 <tim.gardner@onstor.com>
Subject: Re: Something that we might need to recommend to CS
Message-ID: <20070108115205.430505cd@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E578AEE@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E0A90EC@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E578AEE@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

It's hard for me to know what to recommend since I don't know what the
current recommended work flow is for upgrading.  For a couple of weeks
now I've been suggesting that mayhaps we should recommend that they
reboot *before* they upgrade, and I just get blank stares back in
return, which tells me that there is something wrong with my
suggestion.  Perhaps it doesn't make sense because the current upgrade
work flow already contains a reboot at some point before the upgrade
takes place?

The point of rebooting is that all the daemons will have just started,
and be using minimal memory, hence allowing the upgrade to complete
unfettered by OOM problems.  I originally got this idea when I started
hearing that upgrades that had problems were "fixed" by doing the
upgrade a second time, and it seems to me that there is an implied
reboot in between the two upgrade attempts.

One thing I've noticed is that there are various memory leaks in nfxsh.
If they run nfxsh and do several operations (like system copy all -i
and so forth) and then do an upgrade, nfxsh itself might be hogging a
lot of memory by that time.  So another thought is to tell them to exit
and re-enter nfxsh before doing an upgrade.
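
To make the "reboot first" idea concrete, here is a rough sketch of
the kind of pre-upgrade check I have in mind.  Everything in it is
hypothetical: the function name, the idea of passing in free memory as
a number, and the headroom parameter are all made up for illustration,
and the 152 MB figure in the usage note is just Larry's number from his
debug builds, not a measured requirement.

```python
def preupgrade_advice(free_mb, upgrade_footprint_mb, headroom_mb=0):
    """Suggest whether to upgrade now or reboot first.

    All three inputs are assumptions supplied by the caller:
    free_mb              -- memory currently free on the SSC, in MB
    upgrade_footprint_mb -- estimated memory the upgrade needs, in MB
    headroom_mb          -- extra safety margin, in MB
    """
    if free_mb < 0 or upgrade_footprint_mb < 0 or headroom_mb < 0:
        raise ValueError("memory figures must be non-negative")

    needed_mb = upgrade_footprint_mb + headroom_mb
    # Freshly started daemons use minimal memory, so a reboot is the
    # suggested remedy when there is not enough room for the upgrade.
    if free_mb >= needed_mb:
        return "proceed"
    return "reboot-first"
```

For example, preupgrade_advice(100, 152) says to reboot first, while
preupgrade_advice(200, 152) says to go ahead.  How the free memory
figure would actually be obtained on the filer is left open.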

Cheers,

a


On Sat, 6 Jan 2007 22:06:17 -0800 "Jay Michlin"
<jay.michlin@onstor.com> wrote:

> This is consistent with the point I've been making about enlisting QA
> and CS as partners in whatever fix we recommend. I think that's in
> everyone's interest, ours, QA's, CS's, the company's, and the
> customers'.
> 
> ________________________________
> 
> From: Larry Scheer
> Sent: Sat 1/6/2007 2:03 PM
> To: Jay Michlin; Tim Gardner; Andy Sharp
> Subject: Something that we might need to recommend to CS
> 
> 
> 
> I just wanted to send this idea to you for further discussion while I
> was thinking about it.
> 
> I have been testing upgrade and various upgrade simulations. One of
> the things I am seeing regularly is ssc panics when memory gets
> low/maxed.
> 
> One of the things we might want to recommend to QA and Customer
> Service is to have users shut down pm even if they are upgrading the
> secondary flash. Low memory could be one of the reasons for the file
> corruption, but that is just speculation at this time. However, the
> real issue is that, because the distribution is so big, a system with
> any kind of SSC memory load could run out of memory during an install.
> 
> I am running both a debug version of BSD and NFX code, so I am using
> 152 Mbytes of RAM every time I run an upgrade or a simulation of
> upgrade. With every upgrade the system starts using swap, and
> frequently the SSC runs out of memory and panics. This happens when
> the only activity on the filer is idle ssc daemons and the upgrade
> program. (I have my virtual servers disabled on one filer and all
> volumes off-line on the other.)
> 
> Let's talk some about what else can be done next week.
> 
> Larry
> 
