X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C73365.1B973C54@onstor-exch02.onstor.net>; Mon, 8 Jan 2007 12:39:35 -0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: Something that we might need to recommend to CS
Date: Mon, 8 Jan 2007 12:39:34 -0800
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E01F15E33@onstor-exch02.onstor.net>
In-Reply-To: <20070108115205.430505cd@ripper.onstor.net>
Thread-Topic: Something that we might need to recommend to CS
thread-index: AcczXnlPJPN0PyQHRTyimH2ZScnycAABiecA
From: "Jay Michlin" <jay.michlin@onstor.com>
To: "Andy Sharp" <andy.sharp@onstor.com>
Cc: "Larry Scheer" <larry.scheer@onstor.com>,
	"Tim Gardner" <tim.gardner@onstor.com>

Andy,

The reason you're getting blank stares is not that there is anything
wrong with your suggestion. Rather, the reason is that we need to make
the suggestion when the right set of people are in the room. Development
alone, even with QA, can't decide. It takes Customer Support along with
us. And since there aren't very many meetings defined where all these
people come together, mostly you get blank stares. That's why I keep
emphasizing that in the issue at hand we must find a way to get
Development, QA and CS working together. And I expect to be the one to
drive this. I just haven't got it done quite yet.

jay

-----Original Message-----
From: Andrew Sharp [mailto:andy.sharp@onstor.com]
Sent: Monday, January 08, 2007 11:52 AM
To: Jay Michlin
Cc: Larry Scheer; Tim Gardner
Subject: Re: Something that we might need to recommend to CS

It's hard for me to know what to recommend since I don't know what the
current recommended workflow for upgrading is.  For a couple of weeks
now I've been suggesting that mayhaps we should recommend that they
reboot *before* they upgrade, and I just get blank stares back in
return, which tells me that there is something wrong with my
suggestion.  But I guess that suggestion might not make sense if the
current upgrade workflow already includes a reboot at some point
before the upgrade takes place?

The point of rebooting is that all the daemons will have just started,
and be using minimal memory, hence allowing the upgrade to complete
unfettered by OOM problems.  I originally got this idea when I started
hearing that upgrades that had problems were "fixed" by doing the
upgrade a second time, and it seems to me that there is an implied
reboot in between the two upgrade attempts.

One thing I've noticed is that there are various memory leaks in nfxsh.
If they run nfxsh and do several operations (like system copy all -i and
so forth) and then do an upgrade, nfxsh itself might be hogging a lot of
memory by that time.  So another thought is to tell them to exit and
re-enter nfxsh before doing an upgrade.
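The two ideas above (restart nfxsh, reboot so the daemons start at minimal memory) could be folded into a simple pre-upgrade check. Below is a minimal Python sketch, assuming a hypothetical MemorySnapshot of free RAM and free swap and an assumed required_mb estimate for the install; none of these names, fields, or thresholds come from the product, and a real check would need to read the SSC's actual memory figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemorySnapshot:
    """Hypothetical free-memory readings for the SSC, in megabytes."""
    free_ram_mb: int
    free_swap_mb: int


def pre_upgrade_advice(snap: MemorySnapshot, required_mb: int) -> List[str]:
    """Suggest steps to take before starting an upgrade.

    required_mb is an assumed estimate of the memory the install needs,
    not a figure measured from the product.
    """
    if snap.free_ram_mb >= required_mb:
        return ["ok: enough free RAM to upgrade now"]

    # Short on physical RAM: the install would push the system into swap
    # or an out-of-memory panic, so free memory first (cheapest step first).
    advice = [
        "exit and re-enter nfxsh to release memory it may have leaked",
        "reboot so the daemons restart at minimal memory, then upgrade",
    ]
    if snap.free_ram_mb + snap.free_swap_mb < required_mb:
        # Even swap will not cover the shortfall.
        advice.append("warning: RAM plus swap is below the estimate; expect OOM")
    return advice


if __name__ == "__main__":
    # Example numbers are made up for illustration.
    snap = MemorySnapshot(free_ram_mb=100, free_swap_mb=100)
    for line in pre_upgrade_advice(snap, 150):
        print(line)
```

With the made-up numbers in the example (100 MB free RAM, 100 MB free swap, a 150 MB estimate) it prints the two restart suggestions and no warning, since RAM plus swap still covers the estimate.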

Cheers,

a


On Sat, 6 Jan 2007 22:06:17 -0800 "Jay Michlin"
<jay.michlin@onstor.com> wrote:

> This is consistent with the point I've been making about enlisting QA
> and CS as partners in whatever fix we recommend. I think that's in
> everyone's interest: ours, QA's, CS's, the company's, and the
> customers'.
>
> ________________________________
>
> From: Larry Scheer
> Sent: Sat 1/6/2007 2:03 PM
> To: Jay Michlin; Tim Gardner; Andy Sharp
> Subject: Something that we might need to recommend to CS
>
> I just wanted to send this idea to you for further discussion while I
> was thinking about it.
>
> I have been testing upgrades and various upgrade simulations. One of
> the things I am seeing regularly is SSC panics when memory gets
> low/maxed.
>
> One of the things we might want to recommend to QA and Customer
> Service is to have users shut down pm even if they are upgrading the
> secondary flash. Low memory could be one of the reasons for the file
> corruption, but that is just speculation at this time. However, the
> real issue is that, because the distribution is so big, a system with
> any kind of SSC memory load could run out of memory during an install.
>
> I am running debug versions of both BSD and the NFX code, so I am
> using 152 MB of RAM every time I run an upgrade or an upgrade
> simulation. With every upgrade the system will start using swap, and
> frequently the SSC will run out of memory and panic, even though the
> only activity on the filer is idle SSC daemons and the upgrade
> program. (I have my virtual servers disabled on one filer and all
> volumes off-line on the other.)
>
> Let's talk some about what else can be done next week.
>=20
> Larry
