AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<sandrine.boulanger@onstor.com>,<raj.kumar@onstor.com>,<tim.gardner@onstor.com>,<jonathan.goldick@onstor.com>,<jobi.ariyamannil@onstor.com>,<james.kahn@onstor.com>,<paul.hammer@onstor.com>,<john.rogers@onstor.com>,<brian.stark@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E09CB8220@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 9 Jun 2008 18:52:33 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Sandrine Boulanger" <sandrine.boulanger@onstor.com>
Cc: "Raj Kumar" <raj.kumar@onstor.com>, "Tim Gardner"
 <tim.gardner@onstor.com>, "Jonathan Goldick" <jonathan.goldick@onstor.com>,
 "Jobi Ariyamannil" <jobi.ariyamannil@onstor.com>, "James Kahn"
 <james.kahn@onstor.com>, "Paul Hammer" <paul.hammer@onstor.com>, "John
 Rogers" <john.rogers@onstor.com>, "Brian Stark" <brian.stark@onstor.com>
Subject: Re: Consider #24153 for 3.3 Beta?
Message-ID: <20080609185233.59a0bbeb@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09CB8220@onstor-exch02.onstor.net>
References: <20080609183951.3d9953d8@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E09CB8220@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Mon, 9 Jun 2008 18:43:02 -0700 "Sandrine Boulanger"
<sandrine.boulanger@onstor.com> wrote:

> The cluster upgrade has to happen without downtime, that's why we
> always upgrade one node at a time. So your workaround does not apply,
> unless you ask the customers to upgrade all nodes at the same time
> and reboot them at exactly the same time, which will imply 5 minutes
> of downtime.

You're talking about GA, though, not beta.

> -----Original Message-----
> From: Andy Sharp 
> Sent: Monday, June 09, 2008 6:40 PM
> To: Sandrine Boulanger
> Cc: Raj Kumar; Tim Gardner; Jonathan Goldick; Jobi Ariyamannil; James
> Kahn; Paul Hammer; John Rogers; Brian Stark
> Subject: Re: Consider #24153 for 3.3 Beta?
> 
> I don't think there's much reason to hold beta up for this, since the
> obvious "work around" for this is to upgrade the other node.  But it
> sounds like this bug could stand a bit more investigation first to
> find exactly why it's happening.
> 
> Cheers,
> 
> a
> 
> 
> On Mon, 9 Jun 2008 18:26:56 -0700 "Sandrine Boulanger"
> <sandrine.boulanger@onstor.com> wrote:
> 
> > > TED00024153 MD Upgrade: Continuous spm crashes
> > 
> > We may have to consider this a MF for Beta, since it seems to happen
> > every time the cluster is in half-upgraded mode (one node running
> > 3.3 and the other one still running the old version). Manny also
> > saw it when upgrading a cluster from 3.2.0.5 to 3.3.
> > We have new lun states in 3.3, like foreign_free, foreign_used,
> > > outCluster_free, outCluster_used, that are unknown to releases prior
> > to 3.3. So when the 3.3 nodes send this list to the other node
> > (which is PCC at that time and running the active SPM), then SPM
> > dies saying the state is invalid.
> > I don't know if this would happen if all the luns that the cluster
> > sees are zoned (in that case, luns can be only in free or used
> > states, assuming they are all labeled, and those states are common
> > with older releases). That could be why HCL did not see that during
> > their cluster upgrade tests (I'm investigating that part).
> > I just checked sfinfo config files from mktg3 and Dogfood, it seems
> > that they see only free and used states, so my statement above might
> > be wrong, it might fail in any case.
> > 
> > A solution will involve making 3.3 lun code backward compatible,
> > i.e. identify who is requesting the lun list (from which version),
> > and send either the old or the new state.
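
For what it's worth, the backward-compatible path proposed above could
look roughly like the sketch below. This is only an illustration, not
the actual lun code: every function and table name here is made up,
and the mapping from the new 3.3 states to the old free/used states is
an assumption. Whether a foreign or out-of-cluster lun should ever be
reported as "free" to an old node would need to be decided separately.

```python
# Hypothetical sketch: all names are invented for illustration.
# States that releases prior to 3.3 already understand.
LEGACY_STATES = {"free", "used"}

# ASSUMPTION: nearest pre-3.3 equivalent for each state new in 3.3.
DOWNGRADE = {
    "foreign_free": "free",
    "foreign_used": "used",
    "outCluster_free": "free",
    "outCluster_used": "used",
}

# First release whose SPM understands the new states.
FIRST_NEW_STATE_VERSION = (3, 3)


def parse_version(text):
    """Turn a dotted version string such as '3.2.0.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def state_for_peer(state, peer_version):
    """Return the state to send to a peer running peer_version."""
    if parse_version(peer_version) >= FIRST_NEW_STATE_VERSION:
        return state                  # peer knows the new states
    if state in LEGACY_STATES:
        return state                  # common to old and new releases
    return DOWNGRADE[state]           # old peer: send an old state instead


def lun_list_for_peer(luns, peer_version):
    """Translate a list of (lun_name, state) pairs for the requesting peer."""
    return [(name, state_for_peer(state, peer_version)) for name, state in luns]
```

With something like this, a node still on 3.2.0.5 would only ever be
sent free or used, while a 3.3 node gets the list unchanged.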
