AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080609184431.16c92aca@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<raj.kumar@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E0422927B@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 9 Jun 2008 18:45:13 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Raj Kumar" <raj.kumar@onstor.com>
Subject: Re: Consider #24153 for 3.3 Beta?
Message-ID: <20080609184513.0ab4598e@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E0422927B@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E09CB821C@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0422927B@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Raj,

Won't this work?

1. Move all the vsvrs to the PCC, then upgrade the non-PCC nodes.
2. Do an upgrade -s on the PCC node.
3. Do a reboot -s on the PCC.

Won't one of the other nodes take over at that point?


On Mon, 9 Jun 2008 18:34:36 -0700 "Raj Kumar" <raj.kumar@onstor.com>
wrote:

> When we run into this, the cluster will be unusable. Since it's spm,
> luns won't be discovered, and we cannot even fail over vsvrs from 3204
> to 3.3.
> 
> ________________________________
> 
> From: Sandrine Boulanger
> Sent: Mon 6/9/2008 6:26 PM
> To: Raj Kumar; Tim Gardner; Jonathan Goldick; Jobi Ariyamannil; Andy
> Sharp; James Kahn
> Cc: Paul Hammer; John Rogers
> Subject: Consider #24153 for 3.3 Beta?
> 
> 
> 
> TED00024153 MD Upgrade: Continuous spm crashes
> 
> We may have to consider this a MF for Beta, since it seems to happen
> every time the cluster is in half-upgraded mode (one node running 3.3
> and the other one still running the old version). Manny also saw it
> when upgrading a cluster from 3.2.0.5 to 3.3.
> 
> We have new lun states in 3.3, like foreign_free, foreign_used,
> outCluster_free, outCluster_used, that are unknown to releases prior
> to 3.3. So when the 3.3 nodes send this list to the other node (which
> is PCC at that time and running the active SPM), SPM dies saying
> the state is invalid.
> 
> I don't know if this would happen if all the luns that the cluster
> sees are zoned (in that case, luns can be only in free or used
> states, assuming they are all labeled, and those states are common
> with older releases). That could be why HCL did not see that during
> their cluster upgrade tests (I'm investigating that part).
> 
> I just checked sfinfo config files from mktg3 and Dogfood. They seem
> to see only free and used states, so my statement above might be
> wrong; it might fail in any case.
> 
> A solution will involve making 3.3 lun code backward compatible, i.e.
> identify who is requesting the lun list (from which version), and
> send either the old or the new state.
> 
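
For what it's worth, the backward-compatible approach described above
(check which version is asking for the lun list, then send old or new
states) might look roughly like the sketch below. This is illustrative
Python only, not the actual SPM code. The function names, the version
parsing, and especially the mapping of each new state onto "free" or
"used" are assumptions; whether a foreign or out-of-cluster lun can
safely be reported to an older node as free or used would have to be
decided by whoever owns the lun code.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# First release that understands the new lun states (3.3, per the thread).
NEW_STATES_SINCE: Version = (3, 3)

# HYPOTHETICAL mapping from the new 3.3 states to states that older
# releases already know. The real mapping is a design decision and may
# need to differ (for example, hiding such luns from old peers instead).
LEGACY_STATE: Dict[str, str] = {
    "foreign_free": "free",
    "foreign_used": "used",
    "outCluster_free": "free",
    "outCluster_used": "used",
}


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '3.2.0.5' into (3, 2, 0, 5)."""
    return tuple(int(part) for part in text.split("."))


def lun_states_for_peer(states: List[str], peer_version: str) -> List[str]:
    """Return the lun state list in a form the requesting peer understands."""
    if parse_version(peer_version) >= NEW_STATES_SINCE:
        # Peer runs 3.3 or later: send the states unchanged.
        return list(states)
    # Older peer: replace each new state with its legacy stand-in and
    # pass states it already knows (free, used) through untouched.
    return [LEGACY_STATE.get(state, state) for state in states]
```

With this sketch, a 3.2.0.5 requester would receive only free and used
states, while a 3.3 requester would receive the list as is.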
