AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080609183715.682de365@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<sandrine.boulanger@onstor.com>,<raj.kumar@onstor.com>,<tim.gardner@onstor.com>,<jonathan.goldick@onstor.com>,<jobi.ariyamannil@onstor.com>,<james.kahn@onstor.com>,<paul.hammer@onstor.com>,<john.rogers@onstor.com>,<brian.stark@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E09CB821C@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 9 Jun 2008 18:39:51 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Sandrine Boulanger" <sandrine.boulanger@onstor.com>
Cc: "Raj Kumar" <raj.kumar@onstor.com>, "Tim Gardner"
 <tim.gardner@onstor.com>, "Jonathan Goldick" <jonathan.goldick@onstor.com>,
 "Jobi Ariyamannil" <jobi.ariyamannil@onstor.com>, "James Kahn"
 <james.kahn@onstor.com>, "Paul Hammer" <paul.hammer@onstor.com>, "John
 Rogers" <john.rogers@onstor.com>, Brian Stark <brian.stark@onstor.com>
Subject: Re: Consider #24153 for 3.3 Beta?
Message-ID: <20080609183951.3d9953d8@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09CB821C@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E09CB821C@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I don't think there's much reason to hold beta up for this, since the
obvious workaround is to upgrade the other node.  But it sounds like
this bug could stand a bit more investigation first to find out exactly
why it's happening.

Cheers,

a


On Mon, 9 Jun 2008 18:26:56 -0700 "Sandrine Boulanger"
<sandrine.boulanger@onstor.com> wrote:

> TED00024153 MD Upgrade: Continuous spm crashes
> 
> We may have to consider this a MF for Beta, since it seems to happen
> every time the cluster is in half-upgraded mode (one node running 3.3
> and the other one still running the old version). Manny also saw it
> when upgrading a cluster from 3.2.0.5 to 3.3.
> We have new lun states in 3.3, like foreign_free, foreign_used,
> outCluster_free, outCluster_used, that are unknown to releases prior
> to 3.3. So when the 3.3 node sends this list to the other node (which
> is PCC at that time and running the active SPM), SPM dies saying the
> state is invalid.
> I don't know if this would happen if all the luns that the cluster
> sees are zoned (in that case, luns can be only in free or used states,
> assuming they are all labeled, and those states are common with older
> releases). That could be why HCL did not see that during their cluster
> upgrade tests (I'm investigating that part).
> I just checked the sfinfo config files from mktg3 and Dogfood. They
> seem to see only free and used states, so my statement above might be
> wrong: it might fail in any case.
> 
> A solution will involve making 3.3 lun code backward compatible, i.e.
> identify who is requesting the lun list (from which version), and send
> either the old or the new state.
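
A rough sketch of the version check described above, in Python for
brevity. Everything here is an assumption for illustration: the function
names, the version format, and especially the mapping from new states to
old ones are hypothetical and not taken from the actual lun or SPM code.

```python
# Hypothetical sketch: the names, version tuples, and downgrade mapping
# below are assumptions, not the real SPM/lun implementation.

# Assumed mapping from 3.3-only states to a state a pre-3.3 peer accepts.
# Whether foreign/outCluster luns should map to free/used or be hidden
# from the old node entirely is a design decision this does not settle.
DOWNGRADE = {
    "foreign_free": "free",
    "foreign_used": "used",
    "outCluster_free": "free",
    "outCluster_used": "used",
}

# First release assumed to understand the new states.
FIRST_NEW_STATE_VERSION = (3, 3)


def parse_version(text):
    """Turn '3.2.0.5' into (3, 2, 0, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def lun_list_for_peer(luns, peer_version):
    """Return (name, state) pairs using only states the peer understands."""
    if parse_version(peer_version) >= FIRST_NEW_STATE_VERSION:
        return list(luns)  # peer is 3.3 or later: send states unchanged
    # Older peer: translate new states, pass common states through as-is.
    return [(name, DOWNGRADE.get(state, state)) for name, state in luns]
```

With this, a 3.2.0.5 requester would only ever see free or used, while a
3.3 requester gets the list untouched. The hard part is picking a
mapping that is safe for the old node to act on.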
