AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080723142614.7f08c39b@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<chris.vandever@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E0AE229B3@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 23 Jul 2008 14:26:40 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Chris Vandever" <chris.vandever@onstor.com>
Subject: Re: #24654 (Cougar Migration, systems went into reboot loop for
 some time)
Message-ID: <20080723142640.55879248@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E0AE229B3@onstor-exch02.onstor.net>
References: <ONSTOR-EXCH01YfU8DN000030c2@onstor-exch01.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E0AE229B3@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Hi Chris,

I can tell you feel strongly about this, but I think you might be
barking up the wrong tree.  My guess is that you didn't get the gist of
what I was doing, nor take into account my full comments.

The bottom line is I'm trying to close this bug out because in my
estimation it's already been fixed by several checkins from multiple
people.  That's after spending 30-40 minutes looking over some of the
elogs.  There were multiple separate crashes, and also some deliberate
reboots from some corners.  The crash tab was filled in after that and
only takes into account one crash/reboot.

This has nothing to do with anybody screwing up; nobody did.  There
is also nothing being laid at anyone's feet, except maybe QA to try a migration
from a bobcat to a recent submittal and see if something like this
happens again.  If it does (doubtful) I promise it won't be assigned to
you.  Or blamed on you.  Everything is always blamed on me, you know
that.

Bugs are a daily part of every software engineer's job, yes?  It almost
seems like you take it personally, when you really can't do that, even
if you had designed, implemented and maintained all the clustering
related code from day one.  Which isn't the case.

I hope we're good.  If not, let me know.  I would never try to throw
you, or anyone else, under a bus.  Well, maybe that one guy...

Cheers,

a


On Wed, 23 Jul 2008 14:03:08 -0700 "Chris Vandever"
<chris.vandever@onstor.com> wrote:

> Andy, I really wish you would get your facts straight before you start
> laying problems at my feet.  This is not the first time you've done
> this.  When I move a defect out of clustering I try to include
> excerpts from the elogs to support my analysis.  Sometimes I screw
> up, but at least I provide data for the next person to build upon or
> point out the error of my ways.
> 
> In this case there was a separate defect for the clustering problem on
> MD.  If you looked at the description in the defect you would see that
> the reboot problem occurred before they even configured the second
> node. Had you looked at the crash info tab you would have seen it was
> an FP crash.  Clustering cannot cause an FP crash.  Period.
> 
> If you continue to kick defects into my area without sufficient
> justification, I will simply kick them back to you and say, "Prove
> it."
> 
> ChrisV
> 
> -----Original Message-----
> From: andy.sharp@onstor.com [mailto:andy.sharp@onstor.com] 
> Sent: Wednesday, July 23, 2008 1:10 PM
> To: Jean-Paul Junod
> Cc: Andy Sharp; Andy Sharp; Chris Vandever
> Subject: Defect TED00024654 Cougar Migration, systems went into reboot
> loop for some time. Not_Reproducible
> 
> Note_Entry: 
> I seem to remember multiple cluster issues in and around,
> including one more closely identified at one or two beta
> sites.  However, I believe they have all been fixed at
> this point.
> 
> Also, the reboot loop was the downstream result of many
> issues coming together besides one or two minor cluster
> bugs that may have contributed.  The LUN masking issue was
> a contributor, the lack of an identifiable management volume,
> and a few others I can't remember.
> 
> Sending this back as NR as I believe it has been fixed in
> the course of many checkins.
> 
> Please retest as necessary.
> 
> Files_list_entry: 
> Project: 
> 
