AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20081201124133.45f4ce6c@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:exch1.onstor.net
NSV:
SSH:
R:<ed.kwan@onstor.com>,<larry.scheer@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	2779531E7C760D4491C96305019FEEB5175DA0B6BE@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Mon, 1 Dec 2008 12:41:44 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Ed Kwan <ed.kwan@onstor.com>
Cc: Larry Scheer <larry.scheer@onstor.com>
Subject: Re: Please spin patch 4.0.1.0 submittal 19
Message-ID: <20081201124144.67c73b89@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB5175DA0B6BE@exch1.onstor.net>
References: <2779531E7C760D4491C96305019FEEB5175DA0B6B2@exch1.onstor.net>
	<2779531E7C760D4491C96305019FEEB5175D6201FE@exch1.onstor.net>
	<2779531E7C760D4491C96305019FEEB5175DA0B6BE@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Sorry guys, I'm out sick today.  Prognosis for tomorrow: the same.

On Mon, 1 Dec 2008 12:09:27 -0800 Ed Kwan <ed.kwan@onstor.com> wrote:

> I actually tried to call Andy 10 minutes ago about this, but he
> didn't answer his phone. I think we'll need a sub 20 for the log
> replay data corruption issue, so we can pull Andy's change in at that
> time.
> 
> _____________________________________________
> From: Larry Scheer
> Sent: Monday, December 01, 2008 12:05 PM
> To: Ed Kwan
> Cc: Andy Sharp
> Subject: RE: Please spin patch 4.0.1.0 submittal 19
> 
> Ed,
>    Andy checked in a change so that the /etc/hosts fix is done by system
> upgrade. Should that be in submittal 19 as well? It was only checked
> into the dev branch.
> 
> Larry
> 
> _____________________________________________
> From: Ed Kwan
> Sent: Monday, December 01, 2008 12:01 PM
> To: Larry Scheer
> Cc: Chris Vandever; Sandrine Boulanger; John Rogers
> Subject: Please spin patch 4.0.1.0 submittal 19
> 
> 
> 
> _____________________________________________
> From: Chris Vandever
> Sent: Monday, December 01, 2008 11:55 AM
> To: Ed Kwan
> Subject: RE: Mightydog ea issues: new pm available
> 
> Change #31285 checked into dev & integrated as #31286 into r401rel.
> Ready to build.
> 
> _____________________________________________
> From: Sandrine Boulanger
> Sent: Sunday, November 30, 2008 1:01 PM
> To: Sandrine Boulanger; Chris Vandever; John Rogers
> Cc: Ed Kwan; Jonathan Goldick
> Subject: RE: Mightydog ea issues: new pm available
> 
> No new errors, warnings or crashes since Thursday. I checked the
> elogs again for the new messages and did not find any instances. We
> can update MD with that change, but we have no definitive answer on
> whether it fixes the ea issue. I saw MD had multiple crashes; they are
> probably always the same, like the one below. We can instrument and
> try on Cougar soak.
> 
> _____________________________________________
> From: Sandrine Boulanger
> Sent: Thursday, November 27, 2008 2:38 PM
> To: Chris Vandever; John Rogers
> Cc: Ed Kwan; Jonathan Goldick
> Subject: RE: Mightydog ea issues: new pm available
> 
> I haven't hit the condition yet; I checked elogs for any of the new
> messages below. Once again I had 2 FP crashes on g2r8 similar to
> 25716 and many others. This is another one we need to figure out very
> soon.
> 
> 
> crashdump_begin: Thu Nov 27 11:21:37 2008
> 
> 
> Crashdump:
> ----------
> ECC/Bus Error exception
> NMI : Watchdog Timeout NMI
> Image Version : NFP_FP : EverON-4.0.1.0CG : Wed Nov 19 14:41:59 2008
> PROM Version  : PROM_SIBYTE_CG : Cougar-prom-1.0.8 : Thu Jul 31 17:59:23 2008
> Boot Time  : Wed Nov 26 17:44:20 GMT 2008
> Crash Time : Thu Nov 27 11:15:34 GMT 2008
> 
> crashdump_begin: Thu Nov 27 14:29:27 2008
> 
> 
> Crashdump:
> ----------
> ECC/Bus Error exception
> NMI : Watchdog Timeout NMI
> Image Version : NFP_FP : EverON-4.0.1.0CG : Wed Nov 19 14:41:59 2008
> PROM Version  : PROM_SIBYTE_CG : Cougar-prom-1.0.8 : Thu Jul 31 17:59:23 2008
> Boot Time  : Thu Nov 27 11:20:45 GMT 2008
> Crash Time : Thu Nov 27 14:25:48 GMT 2008
> 
> _____________________________________________
> From: Chris Vandever
> Sent: Wednesday, November 26, 2008 5:22 PM
> To: Sandrine Boulanger; John Rogers
> Cc: Ed Kwan; Jonathan Goldick
> Subject: Mightydog ea issues: new pm available
> 
> I have a modified version of pm that will try to make sure a process
> is really dead before aborting its rmc sessions.  If the process was
> NOT really dead, it will log a message and not abort the rmc
> sessions.  I have also added an elog if we get an error from rmc on
> one of pm's sessions and have to abort it.
> 
> The new log messages are:
> 
> "pm_sess_aborting: pid %d missing from proc list; still alive"
> 
> and
> 
> "pm_handle_failure: pid %d terminated"
> 
> where %d is the pid we're having trouble with.  There should be no
> other messages from either function.
> 
> The new pm (cougar, opt, but not stripped) is in
> ~chrisv/forSandrine.  The corresponding source is in
> ~chrisv/p4/r401/nfx-tree.  (The build directory is Build-cg if you
> need it.)
> 
> I will not be available Thurs., but will be around the rest of the
> weekend if there are any issues.
> 
> ChrisV
> 
> 
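
For reference, the pm behavior Chris describes above (make sure a
process is really dead before aborting its rmc sessions, otherwise log
and leave the sessions alone) might look roughly like the sketch below.
This is only an illustration, not the actual pm source: the function
names, the abort_sessions callback, and the use of a signal-0 probe as
the liveness check are all assumptions. Only the two log strings come
from the thread, and attaching the second one to the abort path is a
guess.

```python
import os


def pid_is_alive(pid):
    """Hypothetical liveness probe (POSIX only): does this pid still exist?"""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check, nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def sess_aborting(pid, abort_sessions, log, is_alive=pid_is_alive):
    """Abort the sessions of pid only if the process is really gone.

    abort_sessions and log are hypothetical callbacks standing in for
    the rmc session teardown and the elog call.
    Returns True if the sessions were aborted, False otherwise.
    """
    if is_alive(pid):
        # Process list said it was gone, but it is not: log and keep sessions.
        log("pm_sess_aborting: pid %d missing from proc list; still alive" % pid)
        return False
    abort_sessions(pid)
    # Assumption: this is the message logged on the abort path.
    log("pm_handle_failure: pid %d terminated" % pid)
    return True
```

Passing the liveness check in as a parameter keeps the decision logic
testable without having to start and kill real processes.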
