AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080923145442.0743dd25@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<john.rogers@onstor.com>,<larry.scheer@onstor.com>,<sandrine.boulanger@onstor.com>,<brian.stark@onstor.com>,<dl-mightydog-alert@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E09C429CF@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Tue, 23 Sep 2008 14:55:41 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "John Rogers" <john.rogers@onstor.com>
Cc: "Larry Scheer" <larry.scheer@onstor.com>, "Sandrine Boulanger"
 <sandrine.boulanger@onstor.com>, "Brian Stark" <brian.stark@onstor.com>,
 "dl-mightydog-alert" <dl-mightydog-alert@onstor.com>
Subject: Re: plan for debugging exim hangs on dogfood
Message-ID: <20080923145541.0d7d9df8@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09C429CF@onstor-exch02.onstor.net>
References: <20080918173836.73527af4@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E09C429CF@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Prior to doing step 2 (system compare) I would like you (John) and
Larry to audit all the files in Larry's list.  It should old.  The hosts
file listed below is definitely broken so I'm wondering what others
might be broken.  The mgmt-bus entry could easily be causing problems
as well as the other incorrect entries.
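
For the audit itself, here is a minimal sketch of the kind of check I have in mind, assuming Python is available on whatever workstation the files get copied to. The function names (parse_hosts, audit_hosts) and the two rules are my own illustration, not an existing tool: it flags IPv6 entries and any name that maps to more than one address of the same family. Whether either of those is actually wrong for our systems is an assumption that still needs a human to confirm, and it cannot know which mgmt-plane addresses are stale.

```python
import ipaddress
from collections import defaultdict

# Dogfood's hosts entries as quoted in this thread (comment lines omitted).
DOGFOOD_HOSTS = """\
::1             localhost localhost.my.domain
127.0.0.1       localhost localhost.my.domain
192.168.254.254 noname
192.168.192.1   sc0.mgmt-plane
10.0.2.2 Dogfood.sc0 Dogfood
10.10.0.1 Dogfood.sc1 Dogfood
"""

def parse_hosts(text):
    """Return a list of (address, [names]) pairs, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        addr, *names = line.split()
        entries.append((addr, names))
    return entries

def audit_hosts(text):
    """Return a list of warnings. The rules are illustrative assumptions."""
    warnings = []
    seen = defaultdict(set)  # (lowercased name, IP version) -> addresses
    for addr, names in parse_hosts(text):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            warnings.append(f"invalid address: {addr}")
            continue
        if ip.version == 6:
            warnings.append(f"IPv6 entry: {addr} {' '.join(names)}")
        if not names:
            warnings.append(f"no host name for {addr}")
        for name in names:
            seen[(name.lower(), ip.version)].add(addr)
    for (name, _version), addrs in sorted(seen.items()):
        if len(addrs) > 1:
            warnings.append(
                f"{name} maps to multiple addresses: {', '.join(sorted(addrs))}"
            )
    return warnings

if __name__ == "__main__":
    for warning in audit_hosts(DOGFOOD_HOSTS):
        print(warning)
```

Run against the dogfood file quoted below, this flags the ::1 line and the fact that the bare name Dogfood is listed against both 10.0.2.2 and 10.10.0.1. The mktg3 file passes both checks.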



On Tue, 23 Sep 2008 09:22:36 -0700 "John Rogers"
<john.rogers@onstor.com> wrote:

>  Let's tackle step 2 today. But before we get too far down the path I
> have a few questions. The mail problem only seems to exist on node
> Dogfood and I've reviewed the configuration files on each system. I
> made one modification via nfxsh to the resolver. There was a mismatch
> between mktg3's resolver on ssc and dogfood's resolver on ssc. The
> modification I made was to match dogfood's ssc resolver to mktg3's
> resolver, which seemed more complete and accurate. Although there
> wasn't a blatant error or misconfiguration, for the sake of
> neatness I made that change.
> 
> I also noticed a difference in the hosts file between mktg3 and
> dogfood. Dogfood's host file seems to be in error. There is an ipv6
> statement in there and the old addresses for mgmt planes and vsvr
> paths. I would like to clean that file up and I need to know what
> should be correct.
> 
> Mktg3's host file:
> #   $OpenBSD: hosts,v 1.7 2000/08/15 09:36:34 itojun Exp $
> #
> # Host Database
> # This file should contain the addresses and aliases
> # for local hosts that share this file.
> # It is used only for "ifconfig" and other operations
> # before the nameserver is started.
> #
> # RFC 1918 specifies that these networks are "internal". They can
> # be used behind an ipnat(4) redirector.
> # 10.0.0.0      10.255.255.255
> # 172.16.0.0    172.31.255.255
> # 192.168.0.0   192.168.255.255
> #
> 127.0.0.1       localhost
> 
> 10.0.2.3 mktg3.sc0 mktg3
> 10.10.0.2 mktg3.sc1
> 
> Dogfood's host file:
> #   $OpenBSD: hosts,v 1.7 2000/08/15 09:36:34 itojun Exp $
> #
> # Host Database
> # This file should contain the addresses and aliases
> # for local hosts that share this file.
> # It is used only for "ifconfig" and other operations
> # before the nameserver is started.
> #
> # RFC 1918 specifies that these networks are "internal". They can
> # be used behind an ipnat(4) redirector.
> # 10.0.0.0      10.255.255.255
> # 172.16.0.0    172.31.255.255
> # 192.168.0.0   192.168.255.255
> #
> ::1             localhost localhost.my.domain
> 127.0.0.1       localhost localhost.my.domain
> 192.168.254.254 noname
> 192.168.192.1   sc0.mgmt-plane
> 
> 10.0.2.2 Dogfood.sc0 Dogfood
> 10.10.0.1 Dogfood.sc1 Dogfood
> 
> -----Original Message-----
> From: Andy Sharp 
> Sent: Thursday, September 18, 2008 5:39 PM
> To: John Rogers
> Cc: Larry Scheer; Sandrine Boulanger; Brian Stark
> Subject: plan for debugging exim hangs on dogfood
> 
> Step 1. Would be great if there was a dev machine that could be used
> to try and reproduce this.  Facts seem to indicate that it needs to be
> a cluster.
> 
> Step 1a.  If we can find a QA machine that demonstrates the same
> symptoms, we can use that instead of dogfood.
> 
> Step 2. I would like a system compare to be done on dogfood against a
> build of the correct version.  I want Larry to do the build and the
> system compare (or at least be present) because I know he will get
> everything precisely nailed down like the right source checked out,
> the system compare command with the right arguments and so forth, and
> can interpret the results.
> 
> Step 3. I would like to install a special version of exim on dogfood
> (or equivalent QA machine if there is one).  First I would like to
> just see if the problem reproduces with this special version I have
> put together.  If so, then I will want to run exim with gdb to debug
> some of the code paths.  I would need to be on the machine for about
> a day, but it shouldn't affect the operation of the machine except for
> autosupport emails.
> 
> I don't anticipate it being necessary, but I might need to NFS mount
> some source for gdb.
> 
> Exit strategy: restore the original version of exim unless the special
> version does not demonstrate the problem.  Unmount any extraneous NFS
> mounts.
> 
