AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20081126114218.6f79eb35@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:exch1.onstor.net
NSV:
SSH:
R:<sandrine.boulanger@onstor.com>,<john.rogers@onstor.com>,<timothy.swenson@onstor.com>,<ed.kwan@onstor.com>,<dl-IT@onstor.com>,<dl-Cougar@onstor.com>,<dl-CougarCore@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	2779531E7C760D4491C96305019FEEB5175D5BE30C@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 26 Nov 2008 11:44:19 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Sandrine Boulanger <sandrine.boulanger@onstor.com>
Cc: John Rogers <john.rogers@onstor.com>, Timothy Swenson
 <timothy.swenson@onstor.com>, Ed Kwan <ed.kwan@onstor.com>, dl-IT
 <dl-IT@onstor.com>, dl-Cougar <dl-Cougar@onstor.com>, dl-Cougar Core Team
 <dl-CougarCore@onstor.com>
Subject: Re: ** PROBLEM alert - troll/check_syslog is CRITICAL **
Message-ID: <20081126114419.6e79965a@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB5175D5BE30C@exch1.onstor.net>
References: <2779531E7C760D4491C96305019FEEB5175DA0AD9D@exch1.onstor.net>
	<2779531E7C760D4491C96305019FEEB5175D5BE30C@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I'm guessing those [defect 25869] were caused by exim still not working
quite right at that time.  John, get a core of the cluster daemon and have
Max take a look at it to see if he can tell us what's making the
cluster daemon have problems.
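A minimal sketch of forcing a core out of a running daemon, in the spirit of Sandrine's note below about killing EA with SIGBUS. It assumes a POSIX host where SIGBUS's default action (terminate and, if the core-size limit allows, dump core) has not been overridden by the daemon. The signal is referred to by name rather than as `-7`, because the numeric value differs by platform (SIGBUS is 7 on Linux x86 but 10 on the BSDs). `signal_for_core` is a hypothetical helper, and a throwaway `sleep` child stands in for the real EA or cluster daemon.

```python
import os
import signal
import subprocess


def signal_for_core(pid: int, sig: int = signal.SIGBUS) -> None:
    """Send `sig` (SIGBUS by default) to `pid`.

    SIGBUS's default disposition terminates the process; whether a core
    file is written depends on the core-size limit and core settings.
    """
    os.kill(pid, sig)


if __name__ == "__main__":
    # Stand-in process; on the real system this would be the daemon's PID.
    child = subprocess.Popen(["sleep", "30"])
    signal_for_core(child.pid)
    rc = child.wait()
    # Popen reports death-by-signal N as returncode -N.
    print("child exited with", rc, "expected", -int(signal.SIGBUS))
    print("SIGBUS is signal", int(signal.SIGBUS), "on this platform")
```

Using the symbolic name keeps the same command correct whichever platform the daemon actually runs on.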

On Tue, 25 Nov 2008 18:14:11 -0800 Sandrine Boulanger
<sandrine.boulanger@onstor.com> wrote:

> Same symptoms as defect 25869, which has a tcpdump and cores; it's not
> assigned yet.
> 
> -----Original Message-----
> From: John Rogers 
> Sent: Tuesday, November 25, 2008 5:53 PM
> To: Timothy Swenson; Ed Kwan
> Cc: dl-IT; dl-Cougar; dl-Cougar Core Team
> Subject: RE: ** PROBLEM alert - troll/check_syslog is CRITICAL **
> Importance: High
> 
> Haven't heard anything yet. I'm getting the cluster2 errors
> again.
> 
> Nov 24 12:20:19 dogfood : 0:0:evm:INFO: evm_closeSess : NCM session closed
> Nov 24 12:20:19 dogfood : 0:0:evm:INFO: evm_rcvRmcMsg : NCM session open success
> Nov 24 12:20:19 dogfood : 0:0:sdm:INFO: sdm_rcvRmcMsg : RMC open successful (0)
> Nov 24 12:20:23 dogfood : 0:0:ea:NOTICE: ea_closeSess: NCM session closed
> Nov 24 12:20:24 dogfood : 0:0:ea:NOTICE: ea_rcvRmcMsg: NCM session open success
> Nov 24 12:20:35 dogfood : 0:0:snmpd:INFO: getVolumeSummary: got rsp status error (0)
> Nov 24 12:21:08 dogfood last message repeated 14 times
> Nov 24 12:21:27 mktg3 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc: Error sending rpc to clusterrpc, flags 820a, name , rc -19, retrying...
> Nov 24 12:21:27 mktg3 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc: Retry worked to clusterrpc, flags 8e02, name
> Nov 24 12:21:30 dogfood : 0:0:snmpd:NOTICE: getEnvInfo: Failed to get PS/Fan info - rc=0
> Nov 24 12:21:37 mktg3 : 0:0:snmpd:NOTICE: getEnvInfo: Failed to get PS/Fan info - rc=0
> Nov 24 12:22:08 dogfood : 0:0:snmpd:NOTICE: getEnvInfo: Failed to get PS/Fan info - rc=0
> Nov 24 12:22:10 mktg3 : 0:0:snmpd:NOTICE: getEnvInfo: Failed to get PS/Fan info - rc=0
> Nov 24 12:22:38 dogfood last message repeated 2 times
> Nov 24 12:22:38 dogfood : 0:0:cluster2:ERROR: main: rcv ncm msg
> Nov 24 12:22:38 dogfood : 0:0:evm:INFO: evm_closeSess : NCM session closed
> Nov 24 12:22:38 dogfood : 0:0:evm:INFO: evm_rcvRmcMsg : NCM session open success
> Nov 24 12:22:38 dogfood : 0:0:sdm:INFO: sdm_rcvRmcMsg : RMC open successful (0)
> 
> -----Original Message-----
> From: Timothy Swenson 
> Sent: Tuesday, November 25, 2008 4:17 PM
> To: Ed Kwan
> Cc: John Rogers
> Subject: FW: ** PROBLEM alert - troll/check_syslog is CRITICAL **
> 
> Can someone give John a hand with DogFood?  Has anyone looked at the
> ea core to see if this is a known issue?
> 
> Thanks,
> 
> Tim
>  
> 
> -----Original Message-----
> From: John Rogers 
> Sent: Tuesday, November 25, 2008 3:23 PM
> To: Sandrine Boulanger; 'nagios@onstor.com'; dl-mightydog-alert
> Subject: Re: ** PROBLEM alert - troll/check_syslog is CRITICAL **
> 
> I did that earlier and it's happening again. Did anyone look at the
> new core?
> 
> 
> 
> ----- Original Message -----
> From: Sandrine Boulanger
> To: John Rogers; 'nagios@onstor.com' <nagios@onstor.com>; dl-mightydog-alert
> Sent: Tue Nov 25 15:21:39 2008
> Subject: RE: ** PROBLEM alert - troll/check_syslog is CRITICAL **
> 
> Let's try to kill EA with SIGBUS this time (-7), as Danqing put in
> the notes.
> 
> -----Original Message-----
> From: John Rogers
> Sent: Tuesday, November 25, 2008 2:07 PM
> To: 'nagios@onstor.com'; dl-mightydog-alert
> Subject: Re: ** PROBLEM alert - troll/check_syslog is CRITICAL **
> 
> Another EA issue.
> 
> ----- Original Message -----
> From: nagios@onstor.com <nagios@onstor.com>
> To: dl-mightydog-alert
> Sent: Tue Nov 25 13:36:36 2008
> Subject: ** PROBLEM alert - troll/check_syslog is CRITICAL **
> 
> ***** Nagios  *****
> 
> Notification Type: PROBLEM
> 
> Service: check_syslog
> Host: troll
> Address: 10.0.0.244
> State: CRITICAL
> 
> Date/Time: Tue Nov 25 13:36:36 PST 2008
> 
> Additional Info:
> 
> CRITICAL - (130 errors in read_elog.protocol-2008-11-24-08-08-09) -
> Nov 24 08:08:08 dogfood : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[s_eng_old], rc[8]  ...
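The check_syslog alert above reports a count of errors found in a log file. As a minimal sketch, assuming every daemon entry follows the `<timestamp> <host> : 0:0:<daemon>:<LEVEL>: <message>` shape seen in the quoted dogfood and mktg3 logs, entries can be tallied per host, daemon, and severity to see which daemon (ea, cluster2, snmpd, ...) is generating the noise. This is not the actual check_syslog plugin; the regex, `tally`, and `SAMPLE` are hypothetical, and "last message repeated N times" lines are simply skipped because they carry no daemon or level.

```python
import re
from collections import Counter

# Hypothetical pattern inferred from the quoted logs, e.g.
#   "Nov 24 12:22:38 dogfood : 0:0:cluster2:ERROR: main: rcv ncm msg"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) : "
    r"\d+:\d+:(?P<daemon>[^:]+):(?P<level>[A-Z]+):\s*(?P<msg>.*)$"
)

# A few lines copied from the thread, for illustration.
SAMPLE = [
    "Nov 24 12:21:27 mktg3 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc: "
    "Retry worked to clusterrpc, flags 8e02, name",
    "Nov 24 12:22:38 dogfood last message repeated 2 times",
    "Nov 24 12:22:38 dogfood : 0:0:cluster2:ERROR: main: rcv ncm msg",
    "Nov 24 08:08:08 dogfood : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]: "
    "Failed to get info for volume[s_eng_old], rc[8]",
]


def tally(lines):
    """Count (host, daemon, level) triples for lines matching LINE_RE."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # non-matching lines (e.g. "last message repeated") are skipped
            counts[(m["host"], m["daemon"], m["level"])] += 1
    return counts


if __name__ == "__main__":
    counts = tally(SAMPLE)
    # Report only ERROR-level entries, most frequent first.
    for (host, daemon, level), n in counts.most_common():
        if level == "ERROR":
            print(f"{host} {daemon}: {n} error(s)")
```

Grouping by daemon rather than counting raw lines makes it easier to tell whether a burst like the 130 errors above comes from EA volume lookups or from cluster2.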
