AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20081111101517.0da7a7b0@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:exch1.onstor.net
NSV:
SSH:
R:<sandrine.boulanger@onstor.com>,<john.rogers@onstor.com>,<dl-CougarCore@onstor.com>,<dl-mightydog-alert@onstor.com>,<ed.kwan@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	2779531E7C760D4491C96305019FEEB5175D5BE228@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Tue, 11 Nov 2008 10:15:57 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Sandrine Boulanger <sandrine.boulanger@onstor.com>
Cc: John Rogers <john.rogers@onstor.com>, dl-Cougar Core Team
 <dl-CougarCore@onstor.com>, dl-mightydog-alert
 <dl-mightydog-alert@onstor.com>, Ed Kwan <ed.kwan@onstor.com>
Subject: Re: Status of  R4.0.1.0 Submittal 17 on Cougar soak
Message-ID: <20081111101557.7218e512@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB5175D5BE228@exch1.onstor.net>
References: <2779531E7C760D4491C96305019FEEB5175D5BE225@exch1.onstor.net>
	<2779531E7C760D4491C96305019FEEB5175D5BE228@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I don't know if it's a coincidence, but 6:25am is log rolling time.  As
for what might be causing the errors in the first place, I'm not the
person to ask.  Max or Chris might know more.
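
If anyone wants to check the timing rather than eyeball it, here is a
rough sketch in Python. It is only an illustration: the function names,
the assumed year (syslog stamps carry none) and the 06:25 roll time are
my assumptions, and the sample lines are copied from the log quoted
below.

```python
import re
from datetime import datetime, time

# Syslog-style prefix as seen in the quoted log: "Nov 11 06:24:15 g2r8 : ..."
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def last_error_time(lines, year=2008):
    """Return the datetime of the last line containing ':ERROR:', or None."""
    last = None
    for line in lines:
        m = STAMP.match(line)
        if m and ":ERROR:" in line:
            # The year is an assumption; the log stamps do not include one.
            last = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return last


def minutes_before_roll(stamp, roll=time(6, 25)):
    """Minutes from a timestamp to the same day's roll time (negative if after)."""
    roll_dt = datetime.combine(stamp.date(), roll)
    return (roll_dt - stamp).total_seconds() / 60


# Three lines taken from the quoted g2r8 log, one entry per string.
sample = [
    "Nov 11 05:30:31 g2r8 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc: "
    "Error sending rpc to clusterrpc, flags 820a, name nfxsh-13930, rc -19, retrying...",
    "Nov 11 06:24:15 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1",
    "Nov 11 06:24:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: "
    "cannot get cluster rec, code 30",
]

last = last_error_time(sample)
print(last, minutes_before_roll(last))
```

On those sample lines the last ERROR lands 0.75 minutes before 06:25,
which is at least consistent with the log roll idea, though it proves
nothing about the cause.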

On Tue, 11 Nov 2008 10:08:54 -0800 Sandrine Boulanger
<sandrine.boulanger@onstor.com> wrote:

> I just noticed that all the errors stopped around 6:30 on all nodes,
> so I'm not sure what triggered and stopped this behavior. Right now
> there are no errors.
> 
> ________________________________
> From: Sandrine Boulanger
> Sent: Tuesday, November 11, 2008 9:45 AM
> To: Sandrine Boulanger; John Rogers; dl-Cougar Core Team;
> dl-mightydog-alert
> Cc: Ed Kwan
> Subject: RE: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> We made progress in the sense that there are no hung exim processes
> this morning, and all nodes except g11r10 have 0 frozen messages.
> 
> g11r10:/var/log/onstor# exiqgrep -z -c
> 29 matches out of 37 messages
> 
> g2r8:/var/log/onstor# ps ax | grep exim
>  1318 ?        Ss     0:00 /usr/sbin/exim4 -bd -q30m
>  9709 pts/0    R+     0:00 grep exim
> g2r8:/var/log/onstor#
> 
> But all nodes are showing errors after 3 hours, like below. Andy, is
> there anything we can look at in terms of resources that would
> explain those errors?
> 
> g2r8:/var/log/onstor# tail -f messages |grep -i error
> 
> Nov 10 20:00:32 g2r8 : 0:0:evm:ERROR: evm_allocLunReq: Allocate LUN
> failed for volume[g2r8-vs1-vol2], rc[9] Nov 10 20:00:32 g2r8 :
> 0:0:evm:ERROR: evm_growVol: Error allocating LUN's for
> volume[g2r8-vs1-vol2] Nov 10 20:00:32 g2r8 : 0:0:evm:ERROR:
> evm_fsysFullReqProc: Src volume grow failed for
> filesystem[g2r8-vs1-vol2] Nov 10 20:01:16 g2r8 : 1:5:efs:ERROR: 916:
> FS: g2r8-vs1-vol2-m 0x1079000000130 - amDeltaInstallSetup - mirror -
> FS AM TARGET: no snapshot to map to live (last valid snap id 10);
> baseline will be needed Nov 10 20:01:16 g2r8 : 1:3:efs:ERROR: 917:
> FS: g2r8-vs1-vol2-m 0x1079000000130 - amDeltaInstallSetup - mirror -
> FS AM TARGET: Invalidate cache failed with fs_status 49 Nov 10
> 20:01:16 g2r8 : 1:3:sanm_ag:WARNING: 918:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v2] delta install
> setup rsp error 49. Nov 10 20:01:21 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v2] Nov 10 20:01:49 g2r8 :
> 1:3:efs:ERROR: 973: FS: g2r8-vs1-vol1-m 0x107900000012f -
> amDeltaInstallSetup - mirror - FS AM TARGET: no snapshot to map to
> live (last valid snap id 9); baseline will be needed Nov 10 20:01:49
> g2r8 : 1:2:efs:ERROR: 974: FS: g2r8-vs1-vol1-m 0x107900000012f -
> amDeltaInstallSetup - mirror - FS AM TARGET: Invalidate cache failed
> with fs_status 49 Nov 10 20:01:49 g2r8 : 1:2:sanm_ag:WARNING: 975:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v1] delta install
> setup rsp error 49. Nov 10 20:02:05 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v1] Nov 10 22:00:54 g2r8 :
> 1:3:efs:ERROR: 1465: FS: g2r8-vs1-vol1-m 0x107900000012f -
> amDeltaInstallSetup - mirror - FS AM TARGET: no snapshot to map to
> live (last valid snap id 9); baseline will be needed Nov 10 22:00:54
> g2r8 : 1:3:efs:ERROR: 1466: FS: g2r8-vs1-vol1-m 0x107900000012f -
> amDeltaInstallSetup - mirror - FS AM TARGET: Invalidate cache failed
> with fs_status 49 Nov 10 22:00:54 g2r8 : 1:3:sanm_ag:WARNING: 1467:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v1] delta install
> setup rsp error 49. Nov 10 22:01:05 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v1] Nov 10 22:35:11 g2r8 :
> 0:0:evm:ERROR: evm_allocLunReq: Allocate LUN failed for
> volume[g2r8-vs1-vol2], rc[9] Nov 10 22:35:11 g2r8 : 0:0:evm:ERROR:
> evm_growVol: Error allocating LUN's for volume[g2r8-vs1-vol2] Nov 10
> 22:35:11 g2r8 : 0:0:evm:ERROR: evm_fsysFullReqProc: Src volume grow
> failed for filesystem[g2r8-vs1-vol2] Nov 10 23:01:03 g2r8 :
> 1:2:efs:ERROR: 1748: FS: g2r8-vs1-vol2-m 0x1079000000130 -
> amDeltaInstallSetup - mirror - FS AM TARGET: no snapshot to map to
> live (last valid snap id 10); baseline will be needed Nov 10 23:01:03
> g2r8 : 1:5:efs:ERROR: 1749: FS: g2r8-vs1-vol2-m 0x1079000000130 -
> amDeltaInstallSetup - mirror - FS AM TARGET: Invalidate cache failed
> with fs_status 49 Nov 10 23:01:03 g2r8 : 1:5:sanm_ag:WARNING: 1750:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v2] delta install
> setup rsp error 49. Nov 10 23:01:06 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v2] Nov 10 23:06:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 10
> 23:06:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 10 23:10:27 g2r8 : 0:0:auth_agent:ERROR:
> authen_ldapVsGetDomain[5117]: Cannot get LDAP domain info for VS=11.
> Failed to find NISD. Nov 10 23:10:39 g2r8 : 0:0:auth_agent:ERROR:
> authen_ldapVsGetDomain[5127]: Cannot get LDAP domain info for VS=13.
> Not associated with an LDAP domain. Nov 10 23:12:21 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 10
> 23:12:21 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 10 23:18:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 10 23:18:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 10 23:30:14 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 10 23:30:14 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 10 23:30:17 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 10 23:30:17 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 10 23:36:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 10 23:36:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 10 23:48:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 10 23:48:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 10 23:54:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 10 23:54:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 00:06:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 00:06:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 00:12:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 00:12:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 00:18:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 00:18:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 00:30:18 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 00:30:18 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 00:30:31 g2r8 : 0:0:cluster2:INFO:
> cluster_clientSendRmcRpc: Error sending rpc to clusterrpc, flags
> 820a, name nfxsh-6609, rc -19, retrying... Nov 11 00:36:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 00:36:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 00:42:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 00:42:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 00:48:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 00:48:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 00:54:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 00:54:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:00:17 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:00:17 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:06:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:06:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:14:29 g2r8 : 0:0:evm:ERROR: evm_allocLunReq:
> Allocate LUN failed for volume[g2r8-vs1-vol2], rc[9] Nov 11 01:14:29
> g2r8 : 0:0:evm:ERROR: evm_growVol: Error allocating LUN's for
> volume[g2r8-vs1-vol2] Nov 11 01:14:30 g2r8 : 0:0:evm:ERROR:
> evm_fsysFullReqProc: Src volume grow failed for
> filesystem[g2r8-vs1-vol2] Nov 11 01:18:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:18:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:24:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:24:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:30:14 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:30:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:30:18 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:30:18 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:36:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:36:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:48:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:48:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 01:54:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 01:54:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 02:00:20 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 02:00:20 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 02:02:14 g2r8 : 1:2:efs:ERROR: 2325: FS:
> g2r8-vs1-vol1-m 0x107900000012f - amDeltaInstallSetup - mirror - FS
> AM TARGET: no snapshot to map to live (last valid snap id 9);
> baseline will be needed Nov 11 02:02:14 g2r8 : 1:5:efs:ERROR: 2326:
> FS: g2r8-vs1-vol1-m 0x107900000012f - amDeltaInstallSetup - mirror -
> FS AM TARGET: Invalidate cache failed with fs_status 49 Nov 11
> 02:02:14 g2r8 : 1:5:sanm_ag:WARNING: 2327:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v1] delta install
> setup rsp error 49. Nov 11 02:02:24 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v1] Nov 11 02:06:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 02:06:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 02:12:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 02:12:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 02:18:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 02:18:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 02:24:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 02:24:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 02:30:19 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 02:30:19 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 02:30:32 g2r8 : 0:0:cluster2:INFO:
> cluster_clientSendRmcRpc: Error sending rpc to clusterrpc, flags
> 820a, name nfxsh-2614, rc -19, retrying... Nov 11 02:42:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 02:42:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 02:54:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 02:54:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 03:18:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 03:18:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 03:24:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 03:24:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 03:30:19 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 03:30:19 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 03:30:31 g2r8 : 0:0:cluster2:INFO:
> cluster_clientSendRmcRpc: Error sending rpc to clusterrpc, flags
> 820a, name nfxsh-17336, rc -19, retrying... Nov 11 03:36:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 03:36:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 03:42:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 03:42:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 03:48:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 03:48:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 04:00:19 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 04:00:19 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 04:01:38 g2r8 : 1:4:efs:ERROR: 2862: FS:
> g2r8-vs1-vol1-m 0x107900000012f - amDeltaInstallSetup - mirror - FS
> AM TARGET: no snapshot to map to live (last valid snap id 9);
> baseline will be needed Nov 11 04:01:38 g2r8 : 1:4:efs:ERROR: 2863:
> FS: g2r8-vs1-vol1-m 0x107900000012f - amDeltaInstallSetup - mirror -
> FS AM TARGET: Invalidate cache failed with fs_status 49 Nov 11
> 04:01:38 g2r8 : 1:4:sanm_ag:WARNING: 2864:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v1] delta install
> setup rsp error 49. Nov 11 04:01:48 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v1] Nov 11 04:04:31 g2r8 :
> 0:0:evm:ERROR: evm_allocLunReq: Allocate LUN failed for
> volume[g2r8-vs1-vol1], rc[9] Nov 11 04:04:31 g2r8 : 0:0:evm:ERROR:
> evm_growVol: Error allocating LUN's for volume[g2r8-vs1-vol1] Nov 11
> 04:04:31 g2r8 : 0:0:evm:ERROR: evm_fsysFullReqProc: Src volume grow
> failed for filesystem[g2r8-vs1-vol1] Nov 11 04:06:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 04:06:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 04:18:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 04:18:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 04:24:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 04:24:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 04:30:20 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 04:30:20 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 04:30:32 g2r8 : 0:0:cluster2:INFO:
> cluster_clientSendRmcRpc: Error sending rpc to clusterrpc, flags
> 820a, name nfxsh-31856, rc -19, retrying... Nov 11 04:33:20 g2r8 :
> 0:0:evm:ERROR: evm_allocLunReq: Allocate LUN failed for
> volume[g2r8-vs1-vol2], rc[9] Nov 11 04:33:20 g2r8 : 0:0:evm:ERROR:
> evm_growVol: Error allocating LUN's for volume[g2r8-vs1-vol2] Nov 11
> 04:33:20 g2r8 : 0:0:evm:ERROR: evm_fsysFullReqProc: Src volume grow
> failed for filesystem[g2r8-vs1-vol2] Nov 11 04:36:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 04:36:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 04:42:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 04:42:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 04:48:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 04:48:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 04:54:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 04:54:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 05:00:33 g2r8 : 1:4:efs:ERROR: 3133: FS:
> g2r8-vs1-vol2-m 0x1079000000130 - amDeltaInstallSetup - mirror - FS
> AM TARGET: no snapshot to map to live (last valid snap id 10);
> baseline will be needed Nov 11 05:00:33 g2r8 : 1:2:efs:ERROR: 3134:
> FS: g2r8-vs1-vol2-m 0x1079000000130 - amDeltaInstallSetup - mirror -
> FS AM TARGET: Invalidate cache failed with fs_status 49 Nov 11
> 05:00:33 g2r8 : 1:2:sanm_ag:WARNING: 3135:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v2] delta install
> setup rsp error 49. Nov 11 05:00:43 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v2] Nov 11 05:06:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 05:06:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 05:12:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 05:12:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 05:18:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 05:18:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 05:30:19 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 05:30:19 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 05:30:31 g2r8 : 0:0:cluster2:INFO:
> cluster_clientSendRmcRpc: Error sending rpc to clusterrpc, flags
> 820a, name nfxsh-13930, rc -19, retrying... Nov 11 05:36:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 05:36:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 05:42:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 05:42:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 05:48:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 05:48:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 05:54:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 05:54:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 06:00:19 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 06:00:19 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 06:01:04 g2r8 : 1:3:efs:ERROR: 3416: FS:
> g2r8-vs1-vol1-m 0x107900000012f - amDeltaInstallSetup - mirror - FS
> AM TARGET: no snapshot to map to live (last valid snap id 9);
> baseline will be needed Nov 11 06:01:04 g2r8 : 1:4:efs:ERROR: 3417:
> FS: g2r8-vs1-vol1-m 0x107900000012f - amDeltaInstallSetup - mirror -
> FS AM TARGET: Invalidate cache failed with fs_status 49 Nov 11
> 06:01:05 g2r8 : 1:4:sanm_ag:WARNING: 3418:
> sanm_agProcDeltaInstallSetupRsp: mirror [g2r8-v1v1] delta install
> setup rsp error 49. Nov 11 06:01:13 g2r8 : 0:0:sanm:ERROR: SANM:
> Aborting session for mirror[g2r8-v1v1] Nov 11 06:06:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1 Nov 11
> 06:06:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot
> get cluster rec, code 30 Nov 11 06:18:16 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 06:18:16 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30 Nov 11 06:24:15 g2r8 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 Nov 11 06:24:15 g2r8 :
> 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec,
> code 30
> 
> [1]+  Stopped                 tail -f messages | grep -i error
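
Since the log above got re-wrapped in the mail, a tally per component is
easier to read than the raw dump. A minimal sketch in Python, assuming
each entry starts with a "Mon DD HH:MM:SS host : " stamp and that the
component and level follow two numeric fields, as they appear to in the
lines above. The function names are mine, and the sample is just the
first three entries of the quoted log.

```python
import re
from collections import Counter

# Zero-width match in front of every "Nov 10 20:00:32 g2r8 : " style stamp.
ENTRY_START = re.compile(r"(?=[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ : )")
# After the host: "<number>:<number>:<component>:<LEVEL>:" (field meaning assumed).
FIELDS = re.compile(r" : \d+:\d+:(\w+):([A-Z]+):")


def split_entries(quoted_text):
    """Strip '> ' quote markers, undo the mail wrapping, return one string per entry."""
    words = [
        word
        for line in quoted_text.splitlines()
        for word in line.lstrip("> ").split()
    ]
    flat = " ".join(words)
    return [entry.strip() for entry in ENTRY_START.split(flat) if entry.strip()]


def count_by_component(entries):
    """Count entries per (component, level); entries without the fields are skipped."""
    counts = Counter()
    for entry in entries:
        m = FIELDS.search(entry)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


SAMPLE = """
> Nov 10 20:00:32 g2r8 : 0:0:evm:ERROR: evm_allocLunReq: Allocate LUN
> failed for volume[g2r8-vs1-vol2], rc[9] Nov 10 20:00:32 g2r8 :
> 0:0:evm:ERROR: evm_growVol: Error allocating LUN's for
> volume[g2r8-vs1-vol2] Nov 10 20:00:32 g2r8 : 0:0:evm:ERROR:
> evm_fsysFullReqProc: Src volume grow failed for
> filesystem[g2r8-vs1-vol2]
"""

entries = split_entries(SAMPLE)
print(count_by_component(entries))
```

The excerpt splits into three evm ERROR entries. Run over the whole
dump, the same tally would show how much of the noise is cluster2
versus evm, efs and sanm.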
> 
> ________________________________
> From: Sandrine Boulanger
> Sent: Monday, November 10, 2008 5:57 PM
> To: Sandrine Boulanger; John Rogers; dl-Cougar Core Team;
> dl-mightydog-alert
> Cc: Ed Kwan
> Subject: RE: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> A few changes later...
> A new change has been installed on all 4 nodes of the Cougar soak
> today. After rebooting them all (about an hour ago), so far they are
> behaving. No exim process hung, autosupport messages are sent, and no
> cluster2 errors so far (crossing fingers). We'll let this run
> overnight and I'll send an update tomorrow morning.
> 
> ________________________________
> From: Sandrine Boulanger
> Sent: Saturday, November 08, 2008 11:38 AM
> To: Sandrine Boulanger; John Rogers; dl-Cougar Core Team;
> dl-mightydog-alert
> Cc: Ed Kwan
> Subject: RE: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> This morning, 3 out of 4 nodes have many exim4 processes hung, and
> one of them is getting "mta queue full" and is no longer sending
> autosupport emails. I just updated the /etc/hosts file of each node
> to
> 127.0.0.1 localhost
> <sc0 ip> nodename nodename.sc0
> as Andy recommended. I'm waiting for instructions to proceed further.
> 
> 
> ________________________________
> From: Sandrine Boulanger
> Sent: Friday, November 07, 2008 5:06 PM
> To: Sandrine Boulanger; John Rogers; dl-Cougar Core Team;
> dl-mightydog-alert
> Cc: Ed Kwan
> Subject: RE: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> 2 out of 4 nodes have been rebooted since we tested crashdump panic
> on those nodes. Looking at elogs, the cluster errors were gone. The
> latest exim package was installed after sub#17, and installing a
> package does not require a reboot. However, Andy suspects there could
> have been something leftover so I also rebooted the other 2 nodes.
> I'll keep monitoring the 4 nodes.
> 
> PS: Raj, since g12r10 does not see any luns, it kept complaining
> about the core and mgmt volumes. I force deleted those to clear the
> elog and be able to monitor things more easily. When we figure out
> why sp2.0 is down on this node, we'll need to re-create them.
> 
> ________________________________
> From: Sandrine Boulanger
> Sent: Friday, November 07, 2008 1:39 PM
> To: John Rogers; dl-Cougar Core Team; dl-mightydog-alert
> Subject: RE: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> The FP crash can be ignored; Raj had run a "crashdump panic" to test core
> generation since on MD it took too long on one node. Core generation
> works fine on Cougar soak, and it worked on MD too on mktg3.
> 
> ________________________________
> From: John Rogers
> Sent: Friday, November 07, 2008 1:05 PM
> To: Sandrine Boulanger; dl-Cougar Core Team; dl-mightydog-alert
> Subject: Re: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> 
> Fantastic news!
> 
> ________________________________
> From: Sandrine Boulanger
> To: Sandrine Boulanger; dl-Cougar Core Team
> Sent: Fri Nov 07 12:26:19 2008
> Subject: RE: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> It looks like we reproduced behavior similar to MD's on Cougar soak,
> which is running sub#17 and the latest exim4 package.
> 
> On g2r8 - There was an FP crash this morning. One of the CPUs had
> autoreboot off, so it did not restart by itself and I rebooted it.
> I'll see what I can get from the core.
> 
> Nov  7 10:06:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 10:12:15 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 10:12:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 10:18:15 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 10:18:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 10:24:15 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 10:24:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 10:30:19 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 10:30:19 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 10:30:31 g2r8 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc:
> Error sending rpc to clusterrpc, flags 820a, name nfxsh-19988, rc
> -19, retrying...
> 
> Nov  7 10:42:15 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 10:42:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 10:48:15 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 10:48:15 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 11:00:16 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 11:00:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 11:06:16 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 11:06:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 11:12:16 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 11:12:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 11:18:16 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 11:18:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 11:30:13 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 11:30:13 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 11:30:16 g2r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 11:30:16 g2r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 11:41:25 g2r8 : 0:0:sanm:ERROR: SANM: FP NIM down. Aborting
> all mirror sessions.
> 
> Nov  7 11:41:25 g2r8 : 0:0:sanm:ERROR: SANM: FP NIM down. Aborting
> all mirror sessions.
> 
> On g1r8
> 
> Nov  6 16:30:19 g1r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  6 16:30:19 g1r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  6 16:30:31 g1r8 : 0:0:cluster2:INFO: cluster_clientSendRmcRpc:
> Error sending rpc to clusterrpc, flags 820a, name nfxsh-12633, rc
> -19, retrying...
> 
> Nov  6 16:31:11 g1r8 : 0:0:snmpd:INFO: getVolumeDetail: got bad rsp
> error (type=8315 volId=0)
> 
> Nov  6 16:31:11 g1r8 : 0:0:snmpd:INFO: getVolumeDetail: got bad rsp
> error (type=8315 volId=0)
> 
> Nov  6 16:31:11 g1r8 : 0:0:snmpd:INFO: getVolumeDetail: got bad rsp
> error (type=8315 volId=0)
> 
> ...
> 
> Nov  7 12:06:16 g1r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 12:06:16 g1r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> Nov  7 12:18:16 g1r8 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey:
> no reply bck -1
> 
> Nov  7 12:18:16 g1r8 : 0:0:cluster2:ERROR: cluster_getFilerNameList:
> cannot get cluster rec, code 30
> 
> G1r1 volume show is failing, likely because of those ea errors:
> 
> Nov  7 12:20:57 g11r10 : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[g1r8-vs1-vol1], rc[8]
> 
> Nov  7 12:21:07 g11r10 : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[g1r8-vs1-vol1], rc[8]
> 
> Nov  7 12:21:07 g11r10 : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[g1r8-vs1-vol1], rc[8]
> 
> Nov  7 12:21:07 g11r10 : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[g1r8-vs1-vol1], rc[8]
> 
> Nov  7 12:21:17 g11r10 : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[g1r8-vs1-vol1], rc[8]
> 
> Nov  7 12:21:17 g11r10 : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[g1r8-vs1-vol1], rc[8]
> 
> Nov  7 12:21:17 g11r10 : 0:0:ea:ERROR: ea_getRunTimeVolInfo[1881]:
> Failed to get info for volume[g1r8-vs1-vol1], rc[8]
> 
> _____________________________________________
> From: Sandrine Boulanger
> Sent: Thursday, November 06, 2008 5:53 PM
> To: Sandrine Boulanger; dl-Cougar Core Team
> Subject: RE: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> I got a new exim4 package from Andy which is now installed on all
> nodes in Cougar soak. We'll monitor the status of the queue and # of
> processes running. I'll send an update tomorrow.
> 
> _____________________________________________
> From: Sandrine Boulanger
> Sent: Thursday, November 06, 2008 3:35 PM
> To: dl-Cougar Core Team
> Subject: Status of R4.0.1.0 Submittal 17 on Cougar soak
> 
> Cougar soak has been upgraded to sub#17. We have increased the
> autosupport report schedule to every 2 minutes. G12r10 had a lot
> of frozen messages in the queue last night, but by this morning
> everything was cleared.
> 
> However, autosupport is no longer working on g11r10:
> 
> Nov  6 15:30:03 g11r10 : 0:0:asd:INFO: Rcvd Generate report request
> APP: (null)
> 
> Nov  6 15:30:03 g11r10 : 0:0:asd:ERROR: mta mail queue full
> 
> g11r10 diag> autosupport generate report
> 
> Report not generated, error 0xffffffff.
> 
> % Command failure.
> 
> g11r10 diag> system show chassis
> 
>  module     cpu         state
> 
> ----------------------------------------------
> 
>  SSC        SSC         UP
> 
>  NFPNIM     TXRX0       UP
> 
>             TXRX1       UP
> 
>             FP0         UP
> 
>             FP1         UP
> 
>             FP2         UP
> 
>             FP3         UP
> 
> ----------------------------------------------
> 
> g11r10 diag> exit
> 
> g11r10:~# exiqgrep -z -c
> 
> 121 matches out of 121 messages
> 
> g11r10:~# exim4 -bpc
> 
> 121
> 
> g11r10:~# ps ax | grep exim
> 
>   953 ?        S      0:00 /usr/sbin/exim4 -q
> 
>   966 ?        S      0:02 /usr/sbin/exim4 -q
> 
>  1261 ?        Ss     0:00 /usr/sbin/exim4 -bd -q30m
> 
> 10474 pts/0    R+     0:00 grep exim
> 
> g11r10:~#
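
The frozen share of the queue can be read off the exiqgrep summary
line. A small sketch in Python, assuming the "N matches out of M
messages" wording shown above is stable; frozen_ratio is a made-up
helper name, not part of exim.

```python
import re

# Summary line printed by "exiqgrep -z -c" in the transcripts in this thread.
SUMMARY = re.compile(r"\s*(\d+) matches out of (\d+) messages\s*")


def frozen_ratio(summary_line):
    """Return (frozen, total, fraction) parsed from an exiqgrep -z -c summary line."""
    m = SUMMARY.fullmatch(summary_line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {summary_line!r}")
    frozen, total = int(m.group(1)), int(m.group(2))
    # An empty queue counts as 0.0 rather than dividing by zero.
    return frozen, total, (frozen / total if total else 0.0)


print(frozen_ratio("121 matches out of 121 messages"))
print(frozen_ratio("29 matches out of 37 messages"))
```

The two inputs are the g11r10 counts reported in this thread: every
message frozen on Nov 6, and 29 of 37 on Nov 11.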
> 
> _____________________________________________
> From: Larry Scheer
> Sent: Wednesday, November 05, 2008 3:47 PM
> To: dl-QA; dl-hcl-qa; dl-Cougar
> Subject: Build of R4.0.1.0 Submittal 17 is available for acceptance
> tests
> 
> Changes since last submittal
> 
> Branch r401rel
> 
> Change 31060 on 2008/11/05 by andys@ripper 'Integrate changelist
> 31059 from'
> 
> Change 31053 on 2008/11/04 by billn@billn-dev ' Change 31051 by
> billn@billn-de'
> 
> Defects fixed since last submittal
> 
> TED 25710 - [10206 - Onstor] Over 200 Exim processes running
> 
> TED 25761 - HP EVA4400, does not report paths as Primary/Failover
> even though TPGS is active
> 
> Location of images for submittal 17
> 
> R401rel build:
> 
> Source tree is here:
> 
> /n/Build-Trees/R4.0.1.0/EverON-4.0.1.0-110508-sub17
> 
> Images are here:
> 
> Cougar optimized:
> 
> http://10.2.0.21/upgrade/EverON-4.0.1.0CG.tar.gz
> 
> Cougar debug:
> 
> http://10.2.0.21/upgrade/EverON-4.0.1.0CGDBG.tar.gz
> 
