Date: Wed, 23 Jan 2008 15:43:23 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Chris Vandever" <chris.vandever@onstor.com>
Cc: Maxim Kozlovsky <maxim.kozlovsky@onstor.com>, Jonathan Goldick
 <jonathan.goldick@onstor.com>, Jobi Ariyamannil
 <jobi.ariyamannil@onstor.com>, Amit Bothra <amit.bothra@onstor.com>,
 Sandrine Boulanger <sandrine.boulanger@onstor.com>, Brian Baker
 <brian.baker@onstor.com>, Brian DeForest <brian.deforest@onstor.com>
Subject: mightydog coffee breaks and sightings
Message-ID: <20080123154323.6d99aa2d@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E03E9A565@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E03E9A564@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E03E9A565@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Sightings.

Yesterday during one of mightydog's coffee breaks, a file appeared in
the directory I was working in (home directory on mightydog) that I did
nothing to create. It had the name of "1493" and a completely strange
mode, causing rsync, which I use constantly when in the compile/edit
cycle, to choke.  I know it just "appeared" in relation to the mightydog
sleepage because I used rsync just before and just after.  It didn't
choke before, and did after.  I just deleted the file and went on about
my business, but the name seems strangely similar to this 1491, and,
well, maybe someone knows something or had a similar strange experience.
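
If it shows up again, here's the sort of thing I'd run to capture the
file's mode and metadata before deleting it, so we can compare notes
next time.  (Just a sketch: the name 1493 is from the incident above,
and the stand-in creation is only so the commands run anywhere.)

```shell
# Sketch: record a mystery file's mode/metadata before removing it.
# "1493" is the file name from the incident; create a stand-in if it
# isn't there, purely so this can be exercised anywhere.
f=./1493
[ -e "$f" ] || : > "$f"                   # stand-in for demonstration only
mode=$(ls -ld "$f" | awk '{print $1}')    # permission string, e.g. -rw-r--r--
echo "mode: $mode"
stat "$f"                                 # exact mode bits, inode, timestamps
rm -f "$f"                                # then get it out of rsync's way
```

Running rsync with -av --itemize-changes would also show exactly which
entry it chokes on.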

I think I have a screen log of when this happened, if anyone is
interested.  It wasn't Columbus Day yesterday, was it?

And what's up with mightydog "going to sleep" for minutes at a time
lately?
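
One cheap way to pin the sleep episodes down would be a heartbeat log:
write a timestamp every second and look for gaps.  A generic sketch
(nothing mightydog-specific; three iterations just to keep it short --
run it open-ended in a spare terminal for real):

```shell
# Heartbeat sketch: if the machine stalls, a gap shows up in the log.
hb=/tmp/heartbeat.log
for i in 1 2 3; do date +%s; sleep 1; done > "$hb"
# Flag any gap over 5 seconds between consecutive timestamps:
awk 'NR > 1 && $1 - prev > 5 { print "stall of", $1 - prev, "s before", $1 }
     { prev = $1 }' "$hb"
```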

Cheers,

a


On Wed, 23 Jan 2008 15:11:05 -0800 "Chris Vandever"
<chris.vandever@onstor.com> wrote:

> There's a shareName record for an NFS share named "vol_mgmt_1491/"
> which doesn't have the shareNfs and shareInfo records that should be
> associated with it.  I'll send instructions on how to delete it.
> 
> ChrisV
> 
> _____________________________________________
> From: Chris Vandever 
> Sent: Wednesday, January 23, 2008 2:12 PM
> To: Shin Irie; dl-cstech
> Subject: RE: cluster DB corruption?
> 
> I will check the clusDb and elogs in the zipped file, but in the
> meantime these messages:
> 
> 	Jan 23 12:05:53 bobcat1 : 0:0:cluster2:ERROR: sig_timer:
> contrl rpc timeout, restarting controller 
> 	Jan 23 12:05:53 bobcat1 : 0:0:pm:ERROR: pm_sig_handler:
> /usr/local/agile/bin/cluster_contrl (pid 30290) exited with status 0 
> 
> indicate a known rmc problem, resulting in cluster_contrl exiting.
> The clustering errors after that are because clustering is restarting.
> 
> ChrisV
> 
> _____________________________________________
> From: Shin Irie 
> Sent: Wednesday, January 23, 2008 2:07 PM
> To: dl-cstech
> Subject: cluster DB corruption?
> 
> Hi,
> 
> I have a customer whose Bobcat takes a long time to complete nfx
> commands.  Also, they cannot create a share for the management volume:
> it fails with the message "the share already exists", so the
> "system get all" output cannot be copied.  I only have
> /var/agile/messages (elog) for now.  The Bobcat is a single-node
> system running R3.1.0.7.
>  << File: elog_clusdb.zip >> 
> The following message is being logged many times.  See the attached
> zip file for the elog and cluster DB.
> 
> 	Jan 23 12:04:25 bobcat1 : 0:0:auth_agent:WARNING: nisd for vs
> 5 exited. Restarting it 
> 
> These messages started around Jan 23 12:04 (see below).  Several
> cluster error messages are also logged.  The system admins were
> configuring the Bobcat from CLI and Web UI at the same time.
> Is this cluster DB corruption?  How can I recover this?
> 
> 	Jan 23 12:04:23 bobcat1 : 1: cmd[0]: vsvr set SNIPER :
> status[0]
> 	Jan 23 12:04:24 bobcat1 : 0:0:eventd:NOTICE: Process-EVENT
> Volume: name 'snipe-vol01', Id 0x000005d30000006a, Event 'Online', was
> offline for roughly 799 sec.
> 	Jan 23 12:04:24 bobcat1 : 0:0:eventd:NOTICE: Process-EVENT IP
> i/f: IP 192.167.5.1, Port bp0, State Up
> 	Jan 23 12:04:24 bobcat1 : 0:0:eventd:NOTICE: Process-EVENT IP
> i/f: IP 192.167.5.2, Port bp0, State Up
> 	Jan 23 12:04:25 bobcat1 : 0:0:auth_agent:WARNING: nisd for vs
> 5 exited. Restarting it 
> 	Jan 23 12:04:57 bobcat1 last message repeated 18 times
> 	Jan 23 12:05:52 bobcat1 last message repeated 32 times
> 	Jan 23 12:05:53 bobcat1 : 0:0:cluster2:ERROR: sig_timer:
> contrl rpc timeout, restarting controller 
> 	Jan 23 12:05:53 bobcat1 : 0:0:pm:ERROR: pm_sig_handler:
> /usr/local/agile/bin/cluster_contrl (pid 30290) exited with status 0 
> 	Jan 23 12:05:54 bobcat1 : 0:0:auth_agent:WARNING: nisd for vs
> 5 exited. Restarting it 
> 	Jan 23 12:06:03 bobcat1 last message repeated 5 times
> 	Jan 23 12:06:03 bobcat1 : 0:0:cluster2:ERROR:
> cluster_getRecordIdByKey: no reply bck -1 
> 	Jan 23 12:06:03 bobcat1 : 0:0:cluster2:ERROR:
> cluster_getFilerNameList: cannot get cluster rec, rcode 30 
> 	Jan 23 12:06:03 bobcat1 : 0:0:nfxsh:NOTICE: cmd[9]: vsvr show
> all : status[11]
> 	Jan 23 12:06:04 bobcat1 : 0:0:cluster2:ERROR:
> cluster_atomicUpdateRecord: no reply bck -1 
> 	Jan 23 12:06:04 bobcat1 : 0:0:cluster2:ERROR:
> cluster_releaseLock[3956]: Unable to update lock recId 12800, code 30 
> 	Jan 23 12:06:04 bobcat1 : 0:0:cluster2:ERROR:
> cluster_releaseGnsLock[2081]: Can't release GNS read lock, recId
> 12800, code 30 
> 
> 
> --
> Irie
> 

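P.S. For anyone sifting the elog, here's a generic awk sketch to total
the nisd restart warnings, folding in syslog's "last message repeated
N times" lines.  The path is assumed from Shin's mail; it falls back to
/dev/null so the sketch runs anywhere.

```shell
# Count nisd restart warnings, crediting "last message repeated N times"
# lines to the preceding nisd warning (as syslog intends).
ELOG=${ELOG:-/var/agile/messages}
[ -r "$ELOG" ] || ELOG=/dev/null
awk '
  /nisd for vs/           { n++; last = "nisd"; next }
  /last message repeated/ { if (last == "nisd") n += $(NF-1); next }
                          { last = "" }
  END { printf "nisd restart warnings: %d\n", n }
' "$ELOG"
```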