X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C85E15.3AE902EC@onstor-exch02.onstor.net>; Wed, 23 Jan 2008 16:11:05 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C85E15.3AE902EC"
Content-class: urn:content-classes:message
Subject: RE: cluster DB corruption?
Date: Wed, 23 Jan 2008 16:11:05 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E03E9A565@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E03E9A564@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: cluster DB corruption?
Thread-Index: AcheDDr9iyS4rQaaSvimYhPi53zm6AAAD2YwAAIGjbA=
From: "Chris Vandever" <chris.vandever@onstor.com>
To: "Shin Irie" <shin.irie@onstor.com>,
	"dl-cstech" <dl-cstech@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C85E15.3AE902EC
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

There's a shareName record for an NFS share named "vol_mgmt_1491/" which
doesn't have the shareNfs and shareInfo records that should be
associated with it.  I'll send instructions on how to delete it.

ChrisV

_____________________________________________
From: Chris Vandever
Sent: Wednesday, January 23, 2008 2:12 PM
To: Shin Irie; dl-cstech
Subject: RE: cluster DB corruption?

I will check the clusDb and elogs in the zipped file, but in the
meantime these messages:

	Jan 23 12:05:53 bobcat1 : 0:0:cluster2:ERROR: sig_timer: contrl rpc timeout, restarting controller
	Jan 23 12:05:53 bobcat1 : 0:0:pm:ERROR: pm_sig_handler: /usr/local/agile/bin/cluster_contrl (pid 30290) exited with status 0

indicate a known rmc problem that causes cluster_contrl to exit.  The
clustering errors that follow occur because clustering is restarting.

ChrisV

_____________________________________________
From: Shin Irie
Sent: Wednesday, January 23, 2008 2:07 PM
To: dl-cstech
Subject: cluster DB corruption?

Hi,

I have a customer whose Bobcat takes a long time to complete nfx commands.
Also, they cannot create a share for the management volume; it fails with
the message "the share already exist", so system get all cannot be copied.
I only have /var/agile/messages (elog) now.  The Bobcat is a single-node
system running R3.1.0.7.
 << File: elog_clusdb.zip >>
The following message is being logged many times.  See the attached zip
file for the elog and cluster DB.

	Jan 23 12:04:25 bobcat1 : 0:0:auth_agent:WARNING: nisd for vs 5 exited. Restarting it

These messages started around Jan 23 12:04 (see below).  Several cluster
error messages are also logged.  The system admins were configuring the
Bobcat from the CLI and the Web UI at the same time.
Is this cluster DB corruption?  How can I recover from this?

	Jan 23 12:04:23 bobcat1 : 1: cmd[0]: vsvr set SNIPER : status[0]
	Jan 23 12:04:24 bobcat1 : 0:0:eventd:NOTICE: Process-EVENT Volume: name 'snipe-vol01', Id 0x000005d30000006a, Event 'Online', was offline for roughly 799 sec.
	Jan 23 12:04:24 bobcat1 : 0:0:eventd:NOTICE: Process-EVENT IP i/f: IP 192.167.5.1, Port bp0, State Up
	Jan 23 12:04:24 bobcat1 : 0:0:eventd:NOTICE: Process-EVENT IP i/f: IP 192.167.5.2, Port bp0, State Up
	Jan 23 12:04:25 bobcat1 : 0:0:auth_agent:WARNING: nisd for vs 5 exited. Restarting it
	Jan 23 12:04:57 bobcat1 last message repeated 18 times
	Jan 23 12:05:52 bobcat1 last message repeated 32 times
	Jan 23 12:05:53 bobcat1 : 0:0:cluster2:ERROR: sig_timer: contrl rpc timeout, restarting controller
	Jan 23 12:05:53 bobcat1 : 0:0:pm:ERROR: pm_sig_handler: /usr/local/agile/bin/cluster_contrl (pid 30290) exited with status 0
	Jan 23 12:05:54 bobcat1 : 0:0:auth_agent:WARNING: nisd for vs 5 exited. Restarting it
	Jan 23 12:06:03 bobcat1 last message repeated 5 times
	Jan 23 12:06:03 bobcat1 : 0:0:cluster2:ERROR: cluster_getRecordIdByKey: no reply bck -1
	Jan 23 12:06:03 bobcat1 : 0:0:cluster2:ERROR: cluster_getFilerNameList: cannot get cluster rec, rcode 30
	Jan 23 12:06:03 bobcat1 : 0:0:nfxsh:NOTICE: cmd[9]: vsvr show all : status[11]
	Jan 23 12:06:04 bobcat1 : 0:0:cluster2:ERROR: cluster_atomicUpdateRecord: no reply bck -1
	Jan 23 12:06:04 bobcat1 : 0:0:cluster2:ERROR: cluster_releaseLock[3956]: Unable to update lock recId 12800, code 30
	Jan 23 12:06:04 bobcat1 : 0:0:cluster2:ERROR: cluster_releaseGnsLock[2081]: Can't release GNS read lock, recId 12800, code 30


--
Irie


------_=_NextPart_001_01C85E15.3AE902EC--
