Date: Thu, 19 Jun 2008 12:03:52 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: chrisv <chris.vandever@onstor.com>
Cc: Jonathan Goldick <jonathan.goldick@onstor.com>
Subject: Re: PERFORCE change 29771 for review
Message-ID: <20080619120352.774fe60f@ripper.onstor.net>
References: <WEBMAILFjR2qsPoZ76200002c2a@mail.onstor.com>
Organization: Onstor
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Hi Guys, I'm hoping you'll indulge me in a couple of questions:

For (B), how do we protect ourselves against the following case? The
nodes lose connectivity to each other, and node 1 also loses
connectivity to storage and the data network while node 2 continues to
serve data to clients. All the volumes and vsvrs on node 1 are then
lost or deleted (the user does a sys-config-reset, etc.). When
connectivity is restored, node 1's newer DB is sent to node 2, and all
of that configuration is lost there too.


On 19 Jun 2008 11:10:01 -0700 chrisv <chris.vandever@onstor.com> wrote:

> Change 29771 by chrisv@chrisv-dev2 on 2008/06/19 11:07:08
> 
> 	Fix defect #24331 (Split-brain recovery: clustering
> 	may take minutes to agree on a PCC and clusDb
> 	version):
> 

A)

> 	- Do not send a node down event for the PCC when
> 	  we lose quorum.  WE may have been the PCC.

B)

> 	- Accept a new clusDb if it is newer than ours
> 	  and is from a member of the cluster, even if
> 	  it is not from the PCC.
> 	- When receiving a clusDb make sure the label is
> 	  invalid for the duration of the transfer.  We
> 	  do not want to act upon or propagate an
> 	  incomplete clusDb.
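
Here is a minimal sketch of (B) as I read it, to make the question at the top concrete. The struct, field, and function names are hypothetical. They are not what's in clusdb-tools.c or the ubik code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a node's clusDb state; illustrative names only,
 * not the real clusdb-tools.c or ubik structures. */
struct clusdb_state {
    uint32_t version;     /* version of the clusDb we currently hold */
    bool     label_valid; /* false while a new clusDb is being received */
};

/* (B) Accept an incoming clusDb if it is newer than ours and comes from
 * a cluster member; the sender does not have to be the PCC. */
static bool clusdb_should_accept(const struct clusdb_state *s,
                                 uint32_t incoming_version,
                                 bool sender_is_member)
{
    return sender_is_member && incoming_version > s->version;
}

/* Start of a transfer: invalidate the label so we neither act on nor
 * propagate a partially received clusDb. */
static void clusdb_begin_receive(struct clusdb_state *s)
{
    s->label_valid = false;
}

/* Transfer completed: adopt the new version and revalidate the label. */
static void clusdb_end_receive(struct clusdb_state *s, uint32_t new_version)
{
    s->version = new_version;
    s->label_valid = true;
}

/* Callers would check this before using or forwarding the clusDb. */
static bool clusdb_usable(const struct clusdb_state *s)
{
    return s->label_valid;
}
```

With a check like this, the decision rests on the version number and membership alone. In my scenario above, node 1's newer but emptied DB would therefore pass it on node 2.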

C)

> 	- Yield when sending our clusDb to the PCC.

D)

> 	- Make sure sending and receiving a clusDb are
> 	  mutually exclusive.
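
For (D), here is one way to get the mutual exclusion. It is sketched with a hypothetical transfer-state flag guarded by a pthread mutex, since I don't know which lock the real change uses:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical transfer state; the real code may use another lock/flag. */
enum clusdb_xfer { XFER_IDLE, XFER_SENDING, XFER_RECEIVING };

static pthread_mutex_t xfer_lock = PTHREAD_MUTEX_INITIALIZER;
static enum clusdb_xfer xfer_state = XFER_IDLE;

/* Claim the single transfer slot for a send or a receive (pass
 * XFER_SENDING or XFER_RECEIVING).  Returns false if a transfer in
 * either direction is already under way, so the two never overlap. */
static bool clusdb_xfer_begin(enum clusdb_xfer dir)
{
    bool ok = false;

    pthread_mutex_lock(&xfer_lock);
    if (xfer_state == XFER_IDLE) {
        xfer_state = dir;
        ok = true;
    }
    pthread_mutex_unlock(&xfer_lock);
    return ok;
}

/* Release the slot when the send or receive finishes or fails. */
static void clusdb_xfer_end(void)
{
    pthread_mutex_lock(&xfer_lock);
    xfer_state = XFER_IDLE;
    pthread_mutex_unlock(&xfer_lock);
}
```

A caller that gets false here would back off and retry later. Holding one mutex across the whole transfer would also exclude the two, but it would block whichever side arrives second.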

E)

> 	- If we get an error while receiving a new
> 	  clusDb, clear the fact that it is in transit, so
> 	  we'll resume beaconing and recovery properly.
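
For (E), here is the shape of the fix as I understand it. The names clusdb_in_transit, clusdb_read_fn, and clusdb_receive() are made up and stand in for whatever beacon.c and recovery.c actually check. canned_reader is only a stand-in for trying the sketch out:

```c
#include <stdbool.h>

/* Hypothetical flag the beacon and recovery code would consult;
 * not the actual variable in beacon.c or recovery.c. */
static bool clusdb_in_transit = false;

/* Hypothetical chunk reader: returns > 0 while more data follows,
 * 0 when the transfer is complete, < 0 on error. */
typedef int (*clusdb_read_fn)(void *ctx);

/* Receive a clusDb.  The in-transit flag is cleared on the error path
 * as well as on success, so a failed transfer cannot leave beaconing
 * and recovery suspended. */
static int clusdb_receive(clusdb_read_fn read_chunk, void *ctx)
{
    int rc;

    clusdb_in_transit = true;
    while ((rc = read_chunk(ctx)) > 0)
        ;   /* storing each chunk is omitted in this sketch */
    clusdb_in_transit = false;
    return rc;
}

/* Stand-in reader for trying the sketch: replays canned return codes. */
struct canned_reader {
    const int *codes;
    int        next;
};

static int canned_read(void *ctx)
{
    struct canned_reader *r = ctx;
    return r->codes[r->next++];
}
```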

F)

> 	- Continue polling even when we are receiving a
> 	  clusDb.
> 	
> 	Also, can't use inet_ntoa() more than once in an
> 	elog or printf as it uses a single temporary area
> 	for the string.
> 	
> 	Reviewed by JonG
> 
> Affected files ...
> 
> ... //depot/dev/nfx-tree/code/ssc-cluster/clusdb-tools.c#11 edit
> ... //depot/dev/nfx-tree/code/ssc-cluster/cluster-contrl-cfg.c#30 edit
> ... //depot/dev/nfx-tree/code/ssc-cluster/cluster-contrl.h#8 edit
> ... //depot/dev/nfx-tree/code/ssc-cluster/cluster-server.c#10 edit
> ... //depot/dev/nfx-tree/code/ssc-openafs-ubik/beacon.c#15 edit
> ... //depot/dev/nfx-tree/code/ssc-openafs-ubik/recovery.c#8 edit
> ... //depot/dev/nfx-tree/code/ssc-openafs-ubik/remote.c#8 edit
> ... //depot/dev/nfx-tree/code/ssc-openafs-ubik/ubik.p.h#12 edit
> 
> 
> http://liszt:1818/@md=d&cd=//depot/$c=G35@/29771?ac=10
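
On the inet_ntoa() note in the change description: inet_ntoa() may return a pointer to static storage that the next call overwrites, so two calls in one argument list can print the same address twice. A minimal sketch of the usual workaround uses POSIX inet_ntop(), which writes into a caller-supplied buffer. format_addr_pair() is a hypothetical helper, not something from the affected files:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Format two IPv4 addresses into one message.  Each address is first
 * converted into its own local buffer with inet_ntop(), so neither
 * string can be overwritten before snprintf() reads it. */
static void format_addr_pair(struct in_addr src, struct in_addr dst,
                             char *out, size_t outlen)
{
    char s[INET_ADDRSTRLEN];
    char d[INET_ADDRSTRLEN];

    if (inet_ntop(AF_INET, &src, s, sizeof(s)) == NULL)
        strcpy(s, "?");
    if (inet_ntop(AF_INET, &dst, d, sizeof(d)) == NULL)
        strcpy(d, "?");
    snprintf(out, outlen, "%s -> %s", s, d);
}
```

Copying inet_ntoa()'s result into a local buffer before the second call would work too. inet_ntop() just makes the separate storage explicit.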
