AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080618163714.31d017fa@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<chris.vandever@onstor.com>,<maxim.kozlovsky@onstor.com>,<jonathan.goldick@onstor.com>,<rendell.fong@onstor.com>,<james.kahn@onstor.com>,<jobi.ariyamannil@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	41437	BB375AF679D4A34E9CA8DFA650E2B04E03E9A910@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 18 Jun 2008 16:37:32 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Chris Vandever" <chris.vandever@onstor.com>
Cc: "Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>, "Jonathan Goldick"
 <jonathan.goldick@onstor.com>, "Rendell Fong" <rendell.fong@onstor.com>,
 "James Kahn" <james.kahn@onstor.com>, "Jobi Ariyamannil"
 <jobi.ariyamannil@onstor.com>
Subject: Re: software architecture question
Message-ID: <20080618163732.189f219f@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E03E9A910@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E0A82D903@onstor-exch02.onstor.net>
 <BB375AF679D4A34E9CA8DFA650E2B04E03E9A910@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

You can safely leave the implementation details to me, such as how to
deal with the offline possibly taking a long time and us not
necessarily wanting to wait for it.  I've got some ideas there.

If pm is horrified and has decided to reboot the system, or
cluster-mumble wants to reboot for some non-catastrophic reason, then
perhaps we would at least want to try?  What's the end result
difference between offlining the volumes before reboot, and not
offlining?  If offlined, do they not fail over, assuming a cluster
configuration?

If not, then do we not have a need for an API routine of this type?  I
confess I couldn't fully grasp whether Chris was saying there was an
API type routine or not.
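
A minimal sketch of one way to "at least try" without waiting
indefinitely: run the offline in a child process and give up after a
deadline.  Every name here (try_offline_with_deadline, offline_fn, the
OFFLINE_* codes, the fake_* workers) is hypothetical and only stands in
for whatever would actually send the request to vsd; none of it is
existing code in the tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical worker signature: returns 0 on success.  In real code
 * this would be whatever asks vsd to offline the volumes; vsid == -1
 * is assumed to mean "all volumes on this filer". */
typedef int (*offline_fn)(int vsid);

enum {
    OFFLINE_ERROR     = -1, /* fork() or waitpid() itself failed */
    OFFLINE_OK        = 0,  /* worker finished and reported success */
    OFFLINE_FAILED    = 1,  /* worker finished and reported failure */
    OFFLINE_TIMED_OUT = 2   /* deadline passed; worker was killed */
};

/* Run fn(vsid) in a child and wait at most roughly timeout_ms for it.
 * Elapsed time is approximated by counting 10 ms polling ticks. */
static int try_offline_with_deadline(offline_fn fn, int vsid, int timeout_ms)
{
    assert(fn != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return OFFLINE_ERROR;
    if (pid == 0)
        _exit(fn(vsid) == 0 ? 0 : 1); /* child: do the slow work */

    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    for (int waited = 0; ; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                       ? OFFLINE_OK : OFFLINE_FAILED;
        if (r < 0)
            return OFFLINE_ERROR;
        if (waited >= timeout_ms)
            break;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: stop waiting, reap the child, let the caller
     * carry on with the reboot. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return OFFLINE_TIMED_OUT;
}

/* Stand-in workers for experimenting; they do not touch any volumes. */
static int fake_fast_offline(int vsid) { (void)vsid; return 0; }
static int fake_slow_offline(int vsid) { (void)vsid; sleep(5); return 0; }
static int fake_fail_offline(int vsid) { (void)vsid; return 1; }
```

A common reboot routine could call something like
try_offline_with_deadline(worker, -1, 30000) and reboot regardless of
the result, logging anything other than OFFLINE_OK.  The 30 second
figure is only a placeholder, not a recommendation.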


On Wed, 18 Jun 2008 16:11:57 -0700 "Chris Vandever"
<chris.vandever@onstor.com> wrote:

> The functionality already exists in VSD.  When we get a
> CLUSTER_MSG_RESTART we call vsd_procClusterRestartMsgReq(), which
> starts a transaction for vsd_clusterRestartReqProc(), which does a
> blocking call to vsd_disableAllVsProc().  We could change it to be a
> more appropriately named vsd msg instead of a clustering msg.  It
> would also be a good idea to review what it does -- it did what I
> needed for clustering, before rebooting the system.
> 
> That said, Max is correct that it can take a while, so I ended up not
> using it as much as I'd intended.  (Currently, it's used when a node
> is added to or deleted from a cluster.  In both cases, it should not
> have any vsvrs other than the mgmt vsvr.  I had hoped to use it when
> losing the fs-ownership battle in split-brain situations before
> invalidating the clusDb and rebooting, but it took too long.)
> 
> ChrisV
> 
> -----Original Message-----
> From: Maxim Kozlovsky 
> Sent: Wednesday, June 18, 2008 4:05 PM
> To: Maxim Kozlovsky; Andy Sharp; Chris Vandever; Jonathan Goldick;
> Rendell Fong; James Kahn; Jobi Ariyamannil
> Subject: RE: software architecture question
> 
> Replace "online" with "offline".
> 
> >-----Original Message-----
> >From: Maxim Kozlovsky
> >Sent: Wednesday, June 18, 2008 4:05 PM
> >To: Andy Sharp; Chris Vandever; Jonathan Goldick; Rendell Fong;
> >James Kahn; Jobi Ariyamannil
> >Subject: RE: software architecture question
> >
> >It is not possible to make a general library function which onlines
> >the volumes. The problem is that volume online can take a really
> >long time, and you cannot just block and wait for this to finish
> >anywhere but in the nfxsh, which does not mind blocking.
> >
> >Can't really imagine the need to cleanly offline the volume anywhere
> >but in the nfxsh when the user says he/she really wants it. When
> >reboot is done automatically, it is usually because something is
> >wrong, and when something is wrong you really don't want to sit
> >there and wait for the volumes to be offlined for an unknown period
> >of time.
> >
> >
> >>-----Original Message-----
> >>From: Andy Sharp
> >>Sent: Wednesday, June 18, 2008 3:58 PM
> >>To: Chris Vandever; Jonathan Goldick; Maxim Kozlovsky; Rendell Fong;
> >>James Kahn; Jobi Ariyamannil
> >>Subject: software architecture question
> >>
> >>Context:
> >>
> >>I'm working some more changes to consolidate reboot functionality in
> >>our userspace code, as first started by me, then progressed some
> >>more by Chris, and now back to me for more.
> >>
> >>The code paths involving _system upgrade -p_ (upgrade the primary
> >>flash) and _system reboot_ try to offline all the volumes (which is
> >>skipped in the _system reboot_ case if you supply the -f option).
> >>Possibly several other code paths should try to offline the volumes
> >>as well (???), so I'm considering how to move that functionality
> >>into a common reboot routine.
> >>
> >>Problem:
> >>
> >>The vol_offline_all() function, and another function it calls, are
> >>implemented currently in nfx-tree/ssc-nfxsh/cmd_vol.c.  I could just
> >>move that code into the sm-utils/sys-utils.c code I'm hacking on
> >>now, but I'm thinking shouldn't this sort of thing be some kind of
> >>vol API type routine?  Currently vol_offline_all() appears to be
> >>callable with a VSID and presumably offlines just that vsvr's
> >>volumes, or with -1 to offline all volumes on this filer.
> >>
> >>Max mentioned that most likely the code that implements this should
> >>go in the vsd daemon, and perhaps that's true, but then we still
> >>would need an API type function that does the work of sending the
> >>message to vsd. Where would be the right place for such code to
> >>reside?
> >>
> >>Cheers,
> >>
> >>a
