X-Sylpheed-Account-Id:1
S:andy.sharp@onstor.com
SCF:#mh/Mailbox/sent
X-Sylpheed-Sign:0
X-Sylpheed-Encrypt:0
X-Sylpheed-Privacy-System:
RMID:#mh/Mailbox/design review	0	BB375AF679D4A34E9CA8DFA650E2B04E06991B36@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Tue, 13 Nov 2007 22:48:24 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Narayan Venkat" <narayan.venkat@onstor.com>
Cc: "Paul Hammer" <paul.hammer@onstor.com>, "Eric Barrett"
 <eric.barrett@onstor.com>, "Tim Gardner" <tim.gardner@onstor.com>, "Joshua
 Goldenhar" <joshua.goldenhar@onstor.com>, "Henry Lau"
 <henry.lau@onstor.com>, "dl-Design Review" <dl-designreview@onstor.com>,
 "Sandrine Boulanger" <sandrine.boulanger@onstor.com>, "Vikas Saini"
 <vikas.saini@onstor.com>, "Manohar Divate" <manohar.divate@onstor.com>,
 "Dennis Arellano" <dennis.arellano@onstor.com>
Subject: Re: snapshot autoremove question
Message-ID: <20071113224824.7e842a69@ripper.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E06794CBE@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E05DA1A8C@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E06882AE7@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E05BFEAB1@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E04344B70@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E06882F79@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E06882F82@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E06883A64@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E05BFEB18@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E06991B36@onstor-exch02.onstor.net>
Organization: Onstor
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 13 Nov 2007 20:11:30 -0800 "Narayan Venkat"
<narayan.venkat@onstor.com> wrote:

>  
> 
> Yeah, but if you are designing the capability then we may as well
> design it right.  You can make the same argument with Autogrow.
> Autogrow allows you to automatically grow a file system when it is
> full.  What is your definition of full in the case of Autogrow?  Why
> set a high water mark?
> 
>  
> 
> Let's not get wrapped around in semantics here.  Let's design it
> correctly.  Where is the data loss coming from?

Narayan, we are engineers here; semantics is what we do.  Getting the
semantics precisely defined is how we get features implemented well
and users able to use them accurately.  So, yes, let's absolutely get
wrapped around the semantics.
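
To make the disagreement concrete, here is a minimal sketch of the two
readings on the table: "remove when full" (the MRD wording) versus
"remove at a configurable high water mark" (the spec).  This is not
product code.  The function names are invented, and the 98% stand-in
for "full" is an assumption borrowed from the figure others mention
further down the thread.

```python
# Hypothetical sketch, not product code. The names and the 98%
# stand-in for "full" are assumptions for illustration only.

FULL_PCT = 98.0  # assumed stand-in for "filesystem is full"


def should_autoremove_mrd(used_pct: float) -> bool:
    """MRD reading: remove snapshots only when the fs is full."""
    return used_pct >= FULL_PCT


def should_autoremove_spec(used_pct: float,
                           arhwm_pct: float) -> bool:
    """Spec reading: remove snapshots at any configured ARHWM."""
    return used_pct >= arhwm_pct


# At 85% used with an ARHWM of 80%, the two readings disagree.
mrd = should_autoremove_mrd(85.0)
spec = should_autoremove_spec(85.0, 80.0)
```

Until "full" is pinned to a number, the first function cannot even be
written, which is the whole point about semantics.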

> 
>  
> 
> Narayan Venkat 
> Vice President, Marketing 
> ONStor, Inc. 
> Tel: (408) 963-2404 
> Cell: (408) 221-4297. 
> 
> ________________________________
> 
> From: Paul Hammer 
> Sent: Tuesday, November 13, 2007 8:07 PM
> To: Narayan Venkat; Eric Barrett; Tim Gardner; Joshua Goldenhar; Henry
> Lau; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate; Dennis Arellano
> Subject: RE: snapshot autoremove question
> 
>  
> 
> The data loss point is from me.
> 
>  
> 
> The MRD states:
> 
> Auto Remove Snapshots. We need a configurable ability to auto remove
> snapshots when filesystem gets full.
> 
>  
> 
> Now we are going down a path of deleting snapshots at any water mark,
> not conditionally, as the MRD stated, when the file system is full.
> If this is what is/was really wanted, we should have captured that in
> the MRD.  At this point the spec and the MRD don't agree.  Which is
> correct?
> 
>  
> 
> -Paul
> 
>  
> 
> ________________________________
> 
> From: Narayan Venkat
> Sent: Tue 11/13/2007 5:03 PM
> To: Eric Barrett; Tim Gardner; Paul Hammer; Joshua Goldenhar; Henry
> Lau; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate; Dennis Arellano
> Subject: RE: snapshot autoremove question
> 
> Since I am not in dl-design review, I am seeing this thread late.  I
> agree with Eric's statements that we should not impose an arbitrary
> requirement of autogrow before deleting a snapshot.  Where is the
> "secondary data loss" argument coming from?  I don't understand it.
> Why would we lose data if snapshot reclamation and Autogrow work as
> designed?  What am I missing?
> 
>  
> 
> I'd vote for keeping the design in such a way that Autogrow and
> Snapshot removal are independent and tunable separately.  This is how
> other vendors do it.  
> 
>  
> 
> Narayan Venkat 
> Vice President, Marketing 
> ONStor, Inc. 
> Tel: (408) 963-2404 
> Cell: (408) 221-4297. 
> 
> ________________________________
> 
> From: Eric Barrett 
> Sent: Monday, November 12, 2007 8:58 AM
> To: Tim Gardner; Paul Hammer; Joshua Goldenhar; Henry Lau; dl-Design
> Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate; Dennis Arellano
> Subject: RE: snapshot autoremove question
> 
>  
> 
> I have to add also that I don't agree with the "secondary data loss"
> argument.  ARHWM would be elective and therefore no different than any
> other cleanup job, such as a Unix system sweeping up files in /tmp,
> or a job to delete backup archive images.
> 
>  
> 
>  
> 
> ________________________________
> 
> From: Eric Barrett 
> Sent: Monday, November 12, 2007 8:54 AM
> To: Tim Gardner; Paul Hammer; Joshua Goldenhar; Henry Lau; dl-Design
> Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate; Dennis Arellano
> Subject: RE: snapshot autoremove question
> 
> My feedback to Henry before the meeting was that as an admin, I'd be
> pissed at the arbitrary requirement that I autogrow stuff before I
> automatically delete snapshots.  My own personal preference would be
> to delete snapshots first.  I can't imagine I'm the only one.
> 
>  
> 
> This is especially true since the implementation was changed so that
> we do NOT guarantee the user does not receive ENOSPC because of a
> snapshot's disk consumption.
> 
>  
> 
> If linking ARHWM and AGHWM increases complexity of the user
> interaction, complexity of user understanding, complexity of testing,
> complexity of implementation, AND in the end results in a LOSS of
> functionality, why do it that way?
> 
>  
> 
>  
> 
> ________________________________
> 
> From: Tim Gardner 
> Sent: Friday, November 09, 2007 11:15 PM
> To: Paul Hammer; Joshua Goldenhar; Henry Lau; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate; Dennis Arellano
> Subject: RE: snapshot autoremove question
> 
> Paul,
> 
>  
> 
> This idea was discussed in detail during the design review. The
> conclusion was that autogrow and auto snapshot removal should not be
> coupled for several reasons.
> 
>  
> 
> Coupling them significantly increases the complexity of the design and
> implementation.
> 
> It will result in more test cases, not fewer, than the current
> proposal.
> 
> It will increase the complexity of the documentation.
> 
> It is more likely to result in support calls.
> 
>  
> 
> Consider the workflow where a user sets an ARHWM of 80% with autogrow
> disabled.
> 
> What do we do when the user later enables autogrow with a 90% HWM?
> 
> Do we disable snapshot auto removal because it has a lower HWM?
> 
> Do we silently ignore the 80% HWM and instead use a HWM above 90%?
> 
> Do we outright change the HWM to something larger than 80%?
> 
> Do we set it back if autogrow is later disabled?
> 
> What HWM do we display to the user when a vol show is done?
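
A rough sketch of the uncoupled design helps show why these questions
do not arise when the two marks are stored independently.  Every name
here (VolumeConfig, actions_at, the field names) is invented for
illustration and is not the actual volume configuration code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeConfig:
    # None means the feature is disabled. Both fields are
    # hypothetical; they only model the two marks in this thread.
    arhwm_pct: Optional[int] = None  # snapshot auto-remove HWM
    aghwm_pct: Optional[int] = None  # autogrow HWM


def actions_at(cfg: VolumeConfig, used_pct: int) -> list:
    """Return which features trigger at the given usage."""
    actions = []
    if cfg.arhwm_pct is not None and used_pct >= cfg.arhwm_pct:
        actions.append("autoremove")
    if cfg.aghwm_pct is not None and used_pct >= cfg.aghwm_pct:
        actions.append("autogrow")
    return actions


cfg = VolumeConfig(arhwm_pct=80)  # ARHWM 80%, autogrow disabled
cfg.aghwm_pct = 90                # user later enables autogrow
# Nothing was rewritten: both values are still exactly as entered.
```

In this model, enabling or disabling autogrow never touches the ARHWM,
so there is nothing to silently ignore, change, or set back, and a
display command would simply report the two values as configured.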
> 
>  
> 
> It was also thought that customers may want to intentionally set the
> ARHWM lower than the autogrow HWM. Without actually asking customers,
> we really don't know.
> 
> The consensus was that we should give customers the choice and utilize
> the best practices guide to document implications of various
> settings.
> 
>  
> 
> Tim
> 
> 
>  
> 
> ________________________________
> 
> From: Paul Hammer
> Sent: Fri 11/9/2007 8:46 PM
> To: Joshua Goldenhar; Henry Lau; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate; Dennis Arellano
> Subject: RE: snapshot autoremove question
> 
> Hi All,
> 
>  
> 
> I'd like to have the design changed slightly.  I think that if the
> user has autogrow enabled, the HWM to trigger a snapshot deletion
> cannot be set to a lower value than the autogrow value (we do not
> want to lose secondary data sets; that is essentially data loss).
> The code must catch this issue, trigger an error message, and
> prevent the cfg.  The marketing requirement (not spec) was to free up
> space by deleting snapshots if the customer was going to run out of
> disk space (i.e. autogrow turned off, or out of luns with autogrow
> on).  The requirement was not about when the deletion should be
> triggered (at what amount of capacity was left) or about allowing a
> HWM for activating the deletion.  Having the delete happen before the
> autogrow adds too many unnecessary test permutations.  Granted, QA
> must test the condition where AG is set to on and no free luns are
> available; if the delete snapshots option is set in this case, then
> the snapshot will be deleted to buy some headroom.
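
For reference, the configuration check proposed here would look
roughly like the sketch below.  The function name, the error text,
and the 1 to 100 range check are assumptions for illustration; none
of them come from the spec.

```python
from typing import Optional


def validate_arhwm(arhwm_pct: int,
                   aghwm_pct: Optional[int]) -> None:
    """Reject an ARHWM below the autogrow HWM (proposed rule).

    aghwm_pct is None when autogrow is disabled, in which case
    any ARHWM from 1 to 100 is accepted. Names are hypothetical.
    """
    if not 1 <= arhwm_pct <= 100:
        raise ValueError("ARHWM must be between 1 and 100")
    if aghwm_pct is not None and arhwm_pct < aghwm_pct:
        raise ValueError(
            "ARHWM %d%% is below autogrow HWM %d%%"
            % (arhwm_pct, aghwm_pct)
        )
```

Note that this only covers setting the ARHWM.  A coupled design would
need the mirror-image check whenever the autogrow HWM is changed or
autogrow is enabled later.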
> 
>  
> 
> Please keep in mind what the requirement stated (not the spec).  I
> don't want to get into a protracted discussion on whether we would
> allow snapshots to be deleted at any HWM; that would be a very
> different requirement.
> 
>  
> 
> If the user is going without autogrow enabled, they can set the HWM
> to delete snapshots to whatever value they want.  I would be
> surprised if anyone set it at anything other than 98% or so (given
> what the requirement states).
> 
>  
> 
> Thanks,
> 
>  
> 
> -Paul
> 
>  
> 
> ________________________________
> 
> From: Joshua Goldenhar
> Sent: Fri 11/9/2007 1:25 PM
> To: Henry Lau; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate; Dennis Arellano
> Subject: RE: snapshot autoremove question
> 
> Thanks Henry - I see that this mentions the fact that you cannot pin
> scheduled snapshots and mentions setting autogrow in a way that
> implies it is independent.
> 
>  
> 
> I'm sure Dennis will work his magic to turn this into fabulous
> documentation ;-)
> 
> -Josh 
> 
> Josh Goldenhar 
> Phone: 408 963 2408, Cell: 408 547 7693 
> 
> ________________________________
> 
> From: Henry Lau 
> Sent: Friday, November 09, 2007 11:53 AM
> To: Joshua Goldenhar; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate
> Subject: RE: snapshot autoremove question
> 
>  
> 
> Hi Joshua,
> 
>  
> 
> Please check the best practice section in section 8.1 of the doc.
> 
> 
>  
> 
> /n/software/FileSystem/snapshot_management_R98_autoremove.doc
> 
>  
> 
> Thanks,
> 
> Henry
> 
>  
> 
> ________________________________
> 
> From: Joshua Goldenhar 
> Sent: Friday, November 09, 2007 11:35 AM
> To: Jobi Ariyamannil; John Keiffer; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate
> Subject: RE: snapshot autoremove question
> 
>  
> 
> Thanks Jobi.
> 
>  
> 
> These types of questions came up in the last design review and were
> discussed at length.
> 
>  
> 
> I think a simple rewording of your problem statement can illuminate
> the overall design principle we ended up sticking to:
> 
> "I would like to understand why we are willing to possibly delete all
> of a customer's snapshots, if they have set and reached their
> autoremoval high water mark (ARHWM)." Becomes: "...why is the
> customer willing to possibly delete all of their snapshots..."
> 
>  
> 
> As Jobi mentioned, the customer optionally turns this feature on. It's
> our job to make sure the documentation and best practices guides issue
> this warning and make operation as clear and understandable as
> possible.
> 
>  
> 
> If we come up with an arbitrary algorithm to preserve snapshots, there
> will always be a customer that will say "why did you do it THAT way?
> - I would rather have had XYZ..." 
> 
>  
> 
> I did not know we cannot pin the scheduled snapshots - I imagine we'll
> get an RFE in the future to auto-pin snapshots or set a preservation
> threshold on scheduled snapshots. 
> 
>  
> 
> For now though I really feel the simplicity of the feature as
> implemented makes it easy to understand and makes the "dangers" easy
> to understand also.
> 
> -Josh 
> 
> Josh Goldenhar 
> Phone: 408 963 2408, Cell: 408 547 7693 
> 
> ________________________________
> 
> From: Jobi Ariyamannil 
> Sent: Friday, November 09, 2007 10:13 AM
> To: John Keiffer; dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate
> Subject: RE: snapshot autoremove question
> 
>  
> 
> Snapshot auto removal is an optional feature somebody needs to turn
> on. If they set the HWM too low, they may end up losing the snapshots.
> 
> I don't see any need to set it below 98%.  Removing snapshots
> automatically is needed when the filesystem operates in close-to-full
> conditions, to prevent applications from running into ENOSPC because
> of the space pinned by snapshots.  By providing an option for the
> user to specify that threshold, we ended up with all these interesting
> possibilities.
> 
>  
> 
> Regards,
> 
> Jobi
> 
>  
> 
> ________________________________
> 
> From: John Keiffer 
> Sent: Friday, November 09, 2007 10:05 AM
> To: dl-Design Review
> Cc: Sandrine Boulanger; Vikas Saini; Manohar Divate
> Subject: snapshot autoremove question
> 
> [opens mouth]
> 
>  
> 
> I would like to understand why we are willing to possibly delete all
> of a customer's snapshots, if they have set and reached their
> autoremoval high water mark (ARHWM). 
> 
>  
> 
> It seems that if users in the field are keeping a lot of snapshots
> that this might not be a problem, since deleting some might free up
> the necessary space. If users in the field are only scheduling a
> smaller number of snapshots to be taken it is more likely that we
> might end up deleting them all. 
> 
>  
> 
> Example problem: I ran into an issue where after creating a large
> file (using pre-allocation), my volume usage was over the ARHWM. I
> then deleted the large file. However, because both the snapshot
> deletion and file deletion happen in the background, the snapshot
> deletion happened before the large file was removed and the volume
> usage went back down. So I ended up losing my file and my snapshots.
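
The ordering problem described above can be shown with a deliberately
crude model.  All names and numbers are made up, usage is treated as
live data plus snapshot-pinned space, and the auto-remove job is
assumed to delete every snapshot at once, which is a simplification
and not a statement about the real implementation.

```python
def run(order, live_gb=70, snap_gb=20, file_gb=50,
        size_gb=100, arhwm_pct=80):
    """Apply two background jobs in the given order.

    live_gb includes the large file (file_gb). Returns a tuple
    (snapshots_kept, final_used_pct). Purely illustrative.
    """
    snapshots_kept = True
    for job in order:
        used_pct = 100.0 * (live_gb + snap_gb) / size_gb
        if job == "free_file":
            live_gb -= file_gb      # blocks of the deleted file
        elif job == "autoremove" and used_pct >= arhwm_pct:
            snap_gb = 0             # crude: all snapshots go
            snapshots_kept = False
    return snapshots_kept, 100.0 * (live_gb + snap_gb) / size_gb


lost = run(["autoremove", "free_file"])
kept = run(["free_file", "autoremove"])
```

With these numbers the volume starts at 90% used.  If the auto-remove
job runs first the snapshots are gone, even though freeing the file
alone would have brought usage well under the 80% mark.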
> 
>  
> 
> Since we cannot pin weekly, daily, or hourly snapshots, and they are
> likely to be the most current, it seems like we should keep at least
> one. Some of us in QA think it might be nice to keep one of each, but
> at a minimum it seems that we should at least keep the most current.
> If the last unpinned snapshot needs to be deleted in order to free up
> space, there are bigger issues to deal with.
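
A sketch of the "keep at least the most current" policy suggested
above.  The tuple layout, the function name, the oldest-first order,
and the idea that each snapshot frees a fixed size are all
assumptions for illustration; the thread does not say how the real
feature orders deletions or accounts for space.

```python
def pick_victims(snapshots, need_gb, keep_newest=True):
    """Choose unpinned snapshots to delete, oldest first.

    snapshots: list of (created_ts, size_gb, pinned) tuples.
    Stops once need_gb is covered. With keep_newest set, the
    most recent unpinned snapshot is never selected.
    """
    unpinned = sorted(s for s in snapshots if not s[2])
    if keep_newest and unpinned:
        unpinned = unpinned[:-1]
    victims, freed = [], 0
    for snap in unpinned:
        if freed >= need_gb:
            break
        victims.append(snap)
        freed += snap[1]
    return victims


snaps = [(1, 5, False), (2, 5, False), (3, 5, False),
         (0, 9, True)]
```

For example, pick_victims(snaps, 100) selects the snapshots created
at times 1 and 2 and leaves the one from time 3, while passing
keep_newest=False would select all three unpinned snapshots.  The
pinned snapshot is never touched in either case.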
> 
>  
> 
> [inserts foot]
> 
>  
> 
> Thank you,
> 
> John Keiffer
> 
>  
> 
>  
> 
