AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20070417115803.595dc674@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<jay.michlin@onstor.com>,<eric.crutchlow@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E034373A2@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Tue, 17 Apr 2007 11:59:05 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Jay Michlin" <jay.michlin@onstor.com>
Cc: "Eric Crutchlow" <eric.crutchlow@onstor.com>
Subject: Re: Installation Issues
Message-ID: <20070417115905.6cbd981e@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E034373A2@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E034373A2@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Jay, Eric,

I suspect everyone's in violent agreement about this, but I just want
to reiterate that some of the things in here just cry out for an
engineer to be on site during one or more of these installs so that we
can dig down to the heart of these matters.  I realize that
when/where/how this happens is a matter for Jay to determine vis-a-vis
resource allocation and schedules and other considerations, but it just
hurts me right down to the gut when I hear that we have lost sales
because of installation or configuration issues.  If we can just
capture these issues they are so fixable.  But it's not a one person
job: someone has to wave their hands at the customer while another
person digs into the problem on the filer ~:^)

Let me know if I can help,

a


On Tue, 17 Apr 2007 11:35:32 -0700 "Jay Michlin"
<jay.michlin@onstor.com> wrote:

> Hello all,
>  
> The email below is background reading about some of the problems our
> Professional Services team is seeing during installs and upgrades at
> customer sites. Most of that is invisible to us most of the time. And
> in the same way, most of our challenges in Software Development are
> invisible to field people most of the time. 
>  
> We'll all benefit by understanding better what life is like in other
> parts of the company, so I offer the email below for your interest.
> There are no assignments or action items in it. Just information worth
> seeing.
>  
> jay
>  
> 
> ________________________________
> 
> From: Eric Crutchlow 
> Sent: Tuesday, April 17, 2007 8:57 AM
> To: Jay Michlin; Paul Hammer; Sandrine Boulanger; Ed Kwan
> Cc: Caeli Collins
> Subject: RE: Installation Issues
> 
> 
> 
> Jay,
> 
>  
> 
> I appreciate your thoughts on the matter, and obviously I have not
> made my case. Please allow me to do so now.
> 
>  
> 
> Prior to Sudheesh's meeting, the field was encountering the problems
> we have outlined below (the installation issues list). From that
> meeting and the subsequent meeting with your group, we have made
> strides to educate and focus our efforts. The investment of time has
> produced the installation issues list and helped get several serious
> bugs addressed. I think we would all agree that the time spent in
> these meetings was productive and needed.
> 
>  
> 
> But let's review where the field's time has been spent. Two recent
> examples will illustrate this well: BAE and Fandango. 
> 
>  
> 
> One year ago, we attempted to demo to BAE and the system failed. We
> lost a sale (I have not been given the details, but if it would be
> helpful, please let me know). A year later, BAE lets us do another
> demo. Scott Moyer (our SE) went to BAE and attempted to install a two
> node cluster. He hits the PM on initial config error and attempts to
> configure the system anyway. He basically goes through four flash
> cards and now has no way to continue the install. Bill Duffy and I
> are at a customer's site in San Diego performing an install. Bill
> gets called away to help Scott. Eventually, he cuts short his stay
> with me to immediately fly to Philadelphia and help. The customer is
> wondering what is going on and is getting ready to throw us out.
> Scott has wasted 8 hours. Bill now spends two days to get everything
> working as well as make sure the customer understands that the
> Bobcats are reliable and will work for them. Now ONStor has invested
> 24 hours on a possible sale.
> 
>  
> 
> Luckily, BAE is impressed with Bill's 'performance'. As fate would
> have it, BAE's current system fails shortly thereafter and they decide
> to purchase the systems ASAP. Bill helped with the migration and so
> far no issues. 
> 
>  
> 
> A few weeks ago I installed Fandango with a two node cluster and no
> issues. Last Thursday they contacted me and said that they changed
> their network addresses and would like to update their Bobcats. The
> fact that the only way they could do this was by resetting the
> boxes was not well received. But I assured them that
> since they haven't gone into production, this would not be a major
> problem and would only take 1.5 hours. I first broke the cluster,
> upgraded to 2.2.2.2, and then performed the sys config reset. I ran
> into the PM problem that Scott hit at BAE. Rebooting didn't work and
> I executed a second sys config reset which did. 1.5 turned into 4
> hours. We left the systems that day set up with no vservers except for
> the management ones. Later that night at 2am, there was a cpu crash
> (see issue #5). Now Fandango is waiting on us to explain this before
> going live. 
> 
>  
> 
> And so with the two meetings we have had, we have created the list and
> are addressing the issues. As outlined below, the future releases will
> do a better job. But Delorean is a month away and ONStor must address
> these issues in the current release. And we are. Again, these two
> short meetings have helped us make strides.
> 
>  
> 
> The GUI has always been an issue and one that I was working on with
> Charissa prior to the meetings. When the field reported issues with PM
> or the GUI, the response has been they were not reproducible. As such
> the field became very frustrated and has not filed defects. This is
> unacceptable. Charissa has been willing to work with me on this and is
> making progress to resolve it. She is a great example of your
> suggestion of ongoing engagements. 
> 
>  
> 
> But until the two meetings, this was the only ongoing engagement we
> had.
> 
>  
> 
> We have not directly addressed the field's view of the support
> organization (CS, QA, Dev). We need to change that. The meetings were
> a start and well worth the time.
> 
>  
> 
> The field has also been told many times that items are WAD. My recent
> experience with Bill (defect 18502) illustrates this point. WAD is
> being touted too many times. This contributes to the 'mythology' of
> how our systems work and serves no purpose. As I stated before, if
> something is WAD, it must be documented and the documentation must be
> easily accessible to a user. In my opinion, this is either in the SAG
> or GUI documentation. Management needs to address this and
> communicate it to the team. There should not be (cannot be) confusion
> on what WAD is for any particular component of our systems. I do not
> underestimate the challenge of this nor am I proposing the meetings I
> am requesting directly address this.
> 
>  
> 
> And so I put it to the team: by having two meetings to create the
> installation issues list we have helped focus future development
> efforts as outlined by Jay. We are also addressing the issues in the
> current release (no resolutions as of yet, but as Jay has stated,
> everyone is committed to fixing them). Prior to the two meetings, our
> method for dealing with this was ongoing engagements. I believe this
> has not worked and we need to change. I am only asking for one
> meeting for one-half hour, every two weeks (my apologies, Jay, if I
> mis-communicated this as a weekly meeting). I will create an agenda
> and keep minutes to ensure the meetings are organized and effective.
> Our goal will be to address the issues and improve teamwork. More
> importantly, I propose that if we can incorporate these efforts into
> our other meetings and processes, we no longer need this one.
> 
>  
> 
> As I have illustrated with BAE and Fandango, we are already spending
> the time and have lost/jeopardized sales because of it. Is meeting
> once every two weeks not worth the investment? 
> 
> Regards, 
> 
> Eric Crutchlow 
> Professional Services Manager 
> t: 408-376-3113 
> f: 408-963-2409 
> m: 408-596-1155 
> 
> ________________________________
> 
> From: Eric Crutchlow
> Sent: Mon 4/16/2007 3:53 PM
> To: Paul Hammer; Sandrine Boulanger; Ed Kwan; Jay Michlin
> Subject: Installation Issues
> 
> When: Occurs every 2 weeks on Thursday effective 4/19/2007 from 1:00
> PM to 1:30 PM (GMT-08:00) Pacific Time (US & Canada).
> 
> Where: TBD 
> 
> *~*~*~*~*~*~*~*~*~* 
> 
> I'm following up with the installation issues meeting we had a few
> weeks ago to resolve the issues PS and the SEs are seeing in the
> field. The meeting will be every two weeks and I will keep minutes
> with action items. The goals are to resolve the current list (see
> below) and put processes in place to deal with future
> installation/eval issues as part of normal escalations. Once these
> two goals are reached, we can determine if there is any further need
> for additional meetings.
> 
> The following is a list of issues we currently are addressing: 
> 
> 1) Prior to init config on a new nas, get errors. Reboot off of 2nd
> flash and no problems. 
> 
> 2) Do init config 1st time. After reboot, the nas generates pm errors.
> If you reboot, you get init config, but this time the nas remembers
> the default route you entered the first time. 2nd reboot and
> everything is fine. We also see this with config resets.
> 
> 3) Upgrades from 1.3.3.x to 2.x seem to work fine, but when you try
> to move the vs from the 1.3.3.x server to the 2.x, it won't work. A
> cluster show cluster on the 2.x nas reports the cluster name as
> 'na'.  You kill ncmd on the 1.3.3.x nas  and now cluster name is
> right and you can move vs.
> 
> 4) System copy all. Do a compare and get errors. If you reboot the
> nas, perform another copy all, compare works. We wonder if flash has
> space issues.
> 
> 5) New install of a two node cluster. One of the nodes will reboot
> with a cpu crash. Just happened at an install Bill did yesterday.
> 
> 6) An upgrade takes 15 - 20 minutes. Compare another 10 - 15. This
> takes a lot of time, especially if you get an error and have to start
> over. You can waste 1 - 1.5 hours this way.
> 
