Date: Thu, 17 Jul 2008 11:58:42 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "John Rogers" <john.rogers@onstor.com>
Cc: "Raj Kumar" <raj.kumar@onstor.com>, "Ed Kwan" <ed.kwan@onstor.com>,
 "Vikas Saini" <vikas.saini@onstor.com>, "Paul Hammer"
 <paul.hammer@onstor.com>, "Brian Nguyen" <brian.nguyen@onstor.com>,
 "Jonathan Goldick" <jonathan.goldick@onstor.com>
Subject: Re: Permabit script for corruption
Message-ID: <20080717115842.1954a618@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09C42770@onstor-exch02.onstor.net>
References: <BB375AF679D4A34E9CA8DFA650E2B04E0AF1DF9E@onstor-exch02.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E09C42770@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 17 Jul 2008 11:42:54 -0700 "John Rogers"
<john.rogers@onstor.com> wrote:

> I've checked in a fix; it was pretty massive, 200K lines. Andy, can you
> review it?

Sure.  While I'm doing that, could you take a look at the KPIs and
maybe elogs from around that time, since developers don't normally
have access to that machine?

> copy zero to onstor Thu Jul 17 03:41:46 PDT 2008

> -----Original Message-----
> From: Raj Kumar 
> Sent: Thursday, July 17, 2008 11:09 AM
> To: Andy Sharp
> Cc: Ed Kwan; Vikas Saini; Paul Hammer; Brian Nguyen; Jonathan Goldick;
> John Rogers
> Subject: RE: Permabit script for corruption
> 
> Assuming that you are joking, try to find somebody who works in
> development engineering. 
> 
> -----Original Message-----
> From: Andy Sharp 
> Sent: Thursday, July 17, 2008 11:02 AM
> To: Raj Kumar
> Cc: Ed Kwan; Vikas Saini; Paul Hammer; Brian Nguyen; Jonathan Goldick;
> John Rogers
> Subject: Re: Permabit script for corruption
> 
> Good point.  John and Raj, you're up.
> 
> On Thu, 17 Jul 2008 10:20:32 -0700 "Raj Kumar" <raj.kumar@onstor.com>
> wrote:
> 
> > If anyone cares? Probably you should assign this to someone and
> > have them take a look?
> > 
> > -----Original Message-----
> > From: Andy Sharp 
> > Sent: Thursday, July 17, 2008 10:19 AM
> > To: Ed Kwan
> > Cc: Vikas Saini; Paul Hammer; Raj Kumar; Brian Nguyen; Jonathan
> > Goldick; John Rogers
> > Subject: Re: Permabit script for corruption
> > 
> > The "cp" command that appears last in the log file is still hung on my
> > system.  If anyone cares to take a look.  Talk about a "performance"
> > problem...
> > 
> > On Thu, 17 Jul 2008 10:11:07 -0700 "Ed Kwan" <ed.kwan@onstor.com>
> > wrote:
> > 
> > > Brian and I have been running the script against 3.1.1.3 and
> > > 3.2.0.6 since last night and yesterday evening, respectively, and
> > > we haven't seen any problem yet.
> > > 
> > > Also keep in mind that Permabit is seeing I/O retries, high evm
> > > times, plus 4 disk failures in the past 2 months.  CS is planning
> > > to replace some of the DotHill array hardware.
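One way to weigh a clean run against Permabit's reported rate of about 3%. This rests on two assumptions that the thread does not state: the 3% applies to each iteration of the loop, and iterations fail independently. Under those assumptions, the chance of N clean iterations in a row is (1 - 0.03)^N. The sketch below computes that with a hypothetical p_no_failure helper written in plain awk.

```shell
#!/bin/bash
# p_no_failure RATE N   (hypothetical helper, not part of any script here)
#   Probability of N consecutive clean iterations when each iteration
#   independently fails with probability RATE. This is an assumed model,
#   not a measured one. Prints the value to four decimal places.
p_no_failure() {
  awk -v r="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - r) ^ n }'
}

p_no_failure 0.03 100   # prints 0.0476
```

Under that model, even 100 clean iterations would happen by chance almost 5% of the time. The number of iterations a run completed therefore says more than how long it ran.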
> > > 
> > > > -----Original Message-----
> > > > From: Andy Sharp
> > > > Sent: Thursday, July 17, 2008 9:59 AM
> > > > To: Vikas Saini
> > > > Cc: Paul Hammer; Raj Kumar; Brian Nguyen; Ed Kwan; Jonathan
> > > > Goldick
> > > > Subject: Re: Permabit script for corruption
> > > > 
> > > > Cuz it's all I got.  Who's gonna care at 3am?  Anyway, it seemed
> > > > to hang partway through the second iteration.  It was terminated
> > > > on schedule a couple of hours later.
> > > > 
> > > > 
> > > > $ cat /home/andy/log.ripper
> > > > +++ seq 1 10000
> > > > ++ for x in '`seq 1 10000`'
> > > > ++ echo 'Time number: 1'
> > > > Time number: 1
> > > > +++ date
> > > > ++ echo 'copy random to onstor Thu Jul 17 03:22:50 PDT 2008'
> > > > copy random to onstor Thu Jul 17 03:22:50 PDT 2008
> > > > ++ cp /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > > 
> > > > real	5m1.846s
> > > > user	0m0.006s
> > > > sys	0m2.248s
> > > > +++ date
> > > > ++ echo 'diff 1 Thu Jul 17 03:27:52 PDT 2008'
> > > > diff 1 Thu Jul 17 03:27:52 PDT 2008
> > > > ++ diff /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > > 
> > > > real	4m45.411s
> > > > user	0m1.166s
> > > > sys	0m5.376s
> > > > ++ '[' 0 '!=' 0 ']'
> > > > +++ date
> > > > ++ echo 'copy zero to onstor Thu Jul 17 03:32:38 PDT 2008'
> > > > copy zero to onstor Thu Jul 17 03:32:38 PDT 2008
> > > > ++ cp /u1/deleteme.ripper.zero ./deleteme.ripper.target
> > > > 
> > > > real	0m0.367s
> > > > user	0m0.000s
> > > > sys	0m0.167s
> > > > +++ date
> > > > ++ echo 'diff 2 Thu Jul 17 03:32:38 PDT 2008'
> > > > diff 2 Thu Jul 17 03:32:38 PDT 2008
> > > > ++ diff /u1/deleteme.ripper.zero ./deleteme.ripper.target
> > > > 
> > > > real	0m0.002s
> > > > user	0m0.000s
> > > > sys	0m0.001s
> > > > ++ '[' 0 '!=' 0 ']'
> > > > ++ for x in '`seq 1 10000`'
> > > > ++ echo 'Time number: 2'
> > > > Time number: 2
> > > > +++ date
> > > > ++ echo 'copy random to onstor Thu Jul 17 03:32:38 PDT 2008'
> > > > copy random to onstor Thu Jul 17 03:32:38 PDT 2008
> > > > ++ cp /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > > 
> > > > real	4m39.908s
> > > > user	0m0.005s
> > > > sys	0m2.166s
> > > > +++ date
> > > > ++ echo 'diff 1 Thu Jul 17 03:37:18 PDT 2008'
> > > > diff 1 Thu Jul 17 03:37:18 PDT 2008
> > > > ++ diff /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > > 
> > > > real	4m27.721s
> > > > user	0m1.166s
> > > > sys	0m3.643s
> > > > ++ '[' 0 '!=' 0 ']'
> > > > +++ date
> > > > ++ echo 'copy zero to onstor Thu Jul 17 03:41:46 PDT 2008'
> > > > copy zero to onstor Thu Jul 17 03:41:46 PDT 2008
> > > > ++ cp /u1/deleteme.ripper.zero ./deleteme.ripper.target
> > > > 
> > > > 
> > > > That's all the farther it got.
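Working the numbers from this log, assuming the random file was created by the script's `dd if=/dev/urandom of=$r count=1000000 bs=1024`: that file is 1,024,000,000 bytes. Each `cp` of it to the filer took between 4m39.908s and 5m1.846s of real time. A hypothetical mb_per_sec helper, using plain awk arithmetic, turns those `time` values into decimal MB/s.

```shell
#!/bin/bash
# mb_per_sec BYTES ELAPSED   (hypothetical helper)
#   Average rate in decimal MB/s (10^6 bytes per second), where ELAPSED
#   is a bash `time` "real" value such as 5m1.846s.
mb_per_sec() {
  awk -v bytes="$1" -v t="$2" 'BEGIN {
    split(t, p, "m")       # p[1] = minutes, p[2] = seconds plus trailing "s"
    sub(/s$/, "", p[2])    # drop the trailing "s"
    secs = p[1] * 60 + p[2]
    printf "%.2f\n", bytes / secs / 1000000
  }'
}

mb_per_sec 1024000000 5m1.846s    # first copy of the random file: prints 3.39
mb_per_sec 1024000000 4m39.908s   # second copy of the random file: prints 3.66
```

That works out to roughly 3.4 to 3.7 MB/s per copy, before counting the final copy, which never completed.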
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On Thu, 17 Jul 2008 07:50:13 -0700 "Vikas Saini"
> > > > <vikas.saini@onstor.com> wrote:
> > > > 
> > > > > Why against mightydog? We should not use mightydog as our
> > > > > testing machine; we can run it against CS if needed.
> > > > >
> > > > >
> > > > > Vikas
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Andy Sharp
> > > > > Sent: Thu 7/17/2008 3:33 AM
> > > > > To: Paul Hammer
> > > > > Cc: Raj Kumar; Vikas Saini; Brian Nguyen; Ed Kwan; Jonathan
> > > > > Goldick
> > > > > Subject: Re: Permabit script for corruption
> > > > >
> > > > > Here is the unmangled script, which I'm running against MD
> > > > > (mightydog) right now and for the next couple of hours:
> > > > >
> > > > > # /permabit/user is mounted from the onstor
> > > > > b=/permabit/user/trg/testing-trg
> > > > > # /u1 is a scratch area on local disk
> > > > > r=/u1/deleteme.`hostname`.random
> > > > > z=/u1/deleteme.`hostname`.zero
> > > > > t=$b/deleteme.`hostname`.target
> > > > > dd if=/dev/urandom of=$r count=1000000 bs=1024
> > > > > dd if=/dev/zero of=$z count=1 bs=1024
> > > > > mkdir $b ; cd $b
> > > > > for x in `seq 1 10000` ; do
> > > > >   echo "Time number: $x"
> > > > >   echo "copy 1 `date`"
> > > > >   time cp $r $t
> > > > >   echo "diff 1 `date`"
> > > > >   time diff $r $t
> > > > >   if [ $? != 0 ] ; then
> > > > >    echo broken
> > > > >   fi
> > > > >   echo "copy 2 `date`"
> > > > >   time cp $z $t
> > > > >   echo "diff 2 `date`"
> > > > >   time diff $z $t
> > > > >   if [ $? != 0 ] ; then
> > > > >    echo broken
> > > > >   fi
> > > > > done 2>&1 |tee $b/log.`hostname`
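A minimal sketch of the copy-and-compare step from the loop above. It is not Permabit's script or anything OnStor ships. The step is wrapped in a hypothetical copy_and_check helper that puts a time limit on cp, so a hang like the one in the log is reported instead of stalling the run. It assumes GNU coreutils `timeout`, which exits with status 124 when the limit expires. It uses `cmp -s` instead of `diff`, since only a same/different answer is needed.

```shell
#!/bin/bash
# copy_and_check SRC DST LIMIT   (hypothetical helper)
#   Copies SRC to DST with cp, giving up after LIMIT (e.g. "10m").
#   Prints one word and returns a matching status:
#     ok        (0) copy finished and DST matches SRC byte for byte
#     broken    (1) copy finished but DST differs from SRC
#     hung      (2) cp did not finish within LIMIT
#     cp-failed (3) cp exited with an error
copy_and_check() {
  local src=$1 dst=$2 limit=$3 rc
  timeout "$limit" cp "$src" "$dst"
  rc=$?
  if [ "$rc" -eq 124 ]; then     # GNU timeout: the limit expired
    echo hung
    return 2
  elif [ "$rc" -ne 0 ]; then     # cp itself reported an error
    echo cp-failed
    return 3
  fi
  # cmp -s prints nothing and stops at the first differing byte.
  if cmp -s "$src" "$dst"; then
    echo ok
    return 0
  fi
  echo broken
  return 1
}
```

Inside the loop, `copy_and_check "$r" "$t" 10m` could stand in for the `time cp $r $t` / `time diff $r $t` pair, breaking out on `hung`. One caveat: if cp is stuck in uninterruptible I/O on the filer mount, it may not respond to the signal, and `timeout` will then keep waiting on it too.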
> > > > >
> > > > >
> > > > >
> > > > > On Wed, 16 Jul 2008 20:59:45 -0700 "Paul Hammer"
> > > > > <paul.hammer@onstor.com> wrote:
> > > > >
> > > > > > Hi Guys,
> > > > > >
> > > > > > If this really causes corruption, we have to figure it out
> > > > > > now. Can I ask that one of you take the action item to run
> > > > > > the script and see whether it really causes corruption
> > > > > > with EverON? Please let me know who is taking on this
> > > > > > task. I only want us to run this script against
> > > > > > 4.0/3.3.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > -Paul
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Rich LaReau
> > > > > > Sent: 2008-07-16 13:58
> > > > > > To: dl-esc-l3
> > > > > > Subject: Permabit script for corruption
> > > > > >
> > > > > >
> > > > > > Hi Ed and team,
> > > > > >
> > > > > > This is the script that Permabit says they can use to
> > > > > > generate corruption.  I'll post a copy to the case and to
> > > > > > the associated defect.
> > > > > >
> > > > > > Rich
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Caeli Collins
> > > > > > Sent: Wednesday, July 16, 2008 1:37 PM
> > > > > > To: Rich LaReau
> > > > > > Subject: FW: script to reproduce the problem
> > > > > >
> > > > > > This is what I got
> > > > > >
> > > > > >
> > > > > > Caeli
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Tracy Gangwer [mailto:trg@permabit.com]
> > > > > > Sent: Wednesday, July 16, 2008 13:28
> > > > > > To: Caeli Collins
> > > > > > Cc: Clint McVey
> > > > > > Subject: script to reproduce the problem
> > > > > >
> > > > > > Caeli -
> > > > > >
> > > > > > Below is the script we discussed on the phone.  We are
> > > > > > seeing about a 3% failure rate.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > trg
> > > > > >
> > > > > > # /permabit/user is mounted from the onstor
> > > > > > b=/permabit/user/trg/testing-trg
> > > > > > # /u1 is a scratch area on local disk
> > > > > > r=/u1/deleteme.`hostname`.random
> > > > > > z=/u1/deleteme.`hostname`.zero
> > > > > > t=$b/deleteme.`hostname`.target
> > > > > > dd if=/dev/urandom of=$r count=1000000 bs=1024
> > > > > > dd if=/dev/zero of=$z count=1 bs=1024
> > > > > > mkdir $b ; cd $b
> > > > > > for x in `seq 1 10000` ; do
> > > > > >   echo "Time number: $x"
> > > > > >   echo "copy 1 `date`"
> > > > > >   time cp $r $t
> > > > > >   echo "diff 1 `date`"
> > > > > >   time diff $r $t
> > > > > >   if [ $? != 0 ] ; then
> > > > > >    echo broken
> > > > > >   fi
> > > > > >   echo "copy 2 `date`"
> > > > > >   time cp $z $t
> > > > > >   echo "diff 2 `date`"
> > > > > >   time diff $z $t
> > > > > >   if [ $? != 0 ] ; then
> > > > > >    echo broken
> > > > > >   fi
> > > > > > done 2>&1 |tee $b/log.`hostname`
> > > > > >
> > > > >
