AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080717151241.4126e999@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<john.rogers@onstor.com>,<ed.kwan@onstor.com>,<vikas.saini@onstor.com>,<paul.hammer@onstor.com>,<raj.kumar@onstor.com>,<brian.nguyen@onstor.com>,<jonathan.goldick@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E09C42773@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 17 Jul 2008 15:12:49 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: "John Rogers" <john.rogers@onstor.com>
Cc: "Ed Kwan" <ed.kwan@onstor.com>, "Vikas Saini" <vikas.saini@onstor.com>,
 "Paul Hammer" <paul.hammer@onstor.com>, "Raj Kumar" <raj.kumar@onstor.com>,
 "Brian Nguyen" <brian.nguyen@onstor.com>, "Jonathan Goldick"
 <jonathan.goldick@onstor.com>
Subject: Re: Permabit script for corruption
Message-ID: <20080717151249.4c8e97b5@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E09C42773@onstor-exch02.onstor.net>
References: <20080717101909.08a88175@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E09C42773@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 17 Jul 2008 15:03:50 -0700 "John Rogers"
<john.rogers@onstor.com> wrote:

> I initially didn't understand what you were talking about here. Let me
> get this straight: you were running something called "Permabit script
> for corruption" against mightydog?
> 
> Is the intention of this script to cause corruption?

It's not a malicious script, if that's what you're wondering.  It
merely copies a file to the server and compares it to the original,
then copies a different file to the server and compares that to the
original, in a loop.  As far as I can tell, the compare should never
fail.  The script source is in this thread, if you scroll down far
enough.  It's quite trivial.
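If anyone wants to chase the 3% figure, a stricter version of the same loop might help. This is a minimal sketch and not Permabit's script. The function name verify_loop and the .bad.N file naming are hypothetical. It uses cmp instead of diff. Rather than printing "broken" and carrying on, it stops at the first mismatch and keeps the bad copy.

```shell
#!/bin/sh
# Hypothetical hardened variant of the copy/compare loop (not the
# script Permabit sent). Usage: verify_loop SRC_A SRC_B TARGET ITERATIONS
# Alternately copies SRC_A and SRC_B onto TARGET and byte-compares each
# copy with its source. Returns 0 if every comparison matches, 1 on the
# first mismatch, 2 if cp fails.
verify_loop() {
    src_a=$1 src_b=$2 target=$3 iterations=$4
    i=1
    while [ "$i" -le "$iterations" ]; do
        for src in "$src_a" "$src_b"; do
            echo "iteration $i: copy $src $(date)"
            if ! cp "$src" "$target"; then
                echo "cp failed on iteration $i" >&2
                return 2
            fi
            # cmp -s prints nothing; exit status 0 means identical files.
            if ! cmp -s "$src" "$target"; then
                # Rename rather than re-copy, so the bad file is kept as-is.
                mv "$target" "$target.bad.$i"
                echo "broken: copy of $src differs, kept as $target.bad.$i" >&2
                # cmp -l lists each differing byte: offset (decimal), then
                # both byte values (octal). Show only the first 10.
                cmp -l "$src" "$target.bad.$i" | head -n 10 >&2
                return 1
            fi
        done
        i=$((i + 1))
    done
    return 0
}
```

With the same files as the original script, the call would be something like:

verify_loop /u1/deleteme.`hostname`.random /u1/deleteme.`hostname`.zero /permabit/user/trg/testing-trg/deleteme.`hostname`.target 10000

One caveat applies to the original script as well. If /permabit/user is an NFS mount, the compare right after the copy may be served from the client's page cache rather than read back from the server, so a clean run does not prove what is on disk. Dropping caches between the copy and the compare (as root on Linux: echo 3 > /proc/sys/vm/drop_caches), or remounting, would force the read to go to the server.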

> -----Original Message-----
> From: Andy Sharp 
> Sent: Thursday, July 17, 2008 10:19 AM
> To: Ed Kwan
> Cc: Vikas Saini; Paul Hammer; Raj Kumar; Brian Nguyen; Jonathan
> Goldick; John Rogers
> Subject: Re: Permabit script for corruption
> 
> The "cp" command that shows up last in the log file is still hung on my
> system, if anyone cares to take a look.  Talk about a "performance"
> problem...
> 
> On Thu, 17 Jul 2008 10:11:07 -0700 "Ed Kwan" <ed.kwan@onstor.com>
> wrote:
> 
> > Brian and I have been running the script against 3.1.1.3 and 3.2.0.6
> > since last night and yesterday evening, respectively, and we haven't
> > seen any problems yet.
> > 
> > Also keep in mind that Permabit is seeing I/O retries and high evm
> > times, plus 4 disk failures in the past 2 months.  CS is planning to
> > replace some of the DotHill array hardware.
> > 
> > > -----Original Message-----
> > > From: Andy Sharp
> > > Sent: Thursday, July 17, 2008 9:59 AM
> > > To: Vikas Saini
> > > Cc: Paul Hammer; Raj Kumar; Brian Nguyen; Ed Kwan; Jonathan Goldick
> > > Subject: Re: Permabit script for corruption
> > > 
> > > Cuz it's all I got.  Who's gonna care at 3am?  Anyway, it seemed
> > > to hang up after just a few iterations.  It was terminated on
> > > schedule a couple hours later.
> > > 
> > > 
> > > $ cat /home/andy/log.ripper
> > > +++ seq 1 10000
> > > ++ for x in '`seq 1 10000`'
> > > ++ echo 'Time number: 1'
> > > Time number: 1
> > > +++ date
> > > ++ echo 'copy random to onstor Thu Jul 17 03:22:50 PDT 2008'
> > > copy random to onstor Thu Jul 17 03:22:50 PDT 2008
> > > ++ cp /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > 
> > > real	5m1.846s
> > > user	0m0.006s
> > > sys	0m2.248s
> > > +++ date
> > > ++ echo 'diff 1 Thu Jul 17 03:27:52 PDT 2008'
> > > diff 1 Thu Jul 17 03:27:52 PDT 2008
> > > ++ diff /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > 
> > > real	4m45.411s
> > > user	0m1.166s
> > > sys	0m5.376s
> > > ++ '[' 0 '!=' 0 ']'
> > > +++ date
> > > ++ echo 'copy zero to onstor Thu Jul 17 03:32:38 PDT 2008'
> > > copy zero to onstor Thu Jul 17 03:32:38 PDT 2008
> > > ++ cp /u1/deleteme.ripper.zero ./deleteme.ripper.target
> > > 
> > > real	0m0.367s
> > > user	0m0.000s
> > > sys	0m0.167s
> > > +++ date
> > > ++ echo 'diff 2 Thu Jul 17 03:32:38 PDT 2008'
> > > diff 2 Thu Jul 17 03:32:38 PDT 2008
> > > ++ diff /u1/deleteme.ripper.zero ./deleteme.ripper.target
> > > 
> > > real	0m0.002s
> > > user	0m0.000s
> > > sys	0m0.001s
> > > ++ '[' 0 '!=' 0 ']'
> > > ++ for x in '`seq 1 10000`'
> > > ++ echo 'Time number: 2'
> > > Time number: 2
> > > +++ date
> > > ++ echo 'copy random to onstor Thu Jul 17 03:32:38 PDT 2008'
> > > copy random to onstor Thu Jul 17 03:32:38 PDT 2008
> > > ++ cp /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > 
> > > real	4m39.908s
> > > user	0m0.005s
> > > sys	0m2.166s
> > > +++ date
> > > ++ echo 'diff 1 Thu Jul 17 03:37:18 PDT 2008'
> > > diff 1 Thu Jul 17 03:37:18 PDT 2008
> > > ++ diff /u1/deleteme.ripper.random ./deleteme.ripper.target
> > > 
> > > real	4m27.721s
> > > user	0m1.166s
> > > sys	0m3.643s
> > > ++ '[' 0 '!=' 0 ']'
> > > +++ date
> > > ++ echo 'copy zero to onstor Thu Jul 17 03:41:46 PDT 2008'
> > > copy zero to onstor Thu Jul 17 03:41:46 PDT 2008
> > > ++ cp /u1/deleteme.ripper.zero ./deleteme.ripper.target
> > > 
> > > 
> > > That's as far as it got.
> > > 
> > > 
> > > 
> > > 
> > > On Thu, 17 Jul 2008 07:50:13 -0700 "Vikas Saini"
> > > <vikas.saini@onstor.com> wrote:
> > > 
> > > > Why against mightydog? We should not use mightydog as our
> > > > testing machine. We can run it against CS if needed.
> > > >
> > > >
> > > > Vikas
> > > >
> > > > -----Original Message-----
> > > > From: Andy Sharp
> > > > Sent: Thu 7/17/2008 3:33 AM
> > > > To: Paul Hammer
> > > > Cc: Raj Kumar; Vikas Saini; Brian Nguyen; Ed Kwan; Jonathan Goldick
> > > > Subject: Re: Permabit script for corruption
> > > >
> > > > Here is the unmangled script, which I'm running against MD right
> > > > now and for the next couple of hours:
> > > >
> > > > # /permabit/user is mounted from the onstor
> > > > b=/permabit/user/trg/testing-trg
> > > > # /u1 is a scratch area on local disk
> > > > r=/u1/deleteme.`hostname`.random
> > > > z=/u1/deleteme.`hostname`.zero
> > > > t=$b/deleteme.`hostname`.target
> > > > dd if=/dev/urandom of=$r count=1000000 bs=1024
> > > > dd if=/dev/zero of=$z count=1 bs=1024
> > > > mkdir $b ; cd $b
> > > > for x in `seq 1 10000` ; do
> > > >   echo "Time number: $x"
> > > >   echo "copy 1 `date`"
> > > >   time cp $r $t
> > > >   echo "diff 1 `date`"
> > > >   time diff $r $t
> > > >   if [ $? != 0 ] ; then
> > > >    echo broken
> > > >   fi
> > > >   echo "copy 2 `date`"
> > > >   time cp $z $t
> > > >   echo "diff 2 `date`"
> > > >   time diff $z $t
> > > >   if [ $? != 0 ] ; then
> > > >    echo broken
> > > >   fi
> > > > done 2>&1 |tee $b/log.`hostname`
> > > >
> > > >
> > > >
> > > > On Wed, 16 Jul 2008 20:59:45 -0700 "Paul Hammer"
> > > > <paul.hammer@onstor.com> wrote:
> > > >
> > > > > Hi Guys,
> > > > >
> > > > > If this really causes corruption, we have to figure it out now.
> > > > > Can one of you take the action item to run the script and see
> > > > > whether it really causes corruption with EverON? Please let me
> > > > > know who is taking on this task. I only want us to run this script
> > > > > against 4.0/3.3.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -Paul
> > > > >
> > > > > -----Original Message-----
> > > > > From: Rich LaReau
> > > > > Sent: 2008-07-16 13:58
> > > > > To: dl-esc-l3
> > > > > Subject: Permabit script for corruption
> > > > >
> > > > >
> > > > > Hi Ed and team,
> > > > >
> > > > > This is the script that Permabit says they can use to generate
> > > > > corruption.  I'll post a copy to the case and to the
> > > > > associated defect.
> > > > >
> > > > > Rich
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Caeli Collins
> > > > > Sent: Wednesday, July 16, 2008 1:37 PM
> > > > > To: Rich LaReau
> > > > > Subject: FW: script to reproduce the problem
> > > > >
> > > > > This is what I got
> > > > >
> > > > >
> > > > > Caeli
> > > > >
> > > > > -----Original Message-----
> > > > > From: Tracy Gangwer [mailto:trg@permabit.com]
> > > > > Sent: Wednesday, July 16, 2008 13:28
> > > > > To: Caeli Collins
> > > > > Cc: Clint McVey
> > > > > Subject: script to reproduce the problem
> > > > >
> > > > > Caeli -
> > > > >
> > > > > Below is the script we discussed on the phone.  We are seeing
> > > > > about a 3% failure rate.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > trg
> > > > >
> > > > > # /permabit/user is mounted from the onstor
> > > > > b=/permabit/user/trg/testing-trg
> > > > > # /u1 is a scratch area on local disk
> > > > > r=/u1/deleteme.`hostname`.random
> > > > > z=/u1/deleteme.`hostname`.zero
> > > > > t=$b/deleteme.`hostname`.target
> > > > > dd if=/dev/urandom of=$r count=1000000 bs=1024
> > > > > dd if=/dev/zero of=$z count=1 bs=1024
> > > > > mkdir $b ; cd $b
> > > > > for x in `seq 1 10000` ; do
> > > > >   echo "Time number: $x"
> > > > >   echo "copy 1 `date`"
> > > > >   time cp $r $t
> > > > >   echo "diff 1 `date`"
> > > > >   time diff $r $t
> > > > >   if [ $? != 0 ] ; then
> > > > >    echo broken
> > > > >   fi
> > > > >   echo "copy 2 `date`"
> > > > >   time cp $z $t
> > > > >   echo "diff 2 `date`"
> > > > >   time diff $z $t
> > > > >   if [ $? != 0 ] ; then
> > > > >    echo broken
> > > > >   fi
> > > > > done 2>&1 |tee $b/log.`hostname`
> > > > >
> > > >
