X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C867B7.607C0CB8@onstor-exch02.onstor.net>; Mon, 4 Feb 2008 22:24:27 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C867B7.607C0CB8"
References: <BB375AF679D4A34E9CA8DFA650E2B04E0812B3FF@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E0812B405@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E0812B708@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E0812B70D@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E0826481F@onstor-exch02.onstor.net>
Content-class: urn:content-classes:message
Subject: RE: 3.2 corruption update
Date: Mon, 4 Feb 2008 22:23:29 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E2AD95D@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: 3.2 corruption update
Thread-Index: Achmt8jM3FGfeB2VSrWjodNwBbWifwAVIiiAABPrViAAAIIZoAAAFptwAAA36iAAAH0vcAAAd3BAAAAJz+AAANGg4AAAsotgAAAchrAADBY0oAAADOxwAAHEYKAABXiSzA==
From: "Eric Barrett" <eric.barrett@onstor.com>
To: "Ed Kwan" <ed.kwan@onstor.com>,
	"Jonathan Goldick" <jonathan.goldick@onstor.com>,
	"Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>,
	"Paul Hammer" <paul.hammer@onstor.com>,
	"Danqing Jin" <danqing.jin@onstor.com>,
	"Brian Nguyen" <brian.nguyen@onstor.com>
Cc: "Andy Sharp" <andy.sharp@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C867B7.607C0CB8
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

I'd like to do one more test before we send it out, on the latest =
kernel.org build.  I can help build and install such a beast tomorrow.


-----Original Message-----
From: Ed Kwan
Sent: Mon 2/4/2008 6:48 PM
To: Jonathan Goldick; Maxim Kozlovsky; Paul Hammer; Danqing Jin; Brian =
Nguyen
Cc: Andy Sharp; Eric Barrett
Subject: RE: 3.2 corruption update
=20
Adding Eric.

He expressed an interest in following up on this.

=20

________________________________

From: Jonathan Goldick=20
Sent: Monday, February 04, 2008 5:57 PM
To: Maxim Kozlovsky; Paul Hammer; Ed Kwan; Danqing Jin; Brian Nguyen
Cc: Andy Sharp
Subject: RE: 3.2 corruption update

=20

Let's package it up and see what the Linux community says on this point, =
assuming there is no fix already available.

=20

CC'ing Andy since he reads more Linux news than I do.

=20

________________________________

From: Maxim Kozlovsky=20
Sent: Monday, February 04, 2008 5:55 PM
To: Paul Hammer; Ed Kwan; Danqing Jin; Brian Nguyen
Cc: Jonathan Goldick
Subject: RE: 3.2 corruption update

=20

I've checked that Linux clients to Linux server have the same problem.

=20

________________________________

From: Paul Hammer=20
Sent: Monday, February 04, 2008 12:10 PM
To: Maxim Kozlovsky; Ed Kwan; Danqing Jin; Brian Nguyen
Cc: Jonathan Goldick
Subject: RE: 3.2 corruption update

=20

Sounds good. Danqing and Brian N, can you please do this ASAP?

=20

________________________________

From: Maxim Kozlovsky=20
Sent: Monday, February 04, 2008 12:08 PM
To: Maxim Kozlovsky; Paul Hammer; Ed Kwan; Danqing Jin
Cc: Jonathan Goldick
Subject: RE: 3.2 corruption update

=20

I'd like to see the 3.2 corruption where non-null characters were =
inserted reproduced while collecting a network trace on all the clients. =
I want to confirm that no client has written the bad data on its own =
initiative.

=20

________________________________

From: Maxim Kozlovsky=20
Sent: Monday, February 04, 2008 11:53 AM
To: Paul Hammer; Ed Kwan; Danqing Jin
Cc: Jonathan Goldick
Subject: RE: 3.2 corruption update

=20

The null thing appears to be a client bug. I have a trace that shows =
it: /homes/maximk/tmp/f1.eth.

=20

In the resulting file we had a sequence of 7 nulls at offset 19292. =
Here is what the client did:

=20

Frames 4640-4643 - the client wrote 2908 bytes at offset 16384, =
resulting in file length 19292

=20

Frames 4644-4647 - a couple of getattr calls, still returning file =
length 19292

=20

Frames 4648-4649 - an access call, returning file length 19299. The =
file had been updated by another client, which advanced the mtime/ctime.

=20

Frame 4652 - now, instead of reading back the new data, the client pads =
the missing data at 19292-19299 with zeroes and proceeds to write =
2922 bytes at offset 16384.

=20

I wonder whether the client could also pad the missing data with =
something other than zeroes. That would explain the 3.2 corruption.
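
The arithmetic in the trace above can be replayed with a minimal Python
sketch. It is an illustration only, not the client's actual code: the
function names, the fill bytes, and the assumption that the client was
appending 7 new bytes at the new end of file (inferred from the 2922-byte
write minus the 2908 cached bytes minus the 7 padding bytes) are all
hypothetical.

```python
PAGE_START = 16384  # offset of the cached region, from the trace
OLD_EOF = 19292     # file length after frames 4640-4643
NEW_EOF = 19299     # file length reported at frames 4648-4649


def server_write(data: bytearray, offset: int, payload: bytes) -> None:
    """Apply a WRITE to the server copy, growing the file if needed."""
    end = offset + len(payload)
    if end > len(data):
        data.extend(b"\0" * (end - len(data)))
    data[offset:end] = payload


def buggy_flush(cache: bytes, new_eof: int, appended: bytes) -> bytes:
    """Zero-fill the gap up to new_eof instead of re-reading it."""
    gap = new_eof - (PAGE_START + len(cache))
    return cache + b"\0" * gap + appended


def simulate() -> bytearray:
    """Replay the sequence of writes seen in the trace (hypothetical data)."""
    server = bytearray(b"a" * PAGE_START)
    # Client A writes 2908 bytes at 16384 and keeps them cached.
    cache_a = b"A" * (OLD_EOF - PAGE_START)
    server_write(server, PAGE_START, cache_a)
    # Client B appends 7 bytes; client A only learns the new length.
    server_write(server, OLD_EOF, b"B" * (NEW_EOF - OLD_EOF))
    # Client A appends 7 more bytes (assumed) using its stale cache.
    payload = buggy_flush(cache_a, NEW_EOF, b"C" * 7)
    server_write(server, PAGE_START, payload)
    return server
```

Running simulate() leaves 7 zero bytes at offsets 19292-19298, where the
other client's data used to be, and a total file length of 19306.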

=20

=20

________________________________

From: Paul Hammer=20
Sent: Monday, February 04, 2008 11:23 AM
To: Ed Kwan; Maxim Kozlovsky; Danqing Jin
Cc: Jonathan Goldick
Subject: RE: 3.2 corruption update

=20

Thanks. Please let me know what the next steps are; we need to keep =
meeting and planning multiple times a day until we close this one out.

=20

________________________________

From: Ed Kwan=20
Sent: Monday, February 04, 2008 11:21 AM
To: Paul Hammer; Maxim Kozlovsky; Danqing Jin
Cc: Jonathan Goldick
Subject: RE: 3.2 corruption update

=20

Discussing this right now in the Escalation Meeting.

=20

________________________________

From: Paul Hammer=20
Sent: Monday, February 04, 2008 11:10 AM
To: Maxim Kozlovsky; Ed Kwan; Danqing Jin
Cc: Jonathan Goldick
Subject: FW: 3.2 corruption update

=20

Could the nulls be a problem with the test code/application?

=20

________________________________

From: Danqing Jin=20
Sent: Monday, February 04, 2008 10:54 AM
To: Paul Hammer
Cc: Ed Kwan
Subject: RE: 3.2 corruption update

=20

Yes.  BTW, the corruption is more subtle than what we saw with 3.2.

=20

________________________________

From: Paul Hammer=20
Sent: Monday, February 04, 2008 10:48 AM
To: Danqing Jin
Cc: Ed Kwan
Subject: RE: 3.2 corruption update

Just to confirm, this was a clean file (new or uncorrupted), and now =
there are 10 instances of corruption? Please confirm ASAP.

=20

________________________________

From: Danqing Jin=20
Sent: Monday, February 04, 2008 10:47 AM
To: Paul Hammer
Cc: Ed Kwan
Subject: RE: 3.2 corruption update

=20

The test aborted around 4am because the VPN connection on my home PC =
was disconnected, but there are 10 or so sightings of NULLs inserted =
into the file.
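
Sightings like these can be counted mechanically. The sketch below is a
hypothetical helper, not the test used in this thread; it assumes the
test file is plain text that should never contain NUL bytes, and it
reports the offset and length of every run of NULs it finds.

```python
import re
from typing import List, Tuple


def find_null_runs(
    data: bytes, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return (offset, length) for each run of NUL bytes in data."""
    runs = []
    for match in re.finditer(rb"\x00+", data):
        length = match.end() - match.start()
        # Skip runs shorter than the requested minimum length.
        if length >= min_len:
            runs.append((match.start(), length))
    return runs
```

For a file copied back from the filer, find_null_runs(f.read()) on a
handle opened in binary mode would list each sighting.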

=20

________________________________

From: Paul Hammer=20
Sent: Monday, February 04, 2008 10:31 AM
To: Danqing Jin
Cc: Ed Kwan
Subject: RE: 3.2 corruption update

Thanks, how did the overnight testing go? I need an update as soon as =
you can get me one. Thanks,

=20

=20

-Paul

=20

________________________________

From: Danqing Jin=20
Sent: Monday, February 04, 2008 1:44 AM
To: Paul Hammer
Subject: RE: 3.2 corruption update

=20

Paul,

=20

Jobi talked to me this evening, and we both thought we should probably =
run the same test against a 3.1 system as well, as a reference (or as a =
blank test), which may give us one more data point.  So I started =
running the same test against my filer overnight.  Very early results =
seem to indicate that the issue we saw in sub 5 may also exist in 3.1, =
where some content is replaced by NULL characters (or the NULLs were =
inserted). I still need to let the test run through the night to get a =
larger sample.

=20

Thanks,

-Danqing-

=20

________________________________

From: Paul Hammer=20
Sent: Sunday, February 03, 2008 2:55 PM
To: Ed Kwan; Jobi Ariyamannil; Sandrine Boulanger; Brian Nguyen; John =
Rogers; Eric Barrett; Danqing Jin; Jonathan Goldick; Maxim Kozlovsky
Cc: Caeli Collins; Bob Miller
Subject: 3.2 corruption update

Hi All,

=20

Please send your latest status updates to me or this list by 8:30 =
tomorrow morning if you have any news or new understanding of the issue =
at hand. I want us to have the most current information so that Bob, =
Caeli and I can discuss our customer-facing options in the morning.

=20

Thanks all,

=20

-Paul



------_=_NextPart_001_01C867B7.607C0CB8--
