X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C880F1.46656EA8@onstor-exch02.onstor.net>; Sat, 8 Mar 2008 00:51:53 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C880F1.46656EA8"
Content-class: urn:content-classes:message
Subject: RE: in-branch testing status for fb-jong-perf2
Date: Sat, 8 Mar 2008 00:52:02 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E08C0F554@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E08C0F542@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: in-branch testing status for fb-jong-perf2
Thread-Index: AciA0GA6mppkfybfRe2ItgQQzYJ9lwAH371g
References: <BB375AF679D4A34E9CA8DFA650E2B04E08C0F542@onstor-exch02.onstor.net>
From: "Jonathan Goldick" <jonathan.goldick@onstor.com>
To: "dl-Cougar" <dl-Cougar@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C880F1.46656EA8
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Status update:

Dump/Restore to/from SAN tape via ndmpc is working now.

What doesn't work yet in decreasing priority:
1.	When we abort an in-progress dump we appear to be leaking a scsi
descriptor, causing an ASSERT in my resource leak detection logic.  This
may be a bug in my stats.
2.	I fail one of my underflow sanity checks when we take a file
system exception.  I'm not sure if it's a true double free or a bug in
my stats.
3.	The I/O coalescing knob for log writes does not work when set
beyond 32; it starts sending smaller chunks.  I can reproduce this
without Cougar.  I have set the defaults to be 32 since that works
properly on all platforms.
4.	My scsi resource leak detection logic doesn't work for Cougar so
I've ifdef'd it out for that platform.  I will revisit it later on.

On a related topic to the above, Tim observed that the way we abort NDMP
sessions probably won't be safe for Cougar.  We may be freeing
edescriptor(s) while there still could be I/O(s) outstanding to scsi.
This needs investigation but it is not related to my changes so doesn't
prevent this code from being integrated should it pass the remaining
tests.



_____________________________________________
From: Jonathan Goldick=20
Sent: Friday, March 07, 2008 7:56 PM
To: dl-Cougar
Subject: in-branch testing status for fb-jong-perf2

Raj and Sandrine have helped me make some real progress in shaking out
the Cougar-specific bugs.

It would appear that our LUN labeling and missing LUN problems are
resolved as well as the device id mismatch that Tim was working on.  Cougar
soak seems to work reliably.

We can run nfsperftest with a variety of load profiles and I/O(s) are
being coalesced.

What doesn't work yet in decreasing priority:
5.	NDMP dump is failing early on so I broke tape I/O on Bobcat.
Raj has a scsi trace but I will likely need help from Tim.
6.	I fail one of my underflow sanity checks when we take a file
system exception.  I'm not sure if it's a true double free or a bug in
my stats.
7.	The I/O coalescing knob for log writes does not work when set
beyond 32; it starts sending smaller chunks.  I can reproduce this
without Cougar.  I have set the defaults to be 32 since that works
properly on all platforms.
8.	My scsi resource leak detection logic doesn't work for Cougar so
I've ifdef'd it out for that platform.  I will revisit it later on.



------_=_NextPart_001_01C880F1.46656EA8
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
6.5.7653.38">
<TITLE>RE: in-branch testing status for fb-jong-perf2</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/rtf format -->

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">S</FONT><FONT =
COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">tatus</FONT> <FONT =
COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">update:</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial">Dump</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial">/Restore</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial"></FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"> <FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial">to</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial">/from</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial"> SAN tape =
via ndmpc</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> =
<FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">is working =
now.</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial">What doesn</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 FACE=3D"Arial">t work yet in =
decreasing priority:</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">W</FONT><FONT SIZE=3D2 FACE=3D"Arial">hen we abort =
an</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT =
SIZE=3D2 FACE=3D"Arial">in-progress dump we appear to be leaking a scsi =
descriptor, causing an ASSERT</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial"> in my resource leak detection logic</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">.</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">&nbsp; This may be a bug in =
my stats.</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">2.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">I fail one of my underflow sanity checks when we take a =
file system exception.&nbsp; I</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 FACE=3D"Arial">m not sure if =
it</FONT><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">s a true double free or a bug in my =
stats.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">3.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">The I/O coalescing</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">knob for log writes does not work when set beyond 32; it =
starts sending smaller chunks.&nbsp; I can reproduce this without =
Cougar.&nbsp; I have set the defaults to be 32 since that works properly =
on all platforms.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">4.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">My scsi resource leak detection logic doesn</FONT><FONT =
SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 FACE=3D"Arial">t =
wor</FONT><FONT SIZE=3D2 FACE=3D"Arial">k for Cougar so I</FONT><FONT =
SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 FACE=3D"Arial">ve =
ifdef</FONT><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">d it out for that platform.&nbsp; I will revisit it later =
on</FONT><FONT SIZE=3D2 FACE=3D"Arial">.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">On a related topic to the =
above, Tim observed that the way we abort NDMP sessions probably =
won</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT COLOR=3D"#000080" =
SIZE=3D2 FACE=3D"Arial">t be safe for Cougar.&nbsp; =
We</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT =
COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">m</FONT><FONT =
COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">ay be</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT COLOR=3D"#000080" =
SIZE=3D2 FACE=3D"Arial"> freeing</FONT> <FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial">e</FONT><FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial">descriptor</FONT><FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial">(s)</FONT><FONT COLOR=3D"#000080" SIZE=3D2 =
FACE=3D"Arial"></FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"> <FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">while =
there still could be I/O(s) outstanding to scsi.</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT COLOR=3D"#000080" =
SIZE=3D2 FACE=3D"Arial">&nbsp; This needs investigation but it is not =
related to my changes so doesn</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT COLOR=3D"#000080" =
SIZE=3D2 FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT COLOR=3D"#000080" =
SIZE=3D2 FACE=3D"Arial">t prevent this code from being integrated should =
it pass the remaining tests.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 =
FACE=3D"Tahoma">_____________________________________________<BR>
</FONT></SPAN><SPAN LANG=3D"en-us"><B></B></SPAN><SPAN =
LANG=3D"en-us"><B><FONT SIZE=3D2 =
FACE=3D"Tahoma">From:</FONT></B></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Tahoma"> Jonathan Goldick<BR>
</FONT></SPAN><SPAN LANG=3D"en-us"><B></B></SPAN><SPAN =
LANG=3D"en-us"><B><FONT SIZE=3D2 =
FACE=3D"Tahoma">Sent:</FONT></B></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Tahoma"> Friday, March 07, 2008 =
7:56 PM<BR>
</FONT></SPAN><SPAN LANG=3D"en-us"><B></B></SPAN><SPAN =
LANG=3D"en-us"><B><FONT SIZE=3D2 =
FACE=3D"Tahoma">To:</FONT></B></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Tahoma"> dl-Cougar<BR>
</FONT></SPAN><SPAN LANG=3D"en-us"><B></B></SPAN><SPAN =
LANG=3D"en-us"><B><FONT SIZE=3D2 =
FACE=3D"Tahoma">Subject:</FONT></B></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Tahoma"></FONT> <FONT SIZE=3D2 FACE=3D"Tahoma">in-branch testing =
status for fb-jong-perf2</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial">Raj and Sandrine have helped me make some real =
progress in shaking out the Cougar-specific bugs.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">It would =
appear that our LUN labeling and missing LUN problems are resolved as =
well as</FONT><FONT SIZE=3D2 FACE=3D"Arial"> the device id mismatch that =
Tim was working on.&nbsp; Cougar soak seems to work =
reliably.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">We can =
run nfsperftest with a variety of load profiles and I/O(s) are being =
coalesced.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">What =
doesn</FONT><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">t work yet in decreasing priority:</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">5.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">NDMP dump is failing early on so</FONT><FONT SIZE=3D2 =
FACE=3D"Arial"> I broke tape I/O on Bobcat.&nbsp; Raj has a scsi trace =
but I will likely need help from Tim.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">6.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">I fail one of my underflow sanity checks when we take a =
file system exception.&nbsp; I</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 FACE=3D"Arial">m not sure if =
it</FONT><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">s a true double free or a bug in my =
stats.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">7.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">The I/O coalescing</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">knob for log writes does not work when set beyond 32; it =
starts sending smaller chunks.&nbsp; I can reproduce this without =
Cougar.&nbsp; I have set the defaults to be 32 since that works properly =
on all platforms.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">8.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">My scsi resource leak detection logic doesn</FONT><FONT =
SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 FACE=3D"Arial">t =
wor</FONT><FONT SIZE=3D2 FACE=3D"Arial">k for Cougar so I</FONT><FONT =
SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 FACE=3D"Arial">ve =
ifdef</FONT><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">d it out for that platform.&nbsp; I will revisit it later =
on.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

</BODY>
</HTML>
------_=_NextPart_001_01C880F1.46656EA8--
