X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C8840E.A7BE6148@onstor-exch02.onstor.net>; Tue, 11 Mar 2008 23:59:45 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C8840E.A7BE6148"
Content-class: urn:content-classes:message
Subject: testing status of new branch
Date: Tue, 11 Mar 2008 23:59:54 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E08D2A2B3@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: testing status of new branch
Thread-Index: AciEDqz17SM225xWRL2fVoVc9Iq8HQ==
From: "Jonathan Goldick" <jonathan.goldick@onstor.com>
To: "dl-Cougar" <dl-Cougar@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C8840E.A7BE6148
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Here are the problems we are hitting and the progress on resolution:
1.	TED00022751 SCSI Timers are not SMP safe - Bill has coded up a
change to make the Cougar SCSI timer code take locks.  Tim will review
it and get it into the branch.  This bug has been causing dump/restore
to crash about 25% of the time.
2.	We had a crash in dump which looks like a double free of an
e-descriptor in the tape write path.  The root cause remains to be
identified.  Mike Lee fixed a similar issue about a month ago.
3.	There is a significant slowdown that happens under a combined
restore load, core dump copy to mgmt volume, and a dd write.  There is
little dirty data, but we are spending a large amount of time waiting
for file system locks.  Basically there is some throttle that I have
not found in my testing.  In the interim I have checked in a change
that defaults the I/O coalescing for log writes to the 'dev' branch
behavior.  When Tim and I turned the knob to this setting on a machine
that was crawling, it started running fast again.  I will continue to
work this issue in my fb-jong-perf2 branch after the submittal is done.
A QA resource will be needed to reproduce the problem, and I could use
Jobi and/or Amit's help to find the cause of the slowdown.
=09
I think that only number 1 above is blocking the submittal at this
point.


------_=_NextPart_001_01C8840E.A7BE6148
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
6.5.7653.38">
<TITLE>testing status of new branch</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/rtf format -->

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial">H</FONT><FONT SIZE=3D2 FACE=3D"Arial">ere are =
the problems we are hitting and the progress on =
resolution:</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> =
<FONT SIZE=3D2 FACE=3D"Courier New">TED00022751</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Courier New"></FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"> <FONT SIZE=3D2 FACE=3D"Courier New">SCSI Timers are not =
SMP safe</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Courier New"></FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Courier New">&#8211;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Courier New"> Bill</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Courier New">has coded up a change to make the Cougar SCSI timer =
code take locks.&nbsp;</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"> <FONT SIZE=3D2 FACE=3D"Courier New">Tim</FONT> <FONT =
SIZE=3D2 FACE=3D"Courier New">will</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Courier New"> review it</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Courier New"> and get it into the branch</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Courier New">.</FONT><FONT SIZE=3D2 FACE=3D"Courier New">&nbsp; =
This bug has been causing dump/restore to crash about 25% of the =
time</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Courier New">.</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">2.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">We had a crash in dump which looks like a double free of =
an e-descriptor</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial"> in the tape write =
path</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial">.&nbsp; The root cause remains to be =
identified.&nbsp; Mike Lee fixed a similar issue</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">about a month ago.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">3.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">There is a significant</FONT> <FONT SIZE=3D2 =
FACE=3D"Arial">slowdown that happens under a combined restore load, core =
dump copy to mgmt volume, and a dd write.&nbsp; There</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">is little dirty data, but we are spending a large amount =
of time waiting for file system locks.&nbsp; Basically there is some =
throttle that I have not found in my =
testing.&nbsp; In the interim I have checked in a change that defaults =
the I/O coalescing for log writes to the</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">&#8216;</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">dev</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial"> branch behavior.&nbsp; =
When Tim and I turned the knob to this setting on a machine that was =
crawling, it started running</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"> <FONT SIZE=3D2 FACE=3D"Arial">fast again.&nbsp; I will =
continue to work this issue in my fb-jong-perf2 branch after the =
submittal is done.&nbsp; A QA resource</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">w</FONT><FONT SIZE=3D2 =
FACE=3D"Arial">ill</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial"> be needed =
to</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial"></FONT> <FONT SIZE=3D2 FACE=3D"Arial">reproduce =
the problem,</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial"></FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"> <FONT SIZE=3D2 =
FACE=3D"Arial">a</FONT><FONT SIZE=3D2 FACE=3D"Arial">nd I could use Jobi =
and/or Amit</FONT></SPAN><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"><FONT SIZE=3D2 FACE=3D"Arial">&#8217;</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">s help to find the cause of the slow</FONT></SPAN><SPAN =
LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT SIZE=3D2 =
FACE=3D"Arial">down.</FONT></SPAN></P>
<UL DIR=3DLTR>
<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>
</UL>
<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN LANG=3D"en-us"><FONT =
SIZE=3D2 FACE=3D"Arial">I think that only number 1 above is blocking the =
submittal at this point.</FONT></SPAN></P>

<P DIR=3DLTR><SPAN LANG=3D"en-us"></SPAN><SPAN =
LANG=3D"en-us"></SPAN></P>

</BODY>
</HTML>
------_=_NextPart_001_01C8840E.A7BE6148--
