X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C7665B.C3C93164@onstor-exch02.onstor.net>; Wed, 14 Mar 2007 10:11:11 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C7665B.C3C93164"
Content-class: urn:content-classes:message
Subject: RE: False ECC errors
Date: Wed, 14 Mar 2007 10:11:11 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E02D8F37D@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E02D8F379@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: False ECC errors
Thread-Index: AcdmWBVGnOpHDdxeREy2hMOjobI4YwAAW/HAAACJicA=
References: <BB375AF679D4A34E9CA8DFA650E2B04E02D8F343@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E02D8F379@onstor-exch02.onstor.net>
From: "Jonathan Goldick" <jonathan.goldick@onstor.com>
To: "Brian Stark" <brian.stark@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C7665B.C3C93164
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

It wasn't Andy, but I don't remember who else it could have been.  Will
try to track down where the idea started :-)


_____________________________________________
From: Brian Stark
Sent: Wednesday, March 14, 2007 10:10 AM
To: Jonathan Goldick; Andy Sharp
Subject: RE: False ECC errors

Wow, I haven't heard anything about this.  For ECC errors on the SiByte,
we are looking at the uncorrectable error counter on the SiByte itself.
Does this have anything to do with an invalid pointer access?  Can this
counter be incremented for a reason other than a real ECC error?

This is definitely something we need to get to the bottom of.  We got
the system back from Facebook that reported several ECC errors that were
thought to be real because of the SiByte counter, but we have yet to
find anything wrong with it in the hardware lab.  The tests we are
running are designed specifically to tickle ECC errors, and we've yet to
see a system that experienced ECC errors in normal operation and then
didn't have them with this test.

I'm starting to worry that this counter is either wrong or that
environmental influences at some customer sites are causing real ECC
errors.  Obviously, neither of these is good.


Brian


	_____________________________________________
	From: 	Jonathan Goldick
	Sent:	Wednesday, March 14, 2007 9:45 AM
	To:	Andy Sharp
	Cc:	Brian Stark
	Subject:	False ECC errors

	Andy,

	I seem to remember you mentioning that we report an ECC error
when in reality it is an invalid pointer access.  Please confirm, since
we are still RMA'ing boxes for ECC errors that may not be real.

	Thanks,

	Jonathan

------_=_NextPart_001_01C7665B.C3C93164--
