X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C7665B.8D2801B2@onstor-exch02.onstor.net>; Wed, 14 Mar 2007 10:09:39 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C7665B.8D2801B2"
Content-class: urn:content-classes:message
Subject: RE: False ECC errors
Date: Wed, 14 Mar 2007 10:09:39 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E02D8F379@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E02D8F343@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: False ECC errors
Thread-Index: AcdmWBVGnOpHDdxeREy2hMOjobI4YwAAW/HA
References: <BB375AF679D4A34E9CA8DFA650E2B04E02D8F343@onstor-exch02.onstor.net>
From: "Brian Stark" <brian.stark@onstor.com>
To: "Jonathan Goldick" <jonathan.goldick@onstor.com>,
	"Andy Sharp" <andy.sharp@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C7665B.8D2801B2
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Wow, I haven't heard anything about this.  For ECC errors on the SiByte,
we are looking at the uncorrectable error counter on the SiByte itself.
Does this have anything to do with an invalid pointer access?  Can this
counter be incremented for a reason other than a real ECC error?

This is definitely something we need to get to the bottom of.  We got
back the system from Facebook that reported several ECC errors, which
were thought to be real because of the SiByte counter, but we have yet
to find anything wrong with it in the hardware lab.  The tests we are
running are designed specifically to tickle ECC errors, and we've yet to
see a system that experienced ECC errors in normal operation and then
didn't have them with this test.

I'm starting to worry that this counter is either wrong or that
environmental influences at some customer sites are causing real ECC
errors.  Obviously, neither of these is good.


Brian


> _____________________________________________
> From: 	Jonathan Goldick
> Sent:	Wednesday, March 14, 2007 9:45 AM
> To:	Andy Sharp
> Cc:	Brian Stark
> Subject:	False ECC errors
>
> Andy,
>
> I seem to remember you mentioning that we report an ECC error when in
> reality this is an invalid pointer access.  Please confirm since we
> are still RMA'ing boxes for ECC errors that may not be real.
>
> Thanks,
>
> Jonathan

------_=_NextPart_001_01C7665B.8D2801B2--
