X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C85874.5009D9EC@onstor-exch02.onstor.net>; Wed, 16 Jan 2008 12:16:36 -0700
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C85874.5009D9EC"
Content-class: urn:content-classes:message
Subject: RE: TXRX optimized build
Date: Wed, 16 Jan 2008 12:16:36 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E07AE22D2@onstor-exch02.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E07AE2256@onstor-exch02.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: TXRX optimized build
Thread-Index: AchYZbKYKkj9SJsmTz2INn+nOUJNagAADpvgAAB4uEAAAA7KoAAALZ7wAAD0SLAAAeJoUA==
References: <BB375AF679D4A34E9CA8DFA650E2B04E07AE2184@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E07AE21A6@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E07AE21AC@onstor-exch02.onstor.net> <BB375AF679D4A34E9CA8DFA650E2B04E07AE21AF@onstor-exch02.onstor.net>  <BB375AF679D4A34E9CA8DFA650E2B04E07AE2256@onstor-exch02.onstor.net>
From: "Brian Stark" <brian.stark@onstor.com>
To: "Rick Lund" <rick.lund@onstor.com>,
	"Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>
Cc: "Andy Sharp" <andy.sharp@onstor.com>

This is a multi-part message in MIME format.

------_=_NextPart_001_01C85874.5009D9EC
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Makes sense.  We were hitting the retry limit of 128 when the SSC was
busy.

My only concern with infinite retries is that we'd get caught in an
endless loop.  Will the watchdog kick in if this happens?


Brian


> _____________________________________________=20
> From: 	Rick Lund =20
> Sent:	Wednesday, January 16, 2008 10:38 AM
> To:	Rick Lund; Maxim Kozlovsky
> Cc:	Brian Stark; Andy Sharp
> Subject:	RE: TXRX optimized build
>=20
> Disabling the ComplTimeout in the 1480 PCI cfg space didn't fix the
> problem.
>=20
> Disabling the RetryTimeout, however, did.
>=20
> Without the RetryTimeout change in the PCI Timeout Register (0xA4),
> when the error occurs, the Additional Status and Command Register
> (0x8C) has RetryErr bit set, which it should as described in the PCI
> Timeout Register.  This must be the problem.
>=20
> I'll add this config to the next prom, but for now, you can add the
> line "pci_cfg_write(0, 0, 0, 0xA4, 0xa0080)" to the pci init code on
> TXRX to fix the problem.
>=20
> The default Retry count is 128.  It would be interesting to see how
> many retries it is taking before completion.
>=20
> -Rick
>=20
> _____________________________________________
> From: Rick Lund=20
> Sent: Wednesday, January 16, 2008 9:56 AM
> To: Maxim Kozlovsky
> Cc: Brian Stark; Andy Sharp
> Subject: RE: TXRX optimized build
>=20
> There's a 1480 erratum about disabling the PCI timeout and therefore
> causing a retry of a master cycle over PCI.  Let me try that one.  If
> 1125 bus contention is the issue, this may fix the problem.
>=20
> -Rick
>=20
> _____________________________________________
> From: Maxim Kozlovsky=20
> Sent: Wednesday, January 16, 2008 9:50 AM
> To: Rick Lund
> Cc: Brian Stark; Andy Sharp
> Subject: RE: TXRX optimized build
>=20
> They use a different mechanism, but (almost) the same memory areas.
>=20
> _____________________________________________
> From: Rick Lund=20
> Sent: Wednesday, January 16, 2008 9:48 AM
> To: Maxim Kozlovsky
> Cc: Brian Stark; Andy Sharp
> Subject: RE: TXRX optimized build
>=20
> Are the messages (printf's) from the TXRX that are being delivered to
> the SSC instead of displayed on the local console using the ring
> buffers that are being initialized by the mgmtBus_init() routine?  Or
> do those messages use a different mechanism?
>=20
> _____________________________________________
> From: Maxim Kozlovsky=20
> Sent: Wednesday, January 16, 2008 9:45 AM
> To: Rick Lund
> Cc: Brian Stark; Andy Sharp
> Subject: RE: TXRX optimized build
>=20
> Rcon_init_done and rconInit() are not connected, in fact I should take
> out the rconInit() from cougar, it is for the older rcon interface
> which we don't need anymore.
>=20
> Looking at the consoles, it can be seen that the crash on TXRX
> happens when Linux is already starting to run the user processes,
> so the Linux PCI setup must be completed. I had the same crash in the
> debug build when the system was doing some work and it happened really
> late, which excludes the rcon_init and linux pci configuration. It is
> probably something else, but I'll take the workaround for now. My
> guess is that in the optimized build during the initialization we
> are creating dirty data faster than it can be flushed to the memory,
> so the memory controller and bus are busy when the messaging
> initialization happens and this somehow affects the pci accesses. By
> adding the delay you allow the dirty data to be flushed and this
> allows us to get through the initialization process. This of course is
> pure theory.
>=20
>=20
> _____________________________________________
> From: Rick Lund=20
> Sent: Wednesday, January 16, 2008 9:32 AM
> To: Maxim Kozlovsky
> Cc: Brian Stark; Andy Sharp
> Subject: TXRX optimized build
>=20
> Max,
>     I've worked around the TXRX optimized problem with a couple
> different changes.  One is adding a 5 second delay before the ring
> init in the mgmtBus_init code which is getting the bus error.  The
> second is moving the "rcon_init_done =3D 1" line in test.c to after =
the
> rconInit() call later in the startup sequence.  That one sounds like a
> more logical fix; not setting the flag which says we've initialized
> rcon until after the rconInit() function.  But I don't think that's
> the real problem, since it appears to work in the debug build and
> presumably on Bobcat.
>     My best guess is a conflict with the TXRX using the PCI bus to
> access SSC memory during the SSC's startup.  Linux is disabling the
> memory accesses from the PCI bus during startup?  Something like that?
> And by slightly changing the timing of our use of SSC memory by the
> TXRX, we delay long enough for the Linux PCI init to have been
> completed.
>=20
> -Rick

------_=_NextPart_001_01C85874.5009D9EC
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
6.5.7653.38">
<TITLE>RE: TXRX optimized build</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/rtf format -->

<P><FONT COLOR=3D"#0000FF" SIZE=3D2 FACE=3D"Arial">Makes sense.&nbsp; We =
were hitting the retry limit of 128 when the SSC was busy.</FONT>
</P>

<P><FONT COLOR=3D"#0000FF" SIZE=3D2 FACE=3D"Arial">My only concern with =
infinite retries is that we'd get caught in an endless loop.&nbsp; Will =
the watchdog kick in if this happens?</FONT></P>
<BR>

<P><FONT COLOR=3D"#0000FF" SIZE=3D2 FACE=3D"Arial">Brian</FONT>
</P>
<BR>
<UL>
<P><FONT SIZE=3D1 =
FACE=3D"Tahoma">_____________________________________________ </FONT>

<BR><B><FONT SIZE=3D1 FACE=3D"Tahoma">From: &nbsp;</FONT></B> <FONT =
SIZE=3D1 FACE=3D"Tahoma">Rick Lund&nbsp; </FONT>

<BR><B><FONT SIZE=3D1 FACE=3D"Tahoma">Sent:&nbsp;&nbsp;</FONT></B> <FONT =
SIZE=3D1 FACE=3D"Tahoma">Wednesday, January 16, 2008 10:38 AM</FONT>

<BR><B><FONT SIZE=3D1 =
FACE=3D"Tahoma">To:&nbsp;&nbsp;&nbsp;&nbsp;</FONT></B> <FONT SIZE=3D1 =
FACE=3D"Tahoma">Rick Lund; Maxim Kozlovsky</FONT>

<BR><B><FONT SIZE=3D1 =
FACE=3D"Tahoma">Cc:&nbsp;&nbsp;&nbsp;&nbsp;</FONT></B> <FONT SIZE=3D1 =
FACE=3D"Tahoma">Brian Stark; Andy Sharp</FONT>

<BR><B><FONT SIZE=3D1 =
FACE=3D"Tahoma">Subject:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT>=
</B> <FONT SIZE=3D1 FACE=3D"Tahoma">RE: TXRX optimized build</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">Disabling the =
ComplTimeout in the 1480 PCI cfg space didn&#8217;t fix the =
problem.</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">Disabling the =
RetryTimeout, however, did.</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">Without the =
RetryTimeout change in the PCI Timeout Register (0xA4), when the error =
occurs, the Additional Status and Command Register (0x8C) has RetryErr =
bit set, which it should as described in the PCI Timeout Register.&nbsp; =
This must be the problem.</FONT></P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">I&#8217;ll add this =
config to the next prom, but for now, you can add the line =
&#8220;pci_cfg_write(0, 0, 0, 0xA4, 0xa0080)&#8221; to the pci init code =
on TXRX to fix the problem.</FONT></P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">The default Retry =
count is 128.&nbsp; It would be interesting to see how many retries it =
is taking before completion.</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">-Rick</FONT>
</P>

<P><FONT SIZE=3D2 =
FACE=3D"Tahoma">_____________________________________________<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">From:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Rick Lund<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Sent:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Wednesday, January 16, 2008 9:56 AM<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">To:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Maxim Kozlovsky<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Cc:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Brian Stark; Andy Sharp<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Subject:</FONT></B><FONT =
SIZE=3D2 FACE=3D"Tahoma"> RE: TXRX optimized build</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">There&#8217;s a 1480 =
erratum about disabling the PCI timeout and therefore causing a retry of a =
master cycle over PCI.&nbsp; Let me try that one.&nbsp; If 1125 bus =
contention is the issue, this may fix the problem.</FONT></P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">-Rick</FONT>
</P>

<P><FONT SIZE=3D2 =
FACE=3D"Tahoma">_____________________________________________<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">From:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Maxim Kozlovsky<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Sent:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Wednesday, January 16, 2008 9:50 AM<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">To:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Rick Lund<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Cc:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Brian Stark; Andy Sharp<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Subject:</FONT></B><FONT =
SIZE=3D2 FACE=3D"Tahoma"> RE: TXRX optimized build</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">They use a different =
mechanism, but (almost) the same memory areas.</FONT>
</P>

<P><FONT SIZE=3D2 =
FACE=3D"Tahoma">_____________________________________________<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">From:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Rick Lund<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Sent:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Wednesday, January 16, 2008 9:48 AM<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">To:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Maxim Kozlovsky<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Cc:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Brian Stark; Andy Sharp<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Subject:</FONT></B><FONT =
SIZE=3D2 FACE=3D"Tahoma"> RE: TXRX optimized build</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">Are the messages =
(printf&#8217;s) from the TXRX that are being delivered to the SSC =
instead of displayed on the local console using the ring buffers that =
are being initialized by the mgmtBus_init() routine?&nbsp; Or do those =
messages use a different mechanism?</FONT></P>

<P><FONT SIZE=3D2 =
FACE=3D"Tahoma">_____________________________________________<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">From:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Maxim Kozlovsky<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Sent:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Wednesday, January 16, 2008 9:45 AM<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">To:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Rick Lund<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Cc:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Brian Stark; Andy Sharp<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Subject:</FONT></B><FONT =
SIZE=3D2 FACE=3D"Tahoma"> RE: TXRX optimized build</FONT>
</P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">Rcon_init_done and =
rconInit() are not connected, in fact I should take out the rconInit() =
from cougar, it is for the older rcon interface which we don&#8217;t =
need anymore.</FONT></P>

<P><FONT COLOR=3D"#000080" SIZE=3D2 FACE=3D"Arial">Looking at the =
consoles, it can be seen that the crash on TXRX happens when Linux is =
already starting to run the user processes, so the Linux PCI setup must =
be completed. I had the same crash in the debug build when the system =
was doing some work and it happened really late, which excludes the =
rcon_init and linux pci configuration. It is probably something else, =
but I&#8217;ll take the workaround for now. My guess is that in the =
optimized build during the initialization we are creating dirty data =
faster than it can be flushed to the memory, so the memory controller =
and bus are busy when the messaging initialization happens and this =
somehow affects the pci accesses. By adding the delay you allow the =
dirty data to be flushed and this allows us to get through the =
initialization process. This of course is pure theory.</FONT></P>
<BR>

<P><FONT SIZE=3D2 =
FACE=3D"Tahoma">_____________________________________________<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">From:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Rick Lund<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Sent:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Wednesday, January 16, 2008 9:32 AM<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">To:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Maxim Kozlovsky<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Cc:</FONT></B><FONT SIZE=3D2 =
FACE=3D"Tahoma"> Brian Stark; Andy Sharp<BR>
</FONT><B><FONT SIZE=3D2 FACE=3D"Tahoma">Subject:</FONT></B><FONT =
SIZE=3D2 FACE=3D"Tahoma"> TXRX optimized build</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">Max,</FONT>

<BR><FONT SIZE=3D2 FACE=3D"Arial">&nbsp;&nbsp;&nbsp; I&#8217;ve worked =
around the TXRX optimized problem with a couple different changes.&nbsp; =
One is adding a 5 second delay before the ring init in the mgmtBus_init =
code which is getting the bus error.&nbsp; The second is moving the =
&#8220;rcon_init_done =3D 1&#8221; line in test.c to after the =
rconInit() call later in the startup sequence.&nbsp; That one sounds =
like a more logical fix; not setting the flag which says we&#8217;ve =
initialized rcon until after the rconInit() function.&nbsp; But I =
don&#8217;t think that&#8217;s the real problem, since it appears to =
work in the debug build and presumably on Bobcat.</FONT></P>

<P><FONT SIZE=3D2 FACE=3D"Arial">&nbsp;&nbsp;&nbsp; My best guess is a =
conflict with the TXRX using the PCI bus to access SSC memory during the =
SSC&#8217;s startup.&nbsp; Linux is disabling the memory accesses from =
the PCI bus during startup?&nbsp; Something like that?&nbsp; And by =
slightly changing the timing of our use of SSC memory by the TXRX, we =
delay long enough for the Linux PCI init to have been =
completed.</FONT></P>

<P><FONT SIZE=3D2 FACE=3D"Arial">-Rick</FONT>
</P>
</UL>
</BODY>
</HTML>
------_=_NextPart_001_01C85874.5009D9EC--
