X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: by onstor-exch02.onstor.net 
	id <01C87A6C.1F2D4B06@onstor-exch02.onstor.net>; Thu, 28 Feb 2008 17:43:37 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Content-class: urn:content-classes:message
Subject: RE: concrete info on CF status problem
Date: Thu, 28 Feb 2008 17:43:37 -0700
Message-ID: <BB375AF679D4A34E9CA8DFA650E2B04E089F18EC@onstor-exch02.onstor.net>
In-Reply-To: <20080228160210.50e4c6f1@ripper.onstor.net>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: concrete info on CF status problem
Thread-Index: Ach6ZlRY8h7cEgTuQIqMJOAD8P81WAABQf7A
References: <20080225140859.55145d81@ripper.onstor.net><BB375AF679D4A34E9CA8DFA650E2B04E089F185F@onstor-exch02.onstor.net> <20080228160210.50e4c6f1@ripper.onstor.net>
From: "Brian Stark" <brian.stark@onstor.com>
To: "Andy Sharp" <andy.sharp@onstor.com>
Cc: "Warren Gale" <warren.gale@onstor.com>,
	"Maxim Kozlovsky" <maxim.kozlovsky@onstor.com>

Like I said, I tried your old board and had no luck.  I used the same
kernel on a new board.  I also tried the 1.0.2, 1.0.3, and 1.0.4 PROMs.
This is not a problem with the board or the PROM.

Your old board is connected as follows:

SSC console	10.1.1.100 2001
Power 10.1.1.141, port .a2
IP addr 10.1.1.125

The bottom line is that the interrupt does fire when the card is
inserted or ejected, and then something in the 1125 comes in and reads
the ExCA status-change register, which clears the interrupt.  I don't
know what's reading the register; I only know the read is coming from
the 1125.  When I do the same thing in PROM, which has no interrupt
handlers for PCI interrupts, the ExCA status-change register shows up
as 0x08, indicating the card status changed.  When I then manually read
the register, it clears as expected.
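
To make the read-to-clear behavior concrete, here is a minimal C sketch.
It is only a model, not code from the PROM or the kernel: the struct,
function, and macro names are made up for illustration, and the only
values taken from the observations in this thread are the register
index (reg 4) and the card-status-changed value (0x08).

```c
#include <stdint.h>

/* Illustrative names only; not taken from any real driver or PROM source. */
#define EXCA_STATUS_CHANGE_REG 0x04u /* "reg 4" in this thread */
#define EXCA_CARD_CHANGED      0x08u /* bit 3: card status changed */

/* Hypothetical model of a controller with one read-to-clear register. */
struct exca_model {
    uint8_t status_change;
};

/* Card inserted or ejected: the hardware latches bit 3. */
static void exca_card_event(struct exca_model *m)
{
    m->status_change |= EXCA_CARD_CHANGED;
}

/* Any read returns the latched value and clears it as a side effect. */
static uint8_t exca_read_status_change(struct exca_model *m)
{
    uint8_t value = m->status_change;

    m->status_change = 0;
    return value;
}

/*
 * Returns what a manual read sees after one card event, given how many
 * other reads (for example, from the 1125) reached the register first.
 */
static uint8_t exca_manual_read_after(unsigned int earlier_reads)
{
    struct exca_model m = { 0 };

    exca_card_event(&m);
    while (earlier_reads-- > 0)
        (void)exca_read_status_change(&m);
    return exca_read_status_change(&m);
}
```

In this model, exca_manual_read_after(0) corresponds to the PROM case,
where nothing else touches the register and the manual read returns
0x08.  exca_manual_read_after(1) corresponds to the Linux case, where
an earlier read has already consumed the bit and the manual read
returns 0.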

Tomorrow afternoon isn't good for me, but I could meet in the morning if
you like.


Brian


> -----Original Message-----
> From: Andy Sharp
> Sent: Thursday, February 28, 2008 4:02 PM
> To: Brian Stark
> Cc: Warren Gale; Maxim Kozlovsky
> Subject: Re: concrete info on CF status problem
>
> On Thu, 28 Feb 2008 15:42:02 -0800 "Brian Stark"
> <brian.stark@onstor.com> wrote:
>
> > OK, I've verified the following:
> >
> > - Stop in PROM, eject a card, see both the PCI hardware interrupt with
> >   the scope and the ExCA status-change interrupt reg (reg 4, bit 3),
> >   status-change interrupt reg is clear on the next read
> > - Boot Linux, eject a card, see the PCI hardware interrupt with the
> >   scope, do *not* see the ExCA status-change interrupt reg (reg 4, bit
> >   3) after doing a read with the 'cat /sys/devices' command.
> >
> > This is the case on both the old board and new board.
> >
> > After seeing this, my theory was that the 1125 is reading the ExCA
> > registers after a card is ejected, which would then clear the
> > status-change interrupt at reg 4, bit 3.  Sure enough, when I hook up
> > an analyzer, I see a PCI cycle from the 1125 that is doing just that
> > -- a read to reg 4 -- after the card is ejected.  This read clears the
> > interrupt, which is why I don't see the interrupt when I then do the
> > 'cat /sys/devices' command.
> >
> > So, the interrupts are happening and the 1125 is polling the interrupt
> > status after a card is ejected or inserted, but then it appears the
> > interrupt status is somehow lost by the kernel.  Don't hate me for
> > saying this, but methinks there's a bug in the kernel.
>
> Too late.
>
> But only because you know it's a hardware problem, not a kernel
> problem.  Defense exhibit A: kernel hasn't changed in months and this
> all works as expected on previous rev of hardware, which has changed.
> My guess: something changed which no one thought would matter to the
> kernel but does, like PCI address space mapping changes in PROM or
> management bus driver, or some kind of PCI interrupt handling code
> changed in the 1480 PROM/run-time or management bus driver.  Scratch
> the management bus driver because my debug kernel doesn't have any
> recent changes that might have happened to the mgmt bus driver.
>
> Like I tried to say before, the interrupt doesn't matter.  Even if you
> were right, er hah, and the kernel suddenly started to lose interrupt
> status somehow, it should still work with irqpolling.
>
> Just so I can be certain that all the things I'm saying are
> indisputable facts, can you hook up my old board and let me have at
> it?  I have an instrumented kernel which lets me know all the down and
> dirty details.
>
> I could come to P-town tomorrow and work on it there, permitting.
>
> Cheers,
>
> a
>
