AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080228155834.4cb28a43@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<brian.stark@onstor.com>,<warren.gale@onstor.com>,<maxim.kozlovsky@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@onstor-exch02.onstor.net/INBOX	0	BB375AF679D4A34E9CA8DFA650E2B04E089F185F@onstor-exch02.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 28 Feb 2008 16:02:10 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: "Brian Stark" <brian.stark@onstor.com>
Cc: "Warren Gale" <warren.gale@onstor.com>, Maxim Kozlovsky
 <maxim.kozlovsky@onstor.com>
Subject: Re: concrete info on CF status problem
Message-ID: <20080228160210.50e4c6f1@ripper.onstor.net>
In-Reply-To: <BB375AF679D4A34E9CA8DFA650E2B04E089F185F@onstor-exch02.onstor.net>
References: <20080225140859.55145d81@ripper.onstor.net>
	<BB375AF679D4A34E9CA8DFA650E2B04E089F185F@onstor-exch02.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Thu, 28 Feb 2008 15:42:02 -0800 "Brian Stark"
<brian.stark@onstor.com> wrote:

> OK, I've verified the following:
> 
> - Stop in PROM, eject a card, see both the PCI hardware interrupt with
> the scope and the ExCA status-change interrupt reg (reg 4, bit 3),
> status-change interrupt reg is clear on the next read
> - Boot Linux, eject a card, see the PCI hardware interrupt with the
> scope, do *not* see the ExCA status-change interrupt reg (reg 4, bit
> 3) after doing a read with the 'cat /sys/devices' command.
> 
> This is the case on both the old board and new board.  
> 
> After seeing this, my theory was that the 1125 is reading the ExCA
> registers after a card is ejected, which would then clear the
> status-change interrupt at reg 4, bit 3.  Sure enough, when I hook up
> an analyzer, I see a PCI cycle from the 1125 that is doing just that
> -- a read to reg 4 -- after the card is ejected.  This read clears the
> interrupt, which is why I don't see the interrupt when I then do the
> 'cat /sys/devices' command.
> 
> So, the interrupts are happening and the 1125 is polling the interrupt
> status after a card is ejected or inserted, but then it appears the
> interrupt status is somehow lost by the kernel.  Don't hate me for
> saying this, but me thinks there's a bug in the kernel.

Too late.

But only because you know it's a hardware problem, not a kernel
problem.  Defense exhibit A: the kernel hasn't changed in months and
this all works as expected on the previous rev of hardware; it's the
hardware that has changed.
My guess: something changed which no one thought would matter to the
kernel but does, like PCI address space mapping changes in PROM or
management bus driver, or some kind of PCI interrupt handling code
changed in the 1480 PROM/run-time or management bus driver.  Scratch
the management bus driver because my debug kernel doesn't have any
recent changes that might have happened to the mgmt bus driver.

Like I tried to say before, the interrupt doesn't matter.  Even if you
were right, er hah, and the kernel suddenly started to lose interrupt
status somehow, it should still work with irqpolling.
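
For reference, here is a toy model of the read-to-clear behavior you
describe for reg 4, bit 3.  It is only a sketch, not the 1125 or the
kernel driver: the class and method names are made up for illustration,
and it assumes nothing beyond "any read of the register clears the
latched status", which is what you saw on the analyzer.

```python
# Toy model of a read-to-clear status register (hypothetical names).
# Assumption: any read returns the latched bits and then clears them.

CARD_DETECT_CHANGE = 1 << 3  # reg 4, bit 3, as described in the quoted text


class ReadToClearRegister:
    """Status register whose latched bits are cleared by every read."""

    def __init__(self):
        self._value = 0

    def latch(self, bits):
        # Hardware side: an eject or insert event sets the status bit.
        self._value |= bits

    def read(self):
        # Reader side: return the current status, then clear it.
        value = self._value
        self._value = 0
        return value


reg = ReadToClearRegister()
reg.latch(CARD_DETECT_CHANGE)   # card ejected
first = reg.read()              # whoever reads first sees the bit
second = reg.read()             # a later read sees nothing

print(bool(first & CARD_DETECT_CHANGE), bool(second & CARD_DETECT_CHANGE))
```

Whoever does the first read gets the status; any later read, such as
the 'cat /sys/devices' check, comes up empty.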

Just so I can be certain that all the things I'm saying are
indisputable facts, can you hook up my old board and let me have at
it?  I have an instrumented kernel which lets me know all the down and
dirty details.

I could come to P-town tomorrow and work on it there, permitting.

Cheers,

a
