AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20080220134927.4bb59bfc@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:onstor-exch02.onstor.net
NSV:
SSH:
R:<warren.gale@onstor.com>,<brian.stark@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 20 Feb 2008 13:50:01 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Warren Gale <warren.gale@onstor.com>, Brian Stark
 <brian.stark@onstor.com>
Subject: CF interrupt diag tests needed
Message-ID: <20080220135001.61942838@ripper.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Warren,

I know you're working on some diag test code right now, and I wanted to
let you know about a new diag requirement for Cougar that differs from
previous hardware: compact flash diags using interrupts, both status
and card. We count on those paths working correctly in Linux on Cougar,
whereas we haven't used interrupts on previous hardware. That is also
why some CF cards work on Bobcat/OpenBSD but currently do not work on
Cougar.

I'm chasing some problems with CF cards right now, which is why this
occurred to me.  I believe I just identified a kernel bug whereby a
broken CF card might fail to respond to the pcmcia IDENT command but
issue an interrupt anyway, causing the kernel to shut down that
interrupt and making that CF slot unusable until reboot.  I believe
I've also found a second kernel bug where a pcmcia reset on a slot that
is "interrupt challenged" (either the card or the slot) can cause a
kernel OOPS.  The code expects a status interrupt to have occurred and
to have caused various data structures in the kernel to be adjusted.
When that interrupt never arrives, those structures remain unchanged,
and eventually the IDE layer does an I/O access on a stale address
range and we get a DBE.

Phew, OK, so the bottom line is that I believe we need a few more diags
in this area to test incoming hardware and make sure both interrupt
paths, status and card, actually work.
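
As a rough illustration only, a diag along these lines might provoke
each interrupt source and check that the matching handler's counter
moves within a bounded wait. Everything in this sketch is hypothetical
(struct cf_slot, trigger_status_event, issue_identify, the failure
bits); the hardware is simulated with two flags, and a real diag would
hook the actual Cougar interrupt handlers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical failure bits returned by the diag. */
#define CF_FAIL_STATUS_IRQ 0x1
#define CF_FAIL_CARD_IRQ   0x2

/* Simulated CF slot; not taken from any real diag or kernel code. */
struct cf_slot {
    bool status_irq_wired;          /* status-change interrupt reaches the CPU */
    bool card_irq_wired;            /* card (IDE) interrupt reaches the CPU */
    volatile unsigned status_irqs;  /* bumped by the status interrupt handler */
    volatile unsigned card_irqs;    /* bumped by the card interrupt handler */
};

/* Stand-in for provoking a status-change event on the slot. */
static void trigger_status_event(struct cf_slot *s)
{
    if (s->status_irq_wired)
        s->status_irqs++;
}

/* Stand-in for issuing an identify command that should raise a card IRQ. */
static void issue_identify(struct cf_slot *s)
{
    if (s->card_irq_wired)
        s->card_irqs++;
}

/* Poll a counter a bounded number of times; true if it changed. */
static bool irq_seen(const volatile unsigned *counter, unsigned before,
                     int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (*counter != before)
            return true;
    }
    return false;
}

/* Returns 0 if both interrupt paths fired, else a mask of CF_FAIL_* bits. */
int cf_irq_diag(struct cf_slot *s)
{
    int failures = 0;
    unsigned before;

    assert(s != NULL);

    before = s->status_irqs;
    trigger_status_event(s);
    if (!irq_seen(&s->status_irqs, before, 1000))
        failures |= CF_FAIL_STATUS_IRQ;

    before = s->card_irqs;
    issue_identify(s);
    if (!irq_seen(&s->card_irqs, before, 1000))
        failures |= CF_FAIL_CARD_IRQ;

    return failures;
}
```

Reporting the two paths as separate bits means a slot that is
"interrupt challenged" on only one path can be told apart from one
where neither interrupt arrives.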

Let me know what you guys think.

Cheers,

a
