Date: Wed, 18 Nov 2009 15:30:59 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Fisher, Bill" <Bill.Fisher@lsi.com>
Cc: "Fong, Rendell" <Rendell.Fong@lsi.com>, "Stark, Brian"
 <Brian.Stark@lsi.com>
Subject: Re: The broken mgmtbus driver
Message-ID: <20091118153059.511aa6be@ripper.onstor.net>
In-Reply-To: <4AFE087F.80105@lsi.com>
References: <4AFE087F.80105@lsi.com>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Just a quick update on this: Bill has found the source of the skb memory
trashing.  I inadvertently horked arch/mips/pci/Makefile so that it
included a source file that wasn't ready yet, when it should have
continued to include the old source file.  The old file sets up the PCI
memory mapping config.  The new file does not do that yet, hence this
problem.

This was also contributing to errors from the compact flash driver,
which made things look a lot like the other CF problem we were having
at the time and threw more red herrings into the issue.

Bill tracked all this down and even figured out that the trashed skb's
had data in them from the ata/ide registers, which is pretty good.  Not
sure how he figured that out ~:^)

On Fri, 13 Nov 2009 18:31:43 -0700 William Fisher <bill.fisher@lsi.com>
wrote:

> Andy:
> 
> I have finally determined what is causing the mgmtbus
> driver to crash on the tuxstor branch.
> 
> It seems that somebody is trashing
> one of the used descriptor rings after a packet
> has been transmitted. The buffer address is
> being over-written with trash and things
> go down-hill later when this descriptor
> is used again in the ring.
> 
> I have still yet to determine what changes
> could have caused this, it sure doesn't
> look like anything you changed could have
> caused this "new" behavior. This was
> as we discussed this week.
> 
> I am thinking that the mgmtbus driver in
> the original tuxstor branch was broken from the start.
> Maybe we should try the tuxrx code since that was
> known to work. I will diff the code against that
> one.
> 
> The mgmtbus driver I have been using for the last 4+ months
> for the RCON shell, packet forwarding and the
> neteee2 development resides
> in my git tree. This code is nearly identical
> to the original tuxstor branch code.
> 
> Go figure.
> 
> This has been rather painful to find due to the
> Yenta kernel probe flakiness and the whole
> CF booting saga.
> 
> I am working towards a fix.
> 
> later,
> 
> -- Bill
