AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20090205103039.2bb4740f@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:exch1.onstor.net
NSV:
SSH:
R:<maxim.kozlovsky@onstor.com>,<Bill.Fisher@onstor.com>,<brian.stark@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	2779531E7C760D4491C96305019FEEB51851E65D10@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Thu, 5 Feb 2009 10:32:45 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Maxim Kozlovsky <maxim.kozlovsky@onstor.com>
Cc: Bill Fisher <Bill.Fisher@onstor.com>, Brian Stark
 <brian.stark@onstor.com>
Subject: Re: Some observations with the latest PROM's
Message-ID: <20090205103245.0903f3b8@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB51851E65D10@exch1.onstor.net>
References: <498A816C.1050108@onstor.com>
	<2779531E7C760D4491C96305019FEEB51851E65D10@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Well, by all means blame all problems on the image loading.

The image loading can't be the cause of any watchdog NMIs.  Also, my
tests show that parallel loading is the same speed as sequential.
Remember that the CF isn't really a disk; it's random access flash
acting as a disk, and random reads are the same speed as sequential
reads.  Parallel loading might be slower over NFS, though.  You can
always edit /etc/init.d/onstor-load-embedded and remove the
"--background" option from the start-stop-daemon line if you want.
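
For illustration only, here is a rough sketch of that edit done with
sed.  The sample start-stop-daemon line and the /onstor/bin/load_vmlinux
path are assumptions made up for the demo, not the real contents of
onstor-load-embedded, so check the actual script before running
anything like this against it.

```shell
# Sketch: strip "--background" from the start-stop-daemon line of an
# init script.  Uses GNU sed's -i (in-place) option.
strip_background() {
    # $1: path to the init script to edit in place
    sed -i '/start-stop-daemon/ s/[[:space:]]*--background//' "$1"
}

# Demo on a throwaway copy.  The line below is a hypothetical example,
# NOT the real contents of /etc/init.d/onstor-load-embedded.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
start-stop-daemon --start --background --exec /onstor/bin/load_vmlinux -- /boot/txrx_cg.bin
EOF

strip_background "$demo"
cat "$demo"
```

With the option gone, start-stop-daemon no longer forks the loader into
the background, so the images load one after the other.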

Bill, I think your problems are due to the fact that you're actually
using the mgmt_bus, whereas I'm not.  The magic memory locations are on
the SSC, and only get reset if the SSC is rebooted.  If you reboot the
TXRX separately, then you have to manually zero out that location.  But
the mgmt_bus logic probably cannot survive just the TXRX rebooting,
hence the problems that you're seeing.

You need to get onstor-load-embedded installed in /etc/init.d/ and the
proper link to it set up in /etc/rcS.d, and all the symlinks
in /onstor/bin.
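
A minimal sketch of the first two steps, in case it helps.  The S99
sequence number in the link name is an assumption (use whatever the
development machine has), and the /onstor/bin symlinks are left out
because their names aren't listed here.

```shell
# Sketch: install the init script and link it into rcS.d.
install_loader() {
    # $1: path to the onstor-load-embedded script to install
    # $2: optional root prefix (empty means the live filesystem)
    src=$1
    root=${2:-}
    mkdir -p "$root/etc/init.d" "$root/etc/rcS.d"
    install -m 755 "$src" "$root/etc/init.d/onstor-load-embedded"
    # "S99" is a guessed sequence number, not taken from the real system.
    ln -sf ../init.d/onstor-load-embedded \
        "$root/etc/rcS.d/S99onstor-load-embedded"
}

# Demo against a scratch directory rather than the real /etc.
root=$(mktemp -d)
printf '#!/bin/sh\n' > "$root/onstor-load-embedded"
install_loader "$root/onstor-load-embedded" "$root"
ls -l "$root/etc/rcS.d"
```

Passing a scratch root makes it easy to try the steps before touching
the CF on the SSC.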

I've got some mods in the works on the loading program and SSC kernel,
but they make only a very slight overall difference, 1-2 seconds.  That
might still help Max some, because he's loading the images later than I
am since he's using NFS.  The mods include a new SSC kernel and
load_vmlinux program, but I'm still wringing out all the nits.

On Thu, 5 Feb 2009 09:49:46 -0800 Maxim Kozlovsky
<maxim.kozlovsky@onstor.com> wrote:

> I've also had a boot failure where fp came up and crashed with the
> watchdog timeout, and txrx did not come up at all. I don't know what
> happened with txrx since I don't have the consoles attached and it
> did not make it far enough for the rcon to work.
> 
> The boot process for fp+txrx is not right. It loads the images in
> parallel. If one of the images loads sufficiently faster than the
> other, it will crash with watchdog timeout on boot. The loader should
> load both txrx and fp first then tell them to go. Besides, reading the
> files sequentially is faster.
> 
> >-----Original Message-----
> >From: Bill Fisher
> >Sent: Wednesday, February 04, 2009 10:04 PM
> >To: Andy Sharp
> >Cc: Maxim Kozlovsky; Brian Stark; Bill Fisher
> >Subject: Some observations with the latest PROM's
> >
> >Andy:
> >
> >I have found that I am having some issues, they might be just
> >cockpit error on my part, but doing the magic command:
> >
> >load_vmlinux /boot/txrx_cg.bin
> >
> >can in lots of cases fail in one of two ways:
> >
> >1) The TxRx simply hangs at the waiting to autoload prompt OR
> >
> >2) dies with the following results;
> >
> >Exception Cause=TLBL (0000000020008008)
> >EPC = 0xffffffff900200a4
> >ERREPC = 0xffffffff83140918
> >BADVADDR = 0x15b7
> >RA = 0x0
> >
> >
> >For example:
> >
> >Initializing Autoloader, hit control-E to bypass
> >...........................................................................
> >.....
> >
> >Type ctrl-e to stop autoload.
> >Waiting for TXRX image to be loaded...Autoload stopped.
> >TXRX0-PROM> m 40000000
> >0000000040000000 babecafe
> >0000000040000004 00000000 0
> >0000000040000008 00000000 .
> >TXRX0-PROM> autoload
> >
> >Type ctrl-e to stop autoload.
> >Waiting for TXRX image to be loaded...done
> >Linux version 2.6.22-cg (bfisher@bfisher-linux.onstor.net) (gcc
> >version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #810 SMP Wed
> >Feb 4 20:39:15 PST 2009
> >Booting Linux kernel...Mips64 TuxRx
> >CPU revision is: 01041100
> >
> >This works.
> >
> >===========================
> >Other times the magic location does not contain the magic
> >numbers when examined after hitting Control-E, after
> >a reboot command was issued to the TxRx prom.
> >
> >TXRX0-PROM> m 40000000
> >0000000040000000 00000000
> >0000000040000004 00000000
> >0000000040000008 00000000 .
> >
> >In those cases, everything is ZERO and I must type the reboot
> >command another time to get the txrx in the proper state
> >to accept the memory load.
> >
> >Re-verifying is required to ensure the magic locations
> >are correct.
> >
> >Is there a window where I am typing Control-E too fast and this
> >is aborting the memory initialization after reboot?
> >
> >----------------------
> >
> >I have now gotten the tuxrx-linux-2.6.22 GIT tree kernel built as
> >well as the one I have been using for weeks now running.
> >
> >However I can with various iterations of the above commands get
> >either of these two to fail to get loaded and die with the exception:
> >
> >Exception Cause=TLBL (0000000020008008)
> >EPC = 0xffffffff900200a4
> >ERREPC = 0xffffffff83140918
> >BADVADDR = 0x15b7
> >RA = 0x0
> >
> >
> >Typing reboot and getting the TxRx into the proper state,
> >and also verifying that the memory locations are correct,
> >these two kernels will boot.
> >
> >Also, loading the txrx now appears to be slower than before. I did
> >not do the NICE change on the SSC since I was simply trying
> >to get the kernels verified with the mgmtbus driver in the GIT
> >tree code. Max noted something similar, so I'll have to get
> >exact timings to prove this point.
> >
> >Hence, maybe I need a more detailed cheat sheet on the
> >load_vmlinux case, because I was not getting these
> >strange cases with the "prior" version of the PROM
> >code. There appear to be some "glitches" in the autoload
> >case(s).
> >
> >I recompiled the load_vmlinux code that was checked into
> >the tools directory BUT do not have all the symbolic
> >links created. I looked at them on the development
> >machine, but I am running the program from CF on
> >the SSC, to which I copied over the binary.
> >
> >Thanks,
> >
> >-- Bill
