AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20090114114702.46172b1d@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:exch1.onstor.net
NSV:
SSH:
R:<maxim.kozlovsky@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	2779531E7C760D4491C96305019FEEB51763B781CD@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 14 Jan 2009 11:47:19 -0800
From: Andrew Sharp <andy.sharp@onstor.com>
To: Maxim Kozlovsky <maxim.kozlovsky@onstor.com>
Subject: Re: TuxRx Functional Spec
Message-ID: <20090114114719.53b87b16@ripper.onstor.net>
In-Reply-To: <2779531E7C760D4491C96305019FEEB51763B781CD@exch1.onstor.net>
References: <20090106152711.76b59d5d@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB5176334C548@exch1.onstor.net>
	<20090113201411.4080c3aa@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB51763B781A5@exch1.onstor.net>
	<2779531E7C760D4491C96305019FEEB51763B781AE@exch1.onstor.net>
	<20090114111145.760306eb@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB51763B781BE@exch1.onstor.net>
	<20090114112048.229b2444@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB51763B781C6@exch1.onstor.net>
	<20090114112812.4642a19a@ripper.onstor.net>
	<2779531E7C760D4491C96305019FEEB51763B781CD@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Wed, 14 Jan 2009 11:31:48 -0800 Maxim Kozlovsky
<maxim.kozlovsky@onstor.com> wrote:

> 
> 
> >-----Original Message-----
> >From: Andy Sharp
> >Sent: Wednesday, January 14, 2009 11:28 AM
> >To: Maxim Kozlovsky
> >Subject: Re: TuxRx Functional Spec
> >
> >On Wed, 14 Jan 2009 11:23:54 -0800 Maxim Kozlovsky
> ><maxim.kozlovsky@onstor.com> wrote:
> >
> >>
> >>
> >> >-----Original Message-----
> >> >From: Andy Sharp
> >> >Sent: Wednesday, January 14, 2009 11:21 AM
> >> >To: Maxim Kozlovsky
> >> >Subject: Re: TuxRx Functional Spec
> >> >
> >> >On Wed, 14 Jan 2009 11:14:22 -0800 Maxim Kozlovsky
> >> ><maxim.kozlovsky@onstor.com> wrote:
> >> >
> >> >> After the crash we can login to fp or txrx through rcon and run
> >> >> some debugging commands, or attach gdb.
> >> >
> >> >What are you talking to at that moment, the PROM or EEE?  If the
> >> >kernel is crashed, it's crashed, there's no interactive debugging.
> >> >If the ACPU thread has crashed, presumably we will have a core
> >> >dump and can analyze that with gdb.  So, no reason for rcon.  But
> >> >if we use rcon today to talk to the PROM on txrx, that won't go
> >> >away.
> >> [MK]
> >>
> >> EEE.
> >>
> >> How is it possible to not have any interactive debugging after the
> >> crash? Even the 10-year-old version of BSD that we run supports
> >> interactive debugging.
> >
> >I'm unaware that we have that capability with BSD.  You're saying
> >that if BSD crashes on a bobcat, we can somehow {log into, connect
> >to the console} the SSC and do some debugging w/o bringing BSD back
> >up?
> [MK] 
> It is disabled in the production build, but you can build a kernel
> which will drop into debugger after the crash.

We can probably put a kgdb setup together.  We haven't needed to so
far.  Our 10-year-old BSD and our 1-year-old Linux kernels are vastly
different in a lot of areas, not the least of which is stability, but
also in the large amount of information you can get out of a Linux
kernel without using any debuggers (/proc and so forth).
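
As a rough illustration of the /proc point (a sketch only, not anything
from the spec): memory statistics can be read from /proc/meminfo on a
running Linux system with no debugger attached.  The function names
parse_meminfo and read_meminfo are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; a few (for example HugePages_Total)
    are plain counts with no unit suffix.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line, skip it
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a running Linux system, read_meminfo()["MemFree"] would give the
free memory in kB.  This only helps while the kernel is still up, so it
is not a substitute for post-crash debugging.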

> >Anyway, of course anything is possible, but I don't see the
> >necessity.
> [MK] 
> I do. 

Congratulations.  It's worthless though, unless you're willing to
explain it to me.

If you're saying we will need it for the ACPU thread, one thing we
could do is design that thread to run in either userspace or kernel
space, which would give us the best of both worlds: userspace for
debugging setups and kernel for production.  If absolutely necessary we
could put together a kgdb setup for the kernel space version.

> >> >> >-----Original Message-----
> >> >> >From: Andy Sharp
> >> >> >Sent: Wednesday, January 14, 2009 11:12 AM
> >> >> >To: Maxim Kozlovsky
> >> >> >Subject: Re: TuxRx Functional Spec
> >> >> >
> >> >> >On Wed, 14 Jan 2009 10:58:43 -0800 Maxim Kozlovsky
> >> >> ><maxim.kozlovsky@onstor.com> wrote:
> >> >> >
> >> >> >> If you are not going to provide some sort of rcon, how are
> >> >> >> you going to support interactive debugging after the
> >> >> >> crash? Logging into txrx and running user space commands
> >> >> >> will not work after the crash.
> >> >> >
> >> >> >What "interactive debugging after a crash" are you referring
> >> >> >to?
> >> >> >
> >> >> >> >-----Original Message-----
> >> >> >> >From: Maxim Kozlovsky
> >> >> >> >Sent: Wednesday, January 14, 2009 10:51 AM
> >> >> >> >To: Andy Sharp; Jonathan Goldick
> >> >> >> >Cc: dl-Design Review
> >> >> >> >Subject: RE: TuxRx Functional Spec
> >> >> >> >
> >> >> >> >I think it is not acceptable to not provide kernel core
> >> >> >> >dumps.
> >> >> >> >
> >> >> >> >>-----Original Message-----
> >> >> >> >>From: Andy Sharp
> >> >> >> >>Sent: Tuesday, January 13, 2009 8:14 PM
> >> >> >> >>To: Jonathan Goldick
> >> >> >> >>Cc: dl-Design Review
> >> >> >> >>Subject: Re: TuxRx Functional Spec
> >> >> >> >>
> >> >> >> >>New versions available here:
> >> >> >> >>
> >> >> >> >>http://intranet.onstor.net/md/Software/Kegg/Functional%20Specs/tuxrx_func_spec.pdf
> >> >> >> >>
> >> >> >> >>http://intranet.onstor.net/md/Software/Kegg/Functional%20Specs/tuxrx_func_spec.doc
> >> >> >> >>
> >> >> >> >>Comments not covered in the changes replied to inline
> >> >> >> >>(section 5. c,d,f):
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>On Wed, 7 Jan 2009 11:47:32 -0800 Jonathan Goldick
> >> >> >> >><jonathan.goldick@onstor.com> wrote:
> >> >> >> >>
> >> >> >> >>> 1. Section 5:
> >> >> >> >>>   a. How will memory change for the ACPU?  Will we have
> >> >> >> >>> just as much available for NFS/CIFS as before?  Will we
> >> >> >> >>> get more memory back because of increased efficiencies
> >> >> >> >>> or lose some due to new overheads?
> >> >> >> >>
> >> >> >> >>> b. Will core dumps work the same way since now there will
> >> >> >> >>> also be user-space processes that didn't exist in EEE
> >> >> >> >>> that could also crash? Will they still go to the
> >> >> >> >>> management volume and have the same names as before?
> >> >> >> >>> Analyzing a Linux core will have a new procedure.
> >> >> >> >>
> >> >> >> >>> c. Will the system boot faster/slower/same?
> >> >> >> >>
> >> >> >> >>I would say about the same.  It seems to me that the
> >> >> >> >>cumulative time for TXRX image to load and for it to become
> >> >> >> >>"ready" is about 20 seconds (for production build) and I
> >> >> >> >>expect it to be about that for TuxRx.
> >> >> >> >>
> >> >> >> >>> d. Will we still need to use wd_kick for watchdog
> >> >> >> >>> support in the ACPU or is Linux handling that in a new
> >> >> >> >>> way?
> >> >> >> >>
> >> >> >> >>Really an implementation detail that hasn't been decided
> >> >> >> >>yet. Probably will be based on what gives the best
> >> >> >> >>functionality.
> >> >> >> >>
> >> >> >> >>> e. Does the ACPU performance profile change now that you
> >> >> >> >>> are probably using the default Linux scheduler instead
> >> >> >> >>> of our current model of polling routines with a maximum
> >> >> >> >>> number of work items before returning to the main loop.
> >> >> >> >>> This question edges on design but has an external
> >> >> >> >>> manifestation of performance.
> >> >> >> >>
> >> >> >> >>> f. In general, this section has too little detail about
> >> >> >> >>> what workflows will change, especially around
> >> >> >> >>> diagnostics and support.
> >> >> >> >>
> >> >> >> >>Agreed.  I will try to be watchful and update the document
> >> >> >> >>as workflow changes occur to me.
> >> >> >> >>
> >> >> >> >>> 2. Section 6
> >> >> >> >>> a. I suspect that the UI for rcon and the associated
> >> >> >> >>> console commands will likely change. This will in turn
> >> >> >> >>> impact diagnostic procedures we use in the field. For
> >> >> >> >>> example, the 'eee' command is how we look at memory
> >> >> >> >>> utilization and that will be completely different in
> >> >> >> >>> Linux. You should say something about these commands,
> >> >> >> >>> especially ones like 'req trace on' which is the debug
> >> >> >> >>> ACPU command that traces the CIFS/NFS states.
> >> >> >> >>> b. Do we have to log in to yet another root account, this
> >> >> >> >>> time for the TXRX?
> >> >> >> >>>
> >> >> >> >>> 3. Section 10, you can probably get a better estimate by
> >> >> >> >>> how much head room is available on the ACPU and FP
> >> >> >> >>> processors today at maximum spec load.  Since your
> >> >> >> >>> changes will not be increasing their performance in the
> >> >> >> >>> first incarnation, which is a pretty good estimator
> >> >> >> >>> assuming that you are not also changing the useable
> >> >> >> >>> memory and scheduler for the ACPU.
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> -----Original Message-----
> >> >> >> >>> From: Andy Sharp
> >> >> >> >>> Sent: Tuesday, January 06, 2009 3:27 PM
> >> >> >> >>> To: dl-Design Review
> >> >> >> >>> Subject: RFC: TuxRx Functional Spec
> >> >> >> >>>
> >> >> >> >>> Please review and comment on the TuxRx Project Functional
> >> >> >> >>> Spec:
> >> >> >> >>>
> >> >> >> >>> http://ripper.onstor.net/eng/tuxrx/tuxrx_func_spec.pdf
> >> >> >> >>>
> >> >> >> >>> http://ripper.onstor.net/eng/tuxrx/tuxrx_func_spec.pages
> >> >> >> >>>
> >> >> >> >>> Thanks,
> >> >> >> >>>
> >> >> >> >>> a
