AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:<20090522163534.0e712c15@ripper.onstor.net>
CFG:
PT:0
S:andy.sharp@onstor.com
RQ:
SSV:mail.onstor.net
NSV:
SSH:
R:<brian.stark@onstor.com>,<Bill.Fisher@onstor.com>
MAID:1
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/andys@onstor.net@exch1.onstor.net/INBOX	0	102AB4F33EBBDB4C91915B145C8E9FB312973651A7@exch1.onstor.net
X-Sylpheed-End-Special-Headers: 1
Date: Fri, 22 May 2009 16:35:46 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: Brian Stark <brian.stark@onstor.com>
Cc: Bill Fisher <Bill.Fisher@onstor.com>
Subject: Re: Perhaps you know this one
Message-ID: <20090522163546.0637b688@ripper.onstor.net>
In-Reply-To: <102AB4F33EBBDB4C91915B145C8E9FB312973651A7@exch1.onstor.net>
References: <102AB4F33EBBDB4C91915B145C8E9FB312973651A7@exch1.onstor.net>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Understood.  My concern is that this isn't code that Bill, or perhaps
even Jonathan, knows very well.  It's FP code, which we haven't looked
at too much.  Bill, I leave it to your judgement as Brian suggests.

Thanks,

a


On Fri, 22 May 2009 16:33:15 -0700 Brian Stark <brian.stark@onstor.com>
wrote:

> The Shopzilla situation is very high priority, and we may get asked
> for further help.  I will be out the rest of today and tomorrow for a
> wedding, so please use your best judgement.  A little help will go a
> long ways.
> 
> 
> Brian
> 
> 
> ----- Original Message -----
> From: Andy Sharp
> To: Bill Fisher
> Cc: Brian Stark
> Sent: Fri May 22 16:24:01 2009
> Subject: Re: Perhaps you know this one
> 
> Bill,
> 
> Just to reiterate so I can be sure we're clear, unless Brian or I say
> so, your priority remains tuxrx.  Please don't spend any time on this
> in lieu of working on tuxrx unless told to do so.
> 
> Thanks,
> 
> a
> 
> On Fri, 22 May 2009 16:16:32 -0700 Bill Fisher
> <Bill.Fisher@onstor.com> wrote:
> 
> > Jonathan Goldick wrote:
> > > Do you know of any way to determine if eee_poll is still being
> > > called by a bobcat FP core? 
> > 
> > I would look at the various counters to see if they are being
> > incremented. I presume this will require the debugger, since I don't
> > recall an explicit procedure to do that. Since the polling buckets
> > are compiled off, that eliminates that case.
> > >
> > > I do not mean whether dont_poll is TRUE, as in "eee poll off" but
> > > whether the base routine eee_poll is still properly set up to be
> > > called by the 1280 core.  We have a reproducible bug at Shopzilla
> > > where the polling loops effectively stop on all cores, with no
> > > watchdog crash.  Also, rcon still works.
> > >
> > Since the watchdogs are being serviced and rcon is working, that
> > immediately implies that the outer-loop cases (the periodic calls
> > and the watchdog checking) are still being called.
> > 
> > 
> > The registered "polling" functions are stored in a table, via the
> > registerPollXX() procedure on the FP. What does the table dump show?
> > I presume it shows that everything was registered properly.
> > 
> > The next question is under what conditions it never gets back into
> > the major function-calling cases after it has called the
> > do_periodic_XXX() procedure or, as you noted, the
> > checking/processing of the timers.
> > 
> > The reason rcon is still working is that it gets called once
> > every so many poll iterations, depending on the load on the machine.
> > Hence it's hard-wired into the basement of the eee_poll() procedure.
> > 
> >  >
> > > I believe I know the last routine to be called by the polling
> > > routine, evm_io_clockTic via eee_processTimers,  but have yet to
> > > determine any way it can get stuck without a watchdog.
> >  >
> > 	Good question. From my reading of the code so far, I don't
> > have an answer right now.
> >  >
> > > Another possibility is that it  somehow steps on memory such that
> > > eee_poll is no longer called by the system.
> >  >
> > 	That would imply the poll function table is trashed. Isn't
> > there an rcon shell command to dump the registered polling
> > functions?
> > >
> > > I know of no way to get a traceback of code run in the context of
> > > eee_poll but if you have any ideas on that I would appreciate it.
> > > I have basically worked out evm_io_clockTic from indirect time
> > > stamps rather than a thread stack.
> > >
> > 
> > I will have to look at the code to get a better handle on "how"
> > this might ever happen. What I described above was from
> > "converting" the eee_poll() code to our Linux thread.
> > 
> > Later,
> > 
> > 
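
For reference, the ideas Bill describes in the quoted text (a table of
registered polling functions, counters that show whether the loop is
still turning, and a slow path that runs once every so many iterations)
can be sketched roughly as below.  This is only an illustration under
stated assumptions: every name in it (register_poll, poll_once,
poll_stalled, dump_poll_table, SLOW_PATH_PERIOD, demo_poll) is
hypothetical, and none of it is taken from the actual eee_poll or
registerPollXX() code on the FP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not the real EEE/FP code. */
#define MAX_POLL_FUNCS   16
#define SLOW_PATH_PERIOD 4          /* slow path runs every 4th iteration */

typedef void (*poll_fn)(void);

struct poll_entry {
    const char *name;               /* label shown in the table dump */
    poll_fn     fn;                 /* registered polling function */
    uint64_t    calls;              /* bumped each time fn is invoked */
};

static struct poll_entry poll_table[MAX_POLL_FUNCS];
static size_t   poll_count;         /* number of valid table entries */
static uint64_t poll_iterations;    /* bumped once per pass of the loop */
static uint64_t slow_path_calls;    /* stands in for rcon/periodic work */
static uint64_t demo_work;          /* touched by the demo poll function */

/* Example polling function, present only so the table has an entry. */
static void demo_poll(void)
{
    demo_work++;
}

/* Add a function to the table; returns 0 on success, -1 if it is full. */
static int register_poll(const char *name, poll_fn fn)
{
    assert(name != NULL && fn != NULL);
    if (poll_count == MAX_POLL_FUNCS)
        return -1;
    poll_table[poll_count].name  = name;
    poll_table[poll_count].fn    = fn;
    poll_table[poll_count].calls = 0;
    poll_count++;
    return 0;
}

/* One pass of the polling loop: call every entry, then maybe the slow path. */
static void poll_once(void)
{
    size_t i;

    poll_iterations++;
    for (i = 0; i < poll_count; i++) {
        poll_table[i].fn();
        poll_table[i].calls++;
    }
    if (poll_iterations % SLOW_PATH_PERIOD == 0)
        slow_path_calls++;
}

/*
 * Returns 1 if the iteration counter has not moved since the previous
 * sample stored in *last_seen, 0 otherwise.  Always updates *last_seen.
 */
static int poll_stalled(uint64_t *last_seen)
{
    int stalled = (poll_iterations == *last_seen);

    *last_seen = poll_iterations;
    return stalled;
}

/* Print the counters and the registered entries, one per line. */
static void dump_poll_table(FILE *out)
{
    size_t i;

    fprintf(out, "iterations=%llu slow_path=%llu\n",
            (unsigned long long)poll_iterations,
            (unsigned long long)slow_path_calls);
    for (i = 0; i < poll_count; i++)
        fprintf(out, "%2zu %-16s calls=%llu\n", i, poll_table[i].name,
                (unsigned long long)poll_table[i].calls);
}
```

In a sketch like this, sampling poll_stalled() twice from outside the
loop (from a debugger or a shell command, say) tells you whether the
loop is still advancing, and the per-entry call counts in the dump show
whether a particular registered function has stopped being reached even
though the slow path keeps running.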
