AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<ralf@linux-mips.org>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
RMID:#imap/LSI/INBOX	0	20091201012442.GC31892@linux-mips.org
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 2 Dec 2009 11:47:56 -0800
From: Andrew Sharp <andy.sharp@lsi.com>
To: Ralf Baechle <ralf@linux-mips.org>
Subject: Re: NUMA development for sb1 processors
Message-ID: <20091202114756.5b40ea39@ripper.onstor.net>
In-Reply-To: <20091201012442.GC31892@linux-mips.org>
References: <20091130143634.58416df2@ripper.onstor.net>
	<20091201012442.GC31892@linux-mips.org>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Mon, 30 Nov 2009 18:24:42 -0700 Ralf Baechle <ralf@linux-mips.org>
wrote:

> On Mon, Nov 30, 2009 at 02:36:34PM -0800, Andrew Sharp wrote:
> 
> > I have started to do the dev work to implement NUMA for sibyte
> > processors, and it occurred to me that I should drop you a note just
> > to let you know, and also to ask if you have any advice for me.
> > The only MIPS processor that has it is ip27, but I'm thinking I
> > won't riff off that too much as it's a very odd (early/old)
> > architecture that doesn't have much to do with, well, ours, at
> > least. Ours looks exactly like you might find for an opteron based
> > NUMA system, except it's not opterons.
> 
> The IP27 isn't odd in my definition.  It's a predecessor of the later
> IP35 and also the SGI Altix series and all these are _brilliant_
> and modular system architectures.  On the downside it's a very complex
> architecture but you'd not expect anything less for a system
> architecture that scales all the way to the top spot of the Top 500
> list of supercomputers.

Perhaps odd was the wrong word.  It seems to be capable of much more
complexity than I think I need ... nodes with/without local memory,
hierarchical topology, etc.

> > The biggest question in my mind is about the topology.  It seems
> > that probably the best way to work things is to have the PROM code
> > pass in topology information, but then the question becomes 'what
> > does that look like?'  X86 uses ACPI these days, and I'm not
> > feeling excited about trying to mimic that, and neither am I
> > feeling motivated to try and mimic what is there for ip27.  So I
> > thought, well, maybe Ralf has an opinion that might be helpful.  In
> > case it isn't obvious, I will be doing the PROM changes as well
> > ~:^)  Our current PROM is occasionally CFE based, at least for the
> > part that handles booting.
> > 
> > My goal is to get this into the kernel, but without a standard for
> > the PROM support, I'm thinking I should put some effort into making
> > the topology part "usefulish" in a way that others can exploit it
> > with a minimum of pain, so any ideas or thoughts you might have
> > would be appreciated.
> 
> I'm not sure if flattened device tree covers NUMA but if so that would
> be strong recommendation.  IP27 btw. passes all sorts of information
> from the PROM - see arch/mips/include/asm/sn/klconfig.h.  The key
> there is that using klconfig does not involve using any calls into
> the firmware which traditionally are problems - what are the calling
> conventions (o32, N32, N64), stack alignment, locking conventions
> etc.  A data structure is so much easier to handle.

Flattened device tree ... where should I look?  I'll take a gander at
klconfig.h.  BTW, I'm curious: what does 'sn' mean, as opposed to ip27?
Is that code shared with a platform besides ip27?
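
To make the "data structure instead of firmware calls" idea concrete,
here is a rough sketch of the kind of flat table a PROM could leave in
memory for the kernel to read.  Everything in it is an assumption made
up for illustration: the names (topo_table, topo_node, TOPO_MAGIC,
topo_validate, topo_cpu_to_node), the field layout and the 4-node
limit come from neither CFE, klconfig.h nor the flattened device tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; not taken from CFE, klconfig.h or any kernel. */
#define TOPO_MAGIC     0x4e554d41u	/* ASCII "NUMA" */
#define TOPO_VERSION   1u
#define TOPO_MAX_NODES 4u

struct topo_node {
	uint64_t mem_base;	/* physical base of node-local memory */
	uint64_t mem_size;	/* in bytes; 0 = node has no local memory */
	uint32_t cpu_mask;	/* bit n set => CPU n belongs to this node */
	uint32_t reserved;
};

struct topo_table {
	uint32_t magic;
	uint32_t version;
	uint32_t nr_nodes;
	uint32_t reserved;
	struct topo_node node[TOPO_MAX_NODES];
	/* relative access cost from node i to node j */
	uint8_t distance[TOPO_MAX_NODES][TOPO_MAX_NODES];
};

/* Sanity-check a table handed over by the PROM: 0 if usable, else -1. */
int topo_validate(const struct topo_table *t)
{
	uint32_t i, j, seen = 0;

	if (t == NULL || t->magic != TOPO_MAGIC || t->version != TOPO_VERSION)
		return -1;
	if (t->nr_nodes == 0 || t->nr_nodes > TOPO_MAX_NODES)
		return -1;

	for (i = 0; i < t->nr_nodes; i++) {
		/* a CPU may appear on one node only */
		if (t->node[i].cpu_mask & seen)
			return -1;
		seen |= t->node[i].cpu_mask;

		for (j = 0; j < t->nr_nodes; j++) {
			/* distances must be symmetric ... */
			if (t->distance[i][j] != t->distance[j][i])
				return -1;
			/* ... and remote must cost more than local */
			if (i != j && t->distance[i][j] <= t->distance[i][i])
				return -1;
		}
	}
	return 0;
}

/* Map a CPU number to its node, or -1 if no node claims it. */
int topo_cpu_to_node(const struct topo_table *t, unsigned int cpu)
{
	uint32_t i;

	if (cpu >= 32)
		return -1;
	for (i = 0; i < t->nr_nodes; i++)
		if (t->node[i].cpu_mask & (UINT32_C(1) << cpu))
			return (int)i;
	return -1;
}
```

For a dual-1480 box, such a table would carry two nodes with four CPUs
each.  The kernel only reads memory, so the calling-convention, stack
alignment and locking questions don't come up at all.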

> ACPI makes strong man cry but I guess you knew that one.

Indeed.

> I'm a bit surprised that you're mailing me about this - it is my
> understanding that for Broadcom the SB1-based stuff is more or less
> dead since a long time and that the NUMA stuff apparently is fairly
> slow though that may to a degree also depend on factors outside the
> BCM1480.

Heh.  Interesting questions.  We just kicked off a new build of 100
boxes, i.e. 200 1480s, and Broadcom said demand was so high that the
lead time would be 3 months.  There are also plans to sell boxes
numbering in the four figures.  They'll keep making them if they keep
selling them.

According to our sources at Broadcom, we are the only dual 1480 design
they know of, never mind NUMA.  We are running the HT bus at 600 MHz,
which I'm guessing will be fast enough.  If necessary, I'll try and
kick in the feature where the kernel text is copied to both nodes.

>   Ralf
