X-Sylpheed-Account-Id:2
S:andy.sharp@lsi.com
SCF:#mh/Mailbox/sent
X-Sylpheed-Sign:0
X-Sylpheed-Encrypt:0
X-Sylpheed-Privacy-System:
RMID:#imap/LSI/INBOX	5557	4ABAA258.7080106@lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Wed, 23 Sep 2009 16:50:12 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Fisher, Bill" <Bill.Fisher@lsi.com>
Cc: "Ariyamannil, Jobi" <Jobi.Ariyamannil@lsi.com>, "Kozlovsky, Maxim"
 <Maxim.Kozlovsky@lsi.com>, "Stark, Brian" <Brian.Stark@lsi.com>, Rendell
 Fong <rendell.fong@lsi.com>
Subject: Re: Linux vs EEE
Message-ID: <20090923165012.012de8ac@ripper.onstor.net>
References: <4014E6EE2F9ED44299897AD701ED1C51F093179B@cosmail03.lsi.com>
 <4AB95766.6000505@lsi.com>
 <4014E6EE2F9ED44299897AD701ED1C51F409D6B2@cosmail03.lsi.com>
 <4ABAA258.7080106@lsi.com>
Organization: LSI
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I haven't seen all these emails and don't know who wrote what when, but
keep a few of these things in mind:

We have to use virtual memory.  There's no way to manage physical
memory apart from setting it aside at boot time so the kernel doesn't
know about it.  Additionally, we won't really have physical memory
in the VM environment, so it stops being useful to try to manage it.
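
For reference, the boot-time set-aside looks something like this on
the kernel command line (the size and address here are made up):

```
# Hide 512MB of physical RAM at the 1GB mark from the kernel's memory
# manager; a driver can then claim it via request_mem_region()/ioremap().
memmap=512M$0x40000000
```

Note the '$' may need escaping depending on the boot loader config.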

Drivers, by way of the underlying DMA code, handle virtual memory that
must be resident in memory for I/O purposes.  Drivers don't normally
do anything directly in regard to that so long as the memory was
allocated as being for DMA in the first place.  Which is done via
kmem_cache_create() if you're using the slab allocator to manage that
memory.

Stack overflow is something that was a problem for a brief 4 month
period in the 90s.  Let's move on regarding that, shall we?  It's no
different for Linux than it is for EEE.

Linux memory allocation is somewhat better, and comes with different
facilities for detecting corruption, overwrites and so forth.  All the
good stuff.  Various subsets of these can be turned on or off at
compile time and/or run time.
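
Concretely (option names are from the mainline kernel and vary by
version and allocator), the compile-time and run-time switches look
like:

```
# Compile time: build the allocator with debug support
CONFIG_SLUB_DEBUG=y          # (or CONFIG_DEBUG_SLAB=y on SLAB kernels)

# Run time, on the kernel command line: red zones, poisoning of freed
# objects, and owner tracking, for all slab caches
slub_debug=ZPU
```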

Basically, all this sort of stuff has been thought through already, or
it doesn't need to be thought through.  We need to get some stuff out;
right now we don't really need to worry about the sky falling on us.
That's already been figured out.  And of course, continues to be
monitored.  Nothing to see here.  Don't worry your pretty little heads
about it: that's my job.


On Wed, 23 Sep 2009 16:34:00 -0600 William Fisher <bfisher@lsi.com>
wrote:

> Ariyamannil, Jobi wrote:
> > Ariyamannil, Jobi wrote:
> > Thanks to all for the responses on this topic.  Things are a lot
> > clearer now.
> > 
> > To answer Andy's question, StorFS, NFS, CIFS etc. are the EEE
> > applications I meant.
> > 
> > Some additional questions/comments I have:
> > 
> > 1.  Do we need the complexity of virtual memory and slab allocator
> > of Linux for a throughput system like ours?  Can we reserve a big
> > chunk of physical memory and let the applications (FS, NFS, CIFS,
> > FC) manage their memory?  EEE also has some variant of slab
> > allocator which worked well all these years and we could
> > potentially use the same memory management.  Do we know whether
> > Linux memory management is better than that of EEE?
> 
> 	The memory allocator under EEE is not really a slab
> 	allocator; it's a trivial linked list of buffers of fixed
> 	sizes.  There is no recombining, avoidance of fragmentation,
> 	etc.
> 
> 	I don't see the importance of having 9K of contiguous
> 	physical memory versus 9K of contiguous virtual memory.
> 	The buffers are allocated from contiguous virtual
> 	addresses, which may have at least 2-3 pages assigned to
> 	them.  How does that matter?  We are not running a paging
> 	machine here.
> > 
> > 2.  From Bill's email, I guess the Linux memory slabs are formed
> > using pages which I guess are of 4K size.  This would involve
> > significant metadata for managing the pages and slabs.  Is it
> > possible to use large page sizes for large allocations?  This may
> > not be an issue for Linux since the host based applications are
> > using 4K pages for regular I/O, but our file system uses 8K block
> > size and a lot of file system metadata and user data are cached
> > using 8K+ ultra buffers.  We don't want our system to suffer due
> > to the 4K page size of Linux.
> 
> The use of larger page sizes can be investigated if that is a
> requirement.  On the issue of 4K I/O, the drivers have to do the
> VA-to-PA address translation, so that is handled under Linux and
> other "unix"-like OSes.
> 
> What is the suffering you are expecting?  Extra VA-to-PA mapping
> operations, split DMA operations, etc.?
> 
> > 
> > 3.  I think we will leverage a lot of Linux functionality going
> > forward (10GigE, ftp, http etc), so the system will be doing some
> > work apart from the EEE applications.  Also the EEE applications
> > need to be prepared for interrupts, pre-emption, etc.  If the EEE
> > threads consume a large portion of the allocated stack, which can
> > happen during recursive bmap calls or directory name processing
> > code paths, an interrupt on top of that could cause a stack
> > overflow.
> 
> 	Good question.  That is a problem for the fs_threads
> 	implementation, and possibly for refactoring the
> 	applications to use smaller stacks; I don't know enough
> 	to have an intelligent opinion at this point.
> 
> 
> Also the system may need to steal memory from the EEE applications
> under really low memory conditions, if all of them use the same
> memory allocator.
> 
> I believe the NAS filing protocols and FS implementation will evolve
> over time to be more Linux friendly, however the first steps are
> a "port" with the goal to minimize breakage for the Cougar line.
> 
> The x86 line, with a wholesale replacement of the FC drivers, SCSI
> support, volume manager and the like, is a separate topic.  That is
> Phase 2, under Brian's schedule.
> 
> -- Bill
