Date: Tue, 22 Sep 2009 13:07:42 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: "Kozlovsky, Maxim" <Maxim.Kozlovsky@lsi.com>
Cc: "Ariyamannil, Jobi" <Jobi.Ariyamannil@lsi.com>, "Sharp, Andy"
 <Andy.Sharp@lsi.com>, "Fisher, Bill" <Bill.Fisher@lsi.com>, "Fong, Rendell"
 <Rendell.Fong@lsi.com>, "Stark, Brian" <Brian.Stark@lsi.com>
Subject: Re: Linux vs EEE
Message-ID: <20090922130742.1d9fcbe1@ripper.onstor.net>
References: <4014E6EE2F9ED44299897AD701ED1C51F093179B@cosmail03.lsi.com>
	<861DA0537719934884B3D30A2666FECC94302B7D@cosmail02.lsi.com>
Organization: LSI
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 22 Sep 2009 12:38:12 -0600 "Kozlovsky, Maxim"
<Maxim.Kozlovsky@lsi.com> wrote:

> 
> 
> ________________________________
> From: Ariyamannil, Jobi
> Sent: Tuesday, September 22, 2009 11:26 AM
> To: Sharp, Andy; Fisher, Bill; Fong, Rendell; Kozlovsky, Maxim
> Cc: Stark, Brian
> Subject: Linux vs EEE
> 
> I thought I would summarize some of the differences between EEE and a
> general purpose OS like Linux.  We need to decide how to make EEE
> applications work with Linux.  From my Veritas experience, I can
> tell that the Linux port of VxFS is quite convoluted due to the many
> workarounds and hacks used to cope with operating system constraints.
> VxFS is a host based file system that sits on the Linux VFS layer,
> and meeting some of the VFS requirements is a challenge for VxFS:
> once VFS issues the unmount command, the unmount cannot fail; the
> VFS has its own cache of pathnames (the dcache); there are nested
> mounts; and so on.  It is good that we are not using the Linux VFS,
> but even with device drivers a lot of OS interactions need to be
> spelled out to avoid future surprises.  Some of the complicated
> issues are memory allocation and stack overflow.  To avoid a lot of
> problems there, VxFS is disciplined about never allocating more than
> 4K of contiguous memory.  They had a lot of issues during the
> initial days of the VxFS port with running out of kernel memory on
> low end Linux systems.  Also, a thread will hand off its work to
> another thread if the stack depth goes beyond a certain threshold.
> VxFS was easily overflowing the thread stack with recursion during
> bmap lookups, interrupt handling, etc.  On top of that, VxFS was not
> able to generate core dumps when the system crashed, making Linux
> debugging a nightmare.  I am not sure whether all these issues are
> due to a lack of Linux expertise at Veritas/Symantec!
> 
> Now, from the context of EEE applications like StorFS:
> 
> 
> 1.   The FS code pre-allocates stacks for some number of threads
> (2k).  Each stack is 16KiB.  Are we going to do the same or use OS
> threads?
> 
> [MK] Unless there is some reason that it cannot be done, we are
> going to do the same.
> 
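Just to make this concrete: pre-allocating a fixed 16KiB stack per
thread is straightforward with pthreads.  A rough sketch, not actual
EEE code (spawn_fs_thread and fs_noop are made-up names):

```c
#include <pthread.h>
#include <stdlib.h>

#define FS_STACK_SIZE (16 * 1024)   /* 16KiB, matching the EEE stacks */

/* trivial worker used in the example below */
static void *fs_noop(void *arg) { return arg; }

/* hypothetical helper: start a worker thread on a caller-supplied,
   pre-allocated stack instead of letting the OS pick a size */
static int spawn_fs_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    void *stack;
    int rc;

    /* page-aligned block to use as the thread's stack */
    if (posix_memalign(&stack, 4096, FS_STACK_SIZE) != 0)
        return -1;

    pthread_attr_init(&attr);
    /* 16KiB happens to be PTHREAD_STACK_MIN on common glibc targets,
       so this is the bare minimum; real code would likely round up */
    pthread_attr_setstack(&attr, stack, FS_STACK_SIZE);

    rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```
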
> 2.   A lot of work is performed by state machines which cannot be
> easily pre-empted.
> 
> [MK] They will not be preempted.
> 
> 3.   EEE uses polling, with no support for interrupts or
> pre-emption.  Linux, on the other hand, uses interrupts.
> 
> [MK] The only interrupts that EEE code cares about are the timer and
> Qlogic. In both cases the interrupts will be handled by forwarding
> the processing to the EEE polling thread.
> 
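That hand-off can be as simple as the ISR setting a pending flag that
the polling loop consumes.  A sketch with made-up names (timer_isr,
poll_once), just to illustrate the shape of it:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* the (timer/Qlogic) interrupt handler only marks work pending;
   the EEE polling thread does the real processing later */
static atomic_bool timer_pending;

static int fs_timer_work_done;                 /* demo counter */
static void fs_timer_work(void) { fs_timer_work_done++; }

static void timer_isr(void)                    /* interrupt context */
{
    atomic_store(&timer_pending, true);
}

/* called from the EEE polling loop; returns true if work was done */
static bool poll_once(void (*do_timer_work)(void))
{
    if (atomic_exchange(&timer_pending, false)) {
        do_timer_work();                       /* polling-thread context */
        return true;
    }
    return false;
}
```
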
> 4.   Will we have support for watchdog timers with the new platform?
> 
> [MK] Yes

Sure.  What did you have in mind to use them for?

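If the answer is just "liveness of the polling loop," a purely
software watchdog is trivial: the polling thread kicks a heartbeat
each iteration and a monitor checks that it advanced.  A sketch (all
names made up):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* software watchdog: the polling thread bumps the heartbeat every
   loop iteration; a monitor periodically checks that it moved */
static atomic_ulong heartbeat;

static void wd_kick(void)
{
    atomic_fetch_add(&heartbeat, 1);
}

/* monitor side: true if the polling thread made progress since the
   last check; on false, the monitor would log, panic, or reset */
static bool wd_check(unsigned long *last_seen)
{
    unsigned long now = atomic_load(&heartbeat);
    bool alive = (now != *last_seen);
    *last_seen = now;
    return alive;
}
```
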
> 5.   Are we going to use OS locks, or our own implementation of
> locks which supports timeouts, recursive locking, etc.?
> 
> [MK] The file system is going to use its own locks; the spin locks
> will be Linux native.
> 
> 6.   EEE code crashes if memory allocation fails (in most cases).
> EEE applications have been well behaved all these years, so we have
> not had a lot of low-memory crashes from the field.  We need to make
> sure the same is true with the Linux port.

Or we could try to put some error handling code in these eee
"applications".
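
Something as simple as a checked wrapper would let callers decide
what to do on failure instead of crashing (eee_alloc is a made-up
name, just a sketch):

```c
#include <stdlib.h>
#include <string.h>

/* checked allocation: propagate failure instead of crashing, so the
   caller can retry, shed load, or fail just the one operation */
static int eee_alloc(void **out, size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return -1;
    memset(p, 0, n);
    *out = p;
    return 0;
}
```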

> [MK] Yes sure
> 
> 7.   Memory allocation and free operations in EEE will not sleep
> and thus can be invoked while holding spin locks.  Will that be true
> with Linux?
> 
> [MK] The same property will be preserved.
> 
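The usual way to keep that property in user space is to carve the
memory up front, so that alloc/free only touch a free list and never
enter the kernel.  A sketch (the pool_* names are made up; the
in-kernel analogue would be allocating with GFP_ATOMIC):

```c
#include <stddef.h>

/* fixed pool: all blocks are carved out up front, so alloc/free only
   manipulate a free list -- no syscalls, no sleeping, safe to call
   while holding a spin lock */
#define POOL_BLOCKS 64
#define BLOCK_SIZE  256

union blk {
    union blk *next;             /* free-list link while on the list */
    char payload[BLOCK_SIZE];    /* user data while allocated */
};

static union blk pool_mem[POOL_BLOCKS];
static union blk *pool_free;

static void pool_init(void)
{
    pool_free = NULL;
    for (int i = 0; i < POOL_BLOCKS; i++) {
        pool_mem[i].next = pool_free;
        pool_free = &pool_mem[i];
    }
}

static void *pool_alloc(void)    /* NULL when the pool is exhausted */
{
    union blk *b = pool_free;
    if (b != NULL)
        pool_free = b->next;
    return b;
}

static void pool_release(void *p)
{
    union blk *b = p;
    b->next = pool_free;
    pool_free = b;
}
```
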
> 8.   When the system runs out of memory, it may ask the file systems
> to free up memory by flushing dirty data and freeing memory from the
> various caches.  As we are not hooked into the VFS layer, how would
> this be accomplished?
> 
> [MK] We are not going to give back the FS memory to Linux. We are not
> a general purpose platform, so the memory for NFS/CIFS/FS will
> essentially be reserved.

Not applicable.  We are likely to set up our own slabs that only onstor
code uses.  Essentially a hybrid of what eee does now.

> 9.   Are we going to allocate large chunks of memory for caching
> small objects like inodes?  If so, the file system needs to find
> whole chunks that are unused in order to release memory under low
> memory situations.
> 
> [MK] We are not going to allocate large chunks, this is handled by
> the slab allocator.
> 
> 10.   Linux may be efficient at allocating memory in chunks of 4K
> bytes.  But we use a lot of ultra buffers, which are 8K+ bytes.  Are
> we going to preallocate/reserve memory for large allocations?
> 
> [MK] I don't think this is still true, but then my Linux knowledge is
> probably even less than yours.

I doubt this was ever true.  Anyway, Linux adopted a slab allocator
(almost identical to Solaris', and to what virtually every OS I know
of uses today) so long ago that I don't even remember when that was.
More than 6 years at least.  It currently has 3 different variants of
the slab allocator, if the standard one doesn't suit you.

> Please excuse me if these things have already been discussed and
> sorted out.  Also, my Linux knowledge is very limited.

Ok, you're excused.  Not.  Keep in mind that making sure this sort of
stuff goes smoothly and doesn't surprise the project is in my job
description.  If it was going to be a problem, either I already have a
solution or I have already brought it to everyone's attention.

> Regards,
> Jobi
