AF:
NF:0
PS:10
SRH:1
SFN:
DSR:
MID:
CFG:
PT:0
S:andy.sharp@lsi.com
RQ:
SSV:mhbs.lsil.com
NSV:
SSH:
R:<brian.stark@lsi.com>
MAID:2
X-Sylpheed-Privacy-System:
X-Sylpheed-Sign:0
SCF:#mh/Mailbox/sent
FMID:#imap/LSI/INBOX	0	861DA0537719934884B3D30A2666FECC94373C7A@cosmail02.lsi.com
X-Sylpheed-End-Special-Headers: 1
Date: Tue, 29 Sep 2009 19:01:06 -0700
From: Andrew Sharp <andy.sharp@lsi.com>
To: Brian Stark <brian.stark@lsi.com>
Subject: Fw: FS threads design overview
Message-ID: <20090929190106.2e152174@ripper.onstor.net>
Organization: LSI
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=MP_HH0++3XjgrQiBvwAXfDBuHo

--MP_HH0++3XjgrQiBvwAXfDBuHo
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I refuse to read this because it's not in M$ Word.


Begin forwarded message:

Date: Tue, 29 Sep 2009 18:17:59 -0600
From: "Kozlovsky, Maxim" <Maxim.Kozlovsky@lsi.com>
To:  "Sharp, Andy" <Andy.Sharp@lsi.com>
Subject: FS threads design overview


[FS-threads.txt  text/plain (9403 bytes)]
--MP_HH0++3XjgrQiBvwAXfDBuHo
Content-Type: text/plain; name=FS-threads.txt
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=FS-threads.txt

Version 1.0.
Sep 29 2009


This document describes the port of the FS threads to Linux and the solution 
for the memory management issues.  

1. Problem Statement

The file system threads are basic cooperatively scheduled threads implemented 
on top of EEE state machines.  The EEE provides the following very basic API: 

    thread_create()
    thread_exit()
    thread_yield()

Wake-ups are done by sending events to the thread state machines via 
esm_deliverEvent().  
    
The file system code expects to be able to efficiently create and schedule 
thousands of threads.  Both thread creation and context switches are expected 
to be extremely light-weight.  The EEE scheduler is strict FIFO and as such is 
as efficient as a scheduler can get.  The Linux implementation is expected 
to obey the same constraints.  
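As an illustration only, the strict-FIFO scheduling described above can be 
modeled with a trivial run queue.  The toy_* names below are invented for 
this sketch; real FS threads additionally carry a stack and switch context 
rather than being plain state-machine objects:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a strict-FIFO run queue of state-machine "threads".
 * Illustrative only; not the actual EEE identifiers or layout. */
#define MAX_THREADS 16

typedef struct toy_thread {
    void (*step)(struct toy_thread *);  /* one scheduling quantum */
    int done;                           /* thread_exit() equivalent */
    int counter;                        /* per-thread demo state */
} toy_thread;

static toy_thread *runq[MAX_THREADS];
static int rq_head, rq_tail;

static void runq_push(toy_thread *t) { runq[rq_tail++ % MAX_THREADS] = t; }
static toy_thread *runq_pop(void) {
    return rq_head == rq_tail ? NULL : runq[rq_head++ % MAX_THREADS];
}

/* thread_yield() equivalent: re-enqueue at the tail, strict FIFO order. */
static void toy_yield(toy_thread *t) { runq_push(t); }

/* Run until the queue drains; each pop gives the thread one step. */
static void toy_schedule(void) {
    toy_thread *t;
    while ((t = runq_pop()) != NULL) {
        t->step(t);
        if (!t->done)
            toy_yield(t);
    }
}

static void count_to_three(toy_thread *t) {
    if (++t->counter >= 3)
        t->done = 1;   /* mark the thread finished */
}
```

The point of the sketch is that both enqueue and dequeue are O(1), which is 
why a strict-FIFO scheduler can handle thousands of threads cheaply.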

The memory manager for the file system is expected to provide never-block, 
never-fail properties.  This is achieved by employing a request gate: new 
requests are not scheduled unless it is proven that there is enough memory to 
complete the currently running requests and the new one.  There is no static 
allocation of the data structures; however, unless fragmentation goes through 
the roof, memory allocation is bounded as long as the number of allocated 
data structures is bounded.  The use of the slab allocator helps to keep 
fragmentation in check, so running out of memory has not been a problem so 
far.  The file system code has come to rely on the no-block property; there 
are many places in the current code that allocate memory under spin locks.  
Finding and modifying all these places would be too dangerous, so it is 
desirable that the memory manager in the Linux port provide the same 
properties.  There are also a few places outside the file system code that 
rely on the no-fail property of the memory manager.  The restricted 
environment allowed this code to work: the out-of-memory condition simply 
never happened.  With the Linux port, all kinds of arbitrary code may run on 
the box, so this code should be shielded from temporary out-of-memory 
conditions.  

2. Design Overview

2.1. FS Threads

The file system threads will be implemented as co-routines on top of the 
Linux kernel threads, rather than having a one-to-one correspondence between 
Linux threads and FS threads.  This will help to keep the scheduler and 
thread creation/deletion overhead to a minimum and will prevent the 
scheduling problems where the herd of FS threads, FP, NCPU, and ACPU polling 
threads and user threads would all be fighting for the CPU.  

Several FP polling threads will be created; their number will depend on the 
number of physical cores in the system.  The Cougar hardware will have 5 FP 
polling threads (5 FP + 1 ACPU + 1 network interrupt + 1 NCPU = 8), or 6 FP 
polling threads if the NCPU thread does not need a full core.  The Pikes Peak 
hardware will have 1 or 2 FP polling threads (1 VxWorks + 1 network + 1 ACPU 
+ 1 FP = 4).  The polling thread will run the same polling loop as the 
current software, executing the currently runnable threads as just one of the 
routines in the polling loop.  All the code that the threads will be allowed 
to execute, including the memory allocation calls, will be strictly 
non-blocking.  The polling thread will switch to the thread context using 
roughly the same small piece of MIPS assembler code; equivalent assembler 
code will have to be written for the x86-64 port.  
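The kernel port will use a few lines of hand-written x86-64 assembler for the 
switch itself.  Purely as an illustration of the cooperative stack switch a 
polling thread performs when entering an FS thread, the same control flow can 
be shown in userspace with glibc's ucontext routines (illustrative names; 
not the actual mechanism):

```c
#include <ucontext.h>
#include <assert.h>

/* Userspace sketch of a cooperative context switch: the polling loop
 * hands the CPU to an FS thread, which later yields it back. */
static ucontext_t poll_ctx, fs_ctx;
static char fs_stack[64 * 1024];
static int fs_ran;

static void fs_thread_body(void) {
    fs_ran = 1;
    /* "thread_yield()": switch back to the polling loop. */
    swapcontext(&fs_ctx, &poll_ctx);
}

static void run_one_fs_thread(void) {
    getcontext(&fs_ctx);
    fs_ctx.uc_stack.ss_sp = fs_stack;       /* thread gets its own stack */
    fs_ctx.uc_stack.ss_size = sizeof fs_stack;
    fs_ctx.uc_link = &poll_ctx;             /* fall back to poller on exit */
    makecontext(&fs_ctx, fs_thread_body, 0);
    /* Polling thread hands the CPU to the FS thread... */
    swapcontext(&poll_ctx, &fs_ctx);
    /* ...and resumes here after the thread yields. */
}
```

The hand-written assembler does essentially the same save/restore of the 
callee-saved registers and stack pointer, just without the signal-mask 
bookkeeping that makes ucontext comparatively slow.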

2.2. Non-blocking, Non-failing Memory Manager

The non-blocking, non-failing property of the file system memory manager 
will be achieved by using roughly the same techniques as are used now, with 
some enhancements.  Rather than checking for available memory before request 
creation, the polling threads will check for available memory before 
executing the next batch of active threads/polling functions/ACPU requests.  
The memory manager will be instructed to keep a significant chunk of memory, 
on the order of 100+ MB, free at all times.  If the free memory falls below 
the threshold, the execution of the FP and ACPU code will be temporarily 
suspended until the kernel comes up with free memory.  If no free memory 
appears within a predefined amount of time (~4 seconds), the box will be 
rebooted.  This makes it possible to satisfy the FS and ACPU requests for 
memory without blocking or failing.  
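The gate decision reduces to a small pure function.  The ~100 MB reserve and 
~4 second deadline come from the text above; the names, units, and enum are 
invented for this sketch:

```c
#include <assert.h>

/* Sketch of the free-memory gate checked before each batch of work. */
#define RESERVE_BYTES   (100ULL << 20)   /* keep ~100+ MB free */
#define REBOOT_AFTER_MS 4000UL           /* give the kernel ~4 s */

typedef enum { GATE_RUN, GATE_WAIT, GATE_REBOOT } gate_action;

static gate_action memory_gate(unsigned long long free_bytes,
                               unsigned long stalled_ms)
{
    if (free_bytes >= RESERVE_BYTES)
        return GATE_RUN;       /* reserve intact: run the next batch */
    if (stalled_ms >= REBOOT_AFTER_MS)
        return GATE_REBOOT;    /* the kernel never freed memory up */
    return GATE_WAIT;          /* suspend FP/ACPU work and retry */
}
```

Because the check happens once per batch rather than per allocation, the 
individual allocation calls inside the batch can stay non-blocking and 
non-failing.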

2.3. Yielding Polling Threads

Despite being called polling threads out of habit, the FP and ACPU threads 
will be required to yield the CPU under the Linux architecture to allow the 
user code to run.  Instead of polling continuously like the current code 
does, the polling threads will go to sleep if no work can be done.  All the 
routines that produce work for the polling threads will be modified to wake 
up the polling threads if the corresponding input queue goes from empty to 
not-empty.  Examples of such routines are sending an event to a thread, 
putting a packet on the ACPU input queue, and detecting a QLogic interrupt.  
New routines will be added to process timer and QLogic interrupts.  

The polling threads will also yield the CPU without going to sleep after 
running for longer than a predefined time quantum, to allow effective sharing 
of the CPU with the user code.  The user code will be preempted after using 
its quantum, as normally happens under the Linux scheduler.
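The empty-to-not-empty wake-up can be sketched with pthread primitives 
standing in for whatever the kernel port actually uses.  All names here are 
illustrative; the key point is that producers signal only on the empty 
transition, so the hot path of a busy queue stays signal-free:

```c
#include <pthread.h>
#include <assert.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  more;
    int count;       /* items queued */
    int wakeups;     /* how many times we signalled (for the test) */
} in_queue;

static void inq_init(in_queue *q) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->more, NULL);
    q->count = q->wakeups = 0;
}

/* Producer side: event delivery, ACPU packet, QLogic interrupt, ... */
static void inq_put(in_queue *q) {
    pthread_mutex_lock(&q->lock);
    if (q->count++ == 0) {          /* empty -> not-empty: wake poller */
        q->wakeups++;
        pthread_cond_signal(&q->more);
    }
    pthread_mutex_unlock(&q->lock);
}

/* Polling thread side: sleep only when there is no work at all. */
static void inq_get(in_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->more, &q->lock);
    q->count--;
    pthread_mutex_unlock(&q->lock);
}
```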

2.4. CPU_PRIVATE data

The CPU_PRIVATE data will be per-thread rather than per-core.  It will be 
implemented as thread-specific data associated with the polling threads.  It 
is expected that the syntax will have to change: we will no longer be able to 
use the direct variable reference hack, and references to the CPU_PRIVATE 
variables will have to use function call syntax.  
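A minimal sketch of what the function-call syntax could look like, using 
pthread thread-specific data in place of whatever mechanism the port ends up 
with.  cpu_private() and the fs_stats field are invented names:

```c
#include <pthread.h>
#include <stdlib.h>
#include <assert.h>

/* Sketch: CPU_PRIVATE becomes per-thread data behind an accessor. */
struct cpu_private_data { long fs_stats; };

static pthread_key_t  cpu_private_key;
static pthread_once_t cpu_private_once = PTHREAD_ONCE_INIT;

static void cpu_private_mkkey(void) {
    pthread_key_create(&cpu_private_key, free);  /* free at thread exit */
}

/* Function-call syntax replaces the direct variable reference hack. */
static struct cpu_private_data *cpu_private(void) {
    struct cpu_private_data *p;
    pthread_once(&cpu_private_once, cpu_private_mkkey);
    p = pthread_getspecific(cpu_private_key);
    if (p == NULL) {                      /* first touch by this thread */
        p = calloc(1, sizeof *p);
        pthread_setspecific(cpu_private_key, p);
    }
    return p;
}
```

A reference such as CPU_PRIVATE(fs_stats) would become 
cpu_private()->fs_stats, which is likely a mechanical (if noisy) change.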

2.5. Debuggability

2.5.1. KGDB and Core Dumps

The Linux core dump and interactive debugging code will have to be modified 
to be aware of the file system threads.  The existing EEE gdb thread support, 
and the modifications made to gdb to support the file system threads, can be 
consulted for ideas on how this is done.

2.5.2. Volume Exception Dumps

The volume exception supporting code, namely the user space utility that 
rewrites the exception dump into a core dump format, will need to be updated 
to work with the current versions of GCC and GDB.  I believe the GDB core 
file format has changed slightly between revisions, but I may be mistaken 
here.  At the very least we'll need to make sure that the code still works.

2.5.3. Threadstacks Dump

The thread stacks dump feature will be supported just the same as it is now.  
The FP threads will have the same feature, allowing the threads to be 
suspended by setting a global flag and their stacks to be dumped to the log 
once the threads are suspended.  The only difference is that instead of being 
per-core the suspend flags will be per-thread.
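The per-thread suspend handshake is simple enough to sketch directly; the 
names and the single-checkpoint model are invented for illustration:

```c
#include <assert.h>

/* Sketch of the per-thread suspend flag for the thread-stacks dump. */
typedef struct {
    volatile int suspend;   /* set by the dump trigger */
    int suspended;          /* acknowledged by the thread */
} fp_thread;

/* Called by each polling thread at its next scheduling point. */
static void fp_checkpoint(fp_thread *t) {
    if (t->suspend && !t->suspended)
        t->suspended = 1;   /* park here; real code dumps the stack */
}

/* Dump trigger: raise every flag, then wait for acknowledgements. */
static int all_suspended(fp_thread *v, int n) {
    int i, ok = 1;
    for (i = 0; i < n; i++) {
        v[i].suspend = 1;
        fp_checkpoint(&v[i]);   /* simulate each thread noticing */
        ok &= v[i].suspended;
    }
    return ok;
}
```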







--MP_HH0++3XjgrQiBvwAXfDBuHo--
