Date: Fri, 17 Oct 2008 18:43:55 -0700
From: Andrew Sharp <andy.sharp@onstor.com>
To: William Fisher <bfisher@onstor.com>
Subject: Re: ACPU/NCPU Linux Proposal and Task List; Take 2
Message-ID: <20081017184355.05875b9a@ripper.onstor.net>
In-Reply-To: <48F7E198.4080502@onstor.com>
References: <48F7E198.4080502@onstor.com>
Organization: Onstor
X-Mailer: Sylpheed-Claws 2.6.0 (GTK+ 2.8.20; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Bill,

Feedback in line.  Let me know if any of it's unclear.

In summary, this is a good description of the big-picture tasks.  I
especially like the testing steps included as tasks.  A little more
investigation on the vsvr implementation stuff, which I should probably
do, and we can tack some numbers on here and share it around.  Let's
shoot for Monday to get together and discuss some more.


On Thu, 16 Oct 2008 17:51:36 -0700 William Fisher <bfisher@onstor.com>
wrote:

> Guys:
>
> Attached is take 2 of the proposal. I have re-ordered the tasks
> as per Max's suggestion and incorporated the changes as
> per his review comments.
>
> As before, let me know what I am missing or have
> gotten wrong.



>			October 16, 2008
>			William Fisher, Version 2
>
> Proposal for Porting Linux 2.6.{26,27} onto NCPU & ACPU cores
>
> 1.0 Objectives and Requirements
>     ---------------------------
>
> The requirements are summarized in the following points:
>
> 1.1 To obtain a more updated TCP/IP stack which supports both
>     IPv4 and IPv6. An adaptation of the Linux TCP/IP protocol

As well as application protocols such as FTP requested by some
customers.

>     stack allows obtaining IPv4, IPv6, bonding driver improvements
>     and 10 Gigabit Ethernet support easily.
>
> 1.2 The support of 10 Gigabit Ethernet 

without having to write our own device drivers.

>
> 1.3 This adoption of Linux on the NCPU core(s) is a first step

... on the TXRX 1480 node is a first step

>     in an eventual 

... migration path to System X.  And it shall be called TUXRX.
Pronounced "tucks-arr-ecks".


> 2.0 Proposal Overview
>     -----------------
>
>     In the sections below, the identified tasks are listed in
> approximate chronological development order. There are a number of
> things that must be done first before the NFS/CIFS functionality can
> be tested. Hence there is probably run for more parallelism depending

run?  room?

> on the resource allocation.
>
> 2.1 Port stock Linux 2.6.{26,27} kernel onto the one Sibyte socket
>
>     The goal is to "port" a stock Linux 2.6.{26,27} kernel onto

onto the TXRX node.

>     one of the two SiByte sockets, supporting 4 processor cores. This
> will be a very stripped-down kernel supporting the minimum number of
>     device drivers, file systems and user functions. Since a Linux
> 2.6.22 kernel has been ported to the SSC Sibyte 12XX processor as
> part of the 4.0 release, it is envisioned that this is a very
> straightforward task.
>
> 2.2 Loading NCPU Linux kernel ELF Modules
>
>     This task concerns loading the NCPU Linux kernel ELF modules
> after the SSC has loaded NCPU Linux into one of the two Sibyte 1480
> processor sockets.
>
>     The existing PROM code executes in SSC memory and loads the SSC
> Linux image by reading files from the Compact Flash (CF) directly
> attached to the SSC hardware.

The SSC prom loads all three images into memory, then notifies the
other nodes to start executing their code.

>     In the current software, after the Sibyte 1480's images are
> loaded, the Embedded Eagle Executive (EEE) requires no further code
> loading functions.

Oooh.  Embedded Eagle Executive.  Oooh.

>     One approach is to store the NCPU kernel modules onto the CF. In
> order to access the CF, a communication channel is needed between
> NCPU Linux and SSC Linux, to allow modules to be read from CF. If we
> extend the ssc-mgmt driver running on the SSC to pass TCP/IP packets
> contained in skb's passed via the shared queue interface, this
> "point-to-point" channel could be used to access the CF via NFS from
> the NCPU Linux.
>
>     The current ssc-mgmt driver already supports passing EEE messages
> contained in skb's, hence a straightforward extension is to have
> it pass IP packets as well. Using this driver, we could create a
> point-to-point network interface, using a private assigned IP address
> such as 192.168.X.Y, and send TCP/IP packets between the NCPU and the
> SSC.
>
>     Using NFS as the upper level protocol running over this link, the
> CF file system can be mounted onto the NCPU. Since the module we are
> loading into NCPU Linux is the OnStor NFS/CIFS module, we have the
> classic chicken-and-the-egg problem during the module loading phase.
>     This has the side-effect of using stock NFS in NCPU Linux to
> support the mounting and file system operations.
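
BTW, the point-to-point setup in the two paragraphs above is all stock
tooling on the Linux side.  A rough sketch from the NCPU end; the
interface name eee0, the addresses, and the CF export path are all
placeholders I made up, not decisions:

```shell
# Hypothetical sketch -- interface name, addresses and paths are
# placeholders, not actual decisions.
# Bring up the point-to-point link over the ssc-mgmt driver:
ip addr add 192.168.1.2 peer 192.168.1.1/32 dev eee0
ip link set eee0 up

# Mount the SSC's Compact Flash over NFS so modules can be read:
mount -t nfs -o nolock 192.168.1.1:/mnt/cf /mnt/cf

# After which loading the module is just:
insmod /mnt/cf/modules/onstor-nfs-cifs.ko
```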
>
>     The requirement to use stock Linux NFS inside the NCPU to load
> modules can be relaxed/removed if we load a ramFS containing the NFS

It's called an "initrd" (initial ram disk) and we will use that for
production deployment.  I'm guessing we will simply go with a small
ramdisk for the root filesystem, but for early development purposes we
will be able to use NFS root filesystem.
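For reference, building the initrd itself is mechanical; something like
the following, where the tree layout and module name are made up for
illustration (the real .ko comes out of the ACPU build):

```shell
# Minimal initrd construction sketch -- paths and module name are
# hypothetical; the real module comes out of the ACPU build.
mkdir -p initrd-root/lib/modules
touch initrd-root/lib/modules/onstor-nfs-cifs.ko   # placeholder .ko

# Pack the tree in newc cpio format and compress, as the kernel's
# initramfs loader expects:
( cd initrd-root && find . | cpio -o -H newc ) | gzip > initrd.gz
```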

> module during the NCPU Linux booting process. The use of stock Linux
> NFS on the NCPU can be used during software development to speed-up
> the debug cycle of the ACPU module since an old module can be
> unloaded, a new module loaded onto the CF on the SSC and reloaded
> without requiring a reboot of the Sibyte processors.
>
>     The final shipped software would not require stock Linux NFS in
> the NCPU Linux with the use of a ramFS containing the ACPU NFS/CIFS
> code module.
>
> 2.3 Support Shared Memory Queues and Messaging Protocol between Linux
> processors
>
>     In order to minimize the changes to other parts of the system
>     software, our plan entails using the standard shared memory
> messaging queue's implemented today to communicate between NCPU,
> ACPU, FP and SSC processors. The path of minimal disruption is to
> leave unchanged the message types and formats used today.
>
>     This task addresses the changed required to the NCPU Linux kernel

changes

>     to initialize the shared memory queues and to add support to
>     send and receive messages using standard Linux device drivers.
>     This task address any changes in addition to the porting

addresses

>     of the mgmt-bus driver and eee protocol modules described below.
>
> 2.4 Port SSC "mgmt-bus" driver and eee protocol modules to NCPU Linux
> Kernel
>
>     Since we are replacing the NCPU's EEE functionality with a Linux
>     kernel, this task covers the porting of the SSC "mgmt-bus" device
> driver, implementing the shared message queues between the SSC and
> NCPU, and using common Linux device drivers and protocol modules on
> both the SSC and NCPU Linux kernels.
>
>     The Linux mgmt-bus device driver and 'eee' network protocol
> modules were written and integrated into the 4.0 release, so this task
> is envisioned as a simple porting and testing effort.
>
> 2.5 Test the "mgmt-bus" driver and eee protocol modules on NCPU Linux
>
>     The task covers testing the mgmt-bus driver and eee protocol
> modules by running traffic between the SSC and NCPU after the various
> supporting software has been ported.
>
> 2.6 Memory Allocation Task
>
>     This task covers the memory allocation interfaces, sizes and
> mapping functions currently used in the NCPU and ACPU software. Since
> we are replacing the NCPU's EEE functionality with a Linux kernel on
>     the NCPU and ACPU cores, the EEE memory allocation schemes must be
>     explicitly addressed.
>
>     Currently the EEE supports two memory regions, one for
> descriptors and buffers and the other for general memory allocation.
>     The use of common shared memory regions mapped into all the cores

... mapped into the off-node cores

>     must be maintained for descriptors, buffers, queues and messages.
>     However other local memory allocations should be converted to call
>     the generic kernel memory allocator. The recommendation is to
>     convert the eee_ramAlloc() and cache_alloc() interfaces into
>     calls to the generic Linux kernel memory allocator.
>
>     The plan is to allocate the skb's and there associated buffers

their

> from the common memory region so that the zero copy
> networking/filesystem operations are maintained. In addition, the
> allocation of the shared queues and there associated messages must

their

> be allocated from another part of this common memory region to
> maintain backward compatibility.
>
> 2.7 Port RCON support to NCPU Linux Kernel
>
>     This task covers adapting the RCON SSC Linux driver to the NCPU Linux
> kernel.
>
> 2.8 Test the RCON functions between SSC and NCPU Linux kernels
>
>     The task covers testing the remote console (RCON) functions
> between the SSC and NCPU after the various underlying supporting
> software tasks, covered previously, have been ported and unit tested.
>
> 2.9 NCPU Linux distribution of messages from SSC.
>
>     This task covers the messaging communication between the SSC and
>     the ACPU and FP cores. Currently the NCPU core receives all
> messages destined for the ACPU and FP cores coming from the SSC. The
> NCPU is responsible for forwarding messages destined for these other
> cores.
>
>     This task needs further study to accurately scope the
> implementation effort.

Mention that this is a change that we can make use of right now to
improve performance, since the TXRX no longer has the extra overhead of
touching all the FP messages.  Won't be a big boost, maybe not even
measurable, but you wouldn't have to mention that part.

> 2.10 NCPU Linux IP Forwarding Functionality
>
>     This task covers the IP forwarding functions that must be
> supported to send packets to/from the SSC when packets are received
> on network interfaces supported by the NCPU Linux. In addition the
> Network Address Translation (NAT) functionality needs to be studied.

This should be trivial utilizing stock kernel IP forwarding and NAT
capabilities of netfilter, given paragraph 5 under item 2.2.
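Concretely, I'd expect nothing fancier than the following on the NCPU,
where eth0 and eee0 are stand-ins for the real 10GbE and SSC-link
interface names:

```shell
# Sketch only -- eth0 and eee0 are placeholder interface names.
# Turn on IPv4 forwarding:
sysctl -w net.ipv4.ip_forward=1

# Masquerade traffic arriving from the SSC link out the front-end port:
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Plus whatever filter policy we want between the two:
iptables -A FORWARD -i eee0 -o eth0 -j ACCEPT
```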

>     This task needs further study to accurately scope the
> implementation effort.
>
>     These requirements might be satisfied using the Linux Netfilter
>     functionality which easily supports NAT, filters and forwarding
>     functions across interfaces typically used in firewalls, NAT
>     boxes, etc.
>
> 2.11 Socket communication between NCPU and FP
>
>     This task covers the messaging communication between the NCPU
> Linux kernel and the FP functionality. The specific messages sent
> between the NCPU and FP are defined in sm-tpl-fp/tpl-fp-api.h and
> cover socket operations such as open, close, listen, accept,
> read/write and unbind.
>
>     The task requires supporting these messages when the Linux TCP/IP
>     stack has been substituted for the current OpenBSD based TCP/IP
>     implementation.

Really?  We send messages to the FP when we get an open?  Maybe for
CIFS I guess.  But actually it's not clear why we would send any
messages about socket events to the FP.

> 2.12 Virtual Stack communication between NCPU and SSC
>
>     This task covers the messaging communication between the SSC
>     and NCPU Linux kernel specific to virtual stacks. It covers
>     requesting and obtaining information pertaining to virtual
> interfaces, adding and deleting routes and obtaining routing tables,
> configuring interfaces, getting packet and network interface
> statistics, TCP and UDP connections, etc.
>
>     Since the NCPU will field these messages and generate the
> appropriate replies, this task addresses implementing the code to
> obtain the equivalent data from the Linux protocol stack. The messages
> and there current implementation are described in sm-ipm/ipm.[h,c].

the current implementation ...

>     There are a number of messages that require a considerable amount
> of information to be passed back to the SSC regarding the state of the
>     entire protocol stack. These include cumulative IP, UDP and TCP
>     statistics, UDP and TCP connection tables with the message sizes
>     ranging from 32K to 800K for statistics. Since this includes
> information to specific stack instances under BSD, this may require
> considerable work to maintain this information under Linux.
> Modifications of the messages in this area might be required.

I don't know what you mean by "...specific stack instances..." hence
I'm not grokking this paragraph.  Hmm, maybe I'm getting it upon
reading it a second time.  I actually think this will be relatively
trivial, userspace code.  All the statistics should be easily
exportable to userspace via /proc or netlink socket (iptables) and a
userspace daemon will just pluck the info and hand it back to the SSC.
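E.g. the cumulative counters are already sitting in /proc/net/snmp, so
the daemon mostly just has to reshape them.  A sketch of the parsing,
run here against a canned two-line sample so it stands alone (the real
daemon would read the live file):

```shell
# Turn /proc/net/snmp-style paired header/value lines into name=value
# pairs.  The real daemon would read /proc/net/snmp; a canned sample is
# piped in here instead.
parse_snmp() {
  awk '
    # Lines come in pairs: "Proto: field field ..." then "Proto: val val ..."
    { if ($1 in hdr) { n = split(hdr[$1], f, " ")
                       for (i = 2; i <= n; i++) print $1 f[i] "=" $i
                       delete hdr[$1] }
      else hdr[$1] = $0 }'
}

printf 'Tcp: ActiveOpens PassiveOpens\nTcp: 12 34\n' | parse_snmp
```

which emits one "Tcp:Counter=value" line per statistic, ready to stuff
into a reply message.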

>     This task needs further study to accurately scope the
> implementation effort.
>
> 2.13 Virtual Server Support on the Linux NCPU
>
>     This task covers the messaging communication between the Virtual
>     Server software running on the SSC and the NCPU Linux kernel.
>
>     The Virtual Server message formats will remain unchanged, so the
>     work covers implementing the functionality previously added to
>     the BSD protocol stack on the NCPU core that supported these
> messages.
>
>     The development centers on obtaining the information needed to
> satisfy requests and responding with appropriate replies. The Linux
>     implementation must also implement those messages requiring
>     explicit notification of changes occurring in the networking
>     stack, which must be communicated back to the SSC Virtual Server.

I'm starting to see this now, at least a little bit.  There will need
to be a daemon running on the TUXRX that handles talking to the SSC
for this kind of thing.  This daemon will both monitor messages coming
from the SSC and act on them, as well as send messages to the SSC
pertaining to events occurring on the TUXRX.

>     The Linux equivalent implementation of the vstack partitioning
>     of the BSD protocol stack, for separate routing tables, etc.
>     may be implemented using Linux netfilter functionality. This is
>     an open question needing more detailed study.
>
>     This task needs further study to accurately scope the
> implementation effort.

Yes.

> 2.14 Convert OnStor Packet Descriptors (pkt_desc) to Linux Socket
> Buffers (skb's)
>
>     The task covers replacing the use of the pkt_desc data structure
> used in describing network data passed between the NCPU and the ACPU
> cores with the use of standard Linux socket buffers (skb's). This
> appears to be a straightforward replacement, since the two are
> nearly equivalent, and it allows passing Linux networking buffers to
> the ACPU without copying.
>
>     The task covers the kernel changes required to modify the skb
> memory allocator to use the common mapped shared memory region
> between the Sibyte processor sockets versus using a generic kernel
> slab allocator region.
>
>     A chain of skb's will be allocated from the common mapped shared
> memory region between the NCPU, ACPU and FP and continued use of
> zero-copy networking will be supported. The handoff of ownership of
> the buffers to the destination code via IPC using the shared message
> queues will continue.
>
> 2.15 Convert Linux kernel TCP/IP networking stack to be TPL-API aware
>
>     This task covers modifying the Linux networking code to be aware
> of the Transport Layer API (tpl-api) interfaces that must be
> supported to communicate with the ACPU.
>
>     This will allow the Linux networking code to call the appropriate
>     tpl-api functions when changes occur requiring notification or
>     reception of messages in either direction.

This would suck.  Can we not handle this in the vsvr implementation?

> 2.16 Convert ACPU NFS/CIFS code to be a Linux kernel module
>
>     This task covers modifying the ACPU NFS and CIFS code to become
> standard Linux kernel modules, with the extra provision of running on
> a single dedicated processor core.
>
> 2.17 Convert ACPU NFS code to use Linux Socket Buffers
>
>     This task covers modifying the NFS code to use Linux Socket
> Buffers (skb's) rather than OnStor pkt_desc's. Since the queues and
> message formats will remain unchanged, with the exception of passing
> 'skb' pointers in the data messages, the basic assumption of passing
> a complete RPC/XDR message chain between the NCPU and the ACPU
> remains unchanged.
>
>     A closer examination of the NFS code shows that it currently
> handles chains of buffers and uses only a few fields of the pkt_desc
> data structures which have equivalents in the skb data structure.
>
> 2.18 Convert ACPU CIFS code to use Linux Socket Buffers
>
>     This task covers modifying the CIFS code to use Linux Socket
> Buffers (skb's) rather than OnStor pkt_desc's. Since the queues and
> message formats will remain unchanged with the exception of passing
> 'skb' pointers in the data messages, the basic assumption of passing
> a complete message chain between the NCPU and the ACPU remains
> unchanged for CIFS.

Is this and 2.17 just sub steps of 2.14?

> 2.19 Test modified ACPU NFS code
>
>     This task covers testing the modified NFS code running under the
> Linux kernel on the ACPU processor core. Since the virtual server
> functionality is required by the NFS code, the testing and debugging
> of the modified code must be done later in the schedule.
>
> 2.20 Test modified ACPU CIFS code
>
>     This task covers testing the modified CIFS code running under the
>     Linux kernel on the ACPU processor core. Since the virtual server
>     functionality is required by the CIFS code, the testing and
> debugging of the modified code must be done later in the schedule.
>
> 2.21 NCPU Linux kernel core dumps
>
>     This task covers obtaining a working kgdb and kernel crash dump
> on the NCPU.

I assume you meant obtaining a panic message (including stack trace),
since we don't do no core dumps in ye ole' Linux kernel.  But ideally
our code will be so clean, straight forward, simple and bullet proof
that we won't even need kgdb.  Yeah, that's what I said.
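
That said, if we do end up wanting kgdb, on 2.6.26+ it's mostly just
boot parameters; the serial device and baud rate below are guesses at
the console wiring:

```shell
# Hypothetical kernel command-line additions (ttyS0/115200 are guesses):
#   console=ttyS0,115200 kgdboc=ttyS0,115200 kgdbwait
# console=   panic messages and stack traces go to the serial console
# kgdboc=    kgdb attaches over the same serial port
# kgdbwait   kernel stops early in boot and waits for the debugger
```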

