Proposal for Porting Linux 2.6.27 onto NCPU & ACPU cores
William Fisher, Version 3
October 20, 2008

1.0 Objectives and Requirements
---------------------------

The requirements are summarized in the following points:

1.1 To obtain a more up-to-date TCP/IP stack which supports both IPv4
and IPv6. Adopting the Linux TCP/IP protocol stack provides IPv4, IPv6,
bonding driver improvements and 10 Gigabit Ethernet support with little
effort. In addition, the standard IP-based protocols such as FTP,
Telnet and others come without major development effort.

1.2 Deploying standard vendor Linux NAPI device drivers for PCI-Express
hardware is a straightforward migration path to 10 Gigabit Ethernet
support.

1.3 This adoption of Linux on the Sibyte 1480 TXRX core is a first step
in the migration to SystemX Linux. Using the existing Cougar hardware
platform allows us to use one of the two SiByte Processor Sockets to
execute Linux, with the other one supporting the FP functions
unchanged. This plan allows a transition of the NCPU and ACPU functions
to Linux with a minimum of disruption to the system software.

2.0 Proposal Overview
-----------------

In the sections below, the identified tasks are listed in approximate
chronological development order. There are a number of things that must
be done first before the NFS/CIFS functionality can be tested. Hence
there's room for more schedule parallelism depending on the resource
allocation.

2.1 Port stock Linux 2.6.{26,27} kernel onto one Sibyte socket

The goal is to "port" a stock Linux 2.6.{26,27} kernel onto the TXRX
node, aka the Sibyte 1480 socket supporting 4 processor cores. This
will be a very stripped-down kernel supporting the minimum number of
device drivers, file systems and user functions. Since a Linux 2.6.22
kernel has been ported to the SSC Sibyte 12XX processor as part of the
4.0 release, this is envisioned to be a straightforward task.

2.2 Loading NCPU Linux kernel ELF Modules

This task concerns loading the NCPU Linux kernel ELF modules after the
SSC has loaded NCPU Linux into one of the two Sibyte 1480 processor
sockets. The existing PROM code executes in SSC memory, loads all three
images into memory, and then notifies the various nodes to start
execution. The SSC Linux has driver support for reading/writing files
from the Compact Flash (CF) attached to the SSC. After SSC Linux has
been booted, the NCPU/ACPU/FP processor cores are loaded by reading
non-ELF "binary" code files, from either the CF or from the network
using NFS, into the main memory of the Sibyte 1480 processor sockets.

In the current software, after the Sibyte 1480's images are loaded, the
Embedded Eagle Executive (EEE) requires no further code loading
functions.

One approach is to store the NCPU kernel modules on the CF. In order to
access the CF, a communication channel is needed between NCPU Linux and
SSC Linux, to allow modules to be read from CF. If we extend the
ssc-mgmt driver running on the SSC to pass TCP/IP packets contained in
skb's via the shared queue interface, this "point-to-point" channel
could be used to access the CF via NFS from the NCPU Linux.

The current ssc-mgmt driver already supports passing EEE messages
contained in skb's, hence a straightforward extension is to have it
pass IP packets as well. Using this driver, we could create a
point-to-point network interface, using a privately assigned IP address
such as 192.168.X.Y, and send TCP/IP packets between the NCPU and the
SSC. Using NFS as the upper level protocol running over this link, the
CF file system can be mounted onto the NCPU.

Since the module we are loading into NCPU Linux is the OnStor NFS/CIFS
module, we have the classic chicken-and-egg problem during the module
loading phase: the OnStor NFS code cannot be used to fetch itself. As a
side effect, the stock NFS client in NCPU Linux must be used to support
the mounting and file system operations.

For production deployment, the requirement to use a stock Linux NFS
client inside the NCPU to load modules can be met using the stock
initrd stored on CF. The use of a small ramdisk containing the root
filesystem is also envisioned, with an NFS file system used for module
loading during software development to speed up the debug cycle of the
ACPU module. With this approach, an old module can be unloaded and a
new module copied onto the CF on the SSC and reloaded, without
requiring a reboot of the Sibyte processors.

The use of a stock Linux NFS client in the NCPU could also be a benefit
in several other areas of the product. These include the ability to
save log files, packet captures taken with tcpdump or ethereal, and
other files on the CF attached to the SSC. It's envisioned the final
software would use the stock Linux NFS client in the NCPU Linux for
module loading as well as for logging and saving files.

2.3 Support Shared Memory Queues and Messaging Protocol between Linux processors

In order to minimize the changes to other parts of the system software,
our plan entails using the standard shared memory messaging queues
implemented today to communicate between the NCPU, ACPU, FP and SSC
processors. The path of minimal disruption is to leave unchanged the
message types and formats used today.

This task addresses the changes required to the NCPU Linux kernel to
initialize the shared memory queues and to add support for sending and
receiving messages using standard Linux device drivers. In addition,
porting the mgmt-bus driver and eee protocol modules is described
below.

It is a requirement that the SSC be given access to the NCPU mailbox
registers, so the NCPU can be notified immediately when a message is
written into the shared queue. This task requires more investigation
into the exact set of changes needed to enable this feature. It might
be the case that the PROM code requires changes to do proper mapping of
the PCI registers, etc.
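
To make the queue-plus-mailbox requirement of section 2.3 concrete, the
following is a minimal user-space sketch of one direction of a shared
queue with a doorbell callback standing in for the mailbox register
write. It is not the existing OnStor queue layout: the names (shq,
shq_send, shq_recv, shq_count_doorbell), the slot count and the fixed
message size are all hypothetical, and a kernel version would need
memory barriers and a real register write where the comments say so.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHQ_SLOTS   8u   /* hypothetical queue depth; must be a power of two */
#define SHQ_MSG_MAX 64u  /* hypothetical fixed slot size in bytes */

/* One direction of a shared queue, as it might sit in the common region. */
struct shq {
    volatile uint32_t head;  /* next slot to write; advanced only by the sender */
    volatile uint32_t tail;  /* next slot to read; advanced only by the receiver */
    uint8_t slot[SHQ_SLOTS][SHQ_MSG_MAX];
};

/* Stand-in for "write the peer's mailbox register" (assumption). */
typedef void (*shq_doorbell_fn)(void *ctx);

/* Example doorbell that only counts how often it was rung. */
static void shq_count_doorbell(void *ctx)
{
    ++*(int *)ctx;
}

static void shq_init(struct shq *q)
{
    assert(q != NULL);
    q->head = 0;
    q->tail = 0;
}

/* Copy one message into the queue and ring the doorbell.
 * Returns 0 on success, -1 if the queue is full or len is too large. */
static int shq_send(struct shq *q, const void *msg, uint32_t len,
                    shq_doorbell_fn ring, void *ctx)
{
    uint32_t head = q->head;

    if (len > SHQ_MSG_MAX || head - q->tail == SHQ_SLOTS)
        return -1;
    memcpy(q->slot[head % SHQ_SLOTS], msg, len);
    /* A kernel port would need a write barrier here (e.g. wmb()) so the
     * payload is visible before the index update. */
    q->head = head + 1;
    if (ring)
        ring(ctx);  /* notify the receiver immediately */
    return 0;
}

/* Copy the oldest message out of the queue.
 * Returns 0 on success, -1 if the queue is empty or len is too large. */
static int shq_recv(struct shq *q, void *msg, uint32_t len)
{
    uint32_t tail = q->tail;

    if (len > SHQ_MSG_MAX || tail == q->head)
        return -1;
    memcpy(msg, q->slot[tail % SHQ_SLOTS], len);
    q->tail = tail + 1;
    return 0;
}
```

The sender only writes head and the receiver only writes tail, which is
why no lock is shown for the single-sender, single-receiver case. With
a doorbell, the receiving core can run shq_recv() from its mailbox
interrupt handler rather than polling the queue.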

2.4 Port SSC "mgmt-bus" driver and eee protocol modules to NCPU Linux Kernel

Since we are replacing the NCPU's EEE functionality with a Linux
kernel, this task covers porting the SSC "mgmt-bus" device driver,
which implements the shared message queues between the SSC and NCPU,
and using common Linux device drivers and protocol modules on both the
SSC and NCPU Linux kernels. Because the Linux mgmt-bus device driver
and 'eee' network protocol modules were written and integrated into the
4.0 release, this task is envisioned as a simple porting and testing
effort.

2.5 Test the "mgmt-bus" driver and eee protocol modules on NCPU Linux

This task covers testing the mgmt-bus driver and eee protocol modules
by running traffic between the SSC and NCPU after the various
supporting software has been ported.

2.6 Memory Allocation Task

This task covers the memory allocation interfaces, sizes and mapping
functions currently used in the NCPU and ACPU software.

Since we are replacing the EEE functionality with a Linux kernel on the
NCPU and ACPU cores, the EEE memory allocation schemes must be
explicitly addressed. Currently the EEE supports two memory regions,
one for descriptors and buffers and the other for general memory
allocation. The use of common shared memory regions mapped into all the
cores must be maintained for descriptors, buffers, queues and messages.
However, other local memory allocations should be converted to call the
generic kernel memory allocator.

The recommendation is to convert the eee_ramAlloc() and cache_alloc()
interfaces into calls to the generic Linux kernel memory allocator.

The plan is to allocate the skb's and their associated buffers from the
common memory region so that the zero-copy networking/filesystem
operations are maintained. In addition, the shared queues and their
associated messages must be allocated from another part of this common
memory region to maintain backward compatibility.
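
The split described in section 2.6 can be sketched as follows: shared
objects come from carve-outs of the common region, and everything else
goes to the generic allocator. This is a user-space illustration under
stated assumptions, not the EEE interface: the proposal does not give
the signatures of eee_ramAlloc() or cache_alloc(), so the names below
(region, region_alloc, local_alloc) are hypothetical, malloc() stands
in for the kernel's allocator, and a simple bump allocator stands in
for whatever policy the common region really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A carve-out of the common shared region. In a real system the base
 * and size would come from the platform memory map (assumption). */
struct region {
    uint8_t *base;
    size_t size;
    size_t used;
};

static void region_init(struct region *r, void *base, size_t size)
{
    r->base = base;
    r->size = size;
    r->used = 0;
}

/* Bump allocation from a carve-out. align must be a power of two and is
 * applied to the offset from base, so base itself is assumed aligned.
 * Returns NULL when the carve-out is exhausted. */
static void *region_alloc(struct region *r, size_t len, size_t align)
{
    size_t start;

    assert(align != 0 && (align & (align - 1)) == 0);
    start = (r->used + align - 1) & ~(align - 1);
    if (start > r->size || len > r->size - start)
        return NULL;
    r->used = start + len;
    return r->base + start;
}

/* True if p points inside the carve-out. */
static int region_contains(const struct region *r, const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)r->base;

    return a >= lo && a - lo < r->size;
}

/* Hypothetical path for local, non-shared allocations. malloc()/free()
 * stand in for the generic kernel allocator in this sketch. */
static void *local_alloc(size_t len)
{
    return malloc(len);
}

static void local_free(void *p)
{
    free(p);
}
```

Keeping one carve-out for skb buffers and a separate one for queues and
messages mirrors the "another part of this common memory region"
wording above, and region_contains() gives a cheap way to check during
bring-up that a pointer handed to another core really lives in shared
memory.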

2.7 Sibyte 1480 Processor Exception Handling

This task covers the investigation of the existing MIPS exception
handling code on the ACPU and NCPU, and its redesign given the
replacement of the EEE OS by a stock Linux kernel on the NCPU core(s).
In the current software, the cores establish "passive crash" interrupt
handlers which are activated by a mailbox interrupt when a processor
core crashes. This approach supports stopping all cores in the event of
a crash. With the Linux kernel running on the NCPU, a similar scheme
will be required both for the cores under control of Linux and for
those assigned to the FP functions.

It is envisioned that the existing code will require modifications to
use the mailbox interrupts for more tasks than just today's "passive
crash" case. Some cases to consider include modifications to ACPU and
FP code to interrupt the NCPU whenever messages are placed in the IPC
queues, or when the number of TXRX buffers/descriptors available for
allocation on the FP falls below some threshold. There are probably
other cases which must be determined and properly handled.

2.8 NCPU Linux kernel debugger and core dump support

This task covers obtaining a working kernel debugger (kgdb) and other
tools for obtaining MIPS kernel crash dumps on both the SSC and NCPU.
The minimum requirement is a panic message that includes the processor
registers and a stack trace. Having a debugger is a requirement for
faster development.

2.9 Port RCON support to NCPU Linux Kernel

This task covers adapting the RCON SSC Linux driver to the NCPU Linux
kernel.

2.10 Test the RCON functions between SSC and NCPU Linux kernels

This task covers testing the remote console (RCON) functions between
the SSC and NCPU after the underlying supporting software tasks covered
previously have been ported and unit tested.

2.11 NCPU Linux distribution of messages from SSC

This task covers the messaging communication between the SSC and the
ACPU and FP cores. Currently the NCPU core receives all messages coming
from the SSC that are destined for the ACPU and FP cores. The NCPU is
responsible for forwarding messages destined for these other cores. If
we can remove the NCPU's examination of messages from the SSC that are
destined for the FP, we can eliminate the extra overhead of touching
all of the FP messages. This task needs further study to accurately
scope the implementation effort.

2.12 NCPU Linux IP Forwarding Functionality

This task covers the IP forwarding functions supporting sending packets
to/from the SSC when packets are received on network interfaces
supported by the NCPU Linux. It is envisioned that the stock Linux
kernel IP forwarding functions and Linux netfilter can satisfy these
requirements. The netfilter functionality readily supports address
translation, packet filtering and forwarding across interfaces, as
typically used in firewalls, etc. This task needs additional study to
accurately scope the implementation effort.

2.13 Socket communication between NCPU and FP

This task covers the messaging communication between the NCPU Linux
kernel and the FP functionality. The specific messages sent between the
NCPU and FP are defined in sm-tpl-fp/tpl-fp-api.h and cover socket
operations such as open, close, listen, accept, read/write and unbind.
The task requires supporting these messages once the Linux TCP/IP stack
has been substituted for the current OpenBSD-based TCP/IP
implementation.

2.14 Virtual Stack communication between NCPU and SSC

This task covers the messaging communication between the SSC and NCPU
Linux kernel specific to virtual stacks. It covers requesting and
obtaining information pertaining to virtual interfaces, adding and
deleting routes and obtaining routing tables, configuring interfaces,
getting packet and network interface statistics, TCP and UDP
connections, etc.

Since the NCPU will field these messages and generate the appropriate
replies, this task addresses implementing the code to obtain the
equivalent data from the Linux protocol stack. The messages and the
current implementation are described in sm-ipm/ipm.[h,c].

There are a number of messages that require a considerable amount of
information to be passed back to the SSC regarding the state of the
entire protocol stack. These include cumulative IP, UDP and TCP
statistics, and UDP and TCP connection tables, with the message sizes
ranging from 32K to 800K for statistics. Since some of these messages
include information specific to the BSD protocol stack, it may require
considerable work to maintain exact conformance to the existing message
formats under Linux. Modifications of the messages in this area might
be required. This task needs further study to accurately scope the
implementation effort.

One view is to implement portions of this functionality as a user-level
daemon running under NCPU Linux, which obtains the various statistics
and connection information by reading /proc or sysfs (/sys) file system
entries to get the information needed for the response message sent
back to the SSC.

2.15 Virtual Server Support on the Linux NCPU

This task covers the messaging communication between the Virtual Server
software running on the SSC and the NCPU Linux kernel. The Virtual
Server message formats will remain unchanged, so the work covers
implementing the functionality previously added to the BSD protocol
stack on the NCPU core to support these messages.

The development centers on obtaining the information needed to satisfy
requests and responding with appropriate replies. The Linux
implementation must also implement those messages that explicitly
notify the SSC Virtual Server of changes occurring in the networking
stack.

The vstack partitioning of the BSD protocol stack into separate
"instances" may be implemented under Linux using netfilter
functionality. This is an open question needing more detailed study to
scope the implementation effort.

2.16 Convert OnStor Packet Descriptors (pkt_desc) to Linux Socket Buffers (skb's)

This task covers replacing the pkt_desc data structure, used to
describe network data passed between the NCPU and the ACPU cores, with
standard Linux socket buffers (skb's). This appears to be a
straightforward replacement, since the two structures are nearly
equivalent, and it allows passing Linux networking buffers to the ACPU
without copying.

The task covers the kernel changes required to modify the skb memory
allocator to use the common mapped shared memory region between the
Sibyte processor sockets rather than a generic kernel slab allocator
region. A chain of skb's will be allocated from the common mapped
shared memory region between the NCPU, ACPU and FP, and continued use
of zero-copy networking will be supported. The handoff of buffer
ownership to the destination code via IPC, using the shared message
queues, will continue.

2.17 Convert Linux kernel TCP/IP networking stack to be TPL-API aware

This task covers modifying the Linux networking code to be aware of the
Transport Layer API (tpl-api) interfaces that must be supported to
communicate with the ACPU. This will allow the Linux networking code to
call the appropriate tpl-api functions when changes occur that require
notification or reception of messages in either direction.

It would be preferable to avoid this level of change to the Linux
protocol stack if possible. Hence some additional investigation of
other possible approaches is necessary.

2.18 Convert ACPU NFS/CIFS code to be a Linux kernel module

This task covers modifying the ACPU NFS and CIFS code to become
standard Linux kernel modules.

The additional changes envisioned include running as a Linux kernel
process pinned to a processor core.

2.19 Convert ACPU NFS code to use Linux Socket Buffers

This task covers modifying the NFS code to use Linux Socket Buffers
(skb's) rather than OnStor pkt_desc's. Since the queues and message
formats will remain unchanged, with the exception of passing 'skb'
pointers in the data messages, the basic assumption of passing a
complete RPC/XDR message chain between the NCPU and the ACPU remains
unchanged.

A closer examination of the NFS code shows that it currently handles
chains of buffers and uses only a few fields of the pkt_desc data
structure, all of which have equivalents in the skb data structure.

2.20 Convert ACPU CIFS code to use Linux Socket Buffers

This task covers modifying the CIFS code to use Linux Socket Buffers
(skb's) rather than OnStor pkt_desc's. Since the queues and message
formats will remain unchanged, with the exception of passing 'skb'
pointers in the data messages, the basic assumption of passing a
complete message chain between the NCPU and the ACPU remains unchanged
for CIFS.

2.21 Test modified ACPU NFS code

This task covers testing the modified NFS code running under the Linux
kernel on the ACPU processor core. Since the virtual server
functionality is required by the NFS code, the testing and debugging of
the modified code must be done later in the schedule.

2.22 Test modified ACPU CIFS code

This task covers testing the modified CIFS code running under the Linux
kernel on the ACPU processor core. Since the virtual server
functionality is required by the CIFS code, the testing and debugging
of the modified code must be done later in the schedule.

2.23 NCPU Linux Performance Profiling Tools

This task covers obtaining a working set of performance measurement
tools for collecting data on the Sibyte 1480 processors. These would
include the normal Linux OProfile, GNU gprof and the Broadcom Sibyte
profiler tools available from Broadcom under license.