|
A software implementation of RDMAP/DDP/MPA, referred to as SoftRDMA
or a SoftRDMA Verbs Provider, can be used to
enable RDMA on a client without RDMA hardware and thereby support
a server that relies on RDMA for performance. It may also be an
advantage that the TCP context of a live TCP connection need not be
moved between two different TCP/IP stacks during socket conversion
for iSER or SDP. For direct data placement, SoftRDMA may
offer lower performance than an RNIC hardware implementation. However,
since SoftRDMA can benefit from asynchronous communication semantics,
multiple outstanding work requests and RDMA operations, performance
gains such as improved latency and message rate can still be expected
compared to plain TCP/IP.
We expect SoftRDMA to become useful in bridging equipment that
will be able to transparently pass RDMA traffic between InfiniBand
and iWARP-based RDMA networks. SoftRDMA is also an attractive test
environment for an RDMA host software architecture and RDMA applications
because it has no hardware dependencies.
Our starting point for SoftRDMA on Linux is to run RDMAP/DDP/MPA
on top of an untouched kernel TCP/IP stack. The initial SoftRDMA
implementation will support asynchronous operations through an efficient
implementation of work queues and completion queues. We are investigating
dual user/kernel mappings, i.e., mappings that are simultaneously
visible to uVP and kVP. For a SoftRDMA kVP, allocating queues in
main memory through such mappings allows particularly efficient
asynchronous I/O. Moreover, a single system call (kernel transition)
is sufficient for setting up an endpoint with SQ and RQ visible
in both user and kernel address space. The kVP's ri_qp_create()
can use the opaque vp_data to pass the control information that
user space needs for work queue access back to the uVP.
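As an illustration of why a work queue in shared main memory allows
efficient asynchronous I/O, the following is a minimal sketch of a
single-producer, single-consumer ring that could sit in a region mapped
into both address spaces. It is not the SoftRDMA implementation: the
names struct wqe, struct work_queue, wq_init(), wq_post(), wq_consume()
and the depth WQ_DEPTH are hypothetical, and the setup of the mapping
itself is left out. The producer (standing in for the uVP) posts
requests by writing a slot and advancing an index, so no system call is
needed per request; the consumer (standing in for the kVP) drains them
independently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WQ_DEPTH 8u  /* hypothetical queue depth; must be a power of two */

/* Hypothetical work queue element; a real one would carry far more state. */
struct wqe {
    uint64_t wr_id;   /* caller-chosen identifier, reported on completion */
    uint32_t length;  /* payload length in bytes */
};

/* Ring assumed to live in memory visible to both producer and consumer. */
struct work_queue {
    _Atomic uint32_t head;  /* next slot to consume, advanced by the consumer */
    _Atomic uint32_t tail;  /* next slot to fill, advanced by the producer */
    struct wqe ring[WQ_DEPTH];
};

static void wq_init(struct work_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: post one work request; returns false if the ring is full. */
static bool wq_post(struct work_queue *q, struct wqe w)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail - head == WQ_DEPTH)
        return false;
    q->ring[tail & (WQ_DEPTH - 1)] = w;
    /* Release ordering publishes the slot contents before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: fetch the oldest posted request; false if the ring is empty. */
static bool wq_consume(struct work_queue *q, struct wqe *out)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head == tail)
        return false;
    *out = q->ring[head & (WQ_DEPTH - 1)];
    /* Release ordering frees the slot only after it has been copied out. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

In this sketch several requests can be outstanding at once, up to
WQ_DEPTH, and the only state shared between the two sides is the ring
and its two indices. A completion queue could follow the same pattern
with the producer and consumer roles reversed.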
In a later version, changes to the in-kernel TCP stack will be
investigated to improve local buffer management, minimize data
copy operations in the transmit and receive paths, and support
out-of-order placement of incoming DDP segments. These optimizations
may include changes to the in-kernel TCP interface.
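To make out-of-order placement concrete, the sketch below places each
DDP segment at the buffer offset it names, independent of arrival
order. It is a simplified illustration under stated assumptions, not
the planned implementation: struct tagged_buffer, struct ddp_segment
and ddp_place() are hypothetical names, segments are assumed not to
overlap or repeat, and a plain memcpy() stands in for whatever
placement path the modified stack would provide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of a registered target buffer. */
struct tagged_buffer {
    uint8_t *base;    /* start of the registered memory */
    size_t   length;  /* size of the registered memory in bytes */
    size_t   placed;  /* total payload bytes placed so far */
};

/* Hypothetical, simplified DDP segment: target offset plus payload. */
struct ddp_segment {
    uint64_t       offset;   /* offset of the payload within the buffer */
    const uint8_t *payload;  /* segment payload */
    size_t         len;      /* payload length in bytes */
};

/*
 * Place one segment at its target offset. The segment carries its own
 * placement information, so no reordering queue is needed. Returns
 * false if the segment would fall outside the registered buffer.
 */
static bool ddp_place(struct tagged_buffer *buf, const struct ddp_segment *seg)
{
    if (seg->offset > buf->length || seg->len > buf->length - seg->offset)
        return false;
    memcpy(buf->base + seg->offset, seg->payload, seg->len);
    buf->placed += seg->len;  /* assumes no duplicate or overlapping segments */
    return true;
}
```

With this model a segment that arrives ahead of its predecessors can be
written to its final location immediately. Only the delivery of the
completed message to the consumer has to wait until all segments have
been placed.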
|