Remote direct memory access

Work on RDMA host software at IBM Research – Zurich

Within the Interconnect Software Consortium (ICSC) of The Open Group, we contributed to the standardization of RDMA-enabled programming interfaces, co-chairing both the Interconnect Transport API (IT-API) and the RNIC Programming Interface (RNICPI) work groups. We helped define a modular, layered, and transport-neutral host software architecture for RDMA through contributions [JAMENE-04] to an industry-driven Linux open-source project called OpenRDMA.

In this context, we have implemented a host software architecture for RDMA that provides the operating system (OS) integration for both iWARP and InfiniBand, supporting IT-API and an enhanced version of RNICPI. A key property of such an architecture is a clean separation of generic/OS functionality and verbs-provider-specific software functionality into user/kernel Access Layer (uAL/kAL) and user/kernel Verbs Provider (uVP/kVP) components, respectively. This approach permits a wide range of RNICs / verbs providers to register themselves through a standard programming interface and minimizes code bloat by keeping generic functionality such as OS-wide RDMA resource management, event handling and connection management in a single, OS-provided implementation. As a prototype, we designed a verbs provider called SoftRDMA, a pure software implementation of the IETF’s iWARP (RDMAP/DDP/MPA) protocol stack [SOFTRDMA-09].
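
The split between a generic access layer and provider-specific code can be illustrated with a small sketch. It is not the IT-API or RNICPI interface: the class and method names below (AccessLayer, register_provider, create_qp, LoopbackProvider) are hypothetical and chosen only to show how generic resource bookkeeping can stay in one place while each verbs provider supplies its own data path.

```python
from typing import Dict, List, Protocol, Tuple


class VerbsProvider(Protocol):
    """Hypothetical provider-side interface, standing in for a uVP/kVP."""

    name: str

    def post_send(self, qp_id: int, payload: bytes) -> None: ...


class AccessLayer:
    """Generic layer: tracks providers and queue pairs, then dispatches."""

    def __init__(self) -> None:
        self._providers: Dict[str, VerbsProvider] = {}
        self._qp_owner: Dict[int, str] = {}
        self._next_qp_id = 1

    def register_provider(self, provider: VerbsProvider) -> None:
        # Providers announce themselves once; duplicate names are rejected.
        if provider.name in self._providers:
            raise ValueError(f"provider {provider.name!r} already registered")
        self._providers[provider.name] = provider

    def create_qp(self, provider_name: str) -> int:
        # Resource IDs are allocated generically, independent of the provider.
        if provider_name not in self._providers:
            raise KeyError(provider_name)
        qp_id = self._next_qp_id
        self._next_qp_id += 1
        self._qp_owner[qp_id] = provider_name
        return qp_id

    def post_send(self, qp_id: int, payload: bytes) -> None:
        # Only this call reaches provider-specific code.
        provider = self._providers[self._qp_owner[qp_id]]
        provider.post_send(qp_id, payload)


class LoopbackProvider:
    """Toy provider that records what it was asked to send."""

    name = "loopback"

    def __init__(self) -> None:
        self.sent: List[Tuple[int, bytes]] = []

    def post_send(self, qp_id: int, payload: bytes) -> None:
        self.sent.append((qp_id, payload))
```

In this sketch a second provider would only need to implement post_send and call register_provider; the queue-pair bookkeeping in AccessLayer would remain unchanged.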

As a broadly supported industry effort, the OpenFabrics Alliance (OFA) develops, distributes and promotes an open-source software stack for RDMA-capable adapters and RDMA transports, including InfiniBand and iWARP. Because OpenFabrics provides RDMA support in the Linux kernel and already supports a wide range of RDMA devices as well as RDMA-enabled upper-layer protocols, we are currently developing a fully software-based iWARP Linux driver called Soft-iWARP, which fits into the OFA RDMA environment [SOFTIWARP-09]. The outcome of our work will be a device driver exporting the OFA RDMA verbs and connection manager interfaces. The Soft-iWARP kVP implements the iWARP protocols on top of kernel TCP sockets and provides standards-compliant iWARP RDMA functionality at a reasonable performance level. All basic RDMA operations (RDMA resource and connection management, asynchronous work request posting and completion, Send/Receive as well as RDMA Write and Read operations) are implemented and functioning. We support user-level applications through an OFA-compliant uVP library.
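
The asynchronous posting and completion model mentioned above can be sketched as a toy queue pair. This is a simplified illustration, not the OFA verbs API or the Soft-iWARP code: the names WorkRequest, Completion, QueuePair, process_one and poll_cq are assumptions made for this example, and the "provider" here simply marks each request as successful.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class WorkRequest:
    wr_id: int    # caller-chosen identifier, echoed back in the completion
    opcode: str   # e.g. "SEND", "RDMA_WRITE", "RDMA_READ"
    length: int   # payload length in bytes


@dataclass
class Completion:
    wr_id: int
    status: str


class QueuePair:
    """Toy send queue plus completion queue (hypothetical names)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.send_queue: Deque[WorkRequest] = deque()
        self.completion_queue: Deque[Completion] = deque()

    def post_send(self, wr: WorkRequest) -> None:
        # Posting returns immediately; the request is only queued here.
        if len(self.send_queue) >= self.depth:
            raise BufferError("send queue full")
        self.send_queue.append(wr)

    def process_one(self) -> bool:
        # Stand-in for the provider executing one queued request.
        if not self.send_queue:
            return False
        wr = self.send_queue.popleft()
        self.completion_queue.append(Completion(wr.wr_id, "SUCCESS"))
        return True

    def poll_cq(self, max_entries: int) -> List[Completion]:
        # Non-blocking: returns whatever completions are ready, up to a limit.
        ready: List[Completion] = []
        while self.completion_queue and len(ready) < max_entries:
            ready.append(self.completion_queue.popleft())
        return ready
```

The point of the sketch is the decoupling: an application may post several work requests before any of them completes and later reap the completions in order with poll_cq.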

We plan to open-source Soft-iWARP soon to let the community participate in further design and implementation and to obtain feedback on the current design.

A software-based iWARP stack that runs at reasonable performance levels and seamlessly fits into the OFA RDMA environment provides several benefits:

  • As a generic (RNIC-independent) iWARP device driver, it immediately enables RDMA services on all systems with conventional Ethernet adapters, which do not provide RDMA hardware support.
  • Soft-iWARP can be an intermediate step when migrating applications and systems to RDMA APIs and OpenFabrics.
  • Soft-iWARP can be a reasonable solution for client systems, allowing RNIC-equipped peers/servers to enjoy the full benefits of RDMA communication.
  • Soft-iWARP seamlessly supports direct as well as asynchronous transmission with multiple outstanding work requests and RDMA operations.
  • A software-based iWARP stack may flexibly employ any available hardware assists for performance-critical operations such as MPA CRC checksum calculation and direct data placement. The resulting performance levels may approach those of a fully offloaded iWARP stack.
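
To make the cost of the MPA CRC concrete, the following sketch frames a payload in the style of an MPA FPDU: a 16-bit length field, the ULPDU, zero padding to a 4-byte boundary, and a CRC32c over everything that precedes it. It is a simplified illustration and not the Soft-iWARP code: MPA markers are omitted, the function names are hypothetical, and writing the CRC in network byte order is an assumption of this example. The bitwise CRC loop is the kind of per-byte work that a hardware assist could take over.

```python
import struct

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c; slow, but easy to check against known values."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def build_fpdu(ulpdu: bytes) -> bytes:
    """Frame a ULPDU as length + payload + pad + CRC (markers omitted)."""
    if len(ulpdu) > 0xFFFF:
        raise ValueError("ULPDU too long for a 16-bit length field")
    header = struct.pack("!H", len(ulpdu))
    pad_len = -(len(header) + len(ulpdu)) % 4  # 0 to 3 bytes of padding
    body = header + ulpdu + b"\x00" * pad_len
    # Assumption: the CRC is appended in network byte order.
    return body + struct.pack("!I", crc32c(body))


def parse_fpdu(fpdu: bytes) -> bytes:
    """Check the CRC and framing, then return the ULPDU."""
    if len(fpdu) < 8 or len(fpdu) % 4 != 0:
        raise ValueError("malformed FPDU length")
    body, crc_bytes = fpdu[:-4], fpdu[-4:]
    (expected_crc,) = struct.unpack("!I", crc_bytes)
    if crc32c(body) != expected_crc:
        raise ValueError("CRC mismatch")
    (ulpdu_len,) = struct.unpack("!H", body[:2])
    pad_len = -(2 + ulpdu_len) % 4
    if len(body) != 2 + ulpdu_len + pad_len:
        raise ValueError("length field does not match frame size")
    return body[2:2 + ulpdu_len]
```

A receiver built this way rejects a frame whose payload was altered in transit, because the recomputed CRC no longer matches the trailing four bytes.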

Besides contributing towards a more standardized RDMA ecosystem, we analyze the applicability of RDMA. We have identified hidden costs in setting up RDMA interactions that, if not handled carefully, can negate any performance advantage [FREY-09].