IBM Research
Why an access-layer-based architecture?
The layered RDMA host software architecture shown in Fig. 10, which cleanly separates generic/OS and verbs-provider-specific software functionality into user/kernel Access Layer (uAL/kAL) and user/kernel Verbs Provider (uVP/kVP) components, respectively, has a number of advantages, including:
(A1) to (A3) were discussed in the description of the IT-API. Memory management (MM) extensions for RDMA (A4) are an excellent example of generic functionality that belongs in the kAL. For instance, we have extended Linux MM so that memory is pinned consistently at both the VMA level and the page level. This resolves a number of issues related to pinning read-only address intervals, handling copy-on-write (COW) situations, and overlapping pinnings. Regarding (A5), the use of a single device file for the kAL simplifies the auditing of verbs-provider-specific software, since all RDMA syscalls pass through the uAL as well as the kAL's syscall handler for parameter validation.

Consider now the creation of an RDMA resource, which typically consists of a uAL object, a kAL object, and corresponding uVP and kVP objects. The uAL and kAL objects are used for the OS-wide organization and identification of RDMA resources and are typically small; the uVP and kVP objects contain data structures specific to an RNIC implementation.

Fig. 11 illustrates the creation of an endpoint and its associated queue pair. An it_ep_rc_create() call (1) into the uAL results in an ri_qp_create() call (2) that is passed via NP-RNICPI to the uVP, which in turn calls the uAL's ri_sys_qp_create() (3). This SYS-RNICPI upcall identifies the uAL's endpoint context via the os_data opaque and invokes the kAL-provided syscall (4) for creating an IT-API endpoint. The syscall passes userspace context information of both the uAL and the uVP (for the endpoint and the corresponding queue pair, respectively) down to the kAL.

As an illustration of (A6), the kAL's syscall handler now generates an OS-wide unique endpoint ID, which is used for object identification in all subsequent kAL syscalls referring to the endpoint. These syscalls become simpler because partially redundant verbs information, such as the triple (selected VP, RNIC handle, QP handle), is replaced by a single, OS-wide unique endpoint ID.
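The ID substitution described for (A6) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the actual kAL code: every name here (`kal_ep_table`, `kal_verbs_triple`, `kal_ep_register`, `kal_ep_lookup`, `kal_ep_unregister`) is hypothetical, the field widths are guesses, and locking is omitted. It only shows the idea that the triple is recorded once at creation time and later calls carry a single ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: none of these names come from IT-API or RNICPI. */
#define KAL_MAX_EPS       64   /* arbitrary table size for the sketch */
#define KAL_EP_ID_INVALID 0u   /* 0 is reserved to mean "no endpoint" */

/* The partially redundant verbs information named in the text. */
struct kal_verbs_triple {
    uint32_t vp_index;     /* selected verbs provider (assumed width) */
    uint64_t rnic_handle;  /* RNIC handle (assumed width) */
    uint64_t qp_handle;    /* QP handle (assumed width) */
};

struct kal_ep_slot {
    uint32_t ep_id;                  /* KAL_EP_ID_INVALID marks a free slot */
    struct kal_verbs_triple triple;
};

struct kal_ep_table {
    struct kal_ep_slot slots[KAL_MAX_EPS];
    uint32_t last_id;                /* last ID handed out; IDs are never reused */
};

void kal_ep_table_init(struct kal_ep_table *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Record the triple once and return a table-wide unique endpoint ID,
 * or KAL_EP_ID_INVALID if the table is full or the ID space is exhausted. */
uint32_t kal_ep_register(struct kal_ep_table *t,
                         const struct kal_verbs_triple *triple)
{
    assert(t != NULL && triple != NULL);
    if (t->last_id == UINT32_MAX)
        return KAL_EP_ID_INVALID;
    for (size_t i = 0; i < KAL_MAX_EPS; i++) {
        if (t->slots[i].ep_id == KAL_EP_ID_INVALID) {
            t->slots[i].ep_id = ++t->last_id;
            t->slots[i].triple = *triple;
            return t->slots[i].ep_id;
        }
    }
    return KAL_EP_ID_INVALID;
}

/* Later calls pass only the ID; the triple is recovered here. */
const struct kal_verbs_triple *kal_ep_lookup(const struct kal_ep_table *t,
                                             uint32_t ep_id)
{
    assert(t != NULL);
    if (ep_id == KAL_EP_ID_INVALID)
        return NULL;
    for (size_t i = 0; i < KAL_MAX_EPS; i++) {
        if (t->slots[i].ep_id == ep_id)
            return &t->slots[i].triple;
    }
    return NULL;
}

/* Returns 0 on success, -1 if the ID is unknown. */
int kal_ep_unregister(struct kal_ep_table *t, uint32_t ep_id)
{
    assert(t != NULL);
    if (ep_id == KAL_EP_ID_INVALID)
        return -1;
    for (size_t i = 0; i < KAL_MAX_EPS; i++) {
        if (t->slots[i].ep_id == ep_id) {
            t->slots[i].ep_id = KAL_EP_ID_INVALID;
            return 0;
        }
    }
    return -1;
}
```

A linear scan keeps the sketch short. A real implementation would more likely use an ID allocator and a hash or radix structure, and would have to serialize access.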
Next, the kAL calls the kVP's ri_qp_create() (5) via P-RNICPI. Optionally, at the verbs provider's choice, the kAL can pass the unique endpoint ID to ri_qp_create() as a replacement for a verbs-provider-generated QP handle; using the same OS-wide unique ID for the endpoint and the corresponding queue pair simplifies resource management. The kVP's implementation of ri_qp_create() makes an upcall (6) into the kAL either to map work queues in device memory into userspace or to allocate work queues in main memory as dual user/kernel mappings, i.e., mappings that are simultaneously visible to the uVP and the kVP. See SoftRDMA for the use of dual user/kernel mappings.

It should be noted that the uAL's it_ep_rc_create() call, upon returning from ri_qp_create() (13), can easily audit the uVP by checking whether or not it called back into the uAL as required (A5).

The layered architecture also has a few potential disadvantages:
However, since the work request and work completion formats of IT-API and RNICPI are similar, an implementation can convert quite efficiently between the two. Further optimizations are possible, though, and the IT-API and RNICPI work groups are open to suggestions.
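Returning to the audit step (A5) from the endpoint-creation walkthrough, the check that the uVP called back into the uAL might look roughly like the sketch below. The function names and signatures are invented for illustration: `ual_ep_create()` stands in for it_ep_rc_create(), the `uvp_qp_create_fn` pointer for the uVP's ri_qp_create(), and `ual_sys_qp_create()` for the ri_sys_qp_create() upcall. The real interfaces take different arguments, and the flag-based check is an assumption about how such an audit could be done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UAL_OK         0
#define UAL_ERR_AUDIT (-1)  /* hypothetical error: uVP skipped the upcall */

/* Hypothetical uAL endpoint context, reached via the os_data opaque. */
struct ual_ep_ctx {
    bool upcall_seen;  /* set when the uVP calls back into the uAL */
};

/* Stand-in for the SYS-RNICPI upcall (simplified, invented signature). */
int ual_sys_qp_create(void *os_data)
{
    struct ual_ep_ctx *ep = os_data;
    assert(ep != NULL);
    ep->upcall_seen = true;
    /* A real uAL would invoke the kAL syscall at this point. */
    return UAL_OK;
}

/* Stand-in for the uVP's queue pair creation entry point. */
typedef int (*uvp_qp_create_fn)(void *os_data);

/* Stand-in for endpoint creation in the uAL, including the audit. */
int ual_ep_create(struct ual_ep_ctx *ep, uvp_qp_create_fn uvp_qp_create)
{
    assert(ep != NULL && uvp_qp_create != NULL);
    ep->upcall_seen = false;
    int rc = uvp_qp_create(ep);   /* ep travels as the os_data opaque */
    if (rc != UAL_OK)
        return rc;
    /* Audit: a provider that bypassed the uAL upcall is rejected. */
    return ep->upcall_seen ? UAL_OK : UAL_ERR_AUDIT;
}

/* Two toy providers: one follows the protocol, one does not. */
int good_uvp_qp_create(void *os_data)
{
    return ual_sys_qp_create(os_data);
}

int rogue_uvp_qp_create(void *os_data)
{
    (void)os_data;  /* never calls back into the uAL */
    return UAL_OK;
}
```

With these toy providers, `ual_ep_create(&ep, good_uvp_qp_create)` returns `UAL_OK`, while the rogue provider is rejected with `UAL_ERR_AUDIT`.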