Field programmable gate arrays (FPGAs) are making their way into data centers (DCs). They serve to offload and accelerate service-oriented tasks such as web-page ranking, memory caching, deep learning, network encryption, video conversion and high-frequency trading.

However, FPGAs are not yet available at scale to general cloud users who want to accelerate their own workload processing. This puts the cloud deployment of compute-intensive workloads at a disadvantage compared with on-site infrastructure installations, where the performance and energy efficiency of FPGAs are increasingly being exploited.

cloudFPGA solves this issue by offering FPGAs as an IaaS resource to cloud users. Using the cloudFPGA system, users can rent FPGAs — similarly to renting VMs in the cloud — thus paving the way for large-scale utilization of FPGAs in DCs.

The cloudFPGA system is built on three main pillars:

  • the use of standalone network-attached FPGAs,
  • a hyperscale infrastructure for deploying the above FPGAs at large scale and in a cost-effective way,
  • an accelerator service that integrates and manages the standalone network-attached FPGAs in the cloud.

Hyperscale infrastructure

To enable cloud users to rent, use and release large numbers of FPGAs on the cloud, the FPGA resource must become plentiful in DCs.

The cloudFPGA infrastructure is the key enabler of such a large-scale deployment of FPGAs in DCs. It was designed from the ground up to provide the world’s highest-density and most energy-efficient rack unit of FPGAs.

The infrastructure combines passive and active water cooling to pack 64 FPGAs into one 19"×2U chassis. Such a chassis is made up of two Sleds, each with 32 FPGAs and one 64-port 10 GbE switch providing 640 Gb/s of bisection bandwidth.

In all, 16 such chassis fit into a 42U rack for a total of 1024 FPGAs and 16 TB of DRAM.
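The density figures above follow from simple arithmetic; the short sanity check below reproduces them, assuming (as the totals imply, though the text does not state it) 16 GB of DRAM per FPGA:

```python
# Sanity-check the cloudFPGA rack density figures quoted above.

FPGAS_PER_SLED = 32
SLEDS_PER_CHASSIS = 2
CHASSIS_PER_RACK = 16      # sixteen 2U chassis in a 42U rack
DRAM_PER_FPGA_GB = 16      # assumption: implied by 16 TB / 1024 FPGAs

fpgas_per_chassis = FPGAS_PER_SLED * SLEDS_PER_CHASSIS        # 64
fpgas_per_rack = fpgas_per_chassis * CHASSIS_PER_RACK         # 1024
dram_per_rack_tb = fpgas_per_rack * DRAM_PER_FPGA_GB / 1024   # 16 TB

print(fpgas_per_chassis, fpgas_per_rack, dram_per_rack_tb)
```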

Accelerator service: management of cloud FPGAs at scale

Today, the prevailing way to incorporate an FPGA into a server is to connect it to the CPU over a high-speed, point-to-point interconnect such as the PCIe bus, and to treat that FPGA resource as a co-processor worker under the control of the server CPU.

However, because of this master–slave programming paradigm, such an FPGA is typically exposed in the cloud only as an add-on to the primary host compute resource to which it belongs. As a result, bus-attached FPGAs are usually made available in the cloud indirectly, via Virtual Machines (VMs) or Containers.

In our deployment, in contrast, a standalone, network-attached FPGA can be requested independently of a host via the cloudFPGA Resource Manager (cFRM, see figure). The cFRM provides a RESTful (Representational State Transfer) API for integration into the DC management stack (e.g. OpenStack).
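A request for a standalone FPGA through such a REST API might then be built as in the sketch below. The base address, endpoint path and field names are illustrative assumptions, not the actual cFRM API:

```python
# Illustrative sketch: asking a resource manager such as the cFRM to
# provision one network-attached FPGA over REST. The address, path and
# JSON fields are hypothetical, chosen only to show the pattern.
import json
from urllib import request

CFRM_BASE = "http://cfrm.example.com:8080"  # placeholder address

def build_fpga_request(user_id: str, image_id: str) -> request.Request:
    """Build a POST request asking the manager to provision one FPGA
    loaded with a previously uploaded user image."""
    body = json.dumps({"user_id": user_id, "image_id": image_id}).encode()
    return request.Request(
        url=f"{CFRM_BASE}/instances",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_fpga_request("alice", "img-0042")
# request.urlopen(req) would submit it; the manager's reply would carry
# the network address of the allocated FPGA.
```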

Cloud integration is the process of making a resource available in the cloud. In the case of cloudFPGA, this is accomplished by the combination of three levels of management (see figure): a cloudFPGA Resource Manager (cFRM), a cloudFPGA Sled Manager (cFSM), and a cloudFPGA Manager Core (cFMC).

  1. There is one resource manager per DC, controlling many Sleds. The cFRM handles the user images and maintains a database of FPGA resources.
  2. There is one sled manager for every 32 FPGAs. The cFSM runs on a service processor that is part of the Sled. It powers the FPGAs on and off, monitors their physical parameters, and runs the software management stack of the Ethernet switch.
  3. There is one cFMC per FPGA. The cFMC contains a simplified HTTP server that serves the REST API calls issued by the cFRM.
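The per-FPGA level of this hierarchy can be pictured as a thin dispatch layer. The sketch below mimics how a simplified HTTP server in the spirit of the cFMC might route the cFRM's REST calls to management actions; the paths and handler behavior are assumptions for illustration only:

```python
# Minimal routing sketch in the spirit of the cFMC's simplified HTTP
# server. Paths, parameters and handler behavior are hypothetical.

def get_status():
    # In the real system this would report the FPGA's current state.
    return {"state": "idle"}

def post_configure(image_id):
    # In the real system this would trigger loading of the requested
    # user image onto the FPGA.
    return {"state": "configuring", "image": image_id}

ROUTES = {
    ("GET", "/status"): lambda params: get_status(),
    ("POST", "/configure"): lambda params: post_configure(params["image_id"]),
}

def dispatch(method, path, params=None):
    """Map an incoming (method, path) pair to its management action."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return {"error": "not found"}
    return handler(params or {})
```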

Together, the components of all three levels provide the requested FPGA resources in a fast and secure way.

FPGA architecture

System architecture of the cloudFPGA platform: 32 FPGAs, one switch and a service processor are combined on one carrier board, called a Sled. The management tasks are split into three levels — the cloudFPGA Resource Manager (cFRM), the cloudFPGA Sled Manager (cFSM), and the cloudFPGA Manager Core (cFMC). A Sled is half of a 2U chassis. CPU nodes from the OpenStack compute service (Nova) are also available for creating heterogeneous clusters.

Ask the experts

François Abel

IBM Research Scientist

Dionysios Diamantopoulos

IBM Research Staff Member

Christoph Hagleitner

IBM Research Scientist

Burkhard Ringlein

Predoctoral Researcher

Publications

  1. B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey,
    “Programming Reconfigurable Heterogeneous Computing Clusters Using MPI With Transpilation,”
    in 2020 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC), 2020. PDF
  2. B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey,
    “ZRLMPI: A Unified Programming Model for Reconfigurable Heterogeneous Computing Clusters,”
    in IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2020. PDF
  3. B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner and D. Fey,
    “System Architecture for Network-Attached FPGAs in the Cloud Using Partial Reconfiguration,”
    in 29th International Conference on Field Programmable Logic and Applications (FPL), pp. 293–300, 2019. PDF
  4. F. Abel, J. Weerasinghe, C. Hagleitner, B. Weiss, S. Paredes,
    “An FPGA Platform for Hyperscalers,”
    in IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI), Santa Clara, CA, pp. 29–32, 2017. PDF
  5. J. Weerasinghe, F. Abel, C. Hagleitner, A. Herkersdorf,
    “Disaggregated FPGAs: Network Performance Comparison Against Bare-Metal Servers, Virtual Machines and Linux Containers,”
    in IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Luxembourg, 2016. PDF
  6. J. Weerasinghe, R. Polig, F. Abel,
    “Network-Attached FPGAs for Data Center Applications,”
    in IEEE International Conference on Field-Programmable Technology (FPT ’16), Xi’an, China, 2016. PDF
  7. J. Weerasinghe, F. Abel, C. Hagleitner, A. Herkersdorf,
    “Enabling FPGAs in Hyperscale Data Centers,”
    in IEEE International Conference on Cloud and Big Data Computing (CBDCom), Beijing, China, pp. 1078–1086, 2015. PDF