Overview

As data becomes the world’s new natural resource, the ability to store it and extract value from it is critical to the success of businesses and organizations. To that end, Software-Defined Storage (SDS) plays a key role by offering the required flexibility, scalability, cost efficiency and agility.

SDS decouples storage functions from hardware and implements all the storage system intelligence in software that can run on general-purpose, off-the-shelf hardware components, as well as on virtualized cloud resources.

SDS enables storage systems to be shaped and sized to best fit the needs of particular use cases and workloads. Client APIs enable different pieces of storage infrastructure to be managed as a single entity and automate provisioning, policies and monitoring.
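
As a rough sketch of what such automation can look like from a client’s perspective, the following Python snippet provisions a volume through a hypothetical REST endpoint; the URL, payload fields and credentials are illustrative assumptions and do not correspond to any specific product API.

# Hypothetical sketch: provisioning a volume through an SDS management API.
# The endpoint, payload fields and token are assumptions for illustration only.
import requests

SDS_API = "https://sds.example.com/api/v1"   # hypothetical management endpoint
TOKEN = "example-token"                      # placeholder credential

def provision_volume(name, size_gib, policy):
    """Create a volume with the requested capacity and placement policy."""
    resp = requests.post(
        f"{SDS_API}/volumes",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"name": name, "size_gib": size_gib, "policy": policy},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: request a 512-GiB volume pinned to an all-flash tier.
# provision_volume("analytics-vol-01", 512, "all-flash")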

Data is becoming the world’s new natural resource.

—Ioannis Koltsidas, IBM scientist

Our approach is not limited to prototype systems inspired by forward-thinking ideas. We aim to develop practical SDS systems that can be used in real-world production environments.

Our research spans multiple types of SDS systems, including block storage, file storage, object and NoSQL storage on top of diverse storage media such as Flash, phase-change memory, disk and magnetic tape.

We build on open storage protocols such as NVMe, NVMe over Fabrics and OpenCAPI, open storage formats such as LTFS, and new storage access paradigms such as direct user-space I/O.

Using such building blocks, we are developing systems for both enterprise data centers and Cloud environments.

Software-defined NVMe Flash

The NVM Express (NVMe) family of protocols and interfaces introduces exciting new opportunities for software-defined storage systems. Replacing legacy protocols that were designed for HDDs, PCIe NVMe drives enable more direct access to storage.

Software systems can take advantage of NVMe to reach unprecedented levels of performance scalability and CPU efficiency and to achieve extremely low access latencies. NVMe over Fabrics, the variant of the protocol designed for network fabrics, extends these capabilities across the network, enabling access to remote storage resources over the same interface with almost the same performance as local, direct-attached storage.

Our research leverages these new interfaces and technologies to build the next generation of Flash-based storage systems.

Our goal is to revisit storage system architectures and redesign storage functions so that composable systems can be built from disaggregated storage resources with low latency and high performance scalability.

We strive to cleanly separate control-path from data-path functions in order to give storage clients more control when accessing storage resources. The scope of our research extends beyond raw performance to rich data services, cost-efficient storage techniques and workload-optimized data mobility.

SALSA: Unified SDS for low-cost SSDs and SMR disks

As data volumes continue to grow, cost-efficient storage devices are becoming increasingly important. Two prime examples of such devices are low-cost Flash SSDs and shingled magnetic recording (SMR) HDDs.

Low-cost commodity SSDs offer ample read performance with high IOPS and low latency. However, they suffer from poor performance under mixed read/write workloads and poor endurance.

SMR disks feature significant cost benefits over traditional HDDs. However, they require that specific write patterns be adopted, which introduces additional complexity and performance variation for general-purpose workloads.

SoftwAre Log-Structured Array (SALSA for short) is a unified software stack optimized for low-cost SSDs and SMR HDDs. SALSA uses software intelligence to mitigate the limitations of commodity devices.

By shifting the complexity from the hardware controller of the devices to software running on the host, SALSA not only reduces costs, but also takes advantage of the ample host resources to manage the device resources more effectively.

For Flash-based SSDs, SALSA elevates their performance and endurance to meet the requirements of modern data centers. For host-managed SMR HDDs, SALSA offers a conventional block interface and controls the data placement on the devices to improve their read and write performance.
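
The core idea can be sketched in a few lines: every write is appended at a sequential write head, and a logical-to-physical map redirects subsequent reads, which suits both Flash and SMR media. The following Python sketch is a deliberate simplification for illustration only; it is not the SALSA implementation and omits garbage collection, redundancy and persistence.

BLOCK_SIZE = 4096  # assumed logical block size

class LogStructuredArray:
    """Toy model of log-structured block remapping, the idea underlying SALSA."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.write_head = 0   # next physical block to append to
        self.l2p = {}         # logical block address -> physical block address
        self.media = {}       # stand-in for the physical device

    def write(self, lba, data):
        # Redirect every write to the sequential write head (Flash/SMR friendly).
        assert len(data) == BLOCK_SIZE
        pba = self.write_head
        self.media[pba] = data
        self.l2p[lba] = pba   # the previously mapped block becomes garbage
        self.write_head = (self.write_head + 1) % self.capacity

    def read(self, lba):
        # Look up the current physical location of a logical block.
        return self.media[self.l2p[lba]]

# Overwriting the same logical block lands on a new physical block; the map
# always points readers at the latest copy.
lsa = LogStructuredArray(capacity_blocks=1024)
lsa.write(7, b"a" * BLOCK_SIZE)
lsa.write(7, b"b" * BLOCK_SIZE)
assert lsa.read(7) == b"b" * BLOCK_SIZE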

SALSA provides redundancy, storage virtualization and data reduction, which allow the user to pool multiple devices and create storage volumes with improved performance, reliability and cost. Most importantly, SALSA exposes a standard block interface so that it can be used by file systems and applications with no modification.
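
Because the volume looks like any other block device, existing tools and applications need no changes. A minimal sketch, assuming a hypothetical device path (/dev/salsa/vol0 is an invented name, and raw block access requires root privileges):

import os

DEV = "/dev/salsa/vol0"   # hypothetical SALSA volume; any block device path works
BLOCK = 4096

fd = os.open(DEV, os.O_RDWR)
try:
    os.pwrite(fd, b"\xab" * BLOCK, 0)                   # write the first 4-KiB block
    assert os.pread(fd, BLOCK, 0) == b"\xab" * BLOCK    # read it back
finally:
    os.close(fd)

Equally, one could create a file system on the volume and mount it, exactly as with a conventional disk.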

NoSQL key/value storage and caching

Cloud and mobile applications employ data models that are vastly different from traditional enterprise ones.

Storing and retrieving key/value (K/V) pairs has become one of the most pervasive data models because it affords simplicity, generality and scalability.

Our research focuses on technologies that enable fast, efficient and cost-effective NoSQL K/V storage on NVMe-attached storage media such as Flash-based and 3DXP-based SSDs.

We have developed uDepot (pronounced “micro-depot”), a K/V storage and caching engine that offers micro-second latency access to storage.

uDepot is an NVMe-optimized K/V store that has been built from the ground up to be lean, scalable and efficient.

To that end, uDepot implements a new I/O access paradigm built around zero-copy data transfers, polling-based I/O request completion, and user-space I/O that avoids system calls and context switches in the data path, while minimizing end-to-end I/O amplification both in the number of I/O operations and in the bytes read and written.

uDepot can be used either as a K/V store that is embedded in the application and runs in the application context, or as a scale-out distributed K/V cache that can be accessed using the Memcache protocol.
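
Because the distributed cache mode speaks the standard Memcache protocol, any stock memcached client library can talk to it. The snippet below uses the pymemcache library as one example; the host name and port are assumptions for illustration.

from pymemcache.client.base import Client

# Host and port are placeholders; any memcached-compatible endpoint works.
cache = Client(("udepot-cache.example.com", 11211))

cache.set("user:42:profile", b'{"name": "Ada"}')
value = cache.get("user:42:profile")    # -> b'{"name": "Ada"}'
cache.delete("user:42:profile")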

Distributed shared storage for large-scale scientific computing

The Human Brain Project, a flagship project funded by the European Commission, aims at understanding the human brain through advanced simulation and multi-scale modelling.

Distributed shared storage (DSS) serves as a network-attached, shared data store for large-scale distributed human brain simulations. It brings distributed, network-attached NVMe storage close to the application.

Using remote direct memory access (RDMA) and direct user space storage access technologies, applications achieve high speed, low latency, byte-granular access to a unified shared storage pool.

In a distributed setup running on IBM Power® systems with direct-attached NVMe drives and connected via a 100-Gbit/s InfiniBand® fabric, the DSS prototype demonstrates a storage access throughput of tens of millions of I/O operations per second (IOPS) while sustaining a cumulative I/O bandwidth of tens of gigabytes per second (GB/s).

As it bypasses legacy block I/O layering of the operating system, DSS is able to deliver the low response times of NVMe-attached drives at the distributed application level.

Distributed shared storage will soon become available as open-source software.

Host-side Flash-based caching

In a typical enterprise IT environment, where servers store data in one or more SAN storage systems, caching technologies in the server are critical to achieve low-latency and high-throughput data access.

Our research activities include studying systems in which the servers utilize solid-state storage devices based on Flash and newer memory technologies for caching data from the SAN.

We are developing a novel caching framework that exploits synergies between servers and storage.

The system employs advanced caching algorithms to identify which data on each server is hot, i.e., frequently accessed by applications running on that server, and which data is cold, i.e., rarely accessed. The hot data is stored in the server’s local cache so that it can be served to applications and users at very low latency.
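
As a rough illustration of hot/cold classification, and not the algorithm used in the products mentioned below, the following sketch counts accesses per block over a sliding window and flags a block as hot once it has been touched a few times recently.

from collections import Counter, deque

class HotDataTracker:
    """Toy frequency-over-a-window heuristic for cache admission decisions."""

    def __init__(self, window=100_000, hot_threshold=3):
        self.window = window           # number of recent accesses to remember
        self.hot_threshold = hot_threshold
        self.recent = deque()          # sliding window of accessed block IDs
        self.counts = Counter()        # per-block access count within the window

    def record_access(self, block_id):
        """Record an access; return True if the block now counts as hot."""
        self.recent.append(block_id)
        self.counts[block_id] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return self.counts[block_id] >= self.hot_threshold

# Blocks flagged as hot become candidates for the server-side Flash cache;
# blocks touched only once stay in the SAN backend.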

Our research focuses on high performance and high scalability in all aspects of the system, but also addresses reliability and endurance.

Our caching technology has been integrated into the IBM DS8000® Easy Tier Server® and IBM AIX® 7.2 Flash Cache products, and our implementation for the Linux® platform has been released under the iostash open-source project.

Software-defined cold storage

Cold-storage technologies are becoming critical for dealing with exploding data volumes, and tape storage is the most promising technology for storing vast amounts of data for the long term with high reliability and at low cost.

In the past few years, the Linear Tape File System (LTFS) has introduced a standardized open format for data stored on tape, and open-source implementations have enabled users to access tape using non-proprietary components.

Our research builds on LTFS and focuses on making tape-based storage as user-friendly as possible.

We are developing the data path and control path components that enable users to read data from and write data to tape in a completely transparent way. In other words, we aim to spare the user the task of managing robotics, tape drives and tape cartridges.

We aim to make tape-based storage as user-friendly as possible.

—Ioannis Koltsidas, IBM scientist


Our work, which forms the core of IBM Spectrum Archive®, enables enterprise file systems to use a tape backend as a bottomless cold tier in a completely transparent manner.
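
From an application’s point of view, transparency means ordinary POSIX file operations: files that have been migrated to tape are opened and read exactly like disk-resident files, only with a higher time-to-first-byte while the cartridge is mounted and positioned. A minimal sketch, assuming a hypothetical mount point:

from pathlib import Path

archive = Path("/mnt/archive")          # hypothetical file system with a tape tier
sample = archive / "experiments" / "run-0001.dat"

sample.parent.mkdir(parents=True, exist_ok=True)
sample.write_bytes(b"\x00" * 4096)      # written to the disk tier, migrated later

data = sample.read_bytes()              # recall from tape happens transparently
print(len(data), "bytes read")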

In addition, we focus on object-based cold storage, i.e., enabling OpenStack Swift to use tape storage transparently.

Our Swift High-Latency Middleware provides that capability in a way that can be extended to other types of high-latency media as well.
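
Because the middleware sits behind the standard Swift API, clients store and retrieve objects exactly as they would against any Swift cluster. The sketch below uses the python-swiftclient library; the endpoint, credentials and object names are assumptions for illustration.

from swiftclient.client import Connection

# Endpoint and credentials are placeholders for illustration.
conn = Connection(
    authurl="https://swift.example.com/auth/v1.0",
    user="account:user",
    key="secret",
)

conn.put_container("cold-archive")
conn.put_object("cold-archive", "2016/simulation-results.tar",
                contents=b"archived payload",
                content_type="application/x-tar")

# A GET of an object that resides on tape looks identical to any other GET;
# it simply has a higher time-to-first-byte while the cartridge is mounted.
headers, body = conn.get_object("cold-archive", "2016/simulation-results.tar")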

Publications

[1] Ioannis Koltsidas, Slavisa Sarafijanovic, Martin Petermann, Nils Haustein, Harald Seipp, Robert Haas, Jens Jelitto, Thomas Weigold, Edwin R. Childers, David Pease, Evangelos Eleftheriou,
“Seamlessly integrating disk and tape in a multi-tiered distributed file system,”
ICDE 2015: 1328-1339.

[2] Ilias Iliadis, Yusik Kim, Slavisa Sarafijanovic, Vinodh Venkatesan,
“Performance Evaluation of a Tape Library System,”
MASCOTS 2016: 59-68.

[3] Nikolas Ioannou, Ioannis Koltsidas, Roman Pletka, Sasa Tomic, Radu Stoica, Thomas Weigold, Evangelos Eleftheriou,
“SALSA: Treating the Weaknesses of Low-Cost Flash in Software,”
Non-Volatile Memories Workshop, 2015.

[4] Sangeetha Seshadri, Paul Muench, Lawrence Chiu, Ioannis Koltsidas, Nikolas Ioannou, Robert Haas, Yang Liu, Mei Mei, Stephen Blinick,
“Software Defined Just-in-Time Caching in an Enterprise Storage System,”
IBM Journal of Research and Development 58(2/3), 2014.

[5] Ilias Iliadis, Jens Jelitto, Yusik Kim, Slavisa Sarafijanovic, Vinodh Venkatesan,
“ExaPlan: Queueing-Based Data Placement and Provisioning for Large Tiered Storage Systems,”
MASCOTS 2015: 218-227.

[6] Hyojun Kim, Ioannis Koltsidas, Nikolas Ioannou, Sangeetha Seshadri, Paul Muench, Clement L. Dickey, Lawrence Chiu,
“How Could a Flash Cache Degrade Database Performance Rather Than Improve It? Lessons to be Learnt from Multi-Tiered Storage,”
INFLOW 2014.

[7] Hyojun Kim, Ioannis Koltsidas, Nikolas Ioannou, Sangeetha Seshadri, Paul Muench, Clement L. Dickey, Lawrence Chiu,
“Flash-Conscious Cache Population for Enterprise Database Workloads,”
ADMS@VLDB 2014: 45-56.

[8] Xiao-Yu Hu, Evangelos Eleftheriou, Robert Haas, Ilias Iliadis, Roman Pletka,
“Write Amplification Analysis in Flash-Based Solid State Drives,”
SYSTOR 2009: 10.

[9] Roman A. Pletka, Sasa Tomic,
“Health-Binning: Maximizing the Performance and the Endurance of Consumer-Level NAND Flash,”
SYSTOR 2016: 4:1-4:10.