The rapid growth of data, with annual growth rates of 50% and more, together with the requirement to keep the cost of data storage at a manageable level, calls for efficient data storage, including compression and data deduplication, but also for the careful placement of data on the best-suited storage medium. To find the optimal data placement on different storage tiers (e.g. SSD, disk, tape) in terms of cost and performance, it is crucial to understand the access patterns to data. This knowledge can be utilized not only for data placement, but also for intelligent caching and pre-fetching of data, which was one of the research goals of the DOME project.

Magnetic tape is the storage medium that offers by far the lowest cost of ownership for long-term storage. An analysis of the tape market showed that most of today’s tape drive and media characteristics (capacity, throughput, longevity, cost, etc.) satisfy customer requirements for use cases such as backup and archival storage. However, there is a strong demand for improvements in tape manageability and usability, for example, a non-proprietary, simple and cost-effective integration of tape storage into the tiered storage hierarchy or even into cloud storage systems. This seamless integration of tape can be achieved by combining the open Linear Tape File System (LTFS) format developed by IBM with IBM’s General Parallel File System (GPFS).

Such integration enables tape systems to play an important role in active archives, in which data can be seamlessly migrated to the most appropriate storage tier (e.g. SSD, HDD, tape) and where the data is always online and accessible to the users from all storage tiers through a common file system that represents all the tiers in a single name space. Big Data analytics has become a significant driver for large storage capacity requirements and the demand for highly optimized and responsive storage systems. At the same time, data analytics can also be an enabler for an optimized data management and storage system.

Tiered storage

In the field of tiered archival storage we address new storage requirements posed by the companies and organizations that base their operations and mission-critical businesses on the ability to store and process vast amounts of data efficiently and cost-effectively.

Storage requirements

  • Data needs to be easily available through a standard interface and via a single name space.
  • Data needs to be protected continuously and stored for a long time.
  • Storage costs and access requirements need to be optimized based on time-varying data usage or value.
  • System should scale to a very large number of files or data objects.

Scalable active archive is another term used for storage systems that satisfy these requirements. Our research on this topic focuses on integrating solid-state drive, disk, and tape tiers under a single name space, and providing additional management functions for moving the data between the tiers. To provide a single name space, reliability, scalability, and data management, we leverage IBM’s General Parallel File System (GPFS) technology and OpenStack Swift. To add a reliable and cheap storage tier, we integrate the open-standard Linear Tape File System (LTFS) technology.

Tiered storage combines different types of storage media, preferably under a single name space and using a standard interface, equipped with data lifecycle management functions for migrating data between different storage tiers.
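As a minimal sketch of such a lifecycle-management function, a tier can be selected from the time since last access. The thresholds and the `FileInfo` fields below are invented for illustration; real policies (e.g. GPFS ILM rules) also weigh file size, per-tier cost, and pool occupancy.

```python
from dataclasses import dataclass

# Tiers ordered from fastest/most expensive to slowest/cheapest.
TIERS = ["ssd", "disk", "tape"]

@dataclass
class FileInfo:
    path: str
    days_since_access: int
    size_bytes: int

def choose_tier(f: FileInfo, hot_days: int = 7, warm_days: int = 90) -> str:
    """Pick a storage tier from time-since-last-access thresholds.

    The thresholds are illustrative defaults, not values from any
    particular product's policy language.
    """
    if f.days_since_access <= hot_days:
        return "ssd"
    if f.days_since_access <= warm_days:
        return "disk"
    return "tape"

print(choose_tier(FileInfo("/archive/scan.dat", 200, 4 << 30)))  # tape
```

A real policy engine would evaluate such a rule periodically over file metadata and trigger migrations for files whose current tier differs from the chosen one.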


Building a clustered file system on top of flash, disk and tape

GLUFS is a research project that integrates tapes formatted in accordance with the Linear Tape File System (LTFS) standard into the General Parallel File System (GPFS) as a tape storage tier for migration and backup. Some of the GLUFS features are marketed by IBM as Linear Tape File System Enterprise Edition (LTFS EE), a distributed file system on top of flash, disk, and tape.

GPFS is IBM’s disk cluster file system, which is extremely scalable. Seamlessly integrating LTFS tapes into GPFS makes tape look like disk, makes it easy to use, and creates a common name space across disk and tape. Flexible migration policies allow administrators to optimize cost, access time, and power consumption by moving data between disk and tape. When it comes to big data, tape is the most efficient storage medium whenever applications can live with the resulting access latency. GLUFS makes tape easy to use and scales disk clusters to truly big active archives at low cost.

Key features

  • Global name space. Common global name space across disk and tape at the GPFS level.
  • Simplified infrastructure and scalability. Metadata of migrated files is kept in GPFS so there is no need for external metadata servers. This allows the system to scale with the number of nodes, drives, and tapes.
  • Open format on portable media. Increased flexibility due to LTFS being an open standard.
  • Import/export and disaster recovery. Tapes can be exported from or imported into a GLUFS system. Tapes remain self-contained, including all meta-data from the original name space. The global name space can be recreated quickly from the meta-data on the tapes. In other words, the system becomes operational and the files become accessible without having to first move the data from the tapes.
  • Multi-node and multi-library support. Multiple GPFS nodes and multiple tape libraries, across multiple locations, can be connected.
  • Flexibility. Cost/performance efficiency can be adjusted by disk/tape ratio and migration policies.
  • Simplified tape management. Makes tape management transparent to the user by handling cartridge pooling, reclamation, reconciliation, resource scheduling, replicas, fill policies, etc.


GLUFS: Integration of disk and tape (GPFS and LTFS) within a distributed file system that provides a single name space and data lifecycle management (migration between disk and tape) functions.


A distributed object storage on top of disk and tape

The goal of the IceTier project is to build a standalone, archival object storage service based on tape storage media. One use case of a tape-based object storage is a standalone service that stores objects directly on tape with disk being optionally used for caching. In another use case, the “cold” objects from the highly available, disk based, primary object store are migrated to the low-cost tape-based object storage. In yet another use case, disk and tape are integrated within the same object storage service, allowing internal data lifecycle management between disk and tape.
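The second use case, migrating cold objects from a disk-based primary store to a tape-based one, can be sketched with a toy two-tier object store. The in-memory dictionaries and the age-based policy are illustrative assumptions, not the IceTier design; a real deployment (e.g. on OpenStack Swift) would use object metadata and asynchronous movers.

```python
import time

class TieredObjectStore:
    """Toy two-tier object store: one dict stands in for disk, the
    other for tape. Objects unused for longer than `cold_after_s`
    seconds are migrated to tape; a GET transparently recalls them."""

    def __init__(self, cold_after_s: float):
        self.cold_after_s = cold_after_s
        self.disk = {}   # name -> (data, last_access_time)
        self.tape = {}   # name -> data

    def put(self, name, data):
        self.disk[name] = (data, time.time())

    def get(self, name):
        if name in self.tape:                 # recall from tape
            self.disk[name] = (self.tape.pop(name), time.time())
        data, _ = self.disk[name]
        self.disk[name] = (data, time.time()) # refresh access time
        return data

    def migrate_cold(self):
        now = time.time()
        for name in list(self.disk):
            data, last = self.disk[name]
            if now - last > self.cold_after_s:
                self.tape[name] = data
                del self.disk[name]
```

The point of the sketch is the interface: clients keep using plain `put`/`get` while the lifecycle management between tiers stays internal to the service.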


Cognitive storage

Big advances in storage technologies and in storage management have enabled storage administrators to keep up with the exponential data growth over the decades, and to keep the ever-increasing storage system complexity manageable so far. However, the question arises whether these advances in storage technologies will be sufficient to accommodate the future data growth and to effectively handle the ever-increasing system complexity.

Another question is whether the future storage capacity growth will fall behind data growth rates, meaning that the standard model of storing all data forever will no longer be sustainable due to a shortage of available storage resources. In other words, we must decide whether we even need or want to store all the data that is being generated today and will be in the future.

What we do under the umbrella of cognitive storage is to make an attempt to understand the value and relevance of the data and, based on this data-specific knowledge, determine where, with how much redundancy, and for how long to store the data. Depending on the distribution and evolution of the relevance of the data, there appears to be a large potential for significant storage capacity savings.
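As a purely hypothetical sketch of what "value-driven" placement could look like, one can map a data-value score to a tier, a replication factor, and a retention period. The score, the thresholds, and the plan fields are all assumptions for illustration; how to obtain such a score automatically is an open question.

```python
def storage_plan(value: float) -> dict:
    """Map a data-value score in [0, 1] to a (hypothetical) storage plan.

    retention_days=None means "keep indefinitely". The thresholds are
    invented; a cognitive storage system would learn them from the
    distribution and evolution of data relevance.
    """
    if value >= 0.8:   # high-value data: fast tier, strong redundancy
        return {"tier": "disk", "replicas": 3, "retention_days": None}
    if value >= 0.4:   # medium value: cheap tier, modest redundancy
        return {"tier": "tape", "replicas": 2, "retention_days": 3650}
    # low value: single copy, limited retention
    return {"tier": "tape", "replicas": 1, "retention_days": 365}
```

Even this crude mapping shows where the capacity savings would come from: low-value data consumes one tape copy for one year instead of three disk copies forever.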

Many challenges have to be addressed on the way to a fully-fledged cognitive storage system, most prominently whether and how the data value can actually be defined in a systematic way, whether and how it can be obtained automatically, how it will change over time, whether a future storage system should be able to extract relevant information from data autonomously and store it in a modified compact form, and many more.

Storage reliability

Modern data storage systems are extremely large and consist of several tens or hundreds of storage nodes. In such systems, node failures are daily events, and safeguarding data from them poses a serious design challenge. Data redundancy, in the form of replication or advanced erasure codes, is used to protect data from node failures. Because redundant data is stored across several nodes, the data on surviving nodes can be used to rebuild the data lost on the failed nodes. As these rebuild processes take time to complete, there exists a chance of additional node failures occurring during rebuild. This eventually may lead to a situation in which some of the data becomes irrecoverably lost from the system.
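The exposure window during rebuild can be quantified with a standard back-of-the-envelope model, assuming independent, exponentially distributed node failures (a common simplification, not a claim about any particular system):

```python
import math

def second_failure_prob(n_nodes: int, mtbf_hours: float,
                        rebuild_hours: float) -> float:
    """Probability that at least one of the n-1 surviving nodes fails
    while the rebuild of the first failed node is still running,
    assuming independent exponential failures with the given MTBF."""
    lam = 1.0 / mtbf_hours
    return 1.0 - math.exp(-(n_nodes - 1) * lam * rebuild_hours)

# Shorter rebuilds shrink the exposure window (100 nodes, 1M-hour MTBF):
print(second_failure_prob(100, 1.0e6, 24.0))  # ~0.24% for a 24 h rebuild
print(second_failure_prob(100, 1.0e6, 1.0))   # ~0.01% for a 1 h rebuild
```

This is why the rebuild speed-up from distributed, parallel rebuild translates directly into better reliability: the probability of an overlapping failure scales roughly linearly with the rebuild time.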

Our activities in storage system reliability investigate novel methods to address issues encountered in large-scale storage installations. In particular, we focus on the occurrence of data loss and on methods to improve reliability without sacrificing performance and storage efficiency. We have shown that spreading the redundant data corresponding to the data on each node across a higher number of other nodes, and using a distributed and intelligent rebuild process, improves the system’s mean time to data loss (MTTDL) and the expected annual fraction of data loss (EAFDL).

In particular, declustered placement, which corresponds to spreading the redundant data corresponding to each node equally across all other nodes of the system, is found to have potentially significantly higher MTTDL and lower EAFDL values than other placement schemes, especially for large storage systems. We have also developed enhanced recovery schemes for geo-replicated cloud storage systems, where network bandwidth between sites is typically scarcer than bandwidth within a site and can potentially be a bottleneck for recovery operations.
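A minimal sketch of declustered placement, assuming two-way replication and a simple round-robin spread (the exact layout in real systems differs):

```python
def declustered_placement(n_nodes: int, blocks_per_node: int):
    """Spread the replica of each node's blocks evenly over all
    other nodes.

    Returns placement[node][block] = node holding that block's replica.
    With this layout, every surviving node holds some of a failed
    node's replicas, so all n-1 nodes can participate in its rebuild
    in parallel.
    """
    placement = {}
    for node in range(n_nodes):
        others = [m for m in range(n_nodes) if m != node]
        placement[node] = [others[b % len(others)]
                           for b in range(blocks_per_node)]
    return placement

p = declustered_placement(4, 6)
# Replicas of node 0's blocks land on nodes 1, 2, 3, 1, 2, 3: every
# surviving node can contribute to node 0's rebuild.
```

Contrast this with clustered (pairwise) mirroring, where a failed node's data sits entirely on one partner, so the rebuild is limited to that single node's bandwidth.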

Example of the distributed rebuild model for a two-way replicated system. When one node fails, the critical data blocks are equally spread across the n−1 surviving nodes. The distributed rebuild process creates replicas of these critical blocks by copying them from one surviving node to another in parallel.