Overview


Increasingly large storage subsystems must be enhanced to protect against errors.

—Haris Pozidis, IBM scientist

As digital data storage becomes increasingly vast and sensitive, particularly for businesses, new challenges arise in ensuring reliable and secure data availability. Storage subsystems must be enhanced to protect against the errors that occur in increasingly large storage systems, while sustaining the very high throughput and low latency offered by solid-state storage.

Future storage systems must scale to enable Big Data analytics and cognitive applications while remaining cost-efficient by storing data according to its value. This requires flexible, easy-to-use, multi-tiered storage systems that combine technologies such as flash, hard-disk drives and magnetic tape. Caching technologies and flash memory management in virtualized and distributed storage environments must also be advanced.

New storage-class memory technologies such as phase change memory (PCM) must be integrated into existing storage stacks.

All-flash arrays

Solid-state persistent memory such as flash has been adopted in enterprise environments because it improves on disk in several respects, most notably I/O performance and power efficiency.

To reduce total cost, triple-level-cell (TLC) flash memory is typically employed today instead of conventional single-level-cell (SLC) and multi-level-cell (MLC) flash. This comes at the cost of much lower reliability and a modest latency penalty, because storing more bits per cell packs more threshold-voltage states into the same voltage window.
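The reliability penalty follows from simple arithmetic: a cell that stores b bits must distinguish 2^b threshold-voltage states. The minimal sketch below assumes a hypothetical 6.0 V window with evenly spaced states (real devices are neither) and shows how the nominal spacing between adjacent states shrinks from SLC to MLC to TLC, leaving less margin for noise and drift.

```python
# Hypothetical model: states spread evenly over a fixed voltage window.
# The 6.0 V window is an illustrative assumption, not a device figure.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}


def num_states(bits_per_cell: int) -> int:
    """Distinct threshold-voltage states needed to store bits_per_cell bits."""
    return 2 ** bits_per_cell


def state_spacing(bits_per_cell: int, window_v: float = 6.0) -> float:
    """Nominal distance between adjacent state centres in the window."""
    return window_v / (num_states(bits_per_cell) - 1)


if __name__ == "__main__":
    for name, bits in CELL_TYPES.items():
        print(f"{name}: {num_states(bits)} states, "
              f"~{state_spacing(bits):.2f} V between adjacent states")
```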

The ever-increasing storage density of NAND flash memory devices requires significant advances in flash management and signal processing to address growing endurance, retention and data-integrity issues.

Furthermore, as new non-volatile memory (NVM) technologies such as phase change memory (PCM) are introduced into the existing storage/memory hierarchy, they are expected to drive significant changes at every level, from server and storage architectures up to middleware and application design.

Our activities focus on the advanced use of solid-state NVMs in enterprise-class systems. We design and evaluate holistic approaches that sustain high I/O rates and low latency and that provide error detection and correction from the low-level data block up to the array level.
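As an illustration of array-level protection only, the sketch below uses a generic technique, RAID-5-style XOR parity combined with a per-block CRC-32 checksum for detection. It is not necessarily the scheme used in this work, and the function names are hypothetical. A stripe of equally sized blocks receives one parity block; a block whose checksum no longer matches is rebuilt from the survivors.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Append one XOR parity block and record a CRC-32 per block."""
    blocks = list(data_blocks) + [xor_blocks(data_blocks)]
    return blocks, [zlib.crc32(b) for b in blocks]


def repair_stripe(blocks, checksums):
    """Detect corrupted blocks via CRC-32 and rebuild a single bad one.

    The XOR of all blocks in a stripe is zero, so a missing block equals
    the XOR of every surviving block (data and parity)."""
    bad = [i for i, (b, c) in enumerate(zip(blocks, checksums))
           if zlib.crc32(b) != c]
    if len(bad) > 1:
        raise ValueError("single parity can rebuild at most one block")
    repaired = list(blocks)
    if bad:
        i = bad[0]
        repaired[i] = xor_blocks([b for j, b in enumerate(blocks) if j != i])
    return repaired


if __name__ == "__main__":
    data = [b"flash-A1", b"flash-B2", b"flash-C3"]
    stripe, sums = make_stripe(data)
    damaged = list(stripe)
    damaged[1] = b"XXXXXXXX"  # simulate a corrupted block
    print(repair_stripe(damaged, sums)[:3] == data)
```

A single parity block can rebuild only one failed block per stripe; tolerating more failures requires stronger erasure codes.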

In addition, we are investigating the potential for synergies between the various layers (devices, controller, file systems, virtualized systems, and applications).


Advanced flash management

Our mission is to enable the latest and next generations of NAND flash technologies, as well as emerging non-volatile memory (NVM) technologies, for enterprise storage systems through advanced flash management schemes.

These schemes can be placed at appropriate locations inside a solid-state storage array, on top of existing consumer-grade storage devices, or in a combination thereof.

We build intelligent flash management functions that exploit the increasing spread of device characteristics at the page, block and chip level, as well as the uneven wear-out of flash blocks and cells, whether it is induced by the workload or by the garbage collection algorithms, thereby achieving optimal wear-leveling.
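Two textbook building blocks of such functions are wear-aware allocation and garbage-collection victim selection. The minimal sketch below shows generic policies, not the group's own schemes: dynamic wear-leveling that writes to the least-worn free block, and greedy garbage collection that reclaims the block with the fewest valid pages. The `Block` fields are hypothetical, and a measured per-block health metric could replace the raw erase count.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block bookkeeping kept by a flash translation layer."""
    block_id: int
    erase_count: int = 0   # wear indicator (a health score could replace it)
    valid_pages: int = 0   # pages still holding live data
    is_free: bool = True


def pick_free_block(blocks):
    """Dynamic wear-leveling: write new data to the least-worn free block."""
    free = [b for b in blocks if b.is_free]
    if not free:
        raise RuntimeError("no free block: garbage collection must run first")
    return min(free, key=lambda b: b.erase_count)


def pick_gc_victim(blocks):
    """Greedy garbage collection: reclaim the used block with the fewest
    valid pages (least data to relocate); ties go to the less-worn block."""
    used = [b for b in blocks if not b.is_free]
    if not used:
        raise RuntimeError("no used block to reclaim")
    return min(used, key=lambda b: (b.valid_pages, b.erase_count))


def erase(block):
    """Erase after live pages were relocated; this consumes one P/E cycle."""
    block.erase_count += 1
    block.valid_pages = 0
    block.is_free = True


if __name__ == "__main__":
    pool = [
        Block(0, erase_count=5),
        Block(1, erase_count=2),
        Block(2, erase_count=9, valid_pages=10, is_free=False),
        Block(3, erase_count=1, valid_pages=3, is_free=False),
    ]
    print("allocate:", pick_free_block(pool).block_id)
    victim = pick_gc_victim(pool)
    erase(victim)
    print("reclaimed:", victim.block_id, "erase count now", victim.erase_count)
```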

We focus on techniques that do not interfere with the data-path processing of host read and write operations, so that the lowest possible latency is maintained throughout the lifetime of the storage device.

We are also designing and evaluating data reduction schemes, such as compression and deduplication, to lower the overall cost per gigabyte of storage capacity and to reduce write amplification.

These techniques significantly increase the amount of metadata, for which we are investigating suitable management architectures that combine current and next-generation volatile and NVM technologies.
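A minimal sketch of inline deduplication followed by compression, assuming fixed-size 4 KiB chunks, SHA-256 fingerprints and zlib compression (illustrative choices, not necessarily those used in this work). Only new content is written to flash, which lowers the write volume, but the sketch also makes the metadata cost visible: every logical chunk needs a fingerprint entry, and the fingerprint index must be stored and searched.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size
FP_SIZE = 32       # bytes per SHA-256 fingerprint


def reduce_data(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Deduplicate fixed-size chunks, then compress each unique chunk.

    Returns the fingerprint index (metadata), the per-chunk fingerprint
    map describing the logical stream (more metadata) and the number of
    payload bytes that would actually be written to flash."""
    index = {}      # fingerprint -> compressed unique chunk
    chunk_map = []  # logical chunk order as a list of fingerprints
    physical = 0
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        fp = hashlib.sha256(chunk).digest()
        if fp not in index:  # only new content is written
            index[fp] = zlib.compress(chunk)
            physical += len(index[fp])
        chunk_map.append(fp)
    return index, chunk_map, physical


def restore(index, chunk_map):
    """Rebuild the original byte stream from the metadata."""
    return b"".join(zlib.decompress(index[fp]) for fp in chunk_map)


if __name__ == "__main__":
    data = b"A" * 4096 * 8 + bytes(range(256)) * 16  # 9 chunks, 2 unique
    index, chunk_map, physical = reduce_data(data)
    assert restore(index, chunk_map) == data
    # Fingerprints only; ignores hash-table and mapping-structure overhead.
    metadata = FP_SIZE * (len(chunk_map) + len(index))
    print(f"host bytes {len(data)}, flash payload bytes {physical}, "
          f"fingerprint metadata ~{metadata} bytes")
```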

We draw on findings from large-scale characterization of existing non-volatile memory devices, combined with modeling, simulation and evaluation on real flash cards and SSDs.

Flash signal processing

To guarantee high degrees of data integrity and availability, enterprises must cope with the reliability degradation that comes with continuous technology-node shrinkage and the use of MLC/TLC technology.

We are designing signal processing and coding algorithms to enhance the reliability of MLC/TLC NAND flash memory and thus to enable its use in enterprise storage systems and servers. Our work includes advanced characterization and testing of flash memory chips to assess their raw performance and to identify and understand the various noise and distortion sources present in the write and read processes.

We are also developing comprehensive models of the NAND flash channel based on experimental data. These models guide the design of advanced signal processing schemes that mitigate impairments such as cell-to-cell interference, program and read disturb, and distribution shifts due to program/erase cycling and data retention.
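A common first-order channel model treats the threshold voltage of each programmed level as Gaussian. The sketch below uses hypothetical, normalized means and widths to estimate the raw bit error rate (RBER) of a Gray-coded 4-level MLC cell, counting only misreads into an adjacent level. It shows how a retention-like downward shift of the distributions raises the RBER at fixed read thresholds, and how re-centring the thresholds recovers part of the loss. Cell-to-cell interference and read disturb are not modeled, and the midpoint rule is a simple heuristic, not a claim about the group's method.

```python
import math


def q(x):
    """Gaussian tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def mlc_rber(means, sigmas, thresholds):
    """Approximate RBER of a Gray-coded 4-level (2 bits/cell) MLC channel.

    Levels are assumed equiprobable and Gaussian; only misreads into an
    adjacent level are counted, each flipping exactly one bit."""
    assert len(means) == len(sigmas) == 4 and len(thresholds) == 3
    bit_errors_per_cell = 0.0
    for i, t in enumerate(thresholds):
        up = q((t - means[i]) / sigmas[i])            # level i read as i+1
        down = q((means[i + 1] - t) / sigmas[i + 1])  # level i+1 read as i
        bit_errors_per_cell += 0.25 * (up + down)
    return bit_errors_per_cell / 2                    # 2 bits per cell


def midpoints(means):
    """Simple heuristic: place each read threshold halfway between means."""
    return [(a + b) / 2 for a, b in zip(means, means[1:])]


if __name__ == "__main__":
    # Hypothetical, normalized values for illustration only.
    fresh = [0.0, 1.0, 2.0, 3.0]
    aged = [0.0, 0.9, 1.8, 2.7]  # retention-like downward shift
    nominal = midpoints(fresh)
    print("fresh, nominal thresholds:  ", mlc_rber(fresh, [0.15] * 4, nominal))
    print("aged, nominal thresholds:   ", mlc_rber(aged, [0.18] * 4, nominal))
    print("aged, re-centred thresholds:",
          mlc_rber(aged, [0.18] * 4, midpoints(aged)))
```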

Error correction codes (ECC) are integral modules of flash controllers in storage systems. Historically, BCH codes have been used to correct errors in flash chips. However, the error-correcting power required of these codes has been increasing exponentially with every flash technology generation. The industry is quickly approaching a regime of diminishing returns, in which small performance gains come at the price of large increases in complexity and thus in silicon area and cost.
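To see why the required correction capability grows, consider a sketch under simple assumptions: bit errors are independent with raw bit error rate p, a codeword protects a hypothetical 1 KiB (8192-bit) data payload, and a binary BCH code correcting t errors needs at most m·t parity bits (m = 14 here, so codewords stay below 2^14 − 1 bits). The code searches for the smallest t that keeps the codeword failure probability below a hypothetical target of 1e-15.

```python
import math


def frame_failure_prob(n, t, p):
    """P(more than t of n bits in error), i.e. the binomial tail, assuming
    independent bit errors with raw bit error rate p; summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(t + 1, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        term = math.exp(log_term)
        total += term
        if i > n * p and term < total * 1e-16:
            break  # past the mode the terms only shrink; the rest is negligible
    return total


def required_t(p, k=8192, m=14, target=1e-15, t_max=400):
    """Smallest BCH correction capability t whose codeword failure
    probability is <= target, with k data bits and m*t parity bits (the
    standard upper bound on binary BCH parity; m=14 allows n <= 16383)."""
    for t in range(1, t_max + 1):
        n = k + m * t
        if frame_failure_prob(n, t, p) <= target:
            return t, m * t
    raise ValueError("no t <= t_max meets the target")


if __name__ == "__main__":
    for rber in (1e-4, 1e-3, 3e-3):
        t, parity_bits = required_t(rber)
        print(f"RBER {rber:g}: t = {t}, parity = {parity_bits} bits "
              f"({parity_bits / 8192:.1%} of the data)")
```

Each increase in RBER raises both t and the parity overhead, and BCH decoder hardware grows with t, which is the cost trend the paragraph above describes.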

To reverse this trend, alternative approaches to ECC design have recently been introduced for flash. These approaches typically rely on soft information. However, extracting soft information from NAND flash chips requires multiple read operations and thus increases latency, a critical metric for enterprise applications.
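The latency/soft-information tradeoff can be sketched with two hypothetical Gaussian voltage distributions, one per stored bit value. Each read at a different threshold splits the voltage axis further, so k reads yield k + 1 bins, and each bin maps to a log-likelihood ratio (LLR) that a soft decoder can use. One read gives only a sign with a fixed magnitude; three reads distinguish confident from marginal cells, at three times the (hypothetical) read latency. All values and names are illustrative assumptions.

```python
import math

READ_LATENCY_US = 50.0  # hypothetical latency of one page read


def phi(x):
    """Standard normal CDF (works for x = +/-inf)."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def bin_llrs(mu0, mu1, sigma, thresholds):
    """Log-likelihood ratio log(P(bin | bit 0) / P(bin | bit 1)) for each
    voltage bin delimited by the read thresholds.

    k reads (thresholds) split the voltage axis into k + 1 bins."""
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    llrs = []
    for lo, hi in zip(edges, edges[1:]):
        p0 = phi((hi - mu0) / sigma) - phi((lo - mu0) / sigma)
        p1 = phi((hi - mu1) / sigma) - phi((lo - mu1) / sigma)
        llrs.append(math.log(p0 / p1))
    return llrs


if __name__ == "__main__":
    # Hypothetical normalized distributions: bit 0 near 0.0, bit 1 near 1.0.
    for reads in ([0.5], [0.35, 0.5, 0.65]):
        llrs = bin_llrs(0.0, 1.0, 0.2, reads)
        print(f"{len(reads)} read(s), ~{len(reads) * READ_LATENCY_US:.0f} us:",
              [round(v, 2) for v in llrs])
```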

Our work addresses the tradeoffs involved in selecting suitable coding schemes and verifying their correction performance, both of which are critical tasks in controller design for flash-based storage systems.

Publications

[1] R. A. Pletka, S. Tomic,
“Health-Binning: Maximizing the Performance and the Endurance of Consumer-Level NAND Flash,”
in Proc. SYSTOR 2016, pp. 4:1-4:10, 2016.

[2] T. Parnell, C. Duenner, T. Mittelholzer, N. Papandreou,
“Capacity of the MLC NAND Flash Channel,”
IEEE J. Selected Areas in Communications, Special Issue on Channel Modeling, Coding and Signal Processing for Novel Physical Memory Devices and Systems, 34(9), pp. 2354-2365, 2016.

[3] N. Papandreou, T. Parnell, T. Mittelholzer, H. Pozidis, T. Griffin, G. Tressler, T. Fisher, C. Camp,
“Effect of read disturb on incomplete blocks in MLC NAND flash arrays,”
in Proc. IEEE Int’l Memory Workshop (IMW), Paris, France, 2016.

[4] T. Mittelholzer, T. Parnell, N. Papandreou, H. Pozidis,
“Improving the error-floor performance of binary half-product codes,”
in Proc. Int’l Symposium on Information Theory and its Applications (ISITA), Monterey, CA, 2016.

[5] T. Mittelholzer, T. Parnell, N. Papandreou, H. Pozidis,
“Symmetry-based subproduct codes,”
in Proc. 2015 IEEE Int’l Symposium on Information Theory (ISIT), pp. 251-255, 2015.

[6] T. Parnell, C. Dunner, T. Mittelholzer, N. Papandreou, H. Pozidis,
“Endurance limits of MLC NAND flash,”
in Proc. 2015 IEEE Int’l Conference on Communications (ICC), pp. 376-381, 2015.

[7] T. Parnell, N. Papandreou, T. Mittelholzer, H. Pozidis,
“Performance of cell-to-cell interference mitigation in 1y-nm MLC flash memory,”
in Proc. 15th Non-Volatile Memory Technology Symposium (NVMTS), pp. 1-4, 2015.

[8] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, A. Walls,
“Enhancing the Reliability of MLC NAND Flash Memory Systems by Read Channel Optimization,”
ACM Transactions on Design Automation of Electronic Systems (TODAES) 20(4), 62, 2015.

[9] T. Parnell,
“Flash Controller Design: Enabling Sub-20nm Technology and Beyond,”
in Proc. Int’l Memory Workshop (IMW), Taipei, Taiwan, 2014.

[10] N. Papandreou, T. Parnell, H. Pozidis, T. Mittelholzer, E. Eleftheriou, C. Camp, T. Griffin, G. Tressler, A. Walls,
“Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND Flash Memory Systems,”
in Proc. 24th ACM Great Lakes Symp. on VLSI (GLSVLSI), Houston, TX, 2014.

[11] I. Iliadis,
“Rectifying pitfalls in the performance evaluation of flash solid-state drives,”
Performance Evaluation 79, pp. 235-257, 2014.

[12] T. Parnell, N. Papandreou, T. Mittelholzer, H. Pozidis,
“Modelling of the threshold voltage distributions of sub-20nm NAND flash memory,”
in Proc. IEEE Global Communications Conference (GLOBECOM), pp. 2351-2356, 2014.

[13] N. Papandreou, Th. Antonakopoulos, U. Egger, A. Palli, H. Pozidis, E. Eleftheriou,
“A Versatile Platform for Characterization of Solid-State Memory Channels,”
in Proc. 2013 IEEE 18th Int’l Conf. on Digital Signal Processing (DSP 2013), 2013.

[14] W. Bux, X.-Y. Hu, I. Iliadis, R. Haas,
“Scheduling in Flash-Based Solid-State Drives - Performance Modeling and Optimization,”
in Proc. 20th Annual IEEE Int’l Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Washington, DC, pp. 459-468, 2012.

[15] P. Bonnet, L. Bouganim, I. Koltsidas, S.D. Viglas,
“System Co-Design and Data Management for Flash Devices,”
Proc. VLDB Endowment 4(12) (Proc. 37th Int’l Conf. on Very Large Data Bases, VLDB 2011), Seattle, WA, pp. 1504-1505, 2011.

[16] X.-Y. Hu, R. Haas, E. Eleftheriou,
“Container Marking: Combining Data Placement, Garbage Collection and Wear Levelling for Flash,”
in Proc. 2011 IEEE 19th Int’l Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2011), Singapore, pp. 237-247, 2011.

[17] I. Koltsidas, S.D. Viglas,
“Data Management over Flash Memory,”
in Proc. 2011 Int’l Conf. on Management of Data (SIGMOD), Athens, Greece, pp. 1209-1212, 2011.

[18] I. Koltsidas, S.D. Viglas,
“Spatial Data Management over Flash Memory,”
in Advances in Spatial and Temporal Databases, Proc. 12th Int’l Symp. on Spatial and Temporal Databases (SSTD 2011), Minneapolis, MN, Lecture Notes in Computer Science, vol. 6849 (Springer), pp. 449-453, 2011.

[19] I. Koltsidas, S.D. Viglas,
“Designing a Flash-Aware Two-Level Cache,”
in Advances in Databases and Information Systems, Proc. ADBIS 2011, Vienna, Austria, Lecture Notes in Computer Science, vol. 6909 (Springer), pp. 153-169, 2011.

[20] W. Bux, I. Iliadis,
“Performance of Greedy Garbage Collection in Flash-Based Solid-State Drives,”
Performance Evaluation 67(11), pp. 1172-1186, 2010.

[21] X.-Y. Hu, E. Eleftheriou, R. Haas, I. Iliadis, R. Pletka,
“Write Amplification Analysis in Flash-Based Solid State Drives,”
in Proc. Israeli Experimental Systems Conference (SYSTOR 2009), Haifa, Israel, Article 10, 2009.