Enterprise solid-state storage


Phase change materials exhibit two meta-stable states, namely, a (poly)-crystalline and an amorphous phase of high and low electrical conductivity, respectively. Switching to the amorphous phase (the RESET transition) is typically achieved in less than 50 ns, but requires relatively high current, whereas the transition to the crystalline phase (SET) is slower, on the order of 100-200 ns.

Phase change memory (PCM) scores well in terms of most of the desirable attributes of a universal memory technology.

In particular, it exhibits very good endurance, on the order of 1 million cycles or more, moderate retention, and superb scalability to sub-20-nm nodes and beyond. In addition, it is amenable to multilevel-cell storage, thanks to the large resistivity contrast between its SET and RESET states. However, a number of technological challenges need to be addressed for PCM to become a universal memory.

Apart from the necessary RESET current reduction and SET speed improvement mentioned above, a significant challenge of PCM technology is a phenomenon known as (short-term) resistance drift: The resistance of a cell is observed to drift upward over time, with the amorphous state drifting more than its crystalline counterpart.

Drift seriously affects the reliability of multilevel-cell (MLC) storage in PCM because of the reduced sensing margin between adjacent, tightly packed resistance levels. Therefore, effective solutions to the drift issue are a key factor for the cost competitiveness of PCM technology [2010-2].
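Drift of this kind is commonly described in the PCM literature by a power law, R(t) = R0 (t/t0)^ν, where the exponent ν is larger for more amorphous states. The minimal Python sketch below uses that model to show how a cell can drift across a fixed read threshold. The model choice, the resistance values, the drift exponent, and the threshold are illustrative assumptions, not measurements or methods from the work described here.

```python
def drifted_resistance(r0, nu, t, t0=1.0):
    """Power-law drift: R(t) = r0 * (t / t0) ** nu, for t >= t0 > 0."""
    return r0 * (t / t0) ** nu


def time_to_cross(r0, nu, threshold, t0=1.0):
    """Time at which a cell programmed to r0 drifts up to a fixed read threshold."""
    if threshold <= r0:
        return t0  # already at or above the threshold at the reference time
    return t0 * (threshold / r0) ** (1.0 / nu)


def read_level(resistance, threshold):
    """Two-level slicer: 0 below the threshold, 1 at or above it."""
    return 1 if resistance >= threshold else 0


# Hypothetical values: a cell programmed to 100 kOhm with drift exponent 0.08,
# read against a fixed 200 kOhm threshold separating it from the next level.
R0, NU, THRESHOLD = 1.0e5, 0.08, 2.0e5

if __name__ == "__main__":
    for t in (1.0, 1.0e3, 1.0e4):
        r = drifted_resistance(R0, NU, t)
        level = read_level(r, THRESHOLD)
        print(f"t = {t:8.0f} s  R = {r:9.0f} ohm  read as level {level}")
    t_cross = time_to_cross(R0, NU, THRESHOLD)
    print(f"threshold crossed after about {t_cross:.0f} s")
```

Under these assumed numbers the cell crosses the threshold after roughly 5,800 s. Because the crossing time scales as (threshold / r0) ** (1 / ν), a wider level spacing or a smaller exponent pushes it out very quickly, which is why tightly packed levels are the difficult case.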

At IBM Research in Zurich we are working on various aspects of PCM technology, including PCM materials and memory cell modeling, with a focus on enabling MLC storage [2011-5], as well as device architectures and system-level integration.

In particular, we conduct fundamental research on phase change materials to understand their properties and to guide the design of new materials with improved characteristics. We also apply finite element model simulations to study the impact of electrical transport and other material characteristics on memory cells [2013-2, 2013-5, 2013-6, 2014-1, 2014-3, 2014-4].

Furthermore, we engage in experimental characterization of PCM cells in various configurations, from single cells to large (multi-Mbit) cell arrays. Advanced characterization processes provide an abundance of data that serves as input for statistical modeling and for the definition of effective algorithms that enhance memory reliability [2011-1, 2012-1].

We are also researching advanced signal processing and coding schemes that improve reliability and thereby enable higher storage capacity, longer data retention, and higher endurance [2011-20].

Moreover, we are designing and implementing novel circuitry for PCM chips in order to program and read out the memory cell information reliably, with low latency, and with a small implementation area [2011-8, 2011-6, 2011-4, 2011-3, 2011-2, 2014-2, 2013-3].

Our research in memory reliability enhancement has led to the successful demonstration of reliable 2 bits/cell storage and long data retention in large arrays of PCM cells after they have been cycled 1 million times [2013-1, 2015-1].

More recently, we have experimentally demonstrated successful storage and retention of 3 bits/cell data on PCM cell arrays that had been pre-cycled 1 million times and had also undergone environmental stress, with long exposure to temperatures of up to 80 °C [2016-1, 2016-3].

This is the first time such levels of reliability have been reached with multi-bit PCM cell arrays, proving the viability of PCM technology for demanding enterprise (hybrid) memory applications.
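To see why moving from 2 to 3 bits/cell is demanding, note that n bits/cell requires 2^n distinguishable resistance levels within the same resistance window. The short Python sketch below computes the level count and the spacing between adjacent levels. It assumes, purely for illustration, levels evenly spaced in log10(R) and a hypothetical window from 10 kOhm to 10 MOhm; neither assumption comes from the work described here.

```python
import math


def levels_per_cell(bits):
    """Number of resistance levels needed to store `bits` bits in one cell."""
    return 2 ** bits


def level_spacing_decades(bits, r_min, r_max):
    """Spacing between adjacent levels, in decades of resistance.

    Assumes levels evenly spaced in log10(R) across [r_min, r_max].
    """
    window_decades = math.log10(r_max / r_min)
    return window_decades / (levels_per_cell(bits) - 1)


# Hypothetical resistance window: 10 kOhm to 10 MOhm (three decades).
R_MIN, R_MAX = 1.0e4, 1.0e7

if __name__ == "__main__":
    for bits in (1, 2, 3):
        spacing = level_spacing_decades(bits, R_MIN, R_MAX)
        print(f"{bits} bit(s)/cell: {levels_per_cell(bits)} levels, "
              f"{spacing:.2f} decades between adjacent levels")
```

In this assumed window, adjacent levels sit one decade apart at 2 bits/cell but only about 0.43 decades apart at 3 bits/cell, so the same amount of drift or noise consumes a much larger share of the margin.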

In June 2012, IBM and SK hynix signed a joint development agreement to develop MLC PCM technology and to produce competitive PCM memory chips. This deal leverages IBM’s expertise and leadership in MLC PCM technology on the one hand, and SK hynix’s superior semiconductor manufacturing on the other, in order to introduce MLC PCM technology in future computing systems.

At the system level, we have teamed up with the University of Patras, Greece, to develop a PCM-based storage subsystem, which is connected to the host over the PCIe bus. In our research prototype, the PCM chips are connected to custom-designed PCM channel controllers and attached to a mezzanine card, which in turn is attached to an FPGA board [2014-5].

Recently, again in collaboration with the University of Patras, we have developed a second-generation PCM-based storage subsystem. In this second version, a PCM DIMM (based on legacy PCM chips) is connected to a POWER8(R) processor through the PCIe bus, and data exchange is performed over the Coherent Accelerator Processor Interface (CAPI). This technology leverages the low latency and small access granularity of PCM, the efficiency of the OpenPOWER architecture, and the CAPI protocol.

At a system level, we achieved an average latency of 8.6 μsec for random 128 B reads and 2.9 μsec for random 128 B writes. Most importantly, the latency is very predictable and consistent: 99.9% of the read and write requests completed within 13.8 μsec and 4.1 μsec, respectively.
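For readers who want to report their own measurements in the same form, the two figures quoted above (average latency and the latency within which 99.9% of requests complete) can be computed from a list of per-request latencies as sketched below. The nearest-rank percentile definition and the function names are assumptions made for this illustration; the estimator actually used for the reported numbers is not stated.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # round() guards against floating-point noise such as 999.0000000000001.
    rank = math.ceil(round(pct * len(ordered) / 100.0, 9))
    return ordered[max(rank, 1) - 1]


def summarize_latencies(latencies_us):
    """Return the mean and 99.9th-percentile latency, in microseconds."""
    return {
        "mean_us": sum(latencies_us) / len(latencies_us),
        "p99_9_us": percentile_nearest_rank(latencies_us, 99.9),
    }


if __name__ == "__main__":
    # Synthetic placeholder data, not measurements: 1.0, 2.0, ..., 1000.0 us.
    fake_latencies = [float(i) for i in range(1, 1001)]
    print(summarize_latencies(fake_latencies))
```

A small gap between the mean and the 99.9th percentile, as in the figures above, indicates that the latency distribution has a short tail.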

We have also developed an emulator for a similar storage system based on state-of-the-art PCM chip specifications. In the absence of such chips on the market, we used conventional DRAM DIMMs to emulate the performance of such a system. We have developed an FPGA-based DDR3-like PCM controller using a read and write latency specification of 500 nsec and 2 μsec, respectively. Connecting the emulated PCM DIMM to a Tyan(R) Palmetto server over the CAPI protocol, we demonstrated a very low average latency of 3.6 μsec for 128 B reads and 2.9 μsec for 128 B writes. Moreover, the latency remains below 4.7 μsec and 4.1 μsec for 99.9% of the read and write requests, respectively.

These results have been presented recently at the 2nd OpenPOWER Summit in San Jose, CA.

The eventual goal of this project is to integrate PCM at the cluster and data center level using low-latency networking and appropriate support from system software, thereby enabling new use cases for data-intensive applications.

We envision stand-alone as well as hybrid applications that combine PCM and flash storage, with PCM serving as an extremely fast cache. In the enterprise space, entire databases could be stored in PCM for blazing-fast query processing in time-critical online applications, such as financial transactions.