

Storage systems

Advanced RAID

Today’s storage systems prevent data loss through RAID technology, which was designed to handle disk drive failures but can also cope with some uncorrectable media errors. However, the continued growth in storage density and disk capacity has not been accompanied by commensurate improvements in bit error rates. As a result, disk failures are more frequent, and rebuild procedures must read vast amounts of data, so the risk of hitting a hard error during a rebuild is no longer negligible.
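
To give a rough sense of scale, the following sketch estimates the probability of hitting at least one hard error while rebuilding a single-parity array. It assumes independent bit errors at a fixed rate, which is a simplification. The function name, the error rate of 1e-14 per bit, the 1 TB disk size, and the 8-disk array are illustrative assumptions, not measurements from this project.

```python
import math

def rebuild_error_probability(surviving_disks, disk_bytes, bit_error_rate):
    """Probability of at least one unrecoverable read error during a rebuild.

    Assumes every bit of every surviving disk is read once and that bit
    errors are independent (a simplifying assumption).
    """
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 for numerical stability
    return -math.expm1(bits_read * math.log1p(-bit_error_rate))

# Hypothetical example: 8-disk array of 1 TB disks, one disk has failed,
# so the 7 surviving disks must be read in full.
p = rebuild_error_probability(surviving_disks=7,
                              disk_bytes=10**12,
                              bit_error_rate=1e-14)
print(f"Probability of a hard error during rebuild: {p:.2f}")
```

Under these assumed values the probability is far from negligible, and it grows with both disk capacity and array size.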

Our work on advanced RAID technologies investigates novel methods to address issues encountered in large-scale storage installations. In particular, we focus on data losses caused by hard errors encountered during rebuilds and on methods to improve reliability, measured in terms of mean time to data loss (MTTDL), without sacrificing performance or storage efficiency.
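
For orientation, the MTTDL of a single-parity array is often approximated by the classic textbook formula MTTF² / (N (N − 1) MTTR), which considers disk failures only and ignores hard errors. The sketch below evaluates that approximation. The function name and all parameter values are hypothetical and serve only as an illustration.

```python
def mttdl_single_parity(n_disks, mttf_hours, mttr_hours):
    """Classic MTTDL approximation for a single-parity array.

    Data is considered lost when a second disk fails while the first
    failed disk is still being rebuilt. Hard errors are NOT modeled.
    """
    return mttf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)

# Hypothetical values: 8 disks, disk MTTF of 500,000 hours, 12-hour rebuild
hours = mttdl_single_parity(n_disks=8, mttf_hours=500_000, mttr_hours=12)
print(f"MTTDL: {hours / 8760:.0f} years")
```

Because this approximation ignores hard errors during the rebuild, it overestimates the reliability of arrays built from large disks.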

We have proposed a novel protection mechanism, Sector Protection via Intra-Disk Redundancy (SPIDRe), that complements existing RAID schemes and further improves the MTTDL by two to three orders of magnitude. We have compared SPIDRe with existing methods such as disk scrubbing and developed an in-depth understanding of the benefits and trade-offs.
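
The general idea of intra-disk redundancy can be illustrated with a deliberately simplified sketch: a group of data sectors on one disk is protected by a single XOR parity sector stored on the same disk, so that one unreadable sector in the group can be reconstructed without involving the other disks. This is not the SPIDRe coding scheme itself. The function names, the segment layout, and the single-parity code are assumptions chosen for illustration.

```python
from functools import reduce

def xor_sectors(sectors):
    """Bytewise XOR of equally sized sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))

def make_parity(data_sectors):
    """Compute the parity sector stored alongside the data on the same disk."""
    return xor_sectors(data_sectors)

def recover_sector(sectors, parity, lost_index):
    """Rebuild the sector at lost_index from the remaining sectors and parity."""
    survivors = [s for i, s in enumerate(sectors) if i != lost_index]
    return xor_sectors(survivors + [parity])

# Hypothetical segment: four 512-byte data sectors plus one parity sector
segment = [bytes([value] * 512) for value in (1, 2, 3, 4)]
parity = make_parity(segment)

# Simulate a hard error on sector 2 and reconstruct it locally
recovered = recover_sector(segment, parity, lost_index=2)
print(recovered == segment[2])
```

In this toy layout, one parity sector per k data sectors costs 1/(k + 1) of the disk capacity, which illustrates the trade-off between reliability and storage efficiency.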

Images

Disk failure

Figure 1. Disk failure in an array, followed by the rebuild process, during which the occurrence of a hard error in the first disk leads to data loss.



Comparison of RAID schemes

Figure 2. Comparison of RAID schemes with (red circle) and without the SPIDRe extension.