In the era of Big Data, caching is of paramount importance, not only because it improves the performance of applications that process large volumes of data, but also because it enables new kinds of applications that require memory-speed access to data. In a typical enterprise IT environment, where servers store data on one or more SAN (storage area network) storage systems, server-side caching is critical for achieving low-latency, high-throughput data access. Our research studies systems in which servers use solid-state storage devices, based on flash and newer memory technologies, to cache data from the SAN.
Our activities center on a novel caching framework that exploits synergies between servers and storage. The system employs advanced caching algorithms to identify which data is hot on each server, i.e., accessed often by applications running on that server, and which data is cold, i.e., accessed infrequently. Hot data is stored in the server's local cache so that it can be served to applications and users at very low latency. Our research targets high performance and scalability in all aspects of the system, and also addresses reliability and endurance.
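The hot/cold distinction described above can be illustrated with a minimal sketch. The actual algorithms used by the framework are not described here, so this example assumes a simple stand-in policy: count accesses per block, treat a block as hot once its count reaches a threshold, periodically halve all counts so that blocks which stop being accessed cool down, and evict the least recently used block when the local cache is full. All names (`HotColdCache`, `hot_threshold`, `decay_interval`, `fetch_from_san`) are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple


class HotColdCache:
    """Hypothetical frequency-based hot/cold classifier with an LRU local cache."""

    def __init__(self, capacity: int, hot_threshold: int, decay_interval: int) -> None:
        self.capacity = capacity              # max blocks held in the local (flash) cache
        self.hot_threshold = hot_threshold    # accesses needed before a block counts as hot
        self.decay_interval = decay_interval  # halve all counters every N accesses
        self.counts: Dict[int, int] = {}
        self.cache = OrderedDict()            # block id -> data, least recently used first
        self.accesses = 0

    def _decay(self) -> None:
        # Halve every counter so blocks that stop being accessed cool down;
        # counters that fall to zero are dropped.
        self.counts = {b: c // 2 for b, c in self.counts.items() if c // 2 > 0}

    def read(self, block: int, fetch_from_san: Callable[[int], bytes]) -> Tuple[bytes, bool]:
        """Return (data, hit); fetch_from_san stands in for the slow SAN path."""
        self.accesses += 1
        if self.accesses % self.decay_interval == 0:
            self._decay()
        self.counts[block] = self.counts.get(block, 0) + 1

        if block in self.cache:
            self.cache.move_to_end(block)     # mark as most recently used
            return self.cache[block], True

        data = fetch_from_san(block)          # cold path: go to the SAN
        if self.counts[block] >= self.hot_threshold:
            # Block has become hot: admit it, evicting the LRU block if full.
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)
            self.cache[block] = data
        return data, False


if __name__ == "__main__":
    cache = HotColdCache(capacity=2, hot_threshold=2, decay_interval=1000)
    san = lambda b: f"block-{b}".encode()
    hits = [cache.read(b, san)[1] for b in [1, 1, 1, 2, 2, 3, 3, 1]]
    print(hits)               # [False, False, True, False, False, False, False, False]
    print(list(cache.cache))  # [3, 1]
```

In this sketch a block is admitted only after repeated accesses, so one-off reads never displace hot data from the limited flash capacity; the decay step lets the notion of "hot" follow changes in the workload over time. A production system would likely use a more sophisticated policy.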
Recently, the group has been working with IBM scientists in Almaden, California, and Tucson, Arizona, to integrate these technologies into the IBM systems stack and to gauge their applicability in real systems and the performance they deliver to real-world applications. As unveiled at the EDGE 2012 conference in Orlando, Florida, an extended version of EasyTier that integrates the caching technology was shown to improve response times by more than 5× for certain workloads.