
CDCache: Space-Efficient Flash Caching via Compression-before-Deduplication


Abstract:

Large-scale storage systems boost I/O performance via flash caching, but the underlying storage medium of flash caching incurs significant financial costs and also exhibits low endurance. Previous studies adopt compression-after-deduplication to mitigate writing redundant contents into the flash cache, so as to address the cost and endurance issues. However, deduplication and compression have conflicting preferable cases, and compression-after-deduplication essentially compromises the space-saving benefits of either deduplication or compression. To simultaneously preserve the benefits of both approaches, we explore compression-before-deduplication, which applies compression to eliminate byte-level redundancies across data blocks, followed by deduplication to write only a single copy of duplicate compressed blocks into the flash cache. We present CDCache, a space-efficient flash caching system that realizes compression-before-deduplication. It proposes to dynamically adjust the compression range of data blocks, so as to preserve the effectiveness of deduplication on the compressed blocks. Also, it builds on various design techniques to approximately estimate duplicate data blocks and efficiently manage compressed blocks. Trace-driven experiments show that CDCache improves the read hit ratio and the write reduction ratio of a previous compression-after-deduplication approach by up to 1.3× and 1.6×, respectively, while incurring only a small memory overhead for index management.
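To make the ordering concrete, the core write path of compression-before-deduplication can be sketched as follows. This is a minimal illustration of the general idea only, not CDCache's actual design: it omits the dynamic compression-range adjustment and the approximate duplicate estimation described in the abstract, and the `cbd_write` function, the dictionary-based index, and the zlib/SHA-256 choices are all illustrative assumptions.

```python
import hashlib
import zlib

def cbd_write(cache: dict, block: bytes) -> bool:
    """Compression-before-deduplication write path (simplified sketch).

    Compress first to remove byte-level redundancy, then fingerprint
    the *compressed* block and write it only if no identical compressed
    copy is already cached. Returns True iff a new copy was written.
    """
    compressed = zlib.compress(block)                 # step 1: compression
    fp = hashlib.sha256(compressed).hexdigest()       # step 2: fingerprint compressed data
    if fp in cache:                                   # step 3: deduplication check
        return False                                  # duplicate: index-only update, no flash write
    cache[fp] = compressed                            # unique: write a single copy
    return True

cache = {}
block_a = b"hello flash cache " * 200
block_b = b"hello flash cache " * 200                 # duplicate content

wrote_a = cbd_write(cache, block_a)                   # unique, written
wrote_b = cbd_write(cache, block_b)                   # deduplicated, not written
```

Because zlib compression is deterministic for identical input, duplicate logical blocks yield identical compressed blocks and thus identical fingerprints, so deduplication still succeeds after compression; in contrast, the reverse order (deduplicate first, then compress each unique block in isolation) cannot exploit byte-level redundancy that spans multiple blocks.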
Date of Conference: 20-23 May 2024
Date Added to IEEE Xplore: 12 August 2024
Conference Location: Vancouver, BC, Canada


I. Introduction

Large-scale storage systems [1], [7], [23], [27], [33] apply flash caching to improve the performance of hard disk drives (HDDs). They use solid-state drives (SSDs) atop HDDs as buffers for frequently accessed contents, thereby mitigating the performance overhead of direct access to HDDs. However, SSDs have two fundamental limitations that impede their application in flash caching. First, compared to HDDs, SSDs incur a significantly higher cost-per-GiB, and this substantial cost disparity remains prevalent today. For example, the Crucial Pro T700 (a top-selling SSD device) [2] achieves an I/O performance of 12,400 MiB/s, about 70.8× that of the WD Blue WD40EZRZ (a top-selling HDD device), but its cost-per-GiB is also 7.5× that of the WD Blue WD40EZRZ [4] (based on the pricing plans available on the respective official websites [2], [4] in July 2023). In addition, SSDs exhibit limited endurance and are susceptible to wear-out issues [11]. Specifically, owing to the underlying NAND flash technology in SSDs, each memory cell endures only a finite number of write cycles before degrading. This degradation manifests in various forms, such as reduced performance, increased error rates, or even complete failure of the drive.

