This paper describes implementation details of a hardware compression and decompression unit (CDU) for optimizing energy consumption in processor-based systems. Many algorithms for data compression (e.g., profile-driven, adaptive, differential) have previously been introduced. In all cases, data compression and decompression are performed on-the-fly on the cache-to-memory path: uncompressed cache lines are compressed before they are written back to main memory, and decompressed when cache refills occur. This paper completes and extends these previous contributions by providing evidence of the feasibility of the proposed compression architectures, specifically addressing hardware implementation issues. The CDU design targets energy minimization in the cache-bus-memory subsystem under a strict performance constraint. Evaluated on several benchmark programs, the resulting average memory energy reduction is around 24%, at no performance penalty.
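To make the cache-to-memory data path concrete, the following is a minimal software sketch of one possible differential scheme. It is an illustration under stated assumptions, not the CDU design described in the paper: the 32-bit word width, the 8-bit difference width, the base-plus-differences encoding, and the function names (compress_line, decompress_line, payload_bits) are all hypothetical choices made for this example.

```python
from typing import List, Tuple

WORD_BITS = 32   # assumed word width (not taken from the paper)
DELTA_BITS = 8   # assumed width of each stored difference (not from the paper)
MASK = (1 << WORD_BITS) - 1


def compress_line(words: List[int]) -> Tuple[bool, List[int]]:
    """Model of the write-back path: try to encode a cache line as
    its first word plus small signed differences from that word.
    Returns (is_compressed, payload)."""
    base = words[0]
    lo = -(1 << (DELTA_BITS - 1))
    hi = (1 << (DELTA_BITS - 1)) - 1
    deltas = []
    for w in words[1:]:
        d = w - base
        if not lo <= d <= hi:
            # A difference does not fit: store the line uncompressed.
            return False, list(words)
        deltas.append(d)
    return True, [base] + deltas


def decompress_line(is_compressed: bool, payload: List[int]) -> List[int]:
    """Model of the refill path: rebuild the original cache line."""
    if not is_compressed:
        return list(payload)
    base = payload[0]
    return [base] + [(base + d) & MASK for d in payload[1:]]


def payload_bits(is_compressed: bool, payload: List[int]) -> int:
    """Bits transferred over the bus for this line (flag bit ignored)."""
    if not is_compressed:
        return len(payload) * WORD_BITS
    return WORD_BITS + (len(payload) - 1) * DELTA_BITS


# Usage: a line of eight nearby values, such as consecutive addresses.
line = [0x1000 + 4 * i for i in range(8)]
flag, payload = compress_line(line)
restored = decompress_line(flag, payload)
```

In this sketch a line whose words lie close to the first word is transferred in fewer bits, while any other line falls back to the uncompressed form, so decompression on refill always recovers the original data.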