Design and optimization of large size and low overhead off-chip caches

Authors:
Zhao Zhang (Dept. of Electr. & Comput. Eng., Iowa State Univ., Ames, IA, USA); Zhichun Zhu; Xiaodong Zhang

Large off-chip L3 caches can significantly improve the performance of memory-intensive applications. However, conventional SRAM L3 caches face two problems as those applications demand increasingly large caches. First, the low density and high cost of SRAM limit the cache size, so an SRAM cache cannot hold the working sets of many memory-intensive applications. Second, because the tag-checking overhead of a large cache is nontrivial, an L3 cache increases the cache miss penalty and may even harm the performance of some memory-intensive applications. To address both problems, we present a new memory hierarchy design that uses cached DRAM to construct a large, low-overhead off-chip cache. The high-density DRAM portion of the cached DRAM can hold large working sets, while the small SRAM portion exploits the spatial locality in L2 miss streams to reduce access latency. The L3 tag array is placed off-chip with the data array, minimizing the processor area devoted to the L3 cache, while a small on-chip tag cache effectively removes the off-chip tag access overhead. A prediction technique accurately predicts the hit/miss status of each access to the cached DRAM, further reducing access latency. Using execution-driven simulation of a 2 GHz, 4-way issue processor running 11 memory-intensive programs from the SPEC 2000 benchmark suite, we show that a system whose off-chip cache is a cached DRAM with 64 MB of DRAM and a 128 KB on-chip SRAM cache outperforms the same system with an 8 MB off-chip SRAM L3 cache by up to 78 percent in total execution time. The average speedup of the cached-DRAM system over the SRAM L3 system is 25 percent.
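
The role of the on-chip tag cache can be illustrated with a toy model. The sketch below is not the authors' design: it assumes a direct-mapped L3, an LRU-managed tag cache, and arbitrary bit widths, and the names `TagCache` and `simulate` are hypothetical. It only counts how often a tag lookup has to go off-chip when recently used tags are kept on-chip.

```python
from collections import OrderedDict


class TagCache:
    """Toy on-chip cache of recently used L3 tags (assumed LRU policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # set index -> tag, oldest first

    def lookup(self, set_index):
        """Return the cached tag for this set, or None on a tag-cache miss."""
        if set_index in self.entries:
            self.entries.move_to_end(set_index)  # mark as most recently used
            return self.entries[set_index]
        return None

    def fill(self, set_index, tag):
        """Record the current tag for a set, evicting the LRU entry if full."""
        self.entries[set_index] = tag
        self.entries.move_to_end(set_index)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def simulate(addresses, tag_cache, block_bits=6, index_bits=4):
    """Replay addresses against a direct-mapped L3 (an assumption) and
    count L3 hits, L3 misses, and off-chip tag reads."""
    l3_tags = {}  # models the off-chip tag array: set index -> tag
    stats = {"l3_hits": 0, "l3_misses": 0, "off_chip_tag_reads": 0}
    mask = (1 << index_bits) - 1
    for addr in addresses:
        block = addr >> block_bits
        set_index, tag = block & mask, block >> index_bits
        known = tag_cache.lookup(set_index)
        if known is None:
            # Tag not on-chip: pay for an off-chip tag array access.
            stats["off_chip_tag_reads"] += 1
            known = l3_tags.get(set_index)
        if known == tag:
            stats["l3_hits"] += 1
        else:
            stats["l3_misses"] += 1
            l3_tags[set_index] = tag  # the new block replaces the old one
        tag_cache.fill(set_index, tag)
    return stats


if __name__ == "__main__":
    trace = [0x0000, 0x0040, 0x0000, 0x0040, 0x0400]
    print(simulate(trace, TagCache(capacity=4)))
    print(simulate(trace, TagCache(capacity=1)))
```

In this toy trace the L3 hit and miss counts are the same for both tag cache sizes; only the number of off-chip tag reads changes, which is the overhead the abstract says the on-chip tag cache removes.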

Published in:

IEEE Transactions on Computers (Volume: 53, Issue: 7)

Date of Publication:

July 2004
