Adaptive mechanisms and policies for managing cache hierarchies in chip multiprocessors

Authors:

E. Speight (Novel Systems Architecture Group, IBM Research, Austin, TX, USA); H. Shafi; Lixin Zhang; R. Rajamony

Abstract:

With the ability to place large numbers of transistors on a single silicon chip, manufacturers have begun developing chip multiprocessors (CMPs) containing multiple processor cores, varying amounts of level 1 and level 2 caching, and on-chip directory structures for level 3 caches and memory. The level 3 cache may be used as a victim cache for both modified and clean lines evicted from on-chip level 2 caches. Efficient area and performance management of this cache hierarchy is paramount given the projected increase in access latency to off-chip memory. This paper proposes simple architectural extensions and adaptive policies for managing the L2 and L3 cache hierarchy in a CMP system. In particular, we evaluate two mechanisms that improve cache effectiveness. First, we propose the use of a small history table to provide hints to the L2 caches as to which lines are resident in the L3 cache. We employ this table to eliminate some unnecessary clean write backs to the L3 cache, reducing pressure on the L3 cache and utilization of the on-chip bus. Second, we examine the performance benefits of allowing write backs from L2 caches to be placed in neighboring, on-chip L2 caches rather than forcing them to be absorbed by the L3 cache. This not only reduces the capacity pressure on the L3 cache but also makes subsequent accesses faster, since L2-to-L2 cache transfers typically have lower latencies than accesses to a large L3 cache array. We evaluate the performance improvement of these two designs, and their combined effect, on four commercial workloads and observe a reduction in the overall execution time of up to 13%.
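The abstract describes the two mechanisms only at a high level. As a rough illustration of the decision logic, the following is a minimal software model in C; the table size, indexing scheme, structure names, and the peer-selection loop are all assumptions made for this sketch, not details taken from the paper.

```c
/* Minimal software model of the two policies sketched in the abstract.
 * Sizes, names, and replacement choices are illustrative assumptions. */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HINT_ENTRIES 256   /* assumed size of the per-L2 history table */
#define NUM_L2       4     /* assumed number of on-chip L2 caches      */

/* One hint entry: a line tag plus a valid bit. A hit means "this line
 * is believed to still be resident in the L3 cache". */
typedef struct {
    uint64_t tag;
    bool     valid;
} HintEntry;

static HintEntry hint_table[NUM_L2][HINT_ENTRIES];

static unsigned hint_index(uint64_t line_addr) {
    return (unsigned)(line_addr % HINT_ENTRIES); /* assumed direct-mapped */
}

/* Record a hint when the L3 informs this L2 that it already holds the
 * line (e.g., on a fill that hit in L3). */
void record_l3_resident(int l2_id, uint64_t line_addr) {
    HintEntry *e = &hint_table[l2_id][hint_index(line_addr)];
    e->tag   = line_addr;
    e->valid = true;
}

/* Mechanism 1: on a clean eviction from an L2, consult the history
 * table. If it suggests the line is already in L3, drop the writeback
 * instead of sending it across the on-chip bus. */
bool should_write_back_clean(int l2_id, uint64_t line_addr) {
    HintEntry *e = &hint_table[l2_id][hint_index(line_addr)];
    if (e->valid && e->tag == line_addr)
        return false;          /* hint hit: skip the clean writeback */
    return true;               /* no hint: write back to L3 as usual */
}

/* Mechanism 2: try to place an evicted line in a neighboring on-chip
 * L2 before letting it fall through to L3. The peer_has_room callback
 * stands in for whatever replacement policy the hardware would use. */
bool try_place_in_peer_l2(int src_l2, uint64_t line_addr,
                          bool (*peer_has_room)(int l2_id)) {
    for (int peer = 0; peer < NUM_L2; peer++) {
        if (peer == src_l2)
            continue;
        if (peer_has_room(peer)) {
            printf("line %#" PRIx64 " absorbed by peer L2 %d\n",
                   line_addr, peer);
            return true;       /* later access becomes an L2-to-L2 hit */
        }
    }
    return false;              /* no room in any peer: victim goes to L3 */
}
```

One property worth noting about the hint table as modeled here: because it only ever suppresses writebacks of clean lines, a stale hint costs at most a redundant refetch from further down the hierarchy, never correctness, which is presumably what allows the table to stay small and approximate.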

Published in:

32nd International Symposium on Computer Architecture (ISCA'05)

Date of Conference:

4-8 June 2005