Network Victim Cache: Leveraging Network-on-Chip for Managing Shared Caches in Chip Multiprocessors

Authors: Jinglei Wang, Yibo Xue, Haixia Wang, Dongsheng Wang (Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China)

The large working sets of commercial and scientific workloads favor a shared L2 cache design that maximizes aggregate cache capacity and minimizes off-chip memory requests in chip multiprocessors (CMPs). Two important hurdles restrict the scalability of such chip multiprocessors: the on-chip memory cost of the directory and long L1 miss latencies. This work presents a network victim cache architecture that addresses both problems by leveraging the on-chip network to manage shared caches. The architecture removes the directory structure from the shared L2 caches and instead stores directory information for blocks recently cached by L1 caches in the network interface components, decreasing on-chip directory memory overhead and improving scalability. The saved memory space is used as victim caches embedded in the network interface components, further reducing L1 miss latencies. The proposed architecture is evaluated through simulations of a 16-core tiled CMP. Results demonstrate that the network victim cache architecture provides better scalability and improves performance over a traditional CMP with a shared L2 cache design by 23% on average and by up to 34% at best.
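The mechanism the abstract describes can be sketched in a few lines: each network interface (NI) keeps a small, bounded directory only for blocks recently cached by L1s, plus a victim cache built from the memory saved by removing the full L2 directory, so an L1 miss can be serviced at the NI before crossing the network to a shared L2 bank. This is a minimal illustrative sketch, not the paper's implementation; all class, method, and parameter names (`NetworkInterface`, `record_l1_fill`, `on_l1_evict`, `on_l1_miss`, the entry counts) are assumptions for illustration.

```python
# Hypothetical sketch of the network-victim-cache idea: the NI holds a
# bounded directory for recently L1-cached blocks and a small victim
# cache that catches L1 evictions. Names and sizes are illustrative,
# not taken from the paper.
from collections import OrderedDict

class NetworkInterface:
    def __init__(self, dir_entries=4, victim_entries=4):
        # Directory: block address -> set of L1 sharer core ids,
        # bounded in size (only recently cached blocks are tracked).
        self.directory = OrderedDict()
        self.dir_entries = dir_entries
        # Victim cache: block address -> data evicted from the local L1.
        self.victims = OrderedDict()
        self.victim_entries = victim_entries

    def record_l1_fill(self, addr, core):
        """Track a block recently cached by an L1 (LRU-bounded directory)."""
        sharers = self.directory.pop(addr, set())
        sharers.add(core)
        self.directory[addr] = sharers
        if len(self.directory) > self.dir_entries:
            self.directory.popitem(last=False)  # drop the oldest entry

    def on_l1_evict(self, addr, data):
        """Place a block evicted from the L1 into the NI victim cache."""
        self.victims.pop(addr, None)
        self.victims[addr] = data
        if len(self.victims) > self.victim_entries:
            self.victims.popitem(last=False)

    def on_l1_miss(self, addr):
        """On an L1 miss, check the NI victim cache before forwarding the
        request over the network to the shared L2 bank; a hit here avoids
        the network round trip, which is the latency saving targeted."""
        if addr in self.victims:
            return ("victim_hit", self.victims.pop(addr))
        return ("forward_to_l2", None)
```

For example, after `ni.on_l1_evict(0x40, "blk")`, a subsequent `ni.on_l1_miss(0x40)` hits in the NI victim cache, while a miss on an untracked address falls through to the shared L2.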

Published in: 2009 Fourth International Conference on Embedded and Multimedia Computing

Date of Conference: 10-12 Dec. 2009