Optimizing replication, communication, and capacity allocation in CMPs

Authors: Z. Chishti, M. D. Powell, and T. N. Vijaykumar (School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA)

Chip multiprocessors (CMPs) substantially increase capacity pressure on the on-chip memory hierarchy while requiring fast access. Neither private nor shared caches can provide both large capacity and fast access in CMPs. We observe that, compared to symmetric multiprocessors (SMPs), CMPs change the latency-capacity tradeoff in two significant ways. We propose three novel ideas to exploit the changes: (1) While placing copies close to requestors allows fast access for read-only sharing, the copies also reduce the already-limited on-chip capacity in CMPs. We propose controlled replication to reduce capacity pressure by not making extra copies in some cases and instead obtaining the data from an existing on-chip copy. This option is not suitable for SMPs because obtaining data from another processor is expensive and capacity is not limited to on-chip storage. (2) Unlike SMPs, CMPs allow fast on-chip communication between processors for read-write sharing. Instead of incurring slow access to read-write shared data through coherence misses, as SMPs do, we propose in situ communication to provide fast access without making copies or incurring coherence misses. (3) Accessing a neighbor's cache is not as expensive in CMPs as it is in SMPs. We propose capacity stealing, in which private data that exceeds a core's capacity is placed in a neighboring cache with less capacity demand. To incorporate our ideas, we use a hybrid of private, per-processor tag arrays and a shared data array. Because the shared data array is slow, we employ non-uniform access and distance associativity from previous proposals to hold frequently accessed data in regions close to the requestor. We extend the previously proposed non-uniform access with replacement and placement using distance associativity (NuRAPID) to CMPs, and call our cache CMP-NuRAPID. Our results show that for a 4-core CMP with an 8 MB cache, CMP-NuRAPID improves performance by 13% over a shared cache and 8% over private caches for three commercial multithreaded workloads.
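The controlled-replication idea in point (1), serving a first remote touch through a pointer to the existing on-chip copy and replicating only on reuse, can be illustrated with a small toy model. This is a minimal sketch under assumptions rather than the paper's mechanism: the class names (SharedDataArray, Core), the single pointer flag, and the frame bookkeeping are hypothetical, and the real CMP-NuRAPID design also handles distance associativity, placement near the requestor, coherence, in situ communication, and capacity stealing, none of which are modeled here.

```python
# Toy model of the controlled-replication decision described in the abstract.
# All names are hypothetical; writes, coherence, and distance associativity
# are deliberately omitted.

class SharedDataArray:
    """Single shared data array; frames are kept in a dict keyed by frame id."""
    def __init__(self):
        self.frames = {}
        self.next_frame = 0

    def allocate(self, block):
        fid = self.next_frame
        self.next_frame += 1
        self.frames[fid] = block
        return fid


class Core:
    """Per-core private tag array mapping address -> (frame id, pointer flag)."""
    def __init__(self, cid, data_array):
        self.cid = cid
        self.data = data_array
        self.tags = {}

    def read(self, addr, all_cores):
        if addr in self.tags:
            fid, is_pointer = self.tags[addr]
            if is_pointer:
                # Reuse of a block we only point to: replicate it now, into a
                # frame the real design would place close to this core.
                new_fid = self.data.allocate(self.data.frames[fid])
                self.tags[addr] = (new_fid, False)
                return "reuse: replica made"
            return "local hit"

        for other in all_cores:
            if other is not self and addr in other.tags:
                # Controlled replication: first touch of a block another core
                # already caches just records a pointer, saving capacity.
                fid = other.tags[addr][0]
                self.tags[addr] = (fid, True)
                return "first touch: pointer only, no replica"

        # Off-chip miss: bring the block on chip and own the only copy.
        fid = self.data.allocate("mem[%#x]" % addr)
        self.tags[addr] = (fid, False)
        return "off-chip miss"


data = SharedDataArray()
cores = [Core(i, data) for i in range(4)]
print(cores[0].read(0x40, cores))   # off-chip miss
print(cores[1].read(0x40, cores))   # first touch: pointer only, no replica
print(cores[1].read(0x40, cores))   # reuse: replica made
```

In this toy policy, capacity is spent on a second copy only after the block demonstrates reuse at the remote core, which is the capacity-saving behavior the abstract attributes to controlled replication.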

Published in:

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA '05)

Date of Conference:

4-8 June 2005
