
Codesign of NoC and Cache Organization for Reducing Access Latency in Chip Multiprocessors


3 Author(s): Ahmed Abousamra (University of Pittsburgh, Pittsburgh); Alex K. Jones; Rami Melhem

Reducing data access latency is vital to achieving performance improvements in computing. For chip multiprocessors (CMPs), data access latency depends on the organization of the memory hierarchy, the on-chip interconnect, and the running workload. Several network-on-chip (NoC) designs exploit communication locality to reduce communication latency by configuring special fast paths or circuits on which communication is faster than on the rest of the NoC. However, communication patterns are directly affected by the cache organization, and many cache organizations are designed in isolation from the underlying NoC or assume a simple NoC design, thus possibly missing optimization opportunities. In this work, we take a codesign approach to the NoC and cache organization. First, we propose a hybrid circuit/packet-switched NoC that exploits communication locality through periodic configuration of the most beneficial circuits. Second, we design a Unique Private (UP) caching scheme targeting the class of interconnects that exploit communication locality to improve communication latency. The Unique Private cache stores the data that are mostly accessed by each processor core in the core's locally accessible cache bank, while leveraging dedicated high-speed circuits in the interconnect to provide remote cores with fast access to shared data. Simulations of a suite of scientific and commercial workloads show that our proposed design achieves a speedup of 15.2 and 14 percent on a 16-core and a 64-core CMP, respectively, over the state-of-the-art NoC-cache codesigned system that also exploits communication locality in multithreaded applications.
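The hybrid NoC described above periodically configures circuits for the most beneficial (i.e., most frequently communicating) core pairs. A minimal sketch of that selection step, assuming traffic is logged as (source, destination) core pairs per epoch; the function and variable names are hypothetical, since the abstract does not specify the actual selection policy:

```python
from collections import Counter

def select_circuits(traffic_log, num_circuits):
    """Pick the source->destination core pairs that communicated most
    often in the last epoch as candidates for dedicated circuits in
    the next epoch. (Illustrative sketch only; the paper's real
    configuration algorithm may weigh other factors.)"""
    counts = Counter(traffic_log)          # messages per (src, dst) pair
    return [pair for pair, _ in counts.most_common(num_circuits)]

# Example: traffic observed during one epoch on a 16-core CMP
epoch_traffic = [(0, 5), (0, 5), (3, 12), (0, 5), (3, 12), (7, 2)]
print(select_circuits(epoch_traffic, 2))   # → [(0, 5), (3, 12)]
```

Pairs that win a circuit would then enjoy the fast path for the duration of the epoch, while all other traffic falls back to ordinary packet switching.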

Published in:

IEEE Transactions on Parallel and Distributed Systems (Volume: 23, Issue: 6)