Data sharing in multi-threaded applications and its impact on chip design

3 Author(s)
Krishna, A. (Syst. & Technol. Group, Int. Bus. Machines, Inc., USA); Samih, A.; Solihin, D.

Analytical modeling is becoming an increasingly important technique used in the design of chip multiprocessors. Most such models assume multi-programmed workload mixes and either ignore or oversimplify the behavior of multi-threaded applications. In particular, data sharing observed in multi-threaded applications, and its impact on chip design decisions, has not been well characterized in prior analytical modeling work. In this work we describe why data sharing behavior is hard to capture in an analytical model, and study why, and by how much, past attempts have fallen short. We propose a new methodology to measure the impact of data sharing, which quantifies the reduction in on-chip cache miss rates attributable solely to the presence of data sharing. We then extend an existing analytical performance model for a many-core chip by incorporating into it the impact of data sharing in contemporary multi-threaded workloads. We use this analytical model to explore the chip design space for a hypothetical many-core chip of the future. We find that the optimal design point is substantially different when the impact of data sharing is modeled compared to when it is not. Data sharing can enable reassigning a significant fraction of the total chip area (up to 16%, per our model of a future many-core) from cache resources to core resources, which, in turn, improves the overall chip throughput (by up to 58%).
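The core-versus-cache area trade-off described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the power-law miss-rate curve, the `sharing_factor` parameter, and every numeric constant below are assumptions chosen only for illustration, and the sketch makes no attempt to reproduce the 16% or 58% figures reported above. It only shows the qualitative effect that a lower effective miss rate can shift the throughput-optimal design toward more cores and less cache.

```python
# Toy illustration only: NOT the model from the paper.
# All function names, parameters, and constants are hypothetical.

def miss_rate(cache_per_core, sharing_factor=1.0,
              base_miss=0.1, base_cache=1.0, alpha=0.5):
    """Assumed power-law miss rate per instruction.

    sharing_factor < 1.0 stands in for a reduction in misses
    attributed to data sharing (an assumption of this sketch).
    """
    return sharing_factor * base_miss * (cache_per_core / base_cache) ** (-alpha)


def throughput(n_cores, total_area, sharing_factor=1.0,
               core_area=1.0, cpi_base=1.0, miss_penalty=100.0):
    """Aggregate instructions per cycle for a chip with n_cores cores.

    Area not used by cores is split evenly among cores as cache.
    """
    cache_area = total_area - n_cores * core_area
    if cache_area <= 0:
        return 0.0  # no room left for cache: treat as infeasible
    cache_per_core = cache_area / n_cores
    cpi = cpi_base + miss_rate(cache_per_core, sharing_factor) * miss_penalty
    return n_cores / cpi


def best_core_count(total_area, sharing_factor=1.0):
    """Exhaustively search for the core count that maximizes throughput."""
    candidates = range(1, int(total_area))
    return max(candidates,
               key=lambda n: throughput(n, total_area, sharing_factor))


if __name__ == "__main__":
    area = 100  # hypothetical chip area, in units of one core's area
    for s in (1.0, 0.5):
        n = best_core_count(area, s)
        print(f"sharing_factor={s}: cores={n}, "
              f"throughput={throughput(n, area, s):.2f}")
```

Under these assumed parameters, lowering `sharing_factor` moves the optimum toward a larger core count, which mirrors the direction of the abstract's conclusion but not its magnitude.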

Published in:

2012 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Date of Conference:

1-3 April 2012