Cart (Loading....) | Create Account
Close category search window
 

A criticality analysis of clustering in superscalar processors

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Salverda, P. ; Dept. of Comput. Sci., Illinois Univ., Urbana, IL ; Zilles, C.

Clustered machines partition hardware resources to circumvent the cycle time penalties incurred by large, monolithic structures. This partitioning introduces a long inter-cluster forwarding latency and the potential for load imbalance, both of which degrade IPC and thus counter the cycle time benefits of clustering. We show that program dataflow can be mapped to clustered machines so as to achieve an IPC rivaling that of an equivalent monolithic machine. That is, the IPC penalties observed by extant schemes are largely an artifact of instruction steering and scheduling policies. Using critical path analysis, we investigate and uncover the main causes for this performance loss. By way of code samples, we illustrate those causes and propose three policies for mitigating them. First, we introduce a new metric, likelihood of criticality, and show how it can halve the performance lost to contention-induced stalls. Second, we develop a stall-over-steer policy that addresses performance lost to inter-cluster forwarding delay. Finally, we show that a proactive load-balancing policy is necessary to improve the distribution of ready instructions among the clusters. Together, these three policies yield performance on 2-, 4- and 8-cluster implementations of an 8-wide machine that is within 2, 4, and 6%, respectively, of the monolithic equivalent

Published in:

Microarchitecture, 2005. MICRO-38. Proceedings. 38th Annual IEEE/ACM International Symposium on

Date of Conference:

16-16 Nov. 2005

Need Help?


IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2014 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.