The impact of instruction-level parallelism on multiprocessor performance and simulation methodology

Authors: V. S. Pai (Dept. of Electrical & Computer Engineering, Rice University, Houston, TX, USA); P. Ranganathan; S. V. Adve

Current microprocessors exploit high levels of instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. This paper presents the first detailed analysis of the impact of such processors on shared-memory multiprocessors using a detailed execution-driven simulator. Using this analysis, we also examine the validity of common direct-execution simulation techniques that employ previous-generation processor models to approximate ILP-based multiprocessors. We find that ILP techniques substantially reduce CPU time in multiprocessors, but are less effective in reducing memory stall time. Consequently, despite the presence of inherent latency-tolerating techniques in ILP processors, memory stall time becomes a larger component of execution time and parallel efficiencies are generally poorer in ILP-based multiprocessors than in previous-generation multiprocessors. Examining the validity of direct-execution simulators with previous-generation processor models, we find that, with appropriate approximations, such simulators can reasonably characterize the behavior of applications with poor overlap of read misses. However, they can be highly inaccurate for applications with high overlap of read misses. For our applications, the errors in execution time with these simulators range from 26% to 192% for the most commonly used model, and from -8% to 73% for the most accurate model.
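The abstract's quantitative claims rest on three simple ratios: the share of execution time spent in memory stalls, parallel efficiency (speedup divided by processor count), and the percentage error of a simulator's predicted execution time. The Python sketch below makes these ratios concrete using made-up numbers. The `ExecTime` class, the function names, and the sign convention for error (positive means the simulator overestimates) are assumptions chosen for illustration; they are not the paper's code, definitions, or data.

```python
from dataclasses import dataclass


@dataclass
class ExecTime:
    """Execution time split into CPU time and memory stall time.
    Units are arbitrary; all values used below are hypothetical."""
    cpu: float
    memory_stall: float

    @property
    def total(self) -> float:
        return self.cpu + self.memory_stall

    @property
    def stall_fraction(self) -> float:
        # Share of total execution time spent stalled on memory.
        return self.memory_stall / self.total


def parallel_efficiency(t_uniprocessor: float, t_parallel: float, processors: int) -> float:
    """Speedup over a single processor divided by the number of processors."""
    return (t_uniprocessor / t_parallel) / processors


def percent_error(simulated: float, reference: float) -> float:
    """Signed error of a simulated execution time relative to a reference
    (assumed convention: positive means the simulator overestimates)."""
    return 100.0 * (simulated - reference) / reference


# Hypothetical case: ILP cuts CPU time by 3x but memory stall time by only 1.33x.
prev_gen = ExecTime(cpu=60.0, memory_stall=40.0)
ilp = ExecTime(cpu=20.0, memory_stall=30.0)

print(f"stall fraction: {prev_gen.stall_fraction:.0%} -> {ilp.stall_fraction:.0%}")
print(f"parallel efficiency: {parallel_efficiency(800.0, 125.0, 8):.3f}")
print(f"simulator error: {percent_error(135.0, 100.0):+.0f}%")
```

With these invented inputs, total time halves, yet the memory stall share rises from 40% to 60%. This mirrors, in miniature, the qualitative effect the abstract describes.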

Published in:

Third International Symposium on High-Performance Computer Architecture (HPCA), 1997

Date of Conference:

1-5 Feb 1997