Approaching a machine-application bound in delivered performance on scientific code

4 Author(s)

A performance bounding methodology that explains the performance of loop-dominated scientific applications on particular systems is presented. The throughput of key hardware units that are common bottlenecks in concurrent machines is modeled. A workload characterization is proposed, and upper bounds on the performance of specific machine-workload pairs are derived. Comparing delivered performance with these bounds focuses attention on areas for improvement and indicates how much improvement might be attainable. A detailed analysis and performance improvement effort for the IBM RS/6000 produced an average lower bound of 1.27 clocks per floating-point operation (CPF), whereas machine peak performance is 0.5 CPF and the V2.01 Fortran compiler attains only 2.43 CPF. Code improvements in this study have achieved 1.36 CPF, increasing the harmonic mean steady-state inner loop performance to 97.6% of the MFLOPS bound. Subsequently, the V2.02 compiler achieved 1.75 CPF, and 1.60 CPF with carefully chosen preprocessing.
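As a rough illustration of how the CPF figures above relate to one another, the sketch below divides the bound by each delivered CPF to estimate what share of the bounded performance is reached. The function names and the clock rate in the usage note are hypothetical and not taken from the paper. The 97.6% figure in the abstract is stated as a harmonic mean over steady-state inner loops, so it need not match a simple ratio of these averages.

```python
# CPF figures quoted in the abstract (clocks per floating-point operation; lower is better).
BOUND_CPF = 1.27  # average bound from the analysis
DELIVERED_CPF = {
    "V2.01 compiler": 2.43,
    "improved code": 1.36,
    "V2.02 compiler": 1.75,
    "V2.02 compiler with preprocessing": 1.60,
}


def fraction_of_bound(cpf, bound_cpf=BOUND_CPF):
    """Share of the bounded performance reached at a given delivered CPF (hypothetical helper)."""
    return bound_cpf / cpf


def mflops(cpf, clock_mhz):
    """Convert CPF to MFLOPS for an assumed clock rate in MHz (hypothetical helper)."""
    return clock_mhz / cpf


if __name__ == "__main__":
    for label, cpf in DELIVERED_CPF.items():
        print(f"{label}: {fraction_of_bound(cpf):.1%} of the bound")
```

For example, with an assumed 100 MHz clock (an illustrative value, not the RS/6000's actual rate), `mflops(0.5, 100)` gives 200 MFLOPS at machine peak.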

Published in:

Proceedings of the IEEE (Volume: 81, Issue: 8)