Aggressive compiler optimization and parallelization with thread-level speculation

Authors:

Li-Ling Chen; Wu, Y. (Intel Labs, Intel Corp., Santa Clara, CA)

Abstract:

We present a technique that exploits close collaboration between the compiler and speculative multithreaded hardware to explore aggressive optimizations and parallelization for scalar programs. The compiler aggressively optimizes the frequently executed code in user programs by predicting an execution path or the values of long-latency instructions. Based on the predicted hot execution path, the compiler forms regions with greatly simplified data and control flow graphs and then performs aggressive optimizations on those regions. Thread-level speculation (TLS) helps expose program parallelism and guarantees program correctness when a prediction is incorrect. With this collaboration between the compiler and the speculative multithreaded support, program performance can be improved significantly. Preliminary results with simple trace regions show that the gain in dynamic compiler schedule cycles reaches 33% for some benchmarks and averages about 10% across the eight SpecInt95 benchmarks. For SpecInt2k, the gain is up to 23% under the conservative execution model. With a cycle-accurate simulator and the conservative execution model, the overall gains when runtime factors (e.g., cache misses and branch mispredictions) are considered are 12% for vortex and 14.7% for m88ksim. The gains can be higher with more sophisticated region formation and region-based optimizations.
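To make the idea concrete, the following is a minimal C sketch of value-prediction-based speculation of the kind the abstract describes: a region is specialized under a compile-time prediction for a long-latency operation, and a verification step recovers to the general-case code on misprediction. The function names and the inline check are hypothetical; in the paper the squash-and-recover step is performed by the speculative multithreaded (TLS) hardware, not by explicit source-level code.

/*
 * Conceptual sketch only (not the paper's actual compiler IR or TLS support):
 * the speculative region assumes a predicted value for a long-latency
 * operation; an explicit check stands in for the TLS hardware that would
 * squash and re-execute the region when the prediction is wrong.
 */
#include <stdio.h>
#include <stddef.h>

/* Hypothetical long-latency operation whose result the compiler predicts. */
static int long_latency_lookup(int key) { return key % 7; }

/* Unoptimized general-case code, used as the recovery path. */
static int general_case(const int *a, size_t n, int key)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] + long_latency_lookup(key);
    return sum;
}

static int speculative_region(const int *a, size_t n, int key)
{
    const int predicted = 0;   /* value predicted at compile time */
    int sum = 0;

    /* Speculative hot-path region: the loop body is simplified under the
     * assumption long_latency_lookup(key) == predicted, so the long-latency
     * call is removed from the loop entirely. */
    for (size_t i = 0; i < n; i++)
        sum += a[i] + predicted;

    /* Verification: on misprediction, discard the speculative result and
     * fall back to the general-case code; in the paper this recovery is
     * handled by the speculative multithreaded hardware. */
    if (long_latency_lookup(key) != predicted)
        return general_case(a, n, key);

    return sum;
}

int main(void)
{
    int data[] = {1, 2, 3, 4};
    printf("%d\n", speculative_region(data, 4, 14));  /* prediction holds  */
    printf("%d\n", speculative_region(data, 4, 3));   /* misprediction path */
    return 0;
}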

Published in:

Proceedings of the 2003 International Conference on Parallel Processing

Date of Conference:

9 Oct. 2003