Thread scheduling based on low-quality instruction prediction for simultaneous multithreaded processors

Authors:
H. Homayoun (Dept. of Electr. & Comput. Eng., Victoria Univ., BC, Canada); K. F. Li; S. Rafatirad

A simultaneous multithreaded (SMT) processor can execute instructions from multiple threads in the same cycle. SMT was introduced as a complement to superscalar architectures to increase processor throughput, and several computer manufacturers have recently introduced their first-generation SMT architectures. SMT allows multiple threads to compete simultaneously for shared resources; one example is the race for the fetch unit, the critical logic responsible for thread scheduling decisions. When more threads are available than hardware execution contexts, the choice of which threads to fetch instructions from affects the processor's efficiency. In this paper, the authors present a new approach to choosing the most useful threads among all those competing for a shared resource. Instruction quality is identified by the time an instruction spends in the instruction queue: low-quality instructions spend more time there, so threads with fewer low-quality instructions contribute more to overall processor throughput. In an experimental study, the low-quality instructions in each thread were identified with up to 92% accuracy (72% on average). The authors exploit this to increase overall throughput by giving higher fetch priority to threads with fewer low-quality instructions, achieving an average performance improvement of 11% over the traditional algorithm that schedules threads in round-robin fashion.
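The fetch policy described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the class names, the residency threshold, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the fetch-priority idea from the abstract:
# instructions that linger in the instruction queue beyond a threshold are
# treated as "low quality", and threads with fewer such instructions are
# given higher fetch priority. The threshold value here is assumed.
LOW_QUALITY_THRESHOLD = 4  # cycles in the instruction queue (assumed)

@dataclass
class ThreadContext:
    tid: int
    # queue_residency[i] = cycles instruction i has spent in the queue
    queue_residency: list = field(default_factory=list)

    def low_quality_count(self) -> int:
        return sum(1 for cycles in self.queue_residency
                   if cycles > LOW_QUALITY_THRESHOLD)

def fetch_priority(threads):
    """Order threads for fetch: fewest low-quality instructions first."""
    return sorted(threads, key=lambda t: t.low_quality_count())

def round_robin(threads, start):
    """Baseline policy: rotate the thread list, ignoring quality."""
    return threads[start:] + threads[:start]

threads = [
    ThreadContext(0, [1, 2, 9, 7]),   # two low-quality instructions
    ThreadContext(1, [1, 1, 2]),      # none
    ThreadContext(2, [6, 8, 10, 3]),  # three
]

order = [t.tid for t in fetch_priority(threads)]
print(order)  # thread 1, with no low-quality instructions, fetches first
```

Unlike the round-robin baseline, which cycles through threads regardless of their behavior, this policy steers fetch bandwidth toward threads whose instructions are leaving the queue quickly.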

Published in:

The 3rd International IEEE-NEWCAS Conference, 2005.

Date of Conference:

19-22 June 2005