By Topic

Exploiting Operand Availability for Efficient Simultaneous Multithreading

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Sharkey, J. ; Dept. of Comput. Sci., State Univ. of New York, Binghamton, NY ; Ponomarev, D.V.

We propose several schemes to improve the scalability, reduce the complexity and delays, and increase the throughput of dynamic scheduling in SMT processors. Our first design is an adaptation of the proposed instruction packing to SMT. Instruction packing opportunistically packs two instructions (possibly from different threads), each with at most one nonready source operand at the time of dispatch, into the same issue queue entry. Our second design, termed 2OP_BLOCK, takes these ideas one step further and completely avoids the dispatching of the instructions with two nonready source operands. This technique has several advantages. First, it reduces the scheduling complexity (and the associated delays) as the logic needed to support the instructions with two nonready source operands is eliminated. More surprisingly, 2OP_BLOCK simultaneously improves the performance as the same issue queue entry may be reallocated multiple times to the instructions with at most one nonready source (which usually spends fewer cycles in the queue) as opposed to hogging the entry with an instruction which enters the queue with two nonready sources. For, schedulers with the capacity to hold 64 instructions on a 4-way SMT, the 2OP_BLOCK design outperforms the traditional queue by 14 percent, on average, and at the same time results in a 10 percent reduction in the overall scheduling delay. We also present mechanisms to support speculative scheduling with 2OP_BLOCK and introduce the hybrid scheme that dynamically switches between 2OP_BLOCK and instruction packing modes depending on the workload characteristics, to achieve further performance gains

Published in:

Computers, IEEE Transactions on  (Volume:56 ,  Issue: 2 )