Simultaneous multithreading (SMT) increases processor throughput by multiplexing resources among several threads. Despite the commercial availability of SMT processors, several aspects of this resource sharing are not well understood. For example, academic SMT studies typically assume that resources are shared dynamically, while industrial designs tend to divide resources statically among threads. This study seeks to quantify the performance impact of resource partitioning policies in SMT machines, focusing on the execution portion of the pipeline. We find that for storage resources, such as the instruction queue and reorder buffer, statically allocating an equal portion to each thread provides good performance, in part by avoiding starvation. The enforced fairness provided by this partitioning obviates sophisticated fetch policies to a large extent. SMT's potential ability to allocate storage resources dynamically across threads does not appear to be of significant benefit. In contrast, static division of issue bandwidth has a negative impact on throughput. SMT's ability to multiplex bursty execution streams dynamically onto shared function units contributes to its overall throughput. Finally, we apply these insights to SMT support in clustered architectures. Assigning threads to separate clusters eliminates intercluster communication; however, in some circumstances, the resulting partitioning of issue bandwidth cancels out the performance benefit of eliminating communication.
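The contrast between static and dynamic division of issue bandwidth can be illustrated with a toy model. The sketch below is not the simulator used in this study. It assumes a hypothetical two-thread machine with a 4-wide issue stage and an invented per-cycle demand trace, and it ignores carry-over of unissued instructions to later cycles. The function names `issued_static` and `issued_dynamic`, the trace `demands`, and the constant `WIDTH` are illustrative assumptions only.

```python
def issued_static(demands, width):
    """Count issued instructions when each thread owns an equal, fixed share of slots."""
    share = width // len(demands[0])  # slots reserved per thread
    return sum(min(d, share) for cycle in demands for d in cycle)


def issued_dynamic(demands, width):
    """Count issued instructions when all threads compete for one shared pool of slots."""
    return sum(min(sum(cycle), width) for cycle in demands)


# Hypothetical bursty demand: ready instructions per cycle for (thread 0, thread 1).
demands = [(4, 0), (0, 4), (3, 1), (1, 3)]
WIDTH = 4  # assumed total issue width

print(issued_static(demands, WIDTH))   # 10
print(issued_dynamic(demands, WIDTH))  # 16
```

In this invented trace the threads' bursts rarely coincide, so the shared pool fills every slot, while the static split leaves slots idle whenever one thread has more ready instructions than its share. Real workloads would require a cycle-accurate simulator to quantify the effect.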