Performance and power act as opposing constraints on the optimal pipeline depth of a processor. Although a deeper pipeline may improve performance, the higher clock frequency it enables also increases power dissipation. Previous papers have shown that, considering both power and performance, the optimal pipeline depth for superscalar processors is 18 to 20 fan-out-of-four (FO4) inverter delays per stage. As simultaneous multithreading (SMT) becomes increasingly important for modern high-end processors, the optimal power-performance pipeline depth for SMT needs to be quantified. Previous work has shown that SMT retains the performance-optimal pipeline depth in near-future technologies, but that result does not account for power. Because changing the pipeline depth affects power and performance in different, interacting ways, the scaling trends for the optimal SMT pipeline depth under both constraints are difficult to predict. Using simulations, we quantify the optimal SMT pipeline depths based on the well-known power-performance metric PD³. Our analysis is novel and yields the following key results about the scaling trends for SMT pipelines considering both power and performance: 1) SMT has a deeper PD³-optimal pipeline than a superscalar processor. 2) The PD³-optimal SMT pipeline depth increases with the number of simultaneously running programs. 3) For a given number of programs, the PD³-optimal SMT pipeline becomes shallower as technology scales. Based on these results, we offer the following insights into SMT designs for future technologies: 1) To retain the PD³-optimal pipeline depth across technology generations while remaining energy-efficient, the number of programs running on an SMT processor must increase. 2) To maintain constant power dissipation across technology generations, SMT pipelines must become shallower.
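The PD³ metric can be made concrete with a small sketch. It assumes that P is average power and that D is the time per instruction (the inverse of throughput), so PD³ = P·D³ = E·D², where E = P·D is the energy per instruction; lower is better. The `DesignPoint` class, the candidate FO4 depths, and all power and delay numbers below are hypothetical, chosen only to show how a PD³-optimal depth is selected from a set of simulated design points. They are not results from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """One candidate pipeline configuration (hypothetical, illustrative values)."""
    fo4_per_stage: int      # logic depth per stage; fewer FO4 per stage = deeper pipeline
    power_watts: float      # P: average power dissipation
    delay_per_instr: float  # D: time per instruction (assumed), i.e. 1 / throughput


def energy_per_instr(point: DesignPoint) -> float:
    """E = P * D: energy spent per instruction."""
    return point.power_watts * point.delay_per_instr


def pd3(point: DesignPoint) -> float:
    """PD^3 = P * D^3, which equals E * D^2; lower is better."""
    return point.power_watts * point.delay_per_instr ** 3


def pd3_optimal(points: list[DesignPoint]) -> DesignPoint:
    """Return the design point that minimizes PD^3."""
    return min(points, key=pd3)


# Hypothetical sweep: deeper pipelines (fewer FO4/stage) run faster but burn more power.
points = [
    DesignPoint(fo4_per_stage=12, power_watts=60.0, delay_per_instr=0.30),
    DesignPoint(fo4_per_stage=16, power_watts=45.0, delay_per_instr=0.32),
    DesignPoint(fo4_per_stage=20, power_watts=35.0, delay_per_instr=0.36),
    DesignPoint(fo4_per_stage=24, power_watts=30.0, delay_per_instr=0.42),
]

best = pd3_optimal(points)
print(f"PD^3-optimal depth: {best.fo4_per_stage} FO4/stage")  # → PD^3-optimal depth: 16 FO4/stage
```

Because D is cubed, PD³ weights performance more heavily than power: with these definitions, a 10% reduction in D offsets roughly a 37% increase in P (1/0.9³ ≈ 1.37). This is why a PD³-optimal design can accept some of the extra power that comes with a deeper pipeline.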