Process variability from a range of sources grows as technology scales below 65 nm, increasing variations in transistor delay and leakage current both within a die and across dies. This, in turn, degrades both the maximum operating frequency and the total power consumption of processors. Meanwhile, manufacturers have integrated more cores on a single die to improve the throughput of processors running highly parallel workloads. However, many existing workloads lack enough parallelism to exploit all the cores in a processor. In this paper, we first maximize the throughput of power- and thermal-constrained multicore processors using per-core power gating and dynamic voltage/frequency scaling. When a workload does not have enough parallelism to use all cores effectively, we turn off some cores using the per-core power gates already available in commercial multicore processors. This frees power and thermal headroom and allows the active cores to run faster through voltage/frequency scaling within power, thermal, and voltage-scaling limits. Our analysis using a 32 nm predictive technology model shows that jointly optimizing the number of active cores and the maximum operating frequency can improve the throughput of a 16-core processor running workloads with limited parallelism by up to 14%. Second, we extend this throughput analysis and optimization to account for within-die spatial process variations, which cause considerable core-to-core variations in frequency and leakage power in multicore processors. Our analysis shows that exploiting core-to-core frequency variations can improve the throughput of a 16-core processor by up to 57%.
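The joint optimization of active-core count and frequency can be illustrated with a toy exhaustive search. This is a minimal sketch under assumptions that do not come from the paper: throughput follows Amdahl's law with parallel fraction `p`, voltage is assumed to scale linearly with normalized frequency up to a hypothetical `v_max`, power gating is ideal (gated cores draw zero power), a single chip-level power cap stands in for both the power and thermal limits, and all parameters are made-up normalized values. The names `CoreModel` and `best_config` are hypothetical. The sketch ignores within-die process variation and does not reproduce the paper's 32 nm results.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CoreModel:
    # Hypothetical, normalized per-core parameters (not from the paper).
    v_min: float = 0.7   # voltage intercept of the assumed V-f line (V)
    v_max: float = 1.1   # voltage-scaling limit, reached at f = 1.0
    f_min: float = 0.5   # lowest normalized frequency searched
    c_eff: float = 1.0   # effective switched capacitance
    i_leak: float = 0.3  # leakage current per active core

    def voltage(self, f: float) -> float:
        # Assumed linear V-f relation: v_max at the top frequency f = 1.0.
        return self.v_min + (self.v_max - self.v_min) * f

    def power(self, f: float) -> float:
        # Dynamic power ~ C * V^2 * f, plus leakage power ~ V * I_leak.
        v = self.voltage(f)
        return self.c_eff * v * v * f + self.i_leak * v


def amdahl_speedup(n: int, p: float) -> float:
    # Speedup on n cores for a workload with parallel fraction p.
    return 1.0 / ((1.0 - p) + p / n)


def best_config(p: float, n_max: int, budget: float, core: CoreModel,
                steps: int = 50, fixed_n: Optional[int] = None
                ) -> Optional[Tuple[int, float, float]]:
    """Search over (active cores, shared frequency) under a power cap.

    Returns (n_active, f, throughput), or None if nothing fits."""
    best = None
    candidates = [fixed_n] if fixed_n is not None else range(1, n_max + 1)
    for n in candidates:
        for i in range(steps + 1):
            f = core.f_min + (1.0 - core.f_min) * i / steps
            if n * core.power(f) > budget:
                continue  # violates the chip power (and thermal proxy) cap
            throughput = f * amdahl_speedup(n, p)
            if best is None or throughput > best[2]:
                best = (n, f, throughput)
    return best


core = CoreModel()
# Baseline: all 16 cores stay on, so the shared frequency must drop.
all_on = best_config(p=0.7, n_max=16, budget=13.0, core=core, fixed_n=16)
# Joint optimization: gate off cores to buy frequency headroom.
joint = best_config(p=0.7, n_max=16, budget=13.0, core=core)
print("all 16 cores on:", all_on)
print("joint optimum:  ", joint)
```

With these illustrative parameters and a workload that is 70% parallel, the search keeps 8 of the 16 cores active and runs them at the voltage-limited top frequency, which beats running all 16 cores at a reduced frequency. The magnitude of the gain depends entirely on the invented parameters. Modeling per-core frequency and leakage variation would require giving each core its own `CoreModel` and choosing *which* cores to keep active, not just how many.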