Skip to Main Content
Much of the complexity in today's superscalar microprocessors stems from the need to maintain the speculatively produced results within the on-chip storage components until these results can be safely discarded without endangering the reconstruction of the precise state or impeding the recovery from possible branch misspeculations. For this, modern designs use large, heavily-ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45%) of the result values are delivered to their consumers via the bypass network (consumed ldquoon-the-flyrdquo) and are never read out from the destination registers. In this paper, we first formulate conditions for identifying such transient values and describe their microarchitectural implementation; then we propose a technique to avoid the writeback of such transient values into the RF. With 64-entry integer and floating point register files, our technique achieves an 11% performance improvement and 29% reduction in the RF energy consumption compared to the baseline machine with the same number of registers. Furthermore, for the same performance target, the selective writeback scheme results in a 38% reduction in the energy consumption of the RF compared to the baseline machine.