Modern computer architectures that support variable-length instruction set architectures (ISAs), such as Intel's IA-32, distinguish between the architectural-level representation of instructions and their micro-architectural representation. At the micro-architectural level, instructions are represented by fixed-length micro-operations, termed uops, and complex instructions are broken into sequences of uops. The fetch and decode operations in such architectures are extremely complicated and power hungry, especially when they aim to handle several variable-length instructions per cycle. This paper suggests caching the uop sequences of decoded instructions in a special structure, termed the uop cache (UC), and using this fixed-length decoded format whenever possible. Doing so reduces the processor's power and energy consumption without compromising performance. We show that a moderately sized UC can eliminate about 75% of instruction decodes across a broad range of benchmarks, and over 90% in multimedia applications and high-power tests. For existing Intel P6 family processors, the eliminated work may save about 10% of full-chip power consumption. Although the proposed technique can be used to save power without degrading performance, it can also be used to improve processor performance when power is constrained.
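To make the caching idea concrete, the following is a minimal sketch in Python of a structure that stores decoded uop sequences and skips the decoder on a hit. It is an illustration only, not the paper's design: the class `UopCache`, the per-instruction indexing by address, the LRU replacement policy, and the stand-in decoder `toy_decode` are all assumptions introduced here, and the numbers it produces say nothing about the measured results quoted above.

```python
from collections import OrderedDict


class UopCache:
    """Toy uop cache: maps an instruction address to its decoded uop sequence.

    Assumptions (not from the paper): one entry per instruction, capacity
    counted in instructions, least-recently-used replacement.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> tuple of uops, oldest first
        self.hits = 0                 # fetches served without decoding
        self.decodes = 0              # fetches that needed the decoder

    def fetch(self, address, decode):
        """Return the uops for `address`, calling `decode` only on a miss."""
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            self.hits += 1
            return self.entries[address]
        uops = tuple(decode(address))          # the costly variable-length decode
        self.decodes += 1
        self.entries[address] = uops
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return uops


def toy_decode(address):
    # Hypothetical decoder: pretend every instruction expands to two uops.
    return [f"uop@{address}.0", f"uop@{address}.1"]


def eliminated_decode_fraction(cache, loop_body, iterations):
    """Run a loop through the cache and return the fraction of fetches that hit."""
    for _ in range(iterations):
        for address in loop_body:
            cache.fetch(address, toy_decode)
    return cache.hits / (cache.hits + cache.decodes)
```

With a cache large enough to hold a four-instruction loop, running the loop ten times decodes each instruction once and serves the remaining 36 of 40 fetches from the cache. This is the effect the abstract relies on: code that is executed repeatedly pays the decode cost only on its first pass.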