A 32-bit integer execution core containing a Han-Carlson arithmetic-logic unit (ALU), an 8-entry × 2 ALU instruction scheduler loop and a 32-entry × 32-bit register file is described. In a 130 nm six-metal, dual-VT CMOS technology, the 2.3 mm² prototype contains 160 K transistors. Measurements demonstrate capability for 5-GHz single-cycle integer execution at 25°C. The single-ended, leakage-tolerant dynamic scheme used in the ALU and scheduler enables up to 9-wide ORs with 23% critical path speed improvement and 40% active leakage power reduction when compared to a conventional Kogge-Stone implementation. On-chip body-bias circuits provide additional performance improvement or leakage tolerance. Stack node preconditioning improves ALU performance by 10%. At 5 GHz, ALU power is 95 mW at 0.95 V and the register file consumes 172 mW at 1.37 V. The ALU performance is scalable to 6.5 GHz at 1.1 V and to 10 GHz at 1.7 V, 25°C.
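For readers unfamiliar with the Han-Carlson adder topology named above, the following is a minimal bit-level sketch in Python of the textbook Han-Carlson parallel-prefix scheme. It is a behavioral illustration only and does not model the dynamic circuits, scheduler, or timing of the prototype. The function name `han_carlson_add`, the zero carry-in, and the power-of-two `width` are assumptions made for this example. The structure is one Brent-Kung style level, then Kogge-Stone levels on the odd bit positions only, then one final level that fills in the even positions.

```python
def han_carlson_add(a, b, width=32):
    """Behavioral sketch of a Han-Carlson prefix adder (carry-in assumed 0).

    Returns (sum, carry_out). `width` is assumed to be a power of two.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Bitwise generate and propagate signals.
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]

    def combine(hi, lo):
        # Prefix operator: (g, p) o (g', p') = (g | (p & g'), p & p')
        return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]

    # Level 1 (Brent-Kung style): each odd bit absorbs its even neighbour.
    for i in range(1, width, 2):
        G[i], P[i] = combine((G[i], P[i]), (G[i - 1], P[i - 1]))

    # Kogge-Stone levels restricted to odd positions, distances 2, 4, 8, ...
    d = 2
    while d < width:
        prev_g, prev_p = G[:], P[:]  # all cells in a level update in parallel
        for i in range(1, width, 2):
            if i - d >= 0:
                G[i], P[i] = combine((prev_g[i], prev_p[i]),
                                     (prev_g[i - d], prev_p[i - d]))
        d *= 2

    # Final level: even bits pick up the full prefix from the odd bit below.
    for i in range(2, width, 2):
        G[i], P[i] = combine((G[i], P[i]), (G[i - 1], P[i - 1]))

    # Carry into bit i is the group generate of bits [0, i-1].
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[width - 1]
```

In this textbook form the network has log2(width) + 1 prefix levels, one more than Kogge-Stone, in exchange for roughly half as many prefix cells and wires. For example, `han_carlson_add(0xFFFFFFFF, 1)` wraps to a zero sum with the carry-out set.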