Skip to Main Content
This paper presents a novel variable-latency multiplier architecture, suitable for implementation as a self-timed multiplier core or as a fully synchronous multicycle multiplier core. The architecture combines a second-order Booth algorithm with a split carry save array pipelined organization, incorporating multiple row skipping and completion-predicting carry-select dual adder. The paper reports the architecture and logic design, CMOS circuit design and performance evaluation. In 0.35 /spl mu/m CMOS, the expected sustainable cycle time for a 32-bit synchronous implementation is 2.25 ns. Instruction level simulations estimate 54% single-cycle and 46% two-cycle operations in SPEC95 execution. Using the same CMOS process, the 32-bit asynchronous implementation is expected to reach an average 1.76 ns throughput and 3.48 ns latency in SPEC95 execution.