This paper introduces an asynchronous radix-4 Booth multiplier architecture that scales to arbitrary operand lengths while maintaining a constant cycle time per Booth iteration. It has several novel features: (i) a counterflow organization, in which the data bits flow in one direction and the Booth commands piggyback on the acknowledgments flowing in the opposite direction; (ii) overlapped execution of multiple iterations of the Booth algorithm; and (iii) design modularity and bit-level pipelining, which allow the multiplier to be scaled to arbitrary operand widths without gate resizing or cycle-time overhead. SPICE simulations in a 0.18 μm TSMC CMOS process at 1.8 V indicate promising performance: the multiplier takes 640 to 650 ps per Booth iteration regardless of operand width, which demonstrates the scalability of the approach. For 16-bit operands, this performance corresponds to a throughput of nearly 200 Mops/s. Furthermore, the multiplier remains fully functional at reduced supply voltages (e.g., 1.5 V and 1.0 V) and can therefore trade off performance for energy efficiency dynamically.
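For readers unfamiliar with the algorithm, the following is a minimal software sketch of radix-4 Booth multiplication. It is purely behavioral: it does not model the asynchronous counterflow pipeline or the overlapped iterations described above. The function names, the 645 ps midpoint, and the assumption that one multiplication occupies width/2 Booth iterations are illustrative choices, not details taken from the paper.

```python
# Behavioral sketch of radix-4 Booth multiplication (not the paper's circuit).

# Booth digit selected by the overlapping bit triple (y[2i+1], y[2i], y[2i-1]).
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Recode a signed `width`-bit multiplier into width/2 digits in {-2..2}."""
    if width <= 0 or width % 2 != 0:
        raise ValueError("width must be a positive even number")
    mask = (1 << width) - 1
    # Take the two's-complement bit pattern and append the implicit 0 below the LSB.
    bits = (multiplier & mask) << 1
    return [BOOTH_TABLE[(bits >> (2 * i)) & 0b111] for i in range(width // 2)]


def booth_multiply(multiplicand: int, multiplier: int, width: int = 16) -> int:
    """Multiply two signed integers using one partial product per Booth digit."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, width)):
        # Each iteration adds 0, +/-1x or +/-2x the multiplicand, weighted by 4**i.
        product += (digit * multiplicand) << (2 * i)
    return product


def estimated_throughput(width: int, ps_per_iteration: float) -> float:
    """Multiplications per second, ASSUMING width/2 iterations per multiplication."""
    iterations = width // 2
    return 1e12 / (iterations * ps_per_iteration)


if __name__ == "__main__":
    print(booth_radix4_digits(13, 16))
    print(booth_multiply(-7, 13))
    # 645 ps is an assumed midpoint of the reported 640 to 650 ps range.
    print(f"{estimated_throughput(16, 645.0) / 1e6:.1f} Mops")
```

Under these assumptions, a 16-bit multiplication needs eight Booth iterations, and eight iterations at roughly 645 ps each give a rate a little under 200 million multiplications per second, in line with the throughput quoted above.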