We propose optimized pipelined implementations of Goldschmidt's division algorithm with IEEE rounding, based on Booth radix-8 multiplication. Compared to other FP-division algorithms, our implementations require fewer clock cycles and admit shorter clock periods. The optimizations for the quotient approximation rest on a careful general analysis of tight error bounds for the implementation, and are accompanied by the use of redundant representations, partial compressions, injection-based rounding, and rectangular multipliers for the internal computations. To achieve IEEE-compliant rounding efficiently, we introduce the concept of dew-point rounding, which allows an efficient implementation and relaxes the requirements on the quotient approximation. On this basis, we propose implementations of several versions of Goldschmidt's division algorithm with different pipeline depths. None of these implementations requires a full-sized multiplier at any stage of the computation. In this way, we reduce latency and cost, and we enable increased throughput at a reasonable cost. We suggest a full range of pipelining depths: at one extreme is a 3-stage pipeline with a restart time that simply equals the latency minus the number of pipeline stages; at the other extreme is a fully pipelined design.
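For readers unfamiliar with the underlying iteration, the following is a minimal software sketch of textbook Goldschmidt division, not the pipelined hardware design proposed here. It uses exact rational arithmetic, so it models none of the Booth radix-8 multipliers, redundant representations, truncation errors, or rounding schemes discussed above. The function name, the `seed_bits` parameter, and the way the initial reciprocal approximation is formed are assumptions made for illustration only.

```python
from fractions import Fraction


def goldschmidt_divide(a, b, iterations=4, seed_bits=8):
    """Approximate a / b for a divisor b in [1, 2) by Goldschmidt's iteration.

    Hypothetical helper for illustration; exact rationals stand in for the
    finite-precision multipliers a hardware implementation would use.
    """
    a, b = Fraction(a), Fraction(b)
    if not (1 <= b < 2):
        raise ValueError("divisor must be normalized to [1, 2)")

    # Assumed seed: 1/b rounded to seed_bits fractional bits, standing in
    # for a lookup-table approximation of the reciprocal.
    scale = 1 << seed_bits
    seed = Fraction(round(scale / b), scale)

    n = a * seed  # numerator sequence, converges to a / b
    d = b * seed  # denominator sequence, converges to 1

    for _ in range(iterations):
        f = 2 - d  # correction factor; if d = 1 - e then f = 1 + e
        n *= f     # n / d is unchanged by the step ...
        d *= f     # ... while d becomes 1 - e**2 (quadratic convergence)
    return n


if __name__ == "__main__":
    q = goldschmidt_divide(Fraction(3, 2), Fraction(5, 4))
    print(float(q))
```

Because both sequences are multiplied by the same factor, the ratio `n / d` stays equal to `a / b` throughout, and the error in `d` is squared at every step. In a finite-precision implementation each multiplication also introduces truncation error, which is what an error analysis of the kind described above has to bound.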