Abstract:
We present techniques for accelerating the floating-point computation of x/y when y is known before x. The proposed algorithms target architectures that provide fused multiply-accumulate (fused-mac) operations. The goal is to obtain exactly the same result as ordinary division with rounding to nearest. It is known that computing 1/y in advance allows correctly rounded division to be performed in one multiplication plus two fused-macs. We show algorithms that reduce this latency to one multiplication and one fused-mac. This is achieved if a precision of at least n+1 bits is available, where n is the number of mantissa bits in the target format, or if y satisfies some properties that can be easily checked at compile time. These algorithms require a double-word approximation of 1/y (we also show how to get it). Compilers can use these techniques to accelerate some numerical programs without loss of accuracy.
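The two operation sequences mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names (`fma`, `precompute_reciprocal`, `divide_two_fma`, `divide_one_fma`) are hypothetical, the fused-mac is emulated with exact rational arithmetic instead of a hardware instruction, and the way the low word `zl` is derived here is an assumption. The sketch also does not check the conditions (n+1 bits of precision, or the properties of y) under which the shorter sequence is stated to be correctly rounded.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulated fused-mac: a*b + c computed exactly, then rounded once."""
    # Fraction(float) is exact, and float(Fraction) rounds to nearest.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def precompute_reciprocal(y):
    """Return (zh, zl) with zh + zl approximating 1/y (assumed construction)."""
    zh = 1.0 / y                 # high word: 1/y rounded to nearest
    rho = fma(-y, zh, 1.0)       # residual 1 - y*zh, with a single rounding
    zl = rho / y                 # low word: correction term
    return zh, zl


def divide_two_fma(x, y, zh):
    """Known scheme: one multiplication plus two fused-macs."""
    q0 = x * zh                  # first estimate of x/y
    r = fma(-q0, y, x)           # remainder x - q0*y
    return fma(r, zh, q0)        # corrected quotient


def divide_one_fma(x, zh, zl):
    """Shorter scheme: one multiplication plus one fused-mac."""
    q1 = x * zl                  # contribution of the low word
    return fma(x, zh, q1)        # x*zh + q1, rounded once


if __name__ == "__main__":
    y = 3.0                      # divisor known in advance
    zh, zl = precompute_reciprocal(y)
    for x in (1.0, 2.5, 7.0, 1e10):
        print(x, x / y, divide_two_fma(x, y, zh), divide_one_fma(x, zh, zl))
```

In a compiler setting, `precompute_reciprocal` would correspond to work done once when y is known, while only the body of `divide_one_fma` would remain on the critical path for each x.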
Published in: IEEE Transactions on Computers (Volume: 53, Issue: 8, August 2004)
DOI: 10.1109/TC.2004.37