
Accurate Parallel Floating-Point Accumulation

Open Access

Abstract:

Using parallel associative reduction, iterative refinement, and conservative early termination detection, we show how to use tree-reduce parallelism to compute correctly rounded floating-point sums in O(log N) depth. Our parallel solution shows how we can continue to exploit the scaling in transistor count to accelerate floating-point performance even when clock rates remain flat. Empirical evidence suggests our iterative algorithm only requires two tree-reduce passes to converge to the accurate sum in virtually all cases. Furthermore, we develop the hardware implementation of two residue-preserving IEEE-754 double-precision floating-point adders on a Virtex 6 FPGA that run at the same 250 MHz pipeline speed as a standard adder. One adder creates the residue by truncation, requires only 22 percent more area than the standard adder, and allows us to support directed-rounding modes and to lower the cost of round-to-nearest modes. The second adder creates the residue while directly producing a round-to-nearest sum at 48 percent more area than a standard adder.
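
The core ideas named in the abstract (residue-preserving addition, tree reduction, and iterative refinement with a conservative termination test) can be illustrated with a short software sketch. The Python code below is not the authors' hardware design: it models a residue-preserving adder with the standard TwoSum error-free transformation and repeats tree-reduce passes until a pass produces no nonzero residue. The names two_sum, tree_reduce_pass, and accurate_sum are illustrative only.

def two_sum(a, b):
    # Error-free transformation (Knuth's TwoSum): s = fl(a + b) and a + b == s + r exactly.
    s = a + b
    bb = s - a
    r = (a - (s - bb)) + (b - bb)
    return s, r

def tree_reduce_pass(values):
    # One tree-reduce pass with a residue-preserving adder: each pairwise add
    # yields a rounded sum for the next level and a residue that is kept aside.
    residues = []
    while len(values) > 1:
        next_level = []
        for i in range(0, len(values) - 1, 2):
            s, r = two_sum(values[i], values[i + 1])
            next_level.append(s)
            if r != 0.0:
                residues.append(r)
        if len(values) % 2 == 1:        # odd element passes through unchanged
            next_level.append(values[-1])
        values = next_level
    return values[0], residues

def accurate_sum(values):
    # Iterative refinement: re-reduce the current sum together with all residues
    # until a pass generates no nonzero residue (a conservative termination check).
    total, residues = tree_reduce_pass(list(values))
    while residues:
        total, residues = tree_reduce_pass([total] + residues)
    return total

# Example: a cancellation-heavy input where naive left-to-right summation loses a term.
data = [1e16, 1.0, -1e16, 1.0]
print(sum(data), accurate_sum(data))    # naive: 1.0, refined: 2.0

In this sketch each tree-reduce pass has O(log N) depth, and, consistent with the empirical observation in the abstract, inputs like the one above settle after the second pass; the hardware adders described in the paper produce the residue directly in the datapath rather than via the software TwoSum used here.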
Published in: IEEE Transactions on Computers (Volume: 65, Issue: 11, 01 November 2016)
Page(s): 3224 - 3238
Date of Publication: 25 February 2016

