Field-programmable gate arrays (FPGAs) have become an attractive option for scientific applications. However, because FPGA-based floating-point units are pipelined, data hazards may occur during the reduction of a series of values. A typical example of reduction is the accumulation of sets of floating-point values, which is needed in many scientific operations such as dot product and matrix-vector multiplication. Reduction circuits can significantly degrade overall performance, impose unrealistic buffer requirements, or occupy a large area on the FPGA. In this paper, we introduce a high-performance and area-efficient FPGA-based reduction circuit. It can reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline. In contrast to previous work, the proposed circuit uses a single floating-point adder and can handle input sets of arbitrary size. The buffer size needed by the circuit is independent of both the size of the individual sets and the number of input sets. Using a Xilinx Virtex-II Pro FPGA as the target device, we implement the proposed reduction circuit and present performance and area results.
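The data hazard mentioned above can be illustrated with a small software model. The sketch below is not the circuit proposed in the paper. It assumes a hypothetical adder whose result appears `latency` cycles after its operands are issued, and it compares a naive accumulator, which must wait for each sum before issuing the next addition, with the common workaround of keeping `latency` interleaved partial sums. The function names and the cycle-counting model are illustrative assumptions.

```python
def accumulate_with_stalls(values, latency):
    """Naive accumulation: every add depends on the previous sum,
    so the next add cannot issue until that sum leaves the pipeline."""
    total = 0.0
    cycles = 0
    for v in values:
        total += v
        cycles += latency  # wait `latency` cycles for this result
    return total, cycles


def accumulate_interleaved(values, latency):
    """Workaround (assumed model): keep `latency` independent partial sums.
    Add number i uses slot i % latency, which was last written by add
    i - latency, so its result is ready and one add can issue per cycle."""
    partial = [0.0] * latency
    for i, v in enumerate(values):
        partial[i % latency] += v
    n = len(values)
    # The last add issues at cycle n - 1 and completes `latency` cycles later.
    cycles = n - 1 + latency if n else 0
    return partial, cycles


if __name__ == "__main__":
    data = [float(i) for i in range(1, 17)]  # 16 exactly representable values
    total, stalled = accumulate_with_stalls(data, latency=4)
    partial, pipelined = accumulate_interleaved(data, latency=4)
    print(total, stalled)           # 136.0 64
    print(sum(partial), pipelined)  # 136.0 19
```

In this model the interleaved version avoids stalls but leaves `latency` partial sums that still have to be combined, and it needs separate storage for each input set that is in flight. Those leftover partial sums are the kind of buffering and hazard problem a reduction circuit has to resolve. Because floating-point addition is not associative, the two orderings can also give slightly different results for general inputs.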