Skip to Main Content
Numerical linear algebra operations are key primitives in scientific computing. Performance optimizations of such operations have been extensively investigated. With the rapid advances in technology, hardware acceleration of linear algebra applications using FPGAs (field programmable gate arrays) has become feasible. In this paper, we propose FPGA-based designs for several basic linear algebra operations, including dot product, matrix-vector multiplication, matrix multiplication and matrix factorization. By identifying the parameters for each operation, we analyze the trade-offs and propose a high-performance design. In the implementations of the designs, the values of the parameters are determined according to the hardware constraints, such as the available chip area, the size of available memory, the memory bandwidth, and the number of I/O pins. The proposed designs are implemented on Xilinx Virtex-II Pro FPGAs. Experimental results show that our designs scale with the available hardware resources. Also, the performance of our designs compares favorably with that of general-purpose processor based designs. We also show that with faster floating-point units and larger devices, the performance of our designs increases accordingly.