Recently, several state-of-the-art high-end platforms have incorporated FPGAs for application acceleration. This talk explores optimizations for accelerating linear algebra computations on such systems. We develop algorithmic optimizations and demonstrate the suitability of FPGAs for floating-point-intensive computations. We discuss the design of a BLAS library for these platforms and develop a highly optimized reduction circuit for these architectures. Using the reduction circuit, we demonstrate superior performance for sparse matrix computations. We also compare the performance of FPGAs against that of state-of-the-art embedded processors, general-purpose processors, and DSPs for floating-point-intensive applications.
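To make the role of the reduction circuit concrete, the sketch below shows a software analogue of the row-wise accumulation in sparse matrix-vector multiply (SpMV). The CSR data here is illustrative and not from the talk; on the FPGA, the per-row summation in the inner loop is the floating-point reduction that the pipelined circuit performs in hardware.

```python
# Sparse matrix in CSR form (illustrative example data):
values  = [10.0, 20.0, 30.0, 40.0, 50.0]   # nonzero entries
col_idx = [0, 2, 1, 0, 2]                  # column index of each nonzero
row_ptr = [0, 2, 3, 5]                     # start offset of each row in values

x = [1.0, 2.0, 3.0]                        # dense input vector
y = []
for r in range(len(row_ptr) - 1):
    # Reduce the partial products of row r into a single sum; this
    # serial dependence chain through the floating-point adder is what
    # a hardware reduction circuit is designed to pipeline away.
    acc = 0.0
    for k in range(row_ptr[r], row_ptr[r + 1]):
        acc += values[k] * x[col_idx[k]]
    y.append(acc)
```

Because a pipelined floating-point adder has multi-cycle latency, the naive dependence chain above would stall the pipeline on every addition; the talk's reduction circuit accepts one partial product per cycle and resolves the per-row sums internally.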