FPGA technology constitutes an attractive platform for high-performance accelerators of parallel workloads in general-purpose computers. Matrix multiplication is a computationally intensive application that is highly parallelizable. Previous work has typically described custom floating-point components and reported on specific designs or implementations using these components for FPGA-based matrix multiplication. We seek to utilize vendor-supplied or other available floating-point components to explore the system-architecture design space for flexible, high-performance, FPGA-based accelerators. In this paper, we focus on the design of control logic that accommodates the configuration of as many implementation aspects as possible (e.g., scheduling of operations, levels of parallelism, and choice of arithmetic operators) for inclusion in an experimental infrastructure to assess the effects of these parameters on overall system performance.