Skip to Main Content
We consider 64-bit floating-point matrix multiplication in the context of polymorphic processor architectures. Our proposal provides a complete and performance efficient solution of the matrix multiplication problem, including hardware design and software interface. We adopt previous ideas1, originally proposed for loosely coupled processors and message passing communications. We employ these ideas into a tightly coupled custom computing unit (CCU) in the Molen polymorphic processor. Furthermore, we introduce a controller, which facilitates the efficient operation of the multiplier processing elements (PEs) in a polymorphic environment. The design is evaluated theoretically and through real hardware experiments. More precisely, we fit 9 processing elements in an XC2VP30-6 device; this configuration suggests theoretical peak performance of 1.80 GFLOPS. In practice, we measured sustained performance of up to 1.79 GFLOPS for the matrix multiplication on real hardware, including the software overhead. Theoretical analysis and experimental results suggest that the design efficiency scales better for large problem sizes.