Scalable and modular algorithms for floating-point matrix multiplication on FPGAs | IEEE Conference Publication | IEEE Xplore