High performance and memory efficient implementation of matrix multiplication on FPGAs | IEEE Conference Publication | IEEE Xplore