Design of a Coarse-Grained Processing Element for Matrix Multiplication on FPGA | IEEE Conference Publication | IEEE Xplore