Optimizing Hardware Accelerated General Matrix-Matrix Multiplication for CNNs on FPGAs | IEEE Journals & Magazine | IEEE Xplore