Optimized Deep Learning Architectures with Fast Matrix Operation Kernels on Parallel Platform | IEEE Conference Publication | IEEE Xplore