Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU | IEEE Conference Publication | IEEE Xplore