Matrix multiplication beyond auto-tuning: Rewrite-based GPU code generation (IEEE Conference Publication)