Tensile: Auto-Tuning GEMM GPU Assembly for All Problem Sizes | IEEE Conference Publication | IEEE Xplore