Economical Two-fold Working Precision Matrix Multiplication on Consumer-Level CUDA GPUs | IEEE Conference Publication