Autotuning batch Cholesky factorization in CUDA with interleaved layout of matrices | IEEE Conference Publication | IEEE Xplore