A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations | IEEE Journals & Magazine | IEEE Xplore