Uncut-GEMMs: Communication-Aware Matrix Multiplication on Multi-GPU Nodes | IEEE Conference Publication | IEEE Xplore