Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication | IEEE Conference Publication | IEEE Xplore