Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc* (IEEE Conference Publication)