Optimizing MPI communication within large multicore nodes with kernel assistance | IEEE Conference Publication | IEEE Xplore