Optimizing MPI collectives on Intel MIC through effective use of cache | IEEE Conference Publication | IEEE Xplore