Algorithmic strategies for optimizing the parallel reduction primitive in CUDA | IEEE Conference Publication | IEEE Xplore