CudaDMA: Optimizing GPU memory bandwidth via warp specialization | IEEE Conference Publication | IEEE Xplore