Compiling SIMT Programs on Multi- and Many-Core Processors with Wide Vector Units: A Case Study with CUDA | IEEE Conference Publication | IEEE Xplore