Enabling coordinated register allocation and thread-level parallelism optimization for GPUs | IEEE Conference Publication | IEEE Xplore