Neither more nor less: Optimizing thread-level parallelism for GPGPUs | IEEE Conference Publication | IEEE Xplore