Many-thread aware instruction-level parallelism: Architecting shader cores for GPU computing | IEEE Conference Publication | IEEE Xplore