CTA-Aware Prefetching and Scheduling for GPU | IEEE Conference Publication | IEEE Xplore