Optimal loop scheduling for hiding memory latency based on two-level partitioning and prefetching