Data prefetching is a well-known approach to reducing memory latency and improving performance, and it has been explored in many applications. Chip multiprocessors (CMPs) now present new opportunities for data prefetching. However, for pointer-chasing applications with irregular memory access patterns, prefetching tends to achieve little overall performance gain. In this paper, we compare and analyze the performance of a conventional prefetching thread and of prefetch instructions, using an example and six selected benchmarks from the Olden benchmark suite. The experimental results show that prefetch instructions achieve better performance in most cases. In addition, we observe that the prefetching thread generally eliminates more L2 read misses than prefetch instructions do.
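To make the term "prefetch instruction" concrete, the following is a minimal sketch of a pointer-chasing loop with and without a software prefetch. It is not taken from the paper: the `node` type and both function names are hypothetical, and `__builtin_prefetch` is the GCC/Clang builtin, assumed here as one common way to emit a prefetch instruction from C.

```c
#include <stddef.h>

/* Hypothetical list node, for illustration only. */
struct node {
    long value;
    struct node *next;
};

/* Baseline pointer-chasing traversal: the address of each node is
 * known only after the previous node has been loaded. */
long sum_list(const struct node *n)
{
    long sum = 0;
    while (n != NULL) {
        sum += n->value;
        n = n->next;
    }
    return sum;
}

/* Same traversal, but a prefetch for the next node is issued before
 * the current node's work is done. Arguments: address, 0 = read,
 * 1 = low temporal locality. The builtin does not fault on an
 * invalid address, so prefetching a NULL next pointer is harmless. */
long sum_list_prefetch(const struct node *n)
{
    long sum = 0;
    while (n != NULL) {
        __builtin_prefetch(n->next, 0, 1);
        sum += n->value;
        n = n->next;
    }
    return sum;
}
```

Both functions return the same sum; only the memory behavior differs. With so little work per node, the prefetch has almost no latency to hide, which is consistent with the small gains the abstract reports for pointer-chasing code.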
Published in:
Information Science and Engineering (ICISE), 2009 1st International Conference on
Date of Conference: 26-28 Dec. 2009