Chip multiprocessors (CMPs) present new opportunities for data prefetching. The prefetching thread is a well-known approach to reducing memory latency and improving performance, and it has been explored in a variety of applications. However, for applications that use linked data structures (LDS), a prefetching thread tends to yield little overall performance gain. In this paper, we analyze the performance of a conventional prefetching thread using an illustrative example and five benchmarks selected from the Olden benchmark suite. The experimental results show that the prefetching thread performs best when the computation/access latency ratio is close to 1. In addition, we propose a theorem, give its proof, and verify it against our experimental results.
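To give some intuition for why a computation/access latency ratio near 1 might be favorable, the following is a minimal sketch of an idealized overlap model. It is not the theorem or the analysis from the paper, which the abstract does not state. The function name `ideal_speedup` and the assumptions behind it are hypothetical: every node costs a fixed computation time and a fixed access latency, the prefetching thread overlaps the two perfectly, and synchronization and cache effects are ignored.

```python
def ideal_speedup(compute: float, access: float) -> float:
    """Speedup of an idealized prefetching thread over no prefetching.

    Assumed model (not taken from the paper): without prefetching each
    node costs compute + access; with a perfectly overlapped prefetching
    thread each node costs max(compute, access).
    """
    if compute <= 0 or access <= 0:
        raise ValueError("latencies must be positive")
    baseline = compute + access        # per-node cost, no overlap
    overlapped = max(compute, access)  # per-node cost, perfect overlap
    return baseline / overlapped


if __name__ == "__main__":
    # Sweep the computation/access latency ratio with access fixed at 1.
    for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(f"ratio={ratio:<4} speedup={ideal_speedup(ratio, 1.0):.2f}")
```

Under these assumptions the speedup peaks at 2 when the two latencies are equal and falls off on either side: when computation dominates there is little latency left to hide, and when access dominates the main thread still waits on memory. Real LDS workloads would fall short of this bound.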