By Topic

Bundling: reducing the overhead of multiprocessor prefetchers

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
D. Wallin ; Dept. of Inf. Technol., Uppsala Univ., Sweden ; E. Hagersten

Summary form only given. Prefetching has proven to be a useful technique for reducing cache misses in multiprocessors at the cost of increased coherence traffic. This is especially trouble some for snoop-based systems, where the available coherence bandwidth often is the scalability bottleneck. The bundling technique reduces the overhead caused by prefetching in two ways: piggybacking prefetches with normal requests, and requiring only one device to perform the snoop lookup for each prefetch transaction. This can reduce both the address bandwidth and the number of snoop lookups compared with a nonprefetching system. We describe bundling implementations for two important transaction types: reads and upgrades. While bundling could reduce the overhead of most existing prefetch schemes, the evaluation of bundling performed has been limited to two of them: sequential prefetching and Dahlgren's adaptive sequential prefetching. Both schemes have their snoop bandwidth halved for all commercial and scientific benchmarks in the study. The combined effect of bundling applied to these prefetch schemes lowers the cache miss rate, the address bandwidth and the snoop bandwidth, compared with a system with no prefetching, for all applications. Bundling, will not reduce the data bandwidth introduced by a prefetch scheme. However, we argue that the data bandwidth is more easily scaled than the snoop bandwidth for snoop-based coherence systems.

Published in:

Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International

Date of Conference:

26-30 April 2004