Designing a modern memory hierarchy with hardware prefetching
Wei-Fen Lin; Reinhardt, S.K.; Burger, D.
IEEE Transactions on Computers
Volume 50, Issue 11, Nov 2001, Page(s): 1202-1218
Digital Object Identifier 10.1109/12.966495
Summary: In this paper, we address the severe performance gap caused by
high processor clock rates and slow DRAM accesses. We show that, even
with an aggressive, next-generation memory system using four Direct
Rambus channels and an integrated one-megabyte level-two cache, a
processor still spends over half its time stalling for L2 misses. Our
experimental analysis begins with an effort to tune our baseline memory
system aggressively: incorporating optimizations to reduce DRAM row
buffer misses, reordering miss accesses to reduce queuing delay, and
adjusting the L2 block size to match each channel organization. We show
that there is a large gap between the block sizes at which performance
is best and at which miss rate is minimized. Using those results, we
evaluate a hardware prefetch unit integrated with the L2 cache and
memory controllers. By issuing prefetches only when the Rambus channels
are idle, prioritizing them to maximize DRAM row buffer hits, and giving
them low replacement priority, we achieve a 65 percent speedup across 10
of the 26 SPEC2000 benchmarks, without degrading the performance of the
others. With eight Rambus channels, these 10 benchmarks improve to
within 10 percent of the performance of a perfect L2 cache.
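The three prefetch policies named in the abstract can be illustrated with a small, hypothetical model: (1) issue a prefetch on a channel only when that channel has no pending demand misses, (2) among eligible candidates, prefer one that hits the currently open DRAM row, and (3) insert prefetched blocks at the LRU position of a cache set so they are evicted first unless referenced. This is a sketch under stated assumptions, not the authors' implementation. The address-mapping constants and every name below (`decode`, `Channel`, `pick_prefetch`, `LRUSet`) are illustrative inventions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical address mapping; these constants are assumptions chosen for
# illustration, not the paper's actual Direct Rambus configuration.
BLOCK_BYTES = 64
ROW_BYTES = 2048
NUM_CHANNELS = 4
BANKS_PER_CHANNEL = 4


def decode(addr: int) -> Tuple[int, int, int]:
    """Map a byte address to (channel, bank, row) under the assumed layout."""
    channel = (addr // BLOCK_BYTES) % NUM_CHANNELS  # blocks interleaved across channels
    row = addr // (ROW_BYTES * NUM_CHANNELS)
    bank = row % BANKS_PER_CHANNEL
    return channel, bank, row


@dataclass
class Channel:
    demand_queue: List[int] = field(default_factory=list)  # pending demand misses
    open_rows: Dict[int, int] = field(default_factory=dict)  # bank -> open row


def pick_prefetch(channel_id: int, channel: Channel,
                  candidates: List[int]) -> Optional[int]:
    """Choose a prefetch for this channel, or None if it should not issue one."""
    if channel.demand_queue:
        return None  # demand misses pending: the channel is not idle
    eligible = [a for a in candidates if decode(a)[0] == channel_id]
    if not eligible:
        return None
    # Prefer candidates that hit a currently open row in their bank.
    row_hits = [a for a in eligible
                if channel.open_rows.get(decode(a)[1]) == decode(a)[2]]
    return (row_hits or eligible)[0]


class LRUSet:
    """One cache set; blocks[0] is MRU and blocks[-1] is LRU."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.blocks: List[int] = []

    def access(self, block: int) -> bool:
        """Demand access: move or insert at MRU; return True on a hit."""
        if block in self.blocks:
            self.blocks.remove(block)
            self.blocks.insert(0, block)
            return True
        self.blocks.insert(0, block)
        if len(self.blocks) > self.ways:
            self.blocks.pop()  # evict the LRU block
        return False

    def insert_prefetch(self, block: int) -> None:
        """Prefetch fill: place the block at the LRU position (low priority)."""
        if block in self.blocks:
            return
        if len(self.blocks) >= self.ways:
            self.blocks.pop()  # make room by evicting the current LRU block
        self.blocks.append(block)
```

For example, with bank 0 holding row 0 open on channel 0, `pick_prefetch(0, Channel(open_rows={0: 0}), [32768, 256])` prefers address 256 (same open row) over 32768 (row conflict). A prefetched block that is never referenced is the next victim, so a useless prefetch can displace at most one resident block, which limits cache pollution.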