By Topic

A high-bandwidth memory pipeline for wide issue processors

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Sangyeun Cho ; Media IP Group, Samsung Electron. Co., Kyoung, South Korea ; Pen-Chung Yew ; Gyungho Lee

Providing adequate data bandwidth is extremely important for a future wide-issue processor to achieve its full performance potential. Adding a large number of ports to a data cache, however, becomes increasingly inefficient and can add to the hardware complexity significantly. This paper takes an alternative or complementary approach for providing more data bandwidth, called data decoupling. This paper especially studies an interesting, yet less explored, behavior of memory access instructions, called access region locality, which is concerned with each static memory instruction and its range of access locations at runtime. Our experimental study using a set of SPEC95 benchmark programs shows that most memory access instructions reference a single region at runtime. Also shown is that it is possible to accurately predict the access region of a memory instruction at runtime by scrutinizing the addressing mode of the instruction and the past access history of it. We describe and evaluate a wide-issue superscalar processor with two distinct sets of memory pipelines and caches, driven by the access region predictor. Experimental results indicate that the proposed mechanism is very effective in providing high memory bandwidth to the processor, resulting in comparable or better performance than a conventional memory design with a heavily multiported data cache that can lead to much higher hardware complexity

Published in:

IEEE Transactions on Computers  (Volume:50 ,  Issue: 7 )