Address-indexed memory disambiguation and store-to-load forwarding

3 Author(s)
S. S. Stone (Dept. of Electr. & Comput. Eng., Illinois Univ., Urbana, IL, USA); K. M. Woley; M. I. Frank

This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculatively and out-of-order prior to resolving their dependences. Whereas the LSQ requires associative and age-prioritized searches for each access, we propose that an address-indexed store-forwarding cache (SFC) perform store-to-load forwarding and that an address-indexed memory disambiguation table (MDT) perform memory disambiguation. Neither structure includes a CAM. The SFC behaves as a small cache, accessed speculatively and out-of-order by both loads and stores. Because the SFC does not rename in-flight stores to the same address, violations of memory anti and output dependences can cause in-flight loads to obtain incorrect values from the SFC. Therefore, the MDT uses sequence numbers to detect and recover from true, anti, and output memory dependence violations. We observe empirically that loads and stores that violate anti and output memory dependences are rarely on a program's critical path and that the additional cost of enforcing predicted anti and output dependences among these loads and stores is minimal. In conjunction with a scheduler that enforces predicted anti and output dependences, the MDT and SFC yield performance equivalent to that of a large LSQ that has similar or greater circuit complexity. The SFC and MDT are scalable structures that yield high performance and lower dynamic power consumption than the LSQ, and they are well suited for checkpointed processors with large instruction windows.
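The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' design: the abstract does not give the table organization, so the direct-mapped indexing, the per-entry fields (`max_load_seq`, `max_store_seq`), the class and method names, and the rule used to classify each violation are all assumptions made for illustration. Recovery (flushing and re-executing), entry reclamation at retirement, and the dependence-predicting scheduler are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MDTEntry:
    # Hypothetical fields: highest sequence numbers seen so far among the
    # loads and stores that have executed to addresses mapping to this entry.
    max_load_seq: int = -1
    max_store_seq: int = -1


class MemoryDisambiguationTable:
    """Address-indexed table; a lookup is a single indexed read, with no
    associative (CAM) search. Assumed organization: direct-mapped, untagged,
    so aliasing addresses can only cause conservative false violations."""

    def __init__(self, num_sets: int = 64):
        self.num_sets = num_sets
        self.entries = [MDTEntry() for _ in range(num_sets)]

    def execute_load(self, addr: int, seq: int) -> Optional[str]:
        e = self.entries[addr % self.num_sets]
        # A younger store has already executed, so the SFC may hold a value
        # this older load must not see: anti dependence violation.
        violation = "anti" if e.max_store_seq > seq else None
        e.max_load_seq = max(e.max_load_seq, seq)
        return violation

    def execute_store(self, addr: int, seq: int) -> Optional[str]:
        e = self.entries[addr % self.num_sets]
        if e.max_load_seq > seq:
            # A younger load ran before this older store: true violation.
            violation = "true"
        elif e.max_store_seq > seq:
            # A younger store already wrote the un-renamed SFC line:
            # output dependence violation.
            violation = "output"
        else:
            violation = None
        e.max_store_seq = max(e.max_store_seq, seq)
        return violation


class StoreForwardingCache:
    """Small direct-mapped, tagged cache written by stores and read by loads.
    In-flight stores to one address are not renamed: the last store to
    execute overwrites the line, whatever its program order."""

    def __init__(self, num_sets: int = 64):
        self.num_sets = num_sets
        self.lines = [None] * num_sets  # each line: (addr, value) or None

    def store(self, addr: int, value: int) -> None:
        self.lines[addr % self.num_sets] = (addr, value)

    def load(self, addr: int) -> Optional[int]:
        line = self.lines[addr % self.num_sets]
        if line is not None and line[0] == addr:
            return line[1]  # forwarded from an in-flight store
        return None         # miss: the load would read the data cache
```

In this sketch a load with sequence number 5 followed by a store with sequence number 3 to the same address makes `execute_store` report a true violation, while the reverse execution order with the same program order reports nothing and the load can take its value from `StoreForwardingCache.load`. A processor would respond to any reported violation by squashing and re-executing from the offending instruction.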

Published in:

38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05)

Date of Conference:

16-16 Nov. 2005