This paper describes several methods for improving the scalability of memory disambiguation hardware for future high-ILP processors. As the number of in-flight instructions grows with issue width and pipeline depth, the load/store queues (LSQs) threaten to become a bottleneck in both power and latency. By employing lightweight approximate hashing in hardware with structures called Bloom filters, many improvements to the LSQ are possible. We propose two types of filtering schemes using Bloom filters: search filtering, which uses hashing to reduce both the number of lookups to the LSQ and the number of entries that must be searched, and state filtering, in which the number of entries kept in the LSQs is reduced by coupling address predictors and Bloom filters, permitting smaller queues. We evaluate these techniques for LSQs indexed by both instruction age and the instruction's effective address, and for both centralized and physically partitioned LSQs. We show that search filtering avoids up to 98% of the associative LSQ searches, providing significant power savings and keeping LSQ searches to under one high-frequency clock cycle. We also show that with state filtering, the load queue can be eliminated altogether with only minor reductions in performance for small-instruction-window machines.
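
The idea behind search filtering can be illustrated with a minimal software sketch. This is not the hardware design evaluated in the paper: the class names, the single low-order-bit hash, the 64-counter filter size, and the use of a counting filter (so that entries can be removed when a store commits) are all assumptions made for illustration. The sketch shows a load consulting the filter first and performing the associative store-queue search only when the filter reports a possible match.

```python
class CountingBloomFilter:
    """Single-hash counting filter (hypothetical sizing and hash)."""

    def __init__(self, num_counters=64):
        self.counters = [0] * num_counters

    def _index(self, address):
        # Assumed hash: drop the byte offset within a 4-byte word,
        # then keep the low-order bits.
        return (address >> 2) % len(self.counters)

    def insert(self, address):
        self.counters[self._index(address)] += 1

    def remove(self, address):
        i = self._index(address)
        if self.counters[i] > 0:
            self.counters[i] -= 1

    def may_contain(self, address):
        # False means "definitely absent"; True means "possibly present".
        return self.counters[self._index(address)] > 0


class FilteredStoreQueue:
    """Age-ordered store queue guarded by a Bloom filter (illustrative only)."""

    def __init__(self, num_counters=64):
        self.entries = []  # (address, value), oldest first
        self.filter = CountingBloomFilter(num_counters)
        self.searches_performed = 0
        self.searches_avoided = 0

    def issue_store(self, address, value):
        self.entries.append((address, value))
        self.filter.insert(address)

    def commit_oldest_store(self):
        address, value = self.entries.pop(0)
        self.filter.remove(address)
        return address, value

    def load_lookup(self, address):
        # Filter miss: no in-flight store can match, so skip the search.
        if not self.filter.may_contain(address):
            self.searches_avoided += 1
            return None
        # Filter hit (possibly a false positive): do the full search,
        # youngest matching store first.
        self.searches_performed += 1
        for store_address, value in reversed(self.entries):
            if store_address == address:
                return value
        return None
```

In this sketch a false positive costs only an unnecessary search, never an incorrect result, which is why an approximate structure is acceptable in front of the exact queue.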