Using virtual load/store queues (VLSQs) to reduce the negative effects of reordered memory instructions

2 Author(s)
Jaleel, A. (Dept. of Electr. & Comput. Eng., Maryland Univ., College Park, MD, USA); Jacob, B.

The use of large instruction windows coupled with aggressive out-of-order and prefetching capabilities has provided significant improvements in processor performance. In this paper, we quantify the effects of increased out-of-order aggressiveness on a processor's memory ordering/consistency model as well as an application's cache behavior. We observe that increasing the reorder buffer size causes fewer than one third of issued memory instructions to be executed in actual program order. We show that increasing the reorder buffer size from 80 to 512 entries increases the frequency of memory traps by a factor of six and increases total execution overhead by 10-40%. Additionally, we observe that the reordering of memory instructions increases L1 data cache accesses by 10-60% and L1 data cache misses by 10-20%. These findings reveal that increased out-of-order capability can waste energy in two ways. First, re-fetching and re-executing instructions flushed due to traps requires the fetch, map, and execution units to dissipate energy on work that has already been done. Second, an increase in the number of cache accesses and cache misses needlessly dissipates energy. Both of these side effects are related to the reordering of memory instructions. Thus, to avoid wasting both energy and performance, we propose a virtual load/store queue (VLSQ) within the existing physical load/store queue. The VLSQ reduces the reordering of memory instructions by limiting the number of memory instructions visible to the select and issue logic. We show that VLSQs can reduce trap overhead, cache accesses, and cache misses by as much as 45%, 50%, and 15%, respectively, when compared to traditional load/store queues. We observe that these reductions yield net power savings of 10-50% at a performance degradation of 1-5%.
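The idea of limiting how many memory instructions the select and issue logic can see can be illustrated with a toy model. The sketch below is not the authors' simulator or their VLSQ design; it is a minimal illustration under stated assumptions. The function name `simulate`, the parameters `window` and `issue_width`, the oldest-first select policy, and the definition of a "reordered" instruction (one issued while an older instruction is still waiting) are all hypothetical choices made for this example.

```python
def simulate(ready_cycles, window, issue_width=2):
    """Toy load/store queue (illustrative assumption, not the paper's model).

    ready_cycles[i] is the cycle at which memory instruction i (listed in
    program order) has its operands ready. Only the `window` oldest
    un-issued entries are visible to the select logic, which mimics a
    virtual queue carved out of a larger physical queue.

    Returns (total_cycles, reordered), where `reordered` counts instructions
    issued while at least one older instruction was still waiting.
    """
    if window < 1 or issue_width < 1:
        raise ValueError("window and issue_width must be at least 1")

    pending = list(range(len(ready_cycles)))  # oldest first
    cycle = 0
    reordered = 0

    while pending:
        # The select logic only sees the oldest `window` entries.
        visible = pending[:window]
        ready = [i for i in visible if ready_cycles[i] <= cycle]
        issued = ready[:issue_width]  # oldest-first selection

        issued_set = set(issued)
        remaining = [i for i in pending if i not in issued_set]

        # An issued instruction is reordered if something older still waits.
        if remaining:
            oldest_waiting = remaining[0]
            reordered += sum(1 for i in issued if i > oldest_waiting)

        pending = remaining
        cycle += 1

    return cycle, reordered


if __name__ == "__main__":
    # Instruction 0 becomes ready late; younger instructions are ready at once.
    trace = [3, 0, 0, 0]
    print("wide window:  ", simulate(trace, window=4))
    print("narrow window:", simulate(trace, window=1))
```

In this toy trace, the narrow window issues nothing out of program order but takes more cycles than the wide window. That matches the direction of the trade-off described in the abstract (less reordering at some cost in performance). The magnitudes have no relation to the paper's measurements.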

Published in:

High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on

Date of Conference:

12-16 Feb. 2005