An execute-ahead processor pre-executes instructions when a load miss would otherwise stall the processor. The typical design has several components that grow with the execute-ahead distance and must be carefully balanced for optimal performance. This paper presents a novel approach that unifies those components; it is therefore easy to implement and avoids the problem of balancing resource investment among them. When executing ahead, the processor enqueues (or preserves) all instructions, along with their known execution results (both register and memory), in a preserving buffer (PB). When the leading load miss is resolved, the processor dequeues the instructions and either restores the known execution results or dispatches the instructions that have not yet been executed. The implementation overhead consists of the PB and a run-ahead cache for forwarding memory data. Only the PB grows with the execute-ahead distance. The method can be applied to both in-order and out-of-order processors. Our experiments show that a four-way superscalar out-of-order processor with a 1 K-entry PB achieves speedups of 15% and 120% over the baseline design for the SPEC INT2000 and SPEC FP2000 benchmark suites, respectively, assuming a 128-entry instruction window and a 300-cycle memory access latency.
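The enqueue/dequeue behavior of the PB can be illustrated with a small software model. The sketch below is a toy under stated assumptions, not the hardware design evaluated in the paper: every instruction is treated as an addition of its source registers, memory results and the run-ahead cache are not modeled, and the names `Entry`, `execute_ahead`, `drain`, and the "poisoned" register set are hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    """One PB slot: an instruction plus its result, if already known."""
    dest: str
    srcs: Tuple[str, ...]
    result: Optional[int] = None  # None means "not yet executed"


def execute_ahead(program, regs, missing_reg, pb_capacity):
    """Pre-execute `program` while the load into `missing_reg` is outstanding.

    Assumption: every instruction computes dest = sum(srcs).
    """
    pb = deque()
    ahead = dict(regs)        # speculative register state
    poisoned = {missing_reg}  # registers that depend on the missing load
    for dest, srcs in program:
        if len(pb) == pb_capacity:
            break  # PB is full: the execute-ahead distance is exhausted
        if poisoned.intersection(srcs):
            # A source is unknown: preserve the instruction without a result.
            poisoned.add(dest)
            pb.append(Entry(dest, srcs))
        else:
            # All sources are known: execute now and preserve the result.
            value = sum(ahead[s] for s in srcs)
            ahead[dest] = value
            poisoned.discard(dest)
            pb.append(Entry(dest, srcs, value))
    return pb


def drain(pb, regs, missing_reg, loaded_value):
    """After the miss resolves, dequeue in program order.

    Entries with a known result are restored; the rest are dispatched.
    """
    regs = dict(regs)
    regs[missing_reg] = loaded_value
    dispatched = 0
    while pb:
        entry = pb.popleft()
        if entry.result is None:
            entry.result = sum(regs[s] for s in entry.srcs)  # dispatch now
            dispatched += 1
        regs[entry.dest] = entry.result  # restore or commit the result
    return regs, dispatched


if __name__ == "__main__":
    regs = {"r1": 1, "r2": 2, "r3": 0}  # r3 is the pending load destination
    program = [
        ("r4", ("r1", "r2")),  # independent of the miss
        ("r5", ("r3", "r1")),  # depends on the miss
        ("r6", ("r5", "r4")),  # depends on the miss through r5
        ("r7", ("r4", "r2")),  # independent of the miss
    ]
    pb = execute_ahead(program, regs, "r3", pb_capacity=8)
    final_regs, dispatched = drain(pb, regs, "r3", loaded_value=10)
    print(final_regs, dispatched)
```

In this model the PB is the only structure whose size bounds how far execution can run ahead, which mirrors the claim that only the PB grows with the execute-ahead distance. Draining in program order lets restored and newly dispatched results interleave without a separate mechanism for tracking deferred instructions.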