Memory latency is a significant bottleneck in modern computer architectures, especially for commercial and multimedia applications. With the advent of superscalar processors and multicore systems, instruction cache misses can severely limit performance. Prefetching is one of the promising methods for bridging the performance gap between CPU and DRAM speeds. Although instruction prefetching is a promising technique for hiding memory latency, existing schemes fail to issue prefetches early enough for modern superscalar processors. To overcome this limitation, we propose a new instruction prefetching technique called Basicblock Instruction Prefetching, which employs a prefetch engine that issues prefetch instructions far enough in advance for the prefetches to be useful. The design achieves good coverage, is accurate, and delivers prefetched instructions in time for the processor to use them. Performance is evaluated through cycle-accurate trace-driven simulation. The experimental results show that the proposed scheme achieves 80% prediction accuracy and better timeliness.