Abstract:
Instruction prefetching can play a pivotal role in improving the performance of workloads with large instruction footprints and frequent, costly frontend stalls. In particular, Fetch Directed Prefetching (FDP) is an effective technique for mitigating frontend stalls because it leverages existing branch prediction resources in a processor and incurs very little hardware overhead. Modern processors have been trending towards provisioning more frontend resources, which bodes well for FDP, as it requires these resources to be effective. However, recent academic research has relied on outdated, suboptimal frontend baselines that employ smaller structures, resulting in equivocal outcomes. This paper presents a detailed FDP microarchitecture and evaluates two improvements: better branch history management and post-fetch correction. Our mechanism provides a 41.0% speedup over the baseline (no prefetching, no FDP) with only 195 bytes of hardware overhead, and it outperforms the winners of the 1st Instruction Prefetching Championship (IPC-1), which had a 128KB storage budget. We believe that our FDP-based frontend design can serve as a new reference baseline for instruction prefetching research and help bridge the gap between academia and industry.
Published in: 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Date of Conference: 28-30 March 2021
Date Added to IEEE Xplore: 28 April 2021