As processor back-ends become more aggressive, front-ends will have to scale as well. Although the back-ends of superscalar processors have continued to become more parallel, the front-ends remain sequential. We describe techniques for fetching and renaming multiple noncontiguous portions of the dynamic instruction stream in parallel using multiple fetch and rename units. We demonstrate that parallel front-ends are a viable alternative to high-performance sequential front-ends. Compared with an equivalently sized trace cache, our technique increases cache bandwidth utilization by 17%, front-end throughput by 20%, and performance by 5%. Parallelism also enhances latency tolerance: a parallel front-end loses only 6% performance as the cache size is decreased from 128 KB to 8 KB, compared with a 50-65% performance loss for sequential fetch mechanisms.