Reducing rename logic complexity for high-speed and low-power front-end architectures

1 Author(s)
Sangireddy, R. ; Dept. of Electr. Eng., Texas Univ., Richardson, TX

In modern high-performance processors, the complexity of the register rename logic grows with the pipeline width, leading to a longer renaming delay and higher power consumption. The rename logic in the front end of the processor is one of the largest contributors to peak on-chip temperatures, so reducing its power consumption deserves attention. Further, with the advent of clustered microarchitectures, the rename map table at the front end is shared by the clusters; hence, its critical path delay should not become the bottleneck that determines the processor clock cycle time. Analysis of the SPEC2000 integer benchmark programs reveals that, in a 4-wide processor, at most one two-source instruction (an instruction with two source registers) is renamed in a cycle for 94 percent of the total execution time. Similarly, in an 8-wide processor, at most one two-source instruction is renamed in a cycle for 92 percent of the total execution time. The port bandwidth of the rename map table is therefore highly underutilized for a significant portion of the time. Based on this analysis, we propose a novel technique that significantly reduces the number of ports in the rename map table. The technique is easy to implement and reduces the access time, power, and area of the rename logic without adding power, area, or delay overheads to any other logic on the chip. It renames instructions in the order of their fetch, with no significant impact on the processor's performance.
In an 8-wide processor, compared with a conventional rename map table in an integer pipeline with 16 ports to look up source operands, a rename map table with nine ports reduces access time, power, and area by 14 percent, 42 percent, and 49 percent, respectively, with only a 4.7 percent loss in instructions committed per cycle (IPC). In a 4-wide processor, the technique reduces access time, power, and area by 7 percent, 38 percent, and 59 percent, respectively, with an IPC loss of only 4.4 percent.
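
The effect of a reduced port count on in-order renaming can be illustrated with a small model. The sketch below is not the paper's mechanism, which the abstract does not detail; it assumes, hypothetically, that the source-lookup ports form a shared pool, that each instruction consumes one port per source register, and that renaming stops at the first instruction that does not fit so that fetch order is preserved. The function names `renamed_this_cycle` and `cycles_to_rename` are invented for this illustration.

```python
from typing import Sequence


def renamed_this_cycle(src_counts: Sequence[int], width: int, ports: int) -> int:
    """Return how many instructions, taken in fetch order, fit in one rename cycle.

    src_counts[i] is the number of source registers of the i-th fetched
    instruction. Assumption: ports are a shared pool, one port per source.
    """
    used = 0
    renamed = 0
    for n_src in src_counts[:width]:
        if used + n_src > ports:
            break  # stall the remaining instructions to keep fetch order
        used += n_src
        renamed += 1
    return renamed


def cycles_to_rename(src_counts: Sequence[int], width: int, ports: int) -> int:
    """Return the number of rename cycles needed for the whole instruction stream."""
    cycles = 0
    i = 0
    while i < len(src_counts):
        n = renamed_this_cycle(src_counts[i:], width, ports)
        if n == 0:
            raise ValueError("an instruction needs more ports than are available")
        i += n
        cycles += 1
    return cycles


if __name__ == "__main__":
    common_group = [1, 2, 1, 0, 1, 1, 1, 1]  # a single two-source instruction
    rare_group = [2, 2, 1, 1, 1, 1, 1, 1]    # two two-source instructions
    print(renamed_this_cycle(common_group, width=8, ports=9))
    print(renamed_this_cycle(rare_group, width=8, ports=9))
    print(renamed_this_cycle(rare_group, width=8, ports=16))
```

Under these assumptions, a group with at most one two-source instruction is renamed in full with nine ports, exactly as with 16; only the less common groups with several two-source instructions spill into a following cycle, which is consistent with the small IPC loss reported above.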

Published in:

Computers, IEEE Transactions on (Volume: 55, Issue: 6)