Cart (Loading....) | Create Account
Close category search window

A multithreaded PowerPC processor for commercial servers

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $31
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Borkenhagen, J.M. ; IBM Server Group, 3605 Highway 52 N, Rochester, Minnesota 55901, USA ; Eickemeyer, R.J. ; Kalla, R.N. ; Kunkel, S.R.

This paper describes the microarchitecture of the RS64 IV, a multithreaded PowerPC® processor, and its memory system. Because this processor is used only in IBM iSeries™ and pSeries™ commercial servers, it is optimized solely for commercial server workloads. Increasing miss rates because of trends in commercial server applications and increasing latency of cache misses because of rapidly increasing clock frequency are having a compounding effect on the portion of execution time that is wasted on cache misses. As a result, several optimizations are included in the processor design to address this problem. The most significant of these is the use of coarse-grained multithreading to enable the processor to perform useful instructions during cache misses. This provides a significant throughput increase while adding less than 5% to the chip area and having very little impact on cycle time. When compared with other performance-improvement techniques, multithreading yields an excell ent ratio of performance gain to implementation cost. Second, the miss rate of the L2 cache is reduced by making it four-way associative. Third, the latency of cache-to-cache movement of data is minimized. Fourth, the size of the L1 caches is relatively large. In addition to addressing cache misses, pipeline “holes” caused by branches are minimized with large instruction buffers, large L1 I-cache fetch bandwidth, and optimized resolution of the branch direction. In part, the branches are resolved quickly because of the short but efficient pipeline. To minimize pipeline holes due to data dependencies, the L1 D-cache access is optimized to yield a one-cycle load-to-use penalty.

Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.  

Published in:

IBM Journal of Research and Development  (Volume:44 ,  Issue: 6 )

Date of Publication:

Nov. 2000

Need Help?

IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2014 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.