By Topic

Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Srikantaiah, S. ; Pennsylvania State Univ., University Park, PA, USA ; Kandemir, M.

Translation Look-aside Buffers (TLBs) are vital hardware support for virtual memory management in high performance computer systems and have a momentous influence on overall system performance. Numerous techniques to reduce TLB miss latencies including the impact of TLB size, associativity, multilevel hierarchies, super pages, and prefetching have been well studied in the context of uniprocessors. However, with Chip Multiprocessors (CMPs) becoming the standard design point of processor architectures, it is imperative that we review the design and organization of TLBs in the context of CMPs. In this paper, we propose to improve system performance by means of a novel way of organizing TLBs called Synergistic TLBs. Synergistic TLB is different from per-core private TLB organization in three ways: (i) it provides capacity sharing of TLBs by facilitating storing of victim translations from one TLB in another to emulate a distributed shared TLB (DST), (ii) it supports translation migration for maximizing the utilization of TLB capacity, and (iii) it supports translation replication to avoid excess latency for remote TLB accesses. We explore all the design points in this design space and find that an optimal point exists for high performance address translation. Our evaluation with both multiprogrammed (SPEC 2006 applications) and multithreaded workloads (PARSEC applications) shows that Synergistic TLBs can eliminate, respectively, 44.3% and 31.2% of the TLB misses, on average. It also improves the weighted speedup of multiprogrammed application mixes by 25.1% and performance of multithreaded applications by 27.3%, on average.

Published in:

Microarchitecture (MICRO), 2010 43rd Annual IEEE/ACM International Symposium on

Date of Conference:

4-8 Dec. 2010