By Topic

Optimizing the instruction cache performance of the operating system

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Torrellas, J. ; Dept. of Comput. Sci., Illinois Univ., Urbana, IL, USA ; Chun Xia ; Daigle, R.L.

High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to minimize cache interference by improving the layout of the basic blocks of the code. However, the performance impact of this technique has been reported for application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. It is unknown how well existing optimizations perform for systems code acid whether better optimizations can be found. We address this problem in this paper. This paper characterizes, in detail, the locality patterns of the operating system code and shows that there is substantial locality Unfortunately, caches are not able to extract much of it: Rarely-executed special-case code disrupts spatial locality loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. Based on our observations, we propose an algorithm to expose these localities and reduce interference in the cache. For a range of cache sizes, associativities, lines sizes, and organizations, we show that we reduce total instruction miss rates by 31-86 percent, or up to 2.9 absolute points. Using a simple model, this corresponds to execution time reductions of the order of 10-25 percent. In addition, our optimized operating system combines well with optimized and unoptimized applications

Published in:

Computers, IEEE Transactions on  (Volume:47 ,  Issue: 12 )