Exploiting Core Working Sets to Filter the L1 Cache with Random Sampling

Authors: Etsion, Y. (Electrical Engineering & Computer Science Faculties, Technion - Israel Institute of Technology, Haifa, Israel); Feitelson, D.G.

Locality is often characterized by working sets, defined by Denning as the set of distinct addresses referenced within a certain window of time. This definition ignores the fact that dramatic differences exist between the usage patterns of frequently used data and transient data. We therefore propose to extend Denning's definition with that of core working sets, which identify the blocks that are used most frequently and for the longest time. The concept of a core motivates the design of dual-cache structures that provide special treatment for the core. In particular, we present a probabilistic locality predictor for L1 caches that leverages the skewed popularity of blocks to distinguish transient cache insertions from more persistent ones. We further present a dual L1 design that inserts only frequently used blocks into a low-latency, low-power, direct-mapped main cache, while serving the others from a small fully associative filter. To reduce the prohibitive cost of such a filter, we present a content addressable memory design that eliminates most of the costly lookups using a small auxiliary lookup table. The proposed design enables a 16K direct-mapped L1 cache, augmented with a small 2K filter, to outperform a 32K 4-way cache, while consuming 70-80 percent less dynamic power and 40 percent less static power.
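
The paper itself is behind the paywall and no code is given here; the following is only a minimal C sketch of the kind of random-sampling insertion policy the abstract describes, in which new blocks enter a small fully associative filter and are promoted to the direct-mapped main cache only when a reference to them happens to be sampled. The sizes (16K main cache, 2K filter, 64-byte lines), the sampling probability, the FIFO filter replacement, and all names are illustrative assumptions, not the authors' actual design.

```c
/*
 * Sketch of a random-sampling insertion policy for a dual L1 cache:
 * a direct-mapped main cache plus a small fully associative filter.
 * All parameters below are assumptions for illustration only.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAIN_SETS   256    /* assumed: 16K direct-mapped, 64-byte lines */
#define FILTER_WAYS 32     /* assumed: 2K fully associative filter      */
#define LINE_SHIFT  6      /* 64-byte cache lines                       */
#define SAMPLE_PROB 0.01   /* assumed per-reference sampling rate       */

typedef struct { uint64_t tag; bool valid; } line_t;

static line_t  main_cache[MAIN_SETS];  /* direct-mapped main cache */
static line_t  filter[FILTER_WAYS];    /* fully associative filter */
static unsigned filter_hand;           /* simple FIFO replacement  */

/* A reference is "sampled" with small probability; frequently used
 * blocks are eventually sampled, transient blocks rarely are.       */
static bool sampled(void) {
    return (double)rand() / RAND_MAX < SAMPLE_PROB;
}

void access(uint64_t addr) {
    uint64_t block = addr >> LINE_SHIFT;
    unsigned set   = (unsigned)(block % MAIN_SETS);

    /* Hit in the main cache: nothing to do. */
    if (main_cache[set].valid && main_cache[set].tag == block)
        return;

    /* Hit in the filter: promote to the main cache only if this
     * reference is sampled, i.e. the block looks like core data.   */
    for (unsigned i = 0; i < FILTER_WAYS; i++) {
        if (filter[i].valid && filter[i].tag == block) {
            if (sampled()) {
                main_cache[set].tag   = block;
                main_cache[set].valid = true;
                filter[i].valid       = false;
            }
            return;
        }
    }

    /* Miss: new blocks always enter the filter first, so transient
     * data does not pollute the direct-mapped main cache.          */
    filter[filter_hand].tag   = block;
    filter[filter_hand].valid = true;
    filter_hand = (filter_hand + 1) % FILTER_WAYS;
}

int main(void) {
    /* Toy driver: a hot block referenced many times is eventually
     * promoted, while one-off streaming blocks stay in the filter. */
    for (int i = 0; i < 1000; i++) {
        access(0x1000);                       /* frequently reused block   */
        access(0x100000 + (uint64_t)i * 64);  /* transient streaming block */
    }
    return 0;
}
```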

Published in: IEEE Transactions on Computers (Volume 61, Issue 11)