By Topic

Software-Based Cache Coherence with Hardware-Assisted Selective Self-Invalidations Using Bloom Filters

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)

Implementing shared memory consistency models on top of hardware caches gives rise to the well-known cache coherence problem. The standard solution involves implementing coherence protocols in hardware, an approach with some design complexity, hardware costs, and restrictions on interconnect behavior. However, for some memory consistency models, an alternative is to enforce coherence in the software implementation of synchronization primitives, using software controlled invalidations and forced writebacks. This requires minimal hardware support but gives less selective enforcement, which affects performance. This paper proposes a novel hybrid software-hardware coherence mechanism. In this scheme, software is responsible for triggering the coherence actions-self-invalidations and writebacks-at appropriate times while hardware uses Bloom filters to perform more selective self-invalidations. We evaluate the proposed scheme on applications from two different domains: the SPLASH-2 scientific and ALP multimedia benchmarks. Experimental results show that while the software-only coherence scheme shows less performance degradation than expected, it still unacceptably degrades performance for some of the benchmarks. Filtering out unnecessary invalidations improves the worst-case performance by as much as 93 percent, and brings the performance of the hybrid scheme within five percent of full hardware coherence for 10 out of 13 benchmarks, on a 32-core CMP with a shared L2 cache.

Published in:

IEEE Transactions on Computers  (Volume:60 ,  Issue: 4 )