ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment | IEEE Conference Publication | IEEE Xplore


Abstract:

Recently proposed architectures that continuously operate on atomic blocks of instructions (also called chunks) can boost the programmability and performance of shared-memory multiprocessing. However, they must support chunk operations very efficiently. In particular, in lazy conflict-detection environments, it is key that they provide scalable chunk commits. Unfortunately, current proposals typically fail to enable maximum overlap of conflict-free chunk commits. This paper presents a novel directory-based protocol that enables highly overlapped, scalable chunk commits. The protocol, called ScalableBulk, builds on the previously proposed BulkSC protocol. It introduces three general hardware primitives for scalable commit: preventing access to a set of directory entries, grouping directory modules, and initiating the commit optimistically. Our results with SPLASH-2 and PARSEC codes with up to 64 processors show that ScalableBulk enables highly overlapped chunk commits and delivers scalable performance. Unlike previously proposed schemes, it removes practically all commit stalls.
Date of Conference: 04-08 December 2010
Date Added to IEEE Xplore: 20 January 2011
Print ISBN: 978-1-4244-9071-4

Conference Location: Atlanta, GA, USA

1. Introduction

There are several recent proposals for shared-memory architectures that efficiently support continuous atomic-block operation [2], [5], [6], [8], [9], [14], [18], [19]. In these architectures, a processor repeatedly executes blocks of consecutive instructions from a thread (also called chunks) in an atomic manner. These systems include TCC [6], [9], BulkSC [5], Implicit Transactions (IT) [18], ASO [19], InvisiFence [2], DMP [8], and SRC [14], among others. This mode of execution has performance and programmability advantages. For example, it can support transactional memory [6], [9], [14]; high-performance execution, even for strict memory consistency models [2], [5], [19]; a variety of techniques for parallel program development and debugging such as determinism [8], program replay [12], and atomicity violation debugging [10]; and even provide a substrate for new high-performance compiler transformations [1], [13].
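To make the terminology concrete, the following is a minimal sketch of lazy conflict detection between chunks, in which conflicts are checked only when a chunk attempts to commit. It is an illustration under stated assumptions, not the ScalableBulk protocol: the `Chunk` class, the `conflicts` and `can_overlap` helpers, and the use of plain Python sets of addresses are all hypothetical, and a real design would track accesses in hardware and involve the directory.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical model of an atomic block: the addresses it read and wrote."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def conflicts(committing: Chunk, other: Chunk) -> bool:
    """Lazy check, run only when `committing` tries to commit.

    Assumption: a conflict exists if the committing chunk wrote an
    address that the other chunk has read or written.
    """
    return bool(committing.writes & (other.reads | other.writes))


def can_overlap(a: Chunk, b: Chunk) -> bool:
    """Two chunks may commit concurrently if neither one's writes
    touch the other's accesses."""
    return not conflicts(a, b) and not conflicts(b, a)


# Illustrative chunks with made-up addresses.
c1 = Chunk("c1", reads={0x10}, writes={0x20})
c2 = Chunk("c2", reads={0x30}, writes={0x40})
c3 = Chunk("c3", reads={0x20}, writes={0x50})

print(can_overlap(c1, c2))  # disjoint accesses, so the commits could overlap
print(can_overlap(c1, c3))  # c1 writes 0x20, which c3 read, so they cannot
```

In this toy model, `c1` and `c2` are conflict-free and could commit at the same time, which is the kind of overlap the abstract says current proposals often fail to exploit.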
