1. Introduction
There are several recent proposals for shared-memory architectures that efficiently support continuous atomic-block operation [2], [5], [6], [8], [9], [14], [18], [19]. In these architectures, a processor repeatedly executes blocks of consecutive instructions from a thread (also called chunks) in an atomic manner. These systems include TCC [6], [9], BulkSC [5], Implicit Transactions (IT) [18], ASO [19], InvisiFence [2], DMP [8], and SRC [14] among others. This mode of execution has performance and programmability advantages. For example, it can support transactional memory [6], [9], [14]; high-performance execution, even for strict memory consistency models [2], [5], [19]; a variety of techniques for parallel program development and debugging such as determinism [8], program replay [12], and atomicity violation debugging [10]; and even provide a substrate for new high-performance compiler transformations [1], [13].