Chip multiprocessors (CMPs) enable concurrent execution of multiple threads using several cores on a die. Current CMPs behave much like symmetric multiprocessors and do not take advantage of the proximity between cores to improve synchronization and communication between concurrent threads. Instead, thread synchronization and communication rely on memory/cache interactions. We propose two architectural enhancements that support fine-grained synchronization and communication between threads while reducing overhead and memory/cache contention. Register-based synchronization exploits the proximity between cores to provide low-latency shared registers for synchronization. When blocking events that suspend the core are used, this approach can save significant power compared with spin-waiting. Pre-pushing provides software-controlled data forwarding between caches to reduce coherence traffic, lower cache latency, and raise hit rates. We explore the behavior of these approaches and evaluate their effectiveness at improving synchronization and communication performance on CMPs with private caches. Our simulation results show significant reductions in inter-core traffic, latencies, and miss rates.
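As a rough illustration of why forwarding data between private caches can raise hit rates, the sketch below counts consumer misses in a toy two-core producer/consumer model. It is not the simulator used in this work: the function name `consumer_misses`, the `pre_push` flag, and the set-based caches are all hypothetical, and the model ignores capacity, invalidations, and timing.

```python
def consumer_misses(num_lines, pre_push):
    """Count consumer cache misses in a toy two-core producer/consumer model.

    Assumption: each private cache is an unbounded set of line numbers,
    and the consumer reads each line right after the producer writes it.
    """
    producer_cache = set()  # lines held in the producer's private cache
    consumer_cache = set()  # lines held in the consumer's private cache
    misses = 0
    for line in range(num_lines):
        # The producer writes the line, so it now lives in the producer's cache.
        producer_cache.add(line)
        if pre_push:
            # Hypothetical pre-push: software forwards the line to the
            # consumer's cache before the consumer asks for it.
            consumer_cache.add(line)
        # The consumer reads the line.
        if line not in consumer_cache:
            # Demand miss: the line must be fetched from the producer's cache.
            misses += 1
            consumer_cache.add(line)
    return misses


if __name__ == "__main__":
    print("misses without pre-push:", consumer_misses(8, pre_push=False))
    print("misses with pre-push:   ", consumer_misses(8, pre_push=True))
```

In this simplified setting every consumer read misses without forwarding and none miss with it. A real CMP would fall between these extremes, depending on cache capacity and on how early the data is pushed.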