In 2006, John Mellor-Crummey and Michael Scott received the Dijkstra Prize in Distributed Computing for their 1991 paper on algorithms for scalable synchronization on shared-memory multiprocessors. The paper introduced a novel spin-lock algorithm, now known as the MCS spin-lock, that carefully distributes spin locations in memory to lessen the impact of bandwidth limitations on spin algorithms. Their empirical work and architectural suggestions have had a major impact on how the field views spin-locks. Motivated by emerging architectures with an increasing number of cores, we present an empirical study of spin-locks on recent shared-memory architectures, including IBM P5+ and SGI ccNUMA systems. Our results show that latency will have a much greater impact on performance than bandwidth on these and future architectures with many cores and private caches. Several test cases and a tabular overview of our results are included.