Three implementations of a concurrently updateable linked list were compared: one that emulates a lock-free approach based on a compare-and-swap instruction, one that makes direct use of the full-empty synchronization bits that the Cray XMT provides on every word of memory, and one that uses the XMT's atomic int_fetch_add instruction. Their relative performance was measured experimentally on a 512-processor XMT. Under low contention, the direct full-empty implementation ran up to twice as fast as the other two; under high contention, all three performed about the same.
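As a rough illustration of the compare-and-swap style of update mentioned above, the following is a minimal sketch in portable C11, not the code evaluated in the paper. It uses `<stdatomic.h>` as a stand-in for the XMT's own primitives, and the names `node_t`, `list_t`, `list_push`, and `list_length` are hypothetical. It only inserts at the head of the list; the paper's implementations may support other operations. The element counter uses `atomic_fetch_add`, which is assumed here to be a loose portable analogue of the XMT's int_fetch_add.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical node and list types, for illustration only. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

typedef struct {
    _Atomic(node_t *) head;  /* updated with compare-and-swap */
    atomic_int count;        /* updated with atomic fetch-and-add */
} list_t;

/* Insert a value at the head of the list without taking a lock.
 * Returns 0 on success, -1 if allocation fails. */
static int list_push(list_t *list, int value) {
    node_t *n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->value = value;
    n->next = atomic_load(&list->head);
    /* On failure, the compare-exchange stores the current head in
     * n->next, so the loop retries against the latest head. */
    while (!atomic_compare_exchange_weak(&list->head, &n->next, n)) {
        /* another thread changed the head first; try again */
    }
    atomic_fetch_add(&list->count, 1);
    return 0;
}

/* Read the number of elements inserted so far. */
static int list_length(list_t *list) {
    return atomic_load(&list->count);
}
```

Under heavy contention many threads fail the compare-exchange and retry, which is one plausible reason a retry-based scheme would lose any advantage as contention rises.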
Date of Conference: 19-23 April 2010