Remote atomic memory operations are critical for achieving high-performance synchronization in tightly coupled systems. Previous work on implementing atomic memory operations on high-performance networks has focused on providing the primitives needed for low latency and low host-processor overhead. In this paper, we explore the implementation of atomic memory operations with a focus on achieving a high message rate. We believe that message rate is a key performance characteristic that will determine whether a high-performance network can support future multi-petascale systems, especially those that expect to use a partitioned global address space (PGAS) programming model. For example, several researchers have proposed using atomic operations on the network interface to improve performance on the HPCC RandomAccess benchmark. This paper examines several issues relevant to the design of an atomic unit on the network interface, including how the size and associativity of its cache affect performance. Because the bandwidth of modern host interfaces is growing relative to their latency, we also examine the interactions that determine how much concurrency is needed to saturate the interface.
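The link between host-interface bandwidth, latency, and required concurrency can be illustrated with Little's law: the number of requests that must be in flight equals the request throughput multiplied by the round-trip latency. The sketch below is a minimal, hypothetical illustration of that relationship, not the authors' model; the function name `required_outstanding_requests` and the example figures (16 GB/s, 500 ns, 64-byte requests) are assumptions chosen only to make the arithmetic concrete.

```python
def required_outstanding_requests(bandwidth_bytes_per_s: int,
                                  latency_ns: int,
                                  request_bytes: int) -> int:
    """Little's law sketch: in-flight requests = throughput x latency.

    Throughput in requests/s is bandwidth / request size; latency is
    converted from nanoseconds to seconds. Integer ceiling division is
    used so that any fractional request rounds up to a whole request.
    (Hypothetical helper for illustration only.)
    """
    numerator = bandwidth_bytes_per_s * latency_ns
    denominator = request_bytes * 10**9  # ns -> s conversion folded in
    return -(-numerator // denominator)  # ceiling division


# Hypothetical host interface: 16 GB/s, 500 ns round trip, 64-byte requests.
print(required_outstanding_requests(16 * 10**9, 500, 64))  # 125
# Doubling bandwidth at the same latency doubles the needed concurrency.
print(required_outstanding_requests(32 * 10**9, 500, 64))  # 250
```

Under these assumed figures, an atomic unit that keeps fewer than 125 requests outstanding cannot saturate the interface, which is why growth in bandwidth relative to latency raises the concurrency the network interface must sustain.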