Skip to Main Content
A significant component of a high-performance cluster is the compute node interconnect. InfiniBand, is an interconnect of such systems that is enjoying wide success due to low latency (1.0-3.0 musec) and high bandwidth and other features. The Message Passing Interface (MPI) is the dominant programming model for parallel scientific applications. As a result, the MPI library and interconnect play a significant role in the scalability. These clusters continue to scale to ever-increasing levels making the role very important. As an example, the ldquoRangerrdquo system at the Texas Advanced Computing Center (TACC) includes over 60,000 cores with nearly 4000 InfiniBand ports. Previous work has shown that memory usage simply for connections when using the Reliable Connection (RC) transport of InfiniBand can reach hundreds of megabytes of memory per process at that level. To address these scalability problems a new InfiniBand transport, eXtended Reliable Connection, has been introduced. In this paper we describe XRC and design MPI over this new transport. We describe the variety of design choices that must be made as well as the various optimizations that XRC allows. We implement our designs and evaluate it on an InfiniBand cluster against RC-based designs. The memory scalability in terms of both connection memory and memory efficiency for communication buffers is evaluated for all of the configurations. Connection memory scalability evaluation shows a potential 100 times improvement over a similarly configured RC-based design. Evaluation using NAMD shows a 10% performance improvement for our XRC-based prototype for the jac2000 benchmark.