As systems grow larger and computation is further spread across nodes, efficient data communication is becoming increasingly important to achieve high throughput and low power consumption for high performance computing systems. However, communication efficacy not only depends on application-specific communication patterns, but also on machine-specific communication subsystems, node architectures, and even the runtime communication libraries. In fact, different hardware systems lead to different tradeoffs with respect to communication mechanisms, which can impact the choice of application implementations. We present a set of MPI-based benchmarks to better understand the communication behavior of the hardware systems and guide the performance tuning of scientific applications. We further apply these benchmarks to three clusters and present several interesting lessons from our experience.
Published in:
Parallel Processing Workshops (ICPPW), 2012 41st International Conference on
Date of Conference: 10-13 Sept. 2012