Implementing and optimizing collective communication operations is an active field of research. Such operations directly influence application performance and must map communication requirements as efficiently as possible onto steadily changing network architectures. In this work, we define an abstract domain-specific language for expressing arbitrary group communication operations. We show that this language is universal and that all existing collective operations can be implemented with it. By design, it lends itself to blocking and nonblocking execution, as well as to offloaded execution of complex group communication operations. We also define several offline and online optimizations (compiler transformations and scheduling decisions, respectively) that improve the overall performance of an operation. Performance results show that the overhead of expressing current collective operations in this language is negligible compared with the potential gains of a highly optimized implementation.
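To make the idea of expressing a collective as data concrete, the following Python sketch describes a broadcast as a per-rank schedule of point-to-point operations with explicit dependencies, and then runs a toy in-memory executor over it. It is only an illustration under stated assumptions: the `Op` record and the functions `binomial_bcast_schedule` and `simulate` are hypothetical names, not the language defined in this work, and the binomial tree is a standard broadcast pattern chosen as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One step in a rank's schedule (hypothetical representation)."""
    kind: str          # "send" or "recv"
    peer: int          # partner rank
    deps: tuple = ()   # indices of earlier ops on this rank that must finish first


def binomial_bcast_schedule(rank, size, root=0):
    """Build the schedule of one rank for a binomial-tree broadcast."""
    vrank = (rank - root) % size  # rank relative to the root
    ops = []
    deps = ()
    # Receive phase: every non-root rank receives once from its parent.
    mask = 1
    while mask < size:
        if vrank & mask:
            parent = (vrank - mask + root) % size
            ops.append(Op("recv", parent))
            deps = (0,)  # later sends must wait for this receive
            break
        mask <<= 1
    # Send phase: forward the data to the children in the tree.
    mask >>= 1
    while mask > 0:
        if vrank + mask < size:
            child = (vrank + mask + root) % size
            ops.append(Op("send", child, deps))
        mask >>= 1
    return ops


def simulate(size, root=0):
    """Toy executor: run all schedules and report which ranks hold the data."""
    schedules = [binomial_bcast_schedule(r, size, root) for r in range(size)]
    done = [set() for _ in range(size)]
    has_data = [r == root for r in range(size)]
    in_flight = set()  # (source, destination) pairs of posted messages
    progress = True
    while progress:
        progress = False
        for r, ops in enumerate(schedules):
            for i, op in enumerate(ops):
                if i in done[r] or not all(d in done[r] for d in op.deps):
                    continue  # already executed, or dependencies not yet met
                if op.kind == "send":
                    in_flight.add((r, op.peer))
                elif (op.peer, r) in in_flight:
                    in_flight.remove((op.peer, r))
                    has_data[r] = True
                else:
                    continue  # matching message has not been posted yet
                done[r].add(i)
                progress = True
    return has_data


if __name__ == "__main__":
    for rank in range(8):
        print(rank, binomial_bcast_schedule(rank, 8))
    print(all(simulate(8)))
```

Because the schedule is plain data, the same description could in principle be executed in a blocking loop, progressed incrementally for nonblocking semantics, or handed to another executor; the toy `simulate` function above only checks that the dependencies are sufficient for every rank to receive the data.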