Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine

4 Author(s)
Hari Subramoni ; Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA ; Krishna Kandalla ; Sayantan Sur ; Dhabaleswar K. Panda

Collective communication operations provided by the Message Passing Interface (MPI) are heavily used by scientific applications at large scale. The current MPI standard, MPI-2.2, defines only blocking collective communication calls, which do not allow computation and communication to proceed simultaneously. It is expected that MPI-3 will allow for non-blocking collective communication. The newly introduced ConnectX-2 InfiniBand adapter from Mellanox features an offload mechanism that enables the Network Interface Card (NIC) to perform a series of communication and reduction operations without the involvement of the host processor. Current-generation MPI stacks implement each collective operation using point-to-point operations. To take advantage of the offload feature for all MPI collectives in a rapidly changing architectural environment, the collectives must be redesigned using flexible and generalized primitives. The primitives can then be used to compose various collective algorithms. On adapters that support offload, the primitives must provide increased overlap across varying collective group sizes and communication message sizes. In this paper, we take on the challenge of designing collective communication primitives with good overlap characteristics and evaluate their performance using the ConnectX-2 offload feature. We also show how collectives such as Barrier can be designed using our communication primitives. Our evaluation reveals that we can achieve near-perfect (94% to 100%) overlap of computation and communication by using our primitives. Additionally, we observe a performance improvement of up to 5% using the Recv-Replicate primitive for data transfer.
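To make the overlap percentages concrete, the sketch below shows one common way such a figure can be computed from three timings. The abstract does not state the formula the authors used, so the definition here is an assumption, and the function name `overlap_percent` and its parameters are hypothetical. It treats any time by which the overlapped run exceeds the pure computation time as communication that was not hidden.

```python
def overlap_percent(t_pure_comm: float, t_compute: float, t_total: float) -> float:
    """Estimate the share of communication time hidden behind computation.

    Assumed definition (not taken from the paper):
      t_pure_comm : time of the collective alone, with no computation
      t_compute   : time of the computation alone
      t_total     : time when the collective and the computation are overlapped
    """
    if t_pure_comm <= 0:
        raise ValueError("t_pure_comm must be positive")
    # Communication time that was not hidden by the computation.
    exposed = max(0.0, t_total - t_compute)
    hidden_fraction = 1.0 - exposed / t_pure_comm
    # Clamp to [0, 1] so measurement noise cannot yield values outside 0-100%.
    return 100.0 * min(1.0, max(0.0, hidden_fraction))
```

Under this assumed definition, a collective that takes 10 time units on its own and adds only 0.6 units to a 100-unit computation corresponds to roughly 94% overlap, while one that adds the full 10 units corresponds to 0%.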

Published in:

2010 18th IEEE Symposium on High Performance Interconnects

Date of Conference:

18-20 Aug. 2010