This paper presents a new approach to implementing global reduction operations in wormhole-routed k-ary n-cubes. The novelty lies in using a multidestination message-passing mechanism instead of single-destination (unicast) messages. Using pairwise exchange worms along each dimension, it is shown that complete global reduction and barrier synchronization operations, as defined by the Message Passing Interface (MPI) standard, can be implemented with n communication start-ups, compared with the 2n⌈log₂ k⌉ start-ups required with unicast-based message passing. Analytical results for different values of communication start-up time, system size, and data size are presented and compared with the unicast-based scheme. The analysis indicates that the proposed framework can be used effectively in wormhole-routed systems to achieve fast global reduction without a separate control network.
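To make the start-up comparison concrete, the short sketch below evaluates the two counts quoted in the abstract: n for the multidestination scheme and 2n⌈log₂ k⌉ for the unicast-based scheme. The function names are hypothetical and chosen only for illustration. The sketch counts start-ups only; it does not attempt to model the paper's full latency analysis, which also depends on start-up time and data size.

```python
def ceil_log2(k: int) -> int:
    """Return ceil(log2(k)) for an integer k >= 1 using exact integer arithmetic."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k - 1).bit_length()


def unicast_startups(k: int, n: int) -> int:
    """Start-ups for the unicast-based scheme, 2n * ceil(log2 k), as quoted in the abstract."""
    return 2 * n * ceil_log2(k)


def multidestination_startups(n: int) -> int:
    """Start-ups for the multidestination scheme: n, one per dimension, as quoted in the abstract."""
    return n


if __name__ == "__main__":
    # Illustrative (k, n) pairs; these system sizes are assumptions, not taken from the paper.
    for k, n in [(4, 2), (16, 2), (8, 3)]:
        print(f"k={k}, n={n}: unicast={unicast_startups(k, n)}, "
              f"multidestination={multidestination_startups(n)}")
```

For a 16-ary 2-cube (256 nodes), for instance, these formulas give 16 start-ups for the unicast-based scheme against 2 for the multidestination scheme.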
Published in: Proceedings of the 9th International Parallel Processing Symposium, 1995
Date of Conference: 25-28 Apr 1995