Hardware implementation of MPI_Barrier on an FPGA cluster

3 Author(s)
Shanyuan Gao (Electr. & Comput. Eng. Dept., Univ. of North Carolina at Charlotte, Charlotte, NC, USA); Schmidt, A.G.; Sass, R.

Message passing is the dominant programming model for distributed-memory parallel computers, and the Message-Passing Interface (MPI) is the standard. Along with point-to-point send and receive message primitives, MPI includes a set of collective communication operations that are used to synchronize and coordinate groups of tasks. MPI_Barrier, one of the most important collective procedures, has been extensively studied on a variety of architectures over the last twenty years. However, a cluster of Platform FPGAs is a new architecture and offers interesting, resource-efficient options for implementing the barrier operation. This paper describes an FPGA implementation of MPI_Barrier. The premise is that barriers (and other collective communication operations) are very sensitive to latency as the number of nodes scales to the tens of thousands. The relatively slow processors found on FPGAs will significantly cap performance. The FPGA hardware design implements a tree-based algorithm and is tightly integrated with the custom high-speed on-chip/off-chip network. MPI access is available through a specially designed kernel module. This effectively offloads the work from the CPU and OS into hardware. The evaluation of this design shows significant performance gains compared with a conventional software implementation on both an FPGA cluster and a commodity cluster. Further, it suggests that moving other MPI collective operations into hardware would be beneficial.
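
To illustrate why a tree-based barrier is latency-sensitive, the following is a minimal software simulation, not the paper's hardware design. It assumes a heap-style tree layout (node 0 is the root), a uniform per-hop latency, and a configurable fan-out; the function name `tree_barrier_release_times` and its parameters are hypothetical and chosen only for this sketch. Arrival notifications travel up the tree to the root, which then sends a release back down.

```python
def tree_barrier_release_times(arrival, hop_latency=1.0, fanout=2):
    """Simulate a tree-based barrier and return each node's release time.

    arrival[i] is the time at which node i enters the barrier.
    Nodes are laid out like a heap: the parent of node i is (i - 1) // fanout.
    All of this is an illustrative assumption, not the paper's design.
    """
    n = len(arrival)

    # ready[i]: time at which node i knows its entire subtree has arrived.
    ready = list(arrival)

    # Gather phase: children have larger indices than their parents, so
    # walking from the last node to the first finalizes every child
    # before its value is propagated to the parent.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // fanout
        ready[parent] = max(ready[parent], ready[i] + hop_latency)

    # Release phase: the root releases once its subtree (all nodes) is
    # ready, and the release message takes one hop per tree level.
    release = [0.0] * n
    release[0] = ready[0]
    for i in range(1, n):
        parent = (i - 1) // fanout
        release[i] = release[parent] + hop_latency

    return release


if __name__ == "__main__":
    # Seven nodes in a binary tree, all arriving at time 0.
    print(tree_barrier_release_times([0] * 7))
```

In this model the last node is released after roughly twice the tree depth in hops, so the per-hop latency, rather than bandwidth, dominates barrier cost as the node count grows. This is consistent with the abstract's premise, under the stated assumptions.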

Published in:

Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on

Date of Conference:

Aug. 31 2009-Sept. 2 2009
