Optimizing Process-to-Core Mappings for Two Dimensional Broadcast/Reduce on Multicore Architectures

5 Author(s)
Karlsson, C. ; Dept. of Math. & Comput. Sci., Colorado Sch. of Mines, Golden, CO, USA ; Davies, T. ; Chong Ding ; Hui Liu

In today's high-performance computing, many MPI programs (e.g., ScaLAPACK applications, the High Performance Linpack benchmark (HPL), and many PDE solvers based on domain decomposition methods) organize their computational processes as multidimensional process grids. Communication is often necessary in each dimension. Multidimensional broadcast, in which a broadcast is performed in each dimension, is one of many operations in applications that use such grids. In this paper, we study the impact of the MPI process-to-core mapping on the performance of multidimensional broadcast operations. We show that the default process-to-core mappings in today's state-of-the-art MPI implementations are often sub-optimal for multidimensional broadcast. We propose an application-level, multicore-aware process-to-core re-mapping scheme that is capable of achieving optimal performance for multidimensional broadcast operations. The proposed scheme improves the performance of multidimensional broadcast operations by up to 64% over the default mapping scheme on Kraken, the world's current eighth-fastest supercomputer, at the Oak Ridge National Laboratory.
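
To make the mapping question concrete, the following is a minimal sketch, not the authors' scheme. It assumes a row-major 2D process grid and compares two hypothetical placements: a block placement in which consecutive ranks fill one node before moving to the next (assumed here as a stand-in for a default mapping), and a tiled placement in which each node hosts a small sub-block of the grid. For each placement it counts how many distinct nodes a row broadcast and a column broadcast must reach, which is a rough proxy for inter-node traffic. The function names, the 8 x 8 grid, and the 4 cores per node are all illustrative assumptions.

```python
def block_mapping(p, q, cores_per_node):
    """Assumed default: consecutive row-major ranks fill a node before the next."""
    return {(r // q, r % q): r // cores_per_node for r in range(p * q)}


def tiled_mapping(p, q, tile_rows, tile_cols):
    """Hypothetical re-mapping: each node hosts a tile_rows x tile_cols sub-grid."""
    tiles_per_row = q // tile_cols
    return {
        (i, j): (i // tile_rows) * tiles_per_row + (j // tile_cols)
        for i in range(p)
        for j in range(q)
    }


def max_nodes_per_broadcast(mapping, p, q):
    """Return (row, column): the most distinct nodes touched by any row
    broadcast and by any column broadcast under the given mapping."""
    row_nodes = [len({mapping[(i, j)] for j in range(q)}) for i in range(p)]
    col_nodes = [len({mapping[(i, j)] for i in range(p)}) for j in range(q)]
    return max(row_nodes), max(col_nodes)


if __name__ == "__main__":
    p = q = 8  # illustrative grid size
    print("block:", max_nodes_per_broadcast(block_mapping(p, q, 4), p, q))
    print("tiled:", max_nodes_per_broadcast(tiled_mapping(p, q, 2, 2), p, q))
```

With these illustrative parameters, the block placement keeps each row broadcast within 2 nodes but spreads each column broadcast over 8, while the 2 x 2 tiled placement touches 4 nodes in either dimension. Whether a balanced placement is actually faster depends on the machine and the MPI implementation; the sketch only shows why the mapping can matter when broadcasts run in both dimensions.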

Published in:

Parallel Processing (ICPP), 2011 International Conference on

Date of Conference:

13-16 Sept. 2011