A Systemic Strategy for Tuning Intra-node Collective Communication on Multicore Systems

5 Author(s)
Zhiqiang Liu ; Coll. of Comput., Nat. Univ. of Defense Technol., Changsha, China ; Junqiang Song ; Kaijun Ren ; Fen Xu

In the HPC domain, most applications are built on MPI and use collective operations in their communication kernels, so improving the performance of collectives has long been a focus of research. In recent work on optimizing collectives for multi-core clusters, hierarchical algorithm designs stand out. Such algorithms can greatly reduce inter-node traffic, but they increase the intra-node traffic load at the same time. Moreover, as the number of cores per node keeps growing, the intra-node part of a hierarchical collective takes a growing share of the total time. Improving the performance of intra-node collectives is therefore critical to overall performance. On multi-core systems, however, process affinity greatly affects the performance of an intra-node collective, which makes improving the overall performance of intra-node collectives challenging. To address this problem, we propose a novel and systemic strategy for tuning the performance of intra-node collectives. As illustrative examples, we implemented our strategy on a dual-socket Intel Clovertown platform and tuned Broadcast and Allgather, improving their performance by up to 14% and 52%, respectively.
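
The trade-off between inter-node and intra-node traffic described above can be illustrated with a toy message-count model. This is only a sketch under stated assumptions, not the algorithm or tuning strategy from the paper: the function names are hypothetical, ranks are assumed to be placed on nodes in contiguous blocks, the flat broadcast is assumed to use a doubling tree rooted at rank 0 (the set of ranks holding the data doubles each round), and only messages are counted, not bytes or time.

```python
def flat_doubling_bcast(num_nodes, cores_per_node):
    """Count (inter-node, intra-node) messages for a flat broadcast.

    Assumed model: ranks 0..have-1 hold the data and each sends to
    rank + have, so the holder set doubles every round.
    """
    total = num_nodes * cores_per_node
    inter = intra = 0
    have = 1  # number of ranks that already hold the data
    while have < total:
        for src in range(have):
            dst = src + have
            if dst >= total:
                break
            # Block placement (assumption): rank r runs on node r // cores_per_node.
            if src // cores_per_node == dst // cores_per_node:
                intra += 1
            else:
                inter += 1
        have *= 2
    return inter, intra


def hierarchical_bcast(num_nodes, cores_per_node):
    """Count (inter-node, intra-node) messages for a two-level broadcast.

    Phase 1: one leader per node receives the data over the network.
    Phase 2: each leader broadcasts to the other cores on its node.
    """
    inter = num_nodes - 1
    intra = num_nodes * (cores_per_node - 1)
    return inter, intra


if __name__ == "__main__":
    for name, fn in (("flat", flat_doubling_bcast),
                     ("hierarchical", hierarchical_bcast)):
        inter, intra = fn(4, 8)
        print(f"{name}: inter-node={inter}, intra-node={intra}")
```

For 4 nodes with 8 cores each, this model gives 24 inter-node and 7 intra-node messages for the flat tree, against 3 and 28 for the two-level scheme. Both send 31 messages in total; the hierarchical design moves the work onto the node, which is why intra-node performance matters more as core counts grow. Other tree shapes or rank placements would give different flat counts.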

Published in:

2009 Fourth International Conference on Frontier of Computer Science and Technology

Date of Conference:

17-19 Dec. 2009