Optimizing strided remote memory access operations on the Quadrics QsNetII network interconnect

Authors:

J. Nieplocha (Pacific Northwest Nat. Lab., Richland, WA, USA); V. Tipparaju; M. Krishnan

This paper describes and evaluates protocols for optimizing strided noncontiguous communication on the Quadrics QsNetII high-performance network interconnect. Most previous related studies focused primarily on either NIC-based or host-based protocols. This paper discusses the merits of using both approaches and tries to determine the operation types and data sizes for which each protocol should be used. We focus on the Quadrics QsNetII network, which offers powerful communication processors on the network interface card (NIC) and practical, flexible opportunities for exploiting them in the context of the user. Furthermore, the paper focuses on noncontiguous remote memory access (RMA) data transfers and performs the evaluation in the context of standalone communication and application microbenchmarks. In comparison to the vendor-provided noncontiguous interfaces, the proposed approach achieved significant performance improvements in microbenchmarks as well as in application kernels: dense matrix multiplication and the Co-Array Fortran version of the NAS BT parallel benchmark. For example, for 64 processes, a 54% improvement in overall communication time for NAS BT Class B and a 42% improvement in matrix multiplication were achieved.
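To make the term "strided noncontiguous transfer" concrete, the following minimal sketch copies a rectangular patch of a row-major matrix in two ways: one contiguous transfer per row, or one transfer of a packed buffer that is unpacked at the target. It is an illustration only and not the paper's protocols, which the abstract does not detail. The function names and parameters are hypothetical, no Quadrics API is used, and the "remote" memory is simulated with a local `bytearray`.

```python
def put_per_chunk(src, dst, src_off, src_stride, dst_off, dst_stride, chunk, count):
    """Hypothetical helper: issue one contiguous transfer per chunk.

    Returns the number of simulated transfers (one per chunk).
    """
    messages = 0
    for i in range(count):
        s = src_off + i * src_stride   # start of the i-th source chunk
        d = dst_off + i * dst_stride   # start of the i-th target chunk
        dst[d:d + chunk] = src[s:s + chunk]
        messages += 1
    return messages


def put_packed(src, dst, src_off, src_stride, dst_off, dst_stride, chunk, count):
    """Hypothetical helper: pack all chunks, send once, unpack at the target.

    Returns the number of simulated transfers (always one).
    """
    # Pack: gather the strided chunks into one contiguous buffer.
    packed = bytearray()
    for i in range(count):
        s = src_off + i * src_stride
        packed += src[s:s + chunk]

    # "Send": a single contiguous transfer of the packed buffer (simulated).
    wire = bytes(packed)
    messages = 1

    # Unpack: scatter the chunks into their strided target locations.
    for i in range(count):
        d = dst_off + i * dst_stride
        dst[d:d + chunk] = wire[i * chunk:(i + 1) * chunk]
    return messages
```

For an 8x8 byte matrix, a patch of 3 rows by 4 columns starting at row 2, column 1 corresponds to `src_off=17`, `src_stride=8`, `chunk=4`, `count=3`. Both helpers produce the same target contents, but the first issues three transfers and the second issues one at the cost of extra copies. Which side of that trade-off wins for a given chunk size and count is the kind of question the paper evaluates.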

Published in:

Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)

Date of Conference:

1-1 July 2005