Portable petaFLOP/s programming: applying distributed computing methodology to the grid within a single machine room

2 Author(s)
P. R. Woodward ; Lab. of Comput. Sci. & Eng., Minnesota Univ., Minneapolis, MN, USA ; S. E. Anderson

According to today's best projections, petaFLOP/s computing platforms will combine deep memory hierarchies in both latency and bandwidth with a need for many-thousand-fold parallelism. Unless effective parallel programs are prepared in advance, much of the promise of the first year or two of operation for these systems may be lost. We introduce a candidate for a portable petaFLOP/s programming model that can enable these important early application programs to be developed while, at the same time, permitting these same applications to run efficiently on the most capable computing systems now available. An MPI-based model is portable, but its programming paradigm ignores the potential benefits of hardware support for shared memory within each network node. A threads-based model cannot directly cope with the distributed nature of the memory over the network. Therefore, a new, portable programming model is needed. The shared memory programming model dramatically simplifies the expression of dynamic load balancing strategies for irregular algorithms. The main strategy is a transparent self-scheduled task list, whose tasks are performed in parallel so long as specified data-dependent conditions are met. The model platform is a cluster of multiprocessor distributed shared memory machines with network-attached disks. Our experimental run-time system allows the programmer to view this computing platform as a single machine with a four-stage memory hierarchy consisting of coherent processor cache, non-coherent local shared memory, global shared memory, and a global disk file system.
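
The self-scheduled task list mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' run-time system: it uses Python threads inside a single process, does not model the memory hierarchy or the network, and the names `run_self_scheduled`, `ready`, and `demo` are hypothetical. It assumes the data-dependent conditions form an acyclic dependency structure, so that some pending task always becomes runnable. Each worker claims the next pending task whose condition holds, which gives dynamic load balancing without a fixed assignment of tasks to workers.

```python
import threading


def run_self_scheduled(tasks, ready, num_workers=4):
    """Run every task exactly once using self-scheduling workers.

    tasks: list of zero-argument callables.
    ready: function (index, done_set) -> bool; the data-dependent
           condition that must hold before task `index` may start.
    Returns the list of task indices in completion order.
    """
    cond = threading.Condition()
    pending = set(range(len(tasks)))   # tasks not yet claimed
    done = set()                       # tasks that have finished
    order = []                         # completion order, for inspection

    def worker():
        while True:
            with cond:
                while True:
                    if not pending:
                        return         # nothing left to claim
                    # Claim the lowest-numbered pending task whose condition holds.
                    claim = next(
                        (i for i in sorted(pending) if ready(i, done)), None
                    )
                    if claim is not None:
                        pending.remove(claim)
                        break
                    cond.wait()        # wait until some task completes
            tasks[claim]()             # run the task outside the lock
            with cond:
                done.add(claim)
                order.append(claim)
                cond.notify_all()      # conditions may now hold for others

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


def demo():
    """Four partial sums, then a combining task that depends on all four."""
    data = list(range(100))
    partial = [0] * 4
    total = []

    def make_partial(k):
        def task():
            partial[k] = sum(data[k * 25:(k + 1) * 25])
        return task

    tasks = [make_partial(k) for k in range(4)]
    tasks.append(lambda: total.append(sum(partial)))

    def ready(i, done):
        # Tasks 0..3 are always runnable; task 4 needs all of them finished.
        return i < 4 or done >= {0, 1, 2, 3}

    order = run_self_scheduled(tasks, ready, num_workers=3)
    return total[0], order
```

In this sketch the condition is evaluated while holding the lock, so a worker never observes a half-updated set of finished tasks. A worker that finds no runnable task sleeps until another task completes, rather than spinning.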

Published in:

High Performance Distributed Computing, 1999. Proceedings. The Eighth International Symposium on

Date of Conference: