TIDeFlow: A Parallel Execution Model for High Performance Computing Programs

1 Author(s)
Orozco, D. ; Univ. of Delaware, Newark, DE, USA

Summary form only given. The popularity of serial execution paradigms in the High Performance Computing (HPC) field greatly hinders the ability of computational scientists to develop and support massively parallel programs. Programmers are left with languages that are inadequate to express parallel constructs, and are forced to make decisions that are not directly related to the programs they write. Computer architects are forced to support sequential memory semantics only because serial languages require them, and operating system designers are forced to support slow synchronization operations. This poster addresses the development and execution of HPC programs on many-core architectures by introducing the Time Iterated Dependency Flow (TIDeFlow) execution model. In TIDeFlow, programmers specify the precedence relations between computations without dealing with implementation details related to synchronization or scheduling. TIDeFlow is a graph-based model inspired by dataflow: computations in a program are expressed as actors whose dependencies are represented by arcs. TIDeFlow departs from other dataflow models (1) in that actors represent parallel loops as the basic building block of HPC programs, (2) in that arcs between actors represent dependencies of any kind (data, control or other) and (3) in that arc weights allow delaying tokens to provide support for pipelining. TIDeFlow is related to other dataflow models (an excellent survey can be found in [1]) in the idea of grouping several computations into a single actor, as in Macro Dataflow [2], and in allowing multiple, concurrent executions of the same actor through coloring of tokens, as in Dynamic Dataflow [3]. The resulting TIDeFlow model expresses HPC programs as directed graphs with weighted nodes and weighted arcs that represent parallel loops and loop-carried dependencies, respectively. The model is useful to support task pipelining, task migration, and distributed control.
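The graph model described above can be illustrated with a small token-firing sketch. This is not the TIDeFlow runtime or its API: the names `Arc` and `simulate`, the reading of a node weight as a loop iteration count, the reading of an arc weight as a number of initial tokens, and the synchronous step-by-step firing rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """A dependency between two actors (hypothetical representation)."""
    src: str
    dst: str
    tokens: int = 0  # assumed meaning of the arc weight: initial tokens


def simulate(actors, arcs, rounds):
    """Fire enabled actors step by step and return a firing log.

    actors: dict mapping actor name -> node weight (assumed here to be the
            number of iterations of the parallel loop the actor represents).
    arcs:   list of Arc objects; their token counts are updated in place.
    rounds: how many times each actor may fire (the time iterations).

    Assumed firing rule: an actor is enabled when every incoming arc holds
    at least one token. Firing consumes one token per incoming arc and
    produces one token per outgoing arc.
    """
    fired = {name: 0 for name in actors}
    log = []
    step = 0
    while True:
        # Actors enabled at the start of this step fire together.
        ready = [name for name in actors
                 if fired[name] < rounds
                 and all(a.tokens > 0 for a in arcs if a.dst == name)]
        if not ready:
            break
        for name in ready:  # consume input tokens
            for a in arcs:
                if a.dst == name:
                    a.tokens -= 1
        for name in ready:
            # A real runtime would dispatch actors[name] loop iterations
            # to the available cores here; this sketch only records them.
            fired[name] += 1
            log.append((step, name, actors[name]))
        for name in ready:  # produce output tokens
            for a in arcs:
                if a.src == name:
                    a.tokens += 1
        step += 1
    return log


if __name__ == "__main__":
    # Two parallel loops A -> B; the back arc B -> A limits how far A
    # may run ahead of B.
    loops = {"A": 4, "B": 4}
    for back_tokens in (1, 2):
        graph = [Arc("A", "B", 0), Arc("B", "A", back_tokens)]
        print(back_tokens, simulate(loops, graph, rounds=3))
```

Under these assumptions, one initial token on the back arc forces A and B to alternate, while two initial tokens let A start its next time iteration while B is still working on the previous one, which is the kind of pipelining the arc weights are said to support.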
An implementation of TIDeFlow was developed for Cyclops-64 [4], a 160-core architecture by IBM. The implementation resulted in new, highly concurrent algorithms, such as the HT-Queue [4], and in the development of efficient representations of runtime system primitives such as polytasks [5]. The implementation is supported by a number of software tools including a graph programming model, a parallel intermediate representation form and a fully distributed runtime system. The effectiveness of TIDeFlow was tested using several HPC programs, including FDTD in 1 and 2 dimensions [6], Matrix Multiply, and FFT. In all cases, the programs were run on Cyclops-64, allowing studies of scalability, performance, parallelism and overhead. The results of the experiments, presented in [4] and [5], show that TIDeFlow can efficiently support very fine-grained execution due to its very low overhead and its distributed nature. The experiments also show excellent scalability, with close-to-linear scaling for 156 processors executing matrix multiply. The experiments also showed advantages for development: expressing dependencies using a graph was found to be easier than placing hand-coded synchronization constructs inside programs. The performance of the TIDeFlow runtime system was carefully measured, showing that it uses very few clock cycles to create, schedule and terminate tasks. The runtime system is fully distributed and lock-free, making runtime operations insensitive to the load of the system. This poster introduces TIDeFlow by presenting (1) its graph programming model (weighted nodes and weighted arcs), (2) a description of composability in TIDeFlow, which allows the use of programs to build larger programs, (3) a brief description of the TIDeFlow runtime system and (4) a summary of the excellent results obtained, both in scalability and overhead, for FDTD in 1 and 2 dimensions, Matrix Multiply, and FFT.
This work contributes to the state of the art by: (1) Presenting a new execution model,

Published in:

Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on

Date of Conference:

10-14 Oct. 2011