Parallel and Distributed Systems, IEEE Transactions on

Issue 11 • Date Nov 2000

  • A unified symbolic evaluation framework for parallelizing compilers

    Page(s): 1105 - 1125

    The quality of many optimizations and analyses of parallelizing compilers depends significantly on the ability to evaluate symbolic expressions and on the amount of information available about program variables at arbitrary program points. In this paper, we describe an effective and unified symbolic evaluation framework that statically determines the values of variables and symbolic expressions, assumptions about and constraints between variable values, and the condition under which control flow reaches a program statement. We introduce the program context, a novel representation for comprehensive and compact control and data flow analysis information. Program contexts are described as first order logic formulas, which allows us to use public domain software for standard symbolic manipulation. Computations are represented as algebraic expressions defined over a program's problem size. Our symbolic evaluation techniques comprise accurate modeling of assignment and input/output statements, branches, loops, recurrences, arrays, and procedures. All of our techniques target both linear and nonlinear expressions and constraints. Efficiency of symbolic evaluation is greatly improved by aggressive simplification techniques. A variety of examples, including program verification, dependence analysis, array privatization, communication vectorization, and elimination of redundant communication, are used to illustrate the effectiveness of our approach. We present results from a preliminary implementation of our framework, which is used as part of a parallelizing compiler that demonstrates the potential performance gains achievable by employing symbolic evaluation to support program parallelization.

  • Incremental design of scalable interconnection networks using basic building blocks

    Page(s): 1126 - 1140

    In this paper, we present an incremental design of scalable interconnection networks in multicomputer systems using basic building blocks. Both network topologies and routing algorithms are considered. We use wormhole-routed small-scale 2D meshes as basic building blocks. The minimum requirement to expand these networks is a single building block. This implies that the network does not have to maintain the regular 2D mesh topology. Some new topologies are introduced: incomplete meshes based on those adaptive routing algorithms designed from the turn model and extended incomplete meshes based on XY routing. We show that the original routing algorithm can be adopted to send a message between any source and destination without using store-and-forward or causing deadlock. The way that the network is constructed incrementally requires little or no rewiring and maintains a high bisection density and a short network diameter. The design methods can be used to economically and incrementally build expandable and scalable parallel computers.
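
    As background for the XY routing mentioned above, the following is a minimal sketch of plain dimension-order (XY) routing on a complete 2D mesh. It is not the paper's algorithm for incomplete meshes; the function name `xy_route` and the (x, y) coordinate convention are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst on a complete 2D mesh.

    Illustrative assumption: nodes are (x, y) tuples and every mesh
    position exists. The message first travels along X until the column
    matches, then along Y until the row matches.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

    Because a message never turns from the Y dimension back to the X dimension, the cyclic channel dependencies that cause deadlock cannot form on a complete mesh; the abstract's claim concerns preserving this property when the mesh is incomplete.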

  • Data locality exploitation in the decomposition of regular domain problems

    Page(s): 1141 - 1150

    The aim of this paper is to study the effect of local memory hierarchy and communication network exploitation on message sending and the influence of this effect on the decomposition of regular applications. In particular, we have considered two different parallel computers, a Cray T3E-900 and an SGI Origin 2000. In both systems, the bandwidth reduction due to non-unit-stride memory access is quite significant and could be more important than the reduction due to contention in the network. These conclusions affect the choice of optimal decompositions for regular domain problems. Thus, although traditional 3D decompositions lead to lower inherent communication-to-computation ratios and could exploit the interconnection network more efficiently, lower dimensional decompositions are found to be more efficient due to the data decomposition effects on the spatial locality of the messages to be communicated. This increasing importance of local optimisations has also been shown using a well-known communication-computation overlapping technique, which increases execution time instead of reducing it, as one would expect, due to poor cache memory exploitation.

  • Performance metrics for embedded parallel pipelines

    Page(s): 1164 - 1185

    A statistical approach to performance prediction is applied to a system development methodology for pipelines comprised of independent parallel stages. The methodology is aimed at distributed memory machines employing medium-grained parallelization. The target applications are continuous-flow embedded systems. The use of order statistics on this type of system is compared to previous practical usage, which appears largely confined to traditional Non-Uniform Memory Access (NUMA) machines for loop parallelization. A range of suitable performance metrics which give upper bounds or estimates for task durations are discussed. The metrics have a practical role when included in prediction equations in checking fidelity to an application performance specification. An empirical study applies the mathematical findings to the performance of a multicomputer for a synchronous pipeline stage. The results of a simulation are given for larger numbers of processors. In a further simulation, the results are extended to take account of waiting-time distributions while data are buffered between stages of an asynchronous pipeline. Order statistics are also employed to estimate the degradation due to an output ordering constraint. Practical illustrations in the image communication and vision application domains are included.
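
    To illustrate the kind of estimate order statistics can provide, the sketch below computes the expected completion time of a synchronous stage whose p parallel tasks have independent, exponentially distributed durations. The exponential assumption is ours, chosen only because it has a simple closed form (the mean times the p-th harmonic number); the abstract does not say which distributions or metrics the paper uses, and the function names are hypothetical.

```python
def harmonic(p):
    """Return the p-th harmonic number H_p = 1 + 1/2 + ... + 1/p."""
    return sum(1.0 / k for k in range(1, p + 1))


def expected_stage_time(p, mean_task_time):
    """Expected maximum of p i.i.d. exponential task durations.

    Assumption (not from the paper): task times are exponential with the
    given mean. The stage finishes when its slowest task finishes, and the
    expected value of that largest order statistic is mean * H_p.
    """
    return mean_task_time * harmonic(p)


for p in (1, 4, 16, 64):
    print(p, expected_stage_time(p, mean_task_time=1.0))
```

    Under this assumption the expected stage time grows roughly logarithmically with the number of processors, which is one reason the slowest task, rather than the average task, governs the throughput of a synchronous stage.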

  • Dynamic task scheduling using online optimization

    Page(s): 1151 - 1163

    Algorithms for scheduling independent tasks onto the processors of a multiprocessor system must trade off processor load balance, memory locality, and scheduling overhead. Most existing algorithms, however, do not adequately balance these conflicting factors. This paper introduces the self-adjusting dynamic scheduling (SADS) class of algorithms that use a unified cost model to explicitly account for these factors at runtime. A dedicated processor performs scheduling in phases by maintaining a tree of partial schedules and incrementally assigning tasks to the least-cost schedule. A scheduling phase terminates whenever any processor becomes idle, at which time partial schedules are distributed to the processors. An extension of the basic SADS algorithm, called DBSADS, controls the scheduling overhead by giving higher priority to partial schedules with more task-to-processor assignments. These algorithms are compared to two distributed scheduling algorithms within a database application on an Intel Paragon distributed memory multiprocessor system.

  • A residue number system on reconfigurable mesh with applications to prefix sums and approximate string matching

    Page(s): 1186 - 1199

    Several new number representations based on a residue number system are presented which use the smallest prime numbers as moduli and are suited for parallel computations on a reconfigurable mesh architecture. The bit model of linear reconfigurable mesh with exclusive write and unit-time delay for broadcasting on a subbus is assumed. It is shown how to convert in O(1) time any integer, ranging between 0 and n-1, from any commonly used representation to any new representation proposed in this paper (and vice versa) using an n × O(log^2 n / log log n) reconfigurable mesh. In particular, some of the previously known conversion techniques are improved. Moreover, as a byproduct, it is shown how to compute in O(1) time the Prefix Sums of n bits by a reconfigurable mesh having the above-mentioned size, thus improving previously known results. Applications to the Prefix Sums of n h-bit integers and to Approximate String Matching with α mismatches are also considered. The Summation and the Prefix Sums can be computed in O(1) time using O(h log N + log^2 N / log log N) × Nh and O(h^2 + log^2 N / log(h + log N)) × O(N(h + log N)) reconfigurable meshes, respectively. Moreover, it is shown for the first time how to find in O(1) time all the occurrences of a pattern of length m in a text of length n, allowing fewer than α mismatches, using a reconfigurable mesh of size O(m log|Σ|) × O(n(log|Σ| + log^2 α / log log α)), where the pattern and the text are strings over a finite alphabet Σ and α
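
    The basic idea of a residue number system with the smallest primes as moduli can be sketched sequentially as follows. This is only an illustration of the representation itself, using the textbook Chinese Remainder Theorem for the reverse conversion; it is not the paper's O(1)-time reconfigurable-mesh algorithm, and all function names are hypothetical. It assumes n >= 2 and Python 3.8 or later (for `math.prod` and modular inverse via `pow`).

```python
from math import prod


def smallest_prime_moduli(n):
    """Return the smallest primes 2, 3, 5, ... whose product is at least n."""
    primes, candidate, product = [], 2, 1
    while product < n:
        # candidate is prime if no smaller prime found so far divides it.
        if all(candidate % p for p in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes


def to_rns(x, moduli):
    """Represent x by its residues modulo each modulus."""
    return [x % m for m in moduli]


def from_rns(residues, moduli):
    """Recover x in [0, product of moduli) via the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        total += r * partial * pow(partial, -1, m)
    return total % big_m


moduli = smallest_prime_moduli(100)   # enough range for integers 0..99
residues = to_rns(53, moduli)
print(moduli, residues, from_rns(residues, moduli))
```

    Each residue is small and independent of the others, which is what makes the representation attractive for parallel hardware: operations on different moduli need no carries between them.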


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology