Parallel Processing Symposium, 1994. Proceedings., Eighth International

Date 26-29 April 1994

  • Proceedings of 8th International Parallel Processing Symposium

    Publication Year: 1994
  • HyperC: portable parallel programming in C

    Publication Year: 1994, Page(s):682 - 687
    Cited by:  Papers (1)

    We introduce the HyperC language, a data parallel extension of C intended for portability over a wide range of architectures. We present the main topics of the language: the explicit parallelism through the data, the synchronous semantics and the parallel flow control that allows asynchronous execution, new function qualifiers to emphasize locality properties code and, finally, new communication t...
  • Efficient barriers for distributed shared memory computers

    Publication Year: 1994, Page(s):604 - 608
    Cited by:  Papers (4)

    Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for non-uniform memory access (NUMA) architectures, but less work has been done for cache only memory access (COMA) or attraction memory architectures such as the KSR-1. We present two new barrier algorithms that offer the...
  • A clustered reduced communication element by element preconditioned conjugate gradient algorithm for finite element computations

    Publication Year: 1994, Page(s):509 - 516

    The clustered element by element preconditioned conjugate gradient (EBE-PCG) method can be effectively used to solve problems with symmetric positive definite matrices such as those arising in ANTARES-3D, a metal forming finite element (FE) simulation package. Efficient parallelization of this application on distributed memory multiple instruction multiple data (MIMD) parallel computers requires au...
  • Routing and sorting on meshes with row and column buses

    Publication Year: 1994, Page(s):411 - 417
    Cited by:  Papers (3)

    Gives improved deterministic algorithms for permutation routing and sorting on meshes with row and column buses. Among our results, we obtain a fairly simple algorithm for permutation routing on two-dimensional meshes with buses that achieves a running time of n+o(n) and a queue size of 2. We also describe an algorithm for routing on r-dimensional networks with a running time of (2-1/r)n+o(n) and ...
  • Bitonic sorting on 2D-PEC: an algorithmic study on a hierarchy of meshes network

    Publication Year: 1994, Page(s):418 - 423
    Cited by:  Papers (2)

    Packed exponential connections (PEC) is a new type of network that attempts to solve the scalability and connectivity problems of very large interconnection networks by augmenting a 2D mesh with a uniform distribution of longer connections. In order to gain insight into the use of a MIMD system that has an underlying PEC network, this paper presents the results of an investigation of bitonic sorti...
  • Integrating functional and imperative parallel programming: CC++ solutions to the Salishan problems

    Publication Year: 1994, Page(s):61 - 67

    We investigate the practical integration of functional and imperative parallel programming in the context of a popular sequential object-based language. As the basis of our investigation, we develop solutions to the Salishan problems, a set of problems intended as a standard by which to compare parallel programming notations. The language that we use is CC++, C++ extended with single-assignment va...
  • An optimal mesh computer algorithm for constrained Delaunay triangulation

    Publication Year: 1994, Page(s):102 - 109
    Cited by:  Papers (1)

    We present an optimal parallel algorithm that runs in O(√n) time on a √n×√n mesh to compute the constrained Delaunay triangulation of a planar straight line graph G whose vertices lie in an n-element set S. Implications of our result also include an efficient PRAM algorithm for the same problem, a new optimal mesh algorithm to compute a planar Voronoi diagram, as well as a ...
  • An efficient task allocation algorithm and its use to parallelize irregular Gauss-Seidel type algorithms

    Publication Year: 1994, Page(s):497 - 501
    Cited by:  Papers (2)

    The parallelization and implementation of Gauss-Seidel power flow analysis have been investigated. The desired properties to maximize the speedup, such as minimum communication overhead and balanced computational load, have been described. In this paper, we investigate a two-stage parallelization scheme to achieve the desired properties for distributed memory machines. In the first stage, we intro...
  • A comparison of heuristics for scheduling DAGs on multiprocessors

    Publication Year: 1994, Page(s):446 - 451
    Cited by:  Papers (34)

    Many algorithms to schedule directed acyclic graphs (DAGs) on multiprocessors have been proposed, but there has been little work done to determine their effectiveness. Since multiprocessor scheduling is an NP-hard problem, no exact tractable algorithm exists, and no baseline is available from which to compare the resulting schedules. This paper is an attempt to quantify the differences in a few of...
  • Communication and computation patterns of large scale image convolutions on parallel architectures

    Publication Year: 1994, Page(s):926 - 931
    Cited by:  Papers (5)

    Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory access demands. The article presents an analysis of a texture segmentation application containing a 96×96 convolution. Sequential execution required several hours on single processor systems with over 99% of the time spent performing the large convolution. 70% to 75% of exe...
  • Accommodating polymorphic data decompositions in explicitly parallel programs

    Publication Year: 1994, Page(s):68 - 74
    Cited by:  Papers (1)

    Explicitly parallel programs have the potential for greater performance than their implicitly parallel counterparts. However, this benefit can be accompanied by additional programming difficulties. We address one particular problem that has implications for both scalability and portability: the need for programs to accommodate diverse data decompositions. We explain why programs with explicit comm...
  • Barrier synchronization techniques for distributed process creation

    Publication Year: 1994, Page(s):597 - 603
    Cited by:  Papers (3)  |  Patents (1)

    Synchronization techniques are proposed for algorithms which spawn processes remotely on loosely coupled processors based on run-time characteristics. The performance of the proposed synchronization schemes is measured on the iPSC/2 and SNAP-1 multiprocessors and their implementation cost is discussed. Results show that processes created dynamically throughout a distributed system can be synchron...
  • Efficient matrix chain ordering in polylog time

    Publication Year: 1994, Page(s):234 - 241

    This paper gives an O(lg^3 n)-time and n/lg n processor algorithm for solving the matrix chain ordering problem and for finding optimal triangulations of a convex polygon on the common CRCW PRAM model. This algorithm works by finding shortest paths in special digraphs modeling dynamic programming tables. Also, a key part of the algorithm is improved by computing row minima of a totally m...
  • Deterministic routing and sorting on rings

    Publication Year: 1994, Page(s):406 - 410

    We present deterministic algorithms for k-k routing and k-k sorting on circular processor arrays with bidirectional connections. We distinguish between cases where k<4, 4⩽k<n2, and k⩾n2. Standing results are considerably improved; for most problem instances, near-optimality is achieved. A very simple algorithm has good performance for dynamic routing problems.
  • The generalized class of g-chain periodic sorting networks

    Publication Year: 1994, Page(s):424 - 432

    A periodic sorter is a sorting network which is a cascade of a number of identical blocks, where output i of each block is input i of the next block. Previously, (Dowd et al., 1989) introduced the balanced merging network, with N=2^k inputs/outputs and log N stages of comparators. Using an intricate proof, they showed that a cascade of log N such blocks constitutes a sorting network. We ...
  • Variable instruction issue for efficient MIMD interpretation on SIMD machines

    Publication Year: 1994, Page(s):304 - 310
    Cited by:  Papers (1)

    Programming SIMD hardware to interpret (in parallel) programs and data resident in each PE is a technique for obtaining a cost effective, massively parallel MIMD processing environment. Although heavily dependent on each application that is interpreted, the performance of the synthesized MIMD environment is greatly influenced by the organization of the instruction interpreter. For example, it is p...
  • Massively parallel algorithms for solution of the Schrodinger equation

    Publication Year: 1994, Page(s):517 - 523

    Time-parallel algorithms for solution of the Schrodinger equation are developed. By using the Crank-Nicolson method, it is shown that the solution of the problem can be fully parallelized in time, leading to a massive temporal parallelism in the computation with a minimum of communication and synchronization requirements. Our results clearly indicate that the Crank-Nicolson method, in addition to ...
  • The right stuff? Teaching parallel computing

    Publication Year: 1994, Page(s):956 - 961

    We consider the educational process with respect to parallel computing. Education in this area is provided by national supercomputing centers, a variety of manufacturers of parallel machines, within companies that use parallel machines, as well as by colleges and universities for both undergraduate and graduate students. The panel evaluates the current system, and debates potential (realistic) impr...
  • A language for conveying the aliasing properties of dynamic, pointer-based data structures

    Publication Year: 1994, Page(s):208 - 216
    Cited by:  Papers (3)  |  Patents (3)

    High-performance architectures rely upon powerful optimizing and parallelizing compilers to maximize performance. Such compilers need accurate program analysis to enable their performance-enhancing transformations. In the domain of program analysis for parallelization, pointer analysis is a difficult and increasingly common problem. When faced with dynamic, pointer-based data structures, existing ...
  • The block distributed memory model for shared memory multiprocessors

    Publication Year: 1994, Page(s):752 - 756
    Cited by:  Papers (8)

    Introduces a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes paramet...
  • On the parallel implementation of OSI protocol processing systems

    Publication Year: 1994, Page(s):815 - 819

    In a heterogeneous computing environment, computers have to use a suitable transfer syntax to communicate with each other because of the differences in internal data representations. Transfer syntax conversions take over 90% of the total processing power needed in OSI protocol processing. Application specific architectures in a heterogeneous system may not be efficient in performing the protocol p...
  • A framework for programming using non-atomic variables

    Publication Year: 1994, Page(s):133 - 140
    Cited by:  Papers (1)

    The semantics of interprocess communication occurring through shared objects is investigated. A number of definitions of non-atomic memory are considered. Two different notions of commutativity are defined. These are later used to develop conditions under which computations using a non-atomic memory are equivalent to atomic histories with regard to the final state of the objects.
  • Software assistance for directory-based caches

    Publication Year: 1994, Page(s):151 - 157

    We investigate the benefit of combining directory-based schemes with software schemes as a method for maintaining cache coherence on multiprocessors. The main idea is to maintain the directory hardware while allowing eligible write references to bypass the invalidation process. Static analysis is applied to parallel programs in order to mark those eligible write references. The sample results sugg...
  • CCL: a portable and tunable collective communication library for scalable parallel computers

    Publication Year: 1994, Page(s):835 - 844
    Cited by:  Papers (18)  |  Patents (1)

    A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the collective communication library (CCL), intended for t...