Proceedings of International Conference on Parallel Processing

15-19 April 1996

  • Proceedings of International Conference on Parallel Processing

    Publication Year: 1996
    Freely Available from IEEE
  • Ocean circulation on the Intel Paragon: modeling and implementation

    Publication Year: 1996
  • Nested parallel call optimization

    Publication Year: 1996, Page(s):225 - 229
    Cited by:  Papers (1)

    We present a novel optimization called Last Parallel Call Optimization (LPCO) for parallel systems. The last parallel call optimization can be regarded as a parallel extension of last call optimization found in sequential systems. While the LPCO is fairly general, we use and-parallel logic programming systems to illustrate it and to report its performance on multiprocessor systems. The last parall...
  • Panel on "For a Massive Number of Massively Parallel Machines: What are the Target Applications, Who

    Publication Year: 1996
    Freely Available from IEEE
  • Author index

    Publication Year: 1996
    Freely Available from IEEE
  • Generating realignment-based communication for HPF programs

    Publication Year: 1996, Page(s):364 - 371

    This paper presents methods for generating communication on compiling HPF programs for distributed-memory machines. We introduce the concept of an iteration template corresponding to an iteration space. Our HPF compiler performs the loop iteration mapping through the two-level mapping of the iteration template in the same way as the data mapping is performed in HPF. Making use of this unified mapp...
  • Complete parallelization of computations: integration of data partitioning and functional parallelism for dynamic data structures

    Publication Year: 1996, Page(s):354 - 360
    Cited by:  Papers (1)

    This paper presents a parallel programming system which: supports complete parallelization of array-oriented computations through a coherent integration of data partitioning parallelization and functional decomposition based parallelization; and implements a declarative representation of operations over distributed dynamic arrays. The conceptual framework for this integration is a generalized depe...
  • An adaptive approach to data placement

    Publication Year: 1996, Page(s):349 - 353
    Cited by:  Papers (5)  |  Patents (5)

    Programming distributed-memory machines requires careful placement of data to balance the computational load among the nodes and minimize excess data movement between the nodes. Most current approaches to data placement require the programmer or compiler to place data initially and then possibly to move it explicitly during a computation. This paper describes a new, adaptive approach. It is implem...
  • Hector: automated task allocation for MPI

    Publication Year: 1996, Page(s):344 - 348
    Cited by:  Papers (3)

    Many institutions already have networks of workstations, which could potentially be harnessed as a powerful parallel processing resource. A new, automatic task allocation system has been built on top of MPI, an environment that permits parallel programming by using the message-passing paradigm and implemented in extensions to C and FORTRAN. This system, known as “Hector”, supports dyna...
  • The parallel break construct, or how to kill an activity tree

    Publication Year: 1996, Page(s):230 - 234

    Most parallel languages provide means to express parallelism, e.g. a parallel-do construct, but no means to terminate the parallel activities spawned by such constructs. We propose three high-level primitives for this purpose, which are defined by analogies with primitives that break out of sequential iterative constructs. The primitives are pcontinue, which terminates the calling activity, pbreak...
  • Implementation of scalable blocking locks using an adaptive thread scheduler

    Publication Year: 1996, Page(s):339 - 343

    Blocking locks are commonly used in parallel programs to improve application performance and system throughput. However, most implementations of such locks suffer from two major problems: latency and scalability. We propose an implementation of blocking locks using scheduler adaptation which exploits the interaction between thread schedulers and locks. By experimentation using well-known multiproce...
  • Mapping techniques for parallel evaluation of chains of recurrences

    Publication Year: 1996, Page(s):620 - 624
    Cited by:  Papers (1)

    This paper examines the parallelization of a technique for speeding up the evaluation of potentially-complex real-valued functions at a large number of points. The technique being parallelized generates a chain of recurrences (CR) which is then used to compute the function incrementally (i.e., by using the results of one iteration in calculating the value of the function in the next iteration). Th...
  • On some global operations in faulty SIMD hypercubes

    Publication Year: 1996, Page(s):579 - 583
    Cited by:  Papers (2)

    Extends our prior results [Raghavendra & Sridhar (1992, 1993), Sengupta & Raghavendra (1994)] to obtain algorithms for performing all-to-all broadcast, global sum and broadcast operation in an N-node (N = 2^n), n-dimensional faulty SIMD hypercube, Q_n (n ≥ 9), with the number of faults n-1 < f ≤ 2n-3. We also discuss optimal algorithms for one-to-all personalized bro...
  • Simultaneous compression of makespan and number of processors using CRP

    Publication Year: 1996, Page(s):332 - 338
    Cited by:  Papers (2)

    This paper presents a new 2D compression (2DC) method for solving the multiprocessor scheduling (MS) problems to simultaneously achieve both objectives of minimizing the makespan and the number of processors used. Most existing approaches tend to focus on a very specific range of the MS problems, while risking the loss of the solution quality elsewhere. 2DC synthesizes two main classes of compress...
  • Dome: parallel programming in a distributed computing environment

    Publication Year: 1996, Page(s):218 - 224
    Cited by:  Papers (9)  |  Patents (5)

    The Distributed object migration environment (Dome) addresses three major issues of distributed parallel programming: ease of use, load balancing, and fault tolerance. Dome provides process control, data distribution, communication, and synchronization for Dome programs running in a heterogeneous distributed computing environment. The parallel programmer writes a C++ program using Dome objects whi...
  • Compiling MATLAB programs to ScaLAPACK: exploiting task and data parallelism

    Publication Year: 1996, Page(s):613 - 619
    Cited by:  Papers (12)

    We suggest a new approach aimed at reducing the effort required to program distributed-memory multicomputers. The key idea in our approach is to automatically convert a program written in a library-based programming language (MATLAB) to a parallel program based on the ScaLAPACK parallel library. In the process of performing this conversion, we apply compiler optimizations that simultaneously explo...
  • Distributing tokens on a hypercube without error accumulation

    Publication Year: 1996, Page(s):573 - 578
    Cited by:  Papers (2)

    The problem of load balancing on the hypercube is considered. A number of tokens are placed at the nodes and the goal is to redistribute them evenly throughout the network. Initially, each of the p nodes stores up to m tokens. A simple algorithm is presented, operating in O(log p + m·log log p) time on average.
  • A TeraFLOP supercomputer in 1996: the ASCI TFLOP system

    Publication Year: 1996, Page(s):84 - 93
    Cited by:  Papers (11)

    To maintain the integrity of the US nuclear stockpile without detonating nuclear weapons, the DOE needs the results of computer simulations that overwhelm the world's most powerful supercomputers. Responding to this need, the US Department of Energy (DOE) initiated the Accelerated Strategic Computing Initiative (ASCI). This program accelerates the development of new scalable supercomputers resulti...
  • Resource placement in torus-based networks

    Publication Year: 1996, Page(s):327 - 331
    Cited by:  Papers (6)

    This paper investigates methods to locate system resources, such as expensive hardware or software modules, to provide the most effective cost/performance tradeoffs in a torus parallel machine. This paper contains some solutions to perfect distance-t and perfect/quasi-perfect j-adjacency placement in a k-ary n-cube and a torus using Lee (1958) distance error-correcting codes.
  • Converse: an interoperable framework for parallel programming

    Publication Year: 1996, Page(s):212 - 217
    Cited by:  Papers (26)

    Many different parallel languages and paradigms have been developed, each with its own advantages. To benefit from all of them, it should be possible to link together modules written in different parallel languages in a single application. Since the paradigms sometimes differ in fundamental ways, this is difficult to accomplish. This paper describes a framework, Converse, that supports such multi-...
  • Analysis of the numerical effects of parallelism on a parallel genetic algorithm

    Publication Year: 1996, Page(s):606 - 612
    Cited by:  Papers (3)

    Examines the effects of relaxed synchronization on both the numerical and parallel efficiency of parallel genetic algorithms (GAs). We describe a coarse-grain geographically structured parallel genetic algorithm. Our experiments provide preliminary evidence that asynchronous versions of these algorithms have a lower run-time than synchronous GAs. Our analysis shows that this improvement is due to ...
  • Determining asynchronous acyclic pipeline execution times

    Publication Year: 1996, Page(s):568 - 572
    Cited by:  Papers (4)

    Pipeline execution is a form of parallelism in which sub-computations of a repeated computation, such as statements in the body of a loop, are executed in parallel. A measure of the execution time of a pipeline is needed to determine if pipelining is an effective form of parallelism for a loop, and to evaluate alternative scheduling choices. We derive a formula for precisely determining the asynch...
  • Study of scalable declustering algorithms for parallel grid files

    Publication Year: 1996, Page(s):434 - 440
    Cited by:  Papers (14)

    The efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations, such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known ac...
  • A study of high-performance communication mechanism for multicomputer systems

    Publication Year: 1996, Page(s):76 - 83
    Cited by:  Patents (6)

    Based on our analysis of the behavior of standard UNIX communication software, we propose a new communication mechanism which reduces the software overhead. This mechanism is optimized to be used in business applications, such as those of online transaction processing and database management. These applications are currently designed to operate in multiple concurrent processes, thus requiring mult...
  • PACK/UNPACK on coarse-grained distributed memory parallel machines

    Publication Year: 1996, Page(s):320 - 324

    PACK/UNPACK are Fortran 90/HPF array construction functions which derive new arrays from existing arrays. We present algorithms for performing these operations on coarse-grained parallel machines. Our algorithms are relatively architecture independent and can be applied to arrays of arbitrary dimensions with arbitrary distribution along every dimension. Experimental results are presented on the CM...