Frontiers of Massively Parallel Computation, 1995. Proceedings. Frontiers '95., Fifth Symposium on the

6-9 Feb. 1995

Displaying Results 1 - 25 of 62
  • Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation

    Publication Year: 1995
    Freely Available from IEEE
  • The DOSHARED directive in CRAFT on the Cray T3D

    Publication Year: 1995, Page(s):54 - 61

    CRAFT is a hybrid language that allows a user to program an MPP using many paradigms. The DOSHARED directive in CRAFT schedules the iterations of nested do-loops based upon the home of a specified array reference. It allows a user to have control over the location of the execution of the iterations of a loop. There are many complexities involved in the implementation of this directive; some of the...
  • Automatic generation of efficient array redistribution routines for distributed memory multicomputers

    Publication Year: 1995, Page(s):342 - 349
    Cited by:  Papers (1)  |  Patents (1)

    Appropriate data distribution has been found to be critical for obtaining good performance on Distributed Memory Multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We presen...
  • Periodically regular chordal ring networks for massively parallel architectures

    Publication Year: 1995, Page(s):315 - 322
    Cited by:  Papers (9)  |  Patents (3)

    Chordal rings have been proposed in the past as networks that combine the simple routing framework of rings with the lower diameter, wider bisection, and higher resilience of other architectures. Virtually all proposed chordal ring networks are node-symmetric; i.e., all nodes have the same in/out degree and interconnection pattern. Unfortunately, such regular chordal rings are not scalable. The pe...
  • Many-to-many personalized communication with bounded traffic

    Publication Year: 1995, Page(s):20 - 27
    Cited by:  Papers (17)

    This paper presents solutions for the problem of many-to-many personalized communication, with bounded incoming and outgoing traffic, on a distributed memory parallel machine. We present a two-stage algorithm that decomposes the many-to-many communication with possibly high variance in message size into two communications with low message size variance. The algorithm is deterministic and takes tim...
  • The performance impact of false subpage sharing in KSR1

    Publication Year: 1995, Page(s):64 - 71
    Cited by:  Patents (2)

    This paper presents insight into important aspects of the performance of the KSR1 multiprocessor. We report performance degradations caused by false sharing of memory subpages (128-byte units of transfer and consistency) between local caches in the KSR1. In other words, the performance is measured when multiple processing nodes issue simultaneous write requests for a single subpage. Our measurem...
  • A scalable, visual interface for debugging with event-based behavioral abstraction

    Publication Year: 1995, Page(s):472 - 479
    Cited by:  Papers (4)

    Event-based behavioral abstraction, in which models of intended program behavior are compared to actual program behavior, offers solutions to many of the debugging problems introduced by parallelism. Currently, however, its widespread application is limited by an inability to provide sufficient feedback on the mismatches between intended and actual behaviors, and an inability to provide output tha...
  • Work-efficient nested data-parallelism

    Publication Year: 1995, Page(s):186 - 193
    Cited by:  Papers (2)  |  Patents (4)

    An apply-to-all construct is the key mechanism for expressing data-parallelism, but data-parallel programming languages like HPF and C* significantly restrict which operations can appear in the construct. Allowing arbitrary operations substantially simplifies the expression of irregular and nested data-parallel computations. The technique of flattening nested parallelism introduced by Blelloch, co...
  • Compiler support for out-of-core arrays on parallel machines

    Publication Year: 1995, Page(s):110 - 118
    Cited by:  Papers (16)

    Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient out-of-core version of the application. We are investigating a compiler-based approach to the above problem. In general, our compiler techniques attempt to choreograph I/O for an application based on high-level programmer annotations similar to Fo...
  • Automatic synchronisation elimination in synchronous FORALLs

    Publication Year: 1995, Page(s):350 - 357
    Cited by:  Papers (2)

    This paper investigates a promising optimization technique that automatically eliminates redundant synchronization barriers in synchronous FORALLs. We present complete algorithms for the necessary program restrictions and subsequent code generation. Furthermore, we discuss the correctness, complexity, and performance of our restructuring algorithm before we finally evaluate its practical usefulnes...
  • PERFSIM: a tool for automatic performance analysis of data-parallel Fortran programs

    Publication Year: 1995, Page(s):396 - 405
    Cited by:  Papers (3)

    This paper presents PERFSIM, a tool for automatic performance analysis of CM Fortran programs running on the Connection Machine CM-5. PERFSIM executes the scalar part of the program, including all of its control structure, but estimates the running time of all vector operations, including both communication and computation. Our empirical studies show that the overall estimates are accurate to with...
  • APR's approach to High Performance Fortran for distributed memory multiprocessor systems

    Publication Year: 1995, Page(s):41 - 45

    The practical requirements for implementing a compilation system for High Performance Fortran are discussed, along with APR's experience solving them. As always, portability, efficiency, diagnostics, and reliability remain primary goals.
  • Aligning parallel arrays to reduce communication

    Publication Year: 1995, Page(s):324 - 331
    Cited by:  Papers (3)

    Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we...
  • Performance analysis and optimal system configuration of hierarchical two-level COMA multiprocessors

    Publication Year: 1995, Page(s):90 - 97

    The single-bus UMA architecture with the write-once cache protocol is a simple and popular choice; however, it works well only for multiprocessors with tens of processors. For larger multiprocessors, the hierarchical COMA architecture provides excellent scalability while maintaining system performance. This paper analyzes the performance of hierarchical 2-level COMA multiprocessors b...
  • A data parallel algorithm for Boolean function manipulation

    Publication Year: 1995, Page(s):28 - 34
    Cited by:  Papers (3)

    This paper describes a data-parallel algorithm for boolean function manipulation. The algorithm adopts Binary Decision Diagrams (BDDs), which are the state-of-the-art approach for representing and handling boolean functions. The algorithm is well suited for SIMD architectures and is based on distributing BDD nodes to the available Processing Elements and traversing BDDs in a breadth-first manner. ...
  • A multi-cache coherence scheme for shuffle-exchange network based multiprocessors

    Publication Year: 1995, Page(s):72 - 79
    Cited by:  Patents (3)

    As VLSI technology continues to increase the speed of microprocessors, their effective use in a shared memory multiprocessor model has become a primary challenge. This has placed greater burden on the interconnection network which must efficiently satisfy the bandwidth requirements of these powerful processors at an effective cost. In this paper, we evaluate the performance of a memory-coherent si...
  • Time- and VLSI-optimal convex hull computation on meshes with multiple broadcasting

    Publication Year: 1995, Page(s):506 - 513
    Cited by:  Papers (3)

    Computing the convex hull of a planar set of points is one of the most extensively investigated topics in computational geometry. Our main contribution is to present the first known general-case, time- and VLSI-optimal, algorithm for convex hull computation on meshes with multiple broadcasting. Specifically, we show that for every choice of a positive integer constant c, the convex hull of a set o...
  • The performance impact of data placement for wavelet decomposition of two-dimensional image data on SIMD machines

    Publication Year: 1995, Page(s):246 - 251
    Cited by:  Papers (4)  |  Patents (1)

    The wavelet transform is a mathematical tool through which 2D spatial image data can be mapped into wavelet space for compact representation and for various signal analyses. The highly regular structure of the wavelet decomposition algorithm makes it well-suited for fine-grained parallelization. Most existing parallelization approaches focus on how to map computing functions to processors, but pay lit...
  • Runtime support for execution of fine grain parallel code on coarse grain multiprocessors

    Publication Year: 1995, Page(s):440 - 447

    The goal of this research is to provide systems support that allows fine grain, data parallel code to execute efficiently on much coarser grain multiprocessors. The task of writing parallel applications is simplified by allowing the programmer to assume a number of processors convenient to the algorithm being implemented. This paper describes and evaluates a runtime approach that efficiently manag...
  • Visualizing distributed data structures

    Publication Year: 1995, Page(s):480 - 487
    Cited by:  Papers (1)

    A new programming style for large-scale parallel programs centered around distributed data structures has emerged. The current parallel program visualization tools were intended for the old style and do not deal with distributed data structures. We show, with several examples of visualizations and animations developed for large scale pC++ programs, that visualizing and animating distributed data s...
  • Optimizing irregular computations on SIMD machines: a case study

    Publication Year: 1995, Page(s):222 - 230
    Cited by:  Papers (1)

    Data-parallel computations with regular structure, fixed data size, and predictable control patterns can be implemented efficiently on SIMD architectures. However, many large applications have irregular structure, either data sets that vary in size as the computation progresses or control structures that select different subsets of the processors at each stage of the computation. In this paper we des...
  • A data parallel C and its platforms

    Publication Year: 1995, Page(s):194 - 202
    Cited by:  Papers (1)  |  Patents (48)

    dbC is a data parallel extension to ANSI C similar to Thinking Machines C* and MasPar MPL. To facilitate bit-oriented computation, dbC supports computation with arbitrary precision integer data, bit string extraction and insertion, and function parameters with dynamic bit length. In this paper, we describe dbC and its mapping to three very different architectures: (1) Terasys, an experimental SIMD...
  • A data management approach for handling large compressed arrays in high performance computing

    Publication Year: 1995, Page(s):119 - 128
    Cited by:  Papers (2)

    Poor parallel I/O performance has recently been recognized as a roadblock to scalability of parallel architectures, algorithms, and data sets. For I/O of large arrays, the storage of arrays by subarray divisions (chunking) has been shown to improve I/O performance substantially in many circumstances. In this paper we show how to increase the performance advantages of chunking by combining it with da...
  • Design and analysis of product networks

    Publication Year: 1995, Page(s):521 - 528
    Cited by:  Papers (11)

    In this paper a unified theory of Cartesian product networks is developed. Product networks (PN) include meshes, tori, and hypercubes among others. This paper studies the fundamental issues of topological properties, cost-performance ratio optimization, scalability, routing, embedding, and fault tolerance properties of PNs. In particular, the degree, diameter, average distance, connectivity, and n...
  • Parallelization and performance of three-dimensional plasma simulation

    Publication Year: 1995, Page(s):148 - 155

    Plasma-assisted materials processing is being applied in several areas, including the manufacture of integrated circuits. These applications need the ability to simulate the plasma systems and plasma processes by computer. The motion of particles in particle-type plasma simulations is inherently parallel. Further, the solution to Poisson's equation and the calculation of local electric and...