Proceedings of Frontiers '95: The Fifth Symposium on the Frontiers of Massively Parallel Computation

6-9 Feb. 1995

Displaying Results 1 - 25 of 62
  • Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation

    Publication Year: 1995
  • Parallel homologous sequence searching in large databases

    Publication Year: 1995, Page(s):231 - 237
    Cited by:  Papers (2)

    We present a parallel computational method for retrieving similar sequences from large genetic and protein databases using a dynamic programming comparison algorithm. Two previously published parallel methods for performing this task are first discussed and evaluated. The advantages of these two parallel methods are combined and incorporated into our new method to obtain better performance than ei...
  • An optimal parallel algorithm for volume ray casting

    Publication Year: 1995, Page(s):238 - 245
    Cited by:  Papers (1)

    Volume rendering by ray casting is a computationally expensive problem. For interactive volume visualization, rendering has to be done in real time (30 frames/sec). Since the typical 3-D dataset size is 256³, the use of parallel processing is imperative. In this paper, we present an O(log n) EREW algorithm for volume rendering using O(n³) processors which can be optimized to ...
  • A parallel graph partitioner on a distributed memory multiprocessor

    Publication Year: 1995, Page(s):360 - 366
    Cited by:  Patents (1)

    In order to realize the full potential of speed-up by parallelization, it is essential to partition a problem into small tasks with minimal interactions without making this process itself a bottleneck. We present a method for graph partitioning that is suitable for parallel implementation and scales well with the number of processors and the problem size. Our algorithm uses hierarchical partitioni...
  • The performance impact of data placement for wavelet decomposition of two-dimensional image data on SIMD machines

    Publication Year: 1995, Page(s):246 - 251
    Cited by:  Papers (4)  |  Patents (1)

    The wavelet transform is a mathematical tool through which 2D spatial image data can be mapped into wavelet space for compact representation and for various signal analyses. The highly regular structure of the wavelet decomposition algorithm makes it well-suited for fine-grained parallelization. Most existing parallelization approaches focus on how to map computing functions to processors, but pay lit...
  • Parallel remapping algorithms for adaptive problems

    Publication Year: 1995, Page(s):367 - 374
    Cited by:  Papers (7)

    We present fast parallel algorithms for remapping a class of irregular and adaptive problems on coarse-grained distributed-memory machines. We show that the remapping of these applications, using simple index-based mapping algorithms, can be reduced to sorting a nearly sorted list of integers or merging an unsorted list of integers with a sorted list of integers. By using the algorithms we have de...
  • Analysis of cost of performing communications using various communication mechanisms

    Publication Year: 1995, Page(s):290 - 297
    Cited by:  Papers (2)

    There is a trend towards incorporating architectural features which support mechanisms for efficiently communicating long messages and (fixed-size) short messages. In this paper, we provide a framework for analyzing the communication costs in parallel machines which provide various communication mechanisms. First, we define four parameters to accurately model the communication time in these parall...
  • The practicality of SIMD for scientific computing

    Publication Year: 1995, Page(s):258 - 264
    Cited by:  Papers (2)

    While not popular at present for scientific computing, SIMD has proven to be a practical option for tackling a variety of scientific problems. The MasPar MP-2 at NASA/Goddard Space Flight Center is being successfully employed for Grand Challenges in Earth and space science. This paper considers the applicability, specific implementation and performance of five algorithms: astrophysical fluid dynami...
  • Performance debugging based on scalability analysis

    Publication Year: 1995, Page(s):406 - 413
    Cited by:  Papers (1)  |  Patents (2)

    This paper presents scalability as a basis for profiling and performance debugging of parallel programs, as only the purely scalable code runs efficiently in parallel. The approach is based on separating scalable and various kinds of non-scalable parts of a program, identifying the reasons for non-scalability, and focusing the programmer's attention on why and where non-scalable execution is occur...
  • Visualizing distributed data structures

    Publication Year: 1995, Page(s):480 - 487
    Cited by:  Papers (1)

    A new programming style for large-scale parallel programs centered around distributed data structures has emerged. The current parallel program visualization tools were intended for the old style and do not deal with distributed data structures. We show, with several examples of visualizations and animations developed for large-scale pC++ programs, that visualizing and animating distributed data s...
  • Falcon: on-line monitoring and steering of large-scale parallel programs

    Publication Year: 1995, Page(s):422 - 429
    Cited by:  Papers (25)  |  Patents (6)

    Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such program steering is to improve the application's performance or to affect its execution behavior. This paper presents the framework of the Falcon system and its implementation, and then evaluates the performance of the system. A complex sample application, a molecular dynamics simulation pr...
  • Design and analysis of product networks

    Publication Year: 1995, Page(s):521 - 528
    Cited by:  Papers (11)

    In this paper a unified theory of Cartesian product networks is developed. Product networks (PN) include meshes, tori, and hypercubes among others. This paper studies the fundamental issues of topological properties, cost-performance ratio optimization, scalability, routing, embedding, and fault tolerance properties of PNs. In particular, the degree, diameter, average distance, connectivity, and n...
  • Code generation for multiple mappings

    Publication Year: 1995, Page(s):332 - 341
    Cited by:  Papers (17)  |  Patents (13)

    There has been a great amount of recent work toward unifying iteration reordering transformations. Many of these approaches represent transformations as affine mappings from the original iteration space to a new iteration space. These approaches show a great deal of promise, but they all rely on the ability to generate code that iterates over the points in these new iteration spaces in the appropr...
  • On the influence of partitioning schemes on the efficiency of overlapping domain decomposition methods

    Publication Year: 1995, Page(s):375 - 384
    Cited by:  Papers (2)

    One-level overlapping Schwarz domain decomposition preconditioners can be viewed as a generalization of block Jacobi preconditioning. The effect of the number of blocks and the amount of overlap between blocks on the convergence rate is well understood. This paper considers the related issue of the effect of the scheme used to partition the matrix into blocks on the convergence rate of the pre...
  • Spectrum analysis and min-cut transformation of communication networks in parallel computers

    Publication Year: 1995, Page(s):298 - 307

    We present a formal model for the analysis of communication networks in parallel computers. Unlike most others, our model focuses on the transmission delays as opposed to the propagation delays of communication patterns. The model allows all symmetric communication networks to be examined by their spectra and characterized by their transmission dimensions. A min-cut transformation is introduced ...
  • Characteristics of the MasPar parallel I/O system

    Publication Year: 1995, Page(s):265 - 272
    Cited by:  Papers (1)

    Input/output speed continues to present a performance challenge for high-performance computing systems. This is because technology improves processor speed, memory speed and capacity, and disk capacity at a much higher rate than it improves mass storage latency. Developments in I/O architecture have been attempting to reduce this performance gap. The MasPar I/O architecture includes many interesting features....
  • Migrating from PVM to MPI. I. The Unify system

    Publication Year: 1995, Page(s):488 - 495
    Cited by:  Patents (2)

    This paper presents a new kind of portability system, Unify, which modifies the PVM message passing system to provide (currently a subset of) the Message Passing Interface (MPI) standard notation for message passing. Unify is designed to reduce the effort of learning MPI, while providing a sensible means to make use of MPI libraries and MPI calls (while applications continue to run in the PVM envi...
  • Runtime support for data parallel tasks

    Publication Year: 1995, Page(s):432 - 439
    Cited by:  Papers (5)

    We have recently introduced Opus, a set of Fortran language extensions that provide shared data abstractions (SDAs) as a mechanism for communication and synchronization among coarse-grain data parallel tasks. In this paper, we discuss the design and implementation issues of the runtime system necessary to support SDAs, and outline the underlying requirements for such a runtime system. We explore t...
  • A broadcast algorithm for all-port wormhole-routed torus networks

    Publication Year: 1995, Page(s):529 - 536
    Cited by:  Papers (3)

    A new approach to broadcast in wormhole-routed two- and three-dimensional torus networks is proposed. The approach extends the concept of dominating sets from graph theory by accounting for the relative distance-insensitivity of the wormhole routing switching strategy and by taking advantage of an all-port communication architecture. The resulting broadcast operation is based on a tree structure th...
  • Migrating CM Fortran applications to HPF

    Publication Year: 1995, Page(s):37 - 40

    In the course of developing pghpf, the PGI High Performance Fortran compiler, several CM Fortran applications have been converted to HPF and run on multiple platforms as part of customer benchmarking exercises. The existing base of CM Fortran codes, along with MP Fortran codes developed for the MasPar machines, represents perhaps the largest body of existing data parallel applications. With product...
  • Automatic generation of efficient array redistribution routines for distributed memory multicomputers

    Publication Year: 1995, Page(s):342 - 349
    Cited by:  Papers (1)  |  Patents (1)

    Appropriate data distribution has been found to be critical for obtaining good performance on distributed-memory multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We presen...
  • Exploitation of control parallelism in data parallel algorithms

    Publication Year: 1995, Page(s):385 - 392

    This paper considers the matrix decomposition A = LDLᵀ as a vehicle to explore the improvement in performance obtainable through the execution of multiple streams of control on SIMD architectures. Several methods for partitioning the SIMD array are considered. Architectural support for and feasibility of using control parallelism in SIMD algorithms is briefly considered. Techniques for c...
  • Analysis of finite buffered multistage interconnection networks under first-blocked-first-unblock conflict resolution

    Publication Year: 1995, Page(s):308 - 314

    Previous models of multistage interconnection networks assume that when a conflict occurs for an output-port of a switching element (SE), the conflict is resolved randomly at each network cycle. While this assumption leads to simple analytic models, packets that are left behind because of the backpressure mechanism will contend for the same output-port of the SE in the following cycle. In our mode...
  • Periodically regular chordal ring networks for massively parallel architectures

    Publication Year: 1995, Page(s):315 - 322
    Cited by:  Papers (9)  |  Patents (3)

    Chordal rings have been proposed in the past as networks that combine the simple routing framework of rings with the lower diameter, wider bisection, and higher resilience of other architectures. Virtually all proposed chordal ring networks are node-symmetric; i.e., all nodes have the same in/out degree and interconnection pattern. Unfortunately, such regular chordal rings are not scalable. The pe...
  • An object-oriented approach to nested data parallelism

    Publication Year: 1995, Page(s):203 - 210
    Cited by:  Patents (6)

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs data aggregates called “collections” and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for “nested data ...
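
The entry "Parallel homologous sequence searching in large databases" compares a query against database sequences with a dynamic programming algorithm, but the abstract does not name the recurrence. The sketch below assumes the Smith-Waterman local-alignment score with a linear gap penalty; the scoring values and the names local_alignment_score and search are illustrative assumptions, not the authors' method. Each database comparison is independent of every other, which is the property a parallel search can distribute across processors.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    The scoring parameters are hypothetical defaults chosen for illustration.
    """
    prev = [0] * (len(b) + 1)  # DP row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Clamping at 0 lets an alignment start anywhere: this makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database, top_k=3):
    """Rank database entries by score; each comparison could run on a different processor."""
    scored = [(local_alignment_score(query, seq), name) for name, seq in database.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


db = {"s1": "GATTACA", "s2": "CCCC", "s3": "TTAC"}
print(search("TTAC", db, top_k=2))
```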
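
The entry on data placement for wavelet decomposition of two-dimensional image data relies on the regular structure of the decomposition. As an illustration, here is one level of a 2-D Haar decomposition (transform every row, then every column) using pairwise averages and half-differences. The Haar filter and this normalization are assumptions, since the abstract does not name a wavelet, and haar_1d and haar_2d are hypothetical names. At this level every output coefficient depends only on one 2-by-2 block of input pixels, which is one reason the placement of neighbouring pixels on processing elements affects communication cost.

```python
def haar_1d(v):
    """One Haar level on an even-length list: pairwise means, then pairwise half-differences."""
    half = len(v) // 2
    avg = [(v[2 * k] + v[2 * k + 1]) / 2 for k in range(half)]
    dif = [(v[2 * k] - v[2 * k + 1]) / 2 for k in range(half)]
    return avg + dif


def haar_2d(img):
    """One 2-D decomposition level; assumes both image dimensions are even."""
    rows = [haar_1d(r) for r in img]                  # transform each row
    cols = [haar_1d(list(c)) for c in zip(*rows)]     # then each column
    return [list(r) for r in zip(*cols)]              # transpose back to row-major


print(haar_2d([[1, 2], [3, 4]]))  # [[2.5, -0.5], [-1.0, 0.0]]
```

The top-left coefficient is the block mean; the other three capture horizontal, vertical and diagonal detail.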
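
The entry "Parallel remapping algorithms for adaptive problems" reduces remapping to two primitives: sorting a nearly sorted list of integers, and merging a second list of integers that is not in order with a sorted one. A sequential sketch of both primitives follows, plus a simple index-based block mapping onto p processors. The function names and the block-mapping rule are assumptions for illustration; the paper's coarse-grained parallel algorithms are not reproduced here.

```python
import heapq


def sort_nearly_sorted(keys):
    """Insertion sort: O(n + I) time for I inversions, so cheap when few keys moved."""
    a = list(keys)
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # shift larger keys one slot right
            j -= 1
        a[j + 1] = x
    return a


def merge_unsorted_into_sorted(sorted_keys, new_keys):
    """Sort the (usually small) new batch, then do one linear merge with the sorted list."""
    return list(heapq.merge(sorted_keys, sorted(new_keys)))


def block_mapping(keys, p):
    """Hypothetical index-based mapping: processor k gets the k-th contiguous block of near-equal size."""
    n = len(keys)
    bounds = [(k * n) // p for k in range(p + 1)]
    return [keys[bounds[k]:bounds[k + 1]] for k in range(p)]


print(block_mapping(merge_unsorted_into_sorted([1, 4, 9], [7, 2]), 2))  # [[1, 2], [4, 7, 9]]
```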
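
The entry "Design and analysis of product networks" treats meshes, tori and hypercubes as Cartesian products of smaller graphs. A standard property of the Cartesian product is that distances add coordinate-wise, so the diameter of G1 × G2 is the sum of the factor diameters, and node degrees add as well. The sketch below builds products of adjacency dicts and checks the diameter rule by breadth-first search; path, cycle, cartesian_product and the other helpers are hypothetical names, and the code assumes connected factor graphs.

```python
from collections import deque
from itertools import product


def path(n):
    """Linear array of n nodes (a 1-D mesh)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def cycle(n):
    """Ring of n >= 3 nodes (a 1-D torus)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def cartesian_product(g1, g2):
    """(u1, u2) ~ (v1, v2) iff one coordinate is equal and the other coordinates are adjacent."""
    adj = {}
    for u1, u2 in product(g1, g2):
        adj[(u1, u2)] = [(v1, u2) for v1 in g1[u1]] + [(u1, v2) for v2 in g2[u2]]
    return adj


def eccentricity(adj, src):
    """Largest BFS distance from src (assumes the graph is connected)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)


mesh = cartesian_product(path(3), path(4))                            # 3 x 4 mesh
torus = cartesian_product(cycle(4), cycle(5))                         # 4 x 5 torus
q3 = cartesian_product(cartesian_product(path(2), path(2)), path(2))  # 3-cube
print(diameter(mesh), diameter(torus), diameter(q3))  # 5 4 3
```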
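
The entry "Exploitation of control parallelism in data parallel algorithms" uses the decomposition A = LDLᵀ as its test case. For reference, a sequential sketch of the column-by-column LDLᵀ factorization follows, assuming a symmetric matrix whose leading principal minors are all nonzero (no pivoting). Within each column j, the updates of L[i][j] for different rows i are independent of one another, which is the kind of work a SIMD array can spread across processing elements. How the paper partitions the array is not reproduced here, and the names ldlt and reconstruct are illustrative.

```python
def ldlt(A):
    """Factor symmetric A as L * diag(D) * L^T with L unit lower triangular (no pivoting)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract contributions of the columns already factored.
        D[j] = float(A[j][j]) - sum(L[j][k] ** 2 * D[k] for k in range(j))
        # Entries below the diagonal; each row i can be computed independently.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D


def reconstruct(L, D):
    """Multiply L * diag(D) * L^T back out, to check the factorization."""
    n = len(D)
    return [[sum(L[i][k] * D[k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
L, D = ldlt(A)
print(D)  # [4.0, 1.0, 9.0]
```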