Proceedings of IEEE Scalable High Performance Computing Conference

23-25 May 1994

  • Proceedings of IEEE Scalable High Performance Computing Conference

    Publication Year: 1994
  • Adaptive runtime support for direct simulation Monte Carlo methods on distributed memory architectures

    Publication Year: 1994, Page(s): 176 - 183
    Cited by: Papers (12)

    In highly adaptive irregular problems such as many particle-in-cell (PIC) codes and direct simulation Monte Carlo (DSMC) codes, data access patterns may vary from time step to time step. This fluctuation may hinder efficient utilization of distributed memory parallel computers because of the resulting overhead for data redistribution and dynamic load balancing. To efficiently parallelize such adap...
  • Performance of a fully parallel sparse solver

    Publication Year: 1994, Page(s): 334 - 341
    Cited by: Papers (5)

    The performance of a fully parallel direct solver for large sparse symmetric positive definite systems of linear equations is demonstrated. The solver is designed for distributed-memory message-passing parallel computer systems, particularly massively parallel machines. All phases of the computation, including symbolic processing as well as numeric factorization and triangular solution, are perfor...
  • Adhara: runtime support for dynamic space-based applications on distributed memory MIMD multiprocessors

    Publication Year: 1994, Page(s): 168 - 175

    We describe Adhara, a runtime system specialized for dynamic space-based applications, such as particle-in-cell simulations, molecular dynamics problems and adaptive grid simulations. Adhara facilitates the programming of such applications by supporting spatial data structures (e.g., grids and particles), and facilitates obtaining good performance by performing automatic data partitioning and dyna...
  • Hypercube algorithm for image component labeling

    Publication Year: 1994, Page(s): 259 - 262
    Cited by: Papers (2)

    Labeling the connected regions of a digitized image is a fundamental computation in image analysis and computer vision. By assigning a unique label to each connected region, higher level image operations can identify, extract, and process different connected regions separately. Because of its primary importance, the problem has attracted research in developing parallel algorithms. Most of the rese...
  • SDBS: a task duplication based optimal scheduling algorithm

    Publication Year: 1994, Page(s): 756 - 763
    Cited by: Papers (14)

    An efficient scheduling algorithm is one of the key factors in determining the performance of distributed memory machines. The paper presents a search and duplication based scheduling (SDBS) algorithm which can schedule directed acyclic graphs (DAGs). The complexity of this scheduling algorithm is in O(V+E), where V is the number of nodes and E is the number of edges in the task graph. This algori...
  • Load-balancing algorithms for climate models

    Publication Year: 1994, Page(s): 674 - 681
    Cited by: Papers (6)

    Implementations of climate models on scalable parallel computer systems can suffer from load imbalances due to temporal and spatial variations in the amount of computation required for physical parameterizations such as solar radiation and convective adjustment. We have developed specialized techniques for correcting such imbalances. These techniques are incorporated in a general-purpose, programm...
  • Scalable implementations of multipole-accelerated algorithms for molecular dynamics

    Publication Year: 1994, Page(s): 87 - 94
    Cited by: Papers (8)

    We consider efficient, scalable solutions to the long-range force computation problem in molecular dynamics (MD) simulation. Straightforward implementation of a solver for the time-consuming Coulomb force yields O(N^2) runtime for N atoms in a system; this quadratic complexity limits the size of systems that can be simulated. Exclusion of interactions beyond a certain cutoff radius reduc...
  • Distributed array data management on NUMA multiprocessors

    Publication Year: 1994, Page(s): 551 - 559
    Cited by: Patents (2)

    Management of program data to reduce false sharing and improve locality is critical for scaling performance on NUMA multiprocessors. We use HPF-like directives to partition and place arrays in data-parallel applications on Hector, a shared-memory NUMA multiprocessor. We present experimental results that demonstrate the magnitude of the performance improvement attainable when our proposed array man...
  • An efficient single copy cache coherence protocol for multiprocessors with multistage interconnection networks

    Publication Year: 1994, Page(s): 1 - 8
    Cited by: Papers (1) | Patents (2)

    Multistage interconnection networks offer an efficient, scalable, and cost effective solution for the problem of connecting processors to memory in a shared memory multiprocessor system. In this paper, we present an efficient single copy cache coherence protocol for multiprocessors with multistage interconnection networks. Our protocol depends on incorporating the cache memory into the switches (c...
  • Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers

    Publication Year: 1994, Page(s): 324 - 333
    Cited by: Papers (4) | Patents (1)

    Sparse Cholesky factorization has historically achieved extremely low performance on distributed memory multiprocessors. Three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate paralle...
  • Efficient runtime support for parallelizing block structured applications

    Publication Year: 1994, Page(s): 158 - 167
    Cited by: Papers (4)

    Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). We describe a runtime library for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. This runtime library is imple...
  • Load balancing of parallel volume rendering with scattered decomposition

    Publication Year: 1994, Page(s): 252 - 258
    Cited by: Papers (6) | Patents (3)

    A scheme for the visualization of large data volumes using volume rendering on a distributed memory MIMD system is described. The data to be rendered is decomposed into subvolumes to reside in the local memories of the system's nodes. A partial image of the local data is generated at each node by ray tracing, and is then composited with partial images on other nodes in the correct order to generat...
  • High Performance Fortran interface to the parallel C++

    Publication Year: 1994, Page(s): 301 - 308
    Cited by: Papers (1)

    Describes the design of a High Performance Fortran (HPF) interface to the parallel C++ (pC++) programming language. The pC++/HPF interface provides a mechanism for users to link pC++ programs with Fortran subroutines so that they can take advantage of both the fast computing speed of Fortran and the object-oriented programming paradigm of C++. We discuss the design of the Fortran interface and ill...
  • Parallel semi-Lagrangian advection on the sphere using PVM

    Publication Year: 1994, Page(s): 470 - 477

    Numerical methods for solving the advection problem in spherical geometry are examined in conjunction with techniques for solving the shallow-water equations on the sphere. Eulerian methods are restricted by the Courant-Friedrichs-Lewy (CFL) condition and the semi-Lagrangian method is an alternative approach for taking longer time steps. Recent progress in the development of distributed MIMD paral...
  • PCG: a software package for the iterative solution of linear systems on scalar, vector and parallel computers

    Publication Year: 1994, Page(s): 811 - 816
    Cited by: Papers (2)

    The PCG package is a software system for solving systems of linear equations by means of preconditioned conjugate gradient (PCG)-type iterative methods on a variety of computer architectures. The software is designed to give high performance with a nearly identical user interface across different scalar, vector and parallel platforms, as well as across different programming models, such as shared-...
  • A parallel algorithm for large-scale linear programs with a special structure

    Publication Year: 1994, Page(s): 749 - 755

    A new sequential algorithm and computational results for large-scale linear programs with a special structure were presented previously by J.B. Rosen and S. Oh (1992). A parallel version of the algorithm is developed for the NCUBE2, a hypercube multiprocessor architecture. Computational results using 128 processors are presented for randomly generated large-scale sparse or dense problems with the num...
  • Randomized load balancing for tree-structured computation

    Publication Year: 1994, Page(s): 666 - 673
    Cited by: Papers (12)

    Studies the performance of a randomized algorithm for balancing load across a multiprocessor executing a dynamic irregular task tree. Specifically, we show that the time taken to explore a task tree is likely to be within a small constant factor of an inherent lower bound for the tree instance. Our model permits arbitrary task times and overlap between computation and load balance, and thus extend...
  • A scalable high-performance I/O system

    Publication Year: 1994, Page(s): 79 - 86
    Cited by: Papers (3)

    A significant weakness of many existing parallel supercomputers is their lack of high-performance parallel I/O. This weakness has prevented, in many cases, the full exploitation of the true potential of MPP systems. As part of a joint project with IBM, we have designed a parallel I/O system for an IBM SP system that can provide sustained I/O rates of greater than 160 MB/s from collections of compu...
  • The architecture and programming of the ISI Embeddable Variant Multicomputer

    Publication Year: 1994, Page(s): 134 - 141
    Cited by: Papers (1)

    The ISI Embeddable Variant (EV) is a 16-node, physically compact, modular multicomputer designed for applications in embedded computing where issues of size, weight, and power are of paramount importance. Through the use of aggressive packaging techniques and a node architecture that is tailored to the requirements of the packaging technology, EV demonstrates a very high compute density per unit volume...
  • A user-level process package for PVM

    Publication Year: 1994, Page(s): 48 - 55
    Cited by: Papers (8) | Patents (1)

    This paper describes an approach to supporting efficient processor virtualization and dynamic load balancing for message-based, parallel programs. Specifically, a user-level process package (UPVM) for SPMD-style PVM applications is presented. UPVM supports light-weight virtual processors that are transparently and independently migratable. It also implements a source-code compatible PVM interface,...
  • Matrix-vector multiplication and conjugate gradient algorithms on distributed memory computers

    Publication Year: 1994, Page(s): 542 - 550
    Cited by: Papers (6)

    The critical bottlenecks in the implementation of the conjugate gradient algorithm on distributed memory computers are the communication requirements of the sparse matrix-vector multiply and of the vector recurrences. In a previous paper (G. Lewis et al., 1993), we described the data distribution and communication patterns of several implementations of parallel matrix-vector multiplication, demons...
  • Cluster-C*: understanding the performance limits

    Publication Year: 1994, Page(s): 229 - 238
    Cited by: Papers (2)

    Data parallel languages are gaining interest as it becomes clear that they support a wider range of computation than previously believed. With improved network technology, it is now feasible to build data parallel supercomputers using traditional RISC-based workstations connected by a high-speed network. The paper presents an in-depth look at the communication behavior of nine C* program...
  • Complete exchange and broadcast algorithms for meshes

    Publication Year: 1994, Page(s): 422 - 428
    Cited by: Papers (5)

    Two complete exchange algorithms for meshes are given. The modified quadrant exchange algorithm is based on the quadrant exchange algorithm and it is well suited for square meshes with a power of two rows and columns. The store-and-forward complete exchange algorithm is suitable for meshes of arbitrary size. A pipelined broadcast algorithm for meshes is also presented. This new algorithm, called t...
  • The scalability of decoupled multiprocessors

    Publication Year: 1994, Page(s): 17 - 22
    Cited by: Papers (1)

    We consider the ability of the technique of decoupling to improve the scalability of multiprocessors which have physically distributed memory but which support a shared memory model of computation. We consider the performance of a variety of similar architectures: those with and without caching and those with and without decoupling. As a metric of scalability we focus on the speedup of these ...