Frontiers '95: Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation, 1995

6-9 Feb. 1995

  • Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation

    Publication Year: 1995
  • An optimal parallel algorithm for volume ray casting

    Publication Year: 1995, Page(s):238 - 245
    Cited by:  Papers (1)

    Volume rendering by ray casting is a computationally expensive problem. For interactive volume visualization, rendering has to be done in real time (30 frames/sec). Since the typical 3-D dataset size is 256³, the use of parallel processing is imperative. In this paper, we present an O(log n) EREW algorithm for volume rendering using O(n³) processors which can be optimized to ...
  • The DEC High Performance Fortran 90 compiler front end

    Publication Year: 1995, Page(s):46 - 53
    Cited by:  Papers (2)  |  Patents (1)

    Digital has developed a compiler for full Fortran 90 and the High Performance Fortran extensions to Fortran 90. This compiler targets Digital's Alpha workstations, servers, shared-memory SMP servers, and distributed memory AdvantageCluster and workstation farm systems. This paper gives an overview of the structure of the compiler's front end component, responsible for lexical analysis, syntax anal...
  • Design and analysis of product networks

    Publication Year: 1995, Page(s):521 - 528
    Cited by:  Papers (11)

    In this paper a unified theory of Cartesian product networks is developed. Product networks (PN) include meshes, tori, and hypercubes among others. This paper studies the fundamental issues of topological properties, cost-performance ratio optimization, scalability, routing, embedding, and fault tolerance properties of PNs. In particular, the degree, diameter, average distance, connectivity, and n...
  • The performance impact of data placement for wavelet decomposition of two-dimensional image data on SIMD machines

    Publication Year: 1995, Page(s):246 - 251
    Cited by:  Papers (4)  |  Patents (1)

    Wavelet transform is a mathematical tool through which 2D spatial image data can be mapped into wavelet space for compact representation and for various signal analyses. The highly regular structure of the wavelet decomposition algorithm makes it well-suited for fine-grained parallelization. Most existing parallelization approaches focus on how to map computing functions to processors, but pay lit...
  • The DOSHARED directive in CRAFT on the Cray T3D

    Publication Year: 1995, Page(s):54 - 61

    CRAFT is a hybrid language that allows a user to program an MPP using many paradigms. The DOSHARED directive in CRAFT schedules the iterations of nested do-loops based upon the home of a specified array reference. It allows a user to have control over the location of the execution of the iterations of a loop. There are many complexities involved in the implementation of this directive; some of the...
  • Parallel remapping algorithms for adaptive problems

    Publication Year: 1995, Page(s):367 - 374
    Cited by:  Papers (7)

    We present fast parallel algorithms for remapping a class of irregular and adaptive problems on coarse-grained distributed-memory machines. We show that the remapping of these applications, using simple index-based mapping algorithms, can be reduced to sorting a nearly sorted list of integers or merging an assorted list of integers with a sorted list of integers. By using the algorithms we have de...
  • A broadcast algorithm for all-port wormhole-routed torus networks

    Publication Year: 1995, Page(s):529 - 536
    Cited by:  Papers (3)

    A new approach to broadcast in wormhole-routed two- and three-dimensional torus networks is proposed. The approach extends the concept of dominating sets from graph theory by accounting for the relative distance-insensitivity of the wormhole routing switching strategy and by taking advantage of an all-port communication architecture. The resulting broadcast operation is based on a tree structure th...
  • MICA: a mapped interconnection-cached architecture

    Publication Year: 1995, Page(s):80 - 89

    MICA (Mapped Interconnection-Cached Architecture) is a novel architecture combining large reconfigurable networks and small, fast on-line routing, crossbar switches. It offers a good match for parallel applications exhibiting switching locality. Switching locality means that the need to “switch” or route the information to or from each PE is limited to a small set of sources or destina...
  • Runtime support for data parallel tasks

    Publication Year: 1995, Page(s):432 - 439
    Cited by:  Papers (5)

    We have recently introduced Opus, a set of Fortran language extensions that provide shared data abstractions (SDAs) as a mechanism for communication and synchronization among coarse-grain data parallel tasks. In this paper, we discuss the design and implementation issues of the runtime system necessary to support SDAs, and outline the underlying requirements for such a runtime system. We explore t...
  • Migrating from PVM to MPI, part I: The Unify system

    Publication Year: 1995, Page(s):488 - 495
    Cited by:  Patents (2)

    This paper presents a new kind of portability system, Unify, which modifies the PVM message passing system to provide (currently a subset of) the Message Passing Interface (MPI) standard notation for message passing. Unify is designed to reduce the effort of learning MPI, while providing a sensible means to make use of MPI libraries and MPI calls (while applications continue to run in the PVM envi...
  • The practicality of SIMD for scientific computing

    Publication Year: 1995, Page(s):258 - 264
    Cited by:  Papers (2)

    While not popular at present for scientific computing, SIMD has proven to be a practical option for tackling a variety of scientific problems. The MasPar MP-2 at NASA/Goddard Space Flight Center is being successfully employed for Grand Challenges in Earth and space science. This paper considers the applicability, specific implementation and performance of five algorithms: astrophysical fluid dynami...
  • Migrating CM Fortran applications to HPF

    Publication Year: 1995, Page(s):37 - 40

    In the course of developing pghpf, the PGI High Performance Fortran compiler, several CM Fortran applications have been converted to HPF and run on multiple platforms as part of customer benchmarking exercises. The existing base of CM Fortran codes, along with MP Fortran codes developed for the MasPar machines, represents perhaps the largest body of existing data parallel applications. With product...
  • PERFSIM: a tool for automatic performance analysis of data-parallel Fortran programs

    Publication Year: 1995, Page(s):396 - 405
    Cited by:  Papers (3)

    This paper presents PERFSIM, a tool for automatic performance analysis of CM Fortran programs running on the Connection Machine CM-5. PERFSIM executes the scalar part of the program, including all of its control structure, but estimates the running time of all vector operations, including both communication and computation. Our empirical studies show that the overall estimates are accurate to with...
  • The performance impact of false subpage sharing in KSR1

    Publication Year: 1995, Page(s):64 - 71
    Cited by:  Patents (2)

    This paper presents insight into important aspects of the performance of the KSR1 multiprocessor. We report performance degradations caused by false sharing of memory subpages (128-byte units of transfer and consistency) between local caches in the KSR1. In other words, the performance is measured when multiple processing nodes issue simultaneous write requests for a single subpage. Our measurem...
  • ProcSimity: an experimental tool for processor allocation and scheduling in highly parallel systems

    Publication Year: 1995, Page(s):414 - 421
    Cited by:  Papers (6)  |  Patents (12)

    ProcSimity is a software tool that supports research in processor allocation and scheduling for highly parallel systems. ProcSimity's multicomputer simulator supports experimentation with selected allocation and scheduling algorithms on architectures with a range of network topologies and for several current routing and flow control mechanisms. Message-passing can be simulated in detail at the fli...
  • Many-to-many personalized communication with bounded traffic

    Publication Year: 1995, Page(s):20 - 27
    Cited by:  Papers (17)

    This paper presents solutions for the problem of many-to-many personalized communication, with bounded incoming and outgoing traffic, on a distributed memory parallel machine. We present a two-stage algorithm that decomposes the many-to-many communication with possibly high variance in message size into two communications with low message size variance. The algorithm is deterministic and takes tim...
  • On the influence of partitioning schemes on the efficiency of overlapping domain decomposition methods

    Publication Year: 1995, Page(s):375 - 384
    Cited by:  Papers (1)

    One-level overlapping Schwarz domain decomposition preconditioners can be viewed as a generalization of block Jacobi preconditioning. The effect of the number of blocks and the amount of overlapping between blocks on the convergence rate is well understood. This paper considers the related issue of the effect of the scheme used to partition the matrix into blocks on the convergence rate of the pre...
  • Efficient parallelizations of a competitive learning algorithm for text retrieval on the MasPar

    Publication Year: 1995, Page(s):4 - 11

    In this paper, we present parallel implementations of a connectionist model for text retrieval on the MasPar MP-1, an SIMD machine with up to 16 K processors. The connectionist model was originally developed on a SUN SparcStation 1+ for a sequential implementation. In our parallel implementations, we consider three strategies for mapping the network onto the MasPar: one-to-one, many-to-one, and on...
  • Performance analysis and optimal system configuration of hierarchical two-level COMA multiprocessors

    Publication Year: 1995, Page(s):90 - 97

    The single-bus UMA architecture with the write-once cache protocol is a simple and popular choice; however, it only works well for multiprocessors with tens of processors. For larger multiprocessors, the hierarchical COMA architecture provides excellent scalability while maintaining system performance well. This paper analyzes the performance of hierarchical 2-level COMA multiprocessors b...
  • Runtime support for execution of fine grain parallel code on coarse grain multiprocessors

    Publication Year: 1995, Page(s):440 - 447

    The goal of this research is to provide systems support that allows fine grain, data parallel code to execute efficiently on much coarser grain multiprocessors. The task of writing parallel applications is simplified by allowing the programmer to assume a number of processors convenient to the algorithm being implemented. This paper describes and evaluates a runtime approach that efficiently manag...
  • Compiler support for out-of-core arrays on parallel machines

    Publication Year: 1995, Page(s):110 - 118
    Cited by:  Papers (15)

    Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient out-of-core version of the application. We are investigating a compiler-based approach to the above problem. In general, our compiler techniques attempt to choreograph I/O for an application based on high-level programmer annotations similar to Fo...
  • Braid: integrating task and data parallelism

    Publication Year: 1995, Page(s):211 - 219
    Cited by:  Papers (7)

    Archetype data parallel or task parallel applications are well served by contemporary languages. However, for applications containing a balance of task and data parallelism the choice of language is less clear. While there are languages that enable both forms of parallelism, e.g., one can write data parallel programs using a task parallel language, there are few languages which support both. We pr...
  • Implementing multidisciplinary and multi-zonal applications using MPI

    Publication Year: 1995, Page(s):496 - 503

    Multidisciplinary and multi-zonal applications are codes where two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, a program can be divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These applications ar...
  • Automatic generation of efficient array redistribution routines for distributed memory multicomputers

    Publication Year: 1995, Page(s):342 - 349
    Cited by:  Papers (1)  |  Patents (1)

    Appropriate data distribution has been found to be critical for obtaining good performance on Distributed Memory Multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We presen...