
Frontiers '95: Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation, 1995

6-9 Feb. 1995

Displaying Results 1 - 25 of 62
  • Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation

    Publication Year: 1995
  • Introducing MGAP-2 [Micro-Grain Array Processor]

    Publication Year: 1995, Page(s):281 - 288
    Cited by:  Papers (1)

    The Micro-Grain Array Processor (MGAP) is a family of two-dimensional, micro-grained array processors. The processor cell architecture is extremely compact and simple, ensuring fine granularity, very high processor density, and programming flexibility. Flexibility is maintained through a programmable interconnect that clusters array cells into larger computational units. In this paper, we discuss...

  • Optimizing irregular computations on SIMD machines: a case study

    Publication Year: 1995, Page(s):222 - 230
    Cited by:  Papers (1)

    Data-parallel computations with regular structure (fixed data size and predictable control patterns) can be implemented efficiently on SIMD architectures. However, many large applications have irregular structure: either data sets that vary in size as the computation progresses, or control structures that select different subsets of the processors at each stage of the computation. In this paper we des...

  • A data management approach for handling large compressed arrays in high performance computing

    Publication Year: 1995, Page(s):119 - 128
    Cited by:  Papers (2)

    Poor parallel I/O performance has recently been recognized as a roadblock to the scalability of parallel architectures, algorithms, and data sets. For I/O of large arrays, storing arrays by subarray divisions (chunking) has been shown to improve I/O performance substantially in many circumstances. In this paper we show how to increase the performance advantages of chunking by combining it with da...

  • Parallel remapping algorithms for adaptive problems

    Publication Year: 1995, Page(s):367 - 374
    Cited by:  Papers (7)

    We present fast parallel algorithms for remapping a class of irregular and adaptive problems on coarse-grained distributed-memory machines. We show that the remapping of these applications, using simple index-based mapping algorithms, can be reduced to sorting a nearly sorted list of integers or merging an unsorted list of integers with a sorted list of integers. By using the algorithms we have de...

  • Analysis of cost of performing communications using various communication mechanisms

    Publication Year: 1995, Page(s):290 - 297
    Cited by:  Papers (2)

    There is a trend toward incorporating architectural features that support mechanisms for efficiently communicating both long messages and (fixed-size) short messages. In this paper, we provide a framework for analyzing the communication costs in parallel machines that provide various communication mechanisms. First, we define four parameters to accurately model the communication time in these parall...

  • Parallel homologous sequence searching in large databases

    Publication Year: 1995, Page(s):231 - 237
    Cited by:  Papers (2)

    We present a parallel computational method for retrieving similar sequences from large genetic and protein databases using a dynamic programming comparison algorithm. Two previously published parallel methods for performing this task are first discussed and evaluated. The advantages of these two parallel methods are combined and incorporated into our new method to obtain better performance than ei...

  • Parallel I/O from the user's perspective

    Publication Year: 1995, Page(s):129 - 137

    Parallel I/O systems are gaining popularity as a means of providing scalable, high-bandwidth I/O. In this paper we develop abstract models of parallel I/O systems and provide empirical results that show how I/O-intensive applications interact with the elements of parallel I/O systems. The abstract models are useful for explaining parallel I/O performance to developers of applications requiring hi...

  • On the influence of partitioning schemes on the efficiency of overlapping domain decomposition methods

    Publication Year: 1995, Page(s):375 - 384
    Cited by:  Papers (1)

    One-level overlapping Schwarz domain decomposition preconditioners can be viewed as a generalization of block Jacobi preconditioning. The effect of the number of blocks and the amount of overlap between blocks on the convergence rate is well understood. This paper considers the related issue of the effect of the scheme used to partition the matrix into blocks on the convergence rate of the pre...

  • Spectrum analysis and min-cut transformation of communication networks in parallel computers

    Publication Year: 1995, Page(s):298 - 307

    We present a formal model for the analysis of communication networks in parallel computers. Unlike most other models, ours focuses on the transmission delays rather than the propagation delays of communication patterns. The model allows all symmetric communication networks to be examined by their spectra and characterized by their transmission dimensions. A min-cut transformation is introduced ...

  • An optimal parallel algorithm for volume ray casting

    Publication Year: 1995, Page(s):238 - 245
    Cited by:  Papers (1)

    Volume rendering by ray casting is a computationally expensive problem. For interactive volume visualization, rendering has to be done in real time (30 frames/s). Since the typical 3D dataset size is 256³, the use of parallel processing is imperative. In this paper, we present an O(log n) EREW algorithm for volume rendering using O(n³) processors which can be optimized to ...

  • A high performance sparse Cholesky factorization algorithm for scalable parallel computers

    Publication Year: 1995, Page(s):140 - 147
    Cited by:  Papers (4)

    This paper presents a new parallel algorithm for sparse matrix factorization. This algorithm uses subforest-to-subcube mapping instead of the subtree-to-subcube mapping of another recently introduced scheme by A. Gupta and V. Kumar (1994). Asymptotically, both formulations are equally scalable on a wide range of architectures and a wide variety of problems. But the subtree-to-subcube mapping of th...

  • Runtime incremental parallel scheduling (RIPS) for large-scale parallel computers

    Publication Year: 1995, Page(s):456 - 463
    Cited by:  Papers (5)

    Runtime incremental parallel scheduling (RIPS) is an alternative to the commonly used dynamic scheduling strategy. In RIPS, system scheduling activity alternates with the underlying computation. RIPS uses advanced parallel scheduling techniques to produce low-overhead, high-quality load balancing, and it adapts to applications with nonuniform structure.

  • Many-to-many personalized communication with bounded traffic

    Publication Year: 1995, Page(s):20 - 27
    Cited by:  Papers (17)

    This paper presents solutions for the problem of many-to-many personalized communication, with bounded incoming and outgoing traffic, on a distributed memory parallel machine. We present a two-stage algorithm that decomposes the many-to-many communication with possibly high variance in message size into two communications with low message size variance. The algorithm is deterministic and takes tim...

  • Visualizing distributed data structures

    Publication Year: 1995, Page(s):480 - 487
    Cited by:  Papers (1)

    A new programming style for large-scale parallel programs, centered on distributed data structures, has emerged. Current parallel program visualization tools were designed for the older style and do not handle distributed data structures. We show, with several examples of visualizations and animations developed for large-scale pC++ programs, that visualizing and animating distributed data s...

  • Migrating CM Fortran applications to HPF

    Publication Year: 1995, Page(s):37 - 40

    In the course of developing pghpf, the PGI High Performance Fortran compiler, several CM Fortran applications have been converted to HPF and run on multiple platforms as part of customer benchmarking exercises. The existing base of CM Fortran codes, along with MP Fortran codes developed for the MasPar machines, represents perhaps the largest body of existing data-parallel applications. With product...

  • Exploitation of control parallelism in data parallel algorithms

    Publication Year: 1995, Page(s):385 - 392

    This paper considers the matrix decomposition A = LDL^T as a vehicle to explore the performance improvement obtainable by executing multiple streams of control on SIMD architectures. Several methods for partitioning the SIMD array are considered. Architectural support for, and the feasibility of, using control parallelism in SIMD algorithms are briefly considered. Techniques for c...

  • Analysis of finite buffered multistage interconnection networks under first-blocked-first-unblock conflict resolution

    Publication Year: 1995, Page(s):308 - 314

    Previous models of multistage interconnection networks assume that when a conflict occurs for an output-port of a switching element (SE), the conflict is resolved randomly at each network cycle. While this assumption leads to simple analytic models, packets that are left behind because of the backpressure mechanism will contend for the same output-port of the SE in the following cycle. In our mode...

  • The performance impact of data placement for wavelet decomposition of two-dimensional image data on SIMD machines

    Publication Year: 1995, Page(s):246 - 251
    Cited by:  Papers (4)  |  Patents (1)

    The wavelet transform is a mathematical tool through which 2D spatial image data can be mapped into wavelet space for compact representation and for various signal analyses. The highly regular structure of the wavelet decomposition algorithm makes it well suited to fine-grained parallelization. Most existing parallelization approaches focus on how to map computing functions to processors, but pay lit...

  • Automatic generation of efficient array redistribution routines for distributed memory multicomputers

    Publication Year: 1995, Page(s):342 - 349
    Cited by:  Papers (1)  |  Patents (1)

    Appropriate data distribution has been found to be critical for obtaining good performance on distributed memory multicomputers such as the CM-5, Intel Paragon, and IBM SP-1. It has also been found that some programs need to change their distributions during execution (redistribution) for better performance. This work focuses on automatically generating efficient routines for redistribution. We presen...

  • Implementing multidisciplinary and multi-zonal applications using MPI

    Publication Year: 1995, Page(s):496 - 503

    Multidisciplinary and multi-zonal applications are codes where two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, a program can be divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These applications ar...

  • Parallelization and performance of three-dimensional plasma simulation

    Publication Year: 1995, Page(s):148 - 155

    Plasma-assisted materials processing is used in several applications, including the manufacture of integrated circuits. These applications need the ability to simulate plasma systems and plasma processes by computer. The motion of particles in particle-type plasma simulations is inherently parallel. Further, the solution to Poisson's equation and the calculation of local electric and...

  • The DEC High Performance Fortran 90 compiler front end

    Publication Year: 1995, Page(s):46 - 53
    Cited by:  Papers (2)  |  Patents (1)

    Digital has developed a compiler for full Fortran 90 and the High Performance Fortran extensions to Fortran 90. This compiler targets Digital's Alpha workstations, servers, shared-memory SMP servers, and distributed memory AdvantageCluster and workstation farm systems. This paper gives an overview of the structure of the compiler's front end component, responsible for lexical analysis, syntax anal...

  • Performance debugging based on scalability analysis

    Publication Year: 1995, Page(s):406 - 413
    Cited by:  Papers (1)  |  Patents (2)

    This paper presents scalability as a basis for profiling and performance debugging of parallel programs, since only purely scalable code runs efficiently in parallel. The approach is based on separating the scalable parts of a program from various kinds of non-scalable parts, identifying the reasons for non-scalability, and focusing the programmer's attention on why and where non-scalable execution is occur...

  • Falcon: on-line monitoring and steering of large-scale parallel programs

    Publication Year: 1995, Page(s):422 - 429
    Cited by:  Papers (25)  |  Patents (6)

    Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such program steering is to improve the application's performance or to affect its execution behavior. This paper presents the framework of the Falcon system and its implementation, and then evaluates the performance of the system. A complex sample application, a molecular dynamics simulation pr...

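Chunking, as described in the compressed-arrays abstract above, stores an array as a grid of rectangular subarrays, so that a subarray read touches only the chunks that overlap it. The following is a minimal sketch of chunk addressing for a 2D array. It assumes row-major order both across the chunk grid and within each chunk, with edge chunks padded to full size. The function names are hypothetical, and the sketch does not model the compression side of the paper.

```python
def chunk_location(i, j, shape, chunk_shape):
    """Map element (i, j) to (chunk_id, offset_within_chunk).

    Assumes row-major order across chunks and inside each (padded) chunk.
    """
    rows, cols = shape
    cr, cc = chunk_shape
    if not (0 <= i < rows and 0 <= j < cols):
        raise IndexError("element index out of bounds")
    chunks_per_row = -(-cols // cc)                    # ceiling division
    chunk_id = (i // cr) * chunks_per_row + (j // cc)
    offset = (i % cr) * cc + (j % cc)                  # row-major inside the chunk
    return chunk_id, offset


def chunks_for_region(r0, r1, c0, c1, shape, chunk_shape):
    """Sorted ids of the chunks overlapping the half-open region [r0, r1) x [c0, c1)."""
    cr, cc = chunk_shape
    chunks_per_row = -(-shape[1] // cc)
    return sorted(
        bi * chunks_per_row + bj
        for bi in range(r0 // cr, (r1 - 1) // cr + 1)
        for bj in range(c0 // cc, (c1 - 1) // cc + 1)
    )


if __name__ == "__main__":
    shape, chunk = (6, 6), (2, 3)  # a 6x6 array stored as 2x3 chunks
    print(chunk_location(3, 4, shape, chunk))           # (3, 4)
    print(chunks_for_region(1, 4, 2, 5, shape, chunk))  # [0, 1, 2, 3]
```

The region query is where chunking pays off for I/O: reading a 3x3 window here touches four chunks instead of every row of the array.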
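The remapping abstract above ("Parallel remapping algorithms for adaptive problems") reduces remapping to two list primitives. The first is re-sorting a nearly sorted list of integers. The second is merging an unsorted list of integers with a sorted one. The sketch below shows only those two primitives, run sequentially. It is not the paper's parallel algorithm, and the function names and sample keys are hypothetical.

```python
import heapq


def merge_unsorted_into_sorted(sorted_keys, unsorted_keys):
    """Merge an unsorted list of integers into an already sorted list.

    The unsorted list is sorted first; heapq.merge then combines the two
    sorted runs in a single linear pass.
    """
    return list(heapq.merge(sorted_keys, sorted(unsorted_keys)))


def resort_nearly_sorted(keys):
    """Restore sorted order to a list that is already mostly sorted.

    Python's built-in sort (Timsort) detects existing ordered runs and
    can exploit them on nearly sorted input.
    """
    return sorted(keys)


if __name__ == "__main__":
    # Hypothetical destination-processor keys for existing items (sorted)
    # and for items created by adaptive refinement (unsorted).
    existing = [0, 0, 1, 2, 2, 3]
    created = [2, 0, 3]
    print(merge_unsorted_into_sorted(existing, created))  # [0, 0, 0, 1, 2, 2, 2, 3, 3]
    print(resort_nearly_sorted([0, 1, 3, 2, 4, 5]))       # [0, 1, 2, 3, 4, 5]
```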
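The sequence-searching abstract refers to a dynamic programming comparison algorithm without naming it in the visible text. As an assumption, the sketch below uses Smith-Waterman local alignment with a linear gap penalty and made-up scores (match +2, mismatch -1, gap -1), and it ranks a small in-memory database. Each database entry is scored independently, which is why partitioning the database across processors is a natural way to parallelize the search. This is not the method proposed in the paper.

```python
def local_alignment_score(s, t, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Keeps only the previous DP row, so memory is O(len(t)).
    """
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, database, top=3):
    """Score every database sequence against the query and return the best hits.

    Each entry is scored independently, so this loop could be split across
    processors by partitioning the database.
    """
    scored = [(local_alignment_score(query, seq), name) for name, seq in database.items()]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    db = {"a": "ACGT", "b": "TTTT", "c": "ACG"}  # toy sequences, not real data
    print(search("ACGT", db, top=2))  # [(8, 'a'), (6, 'c')]
```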
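The volume ray casting abstract claims an O(log n) EREW algorithm. One standard reason per-ray compositing admits logarithmic depth is that the front-to-back "over" operator on premultiplied (color, alpha) pairs is associative. As a result, the samples along a ray can be combined pairwise in a balanced tree without changing the result. The sketch below illustrates only that property, using a single scalar color channel. Whether the paper relies on this exact formulation is an assumption, and the sample values are hypothetical.

```python
from functools import reduce


def over(front, back):
    """Front-to-back 'over' on premultiplied (color, alpha) pairs."""
    cf, af = front
    cb, ab = back
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)


def composite_tree(samples):
    """Combine ray samples pairwise in a balanced tree, preserving their order.

    Returns (result, rounds); for a non-empty list, rounds equals
    ceil(log2(len(samples))) because every round roughly halves the list.
    """
    level = list(samples)
    rounds = 0
    while len(level) > 1:
        nxt = [over(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd last sample waits for the next round
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    # Hypothetical samples along one ray, ordered front to back.
    samples = [(0.1 * k * 0.2, 0.2) for k in range(8)]
    sequential = reduce(over, samples)  # left to right, n - 1 dependent steps
    tree, rounds = composite_tree(samples)
    print(rounds)  # 3
    print(all(abs(x - y) < 1e-12 for x, y in zip(sequential, tree)))  # True
```

Because "over" is associative but not commutative, the tree must pair neighbors in ray order, which is what the slicing above does.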
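For the A = LDL^T decomposition used as the test vehicle in the control-parallelism abstract, a plain sequential reference can help when checking a parallel implementation. This sketch assumes a symmetric matrix whose leading principal minors are nonzero, so no pivoting is needed. It does not model SIMD execution or multiple control streams. For a fixed column j, the updates of the entries below the diagonal are mutually independent, and that inner loop is the part a data-parallel version would spread across processors.

```python
def ldlt(a):
    """Factor a symmetric matrix as A = L * D * L^T without pivoting.

    Returns (L, d): L is unit lower triangular (list of rows) and d is the
    diagonal of D. Raises ZeroDivisionError on a zero pivot.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry of D for column j.
        d[j] = float(a[j][j]) - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] == 0.0:
            raise ZeroDivisionError(f"zero pivot in column {j}")
        L[j][j] = 1.0
        # Entries below the diagonal in column j: each i is independent of the others.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    L, d = ldlt(A)
    print(L)  # [[1.0, 0.0, 0.0], [3.0, 1.0, 0.0], [-4.0, 5.0, 1.0]]
    print(d)  # [4.0, 1.0, 9.0]
```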
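In the sense of the array-redistribution abstract, redistribution means moving elements when an array's distribution changes during execution. The sketch below computes, by brute force, the communication sets for a 1D array moving from an HPF-style BLOCK distribution to CYCLIC(k). The block-size convention ceil(n/p) and the helper names are assumptions. A generator of efficient routines would avoid enumerating every element, so this enumeration only illustrates which data has to move.

```python
from collections import defaultdict


def block_owner(g, n, p):
    """Owner of global index g under BLOCK, assuming block size ceil(n / p)."""
    b = -(-n // p)  # ceiling division
    return g // b


def cyclic_owner(g, p, k=1):
    """Owner of global index g under CYCLIC(k)."""
    return (g // k) % p


def redistribution_plan(n, p, k=1):
    """Map (source, destination) processor pairs to the global indices that move."""
    plan = defaultdict(list)
    for g in range(n):
        src, dst = block_owner(g, n, p), cyclic_owner(g, p, k)
        if src != dst:  # elements that stay on the same processor need no message
            plan[(src, dst)].append(g)
    return dict(plan)


if __name__ == "__main__":
    print(redistribution_plan(8, 2))     # {(0, 1): [1, 3], (1, 0): [4, 6]}
    print(redistribution_plan(8, 2, 2))  # {(0, 1): [2, 3], (1, 0): [4, 5]}
```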