# Proceedings Scalable High Performance Computing Conference SHPCC-92.

• ### Proceedings. Scalable High Performance Computing Conference SHPCC-92 (Cat. No.92TH0432-5)

Publication Year: 1992
• ### Incremental mapping for solution-adaptive multigrid hierarchies

Publication Year: 1992, Page(s):401 - 408

The full multigrid method uses a hierarchy of successively finer grids. In a solution-adaptive grid hierarchy each grid is obtained by adaptive refinement of the grid on the previous level. On a distributed memory multiprocessor, each grid level must be partitioned and mapped so as to minimize the multigrid cycle execution time. In this report, several grid partitioning and load (re)mapping strate...

• ### Adaptive methods and rectangular partitioning problem

Publication Year: 1992, Page(s):409 - 415

Partitioning problems for rectangular domains having nonuniform workload for mesh-connected SIMD architectures are discussed. The considered rectangular workloads result from application of adaptive methods to the solution of hyperbolic differential equations on SIMD machines. A new form of the partitioning problem is defined in which sub-meshes of processors are assigned to tasks, each task being...

• ### Portable parallel Level-3 BLAS in Linda

Publication Year: 1992, Page(s):416 - 423
Cited by:  Papers (1)

Describes an approach towards providing an efficient Level-3 BLAS library over a variety of parallel architectures using C-Linda. A blocked linear algebra program calling the sequential Level-3 BLAS can now run on both shared and distributed memory environments (which support Linda) by simply replacing each call by a call to the corresponding parallel Linda Level-3 BLAS. The authors summarise some...

• ### A parallel scalable approach to short-range molecular dynamics on the CM-5

Publication Year: 1992, Page(s):240 - 245
Cited by:  Papers (2)

Presents a scalable algorithm for short-range molecular dynamics which minimizes interprocessor communications at the expense of a modest computational redundancy. The method combines Verlet neighbor lists with coarse-grained cells. Each processing node is associated with a cubic volume of space and the particles it owns are those initially contained in the volume. Data structures for own' and v...
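
The combination of neighbor lists with cells mentioned in this abstract can be illustrated with a small serial sketch. It is not the paper's CM-5 implementation: the function name `build_neighbor_list`, the `skin` parameter, and the absence of periodic boundaries are all assumptions made for illustration. Particles are binned into cubic cells whose side equals the list radius, so each particle only needs to be compared against particles in its own cell and the 26 adjacent cells.

```python
from collections import defaultdict
from itertools import product
import math


def build_neighbor_list(positions, cutoff, skin=0.0):
    """Return {i: sorted indices j != i with distance <= cutoff + skin}.

    Hypothetical helper: open (non-periodic) boundaries, 3D points as tuples.
    """
    reach = cutoff + skin
    if reach <= 0:
        raise ValueError("cutoff + skin must be positive")

    # Bin every particle into a cubic cell of side `reach`.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (math.floor(x / reach), math.floor(y / reach), math.floor(z / reach))
        cells[key].append(idx)

    neighbors = {i: [] for i in range(len(positions))}
    for (cx, cy, cz), members in cells.items():
        # Two particles within `reach` differ by at most one cell index per
        # axis, so the 27 surrounding cells contain every candidate.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            candidates = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in candidates:
                    if i != j and math.dist(positions[i], positions[j]) <= reach:
                        neighbors[i].append(j)
    return {i: sorted(js) for i, js in neighbors.items()}


points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0), (0.9, 0.9, 0.0)]
neighbor_list = build_neighbor_list(points, cutoff=1.0)
```

A nonzero `skin` lets the list be reused for several time steps before rebuilding, which is the usual motivation for Verlet lists; how the paper chooses its rebuild interval is not stated in the truncated abstract.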

• ### Complete exchange on a circuit switched mesh

Publication Year: 1992, Page(s):300 - 306
Cited by:  Papers (31)  |  Patents (3)

The complete exchange ('all-to-all personalized') communication pattern is at the heart of numerous important multicomputer algorithms. Recent research has shown how this pattern can efficiently be performed on circuit-switched hypercubes. However, on circuit-switched meshes, this pattern is difficult to perform efficiently because the sparsity of the mesh interconnect leads to severe link content...
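
For readers unfamiliar with the pattern, the sketch below shows only what a complete exchange delivers, not how the paper schedules it on a mesh. The function name `complete_exchange` and the list-of-lists buffers are assumptions for illustration: every node holds one distinct block for every node, and afterwards every node holds the blocks that were addressed to it.

```python
def complete_exchange(send_buffers):
    """Simulate an all-to-all personalized exchange among n nodes.

    send_buffers[i][j] is the block node i addresses to node j.
    The result recv satisfies recv[j][i] == send_buffers[i][j].
    """
    n = len(send_buffers)
    if any(len(row) != n for row in send_buffers):
        raise ValueError("each node needs exactly one block per node")
    # Logically this is a transpose of the n-by-n grid of blocks.
    return [[send_buffers[i][j] for i in range(n)] for j in range(n)]


send = [[f"{src}->{dst}" for dst in range(3)] for src in range(3)]
recv = complete_exchange(send)
```

Because every node sends a different block to every other node, n(n-1) distinct blocks must cross the network, which is why the pattern stresses a sparse interconnect.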

• ### Debugging mapped parallel programs

Publication Year: 1992, Page(s):200 - 203

As more sophisticated tools for parallel programming become available, programmers will inevitably want to use them together. However, some parallel programming tools can interact with each other in ways that make them less useful. In particular, if a mapping tool is used to adapt a parallel program to run on relatively few processors, the information presented by a debugger may become difficult t...

• ### HeNCE: graphical development tools for network-based concurrent computing

Publication Year: 1992, Page(s):129 - 136
Cited by:  Papers (9)

HeNCE (heterogeneous network computing environment) is an X Window based graphical parallel programming environment that was created to assist scientists and engineers with the development of parallel programs. HeNCE provides a graphical interface for creating, compiling, executing, and debugging parallel programs, as well as configuring a distributed virtual computer (using PVM). HeNCE programs c...

• ### A methodology for visualizing performance of loosely synchronous programs

Publication Year: 1992, Page(s):424 - 432
Cited by:  Papers (2)

Introduces a new set of views for displaying the progress of loosely synchronous computations involving large numbers of processors on large problems. The authors suggest a methodology for employing these views in succession in order to gain progressively more detail concerning program behavior. At each step, focus is refined to include just those program sections or processors which have been det...

• ### Scalability of data transport

Publication Year: 1992, Page(s):1 - 8

Peak floating point rate is a very limited way to characterize high performance computer systems. A better method is to use the bandwidth and latency of data transport for the major components of a system. Bandwidth scales well with increasing system size, but latency does not. The demands placed by a program on data transport determine how well an architecture will execute it. The article discuss...
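
Why latency matters alongside bandwidth can be seen with the common linear cost model, time = latency + size / bandwidth. This model and the numbers below are assumptions for illustration; the truncated abstract does not say which model the article uses. The function names `transfer_time` and `effective_bandwidth` are hypothetical.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to move n_bytes under the linear latency/bandwidth model."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def effective_bandwidth(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved once latency is included."""
    return n_bytes / transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s)


# Illustrative, made-up link: 100 microseconds latency, 100 MB/s peak bandwidth.
LATENCY_S = 1e-4
BANDWIDTH = 1e8

small_message = effective_bandwidth(1e4, LATENCY_S, BANDWIDTH)
large_message = effective_bandwidth(1e8, LATENCY_S, BANDWIDTH)
```

Under this model a message of latency × bandwidth bytes reaches only half the peak rate, so programs dominated by short transfers see little benefit from added bandwidth.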

• ### A runtime data mapping scheme for irregular problems

Publication Year: 1992, Page(s):216 - 219
Cited by:  Papers (3)

In scalable multiprocessor systems, high performance demands that computational load be balanced evenly among processors and that interprocessor communication be limited as much as possible. In this paper, the authors study the problem of automatically choosing data distributions for irregular problems. Irregular problems are programs where the data access pattern cannot be determined during compi...

• ### A parallel implementation of the chemically reacting CFD code, SPARK

Publication Year: 1992, Page(s):342 - 349
Cited by:  Papers (1)

Describes a parallel version of the two-dimensional, chemically reacting CFD code, SPARK. The sequential code has been ported to run on the Intel iPSC/860-based parallel computers. Routines have been added to the code which partition the problem based on the global mesh, and then assign the resulting subdomains across the processors. Two subdomain mappings have been considered. The routines which ...

• ### Scalable parallel molecular dynamics on MIMD supercomputers

Publication Year: 1992, Page(s):246 - 251
Cited by:  Papers (2)

Presents two parallel algorithms suitable for molecular dynamics simulations over a wide range of sizes, from a few hundred to millions of atoms. One of the algorithms is optimally scalable, offering performance proportional to N/P where N is the number of atoms (or molecules) and P is the number of processors. Their implementation on three MIMD parallel computers (nCUBE2, Intel Gamma, and Intel D...

• ### Towards a distributed memory implementation of Sisal

Publication Year: 1992, Page(s):385 - 392
Cited by:  Papers (3)

Sisal is a functional language for scientific applications implemented efficiently on shared memory, vector, and hierarchical memory multiprocessors. The current compiler assumes a flat, shared addressing space, and the runtime system is implemented using locks and shared queues. This paper describes a first implementation of Sisal on the nCUBE 2 distributed memory architecture. Most of the effort...

• ### A matrix product algorithm and its comparative performance on hypercubes

Publication Year: 1992, Page(s):190 - 194
Cited by:  Papers (7)

A matrix product algorithm is studied in which one matrix operand is transposed prior to the computation. This algorithm is compared with the Fox-Hey-Otto algorithm on hypercube architectures. The Transpose algorithm simplifies communication for nonsquare matrices and for computations where the number of processors is not a perfect square. The results indicate superior performance for the Transpos...
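
The core idea of transposing one operand first can be shown serially. This sketch is not the paper's hypercube algorithm and involves no communication; the choice to transpose the right-hand operand and the name `matmul_with_transpose` are assumptions, since the abstract only says that one operand is transposed. After the transpose, every entry of the product is a dot product of two rows, so both operands are traversed row by row.

```python
def matmul_with_transpose(a, b):
    """Compute a @ b for list-of-lists matrices by first transposing b."""
    if not a or not b:
        raise ValueError("matrices must be non-empty")
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    # Rows of b_t are the columns of b.
    b_t = [list(col) for col in zip(*b)]
    # C[i][j] is the dot product of row i of a and row j of b_t.
    return [
        [sum(x * y for x, y in zip(row_a, row_bt)) for row_bt in b_t]
        for row_a in a
    ]


a = [[1, 2, 3], [4, 5, 6]]          # 2 x 3, deliberately nonsquare
b = [[7, 8], [9, 10], [11, 12]]     # 3 x 2
c = matmul_with_transpose(a, b)
```

In a distributed setting the same row-against-row structure means that whole rows can be shipped between processors, which is one plausible reading of why the abstract reports simpler communication for nonsquare matrices.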

• ### Applications of FORALL-formed computations in large scale stochastic dynamic programming

Publication Year: 1992, Page(s):182 - 185
Cited by:  Papers (2)

Data parallel broadcasting methods have been developed by taking advantage of the properties of stochastic, nonlinear, continuous-time dynamical systems. The stochastic components include both Gaussian and Poisson random white noise. An example of a grand challenge level application is the resource management problem. The purpose of this paper is to demonstrate that broadcasting can be effici...

• ### Load information distribution via active interconnection networks

Publication Year: 1992, Page(s):174 - 177

Existing multicomputers typically use passive, dedicated network interfaces. By comparison, an active interconnect network can manipulate the data in messages transiting through a node; these might use existing systolic processors as the network interface. Active interconnects will become increasingly common in distributed memory multicomputers because they can be used to implement a variety of r...

• ### An object oriented approach to boundary conditions in finite difference fluid dynamics codes

Publication Year: 1992, Page(s):145 - 148
Cited by:  Patents (2)

Parallel computers have been used to solve computational fluid dynamics (CFD) problems for many years; however, while the hardware has greatly improved, the software methods for describing CFD algorithms have remained largely unchanged. From the physics and software engineering points of view, the boundary conditions consume most of the algorithmic development and programming time, but only a smal...

• ### Intercube communication for the iPSC/860

Publication Year: 1992, Page(s):307 - 313
Cited by:  Papers (5)

In this paper, new functions that enable efficient intercube communication on the Intel iPSC/860 are introduced. Communication between multiple cubes (power-of-two number of processor nodes) within the Intel iPSC/860 is a desirable feature to facilitate the implementation of interdisciplinary problems such as the grand challenge problems of the High Performance Computing and Communications Project...

• ### Using atomic data structures for parallel simulation

Publication Year: 1992, Page(s):30 - 37
Cited by:  Patents (2)

Synchronizing access to shared data structures is a difficult problem for simulation programs. Frequently, synchronizing operations within and between simulation steps substantially curtails parallelism. The paper presents a general technique for performing this synchronization while sustaining parallelism. The technique combines fine-grained, exclusive locks with futures, a write-once data struct...

• ### Applications of a parallel pressure-correction algorithm to 3D turbomachinery flows

Publication Year: 1992, Page(s):153 - 156

A parallel algorithm for the solution of three-dimensional compressible flows in turbomachinery has been developed and demonstrated on a scalable distributed memory multicomputer. The algorithm solves the compressible form of the Euler or Navier-Stokes equations via a compressible pressure correction formulation. To achieve high accuracy for highly turning blade rows, the computational grid is con...

• ### Toward a scalable concurrent architecture for real-time processing of stochastic control and optimization problems

Publication Year: 1992, Page(s):46 - 50

Reports on the development of a scalable multiple-instruction multiple-data (MIMD) concurrent architecture which is intended to serve as an effective alternative for solving stochastic differential and optimization systems. This architecture has in turn motivated the application of group theory and invariance analysis to acquire further insights in understanding the original problem. The speed-up ...

• ### Parameterized memory/processor optimizing FORTRAN compiler for parallel computers

Publication Year: 1992, Page(s):204 - 207
Cited by:  Patents (2)

A new approach to generating low-conflict parallel instructions for complex applications is introduced in this paper. This method is presented within the context of a FORTRAN compiler. An approximate simulator has been incorporated within a parallel-code/domain-decomposition loop within the compiler. The simulator estimates the performance of candidate instruction segments, and guides the selectio...

• ### Phase modeling of a parallel scientific code

Publication Year: 1992, Page(s):322 - 327
Cited by:  Papers (1)

Describes a performance model for a parallel program that solves the nonlinear shallow water equations using the spectral transform method. The model is generated via a phase analysis, and consists of a sequence of simple models whose sum describes the performance of the entire code. This use of a sequence of simple models increases the range of validity of the model as the problem and machine par...

• ### Selective monitoring using performance metric predicates

Publication Year: 1992, Page(s):162 - 165
Cited by:  Papers (4)  |  Patents (1)

The field of parallel processing is going through an important evolution in technology characterized by a significant increase in the number of processors within such systems. As the number of processors increases, the conventional techniques for monitoring the performance of parallel systems will produce large amounts of data in the form of event trace files. The authors propose one possible solu...