Frontiers of Massively Parallel Computation, 1990. Proceedings., 3rd Symposium on the

Date 8-10 Oct. 1990

Displaying Results 1 - 25 of 81
  • How to use up processors

    Publication Year: 1990 , Page(s): 515 - 518
    Cited by:  Papers (1)

    Most parallelization aims for a decomposition in which the resulting units are data independent, each of the units contributes to the final output in the data flow, and synchronization is minimized. The paradigm of 'possible-worlds computing' aims to explore a model in which parallelization is achieved by disregarding the first two goals in order to preserve the third. The Tahiti programming language and its supporting run-time kernel Symphora, which are intended to explore the possible-worlds paradigm, are described.

  • Third Symposium on the Frontiers of Massively Parallel Computation. Proceedings. (Cat. No.90CH2908-2)

    Publication Year: 1990
    Freely Available from IEEE
  • An optimal lookahead processor to prune search space

    Publication Year: 1990 , Page(s): 215 - 224
    Cited by:  Papers (3)

    The discrete relaxation algorithm (DRA) is an efficient computational technique for enforcing arc consistency (AC) in a consistent labeling problem (CLP). The original sequential AC-1 algorithm suffers from O(n^3 m^3) time complexity for an n-object and m-label problem. Sample problem runs show that all these sequential algorithms are too slow to meet the need for any useful real-time CLP applications. An optimal parallel DRA5 algorithm that reaches the optimal lower bound, O(nm), for parallel AC algorithms (where the number of processors is polynomial in the problem size) is given. The algorithm has been implemented on a fine-grained, massively parallel hardware computer architecture. For problems of practical interest, 4 to 10 orders of magnitude of efficiency improvement can be reached on this hardware architecture.
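
For orientation, the sequential AC-1 baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of textbook AC-1, not the paper's parallel DRA5 algorithm; the names `revise`, `ac1`, and the `less_than` constraint are hypothetical and chosen only for the example.

```python
from itertools import product

def revise(domains, allowed, i, j):
    """Drop every label of object i that has no compatible label on object j."""
    supported = {a for a in domains[i]
                 if any(allowed(i, a, j, b) for b in domains[j])}
    changed = supported != domains[i]
    domains[i] = supported
    return changed

def ac1(domains, allowed):
    """AC-1: sweep all ordered pairs of objects until a full pass removes nothing."""
    objects = list(domains)
    changed = True
    while changed:
        changed = False
        for i, j in product(objects, repeat=2):
            if i != j and revise(domains, allowed, i, j):
                changed = True
    return domains

def less_than(i, a, j, b):
    # Hypothetical constraint: labels must increase with the object index.
    return a < b if i < j else a > b

result = ac1({0: {1, 2, 3}, 1: {1, 2, 3}, 2: {1, 2, 3}}, less_than)
```

Repeating the full sweep after every change is what makes AC-1 expensive, which is the cost the abstract contrasts with the parallel bound.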

  • Asymptotically efficient hypercube algorithms for computational geometry

    Publication Year: 1990 , Page(s): 8 - 11

    Hypercube algorithms that solve many fundamental computational geometry problems are presented. The algorithms use decomposition techniques, which enable them to outperform asymptotically the fastest previous algorithms for these problems. Previous algorithms all run in Θ(log^2 n) time, even when using a sorting method that runs in o(log^2 n) time. The new algorithms use a recently discovered o(log^2 n) time sorting method to improve their asymptotic speed to o(log^2 n). If sorting runs in Θ(Sort(n)) time, the algorithms for two-set dominance counting, 3-D maxima, closest pair, and all points nearest neighbors run in Θ(Sort(n) log(log n)) time, and the algorithms for triangulation and visibility from a point run in Θ(Sort(n)) time.

  • Random number generators with inherent parallel properties

    Publication Year: 1990 , Page(s): 34 - 37

    By incorporating the spatial variable into a one-dimensional array of numbers, it is possible to generalize the well-known linear congruential random-number generator (LCG) to the spatially coupled random-number generator (SCG) given by X_i(t+1) = f[{X_i(t)}] (mod m), where i = 1, 2, ..., n can be regarded as spatial sites and f is a function of {X_i}, which denotes a set containing X_i and its neighbors. It was found that SCGs in general possess a very long period. Statistical and spectral tests on these SCGs show that they are excellent pseudorandom-number generators. The SCGs also have inherent parallel properties and are particularly efficient when implemented on parallel machines.
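
The update rule above can be illustrated with a short sketch. The abstract does not specify f, so the linear nearest-neighbor coupling, the periodic boundary, and the default constants below are all assumptions made for illustration; they are not the generators tested in the paper and carry no claim about statistical quality.

```python
def scg_step(state, a=1103515245, b=12345, c=1, m=2**31):
    """One synchronous update of a spatially coupled generator.

    Assumed form of f (not taken from the paper):
        X_i(t+1) = (a*X_i(t) + b*(X_{i-1}(t) + X_{i+1}(t)) + c) mod m
    with sites arranged on a ring.
    """
    n = len(state)
    return [(a * state[i] + b * (state[i - 1] + state[(i + 1) % n]) + c) % m
            for i in range(n)]

# Every site reads only the previous state, so all sites could be updated
# at the same time on a parallel machine.
example = scg_step([1, 2, 3], a=5, b=1, c=0, m=7)
```

Because each new value depends only on the old values of a site and its neighbors, one site per processor is a natural mapping.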

  • Data management and control-flow constructs in a SIMD/SPMD parallel language/compiler

    Publication Year: 1990 , Page(s): 397 - 406
    Cited by:  Papers (3)  |  Patents (69)

    Features of an explicitly parallel language targeted for reconfigurable massively parallel processing systems capable of operating in the SIMD (single-instruction-stream, multiple-data-stream) and SPMD (single-program, multiple-data-stream) modes of parallelism are presented. SPMD is a subset of MIMD (multiple-instruction-stream, multiple-data-stream). All aspects of the language have been provided with an SIMD-mode version and an SPMD-mode version that are functionally equivalent. The language facilitates experimentation with and exploitation of massively parallel SIMD/SPMD machines. Aspects of data management (variable specification, data manipulation operations, etc.) and control-flow constructs (data dependent and processor address dependent) are examined.

  • Toward scalable algorithms for orthogonal shared-memory parallel computers

    Publication Year: 1990 , Page(s): 12 - 21

    The problem of developing scalable and near-optimal algorithms for orthogonal shared-memory multiprocessing systems with a multidimensional access (MDA) memory array is considered. An orthogonal shared-memory system consists of 2n processors and 2m memory modules accessed in any one of m possible access modes. Data stored in memory modules are available to processors under a mapping rule that allows conflict-free data reads and writes for any given access mode. Scalable algorithms are presented for two well-known computational problems, namely, matrix multiplication and the fast Fourier transform (FFT). A complete analysis of the algorithms based on computational time and the access modes needed is also presented. The algorithms scale very well onto higher dimensional MDA architectures but are not always optimal. This reveals a tradeoff between the scalability of an algorithm and its optimality in the MDA computational model.

  • Large integer multiplication on massively parallel processors

    Publication Year: 1990 , Page(s): 38 - 42
    Cited by:  Papers (2)

    Results obtained by multiplying large integers using the Fermat number transform are presented. The effectiveness of the approach was previously limited by word-length constraints, which are not a factor with many new computer architectures. A convolution algorithm on a massively parallel processor, based on the Fermat number transform, is presented. Examples of the tradeoffs between modulus, interprocessor communication steps, and input size are given. The application of this algorithm in the multiplication of large integers is then discussed, and performance results on a Connection Machine are reported. The results show multiplication times ranging from about 50 ms for 2-kb integers to 2600 ms for 8-Mb integers.
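
The idea of multiplying integers through a convolution modulo a Fermat number can be sketched sequentially. This is an illustration under stated assumptions, not the paper's Connection Machine algorithm: it fixes the modulus to the Fermat prime 2^16 + 1, uses the generic primitive root 3 rather than the power-of-two roots that make classic Fermat number transforms shift-only, and splits operands into 4-bit digits. The function names are hypothetical.

```python
P = 65537  # Fermat prime F_4 = 2**16 + 1
G = 3      # a primitive root modulo P

def transform(a, root):
    """Recursive radix-2 transform modulo P; len(a) is a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = transform(a[0::2], root * root % P)
    odd = transform(a[1::2], root * root % P)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P
        out[k + n // 2] = (even[k] - t) % P
        w = w * root % P
    return out

def digits(v, bits):
    """Little-endian digits of v in base 2**bits."""
    d = []
    while v:
        d.append(v & ((1 << bits) - 1))
        v >>= bits
    return d or [0]

def multiply(x, y, bits=4):
    """Multiply non-negative integers via cyclic convolution modulo P."""
    dx, dy = digits(x, bits), digits(y, bits)
    n = 1
    while n < len(dx) + len(dy):
        n *= 2
    # The transform length must divide P - 1, and every convolution term
    # must stay below P so it can be recovered exactly.
    assert n <= P - 1, "operands too long for this modulus"
    assert min(len(dx), len(dy)) * ((1 << bits) - 1) ** 2 < P, "digits overflow the modulus"
    root = pow(G, (P - 1) // n, P)
    fx = transform(dx + [0] * (n - len(dx)), root)
    fy = transform(dy + [0] * (n - len(dy)), root)
    conv = transform([u * v % P for u, v in zip(fx, fy)], pow(root, P - 2, P))
    n_inv = pow(n, P - 2, P)
    return sum((c * n_inv % P) << (bits * i) for i, c in enumerate(conv))
```

The two assertions make the tradeoff named in the abstract concrete: a larger modulus permits longer inputs or wider digits, at the cost of wider arithmetic per element.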

  • The Digital Transform Machine

    Publication Year: 1990 , Page(s): 265 - 269
    Cited by:  Papers (7)

    The Digital Transform Machine, a massively parallel computer architecture based on a configurable hardware model of processing, is discussed. Some of the implications of this model of computing are examined, and the cellular structure and interconnection network of a proof-of-concept computer based on it are described. Areas that merit particular attention in future research are identified.

  • Early experience with object-oriented message driven computing

    Publication Year: 1990 , Page(s): 503 - 506
    Cited by:  Papers (1)  |  Patents (1)

    A model of parallel computation, message-driven computing (MDC), is presented, along with a language, OOMDC/C, that implements it. OOMDC/C is an object-oriented version of MDC that has been implemented in a sequential version and in two parallel versions on the Encore multiprocessor, one holding the messages in shared memory and the other copying the messages between processes. It is shown that the model facilitates the use of a variety of parallel data and control structures, including Actors, distributed arrays, communicating processes, remote procedure calls, broadcast and accumulation, data flow graphs, I-structures, streams, active messages, and demand-driven dynamic programming.

  • Designing the 3-LAP (three layers associative processor) for arithmetic and symbolic applications

    Publication Year: 1990 , Page(s): 270 - 273

    A variant of the MULTAP architecture, called 3-LAP, is presented. This three-layer machine is designed from the middle out, beginning with its finite-state-machine diagram and working toward its low-level processing element cell specification and its high-level algorithm applications definition. The 3-LAP's operating and control parts are defined, the estimated machine throughput is presented (over 100 GCOPS, i.e. giga complex operations per second), the processing element cell is defined, and arithmetic and symbolic application primitives in 3-LAP instructions are described.

  • On single parameter characterization of parallelism

    Publication Year: 1990 , Page(s): 235 - 237
    Cited by:  Papers (1)

    Issues pertinent to performance analysis of massively parallel systems are discussed. Attention is focused on the average parallelism of a software structure, which has been proposed as a single-parameter characterization of parallel software. It is argued that single-parameter characterization of parallel software or of parallel hardware rarely provides insight into the complex interactions among the software and hardware components of a parallel system. In particular, bounds for the speedup based on simple models of parallelism are violated when a model ignores the effects of communication delays.

  • High performance mapping for massively parallel hierarchical structures

    Publication Year: 1990 , Page(s): 251 - 254

    Techniques for mapping image processing and computer vision algorithms onto a class of hierarchically structured systems are presented. In order to produce mappings of maximum efficiency, objective functions that measure the quality of given mappings with respect to particular optimization goals are proposed. The effectiveness and the computational complexity of mapping algorithms that yield very high performance by minimizing the objective functions are discussed. Performance results are also presented.

  • A framework for efficient execution of array-based languages on SIMD computers

    Publication Year: 1990 , Page(s): 462 - 470
    Cited by:  Papers (1)

    The author presents a framework for supporting efficient execution of machine-independent, array-based, data-parallel languages, such as Fortran-90 and Parallel Pascal, on distributed-memory SIMD (single-instruction-stream, multiple-data-stream) machines with mesh or hypercube interconnection topologies. The framework supports (1) a wide class of mappings of arrays into machines, (2) the implementation of many data selection and reorganization operations by manipulation of data descriptors instead of data movement, and (3) the decomposition of required data motions into sequences of efficient nearest-neighbor communications on the mesh. Each of these is discussed, and an application example is given. Related work is examined.
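
Point (2), reorganizing data by editing a descriptor rather than moving elements, can be illustrated with a small sketch. The `Descriptor` class below is hypothetical and far simpler than anything a real compiler framework would need; it only shows that a transpose or a sub-array selection can be expressed as a change of offset, shape, and strides over unchanged storage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptor:
    """Hypothetical 2-D array descriptor: maps (i, j) to a flat storage index."""
    offset: int
    shape: tuple
    strides: tuple

    def index(self, i, j):
        assert 0 <= i < self.shape[0] and 0 <= j < self.shape[1]
        return self.offset + i * self.strides[0] + j * self.strides[1]

    def transpose(self):
        # Swap the axes in the descriptor; the stored data is untouched.
        return Descriptor(self.offset, self.shape[::-1], self.strides[::-1])

    def block(self, row, col, rows, cols):
        # Select a sub-array by moving the offset and shrinking the shape.
        return Descriptor(self.index(row, col), (rows, cols), self.strides)

storage = list(range(12))            # a 3 x 4 array stored row by row
a = Descriptor(0, (3, 4), (4, 1))
t = a.transpose()                    # 4 x 3 view of the same storage
b = a.block(1, 1, 2, 2)              # 2 x 2 view starting at element (1, 1)
```

Here `storage[t.index(1, 2)]` and `storage[a.index(2, 1)]` address the same element, so no copy is needed for the transpose.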

  • Simulation of neural networks on a massively parallel computer (DAP-510) using sparse matrix techniques

    Publication Year: 1990 , Page(s): 376 - 379

    A parallel sparse matrix algorithm is proposed for the simulation of the modified Hopfield-Tank (MHT) network for solving the Traveling Salesman Problem (TSP). The MHT network using this sparse matrix algorithm has been implemented on a DAP-510, a massively parallel SIMD (single-instruction-stream, multiple-data-stream) computer consisting of 1024 processors. Problems of various sizes, ranging from eight cities up to 256 cities, have been simulated. The results show a very large speedup for the algorithm as compared with one using a standard dense matrix implementation.

  • Parallel sorting of large arrays on the MasPar MP-1

    Publication Year: 1990 , Page(s): 59 - 64
    Cited by:  Papers (4)  |  Patents (30)

    The problem of sorting a collection of values on a mesh-connected, distributed-memory, SIMD (single-instruction-stream, multiple-data-stream) computer using variants of Batcher's bitonic sort algorithm is considered for the case in which the number of values exceeds the number of processors in the machine. In this setting the number of comparisons can be reduced asymptotically if the processors have addressing autonomy (locally indirect addressing), and communication costs can be reduced by judicious domain decomposition. The implementation of several related adaptations of bitonic sort on a MasPar MP-1 is reported. Performance is analyzed in relation to the virtualization ratio VPR. It is concluded that the most reasonable large-array sort for this machine will combine hypercube virtualization with the processor axes transposed dynamically within an xnet embedding.
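
As background, the basic bitonic sorting network that the paper adapts can be sketched sequentially. This is the textbook one-value-per-position form, not any of the MasPar variants described above, and it assumes the input length is a power of two.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with Batcher's bitonic network."""
    a = list(values)
    n = len(a)
    assert n > 0 and (n & (n - 1)) == 0, "length must be a power of two"
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # distance between compared positions
            for i in range(n):    # on a SIMD machine, all i act in one step
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

sorted_values = bitonic_sort([5, 1, 4, 8, 2, 7, 3, 6])
```

The partner of position `i` differs from it in exactly one bit, which is why the network maps directly onto hypercube connections.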

  • Mapping finite element graphs on hypercubes

    Publication Year: 1990 , Page(s): 135 - 144
    Cited by:  Papers (2)

    Two-way stripe partition mapping and greedy assignment mapping are proposed for mapping finite-element graphs (FEGs) onto hypercubes. They can be used to map both 2-D and 3-D FEGs on hypercubes. Two-way stripe partition mapping is a two-phase mapping approach. In the first phase, two-way stripe partition is used to achieve low communication cost. In the second phase, the load transfer heuristic is used to balance the computational load among processors. Greedy assignment mapping tries to minimize the communication cost and balance the computational load of processors simultaneously. The estimated lower-bound speedup and the estimated upper-bound speedup are derived for both bidirectional and unidirectional communication, to measure the mapping results. Simulation results show that the speedups for two-way stripe partition mapping are better than those for greedy assignment mapping when the load balance criterion is achieved in both approaches. The greedy approach, however, gives good performance at a much lower cost.

  • Computer vision applications with the associative string processor

    Publication Year: 1990 , Page(s): 154 - 157
    Cited by:  Papers (1)

    The use of the ASP (associative string processor), a massively parallel processing architecture, to implement representative tasks of integrated computer vision applications is discussed. An overview of the ASP architecture is provided. Two-dimensional image convolution and graph matching are then examined. The performance and versatility of the ASP architecture in implementing the computer vision tasks and in independent computer vision benchmarks (i.e. Abingdon Cross, DARPA and LAA-CERN) are discussed.

  • Data parallel computers and the FORALL statement

    Publication Year: 1990 , Page(s): 390 - 396
    Cited by:  Patents (11)

    The array constructs of Fortran 90 (formerly called Fortran 8x) map naturally onto SIMD (single-instruction-stream, multiple-data-stream) architectures that support a data parallel programming style, such as that of the Connection Machine computer system. The FORALL statement, an extension to Fortran 90 allowing for the expression of simultaneous execution of certain DO loop bodies, enhances this natural fit. A Fortran 90 compiler for data parallel machines is extended naturally to handle the FORALL statement. The data structures and algorithms used to effect this extension are described, and some examples of source code fragments and the target operations generated for them are presented.
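
The difference between a sequential DO loop and "simultaneous execution" of the loop body can be shown with a small sketch. It assumes the FORALL semantics later standardized in Fortran 95, where every right-hand side is evaluated before any element is assigned; whether the extension described in the paper matches this exactly is not stated in the abstract. The function names are hypothetical.

```python
def do_loop(a):
    """Sequential DO loop: a(i) = a(i-1) + a(i+1), each step sees earlier updates."""
    a = list(a)
    for i in range(1, len(a) - 1):
        a[i] = a[i - 1] + a[i + 1]
    return a

def forall(a):
    """FORALL-style update: all right-hand sides are computed from the old array."""
    rhs = [a[i - 1] + a[i + 1] for i in range(1, len(a) - 1)]
    out = list(a)
    out[1:len(a) - 1] = rhs
    return out

sequential = do_loop([1, 2, 3, 4, 5])
simultaneous = forall([1, 2, 3, 4, 5])
```

The two results differ because the loop carries a dependence from one iteration to the next, while the simultaneous form has none and so suits a machine that updates all elements at once.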

  • What are the two most important issues facing the design and use of massively parallel computers?

    Publication Year: 1990 , Page(s): 526 - 529
    Cited by:  Papers (2)

    A variety of views is presented by the participants in this panel discussion. Concerns are expressed regarding communication, control, software, programming, cost, and performance measures, among others. The responses reflect the varied backgrounds and perspectives of the panelists.

  • Achieving multigauge behavior in bit-serial SIMD architectures via emulation

    Publication Year: 1990 , Page(s): 186 - 195

    It is shown that the expected benefits of multigauging can be attained without any hardware modification and that additional advantages may be gained from enabling emulations. The authors start with a (physical) bit-serial architecture and build (software) support for multigauge computation on top of its native instruction set. Assumptions about this instruction set are modest and confined solely to its functionality, not its implementation. Multigauge behavior is achieved as a high-level abstraction, which is independent of the physical design. The danger that (hardware-enabled) multigauge behavior might preclude certain types of hardware optimization is avoided.

  • Design and performance of an optimizing SIMD compiler

    Publication Year: 1990 , Page(s): 507 - 510
    Cited by:  Papers (4)

    The work reported has taken place within the context of the Scan Line Array Processor (SLAP) project. The SLAP is a long SIMD (single-instruction-stream, multiple-data-stream) vector of word-parallel processing elements designed to combine high-performance integer processing with real-time video I/O. An optimizing compiler has been built for SLANG, a high-level programming language designed to permit concise expression of many image-processing constructs and to provide the compiler with significant information for use in optimization. The SLAP processing elements have been fabricated and are currently being integrated into a prototype system. Early results of using the compiler as a tool for evaluating the SLAP processing element design and as a testbed for the authors' optimization techniques are reported.

  • Massive parallelism through program restructuring

    Publication Year: 1990 , Page(s): 407 - 415
    Cited by:  Papers (6)

    A technique for mapping algorithms to massively parallel processors is described. It differs from previous work by focusing on explicit program restructuring, as opposed to manual or algebraic mapping. The method is flexible, and it allows nonlinear, as well as linear, mappings. Some restructuring transformations and how they would be used are described. A limitation of the approach is the restriction of skewing and rotating by unit factors only. The method benefits from previous work in program restructuring and systolic array synthesis and thus will be simple to implement.

  • Comparative performance evaluation of a new SIMD machine

    Publication Year: 1990 , Page(s): 255 - 258
    Cited by:  Papers (2)

    The performance of BLITZEN, a new massively parallel machine, is compared with that of the Massively Parallel Processor (MPP) for two image-processing functions: rotation and resampling. These functions, as implemented on the MPP, were modified to exploit new architectural features of BLITZEN. The functional simulator of BLITZEN, used for algorithm development and timing information, is described. A performance comparison based on instruction cycle counts shows a significant speedup for the new machine due to architectural features that improve data movement capability.

  • Automatic generation of visualization code for the Connection Machine

    Publication Year: 1990 , Page(s): 158 - 161
    Cited by:  Patents (1)

    A technique for automatically adding graphics code to application programs is described. It has been found that many visualization strategies can be described solely in terms of the productions in the programming language's BNF; therefore, a source-to-source transformation mechanism, whereby a user's nongraphics application can be transformed into a consistently functioning variant that also has graphics display operations methodically inserted, is being implemented. The ultimate goal is to discover principles whereby some clear image of an application's state can be communicated to a user by automatic source code analysis. The initial step has been to focus on a concrete architecture, the Connection Machine (CM), for which a set of high-level display functions has been fixed. Source programs written in C, and containing high-level directives, are passed through a parser and analysis program, which creates a variant program having graphics code installed. The transformation program implemented for the experiments is called CmVis. It has been prototyped using a parser generator called NewYacc, a Yacc-based tool that is enhanced by allowing rewrite rules to be associated with the language's BNF productions.
