Proceedings Sixth International Parallel Processing Symposium

23-26 March 1992

  • Distributed algorithms for shortest-path, deadlock-free routing and broadcasting in a class of interconnection topologies

    Publication Year: 1992, Page(s):589 - 596
    Cited by:  Papers (5)

    A class of novel interconnection topologies called the generalized Fibonacci cubes is presented. The generalized Fibonacci cubes include the hypercubes, the recently proposed Fibonacci cubes (W.-J. Hsu, Proc. Int. Conf. on Parallel Processing, p.1722-3 (1991)), and some other asymmetric interconnection topologies bridging between the two mentioned above. The generalized Fibonacci cubes can serve a...
  • The impact of wiring constraints on hierarchical network performance

    Publication Year: 1992, Page(s):580 - 588
    Cited by:  Papers (6)

    A unified approach, incorporating architectural and packaging issues, is necessary in the design of high performance computer networks. Clustering enables the authors to exploit the physical hierarchy imposed by packaging. Previously the authors examined the clustering of hypercube networks within the context of wiring constraints (see 1991 Int. Conf. on Parallel Processing, Aug. 1991). The author...
  • CCHIME: a cache coherent hybrid interconnected memory extension

    Publication Year: 1992, Page(s):573 - 577
    Cited by:  Patents (2)

    This paper presents a hybrid shared memory architecture which combines the scalability of a multistage interconnection network with the contention reduction benefits of coherent caches. The authors achieve this by replacing the memory modules and final stages of a multistage interconnection network with clusters of coherent caches. The performance of Cache Coherent Hybrid Interconnected Memory Ext...
  • Some architectural and compilation issues in the design of hierarchical shared memory multiprocessors

    Publication Year: 1992, Page(s):567 - 572
    Cited by:  Papers (2)

    Latency and synchronization overheads have been identified as two fundamental problems in large-scale shared memory multiprocessors. The authors discuss architectures based on hierarchical memories which exploit the notion of partial sharing of variables to significantly reduce latency and synchronization overheads. They examine a particular class of architectures, the tree-structured hierarchical...
  • Performance analysis of two address space allocation schemes for an optically interconnected distributed shared memory system

    Publication Year: 1992, Page(s):562 - 566
    Cited by:  Papers (8)

    An Optically Interconnected Distributed Shared Memory (OIDSM) system is introduced and analyzed. Distributed shared memory systems place a heavy traffic requirement on the interconnection network. Complex memory allocation schemes have been introduced to reduce the network load. The photonic network of the system introduced in this paper alleviates the traffic load concern, and enables the develop...
  • Matching algorithms and architecture in hierarchical shared-memory multiprocessor (HSM) systems

    Publication Year: 1992, Page(s):558 - 561

    The authors map several interprocessor communication and linear algebra algorithms on a memory coherent hierarchical shared-memory multiprocessor (HSM) system and their communication complexities are evaluated. The results show that the hierarchical architecture is ill-suited to algorithms exhibiting no temporal locality on data accesses or to the algorithms with point-to-point communication.
  • The odd-even expansion storage scheme and its implementation issues

    Publication Year: 1992, Page(s):550 - 557
    Cited by:  Papers (1)

    The authors present a parallel storage scheme to distribute the elements of an N*N matrix over N memory banks, where N is any (odd or even) power of two, such that any rows, columns, forward and backward diagonals, and square or rectangular blocks can be accessed simultaneously without memory conflict. They present a simple scheme for address generation, which requires only logic operations and ca...
  • Load balancing for distributed branch & bound algorithms

    Publication Year: 1992, Page(s):543 - 548
    Cited by:  Papers (19)

    The authors present a new load balancing strategy and its application to distributed branch & bound algorithms and demonstrate its efficiency by solving some NP-complete problems on a network of up to 256 transputers. The parallelization of their branch & bound algorithm is fully distributed. Every processor performs the same algorithm but each on a different part of the solution tree. In ...
  • Parallel coprocessor for Kohonen's self-organizing neural network

    Publication Year: 1992, Page(s):537 - 542
    Cited by:  Papers (2)  |  Patents (5)

    A new efficient integrated circuit implementation of the Self-Organising Feature Map algorithm is described. The fully digital hardware is designed for high speed parallel processing and modular expandability. The hardware implementation acts as a neural coprocessor which uses synchronous, bit-serial arithmetic. It includes functional units which perform the Euclidean distance computation, the min...
  • The effects of communication overhead on the speedup of parallel 3-D finite element applications

    Publication Year: 1992, Page(s):531 - 536
    Cited by:  Papers (3)

    The use of parallel processors for implementing the finite element method has made feasible the analyses of large applications, especially three-dimensional applications. The speedup, however, is limited by the interprocessor communication requirements. The authors analyze the effects of interprocessor communications on the resultant speedup of the parallel execution of regular three-dimensional f...
  • A conceptual framework for implementing neural networks on massively parallel machines

    Publication Year: 1992, Page(s):527 - 530
    Cited by:  Papers (1)

    This paper describes a framework for implementing neural networks on massively parallel machines. The framework is generic and applies to a range of neural networks (Multi Layer Perceptron, Competitive Learning, Self-Organising Map, etc.) as well as a range of massively parallel machines (Connection Machine, Distributed Array Processor, MasPar). It consists of two phases: an abstract decomposition...
  • Optimal allocation of shared data over distributed memory hierarchies

    Publication Year: 1992, Page(s):518 - 526
    Cited by:  Patents (1)

    Nonreplicated shared data of distributed applications is optimally allocated to pre-specified multilevel memory partitions at the sites of a heterogeneous multicomputer network to minimize a weighted combination of systemwide mean time delay performance and mean communication cost per access request. Greedy and fast optimization algorithms are presented for nonqueueing lightly-loaded as well as he...
  • Adaptive deadlock-free worm-hole routing in hypercubes

    Publication Year: 1992, Page(s):512 - 515
    Cited by:  Papers (5)

    Two new algorithms for worm-hole routing in the hypercube are presented. The first hypercube algorithm is adaptive, but non-minimal in the sense that some derouting is permitted. Then another deadlock-free adaptive worm-hole based routing algorithm for the hypercube interconnection is presented which is minimal. Finally some well-known worm-hole algorithms for the hypercube were evaluated together...
  • The interplay between granularity, performance and availability in a replicated Linda tuple space

    Publication Year: 1992, Page(s):508 - 511

    Replication is a common method for increasing the availability of data in a distributed environment. The authors' interest is in the application of replication techniques in the domain of parallel processing. They explore the issues concerning degree of replication and granularity in the context of a distributed and highly available Linda tuple space. In particular, they study the performance effe...
  • The vesicular dataflow model

    Publication Year: 1992, Page(s):502 - 507
    Cited by:  Papers (1)

    The Vesicular Dataflow (VDF) model is presented in the paper. The VDF model has been formulated to introduce a way of storing and retrieving information and hence to reduce the main drawback of the basic DF model. Tokens can be stored in vesicles in the VDF model and then distributed in non-deterministic way. State-dependent computations and global variables can be expressed in the dataflow manner...
  • A functional execution model for a non-dataflow tagged token architecture

    Publication Year: 1992, Page(s):496 - 501
    Cited by:  Papers (2)

    The author proposes a new execution model for a non-dataflow tagged-token architecture which is not Petri-net based but rather more closely related to the lambda calculus. The model exploits a functional programming style having applicative-order evaluation. The computation's execution graph is dynamically generated according to easily understood dynamic tagging rules which have been demonstrated ...
  • Performability studies of hypercube architectures

    Publication Year: 1992, Page(s):488 - 495
    Cited by:  Papers (2)

    The authors propose a novel technique to study composite reliability and performance (performability) measures of hypercube systems using generalized stochastic Petri nets (GSPNs). This technique essentially consists of the following: (i) a GSPN reliability model; (ii) a GSPN performance model; and (iii) a way of combining the results from these two models. Models and performability results for an...
  • A library environment for distributed memory multiprocessors

    Publication Year: 1992, Page(s):483 - 486

    The authors propose the design of a library environment, called PARUL (PARallel User Library), for distributed memory multiprocessor systems. An important feature of the environment is that it allows the data distributed for use of a library function as well as the results generated by the function to be retained in the network of processors to be used by subsequent library functions. The user of ...
  • Prototyping N-body simulation in Proteus

    Publication Year: 1992, Page(s):476 - 482
    Cited by:  Papers (3)

    This paper explores the use of Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with a single construct for the parallel composition of processes communicating through shared memory. Several different parallel algorithms for N-body simulation are presented in Proteus, il...
  • Compile-time estimation of communication costs on multicomputers

    Publication Year: 1992, Page(s):470 - 475
    Cited by:  Papers (21)

    An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Any strategy for automatic data partitioning needs a mechanism for estimating the performance of a program under a given partitioning scheme, the most crucial part of which involves determining the ...
  • Aroma: language support for distributed objects

    Publication Year: 1992, Page(s):686 - 690
    Cited by:  Papers (3)  |  Patents (1)

    Aroma simplifies the task of parallelizing large applications in multicomputers by providing applications with a shared object space. Aroma supports both traditional monolithic objects and aggregate objects that can be partitioned across multiple nodes. Aggregate objects support data parallelism efficiently. An Aroma program consists of tasks that operate on shared objects. Tasks typically execute...
  • Hybrid loop interchange: optimization for parallel programs

    Publication Year: 1992, Page(s):680 - 685

    Parallel loops account for the greatest amount of parallelism in numerical programs. Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems. However, in parallel processing systems with caches or local memories in memory hierarchies, 'thrashing problem' may arise whenever data moves back and forth between t...
  • Exploiting SIMD computers for general purpose computation

    Publication Year: 1992, Page(s):675 - 679
    Cited by:  Papers (2)

    This paper proposes a strategy for exploiting massively parallel SIMD computers for general purpose computation. The approach places compiled programs into the local memory space of each distinct processing element (PE). Within each PE, a local program counter is initialized and the instructions are interpreted in parallel across all of the PEs by control signals emanating from the central control...
  • Comparisons and analysis of massively parallel SIMD architectures for parallel logic simulation

    Publication Year: 1992, Page(s):671 - 674
    Cited by:  Patents (2)

    This paper compares and analyzes massively parallel SIMD architectures as processing environments for parallel logic simulation. The CM-2 and the MP-1 are considered as target machines for the comparison. Detailed contrasts between the two parallel schemes are made based on actual simulation results and system performance. Distributed event-driven simulation protocols are used to obtain experiment...
  • Preventing recursion deadlock in concurrent object-oriented systems

    Publication Year: 1992, Page(s):665 - 670
    Cited by:  Papers (1)

    This paper presents solutions to the problem of deadlock due to recursion in concurrent object-oriented programming languages. Two language-independent, system-level mechanisms are proposed: a novel technique using multi-ported objects, and a named-threads scheme that borrows from previous work in distributed computing. The authors compare the solutions, and present an analysis of their relative m...