Proceedings of 1994 6th IEEE Symposium on Parallel and Distributed Processing

26-29 Oct. 1994

  • Proceedings of 1994 6th IEEE Symposium on Parallel and Distributed Processing

    Publication Year: 1994
  • Parallelism and locality in priority queues

    Publication Year: 1994, Page(s):490 - 496
    Cited by:  Papers (7)

    We explore two ways of incorporating parallelism into priority queues. The first is to speed up the execution of individual priority operations so that they can be performed one operation per time step, unlike sequential implementations which require O(log N) time steps per operation for an N element heap. We give an optimal parallel implementation that uses a linear array of O(log N) processors. ...
  • Efficient mapping of randomly sparse neural networks on parallel vector supercomputers

    Publication Year: 1994, Page(s):170 - 177

    This paper presents efficient mappings of large sparse neural networks on a distributed-memory MIMD multicomputer with high performance vector units. We develop parallel vector code for an idealized network and analyze its performance. Our algorithms combine high performance with a reasonable memory requirement. Due to the high cost of scatter/gather operations, generating high performance paralle...
  • Causality versus time: how to specify and verify distributed algorithms

    Publication Year: 1994, Page(s):249 - 256

    This paper presents and advocates a method for formally specifying and verifying distributed programs. The method, which is based on the partial order of local states generated during execution, avoids the notion of time or physical global state. Programs are specified by documenting the relationship between states which are adjacent to each other in the partial order. Program properties are prove...
  • Upper and lower bounds for selection on the mesh

    Publication Year: 1994, Page(s):497 - 504

    A distance-optimal algorithm for selection on the mesh has proved to be elusive, although distance-optimal algorithms for the related problems of routing and sorting have recently been discovered. In this paper, we explain, using the notion of adaptiveness, why techniques used in the currently best selection algorithms cannot lead to a distance-optimal algorithm. We also present the first algorith...
  • WICI: an efficient switching scheme for large scalable networks

    Publication Year: 1994, Page(s):385 - 392
    Cited by:  Papers (1)  |  Patents (1)

    Many recent supercomputers employ either some kind of cluster-based design or other highly scalable networks. Clustering is built into the hierarchical system, while highly scalable networks like mesh or torus could be easily partitioned into modules to form several clusters. The paper explores the potential use of the prevalent routing schemes of wormhole and virtual cut-through and outlines their relati...
  • Hierarchical adaptive routing: a framework for fully adaptive and deadlock-free wormhole routing

    Publication Year: 1994, Page(s):688 - 695
    Cited by:  Papers (2)

    Adaptive routing can improve network performance and fault-tolerance by providing multiple routing paths. However, the implementation complexity of adaptive routing can be significant, discouraging its use in commercial massively parallel systems. In this paper we introduce Hierarchical Adaptive Routing (HAR), a new adaptive routing framework which provides a unified framework for simple and high ...
  • Derivation and performance of a pipelined transaction processor

    Publication Year: 1994, Page(s):178 - 185
    Cited by:  Patents (1)

    Transaction processing can be formulated as a simple functional program operating on a stream of transaction requests and a tree-structured database. In this paper we use algebraic transformation of the initial program to yield an optimistic implementation in which unnecessary synchronization is eliminated, thereby allowing concurrent processing of transactions. A detailed simulation is used to st...
  • Synchronization expressions and languages

    Publication Year: 1994, Page(s):257 - 264
    Cited by:  Papers (2)

    New constructs for synchronization termed synchronization expressions (SEs) have been developed as high-level language constructs for parallel programming languages. We introduce a new family of languages named synchronization languages which we use to give a precise semantic description for SEs. Under this description, relations such as equivalence and inclusion between SEs can be easily understo...
  • Performance of SPEC92 on prime-mapped vector cache

    Publication Year: 1994, Page(s):569 - 576
    Cited by:  Papers (1)

    In this paper, we use execution-driven simulation to study and compare vector processing performances, in terms of the total execution time of an application program, of cache-based vector computers with that of uncached vector computers having a large number of interleaved memory banks. The cache memory used here is a new cache organization called prime-mapped cache. Simulation results on SPEC92 ...
  • Optimal polling in communication networks

    Publication Year: 1994, Page(s):224 - 231
    Cited by:  Papers (1)

    Polling is the process in which an issuing node of a communication network (polling station) broadcasts a query to every other node in the network and must receive a unique response from each of them. Polling can be thought of as a combination of broadcasting and gathering and finds wide application in the control of distributed systems. In this paper we consider the problem of polling in minimum ti...
  • Minimum dependence distance tiling of nested loops with non-uniform dependences

    Publication Year: 1994, Page(s):74 - 81
    Cited by:  Papers (13)  |  Patents (2)

    We address the problem of partitioning nested loops with non-uniform (irregular) dependence vectors. Although many methods exist for nested loop partitioning, most of these perform poorly when parallelizing nested loops with irregular dependencies. We apply the results of classical convex theory and principles of linear programming to iteration spaces and show the correspondence between minimum de...
  • An optimal hypercube algorithm for the all nearest smaller values problem

    Publication Year: 1994, Page(s):505 - 512
    Cited by:  Papers (1)

    Given a sequence of n elements, the All Nearest Smaller Values (ANSV) problem is to find, for each element in the sequence, the nearest element to the left (right) that is smaller, or to report that no such element exists. Berkman, Schieber, and Vishkin (1993) give an ANSV algorithm that runs in O(lg n) time on an (n/lg n)-processor CREW PRAM. In this paper, we present an O(lg n)-time n-processor ...
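
    The ANSV problem statement in the abstract above can be made concrete with a small sequential sketch. This is the standard stack-based linear-time approach, not the hypercube algorithm of the paper and not the PRAM algorithm it cites; the function names are hypothetical, and "smaller" is assumed to mean strictly smaller.

```python
def nearest_smaller_to_left(values):
    """For each position, return the index of the nearest strictly smaller
    element to its left, or None if no such element exists."""
    result = []
    stack = []  # indices whose values are strictly increasing, bottom to top
    for i, v in enumerate(values):
        # Discard candidates that are not strictly smaller than v.
        while stack and values[stack[-1]] >= v:
            stack.pop()
        result.append(stack[-1] if stack else None)
        stack.append(i)
    return result


def all_nearest_smaller_values(values):
    """Return (left, right) index lists for the ANSV problem."""
    n = len(values)
    left = nearest_smaller_to_left(values)
    # The right-hand answers are the left-hand answers of the reversed
    # sequence, with indices mapped back to the original order.
    rev = nearest_smaller_to_left(values[::-1])
    right = [None if j is None else n - 1 - j for j in reversed(rev)]
    return left, right


if __name__ == "__main__":
    left, right = all_nearest_smaller_values([3, 1, 4, 1, 5, 9, 2, 6])
    print(left)
    print(right)
```

    Each index is pushed and popped at most once per pass, so the sequential cost is linear in n; the parallel algorithms discussed in the abstract aim to reach the same answers in O(lg n) time.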
  • A new family of fixed degree Cayley networks for multiprocessor design

    Publication Year: 1994, Page(s):394 - 401
    Cited by:  Papers (1)

    We propose a new family of trivalent network graphs with constant node degree 3 for design of massively parallel systems. These graphs are shown to be regular, to have logarithmic diameter in the number of nodes, and to be maximally fault tolerant. We investigate different algebraic properties of these networks (including fault tolerance) and propose simple and optimal routing algorithms. We also ...
  • Wormhole routing algorithms for twisted cube networks

    Publication Year: 1994, Page(s):696 - 703
    Cited by:  Papers (3)

    The hypercube can be “improved” by “twisting” or rearranging edges to create new networks with smaller diameter and average distance. There are two criticisms of these twisted cube networks. First, these networks have not been shown to have deadlock-free routing algorithms. Second, while they can sometimes provide a better performance for a store-and-forward routing strateg...
  • Optimal fault-tolerant communication algorithms on product networks using spanning trees

    Publication Year: 1994, Page(s):188 - 195
    Cited by:  Papers (14)

    Over the last few years, Cartesian product graphs have started to receive increasing attention as a general class of networks for multiprocessor systems. One reason is that many efficient and popular networks such as the meshes, tori, hypercubes, hyper de Bruijn, product shuffle, and the newly proposed folded Petersen networks belong to this class of networks. Secondly, with the help of Cartesian product...
  • On-the-fly replay: a practical paradigm and its implementation for distributed debugging

    Publication Year: 1994, Page(s):266 - 272
    Cited by:  Papers (3)

    This paper presents a practical paradigm, called on-the-fly replay. This paradigm consists of running a distributed program twice at the same time: an original computation is running in a regular fashion, which also includes steps of making non-deterministic choices; this execution is driving a twin execution, whose non-deterministic choices do not have to be evaluated (since they are taken from t...
  • Performance model for a prioritized multiple-bus multiprocessor system

    Publication Year: 1994, Page(s):577 - 584
    Cited by:  Patents (1)

    The performance of a shared memory multiprocessor system with a multiple-bus interconnection network is studied in this paper. The effect of bus and memory contention is modeled. An analytical model to evaluate the acceptance probability of each processor in such a system is presented. It is assumed that each processor in the system has a distinct priority assigned to it and that arbitration is ba...
  • Characterization of applications with I/O for processor scheduling in multiprogrammed parallel systems

    Publication Year: 1994, Page(s):298 - 307
    Cited by:  Papers (1)

    Most studies of processor scheduling in multiprogrammed parallel systems have ignored the I/O performed by applications. Recent studies have demonstrated that significant I/O operations are performed by a number of different classes of parallel applications. This paper focuses on some basic issues that underlie scheduling in multiprogrammed parallel environments running applications with I/O. Char...
  • Parallel bidirectional heuristic search on the EM-4 multiprocessor

    Publication Year: 1994, Page(s):100 - 107
    Cited by:  Patents (1)

    Solving search problems takes a large amount of computational resources both in terms of execution time and memory usage. This report presents experimental results of Parallel Bidirectional Heuristic Search (PBiHS) on the 80-processor EM-4 multithreaded data-flow multiprocessor. The PBiHS searches from two directions in parallel while search in each direction is also performed in parallel. Importa...
  • Election on faulty rings with incomplete size information

    Publication Year: 1994, Page(s):232 - 239
    Cited by:  Papers (1)

    This paper considers the election problem on asynchronous ring networks in which one link may undetectably fail. It is assumed that the size of the ring is known to the processes only in inexact form, i.e., as a lower and/or upper bound. For the case of u/2 < l ⩽ u, where l is a lower bound and u is an upper bound on ring size, an optimal algorithm of worst case message complexity O(n log n) is given. F...
  • Simultaneous access renegable priority queues

    Publication Year: 1994, Page(s):370 - 376

    A renegable priority queue has been designed on two different types of network. The first design uses hypercube networks, and has a response time and a pipeline cycle time O(log p), where p is the maximum number of processors that may access the design simultaneously. The second design uses reconfigurable meshes with both response time and pipeline cycle time being constants. Each of these designs...
  • Efficient submesh permutations in wormhole-routed meshes

    Publication Year: 1994, Page(s):672 - 678

    This paper studies how to concurrently permute related logical or physical submeshes in a d-dimensional n×…×n physical mesh via wormhole and dimension-ordered routing. Our objective is to minimize the congestion for realizing the permutations, while maximizing the number and dimensionality of permuted submeshes. We show that for d⩽2α+β, concurrent independent perm...
  • Computational bounds for the simple and the MRMW PRAM

    Publication Year: 1994, Page(s):552 - 557

    We define the simple PRAM, where consecutive transfers between memory and processors cannot be done in a single step. Some acclaimed “surprising” results in PRAM theory, such as computing the OR of n bits in less than log₂ n steps, are proved not to hold in the new model, and are replaced with more natural results. In particular, the OR can be computed in log₂ n + O(1) ste...
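
    For intuition about where a log₂ n step count for OR comes from, the following is a minimal sketch of a balanced pairwise reduction, simulated sequentially. It is an illustration only: it is not the algorithm or the machine model of the paper (the abstract above is truncated), and the function name `or_by_pairwise_rounds` is hypothetical. Each round stands for one parallel step in which every pair is combined by its own processor.

```python
def or_by_pairwise_rounds(bits):
    """OR a non-empty list of bits by repeated pairwise combination.

    Returns (result, rounds), where rounds counts the simulated parallel
    steps. This is a sequential simulation, not a PRAM implementation.
    """
    level = [bool(b) for b in bits]
    if not level:
        raise ValueError("need at least one bit")
    rounds = 0
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; an unpaired last element
        # is carried over unchanged.
        level = [
            level[i] or (i + 1 < len(level) and level[i + 1])
            for i in range(0, len(level), 2)
        ]
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    print(or_by_pairwise_rounds([0, 0, 0, 0, 1]))
```

    The list roughly halves each round, so n inputs take ceil(log₂ n) rounds in this simulation.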
  • An adaptive SOR algorithm and its parallel implementation for power system applications

    Publication Year: 1994, Page(s):84 - 91

    In our earlier papers, we investigated the parallelization and implementation of Gauss-Seidel (G-S) and Successive Overrelaxation (SOR) power flow analysis on shared memory (SM) and distributed (DM) machines. For the SOR case, constant acceleration factors obtained from experiments are used to speed up convergence. In this paper, we introduce a new adaptive nonlinear SOR (ANSOR) algorithm which use...