Proceedings of 8th International Parallel Processing Symposium

26-29 April 1994

  • Proceedings of 8th International Parallel Processing Symposium

    Publication Year: 1994
  • Latency hiding in message-passing architectures

    Publication Year: 1994, Page(s):704 - 709
    Cited by:  Papers (22)  |  Patents (1)

    The paper demonstrates the advantages of having two processors in the node of a distributed memory architecture, one for computation and one for communication. The architecture of such a dual-processor node is discussed. To exploit fully the potential for parallel execution of computation threads and communication threads, a novel, compiler-optimized IPC mechanism allows for an unbuffered no-wait ...
  • Information sharing mechanisms in parallel programs

    Publication Year: 1994, Page(s):461 - 468
    Cited by:  Papers (4)

    Most parallel programming models provide a single generic mode in which processes can exchange information with each other. However, empirical observation of parallel programs suggests that processes share data in a few distinct and specific modes. We argue that such modes should be identified and explicitly supported in parallel languages and their associated models. The paper describes a set of ...
  • The generalized class of g-chain periodic sorting networks

    Publication Year: 1994, Page(s):424 - 432

    A periodic sorter is a sorting network which is a cascade of a number of identical blocks, where output i of each block is input i of the next block. Previously, Dowd et al. (1989) introduced the balanced merging network, with N = 2^k inputs/outputs and log N stages of comparators. Using an intricate proof, they showed that a cascade of log N such blocks constitutes a sorting network. We ...
  • Parallel implementation of a model-based spectral estimator for Doppler blood flow instrumentation

    Publication Year: 1994, Page(s):810 - 814
    Cited by:  Papers (1)

    This paper presents a parallel implementation of a model-based spectral estimation method, based on the modified covariance parametric algorithm. This method has been implemented in order to take advantage of its resolution benefits when it is applied to spectral estimation in Doppler blood flow instrumentation. As the algorithm is computationally intensive and must be executed in realtime, parall...
  • Queue locks on cache coherent multiprocessors

    Publication Year: 1994, Page(s):165 - 171
    Cited by:  Papers (34)  |  Patents (19)

    Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A key issue for execution performance of many common applications is the synchronization cost. The communication scalability of synchronization has been improved by the introduction of queue-based spin-locks instead of Test&(Test&Set). For architectures with long access latencies for global da...
  • Performance analysis of pC++: a portable data-parallel programming system for scalable parallel computers

    Publication Year: 1994, Page(s):75 - 84
    Cited by:  Papers (7)

    pC++ is a language extension to C++ designed to allow programmers to compose distributed data structures with parallel execution semantics. These data structures are organized as “concurrent aggregate” collection classes which can be aligned and distributed over the memory hierarchy of a parallel machine in a manner consistent with the High Performance Fortran Forum (HPF) directives fo...
  • Efficient embedding K-ary complete trees into hypercubes

    Publication Year: 1994, Page(s):710 - 714

    Dilated embedding and precise embedding of K-ary complete trees into hypercubes are studied. For dilated embedding, a nearly optimal algorithm is proposed which embeds a K-ary complete tree of height h, T_K(h), into an (h-1)[log K]+[log(K+2)] dimensional hypercube with dilation max(2, φ(K), φ(K+2)). For precise embedding, we show a (K-1)h+1 dimensional hypercube is large enough ...
  • Processor mapping techniques toward efficient data redistribution

    Publication Year: 1994, Page(s):469 - 476
    Cited by:  Papers (10)

    Run-time data redistribution can affect algorithm performance in distributed-memory machines. Redistribution of data can be performed between algorithm phases when a different data decomposition is expected to deliver increased performance for a subsequent phase of computation. Additionally, data redistribution can occur at subprogram boundaries. Redistribution, however, represents increased progr...
  • Priority based real-time communication for large scale wormhole networks

    Publication Year: 1994, Page(s):433 - 438
    Cited by:  Papers (25)

    As advances are made in parallel processing technology, an increasing number of real-time applications are being developed for large-scale parallel processors. Since the wormhole network is a popular communication system used in the new generation of large-scale parallel multiprocessors, real-time communication support on wormhole networks becomes an important issue. We evaluate a priority mapping...
  • A verified integration of parallel programming paradigms in CC++

    Publication Year: 1994, Page(s):44 - 50
    Cited by:  Papers (1)

    CC++ is an object-oriented parallel programming language that uses parallel composition, atomic functions, and single-assignment variables to express concurrency. We show that this programming paradigm is equivalent to several traditional imperative communication and synchronization models, namely monitors and asynchronous channels. Furthermore, the object-oriented nature of CC++ provides an ideal...
  • On the parallel implementation of OSI protocol processing systems

    Publication Year: 1994, Page(s):815 - 819

    In a heterogeneous computing environment, computers have to use a suitable transfer syntax to communicate with each other because of the differences in internal data representations. Transfer syntax conversions take over 90% of the total processing power needed in OSI protocol processing. Application specific architectures in a heterogeneous system may not be efficient in performing the protocol p...
  • Parallel dictionaries on AVL trees

    Publication Year: 1994, Page(s):878 - 882

    AVL trees are efficient data structures for implementing dictionaries. We present a parallel dictionary, using AVL trees, on the EREW-PRAM by proposing optimal algorithms to perform k operations with p (1⩽p⩽k) processors. An explicit processor scheduling is devised to avoid simultaneous reads in our parallel algorithm to perform k searches, which avoids the need for any additional memory i...
  • Performance evaluation of task grain programs

    Publication Year: 1994, Page(s):644 - 648
    Cited by:  Patents (1)

    Behavioral programs are graph like objects that describe the execution of parallel programs supplied with given inputs. They quantify the amount of computation a run entails and outline the run time data dependencies. Other characteristics of the real machine (e.g. the multiprocessor management overhead oh, the communication delay dy, or the round robin time quanta tq) further affect performance d...
  • Solving static optimal matching problem in heterogeneous processing with generalized stable marriage algorithms

    Publication Year: 1994, Page(s):248 - 252
    Cited by:  Papers (1)

    Concepts and algorithms from a generalization of the stable marriage problem are used to optimally match machine features to task computational requirements in heterogeneous processing. Given a bilateral group-to-group linear preference order and a linear objective function, we can always find a one-to-one matching of machines and tasks in which no machine and task jointly prefer each other to the...
  • An optimal mesh computer algorithm for constrained Delaunay triangulation

    Publication Year: 1994, Page(s):102 - 109
    Cited by:  Papers (2)

    We present an optimal parallel algorithm that runs in O(√n) time on a √n×√n mesh to compute the constrained Delaunay triangulation of a planar straight line graph G whose vertices lie in an n-element set S. Implications of our result also include an efficient PRAM algorithm for the same problem, a new optimal mesh algorithm to compute a planar Voronoi diagram, as well as a ...
  • The effect of speculative execution on cache performance

    Publication Year: 1994, Page(s):172 - 179
    Cited by:  Papers (12)  |  Patents (4)

    Superscalar microprocessors obtain high performance by exploiting parallelism at the instruction level. To effectively use the instruction-level parallelism found in general purpose, non-numeric code, future processors will need to speculatively execute far beyond instruction fetch limiting conditional branches. One result of this deep speculation is an increase in the number of instruction and da...
  • A scalable MIMD volume rendering algorithm

    Publication Year: 1994, Page(s):916 - 920
    Cited by:  Papers (1)  |  Patents (1)

    Volume rendering is a compute intensive graphics algorithm with wide application. Researchers have sought to speed it up using parallel computers. The algorithm distributes the data for storage efficiency, avoids bottlenecks, and scales to more processors than rays. The main contribution is explicit partitioning of the input volume for higher memory utilization, while retaining viewpoint freedom a...
  • Computational geometry on a reconfigurable mesh

    Publication Year: 1994, Page(s):86 - 93

    We develop O(1) time algorithms to compute the 3D maxima, convex hull, smallest enclosing box, and ECDF of a set of planar points. The algorithms are for the reconfigurable mesh with buses (RMESH) architecture and run on the RMESH, PARBUS (processor array with a reconfigurable bus system), and MRN models.
  • A new combinatorial approach to optimal embeddings of rectangles

    Publication Year: 1994, Page(s):715 - 722

    An important problem in graph embeddings and parallel computing is to embed a rectangular grid into other graphs. We present a novel, general, combinatorial approach to (one-to-one) embedding rectangular grids into their ideal rectangular grids and optimal hypercubes. In contrast to earlier approaches of Aleliunas and Rosenberg, and Ellis (1982), our approach is based on a special kind of doubly s...
  • Redundant synchronization elimination for DOACROSS loops

    Publication Year: 1994, Page(s):477 - 481
    Cited by:  Papers (3)

    Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered them. In this paper, we investigate the problem of redundant synchronization elimination in DOACROSS loops and present an algorithm that identifies redundant synchronizations in doubl...
  • HyperC: portable parallel programming in C

    Publication Year: 1994, Page(s):682 - 687
    Cited by:  Papers (1)

    We introduce the HyperC language, a data parallel extension of C intended for portability over a wide range of architectures. We present the main topics of the language: the explicit parallelism through the data, the synchronous semantics and the parallel flow control that allows asynchronous execution, new function qualifiers to emphasize locality properties code and, finally, new communication t...
  • Fuzzy communication for guided loop scheduling in multicomputers

    Publication Year: 1994, Page(s):439 - 443

    We propose the use of guided loop scheduling and fuzzy communications to map shared-variable communications into message passing operations among multicomputers. The mapping mechanism converts scalar message passing operations into multiple broadcast or multiple multicast operations. The proposed method is evaluated by both simulation experiments and theoretical analysis. The performance results, ...
  • Parallelization of linearized applications in Fortran D

    Publication Year: 1994, Page(s):51 - 60
    Cited by:  Papers (1)

    Fortran D extends Fortran to parallel computers via specification of the distribution of array variables across processors. When multidimensional arrays have been linearized for optimal performance on vector processors, Fortran D cannot produce the best parallelization because it is limited to one-dimensional distribution, which is less efficient due to surface-to-volume effects. We propose Fortra...
  • Low cost complexity of a general multicast network

    Publication Year: 1994, Page(s):23 - 29
    Cited by:  Papers (1)

    This paper presents a new multicasting network constructed with a bit-level cost complexity of O(N log N) and a bit-level time complexity of O(log^2 N) using comparators with bit-level O(1) time and cost complexities. The requested addresses for connection and the addresses of the source nodes to be connected to, are sorted together in a pipeline fashion (worm-hole routed) bit-serially most-signif...