Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999

12-16 April 1999


  • Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999

    Publication Year: 1999
  • The characterization of data-accumulating algorithms

    Publication Year: 1999, Page(s):2 - 6
    Cited by:  Papers (1)

    A data-accumulating algorithm (d-algorithm for short) works on an input considered as a virtually endless stream. The computation terminates when all the currently arrived data have been processed before another datum arrives. In this paper the class of d-algorithms is characterized. It is shown that this class is identical to the class of on-line algorithms. The parallel implementation of d-algor...

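    As a toy illustration of the termination rule only (not of the paper's characterization), the Python sketch below replays a finite prefix of the stream. It assumes a constant per-datum processing cost; the function name `run_d_algorithm` and its parameters are hypothetical.

```python
def run_d_algorithm(arrivals, cost):
    """Simulate the termination rule of a data-accumulating computation.

    arrivals: sorted arrival times of a finite prefix of the stream
              (the real input is conceptually endless).
    cost:     hypothetical fixed processing time per datum.
    Returns (termination_time, number_of_data_processed).
    """
    clock = 0.0
    processed = 0
    while processed < len(arrivals):
        # Idle until the next datum is available, then process it.
        clock = max(clock, arrivals[processed]) + cost
        processed += 1
        # Stop once everything that has arrived is done and the next
        # datum has not yet arrived (or the finite prefix is exhausted).
        if processed == len(arrivals) or arrivals[processed] > clock:
            break
    return clock, processed


# Data keep arriving while the first items are processed; the gap before
# the fourth datum lets the computation terminate.
print(run_d_algorithm([0, 1, 2, 10], cost=1.5))  # (4.5, 3)
```

    Note that the fourth datum is never processed: under this rule, the termination time depends on the arrival pattern as well as on the amount of data.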
  • Prefix computations on symmetric multiprocessors

    Publication Year: 1999, Page(s):7 - 13
    Cited by:  Papers (1)

    We introduce a new optimal prefix computation algorithm on linked lists which builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half as many memory accesses, our algorithm has a complexity that can be bounded with high probability instead of merely on average. Moreover, whereas Reid-Miller and Blelloch (1996) targeted their algorithm for imp...

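    The generic sparse-ruling-set idea the abstract builds on can be sketched as a sequential Python simulation: choose rulers, let each ruler walk its sublist, scan the short list of rulers, then add each ruler's offset to its sublist. This is not the authors' SMP algorithm; the array representation (`succ`, `head`) and the parameters `num_rulers` and `seed` are assumptions for illustration.

```python
import random


def list_prefix_sums(succ, head, values, num_rulers, seed=0):
    """Prefix sums over a linked list via a (sequential) sparse ruling set.

    succ[i]   -- index of the node after node i, or None at the tail
    head      -- index of the first node
    values[i] -- value stored at node i
    num_rulers and seed are hypothetical illustration parameters.
    """
    n = len(succ)
    rng = random.Random(seed)
    # Choose random rulers; the head is always a ruler so every node is owned.
    rulers = set(rng.sample(range(n), min(num_rulers, n)))
    rulers.add(head)

    local = [0] * n     # prefix sum within the node's sublist
    owner = [None] * n  # ruler whose sublist contains the node
    next_ruler, total = {}, {}
    for r in rulers:
        # Each ruler walks forward until it reaches the next ruler or the tail.
        acc, node = 0, r
        while True:
            acc += values[node]
            local[node], owner[node] = acc, r
            nxt = succ[node]
            if nxt is None or nxt in rulers:
                next_ruler[r], total[r] = nxt, acc
                break
            node = nxt

    # Prefix over the much shorter list of rulers.
    offset, acc, r = {}, 0, head
    while r is not None:
        offset[r] = acc
        acc += total[r]
        r = next_ruler[r]

    # Every node adds the offset of its ruler.
    return [offset[owner[i]] + local[i] for i in range(n)]


# List order: 2 -> 0 -> 3 -> 1 with values 3, 1, 4, 2.
succ = [3, None, 0, 1]
print(list_prefix_sums(succ, head=2, values=[1, 2, 3, 4], num_rulers=2))  # [4, 10, 3, 8]
```

    In a parallel version, the ruler walks and the final fix-up loop are the independent pieces of work that would be distributed across processors.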
  • Reducing I/O complexity by simulating coarse grained parallel algorithms

    Publication Year: 1999, Page(s):14 - 20
    Cited by:  Papers (4)

    Block-wise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully parallel disk I/O. In this paper we present a deterministic simulation technique which transforms parallel algorithms into (parallel) external memory algorithms. Specifically, we present a deterministic simulation technique w...

  • Lower bounds on the loading of degree-2 multiple bus networks for binary-tree algorithms

    Publication Year: 1999, Page(s):21 - 25
    Cited by:  Papers (2)

    A binary-tree algorithm, Bin(n), proceeds level-by-level from the leaves of a $2^n$-leaf balanced binary tree to its root. This paper deals with running binary-tree algorithms on multiple bus networks (MBNs) in which processors communicate via buses. Every "binary-tree MBN" has a degree (maximum number of buses connected to a processor) of at least 2. There exists a degree-2 MBN for Bin(n) that...

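    The level-by-level structure of Bin(n) can be sketched sequentially in Python. The `combine` operation is a placeholder assumption, and the sketch says nothing about how levels are mapped onto a multiple bus network, which is the subject of the paper.

```python
def run_bin(leaves, combine):
    """Evaluate Bin(n) on a 2**n-leaf balanced binary tree, level by level.

    Returns the root value and the number of levels processed (n).
    """
    level = list(leaves)
    if not level or len(level) & (len(level) - 1):
        raise ValueError("the number of leaves must be a power of two")
    steps = 0
    while len(level) > 1:
        # All pairs on one level are independent and could run in parallel.
        level = [combine(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps


print(run_bin(range(1, 9), lambda a, b: a + b))  # (36, 3)
```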
  • A time-optimal solution for the path cover problem on cographs

    Publication Year: 1999, Page(s):26 - 30

    We show that the notoriously difficult problem of finding and reporting the smallest number of vertex-disjoint paths that cover the vertices of a graph can be solved time- and work-optimally for cographs. Our algorithm solves this problem in O(log n) time using n/log n processors on the EREW-PRAM for an n-vertex cograph G represented by its cotree.

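    As background on why the cotree helps, the Python sketch below computes only the size of a minimum path cover, sequentially and bottom-up over the cotree. The join recurrence it uses, max(1, p1 - n2, p2 - n1), is the standard cotree recurrence but is an assumption here (the abstract does not state it); the tuple encoding of cotree nodes is hypothetical, and the sketch neither reports the paths nor reflects the paper's EREW-PRAM algorithm.

```python
def path_cover(node):
    """Return (minimum path cover size, vertex count) for a cograph.

    A cotree node is ("leaf",), ("union", children) or ("join", children);
    this tuple encoding is an assumption for illustration.
    """
    kind = node[0]
    if kind == "leaf":
        return 1, 1
    results = [path_cover(child) for child in node[1]]
    if kind == "union":
        # Disjoint union: paths cannot cross components.
        return sum(p for p, _ in results), sum(n for _, n in results)
    # Join: fold the children pairwise. Every vertex of one side is adjacent
    # to every vertex of the other, so each vertex of one side can link two
    # paths of the other side.
    p, n = results[0]
    for p2, n2 in results[1:]:
        p, n = max(1, p - n2, p2 - n), n + n2
    return p, n


# The star K_{1,5}: a centre joined to five independent leaves.
star = ("join", [("leaf",), ("union", [("leaf",)] * 5)])
print(path_cover(star))  # (4, 6)
```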
  • Parallel matrix multiplication on a linear array with a reconfigurable pipelined bus system

    Publication Year: 1999, Page(s):31 - 35
    Cited by:  Papers (8)

    The known fast sequential algorithms for multiplying two $N \times N$ matrices (over an arbitrary ring) have time complexity $O(N^{\alpha})$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that for all $1 \le p \le N^{\alpha}$, multiplying two $N \times N$ matrices can be performed on a $p$-processor linear array with a reconfigu...

  • Improving collective I/O performance using threads

    Publication Year: 1999, Page(s):38 - 45
    Cited by:  Papers (17)

    Massively parallel computers are increasingly being used to solve large, I/O-intensive applications in many different fields. For such applications, the I/O requirements quite often present a significant obstacle in the way of achieving good performance, and an important area of current research is the development of techniques by which these costs can be reduced. One such approach is collective I...

  • Linear aggressive prefetching: a way to increase the performance of cooperative caches

    Publication Year: 1999, Page(s):46 - 54
    Cited by:  Papers (4)  |  Patents (1)

    Cooperative caches offer huge amounts of caching memory that is not always used as well as it could be. We might find blocks in the cache that have not been requested for many hours. These blocks will hardly improve the performance of the system, while the buffers they occupy could be better used to speed up the I/O operations. In this paper, we present a family of simple prefetching algorithms that...

  • Hiding communication latency in reconfigurable message-passing environments

    Publication Year: 1999, Page(s):55 - 60
    Cited by:  Papers (5)

    Communication overhead is one of the most important factors affecting the performance of message passing multicomputers. We present evidence (through the analysis of several parallel benchmarks) that there exists communication locality, and that this locality is "structured". We have devised a number of heuristics that can "predict" the target of subsequent communication requests. This technique,...

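    To make the idea of predicting communication targets concrete, the Python sketch below uses one simple hypothetical heuristic: predict that the next destination is the one that most often followed the current destination so far. The class and function names are invented for illustration; the paper's actual heuristics are not described in the excerpt.

```python
from collections import Counter, defaultdict


class SuccessorPredictor:
    """Predict the next message destination from the current one.

    This first-order (last target -> next target) heuristic is a hypothetical
    example, not one of the heuristics evaluated in the paper.
    """

    def __init__(self):
        self.successors = defaultdict(Counter)  # target -> counts of next targets
        self.last = None

    def predict(self):
        counts = self.successors.get(self.last)
        if not counts:
            return None  # no history for the current target yet
        return counts.most_common(1)[0][0]

    def observe(self, target):
        if self.last is not None:
            self.successors[self.last][target] += 1
        self.last = target


def hit_rate(trace):
    """Fraction of requests in `trace` whose destination was predicted correctly."""
    predictor, hits = SuccessorPredictor(), 0
    for target in trace:
        hits += predictor.predict() == target
        predictor.observe(target)
    return hits / len(trace)


# A repeating (structured) pattern becomes predictable after one period.
print(hit_rate([1, 2, 3] * 4))
```

    On the repeating trace, only the first four requests are mispredicted, which is the kind of "structured" locality the abstract describes.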
  • The impact of memory hierarchies on cluster computing

    Publication Year: 1999, Page(s):61 - 69

    Using off-the-shelf commodity workstations and PCs to build a cluster for parallel computing has become a common practice. The choice of a cost-effective cluster computing platform for a given budget and for certain types of application workloads is mainly determined by the cluster's memory hierarchy and interconnection network. Finding such a solution from exhaustive simulations would be highly...

  • A factorial performance evaluation for hierarchical memory systems

    Publication Year: 1999, Page(s):70 - 74
    Cited by:  Papers (2)

    We introduce an evaluation methodology for advanced memory systems. This methodology is based on statistical factorial analysis. It is two-fold: it first determines the impact of memory systems and application programs on overall performance; it also identifies the bottleneck in a memory hierarchy and provides cost/performance comparisons via scalability analysis. Different memory systems can ...

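    A minimal Python sketch of the kind of factorial analysis the abstract refers to: a two-factor, two-level (2^2) design without replication, in which the total variation of a response is allocated to each factor and to their interaction. The factor labels and response values are hypothetical; the paper's own designs and metrics may differ.

```python
from itertools import product


def allocate_variation(y):
    """2^2 factorial design without replication.

    y maps (a, b), with a, b in {-1, +1}, to a measured response, e.g. run time
    with factor A = memory system and factor B = application program
    (hypothetical factor labels). Returns the mean and the fraction of total
    variation explained by A, B and their interaction AB.
    """
    runs = list(product((-1, 1), repeat=2))
    q0 = sum(y[r] for r in runs) / 4
    qa = sum(a * y[(a, b)] for a, b in runs) / 4
    qb = sum(b * y[(a, b)] for a, b in runs) / 4
    qab = sum(a * b * y[(a, b)] for a, b in runs) / 4
    ss = {"A": 4 * qa ** 2, "B": 4 * qb ** 2, "AB": 4 * qab ** 2}
    sst = sum((y[r] - q0) ** 2 for r in runs)  # equals the sum of the ss values
    return q0, {k: v / sst for k, v in ss.items()}


# Illustrative response values only, chosen to show the arithmetic.
mean, fractions = allocate_variation({(-1, -1): 15, (1, -1): 45, (-1, 1): 25, (1, 1): 75})
print(mean, fractions)
```

    Here factor A explains about 76% of the variation, B about 19% and the interaction about 5%, which is how such a design ranks the influence of each factor.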
  • A performance model of speculative prefetching in distributed information systems

    Publication Year: 1999, Page(s):75 - 80
    Cited by:  Papers (1)

    Previous studies in speculative prefetching focus on building and evaluating access models for the purpose of access prediction. The paper investigates a complementary area which has been largely ignored, that of performance modelling. We use improvement in access time as the performance metric, for which we derive a formula in terms of resource parameters (time available and time required for pre...

  • Run-time selection of block size in pipelined parallel programs

    Publication Year: 1999, Page(s):82 - 87
    Cited by:  Papers (4)

    Parallelizing compiler technology has improved in recent years. One area in which compilers have made progress is in handling DOACROSS loops, where cross-processor data dependencies can inhibit efficient parallelization. In regular DOACROSS loops, where dependencies can be determined at compile time, a useful parallelization technique is pipelining, where each processor (node) performs its computa...

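    Why block size matters in a pipeline can be shown with a simple cost model, which is an assumption of this sketch and not taken from the paper: with n iterations per processor split into blocks of size b, p processors, compute time c per iteration and a fixed cost s per message, the pipeline runs ceil(n/b) + p - 1 stages of cost b*c + s. Small blocks pay more message overhead, large blocks lengthen the pipeline fill. Ignoring the ceiling, the optimum is b* = sqrt(n*s / ((p - 1)*c)).

```python
import math


def pipeline_time(n, p, b, c, s):
    """Hypothetical cost model for a pipelined DOACROSS loop.

    n: iterations per processor, p: processors, b: block size,
    c: compute time per iteration, s: fixed cost per message.
    The pipeline has ceil(n / b) blocks and needs p - 1 extra stages to fill.
    """
    return (math.ceil(n / b) + p - 1) * (b * c + s)


def best_block_size(n, p, c, s):
    # Exhaustive search over block sizes; cheap enough to run at run time
    # once c and s have been measured.
    return min(range(1, n + 1), key=lambda b: pipeline_time(n, p, b, c, s))


n, p, c, s = 1000, 5, 1.0, 40.0
print(best_block_size(n, p, c, s))        # 100
print(math.sqrt(n * s / ((p - 1) * c)))  # 100.0, the continuous optimum
```

    Because c and s are machine- and input-dependent, a run-time system would measure them and then evaluate such a model, rather than fixing b at compile time.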
  • Reducing parallel overheads through dynamic serialization

    Publication Year: 1999, Page(s):88 - 92
    Cited by:  Papers (12)

    If parallelism can be successfully exploited in a program, significant reductions in execution time can be achieved. However, if sections of the code are dominated by parallel overheads, the overall program performance can degrade. We propose a framework, based on an inspector-executor model, for identifying loops that are dominated by parallel overheads and dynamically serializing these loops. We...

  • Using channels for multimedia communication

    Publication Year: 1999, Page(s):93 - 98
    Cited by:  Papers (1)

    We present a paradigm to express streams and its implementation. Streams are a convenient mechanism to communicate multimedia data, for example video or audio, between systems. The paradigm is based on channels, as found in CSP and Occam, but with two important modifications. First, our Flexible Channels can be connected dynamically, and passed around as first class objects. Second, although synch...

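    The "first class channel" idea can be sketched in Python with threads: a channel is an ordinary object, so one channel can be created at run time and sent over another. This is not the paper's Flexible Channels API; the class and function names are hypothetical. Unlike CSP/Occam channels, which are synchronous, this sketch buffers messages; what the paper does here is not stated in the excerpt.

```python
import queue
import threading


class Channel:
    """A minimal buffered channel; `capacity` is a hypothetical parameter."""

    def __init__(self, capacity=1):
        self._queue = queue.Queue(maxsize=capacity)

    def send(self, item):
        self._queue.put(item)  # blocks while the buffer is full

    def receive(self):
        return self._queue.get()  # blocks until an item is available


def producer(control):
    data = Channel(capacity=4)  # connected dynamically, at run time
    control.send(data)          # pass the channel itself to the consumer
    for i in range(5):
        data.send(f"frame-{i}")
    data.send(None)             # end-of-stream marker


def consumer(control, received):
    data = control.receive()
    while (item := data.receive()) is not None:
        received.append(item)


control, received = Channel(), []
threads = [threading.Thread(target=producer, args=(control,)),
           threading.Thread(target=consumer, args=(control, received))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # ['frame-0', 'frame-1', 'frame-2', 'frame-3', 'frame-4']
```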
  • The Paderborn University BSP (PUB) library - design, implementation and performance

    Publication Year: 1999, Page(s):99 - 104
    Cited by:  Papers (6)

    The Paderborn University BSP (PUB) library is a parallel C library based on the BSP model. The basic library supports buffered and unbuffered asynchronous communication between any pair of processors, and a mechanism for synchronizing the processors in a barrier style. In addition, it provides routines for collective communication on arbitrary subsets of processors, partition operations, and a zer...

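    The BSP pattern described above (asynchronous sends whose messages become visible only after a barrier-style synchronization) can be sketched in Python with `threading.Barrier`. The `BSP` class and its method names are hypothetical and are not the PUB library's C API.

```python
import threading


class BSP:
    """Toy BSP runtime: messages sent in a superstep are delivered after sync()."""

    def __init__(self, p):
        self.p = p
        self._barrier = threading.Barrier(p)
        self._lock = threading.Lock()
        self._outgoing = [[] for _ in range(p)]
        self._delivered = [[] for _ in range(p)]

    def send(self, dest, msg):
        with self._lock:
            self._outgoing[dest].append(msg)

    def sync(self, pid):
        # Barrier: wait until every processor has finished the superstep.
        if self._barrier.wait() == 0:
            # Exactly one thread swaps the buffers.
            self._delivered = self._outgoing
            self._outgoing = [[] for _ in range(self.p)]
        self._barrier.wait()  # make the swap visible to everyone
        return self._delivered[pid]


def ring_shift(p):
    bsp, results = BSP(p), [None] * p

    def program(pid):
        bsp.send((pid + 1) % p, pid)  # superstep: send to the right neighbour
        results[pid] = bsp.sync(pid)  # receive after the barrier

    threads = [threading.Thread(target=program, args=(i,)) for i in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(ring_shift(4))  # [[3], [0], [1], [2]]
```

    The second barrier inside `sync` ensures that no processor reads its inbox before the buffers have been swapped, and that no message for the next superstep lands in the old buffer.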
  • A capabilities based communication model for high-performance distributed applications: The Open HPC++ approach

    Publication Year: 1999, Page(s):105 - 109
    Cited by:  Papers (1)

    Typical high-performance distributed applications consist of clients accessing computational and information resources implemented by remote servers. Different clients may have different requirements for accessing a single server resource. A server resource may also want to provide different kinds of accesses for different clients, depending on factors such as the amount of trust between the server...

  • Average-case analysis of isospeed scalability of parallel computations on multiprocessors

    Publication Year: 1999, Page(s):112 - 116

    We investigate the average-case speed and scalability of parallel algorithms executing on multiprocessors. Our performance metrics are average-speed and isospeed scalability. By modeling parallel algorithms on multiprocessors using task precedence graphs, we are mainly interested in the effects of synchronization overhead and load imbalance on the performance of parallel computations. Thus, we foc...

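    For reference, the usual definitions of the two metrics are given below, with W the work, p the number of processors and T(W, p) the execution time. The paper studies their average-case behaviour, and its exact formulation is not shown in the excerpt.

```latex
\bar{S}(W, p) = \frac{W}{p \, T(W, p)}, \qquad
\psi(p, p') = \frac{p' \, W}{p \, W'}
\quad \text{where } W' \text{ is chosen so that } \bar{S}(W', p') = \bar{S}(W, p).
```

    Under these definitions, psi = 1 means the work only has to grow in proportion to the number of processors to keep the average speed constant (ideal scalability); smaller values mean the work must grow faster.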
  • Fully-scalable fault-tolerant simulations for BSP and CGM

    Publication Year: 1999, Page(s):117 - 124

    In this paper we consider general simulations of algorithms designed for fully operational BSP and CGM machines on machines with faulty processors. The faults are deterministic (i.e., worst-case distributions of faults are considered) and static (i.e., they do not change in the course of computation). We assume that a constant fraction of processors are faulty. We present a deterministic simulatio...

  • Coarse grained parallel maximum matching in convex bipartite graphs

    Publication Year: 1999, Page(s):125 - 129
    Cited by:  Papers (5)

    We present a coarse grained parallel algorithm for computing a maximum matching in a convex bipartite graph $G = (A, B, E)$. For $p$ processors with $N/p$ memory per processor, where $N = |A| + |B|$ and $N/p \ge p$, the algorithm requires $O(\log p)$ communication rounds and $O(T_{\mathrm{sequ}}(n/p, m/p) + (n/p)\log p)$ local computation, where $n = |A|$, $m = |B|$ and $T_{\mathrm{sequ}}(n, m)$ is the sequential time complexity for the problem. For the ...

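    As background on the sequential term in this bound, one classic sequential method for convex bipartite graphs is Glover's greedy rule: scan B in order and match each vertex to the available A-vertex whose interval ends first. The Python sketch below is offered only as background; it is not the paper's coarse grained algorithm, and the interval encoding (each vertex of A adjacent to a contiguous range of B) is an assumption.

```python
import heapq


def convex_matching(intervals, m):
    """Greedy maximum matching in a convex bipartite graph (Glover's rule).

    intervals[a] = (lo, hi): vertex a of A is adjacent to vertices lo..hi of B
    (0-based, inclusive); m = |B|. Returns a list of (a, b) pairs.
    """
    order = sorted(range(len(intervals)), key=lambda a: intervals[a][0])
    active = []  # heap of (hi, a) for unmatched A-vertices whose interval has begun
    k, matching = 0, []
    for b in range(m):
        while k < len(order) and intervals[order[k]][0] <= b:
            a = order[k]
            heapq.heappush(active, (intervals[a][1], a))
            k += 1
        while active and active[0][0] < b:  # interval already ended
            heapq.heappop(active)
        if active:
            # Match b to the candidate whose interval ends first.
            _, a = heapq.heappop(active)
            matching.append((a, b))
    return matching


print(convex_matching([(0, 0), (0, 1), (1, 2), (0, 0)], m=3))  # [(0, 0), (1, 1), (2, 2)]
```

    With a binary heap, this runs in O((n + m) log n) time after sorting, which gives a feel for the sequential cost that the parallel algorithm applies to pieces of size n/p and m/p.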
  • Experimental evaluation of QSM, a simple shared-memory model

    Publication Year: 1999, Page(s):130 - 136
    Cited by:  Papers (5)

    Parallel programming models should attempt to satisfy two conflicting goals. On one hand, they should hide architectural details so that algorithm designers can write simple, portable programs. On the other hand, models must expose architectural details so that designers can evaluate and optimize the performance of their algorithms. In this paper we experimentally examine the trade-offs made by a ...

  • A consistent history link connectivity protocol

    Publication Year: 1999, Page(s):138 - 142
    Cited by:  Papers (4)

    The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating reliable distributed systems by leveraging commercially available personal computers and interconnect technologies. Fault-tolerance is introduced into the communication infrastructure by using multiple network interfaces per compute node. When using multiple network connections per compute node, the question ...

  • Performance evaluation of the ServerNet(R) SAN under self-similar traffic

    Publication Year: 1999, Page(s):143 - 147
    Cited by:  Papers (12)

    Self-similar traffic distributions have been observed in a wide range of networking applications and models such as LANs, WANs, telnet, FTP, WWW, ISDN, SS7 and VBR traffic over ATM. Therefore, it has been suggested that many other theoretical protocols and systems need to be reevaluated under this different type of traffic before practical implementations potentially show their faults. The ServerN...

  • Low-latency message passing on workstation clusters using SCRAMNet

    Publication Year: 1999, Page(s):148 - 152
    Cited by:  Papers (5)

    Clusters of workstations have emerged as a popular platform for parallel and distributed computing. Commodity high-speed networks used to connect workstation clusters provide high bandwidth, but also have high latency. SCRAMNet is an extremely low-latency, replicated, non-coherent shared-memory network that has so far been used only for real-time applications. This paper reports our early experiences ...
