Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)

20 Dec. 1998

  • Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)

    Publication Year: 1998
    Freely Available from IEEE
  • Table of contents

    Publication Year: 1998, Page(s):v - x
    Freely Available from IEEE
  • A simple optimal list ranking algorithm

    Publication Year: 1998, Page(s):60 - 64

    We consider the problem of ranking an N element list on a P processor EREW PRAM. Recent work on this problem has shown the importance of grain size. While several optimal O(N/P+log P) time list ranking algorithms are known, Reid-Miller and Blelloch (1994) recently showed that these do not lead to good implementations in practice, because of the fine-grained nature of these algorithms. In Reid-Mill...
  • GLB: a low-cost scheduling algorithm for distributed-memory architectures

    Publication Year: 1998, Page(s):294 - 301
    Cited by:  Papers (4)

    This paper proposes a new compile time scheduling algorithm for distributed-memory systems, called Global Load Balancing (GLB). GLB is intended as the second step in the multi-step class of scheduling algorithms. Experimental results show that compared with known scheduling algorithms of the same low-cost complexity, the proposed algorithm improves schedule lengths by up to 30%. Compared to algorithm...
  • Multiple token distributed loop local area networks: analysis

    Publication Year: 1998, Page(s):400 - 407
    Cited by:  Papers (1)

    With increased data rates, the packet transmission time of a LAN could approach or even become less than the medium propagation delay. The performance of many LAN schemes degrades rapidly under these conditions. Generally, the overhead associated with the medium access protocol increases with the increase in propagation time relative to packet transmission time. In token ring networks this overhea...
  • New number representation and conversion techniques on reconfigurable mesh

    Publication Year: 1998, Page(s):2 - 10
    Cited by:  Papers (1)

    Several new number representations based on the residue number system are presented which use the smallest prime numbers as moduli and are suited for parallel computations on a reconfigurable mesh architecture. It is shown how to convert in O(1) time any integer ranging between 0 and n-1, from any commonly used representation to any new representation proposed in the paper (and vice versa) using a...
  • Parallel algorithms for vehicle routing problems

    Publication Year: 1998, Page(s):171 - 178

    In a complete directed weighted graph there are jobs located at nodes of the graph. Job i has an associated processing time or handling time h_i, and the job must start within a prespecified time window [r_i, d_i]. A vehicle that can move on the arcs of the graph at unit speed has to execute the jobs within their respective time windows. We consider three differe...
  • Implementing a parallel list on the SB-PRAM

    Publication Year: 1998, Page(s):52 - 59

    We give a description of a C++ implementation of a dynamic parallel list developed for the SB-PRAM, a massively parallel scalable shared memory computer. We show that access time on the elements stored in the parallel list is comparable with that of a sequential list. The implementation can easily be ported to other shared memory platforms supporting fast locking mechanisms and parallel prefix ope...
  • The Augmented Composite Banyan Network

    Publication Year: 1998, Page(s):285 - 292

    A new multipath multistage interconnection network called the Augmented Composite Banyan Network (ACBN) is proposed. The ACBN is created by adding a link to each SE of the Composite Banyan Network (CBN), which is a multipath network with at least two disjoint paths and was originally proposed in (Seo and Feng, 1995). Therefore, the basic building blocks in the ACBN are 4×4 SEs with log2...
  • Available parallelism with data value prediction

    Publication Year: 1998, Page(s):194 - 201
    Cited by:  Papers (3)  |  Patents (5)

    Data dependences (data flow constraints) present a major hurdle to the amount of instruction-level parallelism that can be exploited from a program. Recent work has focused on the use of data value prediction to overcome the limits imposed by data dependences. That is, when an instruction is fetched, its result can be predicted so that subsequent instructions that depend on the result can execute ...
  • Measurement-based modeling and analysis methodology for characterizing parallel I/O performance

    Publication Year: 1998, Page(s):391 - 398

    A parallel I/O characterization methodology that consists of a hierarchical modeling and measurement analysis environment for investigating I/O performance is presented. The methodology is illustrated via a case study of a video server workload running under the parallel I/O file system (PIOFS) of IBM SP/2. The measurements demonstrate that for video server and read-intensive workloads, spreading ...
  • Broadcasting on a budget in the multi-service communication model

    Publication Year: 1998, Page(s):163 - 170
    Cited by:  Papers (1)

    In this paper we introduce the MULTI_SERVICE model of network communication. This model attempts to capture recent communication technology trends, such as aspects of quality-of-service and their relation to the emerging technology of automatic pricing, e.g. for Internet services. The MULTI_SERVICE model differs from related models by taking communication and service activation time ...
  • Design alternatives for shared memory multiprocessors

    Publication Year: 1998, Page(s):41 - 50
    Cited by:  Papers (1)

    We consider the design alternatives available for building the next generation DSM machine (e.g., the choice of memory architecture, network technology, and amount and location of per-node remote data cache). To investigate this design space, we have simulated five applications on a wide variety of possible DSM architectures that employ significantly different caching techniques. We also examine t...
  • A clustering approach in characterizing interconnection networks

    Publication Year: 1998, Page(s):277 - 284
    Cited by:  Papers (4)

    Networks of workstations (NOW) have gained importance in recent years. The interconnection network of NOW systems often consists of generic switches connected in an irregular topology. Traditionally, interconnection networks are characterized by their topological properties, such as number of nodes, diameter, and bisection width. These parameters are not sufficient for characterizing irregular netwo...
  • Exploiting image processing locality in cache pre-fetching

    Publication Year: 1998, Page(s):466 - 472
    Cited by:  Papers (4)

    Emerging trends in computer design attempt to include specific solutions for handling images in general-purpose computers as well, because of the current spread of multimedia, image processing and computer graphics applications. In this context, we propose hardware pre-fetching techniques specific to caching images: our main observation is that most algorithms working on images exhibit a 2D spati...
  • A simple mechanism to deal with sequential code in dataflow architectures

    Publication Year: 1998, Page(s):188 - 193
    Cited by:  Papers (1)

    The aim of this work is to propose a simple and efficient mechanism to deal with the problem of executing sequential code in a pure dataflow machine. Our results are obtained with a simulator of the Wolf architecture. The implemented mechanism improved the architecture's performance when executing sequential code, and we expect that this improvement could be greater if we use some heuristics to deal with s...
  • Memory bank disambiguation using modulo unrolling for Raw machines

    Publication Year: 1998, Page(s):212 - 220
    Cited by:  Papers (5)  |  Patents (68)

    We present modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh of simple, replicated tiles connected by an interconnect which supports fast, static near-neighbor communication. Like all other resources, memory is distributed across the tiles. Management of the memory can b...
  • Message passing support on StarT-Voyager

    Publication Year: 1998, Page(s):228 - 237
    Cited by:  Papers (4)

    No single message passing mechanism can efficiently support all types of communication that commonly occur in most parallel or distributed programs. MIT's StarT-Voyager, a hybrid message passing/shared memory parallel machine, provides four message passing mechanisms to achieve high performance over a wide spectrum of communication types and sizes. Hardware and address translation enforced protect...
  • Mapping instruction sequences onto EPOM-processor arrays: a framework for parallel data processing

    Publication Year: 1998, Page(s):105 - 113

    The paper introduces an optimized mapping methodology for mapping instruction sequences (ISs) onto EPOM-processor arrays. The new features of this mapping methodology result from a systematic specification and exploitation of both instruction and processor level parallelism: ultra-low granularity of ISs requires an allocation and scheduling of individual instructions onto the given processor array...
  • An improved parallel disk scheduling algorithm

    Publication Year: 1998, Page(s):383 - 390
    Cited by:  Papers (2)

    We address the problems of prefetching and I/O scheduling for read-once reference strings in a parallel I/O system. Read-once reference strings, in which each block is accessed exactly once, arise naturally in applications like databases and video retrieval. Using the standard parallel disk model with D disks and a shared I/O buffer of size M, we present a novel algorithm, red-black prefetching (R...
  • How to improve local load balancing policies by distorting load information

    Publication Year: 1998, Page(s):318 - 325

    The paper focuses on local load balancing policies for massively parallel architectures and introduces a new scheme for load information exchange between neighbor nodes. The idea is to distort the exchanged load information so that the policy takes into account a more global view of the system and overcomes the limits of the local scope. The presented scheme has been integrated into two variants of a...
  • One to all broadcast in hyper butterfly networks

    Publication Year: 1998, Page(s):155 - 162
    Cited by:  Papers (2)

    The authors further investigate the topological properties of the hyper butterfly networks; they develop algorithms for constructing edge disjoint spanning trees in wrapped butterfly graphs and hyper butterfly networks, and they use those results to design asymptotically optimal one-to-all broadcast algorithms in those two classes of networks.
  • Extrapolation in distributed adaptive integration

    Publication Year: 1998, Page(s):88 - 95

    The paper addresses the design of distributed methods which incorporate numerical extrapolation into adaptive multivariate integration, in order to increase the functionality of the integration algorithms. When attempting to deal with singularities, adaptive integration algorithms need a very fine subdivision in the proximity of these “hot spots”. This is not practical in higher dimens...
  • PERL-a registerless architecture

    Publication Year: 1998, Page(s):33 - 40
    Cited by:  Papers (1)  |  Patents (1)

    Reducing the processor-memory speed gap is one of the major challenges computer architects face today. Efficient use of CPU registers reduces the number of memory accesses. However, registers incur the extra overhead of load/store, register allocation and saving of register context across procedure calls. Caches, by contrast, do not have any such overheads, and cache technology has matured to the extent that ...
  • Efficient address sequence generation for two-level mappings in High Performance Fortran

    Publication Year: 1998, Page(s):132 - 139

    Data-parallel languages like High Performance Fortran allow users to specify mappings of arrays by first aligning elements to abstract Cartesian grids called templates and then distributing the templates across processors. Code generation then includes the generation of the sequence of local addresses accessed on a processor. Address sequence generation for non-unit alignment strides, referred t...