Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)

20 Dec. 1998

  • Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)

    Publication Year: 1998
  • Table of contents

    Publication Year: 1998, Page(s):v - x
  • New number representation and conversion techniques on reconfigurable mesh

    Publication Year: 1998, Page(s):2 - 10
    Cited by:  Papers (1)

    Several new number representations based on the residue number system are presented which use the smallest prime numbers as moduli and are suited for parallel computations on a reconfigurable mesh architecture. It is shown how to convert in O(1) time any integer ranging between 0 and n-1, from any commonly used representation to any new representation proposed in the paper (and vice versa) using a...
  • Precise control of instruction caches

    Publication Year: 1998, Page(s):11 - 18

    Instruction caches are usually designed to fetch the whole block from memory in case of a miss. However, the fetched blocks might contain branch instructions which, if taken, will render the rest of the block useless. A novel approach is introduced, namely the Precise Control, which fetches only the words of a cache block that are likely to be used. The performance of Precise Control is evaluated a...
  • More on arbitrary boundary packed arithmetic

    Publication Year: 1998, Page(s):19 - 24
    Cited by:  Papers (1)  |  Patents (1)

    Recent microprocessors have been enhanced with media instruction sets for accelerating media algorithms. They exploit the fact that media algorithms have small data types, and widths much less than that of the processor. Current media instruction sets support only 8-, 16- and 32-bit sub-datatypes. This scheme is inefficient in several applications where bit lengths of 9, 12 and so on are used. We ...
  • Data prefetching with co-operative caching

    Publication Year: 1998, Page(s):25 - 32
    Cited by:  Papers (4)

    Recent research in data cache prefetching is found to be selective in nature: achieving high prediction accuracy over a set of selected references such as array access with constant strides. As a result, for applications where the memory latency is mainly due to data accesses in the set of non-selected references of a program, they lose their effectiveness. In fact, their performance might be wors...
  • PERL - a registerless architecture

    Publication Year: 1998, Page(s):33 - 40
    Cited by:  Papers (1)  |  Patents (2)

    Reducing the processor-memory speed gap is one of the major challenges computer architects face today. Efficient use of CPU registers reduces the number of memory accesses. However, registers do incur the extra overhead of load/store, register allocation and saving of register context across procedure calls. Caches, however, do not have any such overheads and cache technology has matured to the extent that ...
  • Design alternatives for shared memory multiprocessors

    Publication Year: 1998, Page(s):41 - 50
    Cited by:  Papers (1)  |  Patents (1)

    We consider the design alternatives available for building the next generation DSM machine (e.g., the choice of memory architecture, network technology, and amount and location of per-node remote data cache). To investigate this design space, we have simulated five applications on a wide variety of possible DSM architectures that employ significantly different caching techniques. We also examine t...
  • Implementing a parallel list on the SB-PRAM

    Publication Year: 1998, Page(s):52 - 59

    We give a description of a C++ implementation of a dynamic parallel list developed for the SB-PRAM, a massively parallel scalable shared memory computer. We show that access time on the elements stored in the parallel list is comparable with that of a sequential list. The implementation can easily be ported to other shared memory platforms supporting fast locking mechanisms and parallel prefix ope...
  • A simple optimal list ranking algorithm

    Publication Year: 1998, Page(s):60 - 64

    We consider the problem of ranking an N element list on a P processor EREW PRAM. Recent work on this problem has shown the importance of grain size. While several optimal O(N/P+log P) time list ranking algorithms are known, Reid-Miller and Blelloch (1994) recently showed that these do not lead to good implementations in practice, because of the fine-grained nature of these algorithms. In Reid-Mill...
  • A parallel skeletonization algorithm and its VLSI architecture

    Publication Year: 1998, Page(s):65 - 72
    Cited by:  Papers (1)

    This paper presents a new algorithm to extract the skeleton and its Euclidean distance values from a binary image. A VLSI implementation of the algorithm in a locally connected cellular array is also given. The algorithm runs in O(n) time for an image of size n × n. The extracted skeleton reconstructs the objects in the image exactly.
  • Improving error bounds for multipole-based treecodes

    Publication Year: 1998, Page(s):73 - 80

    Rapid evaluation of potentials in particle systems is an important and time-consuming step in many physical simulations. Over the past decade (1988-98), the development of treecodes such as the Fast Multipole Method (FMM) and the Barnes-Hut method has enabled large scale simulations in domains such as astrophysics, molecular dynamics, and material science. FMM and related methods rely on fixed deg...
  • Computation of penetration measures for convex polygons and polyhedra for graphics applications

    Publication Year: 1998, Page(s):81 - 87

    Algorithms to compute measures of penetration between convex polygonal objects in ℝ² and convex polyhedral objects in ℝ³ are presented. The algorithms are analyzed for their asymptotic complexity. Details of implementation on a single processor machine are given. Parallelization of the algorithms is discussed.
  • Extrapolation in distributed adaptive integration

    Publication Year: 1998, Page(s):88 - 95

    The paper addresses the design of distributed methods which incorporate numerical extrapolation into adaptive multivariate integration, in order to increase the functionality of the integration algorithms. When attempting to deal with singularities, adaptive integration algorithms need a very fine subdivision in the proximity of these "hot spots". This is not practical in higher dimensions where a...
  • Data structure distribution and multi-threading of Linux file system for multiprocessors

    Publication Year: 1998, Page(s):97 - 104
    Cited by:  Patents (11)

    The standard Linux design assumes a uniprocessor architecture. Allowing several processors to execute simultaneously in the kernel mode on behalf of different processes can cause consistency problems unless appropriate exclusion mechanisms are used. In addition, if the file system data structures are not distributed, performance can be affected. We discuss a multiprocessor file system design for L...
  • Mapping instruction sequences onto EPOM-processor arrays: a framework for parallel data processing

    Publication Year: 1998, Page(s):105 - 113

    The paper introduces an optimized mapping methodology for mapping instruction sequences (ISs) onto EPOM-processor arrays. The new features of this mapping methodology result from a systematic specification and exploitation of both instruction and processor level parallelism: ultra-low granularity of ISs requires an allocation and scheduling of individual instructions onto the given processor array...
  • Java data parallel extensions with runtime system support

    Publication Year: 1998, Page(s):114 - 118

    In order to provide Java with the ability to support scientific parallel computing, we introduce a data parallel extension to the Java language with runtime system support. We provide the distributed array extension to Java, and discuss the related operation and control over the new distributed array. Communications involving distributed arrays are handled through a standard of a collective communi...
  • A general distributed event model

    Publication Year: 1998, Page(s):119 - 123

    This paper identifies some of the issues that need to be explored in systems that support content-based delivery. Distributed systems today are based on address-based delivery of messages. Each message carries the unique addresses of the intended recipients. The idea in content-based delivery is that information is delivered to those objects that have subscribed for that information. The ideas on ...
  • Apportioning: a technique for efficient reachability analysis of concurrent object-oriented programs

    Publication Year: 1998, Page(s):124 - 131
    Cited by:  Papers (1)

    The object-oriented paradigm has been found to be useful for the construction of large and complex concurrent systems. Reachability analysis is an important and well-known tool for static (pre-run-time) analysis of concurrent programs. However, direct application of traditional reachability analysis to concurrent object-oriented programs has many problems, such as incomplete analysis for reusable ...
  • Efficient address sequence generation for two-level mappings in High Performance Fortran

    Publication Year: 1998, Page(s):132 - 139

    Data-parallel languages like High Performance Fortran allow users to specify mappings of arrays by first aligning elements to an abstract Cartesian grid called templates and then distributing the templates across processors. Code generation then includes the generation of the sequence of local addresses accessed on a processor. Address sequence generation for non-unit alignment strides, referred t...
  • Efficient algorithms for delay-bounded minimum cost path problem in communication networks

    Publication Year: 1998, Page(s):141 - 146
    Cited by:  Papers (5)  |  Patents (1)

    As the amount of data transmitted over a network increases and high bandwidth applications requiring point to multipoint communications like videoconferencing, distributed database management or cooperative work become widespread, it becomes very important to optimize network resources. One such optimization is multicast tree generation. The problem of generating a minimum cost multicast tree give...
  • Virtual channel multiplexing in networks of workstations with irregular topology

    Publication Year: 1998, Page(s):147 - 154
    Cited by:  Papers (10)  |  Patents (2)

    Networks of workstations are becoming a cost-effective alternative for small-scale parallel computing. Although they may not provide the closely coupled environment of multicomputers and multiprocessors, they meet the needs of a great variety of parallel computing problems at a lower cost. However, in order to achieve high efficiency, the interconnects used to build the network of workstations mu...
  • One to all broadcast in hyper butterfly networks

    Publication Year: 1998, Page(s):155 - 162
    Cited by:  Papers (2)

    The authors further investigate the topological properties of the hyper butterfly networks; they develop algorithms for constructing edge disjoint spanning trees in wrapped butterfly graphs and hyper butterfly networks and they use those results to design asymptotically optimal one-to-all broadcast algorithms in those two classes of networks.
  • Broadcasting on a budget in the multi-service communication model

    Publication Year: 1998, Page(s):163 - 170
    Cited by:  Papers (1)

    In this paper we introduce the MULTI_SERVICE model of network communication. This model attempts to capture recent communication technology trends, such as aspects of quality-of-service and their relation to the emerging technology of automatic pricing, e.g. for Internet services. The MULTI_SERVICE model differs from related models by taking communication and service activation time into account, ...
  • Parallel algorithms for vehicle routing problems

    Publication Year: 1998, Page(s):171 - 178

    In a complete directed weighted graph there are jobs located at nodes of the graph. Job i has an associated processing time or handling time h_i, and the job must start within a prespecified time window [r_i, d_i]. A vehicle can move on the arcs of the graph at unit speed and has to execute the jobs within their respective time windows. We consider three different problems on ...