Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192)

18-18 Oct. 1998

  • Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192)

    Publication Year: 1998
  • Table of contents

    Publication Year: 1998, Page(s):v - viii
  • Author index

    Publication Year: 1998, Page(s):434 - 435
  • A new heuristic for scheduling parallel programs on multiprocessor

    Publication Year: 1998, Page(s):358 - 365
    Cited by:  Papers (1)

    In this paper we present an efficient algorithm, called CASS-II, for task clustering without task duplication. Unlike the DSC algorithm, which is empirically the best known algorithm to date in terms of both speed and solution quality, CASS-II uses only limited “global” information and does not recompute the critical path in each refinement step. Therefore, the algorithm runs in O(|E|+...
  • Code generation in the polytope model

    Publication Year: 1998, Page(s):106 - 111
    Cited by:  Papers (10)  |  Patents (9)

    Automatic parallelization of nested loops, based on a mathematical model, the polytope model, has been improved significantly over the last decade: state-of-the-art methods allow flexible distributions of computations in space and time, which lead to high-quality parallelism. However, these methods have not found their way into practical parallelizing compilers due to the lack of code generation s...
  • Dynamic hammock predication for non-predicated instruction set architectures

    Publication Year: 1998, Page(s):278 - 285
    Cited by:  Papers (24)  |  Patents (16)

    Conventional speculative architectures use branch prediction to evaluate the most likely execution path during program execution. However, certain branches are difficult to predict. One solution to this problem is to evaluate both paths following such a conditional branch. Predicated execution can be used to implement this form of multi-path execution. Predicated architectures fetch and issue instr...
  • General parallel computation can be performed with a cycle-free heap

    Publication Year: 1998, Page(s):96 - 103
    Cited by:  Papers (1)  |  Patents (4)

    We argue that a powerful and general programming model for parallel computation exists that honors the principles of modular software construction, but disallows the formation of heap cycles. We believe this cycle-free frame and heap model can be used as the basis for a new species of computer systems that satisfies all principles of modular software construction and offers performance and program...
  • Integrated compilation and scalability analysis for parallel systems

    Publication Year: 1998, Page(s):385 - 392
    Cited by:  Papers (11)

    Despite the performance potential of parallel systems, several factors have hindered their widespread adoption. Of these, performance variability is among the most significant. Data parallel languages, which facilitate the programming of those systems, increase the semantic distance between the program's source code and its observable performance, thus aggravating the optimization problem. In this...
  • Optimized code generation for heterogeneous computing environment using parallelizing compiler TINPAR

    Publication Year: 1998, Page(s):426 - 433
    Cited by:  Papers (1)  |  Patents (2)

    This paper presents a compiling technique to generate optimized code for heterogeneous computing environments. It also proposes a new dynamic load redistribution mechanism that adaptively distributes tasks among computers according to their available computing power, which may vary during the computation. As the results of the performance evaluation, we could confirm tha...
  • The START-VOYAGER parallel system

    Publication Year: 1998, Page(s):185 - 194
    Cited by:  Papers (4)  |  Patents (4)

    This paper presents the communication architecture of the START-VOYAGER system, a parallel machine composed of a cluster of unmodified IBM 604e-based SMPs connected via a high-speed interconnection network. A custom network interface unit (NIU) plugs into a processor card slot of each SMP, providing a high-performance message passing substrate that supports both fast user-level message passing an...
  • Optical versus electronic bus for address-transactions in future SMP architectures

    Publication Year: 1998, Page(s):22 - 29

    The fast evolution of processor performance necessitates a continual evolution of all multiprocessor components, even for small to medium-scale symmetric multiprocessors (SMP) built around shared buses. This kind of multiprocessor is especially attractive because the problem of data coherency in caches can be solved by a class of snooping protocols specific to these shared-bus architectures. B...
  • Performance of message-passing systems using a zero-copy communication protocol

    Publication Year: 1998, Page(s):264 - 271

    Despite technological advances in microprocessors and network technology over the last few years, commercially available networks of workstations (NOWs) contain inherent communication bottlenecks. Traditional layered network protocols will inevitably fail to achieve high throughput if they access data several times. As a result, applications on NOWs often fail to observe the performance speed-up...
  • Split last-address predictor

    Publication Year: 1998, Page(s):230 - 237
    Cited by:  Papers (3)  |  Patents (2)

    Recent work has proposed using prediction techniques to speculatively execute true data-dependent operations. However, predictability is not uniform across operations. We therefore propose run-time classification of instructions to increase the efficiency of the predictors. At run time, the proposed mechanism classifies instructions according to their predicta...
  • Athapascan-1: On-line building data flow graph in a parallel language

    Publication Year: 1998, Page(s):88 - 95
    Cited by:  Papers (15)

    To achieve practical, efficient execution on a parallel architecture, knowledge of the application's data dependencies is the key to building an efficient schedule. By restricting accesses to shared memory, we show that such a data dependency graph can be computed on-line on a distributed architecture. The overhead introduced is bounded with respect to the par...
  • Integrating loop and data transformations for global optimisation

    Publication Year: 1998, Page(s):12 - 19
    Cited by:  Papers (20)  |  Patents (1)

    This paper is concerned with integrating global data transformations and local loop transformations in order to minimise overhead on distributed shared memory machines such as the SGI Origin 2000. By first developing an extended algebraic transformation framework, a new technique to allow the static application of global data transformations, such as partitioning, to reshaped arrays is presented, ...
  • Data dependence analysis of assembly code

    Publication Year: 1998, Page(s):340 - 347
    Cited by:  Papers (5)  |  Patents (1)

    Determination of data dependences is a task typically performed with high-level language source code in today's optimizing and parallelizing compilers. Very little work has been done in the field of data dependence analysis on assembly language code, but this area will be of growing importance, e.g. for increasing ILP. A central element of a data dependence analysis in this case is a method for m...
  • Capturing the effects of code improving transformations

    Publication Year: 1998, Page(s):118 - 123
    Cited by:  Papers (4)

    Symbolic debugging of transformed code requires information about the impact of applying transformations on statement instances so that the appropriate values can be displayed to a user. We present a technique to automatically identify statement instance correspondences between untransformed and transformed code and generate mappings reflecting these correspondences as code improving transformatio...
  • A direct-execution framework for fast and accurate simulation of superscalar processors

    Publication Year: 1998, Page(s):286 - 293
    Cited by:  Papers (29)  |  Patents (3)

    Multiprocessor system evaluation has traditionally been based on direct-execution based Execution-Driven Simulations (EDS). In such environments, the processor component of the system is not fully modeled. With wide-issue superscalar processors being the norm in today's multiprocessor nodes, there is an urgent need for modeling the processor accurately. However, using direct execution to model a s...
  • Improving compiler and run-time support for adaptive irregular codes

    Publication Year: 1998, Page(s):393 - 400
    Cited by:  Papers (6)

    Irregular reductions form the core of adaptive irregular codes. On distributed-memory multiprocessors, they are parallelized either using sophisticated run-time systems (e.g., CHAOS, PILAR) or the shared-memory interface supported by software DSMs (e.g., GYM, TreadMarks). We introduce LOCALWRITE, a new technique based on the owner-computes rule which eliminates the need for buffers or synchronized...
  • Exploiting fine- and coarse-grain parallelism in embedded programs

    Publication Year: 1998, Page(s):60 - 67
    Cited by:  Papers (4)

    Due to technological advances, mapping embedded applications onto single-chip multiprocessor systems is becoming a feasible and very interesting option. What is needed is an environment that supports the designer in transforming an algorithmic specification into a suitable parallel implementation. In this paper we present the results of our experiments with one such environment, which we dev...
  • Command vector memory systems: high performance at low cost

    Publication Year: 1998, Page(s):68 - 77
    Cited by:  Papers (24)  |  Patents (1)

    The focus of this paper is on designing a low-cost, high-performance, high-bandwidth vector memory system that takes advantage of modern commodity SDRAM memory chips. To successfully extract the full bandwidth from SDRAM parts, we propose a new memory system organization based on sending commands to the memory system as opposed to sending individual addresses. A command specifies, in a few...
  • Exploiting method-level parallelism in single-threaded Java programs

    Publication Year: 1998, Page(s):176 - 184
    Cited by:  Papers (15)  |  Patents (4)

    Method speculation of object-oriented programs attempts to exploit method-level parallelism (MLP) by executing sequential method invocations in parallel, while still maintaining correct sequential ordering of data dependencies and memory accesses. In this paper, we show why the Java virtual machine is an effective environment for exploiting method-level parallelism, and demonstrate how method spec...
  • Adaptive receiver notification for non-dedicated workstation clusters

    Publication Year: 1998, Page(s):256 - 263

    Efficient communication in a NOW environment can be a challenging task. Depending on the application, the architecture of the nodes and the characteristics of other processes running on the nodes, different communication strategies can be appropriate. In this paper, we evaluate an adaptive scheme which selects between multiple communication strategies depending on the current situation. We focus o...
  • Load balancing in individual-based spatial applications

    Publication Year: 1998, Page(s):350 - 357
    Cited by:  Papers (3)

    Individual-based spatial simulations are a class of applications in which a collection of entities interact locally with one another within a simulated space to generate some global collective behavior. An Eulerian implementation of such a system partitions the simulated space and assigns each partition, together with the corresponding entities, to a different physical node. Load balancing is ach...
  • Static methods in hybrid branch prediction

    Publication Year: 1998, Page(s):222 - 229
    Cited by:  Papers (9)  |  Patents (3)

    Hybrid branch predictors combine the predictions of multiple single-level or two-level branch predictors. The prediction-combining hardware (the “meta-predictor”) may itself be large, complex and slow. We show that the combination function is better performed statically, using prediction hints in the branch instructions. The hints are set by profiling or static analysis. Although the met...