1998 International Conference on Parallel Architectures and Compilation Techniques: Proceedings

Date: 18 Oct. 1998

  • Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192)

    Publication Year: 1998
  • Table of contents

    Publication Year: 1998, Page(s):v - viii
  • Author index

    Publication Year: 1998, Page(s):434 - 435
  • Optimized code generation for heterogeneous computing environment using parallelizing compiler TINPAR

    Publication Year: 1998, Page(s):426 - 433
    Cited by:  Papers (1)  |  Patents (2)

    This paper presents a compiling technique to generate optimized code for heterogeneous computing environments. It also proposes a new dynamic load redistribution mechanism that can adaptively and dynamically distribute tasks among computers according to their available computing power, which may vary during the computation. From the results of the performance evaluation, we could confirm tha...
  • MADELEINE: an efficient and portable communication interface for RPC-based multithreaded environments

    Publication Year: 1998, Page(s):240 - 247
    Cited by:  Papers (4)

    Due to their ever-growing success in the development of distributed applications, today's multithreaded environments have to be highly portable and efficient on a large variety of hardware. Most of these environments have an implementation built on top of standard communication interfaces such as PVM or MPI, which are widely available on existing architectures. Obviously, this approach ensures a h...
  • A multithreaded runtime environment with thread migration for a HPF data-parallel compiler

    Publication Year: 1998, Page(s):418 - 425

    This paper studies the benefits of compiling data-parallel languages onto a multithreaded runtime environment providing dynamic thread migration facilities. Each abstract process is mapped onto a thread, so that dynamic load balancing can be achieved by migrating threads among the processing nodes. We describe and evaluate an implementation of this idea in the Adaptor HPF compiler. We show that no...
  • Design study of shared memory in VLIW video signal processors

    Publication Year: 1998, Page(s):52 - 59
    Cited by:  Papers (2)

    Programmable video signal processors (VSPs) play an important role in multimedia applications due to their high performance and flexibility. In order to exploit the huge amount of parallelism inherent in the applications, VSPs employ aggressive parallel architectures, among which Very Long Instruction Word (VLIW) is becoming increasingly popular. For video signal processing a carefully designed me...
  • Improving static branch prediction in a compiler

    Publication Year: 1998, Page(s):214 - 221
    Cited by:  Papers (8)

    An ILP (Instruction-Level Parallelism) compiler uses aggressive optimizations to reduce a program's running time. These optimizations have been shown to be effective when profile information is available. Unfortunately, users are not always willing or able to profile their programs. A method of overcoming this issue is for an ILP compiler to statically infer the information normally obtained from ...
  • Parallelization of benchmarks for scalable shared-memory multiprocessors

    Publication Year: 1998, Page(s):401 - 408
    Cited by:  Papers (3)

    This paper identifies practical compiling techniques for scalable shared memory machines. To this end, we focused on experimental studies using a real machine and representative codes. In the experiments, we transformed conventional codes to shared memory codes using several existing techniques and ran the output on the target machine to evaluate those techniques and to identify where improvemen...
  • Improving compiler and run-time support for adaptive irregular codes

    Publication Year: 1998, Page(s):393 - 400
    Cited by:  Papers (6)

    Irregular reductions form the core of adaptive irregular codes. On distributed-memory multiprocessors, they are parallelized either using sophisticated run-time systems (e.g., CHAOS, PILAR) or the shared-memory interface supported by software DSMs (e.g., CVM, TreadMarks). We introduce LOCALWRITE, a new technique based on the owner-computes rule which eliminates the need for buffers or synchronized...
  • Static methods in hybrid branch prediction

    Publication Year: 1998, Page(s):222 - 229
    Cited by:  Papers (8)  |  Patents (3)

    Hybrid branch predictors combine the predictions of multiple single-level or two-level branch predictors. The prediction-combining hardware (the “meta-predictor”) may itself be large, complex and slow. We show that the combination function is better performed statically, using prediction hints in the branch instructions. The hints are set by profiling or static analysis. Although the met...
  • Command vector memory systems: high performance at low cost

    Publication Year: 1998, Page(s):68 - 77
    Cited by:  Papers (24)  |  Patents (1)

    The focus of this paper is on designing a low-cost, high-performance, high-bandwidth vector memory system that takes advantage of modern commodity SDRAM memory chips. To extract the full bandwidth from SDRAM parts, we propose a new memory system organization based on sending commands to the memory system instead of individual addresses. A command specifies, in a few...
  • Optical versus electronic bus for address-transactions in future SMP architectures

    Publication Year: 1998, Page(s):22 - 29

    The fast evolution of processor performance necessitates a continuous evolution of all the multiprocessor components, even for small- to medium-scale symmetric multiprocessors (SMPs) built around shared buses. This kind of multiprocessor is especially attractive because the problem of data coherency in caches can be solved by a class of snooping protocols specific to these shared-bus architectures. B...
  • General parallel computation can be performed with a cycle-free heap

    Publication Year: 1998, Page(s):96 - 103
    Cited by:  Papers (1)  |  Patents (2)

    We argue that a powerful and general programming model for parallel computation exists that honors the principles of modular software construction, but disallows the formation of heap cycles. We believe this cycle-free frame and heap model can be used as the basis for a new species of computer systems that satisfies all principles of modular software construction and offers performance and program...
  • A fast interrupt handling scheme for VLIW processors

    Publication Year: 1998, Page(s):136 - 141
    Cited by:  Papers (7)  |  Patents (11)

    Interrupt handling in out-of-order execution processors requires complex hardware schemes to maintain the sequential state. The amount of hardware will be substantial in VLIW architectures due to the nature of issuing a very large number of instructions in each cycle. It is hard to implement precise interrupts in out-of-order execution machines, especially in VLIW processors. In this paper, we wil...
  • Integrated compilation and scalability analysis for parallel systems

    Publication Year: 1998, Page(s):385 - 392
    Cited by:  Papers (11)

    Despite the performance potential of parallel systems, several factors have hindered their widespread adoption. Of these, performance variability is among the most significant. Data parallel languages, which facilitate the programming of those systems, increase the semantic distance between the program's source code and its observable performance, thus aggravating the optimization problem. In this...
  • A new framework for integrated global local scheduling

    Publication Year: 1998, Page(s):167 - 174
    Cited by:  Papers (5)  |  Patents (1)

    Global instruction schedulers can be classified as either structure-driven or profile-driven. Structure-driven approaches attempt to find instruction-level parallelism by redistributing instructions along all possible execution paths. When resources are limited, poor choices may penalize the frequently executed paths. By contrast, profile-driven approaches use feedback information to identify frequently ...
  • Efficient edge profiling for ILP-processors

    Publication Year: 1998, Page(s):294 - 303
    Cited by:  Papers (5)

    Compilers for VLIW and superscalar machines increasingly use dynamic application behavior or profiling information in optimizations such as instruction scheduling, speculative code motion, and code layout. Hence it is extremely useful to develop inexpensive techniques that gather accurate profiling information. This paper presents novel edge profiling techniques that greatly reduce run-time overhe...
  • Instance-wise reaching definition analysis for recursive programs using context-free transductions

    Publication Year: 1998, Page(s):332 - 339
    Cited by:  Papers (1)

    Automatic parallelization of recursive programs is still an open problem today, lacking suitable and precise static analyses. We present a novel reaching definition framework based on context-free transductions. The technique achieves a global and precise description of the data flow and discovers important semantic properties of programs. Taking the example of a real-world non-derecursivable prog...
  • Exploiting fine- and coarse-grain parallelism in embedded programs

    Publication Year: 1998, Page(s):60 - 67
    Cited by:  Papers (4)

    Due to technological advances, mapping embedded applications onto single-chip multiprocessor systems has become a feasible and very interesting option. What is needed is an environment that supports the designer in transforming an algorithmic specification into a suitable parallel implementation. In this paper we present the results of our experiments with one such environment, which we dev...
  • Origin 2000 design enhancements for communication intensive applications

    Publication Year: 1998, Page(s):30 - 39
    Cited by:  Papers (2)

    The SGI Origin 2000 is designed to support a wide range of applications and has low local and remote memory latencies. However, it often has a high ratio of remote to local misses. In this paper, we evaluate the Origin 2000 performance with communication intensive applications. We use detailed execution-driven simulation of six shared-memory applications. This paper evaluates a base, Origin 2000-li...
  • Integrating loop and data transformations for global optimisation

    Publication Year: 1998, Page(s):12 - 19
    Cited by:  Papers (20)  |  Patents (1)

    This paper is concerned with integrating global data transformations and local loop transformations in order to minimise overhead on distributed shared memory machines such as the SGI Origin 2000. By first developing an extended algebraic transformation framework, a new technique to allow the static application of global data transformations, such as partitioning, to reshaped arrays is presented, ...
  • Handling cross interferences by cyclic cache line coloring

    Publication Year: 1998, Page(s):112 - 117

    Cross interference, i.e. conflicts between data from several arrays, is particularly severe for caches with limited associativity. We present a uniform scheme that reduces both self and cross interference. Techniques for cyclic register allocation, namely the meeting graph, help to improve the usage of cache lines and to avoid conflicts. Cyclic graph coloring determines a new memory mapping function. Prelimin...
  • Efficient JavaVM just-in-time compilation

    Publication Year: 1998, Page(s):205 - 212
    Cited by:  Papers (12)  |  Patents (8)

    Conventional compilers are designed for producing highly optimized code without paying much attention to compile time. The design goals of Java just-in-time compilers are different: produce fast code at the smallest possible compile time. In this article we present a very fast algorithm for translating JavaVM byte code to high quality machine code for RISC processors. This algorithm handles combin...
  • Scanning polyhedra without Do-loops

    Publication Year: 1998, Page(s):4 - 11
    Cited by:  Papers (7)  |  Patents (8)

    In this paper we study the problem of polyhedron scanning, which appears, for example, when generating code for transformed loop nests in automatic parallelization. After a review of related work, we detail our method to scan affine images of polyhedra. After presenting some experimental results, we show how our method applies to unions of affine images of polyhedra. We have taken the option to generate low lev...
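The abstract of “Static methods in hybrid branch prediction” describes replacing the dynamic meta-predictor with static hints in the branch instructions, set by profiling. Below is a minimal Python sketch of that idea under stated assumptions: the two components (a bimodal table and a gshare table), the table sizes, and the names `HintedHybridPredictor`, `choose_hints`, and `run` are hypothetical illustration choices, not the paper's design. Each branch's hint simply names the component whose prediction is used, so no dynamic selector table is needed.

```python
def update_counter(value, taken):
    """2-bit saturating counter: values 0-1 predict not taken, 2-3 predict taken."""
    return min(value + 1, 3) if taken else max(value - 1, 0)


class HintedHybridPredictor:
    """Hypothetical components (bimodal and gshare); a static per-branch hint picks one."""

    def __init__(self, table_bits=10, history_bits=10):
        self.mask = (1 << table_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.bimodal = [2] * (1 << table_bits)  # every counter starts weakly taken
        self.gshare = [2] * (1 << table_bits)
        self.history = 0  # global outcome history, newest outcome in the low bit

    def _indices(self, pc):
        return pc & self.mask, (pc ^ self.history) & self.mask

    def predict(self, pc, hint):
        bi, gi = self._indices(pc)
        if hint == "gshare":
            return self.gshare[gi] >= 2
        return self.bimodal[bi] >= 2

    def update(self, pc, taken):
        # Both components train on every branch; only the selection is static.
        bi, gi = self._indices(pc)
        self.bimodal[bi] = update_counter(self.bimodal[bi], taken)
        self.gshare[gi] = update_counter(self.gshare[gi], taken)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def choose_hints(trace):
    """Profiling pass: per branch, keep the component with fewer mispredictions."""
    predictor = HintedHybridPredictor()
    misses = {}  # pc -> [bimodal misses, gshare misses]
    for pc, taken in trace:
        counts = misses.setdefault(pc, [0, 0])
        counts[0] += predictor.predict(pc, "bimodal") != taken
        counts[1] += predictor.predict(pc, "gshare") != taken
        predictor.update(pc, taken)
    return {pc: ("gshare" if g < b else "bimodal") for pc, (b, g) in misses.items()}


def run(trace, hints):
    """Count mispredictions when each branch uses its static hint (default: bimodal)."""
    predictor = HintedHybridPredictor()
    mispredicts = 0
    for pc, taken in trace:
        mispredicts += predictor.predict(pc, hints.get(pc, "bimodal")) != taken
        predictor.update(pc, taken)
    return mispredicts
```

For a trace in which a single branch at a hypothetical address strictly alternates between taken and not taken, `choose_hints` should select the gshare component, because a 2-bit bimodal counter mispredicts every not-taken outcome of such a branch while the history-indexed table learns the pattern after a short warm-up.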
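The LOCALWRITE abstract (“Improving compiler and run-time support for adaptive irregular codes”) builds on the owner-computes rule. The sketch below applies that rule to a generic irregular reduction; it is not LOCALWRITE itself, and the block distribution, the edge format `(a, b, w)`, and all function names are hypothetical. Each processor receives every edge touching a node it owns and writes only to its own nodes, so no buffers or synchronized writes are needed, at the cost of processing cross-processor edges twice.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_reduction(num_nodes, edges):
    """Reference irregular reduction: edge (a, b, w) adds w to node a and subtracts w from node b."""
    force = [0] * num_nodes
    for a, b, w in edges:
        force[a] += w
        force[b] -= w
    return force


def owner_of(node, num_nodes, num_procs):
    """Hypothetical block distribution: contiguous blocks of ceil(n / p) nodes per processor."""
    block = -(-num_nodes // num_procs)  # ceiling division
    return node // block


def build_local_edge_lists(num_nodes, edges, num_procs):
    """Inspector-style pass: give each processor every edge that touches a node it owns.
    An edge whose endpoints have different owners lands in both lists (replicated work)."""
    local = [[] for _ in range(num_procs)]
    for a, b, w in edges:
        for p in {owner_of(a, num_nodes, num_procs), owner_of(b, num_nodes, num_procs)}:
            local[p].append((a, b, w))
    return local


def owner_computes_reduction(num_nodes, edges, num_procs):
    force = [0] * num_nodes
    local = build_local_edge_lists(num_nodes, edges, num_procs)

    def work(p):
        # Processor p writes only to nodes it owns, so no two threads write the same slot.
        for a, b, w in local[p]:
            if owner_of(a, num_nodes, num_procs) == p:
                force[a] += w
            if owner_of(b, num_nodes, num_procs) == p:
                force[b] -= w

    with ThreadPoolExecutor(max_workers=num_procs) as pool:
        list(pool.map(work, range(num_procs)))  # consume results so worker exceptions surface
    return force
```

Integer weights are used so the parallel result can be compared exactly with `sequential_reduction`; with floating-point data the different summation order could change the last bits of the result.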