Innovative Architecture for Future Generation High-Performance Processors and Systems

24-24 Oct. 1998

  • Innovative Architecture for Future Generation High-Performance Processors and Systems [front matter]

    Publication Year: 1998, Page(s):iii - vi
  • ASCI application performance and the impact of commodity processor architectural trends

    Publication Year: 1998, Page(s):3 - 6

    The purpose of this paper is to summarize recent performance results from an important ASCI-related application and to speculate on how trends within the computer industry and in computer architecture relate to these results.

  • System support for dynamic optimization of application performance

    Publication Year: 1998, Page(s):7 - 20

    Today's high performance and parallel computer systems provide substantial opportunities for concurrency of execution and scalability that is largely untapped by the applications that run on them. Under traditional frameworks, developing efficient applications can be a labor-intensive process that requires an intimate knowledge of the machines, the applications, and many subtle machine-application...

  • Observations on universality and portability in high-performance computing

    Publication Year: 1998, Page(s):21 - 26
    Cited by:  Papers (1)

    The universality of the hardware model and the portability of the software model are two central themes in high-performance computing, especially as parallelism and memory hierarchy force a gradual but irreversible departure from the classical, sequential paradigm of computing. This paper considers the possibility of formulating an analytical, quantitative approach to the study of universality and...

  • Development of biological and chemical applications on a 64-node PC cluster

    Publication Year: 1998, Page(s):27 - 34

    We describe the development of two parallel applications running on our PC cluster, which consists of 64 Pentium Pro 200 MHz microprocessors. One is an integrated parallel calculation system, called the PAPIA system, dedicated to protein information analysis. The PAPIA system enables very fast protein database searches (3-D structure matching and sequence homology search) by fully utilizing the powe...

  • The effects of predicated execution on architectures supporting dynamic speculation

    Publication Year: 1998, Page(s):37 - 45

    Branch instructions pose a serious problem in achieving good instruction-level parallelism (ILP) from a program. Modern microprocessors have attempted to alleviate this problem with the support of sophisticated branch prediction schemes. Dynamic speculation, as a hardware feature, is used to execute instructions out-of-order (OOO) guided by the outcomes of such prediction schemes. Previous branch...

  • A speculative multithreading with selective multi-path execution

    Publication Year: 1998, Page(s):46 - 52

    The performance of recent microprocessors has been improved by high clock frequencies and by exploiting instruction-level parallelism (ILP). Physical limitations of clock speed and semantic limitations of control dependencies impede further improvement of performance. To overcome this difficulty, it is indispensable to make use of thread-level parallelism. This paper proposes a speculat...

  • First step to combining control and data speculation

    Publication Year: 1998, Page(s):53 - 60

    Recently there have been many studies of data value prediction for increasing instruction-level parallelism, and it has been found that data speculation affects branch prediction accuracy. Even when data dependences are speculated successfully, processor performance would be degraded if branch prediction accuracy were decreased. On the other hand, branch prediction studies are nearly mature. While it becomes...

  • Increasing the lookahead of multilevel branch prediction

    Publication Year: 1998, Page(s):61 - 67

    Many techniques have been proposed for tolerating memory latency in future systems, including prefetching and Decoupled-Access DRAM (DA-DRAM) architectures. In order for these techniques to be effective they need to have a sufficient lookahead, i.e. to be far enough ahead of processor execution in requesting data. Branch prediction has been utilized before to achieve this but only small degrees of lo...

  • New methods for exploiting program structure and behavior in computer architecture

    Publication Year: 1998, Page(s):71 - 76

    Micro-architectural techniques of the next decade will have to be more efficient and scalable in order to handle growing workloads and longer communication and memory latencies. We believe that information about program structure, the data and control relationships between instructions, can be used as a powerful framework for new techniques. We argue that program structure information has several ...

  • Achieving high performance via co-designed virtual machines

    Publication Year: 1998, Page(s):77 - 84
    Cited by:  Papers (9)  |  Patents (2)

    A virtual machine (VM) uses software to support a virtual instruction set architecture on a hardware platform executing a native instruction set. By co-designing the hardware and software elements of a VM, and by using an implementation-dependent native instruction set, there will be many new opportunities for improved performance and flexibility. Because the hardware-supported instruction set is ...

  • Experiences with Java™ JIT optimization

    Publication Year: 1998, Page(s):87 - 94

    This paper presents some experiences with a Java just-in-time (JIT) compiler developed at Intel Corporation. For a few optimizations, the issues that are specific to Java programs are discussed. Furthermore, because in the context of just-in-time compilation, compile time directly contributes to the run time of the application, some remarks are made on how program analysis time is kept limited in...

  • Implementation of a non-strict functional programming language V on a threaded architecture EARTH

    Publication Year: 1998, Page(s):95 - 102

    The combination of a language with fine-grain implicit parallelism and a dataflow evaluation scheme is suitable for high-level programming on massively parallel architectures. We are developing a compiler of V, a non-strict functional programming language, for EARTH (Efficient Architecture for Running THreads). Our compiler generates code in Threaded-C, which is a lower-level programming language f...

  • Architecture of a parallel computer Cenju-4

    Publication Year: 1998, Page(s):105 - 113

    This paper describes the architecture and the evaluation results of a parallel computer, Cenju-4. Cenju-4 supports two memory architectures: distributed memory with user-level message-passing communication, and distributed shared memory with a cache-coherent non-uniform memory access (cc-NUMA) feature. A Cenju-4 system consists of from 8 to 1024 nodes connected by a multistage network which has multicast...

  • cc-COMA: the compiler-controlled COMA as a framework for parallel computing

    Publication Year: 1998, Page(s):114 - 119
    Cited by:  Papers (1)

    In order to provide a fully-automatic parallelizing compiler with the function of both automatic data partitioning and distribution, we propose a compiler-controlled cache only memory architecture (cc-COMA). The cc-COMA runtime system is based on a software-emulated COMA which covers a variety of parallel architectures from NUMA to NOW. The compiler generates both the user code and control code fo...

  • Index of Authors

    Publication Year: 1998, Page(s): 121