Innovative Architecture for Future Generation High-Performance Processors and Systems, 1997

Date 24-24 Oct. 1997

  • Proceedings Innovative Architecture for Future Generation High-Performance Processors and Systems

    Publication Year: 1997
    Freely Available from IEEE
  • Author index

    Publication Year: 1997, Page(s): 140
    Freely Available from IEEE
  • Future generation processors: using hierarchy and replication

    Publication Year: 1997, Page(s):59 - 66

    The trend toward ever-faster microprocessors must be continued to meet tomorrow's computing needs. Faster, denser chip technologies help continue performance growth, but new microarchitectures are needed to push performance further and to use higher transistor counts effectively. We are quickly reaching the point where a fundamental new processor organization, beyond conventional superscalar, is n...
  • Memory-based communication facilities and asymmetric distributed shared memory

    Publication Year: 1997, Page(s):30 - 39
    Cited by:  Papers (3)

    In general-purpose parallel and distributed systems, performance of the protected and virtualized user-level communications and synchronizations is the most crucial issue to realize efficient execution environments. We proposed a novel high-speed user-level communication and synchronization scheme, “Memory-Based Communication Facilities (MBCF)”, for a general-purpose system with an off-t...
  • The interactive restructuring of MATLAB programs using the FALCON environment

    Publication Year: 1997, Page(s):3 - 12
    Cited by:  Papers (1)  |  Patents (3)

    The paper describes FALCON, an environment for the interactive development of numerical programs using MATLAB. During the compilation and transformation process, the developer is able to interactively apply optimizations and transformations to the code, including both traditional compiler techniques and other transformations which utilize algebraic information about the operations performed and ta...
  • Technology synergy for real system performance

    Publication Year: 1997

    Summary form only given, as follows. The rate of technological advancement and innovation surrounding the computer industry is enormous. Although the exponential gains in semiconductor density and performance are widely observed, the specific implications and opportunities represented by them are the subject of much debate. Similarly, although there has been a large amount of innovation and progress...
  • PRISM-a design for scalable shared memory

    Publication Year: 1997

    Summary form only given. This paper describes PRISM, a distributed shared-memory architecture that relies on a unified hardware and operating system structure for scalable and reliable performance. The PRISM system consists of multiple connected (UMA) shared memory multiprocessors, each controlled by its own kernel. Hardware and software are used to support coherent global shared memory segments, ...
  • Decoupled access DRAM architecture

    Publication Year: 1997, Page(s):94 - 103
    Cited by:  Papers (1)

    This paper discusses an approach to reducing memory latency in future systems. It focuses on systems where a single chip DRAM/processor will not be feasible even in 10 years, e.g. systems requiring a large memory and/or many CPUs. In such systems a solution needs to be found to DRAM latency and bandwidth as well as to inter-chip communication. Utilizing the projected advances in chip I/O bandwidt...
  • Functionally integrated systems on a chip: technologies, architectures, CAD tools, and applications

    Publication Year: 1997, Page(s):67 - 75
    Cited by:  Patents (5)

    The most important challenge in developing next-generation electronics is producing devices that exceed today's performance levels while achieving ultra low-power efficiency and smaller package dimensions. The enabling technology is the evolution of digital, analog, and radio-frequency (RF) integrated circuits toward a monolithic implementation as a system on a chip (SOC). Applications range from ...
  • The A-NET working prototype: a parallel object-oriented multicomputer with reconfigurable network

    Publication Year: 1997, Page(s):40 - 49
    Cited by:  Papers (1)

    A multicomputer prototype has been co-designed and implemented in conjunction with a programming language, A-NETL, based on a parallel object-oriented computation model. Each node processor consists of a processing element (PE) and a router. The prototype PE has an A-NETL directed high-level instruction set. The implementation is supported by firmware and hardware. The router has been designed to ...
  • Which comes first: the architecture or the algorithm?

    Publication Year: 1997

    There is a constant tension between the designers of algorithms and architectures. Each designs for the other's previous generation. Several factors are making this an increasingly inadequate approach. On the hardware side, the growing disparity between the performance of memory and CPU has pushed many architectures to hierarchical memories that reward significant data reuse (and punish algorithms...
  • OSCAR multi-grain architecture and its evaluation

    Publication Year: 1997, Page(s):106 - 115
    Cited by:  Papers (2)  |  Patents (2)

    OSCAR (Optimally Scheduled Advanced Multiprocessor) was designed to efficiently realize multi-grain parallel processing using static and dynamic scheduling. It is a shared memory multiprocessor system having centralized and distributed shared memories in addition to local memory on each processor, with a data transfer controller for overlapping data transfer and task processing. Also, its Fortran ...
  • Executing dataflow program with stock processor

    Publication Year: 1997, Page(s):76 - 82

    In spite of the attractive features of dataflow languages, they do not fit conventional von Neumann systems very well. In this paper, we discuss practical approaches to executing dataflow programs with stock processors. As research vehicles, we use two types of platforms with stock processors: completely conventional platforms such as engineering workstations, and a partially conventional platform whic...
  • Towards the realistic “virtual hardware”

    Publication Year: 1997, Page(s):50 - 55
    Cited by:  Patents (4)

    WASMII is a virtual hardware system that executes dataflow algorithms. It is based on an MPLD (Multifunction Programming Logic Device), an extended FPGA (Field Programmable Gate Array) that implements multiple sets of functions as configurations of a single chip. An algorithm to be executed on WASMII is written in the DFC dataflow language and then translated into a collection of FPGA configuratio...
  • Application user's needs for future architectures

    Publication Year: 1997

    Several trends are apparent in the use of high performance architectures by end-users in science and engineering. One prominent change is a shift towards more complicated geometric models which require irregular meshes and finite element type discretizations. A second related change is the need for more sophisticated data structures; many leading edge methods now rely on trees and networks where f...
  • The intelligent cache controller of a massively parallel processor JUMP-I

    Publication Year: 1997, Page(s):116 - 124
    Cited by:  Papers (3)

    This paper describes the intelligent cache controller of JUMP-I, a distributed shared memory type MPP. JUMP-I adopts an off-the-shelf superscalar as the element processor to meet the requirement of peak performance, but such a processor lacks the ability to hide inter-processor communication latency, which may easily become too long on MPPs. Therefore JUMP-I provides an intelligent memory system t...
  • Effectiveness of register preloading on CP-PACS node processor

    Publication Year: 1997, Page(s):83 - 90
    Cited by:  Papers (1)

    CP-PACS is a massively parallel processor (MPP) for large scale scientific computations. In September 1996, CP-PACS equipped with 2048 processors began operation at the University of Tsukuba. At that time, CP-PACS was the fastest MPP in the world on the LINPACK benchmark. CP-PACS was designed to achieve very high performance in large scientific/engineering applications. It is well known that ordinary d...
  • Speculative resolution of ambiguous memory aliasing

    Publication Year: 1997, Page(s):17 - 26
    Cited by:  Papers (2)  |  Patents (1)

    This paper proposes resolving ambiguous memory aliasing speculatively. A load instruction is speculatively executed with load address prediction, and its dependent instructions are speculatively executed. A store instruction is also speculatively resolved with store address prediction, and its dependent instructions are speculatively executed. From the experimental evaluation, the author has foun...
  • High speed serial communication in a future parallel computer architecture

    Publication Year: 1997, Page(s):125 - 132
    Cited by:  Papers (1)

    In this paper, we introduce the concept and hardware configuration of a high speed serial communication interface called STAFF-Link. STAFF-Link is used in the massively parallel computer JUMP-I for connecting processing elements and I/O subsystems, and also as an I/O network between I/O units. Furthermore, we have designed and manufactured a STAFF-Link router board for a high performance workstatio...
  • Memory-centric architectures: why and perhaps what

    Publication Year: 1997

    Summary form only given. Distributing processors to regions of memory necessitates partitioning the problem and decomposing the data to the partitioned regions. Both can be hard to do well statically; some codes lend themselves well to one or both, while others are not amenable to static analysis. If the problem partitioning does not match the data decomposition, extremely poor program performance...
  • Models of multiprocessor computing

    Publication Year: 1997

    Today's options for high-performance computing include message-passing MPP systems, clusters of workstations, shared-memory multiprocessors, and parallel vector processors. Shared-memory multiprocessors provide the most general-purpose programming model, but generally suffer from limited scalability. New multiprocessor cache coherence protocols together with high-performance CMOS VLSI and intercon...
  • Memory based light weight communication architecture for local area distributed computing

    Publication Year: 1997, Page(s):133 - 139
    Cited by:  Papers (1)

    A local area distributed computing system in which thousands of commodity personal computers and workstations can be connected by high speed optical interconnection is being developed. This system will provide a single parallel system image to users and support high performance parallel computing. This paper presents an overview of the memory based light weight communication architecture of such ...
  • Communication-oriented computer architecture: data choreography

    Publication Year: 1997

    Summary form only given. The full paper presents a number of approaches to choreographing regular and irregular computations, and shows how a number of model applications can be choreographed. The paper will also compare SMA architectures to UMA and NUMA architectures.