By Topic

Proceedings of the IEEE

Issue 11 • Nov. 2001


Displaying Results 1 - 14 of 14
  • Special issue on microprocessor architecture and compiler technology

    Publication Year: 2001 , Page(s): 1547 - 1549
    Freely Available from IEEE
  • Proceedings of the IEEE: 2002 - Celebrating ninety years of new technology! [Editorial]

    Publication Year: 2001 , Page(s): 1550 - 1552
    Freely Available from IEEE
  • Compiling for EPIC architectures

    Publication Year: 2001 , Page(s): 1676 - 1693
    Cited by:  Papers (1)  |  Patents (1)

    Designing compilers for Explicitly Parallel Instruction Computing (EPIC) architectures presents challenges substantially different from those encountered in designing compilers for traditional sequential architectures. These challenges are addressed not only by employing new optimizations that are specific to EPIC, but also by employing new ways to architect compilers. EPIC architectures provide features that allow compilers to take a proactive role in exploiting instruction-level parallelism. Compiler technology is intimately intertwined with the target processor architecture, and compiler architects must solve new analysis and optimization problems to achieve the highest levels of performance. When complex optimizations are uniformly applied to large applications, the resulting slow compile speeds are unacceptable. Demanding requirements to produce high-quality code at high compile speed shape the fundamental structure of EPIC compilers.
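    One compiler task specific to EPIC-style targets is grouping independent operations for parallel issue. A minimal sketch of greedy bundle packing follows; the three-wide issue width, register names, and dependence rules are assumptions invented for illustration, not details from the paper.

    ```python
    # Greedy bundle-packing sketch for a VLIW/EPIC-style target.
    # An op is (dest, srcs). An op conflicts with the current bundle if it
    # writes a register another op in the bundle reads or writes, or reads
    # one the bundle writes. Issue width and register names are invented.

    def pack_bundles(ops, width=3):
        bundles, current = [], []
        for dest, srcs in ops:
            conflict = any(
                dest == d or dest in s or d in srcs for d, s in current
            )
            if conflict or len(current) == width:
                bundles.append(current)   # close the bundle, start a new one
                current = []
            current.append((dest, srcs))
        if current:
            bundles.append(current)
        return bundles

    # Two independent ops issue together; the dependent op starts a new bundle.
    groups = pack_bundles([("r1", ("r0",)), ("r2", ("r0",)), ("r3", ("r1",))])
    ```

    A real EPIC compiler also models functional-unit templates and stop bits; this sketch checks only register dependences.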

  • Lighting your country house

    Publication Year: 2001 , Page(s): 1723 - 1726

    Today we assume that electricity for our lights is readily available from the mains, but a hundred years ago that was not necessarily the case. Your correspondent has recently been looking at a large house in a rural part of England where, in the 1890s, the owner decided to adopt electric light and consequently had to install his own generating plant. Now the house is open to the public, and the present owners wish to reconstruct the 1890s generating plant as a visitor attraction.

  • Program decision logic optimization using predication and control speculation

    Publication Year: 2001 , Page(s): 1660 - 1675
    Cited by:  Patents (3)

    The mainstream arrival of predication, a means other than branching of selecting instructions for execution, has required compiler architects to reformulate fundamental analyses and transformations. Traditionally, the compiler has generated branches straightforwardly to implement control flow designed by the programmer and has then performed sophisticated "global" optimizations to move and optimize code around them. In this model, the inherent tie between the control state of the program and the location of the single instruction pointer serialized run-time evaluation of control and limited the extent to which the compiler could optimize the control structure of the program (without extensive code replication). Predication provides a means of control independent of branches and instruction fetch location, freeing both compiler and architecture from these restrictions; effective compilation of predicated code, however, requires a sophisticated understanding of the program's control structure. This paper explores a representational technique which, through direct code analysis, maps the program's control component into a canonical database, a reduced ordered binary decision diagram (ROBDD), which fully enables the compiler to utilize and manipulate predication. This abstraction is then applied to optimize the program's control component, transforming it into a form more amenable to instruction-level parallel (ILP) execution.
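    The if-conversion that predication enables can be sketched in a few lines; the guarded-instruction format and register names below are invented for illustration and are not the paper's ROBDD machinery.

    ```python
    # Predicated-execution sketch: "if c: x = a else: x = b" becomes straight-line
    # code of guarded moves under complementary predicates p and q = !c.
    # Each instruction is (guard_or_None, dest, compute_fn); names are invented.

    def run_predicated(program, regs):
        for guard, dest, fn in program:
            if guard is None or regs[guard]:   # squash the op if its guard is false
                regs[dest] = fn(regs)
        return regs

    diamond = [
        (None, "p", lambda r: r["c"]),        #      p = c
        (None, "q", lambda r: not r["c"]),    #      q = !c
        ("p",  "x", lambda r: r["a"]),        # (p)  x = a
        ("q",  "x", lambda r: r["b"]),        # (q)  x = b
    ]
    ```

    The branch has disappeared: both arms occupy the pipeline, and the predicate registers decide which write takes effect.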

  • Instruction fetch architectures and code layout optimizations

    Publication Year: 2001 , Page(s): 1588 - 1609
    Cited by:  Patents (1)

    The design of higher performance processors has been following two major trends: increasing the pipeline depth to allow faster clock rates, and widening the pipeline to allow parallel execution of more instructions. Designing a higher performance processor implies balancing all the pipeline stages to ensure that overall performance is not dominated by any of them. This means that a faster execution engine also requires a faster fetch engine, to ensure that it is possible to read and decode enough instructions to keep the pipeline full and the functional units busy. This paper explores the challenges faced by the instruction fetch stage for a variety of processor designs, from early pipelined processors to the more aggressive wide-issue superscalars. We describe the different fetch engines proposed in the literature, the performance issues involved, and some of the proposed improvements. We also show how compiler techniques that optimize the layout of the code in memory can be used to improve the fetch performance of the different engines described. Overall, we show how instruction fetch has evolved from fetching one instruction every few cycles, to fetching one instruction per cycle, to fetching a full basic block per cycle, to fetching several basic blocks per cycle: the evolution of the mechanisms surrounding the instruction cache, and the different compiler optimizations used to better employ these mechanisms.
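    One family of the layout optimizations mentioned here chains each block with its hottest successor so the common path falls through in the instruction cache. This toy version assumes a profiled edge list with invented block names and counts, and ignores placement of unreached blocks.

    ```python
    # Greedy hot-path layout sketch: follow the heaviest profiled edge out of
    # each placed block. Block names and edge counts are hypothetical profile data.

    def hot_path_layout(entry, succs):
        # succs maps block -> list of (successor, edge_count)
        order, placed, block = [entry], {entry}, entry
        while True:
            candidates = [(count, s) for s, count in succs.get(block, [])
                          if s not in placed]
            if not candidates:
                break
            _, block = max(candidates)   # hottest unplaced successor falls through
            order.append(block)
            placed.add(block)
        return order

    # A branches to C 90% of the time, so C is laid out right after A.
    chain = hot_path_layout("A", {"A": [("B", 10), ("C", 90)], "C": [("D", 90)]})
    ```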

  • Requirements, bottlenecks, and good fortune: agents for microprocessor evolution

    Publication Year: 2001 , Page(s): 1553 - 1559
    Cited by:  Papers (4)

    The first microprocessor, the Intel 4004, showed up in 1971. It contained 2300 transistors and operated at a clock frequency of 108 kHz. Today, 30 years later, the microprocessor contains almost 200 million transistors, operating at a frequency of more than 1 GHz. In five years, those numbers are expected to grow to more than a billion transistors on a single chip, operating at a clock frequency of 6 to 10 GHz. The evolution of the microprocessor from where it started in 1971 to where it is today, and where it is likely to be in five years, has come about because of several contributing forces. Our position is that this evolution did not just happen, that each step forward came as a result of one of three things, and always within the context of a computer architect making tradeoffs. The three things are: 1) new requirements; 2) bottlenecks; and 3) good fortune. I call them collectively agents for evolution. This article attempts to do three things: describe a basic framework for the field of microprocessors; show some of the important developments that have come along in the 30 years since the arrival of the first microprocessor; and, finally, suggest some of the new things you can expect to see in a high-performance microprocessor in the next five years.

  • Advances and future challenges in binary translation and optimization

    Publication Year: 2001 , Page(s): 1710 - 1722
    Cited by:  Papers (8)  |  Patents (3)

    Binary translation and optimization have achieved a high profile in recent years. Binary translation has several potential attractions. While still in its early stages, could binary translation offer a new way to design processors, i.e., is it a disruptive technology? This paper discusses this question, examines some future possibilities for binary translation, and then gives an overview of selected projects (DAISY, Crusoe, Dynamo, and LaTTe). One future possibility for binary translation is the Virtual IT Shop. Binary translation offers a possible solution for better utilization of computational resources as services over the World Wide Web. The Internet is radically changing the software landscape, and is fostering platform independence and interoperability. Along the lines of software convergence, recent advances in binary JIT (just-in-time) optimizations also present the future possibility of a convergence virtual machine (CVM). CVM aims to address research challenges in allowing the same standard operating system and application object code to run on different hardware platforms, through state-of-the-art JIT compilation and virtual device emulation.
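    The translate-once, cache, and reuse loop at the heart of such systems can be shown with a toy guest ISA; the two-instruction stack machine below is invented and bears no relation to the internals of DAISY, Crusoe, Dynamo, or LaTTe.

    ```python
    # Toy dynamic binary translation: each guest instruction is translated to a
    # host closure the first time it is seen, then the cached closure is reused.

    translation_cache = {}

    def translate(instr):
        if instr not in translation_cache:      # translate only on first sight
            op, operand = instr
            if op == "push":
                translation_cache[instr] = lambda stack, n=operand: stack.append(n)
            elif op == "add":
                translation_cache[instr] = (
                    lambda stack: stack.append(stack.pop() + stack.pop())
                )
        return translation_cache[instr]

    def run(program):
        stack = []
        for instr in program:
            translate(instr)(stack)
        return stack
    ```

    Real translators cache whole translated regions rather than single instructions, and must also handle self-modifying code and precise exceptions.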

  • Hardware/compiler codevelopment for an embedded media processor

    Publication Year: 2001 , Page(s): 1694 - 1709
    Cited by:  Papers (12)

    Embedded and portable systems running multimedia applications create a new challenge for hardware architects. A microprocessor for such applications needs to be easy to program like a general-purpose processor and have the performance and power efficiency of a digital signal processor. This paper presents the codevelopment of the instruction set, the hardware, and the compiler for the Vector IRAM media processor. A vector architecture is used to exploit the data parallelism of multimedia programs, which allows the use of highly modular hardware and enables implementations that combine high performance, low power consumption, and reduced design complexity. It also leads to a compiler model that is efficient both in terms of performance and executable code size. The memory system for the vector processor is implemented using embedded DRAM technology, which provides high bandwidth in an integrated, cost-effective manner. The hardware and the compiler for this architecture make complementary contributions to the efficiency of the overall system. This paper explores the interactions and tradeoffs between them, as well as the enhancements to a vector architecture necessary for multimedia processing. We also describe how the architecture, design, and compiler features come together in a prototype system-on-a-chip, able to execute 3.2 billion operations per second per watt.

  • Instruction scheduling for instruction level parallel processors

    Publication Year: 2001 , Page(s): 1638 - 1659
    Cited by:  Papers (15)  |  Patents (7)

    Nearly all personal computer and workstation processors, and virtually all high-performance embedded processor cores, now embody instruction-level parallel (ILP) processing in the form of superscalar or very long instruction word (VLIW) architectures. ILP processors put much more of a burden on compilers; without "heroic" compiling techniques, most such processors fall far short of their performance goals. Those techniques are largely found in the high-level optimization phase and in the code generation phase; they are also collectively called instruction scheduling. This paper reviews the state of the art in code generation for ILP processors. Modern ILP code generation methods move code across basic block boundaries. These methods grew out of techniques for generating horizontal microcode, so we introduce the problem by describing its history. Most modern approaches can be categorized by the shape of the scheduling "region." Some of these regions are loops, and for those, techniques known broadly as "software pipelining" are used. Software pipelining techniques are considered here only when there are issues relevant to the region-based techniques presented. The selection of a type of region to use in this process is one of the most controversial questions in code generation; the paper surveys the best-known alternatives. The paper then considers two questions. First, given a type of region, how does one pick specific regions of that type in the intermediate code? In conjunction with region selection, we consider region enlargement techniques such as unrolling and branch target expansion. The second question, how does one construct a schedule once regions have been selected, occupies the next section of the paper. Finally, schedule construction using recent, innovative resource modeling based on finite-state automata is examined. The paper includes an extensive bibliography.
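    The region-based scheduling surveyed above ultimately bottoms out in some form of list scheduling within each region. A cycle-by-cycle greedy version can be sketched as follows; the dependence graph, latencies, and two-wide issue assumption are all invented for the example.

    ```python
    # List-scheduling sketch: each cycle, issue up to `width` ready operations
    # (all predecessors finished). Dependences, latencies, and width are invented.

    def list_schedule(deps, latency, width=2):
        finish, pending, cycle = {}, set(deps), 0
        while pending:
            ready = sorted(
                op for op in pending
                if all(finish.get(p, cycle + 1) <= cycle for p in deps[op])
            )
            for op in ready[:width]:
                finish[op] = cycle + latency[op]   # op completes after its latency
                pending.discard(op)
            cycle += 1
        return finish

    # "c" must wait for the two-cycle "a"; "a" and "b" issue together in cycle 0.
    times = list_schedule({"a": set(), "b": set(), "c": {"a"}},
                          {"a": 2, "b": 1, "c": 1})
    ```

    Production schedulers replace the alphabetical tie-break with priority heuristics such as critical-path height, and model functional-unit resources rather than a flat issue width.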

  • Dynamic techniques for load and load-use scheduling

    Publication Year: 2001 , Page(s): 1621 - 1637
    Cited by:  Papers (1)

    Modern microprocessors employ dynamic instruction scheduling to select independent instructions for parallel execution. Good scheduling of loads is crucial, since the long latency of some loads makes them likely to degrade performance. A good scheduler attempts to issue loads as early as possible. Scheduling loads is not simple. First, safely resolving a load's input dependences can be done only at execution time, after the load address and all previous store addresses are known. Second, varying load latency makes it difficult to prioritize loads and to efficiently schedule load-dependent instructions. This paper surveys several techniques that optimize load scheduling. Memory disambiguation resolves store-load dependences and enables earlier execution of store-independent loads. Memory renaming and memory bypassing short-circuit memory to streamline the passing of values from stores to loads. Critical path scheduling, pre-execution, and address prediction advance long-latency loads by computing load addresses early, or predicting them. Value prediction short-circuits load execution by predicting the loaded data values. Finally, data speculation and hit-miss prediction help the scheduling of load-dependent instructions.
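    The store-load interaction these techniques revolve around can be shown with a store-to-load forwarding sketch; the addresses, values, and flat memory model below are illustrative, not taken from the paper.

    ```python
    # Store-to-load forwarding sketch: a load searches the store queue, youngest
    # first, for an older store to the same address and takes its value instead
    # of reading memory. Addresses and the flat memory model are invented.

    def execute(ops, memory):
        store_queue, loads = [], []
        for op in ops:
            if op[0] == "store":
                _, addr, value = op
                store_queue.append((addr, value))
            else:                                # ("load", addr)
                _, addr = op
                hit = next((v for a, v in reversed(store_queue) if a == addr), None)
                loads.append(hit if hit is not None else memory[addr])
        return loads

    # The second load is satisfied by the in-flight store, not by memory.
    values = execute([("load", 100), ("store", 100, 42), ("load", 100)], {100: 7})
    ```

    Hardware does this comparison associatively and must also handle partial-width overlaps and stores whose addresses are not yet computed, which is where the speculative techniques in the abstract come in.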

  • Understanding branches and designing branch predictors for high-performance microprocessors

    Publication Year: 2001 , Page(s): 1610 - 1620
    Cited by:  Papers (6)

    Branch prediction is important in high-performance processors, and its importance continues to grow. In the drive for higher execution frequencies, pipelines are lengthened and memory latencies are increased. This increases the cost of branch mispredictions. In this paper, we describe some behavior patterns of branches. We believe that understanding the behavior of branches is helpful when designing fetch mechanisms for high-performance microprocessors. We also examine several current branch predictors and discuss how they work. Finally, we look at some of the challenges we face when designing fetch mechanisms and predictors for future microprocessors and discuss some of the possible solutions.
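    The simplest predictor family discussed in surveys of this kind is the two-bit saturating counter; this sketch assumes an arbitrary table size and direct PC indexing, which are illustrative choices rather than details from the paper.

    ```python
    # Two-bit saturating-counter sketch: counters 0-1 predict not-taken, 2-3
    # predict taken, moving one step per outcome, so a single anomalous outcome
    # in a biased branch does not flip the prediction. Table size is arbitrary.

    class TwoBitPredictor:
        def __init__(self, entries=16):
            self.table = [2] * entries           # start weakly taken

        def predict(self, pc):
            return self.table[pc % len(self.table)] >= 2

        def update(self, pc, taken):
            i = pc % len(self.table)
            if taken:
                self.table[i] = min(3, self.table[i] + 1)
            else:
                self.table[i] = max(0, self.table[i] - 1)
    ```

    Two-level and hybrid predictors extend this idea by indexing the counter table with branch history as well as the PC.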

  • Low-power design for embedded processors

    Publication Year: 2001 , Page(s): 1576 - 1587
    Cited by:  Papers (12)

    Minimization of power consumption in portable and battery-powered embedded systems has become an important aspect of processor and system design. Opportunities for power optimization and tradeoffs emphasizing low power are available across the entire design hierarchy. A review of low-power techniques applied at many levels of the design hierarchy is presented, and an example of a low-power processor architecture is described, along with some of the design decisions made in implementation of the architecture.
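    Many circuit-level techniques in this area exploit the dynamic-power relation P ≈ C·V²·f. The effective capacitance and the two operating points below are made-up numbers, used only to show why scaling voltage together with frequency pays off superlinearly.

    ```python
    # Dynamic-power sketch: P ≈ C_eff * V^2 * f. Halving frequency alone halves
    # power, but lowering voltage with it also cuts power by the square of the
    # voltage ratio. All numbers here are hypothetical.

    def dynamic_power(c_eff, volts, hertz):
        return c_eff * volts ** 2 * hertz

    full   = dynamic_power(1e-9, 1.8, 200e6)   # hypothetical full-speed point
    scaled = dynamic_power(1e-9, 1.2, 100e6)   # hypothetical scaled-down point
    ```

    Here halving the frequency while dropping the supply from 1.8 V to 1.2 V reduces power by a factor of 4.5, not 2, which is the core argument for voltage/frequency scaling.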

  • Microarchitectural innovations: boosting microprocessor performance beyond semiconductor technology scaling

    Publication Year: 2001 , Page(s): 1560 - 1575
    Cited by:  Papers (8)  |  Patents (1)

    Semiconductor technology scaling provides faster and more plentiful transistors to build microprocessors, and applications continue to drive the demand for more powerful microprocessors. Weaving the "raw" semiconductor material into a microprocessor that offers the performance needed by modern and future applications is the role of computer architecture. This paper overviews some of the microarchitectural techniques that empower modern high-performance microprocessors. The techniques are classified into: 1) techniques meant to increase the concurrency in instruction processing, while maintaining the appearance of sequential processing and 2) techniques that exploit program behavior. The first category includes pipelining, superscalar execution, out-of-order execution, register renaming, and techniques to overlap memory-accessing instructions. The second category includes memory hierarchies, branch predictors, trace caches, and memory-dependence predictors. The paper also discusses microarchitectural techniques likely to be used in future microprocessors, including data value speculation and instruction reuse, microarchitectures with multiple sequencers and thread-level speculation, and microarchitectural techniques for tackling the problems of power consumption and reliability.
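    Of the first-category techniques listed, register renaming is compact enough to sketch directly; the register names and the unbounded physical register supply below are simplifying assumptions for illustration.

    ```python
    # Register-renaming sketch: every architectural write gets a fresh physical
    # register, eliminating write-after-write and write-after-read hazards so
    # more instructions can execute out of order. Register names are invented.

    def rename(ops):
        mapping, renamed = {}, []
        fresh = iter(f"p{i}" for i in range(1000))
        for dest, srcs in ops:
            new_srcs = tuple(mapping.get(s, s) for s in srcs)  # read current map
            mapping[dest] = next(fresh)                        # fresh destination
            renamed.append((mapping[dest], new_srcs))
        return renamed

    # The two writes to r1 get distinct physical registers; the later reader
    # of r1 is wired to the most recent one.
    out = rename([("r1", ("r0",)), ("r1", ("r2",)), ("r3", ("r1",))])
    ```

    Real hardware draws from a finite physical register file and frees registers at retirement; the unbounded pool here keeps the mapping logic visible.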


Aims & Scope

The most highly cited general-interest journal in electrical engineering and computer science, the Proceedings is the best way to stay informed on an exemplary range of topics.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
H. Joel Trussell
North Carolina State University