Microarchitecture, 1999. MICRO-32. Proceedings. 32nd Annual International Symposium on

Date 16-18 Nov. 1999

  • MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture

    Publication Year: 1999
    Freely Available from IEEE
  • Table of contents

    Publication Year: 1999, Page(s):v - vii
    Freely Available from IEEE
  • Index of authors

    Publication Year: 1999, Page(s): 299
    Freely Available from IEEE
  • DIVA: a reliable substrate for deep submicron microarchitecture design

    Publication Year: 1999, Page(s):196 - 207
    Cited by:  Papers (305)  |  Patents (11)

    Building a high-performance microprocessor presents many reliability challenges. Designers must verify the correctness of large complex systems and construct implementations that work reliably in varied (and occasionally adverse) operating conditions. To further complicate this task, deep submicron fabrication technologies present new reliability challenges in the form of degraded signal quality a...
  • Core technologies in hardware and software

    Publication Year: 1999

    The computer industry continues to be transformed by much of the technology that it creates for the markets it supplies. Much of this transformation has been fueled by relentless performance improvements in high-performance microprocessors, interconnection, and communications technologies and computing platforms based on them. Legacy systems of only a decade ago have largely been transformed into ...
  • Instruction fetch mechanisms for multipath execution processors

    Publication Year: 1999, Page(s):38 - 47
    Cited by:  Papers (4)  |  Patents (9)

    Branch mispredictions can have a major performance impact on high-performance processors. Multipath execution has recently been introduced to help limit the misprediction penalties incurred by branches that are difficult to predict. This paper presents efficient instruction fetch architecture designs for these multipath processor execution cores. We evaluate a number of design trade-offs for the f...
  • Delaying physical register allocation through virtual-physical registers

    Publication Year: 1999, Page(s):186 - 192
    Cited by:  Papers (24)  |  Patents (2)

    Register file access time represents one of the critical delays of current microprocessors, and it is expected to become more critical as future processors increase the instruction window size and the issue width. This paper presents a novel physical register management scheme that allows for a late allocation (at the end of execution) of registers. We show that it can provide significant savings ...
  • Optimizations and oracle parallelism with dynamic translation

    Publication Year: 1999, Page(s):284 - 295
    Cited by:  Papers (3)  |  Patents (13)

    We describe several optimizations which can be employed in a dynamic binary translation (DBT) system, where low compilation/translation overhead is essential. These optimizations achieve a high degree of ILP, sometimes even surpassing a static compiler employing more sophisticated, and more time-consuming algorithms. We present results in which we employ these optimizations in a dynamic binary tra...
  • Value prediction for speculative multithreaded architectures

    Publication Year: 1999, Page(s):230 - 236
    Cited by:  Papers (23)  |  Patents (5)

    The speculative multithreading paradigm (speculative thread-level parallelism) is based on the concurrent execution of control-speculative threads. The efficiency of microarchitectures that adopt this paradigm strongly depends on the performance of the control and data speculation techniques. While control speculation is used to predict the most effective points where a thread can be spawned, data...
  • Improving branch predictors by correlating on data values

    Publication Year: 1999, Page(s):28 - 37
    Cited by:  Papers (9)  |  Patents (10)

    Branch predictors typically use combinations of branch PC bits and branch histories to make predictions. Recent improvements in branch predictors have come from reducing the effect of interference, i.e. multiple branches mapping to the same table entries. In contrast, the branch difference predictor (BDP) uses data values as additional information to improve the accuracy of conditional branch pred...
  • Read-after-read memory dependence prediction

    Publication Year: 1999, Page(s):177 - 185
    Cited by:  Papers (8)  |  Patents (1)

    We identify that typical programs exhibit highly regular read-after-read (RAR) memory dependence streams. We exploit this regularity by introducing read-after-read (RAR) memory dependence prediction. We also present two RAR memory dependence prediction-based memory latency reduction techniques. In the first technique, a load can obtain a value by simply naming a preceding load with which a R...
  • Balance scheduling: weighting branch tradeoffs in superblocks

    Publication Year: 1999, Page(s):272 - 283
    Cited by:  Papers (8)

    Since there is generally insufficient instruction level parallelism within a single basic block, higher performance is achieved by speculatively scheduling operations in superblocks. This is difficult in general because each branch competes for the processor's limited resources. Previous work manages the performance tradeoffs that exist between branches only indirectly. We show here that dependenc...
  • The use of multithreading for exception handling

    Publication Year: 1999, Page(s):219 - 229
    Cited by:  Papers (13)  |  Patents (30)

    Common hardware exceptions, when implemented by trapping, unnecessarily serialize program execution in dynamically scheduled superscalar processors. To avoid the consequences of trapping the main program thread, multithreaded CPUs can exploit control and data independence by executing the exception handler in a separate hardware context. The main thread doesn't squash instructions after the except...
  • Dynamic 3D graphics workload characterization and the architectural implications

    Publication Year: 1999, Page(s):62 - 71
    Cited by:  Papers (12)  |  Patents (1)

    Although PC-class 3D graphics hardware has made significant strides in the last several years, the underlying architectural design principles are still generally considered as a black art. The quantitative approach prevalent in mainstream computer architecture design is rarely applied, at least as far as publicly available research literature is concerned. One main reason for this deficiency is th...
  • Fetch directed instruction prefetching

    Publication Year: 1999, Page(s):16 - 27
    Cited by:  Papers (18)  |  Patents (1)

    Instruction supply is a crucial component of processor performance. Instruction prefetching has been proposed as a mechanism to help reduce instruction cache misses, which in turn can help increase instruction supply to the processor. In this paper we examine a new instruction prefetch architecture called Fetch Directed Prefetching, and compare it to the performance of next-line prefetching and st...
  • Dynamic memory disambiguation in the presence of out-of-order store issuing

    Publication Year: 1999, Page(s):170 - 176
    Cited by:  Papers (9)  |  Patents (2)

    With the help of the memory dependence predictor the instruction scheduler can speculatively issue load instructions at the earliest possible time without causing significant amounts of memory order violations. For maximum performance, the scheduler must also allow full out-of-order issuing of store instructions since any superfluous ordering of stores results in false memory dependencies which ad...
  • Wavefront scheduling: path based data representation and scheduling of subgraphs

    Publication Year: 1999, Page(s):262 - 271
    Cited by:  Papers (12)  |  Patents (4)

    The IA-64 architecture is rich with features that enable aggressive exploitation of instruction-level parallelism. Features such as speculation, predication, multiway branches and others provide compilers with new opportunities for the extraction of parallelism in programs. Code scheduling is a central component in any compiler for the IA-64 architecture. This paper describes the implementation of...
  • Code transformations to improve memory parallelism

    Publication Year: 1999, Page(s):147 - 155
    Cited by:  Papers (17)

    Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in removing memory stall time than CPU time, making the memory system a greater bottleneck in ILP-based systems than previous-generation systems. These deficiencies arise largely because applications present limited opportuniti...
  • Automatic and efficient evaluation of memory hierarchies for embedded systems

    Publication Year: 1999, Page(s):114 - 125
    Cited by:  Papers (9)  |  Patents (9)

    Automation is the key to the design of future embedded systems as it permits application-specific customization while keeping design costs low. A key problem faced by automatic design systems is evaluating the performance of the vast number of alternative designs in a timely manner. For this paper, we focus on an embedded system consisting of the following components: a VLIW processor, instruction...
  • Exploiting ILP in page-based intelligent memory

    Publication Year: 1999, Page(s):208 - 218
    Cited by:  Papers (4)

    This study compares the speed, area, and power of different implementations of Active Pages, an intelligent memory system which helps bridge the growing gap between processor and memory performance by associating simple functions with each page of data. Previous investigations have shown up to 1000X speedups using a block of reconfigurable logic to implement these functions next to each subarray o...
  • A superscalar 3D graphics engine

    Publication Year: 1999, Page(s):50 - 61
    Cited by:  Papers (3)  |  Patents (13)

    3D graphics performance is increasing faster than any other computing application. Almost all PC systems now include 3D graphics accelerators for games, CAD, or visualization applications. Many of the microarchitectural techniques that have been used to enhance the performance of microprocessors can be applied to graphics systems as well. We present an architecture for an out-of-order, superscalar...
  • Predicting the usefulness of a block result: a micro-architectural technique for high-performance low-power processors

    Publication Year: 1999, Page(s):238 - 247
    Cited by:  Papers (3)

    This paper proposes a micro-architectural technique in which a prediction is made for some power-hungry units of a processor. The prediction consists of whether the result of a particular unit or block of logic will be useful in order to execute the current instruction. If it is predicted useless, then that block is disabled. It would be ideal if the predictions were totally accurate, thus not dec...
  • Control independence in trace processors

    Publication Year: 1999, Page(s):4 - 15
    Cited by:  Papers (12)  |  Patents (29)

    Branch mispredictions are a major obstacle to exploiting instruction-level parallelism, at least in part because all instructions after a mispredicted branch are squashed. However, instructions that are control independent of the branch must be fetched regardless of the branch outcome, and do not necessarily have to be squashed and re-executed. Control independence exists when the two paths follow...
  • Exploiting a new level of DLP in multimedia applications

    Publication Year: 1999, Page(s):72 - 79
    Cited by:  Papers (17)  |  Patents (2)

    This paper proposes and evaluates MOM: a novel ISA paradigm targeted at multimedia applications. By fusing conventional vector ISA approaches together with more recent SIMD-like (Single Instruction Multiple Data) ISAs (such as MMX), we have developed a new matrix oriented ISA which efficiently deals with the small matrix structures typically found in multimedia applications. MOM exploits a level o...
  • Compiler-directed dynamic computation reuse: rationale and initial results

    Publication Year: 1999, Page(s):158 - 169
    Cited by:  Papers (13)  |  Patents (6)

    Recent studies on value locality reveal that many instructions are frequently executed with a small variety of inputs. This paper proposes an approach that integrates architecture and compiler techniques to exploit value locality for large regions of code. The approach strives to eliminate redundant processor execution created by both instruction-level input repetition and recurrence of input data...