
IEEE Micro

Issue 1 • Jan.-Feb. 2006

  • [Front cover]

    Page(s): c1
    PDF (618 KB)
    Freely Available from IEEE
  • Table of contents

    Page(s): 2 - 3
    PDF (196 KB)
    Freely Available from IEEE
  • Masthead

    Page(s): 4
    PDF (56 KB)
    Freely Available from IEEE
  • Measuring the impact of microarchitectural ideas

    Page(s): 5 - 6
    PDF (200 KB)
    Freely Available from IEEE
  • Format wars all over again

    Page(s): 7
    PDF (112 KB) | HTML

    Sometime soon, Sony intends to embed Blu-ray, a new optical disc format, in the PlayStation 3 and in its VCRs. Sony has gone to great lengths to assemble a coalition of firms to support Blu-ray. Opposing it is the high-definition DVD (HD DVD) format, sponsored and supported by many firms, including Toshiba, NEC, Sanyo, Microsoft, and Intel.

  • Guest Editor's Introduction: Micro's Top Picks from Microarchitecture Conferences

    Page(s): 8 - 9
    PDF (192 KB)
    Freely Available from IEEE
  • Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance

    Page(s): 10 - 20
    PDF (163 KB) | HTML

    Today's high-performance processors face main-memory latencies on the order of hundreds of processor clock cycles. As a result, even the most aggressive processors spend a significant portion of their execution time stalled, waiting for main-memory accesses to return data to the execution core. Runahead execution is a promising way to tolerate long main-memory latencies because it has modest hardware cost and doesn't significantly increase processor complexity. Runahead execution improves a processor's performance by speculatively pre-executing the application program while the processor services a long-latency level-two (L2) data cache miss, instead of stalling for the duration of the miss. This pre-execution means the processor executes more instructions than it otherwise would, so for runahead execution to be implemented efficiently in current and future high-performance processors, which will be energy constrained, designers must develop techniques to reduce these extra instructions. Our solution to this problem includes both hardware and software mechanisms that are simple, implementable, and effective.

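
As a rough illustration of the idea (not the authors' implementation), the following toy Python model shows how running ahead past a blocking miss turns later misses into prefetches, so the core stalls once instead of once per miss. The trace format and the one-stall-per-miss cost model are invented for the example:

```python
def run(trace, cache, prefetch):
    """Toy model of runahead execution: on a cache miss, pre-execute the
    rest of the trace to discover further misses and prefetch them, then
    resume once the blocking miss returns. Trace entries are ("load", addr)
    or ("op", None); all names and costs here are invented for illustration."""
    stalls = 0
    for i, (kind, addr) in enumerate(trace):
        if kind == "load" and addr not in cache:
            # Enter runahead: scan ahead and turn future misses into prefetches.
            for kind2, addr2 in trace[i + 1:]:
                if kind2 == "load" and addr2 not in cache:
                    prefetch.add(addr2)
            stalls += 1                     # stall once for the blocking miss
            cache |= {addr} | prefetch      # miss and prefetches all complete
            prefetch.clear()
    return stalls

# Two misses, but only one stall: the second load was prefetched in runahead.
print(run([("load", "A"), ("op", None), ("load", "B")], set(), set()))  # → 1
```

Without runahead, the same trace would stall once per miss; here the second miss is overlapped with the first.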
  • Adaptive History-Based Memory Schedulers for Modern Processors

    Page(s): 22 - 29
    PDF (178 KB) | HTML

    Careful memory scheduling can increase memory bandwidth and overall system performance. We present a new memory scheduler that makes decisions based on the history of recently scheduled operations, providing two advantages: it can better reason about the delays associated with complex DRAM structures, and it can adapt to different observed workloads.

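
A minimal sketch of the history-based idea, with invented penalty numbers rather than real DRAM timings: the scheduler keeps the last few scheduled commands and, at each step, picks the pending command whose bank and read/write type conflict least with that history:

```python
from collections import deque

def schedule(pending, history_len=2):
    """Toy history-based memory scheduler: prefer the pending DRAM command
    that conflicts least with recently scheduled ones. Same-bank back-to-back
    accesses and read<->write turnarounds both add delay; the penalty values
    (4 and 2) are illustrative, not real DRAM timings."""
    history = deque(maxlen=history_len)
    order = []
    pending = list(pending)
    while pending:
        def cost(op):
            bank, kind = op
            c = 0
            for hbank, hkind in history:
                if hbank == bank:
                    c += 4   # bank busy: precharge/activate delay
                if hkind != kind:
                    c += 2   # read/write bus turnaround
            return c
        best = min(pending, key=cost)
        pending.remove(best)
        history.append(best)
        order.append(best)
    return order

ops = [(0, "R"), (0, "R"), (1, "R"), (0, "W"), (1, "W")]
print(schedule(ops))  # interleaves banks: the second bank-0 read is deferred
```

Note how the scheduler reorders the second bank-0 read behind the bank-1 read to hide the bank-conflict delay.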
  • Scalable Load and Store Processing in Latency-Tolerant Processors

    Page(s): 30 - 39
    PDF (149 KB) | HTML

    Memory-latency-tolerant architectures achieve high performance by supporting thousands of in-flight instructions without scaling cycle-critical processor resources. We present new load-store processing algorithms for latency-tolerant architectures. We augment the primary load and store queues with secondary buffers. The secondary load buffer is a set-associative structure, similar to a cache. The secondary store queue, the store redo log (SRL), is a first-in, first-out (FIFO) structure that records the program order of all stores completed in parallel with a miss; it has no CAM or search functions. A cache, rather than the secondary store queue, provides temporary forwarding. The SRL enforces memory ordering by ensuring that memory updates occur in program order once the miss data arrives from memory. The new algorithms remove fundamental sources of power and area inefficiency in load and store processing by eliminating the CAM and search functions in the secondary load and store buffers, and they still achieve competitive performance compared to hierarchical designs.

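
The SRL's key property, appending stores in program order and draining them in order with no associative search, can be sketched in a few lines of Python (a toy software analogue, not the hardware design):

```python
from collections import deque

class StoreRedoLog:
    """Toy sketch of an SRL: a FIFO that records stores completed while a
    long-latency miss is outstanding, then replays them to memory in program
    order once the miss data returns. No CAM or search is needed because
    entries are only appended at the tail and drained from the head."""

    def __init__(self):
        self.fifo = deque()  # (address, value) pairs in program order

    def record(self, addr, value):
        # Called for each store that completes under the miss.
        self.fifo.append((addr, value))

    def drain(self, memory):
        # Called when the miss returns: apply updates in program order.
        while self.fifo:
            addr, value = self.fifo.popleft()
            memory[addr] = value

memory = {0x10: 0}
srl = StoreRedoLog()
srl.record(0x10, 1)   # stores completed speculatively under the miss
srl.record(0x20, 2)
srl.record(0x10, 3)   # a later store to the same address stays ordered
srl.drain(memory)
print(memory[0x10], memory[0x20])  # → 3 2
```

Because the FIFO preserves program order, the second store to 0x10 correctly overwrites the first with no address search.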
  • Tolerating Cache-Miss Latency with Multipass Pipelines

    Page(s): 40 - 47
    PDF (114 KB) | HTML

    Microprocessors exploit instruction-level parallelism and tolerate memory-access latencies to achieve high performance. Out-of-order microprocessors do this by dynamically scheduling instruction execution, but they require power-hungry hardware structures. This article describes multipass pipelining, a microarchitectural model that provides an alternative to out-of-order execution for tolerating memory-access latencies. We call our approach "flea-flicker" multipass pipelining because it uses two (or more) passes of pre-execution or execution to achieve its performance benefits. Multipass pipelining assumes compile-time scheduling for lower-power, lower-complexity exploitation of instruction-level parallelism.

  • Wish Branches: Enabling Adaptive and Aggressive Predicated Execution

    Page(s): 48 - 58
    PDF (155 KB) | HTML

    We propose a mechanism in which the compiler generates code that can execute either as predicated code or as nonpredicated code. The compiler-generated code is the same as predicated code, except that the conditional branches are not removed; they are left intact in the program code. These conditional branches are called wish branches. The goal of wish branches is to use predicated execution for hard-to-predict dynamic branches and branch prediction for easy-to-predict dynamic branches, thereby obtaining the best of both worlds. Wish loops, one class of wish branches, use predication to reduce the misprediction penalty for hard-to-predict backward (loop) branches.

  • Unbounded Transactional Memory

    Page(s): 59 - 69
    PDF (167 KB) | HTML

    This article advances the following thesis: transactional memory should be virtualized to support transactions of arbitrary footprint and duration. Such support should be provided in hardware and made visible to software through the machine's instruction set architecture. We call a transactional memory system unbounded if it can handle transactions of arbitrary duration with footprints nearly as big as the system's virtual memory. The primary goal of unbounded transactional memory is to make concurrent programming easier without incurring much implementation overhead. Unbounded transactional-memory architectures can achieve high performance in the common case of small transactions without sacrificing correctness for large transactions.

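
To make the programming model concrete, here is a toy software analogue (not the proposed hardware scheme): a transaction buffers its writes, remembers the values it read, and commits atomically only if nothing it read has changed. All names here are invented for the sketch:

```python
class Txn:
    """Toy software analogue of the transactional model the article argues
    hardware should support: buffered writes, a tracked read set, and an
    all-or-nothing commit that aborts on conflicting updates."""

    def __init__(self, mem):
        self.mem, self.reads, self.writes = mem, {}, {}

    def read(self, k):
        if k in self.writes:              # read your own buffered write
            return self.writes[k]
        self.reads.setdefault(k, self.mem.get(k))
        return self.reads[k]

    def write(self, k, v):
        self.writes[k] = v                # buffered until commit

    def commit(self):
        # Validate: every value read must be unchanged at commit time.
        if any(self.mem.get(k) != v for k, v in self.reads.items()):
            return False                  # conflict: abort, discard writes
        self.mem.update(self.writes)      # publish all writes atomically
        return True

mem = {"x": 1}
t = Txn(mem)
t.write("x", t.read("x") + 1)
assert t.commit() and mem["x"] == 2       # no conflict: commits

t2 = Txn(mem)
t2.read("x")
mem["x"] = 99                             # conflicting update by another thread
assert not t2.commit()                    # conflict detected: aborts
```

The hardware proposal makes this behavior unbounded in footprint and duration; the sketch only shows the commit/abort semantics.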
  • Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays

    Page(s): 70 - 79
    PDF (192 KB) | HTML

    Cache-coherent shared-memory multiprocessors have wide-ranging applications, from commercial transaction processing and database services to large-scale scientific computing. Coarse-grain coherence tracking (CGCT) is a new technique that extends a conventional coherence mechanism and optimizes coherence enforcement. It monitors the coherence status of large regions of memory and uses that information to avoid unnecessary broadcasts and to filter unnecessary cache tag lookups, improving system performance and reducing power consumption. This article presents two CGCT implementations, RegionScout and Region Coherence Arrays, and provides simulation results for a broadcast-based multiprocessor system running commercial, scientific, and multiprogrammed workloads.

  • Energy-Efficient Thread-Level Speculation

    Page(s): 80 - 91
    PDF (270 KB) | HTML

    Chip multiprocessors with thread-level speculation have become the subject of intense research. This article refutes the claim that such a design is necessarily too energy inefficient, and it proposes out-of-order task spawning to exploit more sources of speculative task-level parallelism.

  • Opportunistic Transient-Fault Detection

    Page(s): 92 - 99
    PDF (114 KB) | HTML

    CMOS scaling continues to yield faster transistors and lower supply voltages, improving microprocessor performance and reducing per-transistor power. The downside of scaling is increased susceptibility to soft errors caused by cosmic-particle strikes and by radiation from packaging materials, which degrades the reliability of future commodity microprocessors. The authors target better coverage with minimal performance degradation by opportunistically using redundancy.

  • BugNet: Recording Application-Level Execution for Deterministic Replay Debugging

    Page(s): 100 - 109
    PDF (161 KB) | HTML

    With software's increasing complexity, providing efficient hardware support for software debugging is critical. Hardware support is necessary to observe and capture, with little or no overhead, the exact execution of a program. Providing this ability will allow developers to deterministically replay and debug an application to pinpoint the root cause of a bug.

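
The recording idea can be illustrated with a toy analogue: log the value of every load whose result could differ between runs (here, a random "input"), then replay from the log to reproduce the execution deterministically without the rest of the system. All names are invented for the sketch:

```python
import random

def compute(load):
    # Stand-in for the application: its result depends on two loads
    # whose values could differ from run to run.
    return load() * 10 + load()

def record(read):
    """Run once for real, logging every loaded value (BugNet-style)."""
    log = []
    def load():
        v = read()
        log.append(v)
        return v
    return compute(load), log

def replay(log):
    """Re-run deterministically, feeding loads from the recorded log."""
    it = iter(log)
    return compute(lambda: next(it))

rng = random.Random()
result, log = record(lambda: rng.randrange(10))
assert replay(log) == result   # the replayed run reproduces the original
```

Because every nondeterministic input was captured at load granularity, replay needs only the log, not the original environment.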
  • Architectures for Bit-Split String Scanning in Intrusion Detection

    Page(s): 110 - 117
    PDF (115 KB) | HTML

    String matching is a critical element of modern intrusion detection systems because it lets a system make decisions based not just on packet headers but on the actual content flowing through the network. Through careful codesign and optimization of an architecture with a new string-matching algorithm, the authors show it is possible to build a system that is almost 12 times more efficient than the best currently known approaches.

  • Dynamic-Compiler-Driven Control for Microprocessor Energy and Performance

    Page(s): 119 - 129
    PDF (342 KB) | HTML

    A general dynamic-compilation environment offers power- and performance-control opportunities for microprocessors. The authors propose a dynamic-compiler-driven runtime voltage and frequency optimizer. A prototype of their design, implemented and deployed in a real system, achieves energy savings of up to 70 percent.

  • Temperature-Aware On-Chip Networks

    Page(s): 130 - 139
    PDF (436 KB) | HTML

    On-chip networks are becoming increasingly popular as a way to connect high-performance single-chip computer systems, but thermal issues greatly limit network design. Sirius, a thermal modeling and simulation framework, combines with ThermalHerd, a distributed runtime scheme for thermal management, to offer a path to thermally efficient on-chip network design.

  • The future will soon be here

    Page(s): 141 - 142
    PDF (79 KB)
    Freely Available from IEEE
  • How to write a patent

    Page(s): 144
    PDF (78 KB) | HTML

    In my last column, I showed how to write patent claims in a very broad manner. This time, I'll talk about writing the rest of the patent.


Aims & Scope

High-quality technical articles from designers, systems integrators, and users discussing the design, performance, or application of microcomputer and microprocessor systems.


Meet Our Editors

Editor-in-Chief
Erik R. Altman
IBM T.J. Watson Research Center