Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29

2-4 Dec. 1996

  • IEEE/ACM International Symposium on Microarchitecture Micro-29

    Publication Year: 1996, Page(s):iii - vii
  • Heuristics for register-constrained software pipelining

    Publication Year: 1996, Page(s):250 - 261
    Cited by:  Papers (11)

    Software Pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. There has been a significant effort to produce throughput-optimal schedules under resource constraints, and more recently to produce throughput-optimal schedules with minimum register requirements. Unfortunately even a throughput-optimal schedule w...
  • Proceedings Third IEEE Workshop on Applications of Computer Vision. WACV'96

    Publication Year: 1996
  • Author index

    Publication Year: 1996, Page(s): 359
  • Java bytecode to native code translation: the Caffeine prototype and preliminary results

    Publication Year: 1996, Page(s):90 - 97
    Cited by:  Papers (17)  |  Patents (65)

    The Java bytecode language is emerging as a software distribution standard. With major vendors committed to porting the Java run-time environment to their platforms, programs in Java bytecode are expected to run without modification on multiple platforms. These first generation runtime environments rely on an interpreter to bridge the gap between the bytecode instructions and the native hardware. ...
  • Hot cold optimization of large Windows/NT applications

    Publication Year: 1996, Page(s):80 - 89
    Cited by:  Papers (11)  |  Patents (29)

    A dynamic instruction trace often contains many unnecessary instructions that are required only by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique that realizes this performance opportunity. HCO uses profile information to partition each routine into frequently executed (hot) and infrequently executed (cold) parts. Unnecessary operations in the hot portion are rem...
  • Speculative hedge: regulating compile-time speculation against profile variations

    Publication Year: 1996, Page(s):70 - 79
    Cited by:  Papers (11)  |  Patents (3)

    Path-oriented scheduling methods, such as trace scheduling and hyperblock scheduling, use speculation to extract instruction-level parallelism from control-intensive programs. These methods predict important execution paths in the current scheduling scope using execution profiling or frequency estimation. Aggressive speculation is then applied to the important execution paths, possibly at the cost...
  • Profile-driven instruction level parallel scheduling with application to super blocks

    Publication Year: 1996, Page(s):58 - 67
    Cited by:  Papers (20)  |  Patents (7)

    Code scheduling to exploit instruction level parallelism (ILP) is a critical problem in compiler optimization research in light of the increased use of long-instruction-word machines. Unfortunately optimum scheduling is computationally intractable, and one must resort to carefully crafted heuristics in practice. If the scope of application of a scheduling heuristic is limited to basic blocks, cons...
  • Efficient path profiling

    Publication Year: 1996, Page(s):46 - 57
    Cited by:  Papers (132)  |  Patents (67)

    A path profile determines how many times each acyclic path in a routine executes. This type of profiling subsumes the more common basic block and edge profiling, which only approximate path frequencies. Path profiles have many potential uses in program performance tuning, profile-directed compilation, and software test coverage. This paper describes a new algorithm for path profiling. This simple,...
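To make the definition in the "Efficient path profiling" abstract concrete, the following is a minimal Python sketch of counting how often each acyclic path runs. It assigns every edge an increment so that the sum along any entry-to-exit path is a unique integer ID. The abstract is truncated in this listing, so the numbering scheme, the toy control-flow graph, and the names `number_edges` and `path_id` are illustrative assumptions, not a description of the paper's algorithm.

```python
from collections import Counter

# Hypothetical control-flow DAG (two back-to-back diamonds): node -> successors.
CFG = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": [],
}

def number_edges(cfg, reverse_topo_order):
    """Give each edge an increment so every entry-to-exit path sums to a unique ID.

    `reverse_topo_order` must list the nodes with every successor before its
    predecessors (exit node first).
    """
    num_paths = {}  # number of distinct paths from a node to the exit
    edge_val = {}   # increment added when an edge is taken
    for v in reverse_topo_order:
        succs = cfg[v]
        if not succs:
            num_paths[v] = 1  # the exit node ends exactly one path
            continue
        total = 0
        for w in succs:
            edge_val[(v, w)] = total  # skip past IDs used by earlier successors
            total += num_paths[w]
        num_paths[v] = total
    return num_paths, edge_val

def path_id(path, edge_val):
    """Sum the edge increments along one executed path."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))

num_paths, edge_val = number_edges(CFG, ["G", "F", "E", "D", "C", "B", "A"])

# Count three simulated executions of the routine.
profile = Counter()
for run in (list("ABDEG"), list("ACDFG"), list("ACDFG")):
    profile[path_id(run, edge_val)] += 1
```

With this numbering, a single counter array indexed by path ID is enough to record the profile; edge or basic-block counts alone could not distinguish, say, `ABDFG` from `ACDEG`.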
  • Tango: a hardware-based data prefetching technique for superscalar processors

    Publication Year: 1996, Page(s):214 - 225
    Cited by:  Papers (13)  |  Patents (6)

    We present a new hardware-based data prefetching mechanism for enhancing instruction level parallelism and improving the performance of superscalar processors. The emphasis in our scheme is on the effective utilization of slack time and hardware resources not used for the main computation. The scheme suggests a new hardware construct, the program progress graph (PPG), as a simple extension to the ...
  • Accurate and practical profile-driven compilation using the profile buffer

    Publication Year: 1996, Page(s):36 - 45
    Cited by:  Papers (15)  |  Patents (74)

    Profiling is a technique of gathering program statistics in order to aid program optimization. In particular, it is an essential component of compiler optimization for the extraction of instruction-level parallelism. Code instrumentation has been the most popular method of profiling. However real-time, interactive, and transaction processing applications suffer from the high execution-time overhea...
  • Instruction fetch mechanisms for VLIW architectures with compressed encodings

    Publication Year: 1996, Page(s):201 - 211
    Cited by:  Papers (22)  |  Patents (66)

    VLIW architectures use very wide instruction words in conjunction with high bandwidth to the instruction cache to achieve multiple instruction issue. This report uses the TINKER experimental testbed to examine instruction fetch and instruction cache mechanisms for VLIWs. A compressed instruction encoding for VLIWs is defined and a classification scheme for i-fetch hardware for such an encoding is ...
  • Trace cache: a low latency approach to high bandwidth instruction fetching

    Publication Year: 1996, Page(s):24 - 34
    Cited by:  Papers (168)  |  Patents (123)

    As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache with a trace cache. T...
  • Global predicate analysis and its application to register allocation

    Publication Year: 1996, Page(s):114 - 125
    Cited by:  Papers (17)  |  Patents (6)

    To fully utilize the wide machine resources in modern high-performance microprocessors it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from different basic blocks to be converted to straight-line code guarded by boolean predicates. However predicated ...
  • Compiler synthesized dynamic branch prediction

    Publication Year: 1996, Page(s):153 - 164
    Cited by:  Papers (17)  |  Patents (7)

    Branch prediction is the predominant approach for minimizing the pipeline breaks caused by branch instructions. Traditionally, branch prediction is accomplished in one of two ways, static prediction at compile-time via compiler analysis or dynamic prediction at run-time via special hardware structures. In this paper, we propose a novel technique that aims to combine the strengths of the two approa...
  • Instruction scheduling and executable editing

    Publication Year: 1996, Page(s):288 - 297
    Cited by:  Papers (26)  |  Patents (2)

    Modern microprocessors offer more instruction-level parallelism than most programs and compilers can currently exploit. The resulting disparity between a machine's peak and actual performance, while frustrating for computer architects and chip manufacturers, opens the exciting possibility of low-cost instrumentation for measurement, simulation, or emulation. Instrumentation code that executes in p...
  • Increasing the instruction fetch rate via block-structured instruction set architectures

    Publication Year: 1996, Page(s):191 - 200
    Cited by:  Papers (21)

    To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, called ...
  • Custom-fit processors: letting applications define architectures

    Publication Year: 1996, Page(s):324 - 335
    Cited by:  Papers (31)  |  Patents (9)

    In this paper we report on a system which automatically designs realistic VLIW architectures highly optimized for one given application (the input for this system), while running all other code correctly. The system uses a product-quality compiler that generates very aggressive VLIW code. We retarget the compiler until we have found a VLIW architecture idealized for the application on the basis of...
  • Integrating a misprediction recovery cache (MRC) into a superscalar pipeline

    Publication Year: 1996, Page(s):14 - 23
    Cited by:  Papers (9)  |  Patents (2)

    In modern processors, deep pipelines couple with superscalar techniques to allow each pipe stage to process multiple instructions. When such a pipe must be flushed and refilled, as when predicted program flow beyond a branch is subsequently recognized as wrong, the temporary performance loss is significant. While modern branch target buffer (BTB) technology makes this flush/refill penalty fairly ra...
  • Analysis techniques for predicated code

    Publication Year: 1996, Page(s):100 - 113
    Cited by:  Papers (18)  |  Patents (10)

    Predicated execution offers new approaches to exploiting instruction-level parallelism (ILP), but it also presents new challenges for compiler analysis and optimization. In predicated code, each operation is guarded by a boolean operand whose run-time value determines whether the operation is executed or nullified. While research has shown the utility of predication in enhancing ILP, there has bee...
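The guarded-operation model described in the "Analysis techniques for predicated code" abstract can be illustrated with a small Python sketch. It runs a straight-line sequence in which every operation names a boolean guard and is nullified when that guard is false. The tuple encoding, the register and predicate dictionaries, and the names `run_predicated` and `abs_diff` are hypothetical choices for illustration, not taken from the paper.

```python
def run_predicated(ops, regs, preds):
    """Execute straight-line predicated code.

    Each op is (guard, dest, fn, srcs). The op writes `dest` only when its
    guard predicate is true (or when it has no guard); otherwise it is
    nullified and leaves the register state untouched.
    """
    for guard, dest, fn, srcs in ops:
        if guard is None or preds[guard]:
            regs[dest] = fn(*(regs[s] for s in srcs))
    return regs

def abs_diff(a, b):
    """If-converted form of: if a >= b: r = a - b  else: r = b - a."""
    regs = {"a": a, "b": b, "r": 0}
    # Complementary predicates replace the branch; exactly one is true.
    preds = {"p": a >= b, "q": not (a >= b)}
    ops = [
        ("p", "r", lambda x, y: x - y, ("a", "b")),  # executes when a >= b
        ("q", "r", lambda x, y: y - x, ("a", "b")),  # executes when a < b
    ]
    return run_predicated(ops, regs, preds)["r"]
```

Both operations write `r`, yet only one of them can take effect on any run. A compiler analysis that ignores the guards would treat the second write as always overwriting the first, which is the kind of imprecision the abstract alludes to.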
  • Exceeding the dataflow limit via value prediction

    Publication Year: 1996, Page(s):226 - 237
    Cited by:  Papers (157)  |  Patents (52)

    For decades, the serialization constraints imposed by true data dependences have been regarded as an absolute limit (the dataflow limit) on the parallel execution of serial programs. This paper proposes a new technique, value prediction, for exceeding that limit that allows data dependent instructions to issue and execute in parallel without violating program semantics. This technique is built on the ...
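As a rough illustration of the idea named in the "Exceeding the dataflow limit via value prediction" abstract, the Python sketch below guesses the result of an instruction before it executes, so that a dependent instruction would not have to wait for it. The abstract is truncated in this listing and does not say how predictions are made; the last-value table, the trace format, and the names `LastValuePredictor` and `hit_rate` are assumptions chosen for simplicity, not the paper's design.

```python
class LastValuePredictor:
    """Predict that an instruction produces the same value as last time."""

    def __init__(self):
        self.table = {}  # instruction address -> most recent result

    def predict(self, pc):
        # None means "no prediction available yet" for this instruction.
        return self.table.get(pc)

    def update(self, pc, actual):
        self.table[pc] = actual

def hit_rate(trace):
    """Fraction of (pc, value) results in `trace` that were predicted correctly."""
    predictor = LastValuePredictor()
    hits = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            hits += 1  # dependents could have started early with this guess
        predictor.update(pc, value)  # always train on the real result
    return hits / len(trace)

# Hypothetical trace: one instruction that keeps producing 5, one that varies.
TRACE = [(0x10, 5), (0x10, 5), (0x10, 5), (0x14, 1), (0x14, 2)]
```

In this sketch every prediction is compared against the actual result. A real design would need the same check, plus a way to re-execute dependents after a wrong guess, in order to preserve program semantics as the abstract requires.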
  • Modulo scheduling of loops in control-intensive non-numeric programs

    Publication Year: 1996, Page(s):126 - 137
    Cited by:  Papers (16)  |  Patents (1)

    Much of the previous work on modulo scheduling has targeted numeric programs, in which, often, the majority of the loops are well-behaved loop-counter-based loops without early exits. In control-intensive non-numeric programs, the loops frequently have characteristics that make it more difficult to effectively apply modulo scheduling. These characteristics include multiple control flow paths, loop...
  • Software pipelining loops with conditional branches

    Publication Year: 1996, Page(s):262 - 273
    Cited by:  Papers (16)  |  Patents (2)

    Software pipelining is an aggressive scheduling technique that generates efficient code for loops and is particularly effective for VLIW architectures. Few software pipelining algorithms, however, are able to efficiently schedule loops that contain conditional branches. We have developed an algorithm we call All Paths Pipelining (APP) that addresses this shortcoming of software pipelining. APP is ...
  • Wrong-path instruction prefetching

    Publication Year: 1996, Page(s):165 - 175
    Cited by:  Papers (20)  |  Patents (17)

    Instruction cache misses can severely limit the performance of both superscalar processors and high speed sequential machines. Instruction prefetch algorithms attempt to reduce the performance degradation by bringing lines into the instruction cache before they are needed by the CPU fetch unit. There have been several algorithms proposed to do this, most notably next line prefetching and target pr...
  • Instruction scheduling for the HP PA-8000

    Publication Year: 1996, Page(s):298 - 307
    Cited by:  Papers (4)  |  Patents (2)

    The PA-8000 is capable of reordering independent operations at run time, a task normally performed only by the instruction scheduler in the compiler. This paper presents some of the unique issues faced by an instruction scheduler for the PA-8000. Several features of the micro-architecture are presented along with the heuristics used in the production compiler to model that feature. These features ...