2014 47th Annual IEEE/ACM International Symposium on Microarchitecture

13-17 Dec. 2014

Displaying Results 1 - 25 of 67
  • [Front cover]

    Publication Year: 2014, Page(s): C4
  • [Title page i]

    Publication Year: 2014, Page(s): i
  • [Title page iii]

    Publication Year: 2014, Page(s): iii
  • [Copyright notice]

    Publication Year: 2014, Page(s): iv
  • Table of contents

    Publication Year: 2014, Page(s):v - ix
  • Message from the General Chair

    Publication Year: 2014, Page(s): x
  • Message from the Program Co-Chairs

    Publication Year: 2014, Page(s):xi - xii
  • Organizing Committee

    Publication Year: 2014, Page(s): xiii
  • Program Committee

    Publication Year: 2014, Page(s):xiv - xv
  • External Reviewer Committee

    Publication Year: 2014, Page(s):xvi - xviii
  • External Reviewers

    Publication Year: 2014, Page(s): xix
  • Keynote abstracts

    Publication Year: 2014, Page(s):xx - xxii

    These keynote speeches cover the following: From IoT to services - efficiency matters; The end of Moore's law - again; Investigating the brain's computational paradigm.
  • CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache

    Publication Year: 2014, Page(s):1 - 12
    Cited by:  Papers (21)

    This paper analyzes the trade-offs in architecting stacked DRAM either as part of main memory or as a hardware-managed cache. Using stacked DRAM as part of main memory increases the effective capacity, but obtaining high performance from such a system requires Operating System (OS) support to migrate data at a page-granularity. Using stacked DRAM as a hardware cache has the advantages of being tra...
  • Transparent Hardware Management of Stacked DRAM as Part of Memory

    Publication Year: 2014, Page(s):13 - 24
    Cited by:  Papers (24)

    Recent technology advancements allow for the integration of large memory structures on-die or as a die-stacked DRAM. Such structures provide higher bandwidth and faster access time than off-chip memory. Prior work has investigated using the large integrated memory as a cache, or using it as part of a heterogeneous memory system under management of the OS. Using this memory as a cache would waste a...
  • Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

    Publication Year: 2014, Page(s):25 - 37
    Cited by:  Papers (31)

    Recent research advocates large die-stacked DRAM caches in many core servers to break the memory latency and bandwidth wall. To realize their full potential, die-stacked DRAM caches necessitate low lookup latencies, high hit rates and the efficient use of off-chip bandwidth. Today's stacked DRAM cache designs fall into two categories based on the granularity at which they manage data: block-based ...
  • Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth

    Publication Year: 2014, Page(s):38 - 50
    Cited by:  Papers (14)

    In this paper, we present Bi-Modal Cache - a flexible stacked DRAM cache organization which simultaneously achieves several objectives: (i) improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM, and (iv) reduction in off-chip bandwidth wastage. The Bi-Modal Cache addresses the miss rate versus off-chip bandwidth dilemma by organizin...
  • Citadel: Efficiently Protecting Stacked Memory from Large Granularity Failures

    Publication Year: 2014, Page(s):51 - 62
    Cited by:  Papers (6)

    Stacked memory modules are likely to be tightly integrated with the processor. It is vital that these memory modules operate reliably, as memory failure can require the replacement of the entire socket. To make matters worse, stacked memory designs are susceptible to newer failure modes (for example, due to faulty through-silicon vias, or TSVs) that can cause large portions of memory, such as a ba...
  • Locality-Aware Mapping of Nested Parallel Patterns on GPUs

    Publication Year: 2014, Page(s):63 - 74
    Cited by:  Papers (16)

    Recent work has explored using higher level languages to improve programmer productivity on GPUs. These languages often utilize high level computation patterns (e.g., Map and Reduce) that encode parallel semantics to enable automatic compilation to GPU kernels. However, the problem of efficiently mapping patterns to GPU hardware becomes significantly more difficult when the patterns are nested, wh...
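
    As an illustration of the nesting involved (a sketch under stated assumptions, not code from the paper): a matrix-vector product is naturally written as an outer Map over rows whose body is an inner Reduce (a dot product), and the mapping question is which GPU resource (thread blocks, warps, or threads) should execute each level.

```cpp
#include <numeric>
#include <vector>

// Matrix-vector product expressed as nested parallel patterns:
// an outer Map over the rows of A, each applying an inner Reduce
// (a dot product) over one row and the vector x. A pattern compiler
// must choose how to assign the two nesting levels to thread blocks,
// warps, and threads, which determines locality on the GPU.
std::vector<float> matvec(const std::vector<std::vector<float>>& A,
                          const std::vector<float>& x) {
    std::vector<float> y;
    y.reserve(A.size());
    for (const auto& row : A) {                      // outer Map over rows
        y.push_back(std::inner_product(row.begin(), row.end(),
                                       x.begin(), 0.0f));  // inner Reduce
    }
    return y;
}
```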
  • Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists

    Publication Year: 2014, Page(s):75 - 87
    Cited by:  Papers (14)

    Although GPGPUs are traditionally used to accelerate workloads with regular control and memory-access structure, recent work has shown that GPGPUs can also achieve significant speedups on more irregular algorithms. Data-driven implementations of irregular algorithms are algorithmically more efficient than topology-driven implementations, but issues with memory contention and memory-access irregula...
  • PORPLE: An Extensible Optimizer for Portable Data Placement on GPU

    Publication Year: 2014, Page(s):88 - 100
    Cited by:  Papers (25)

    GPU is often equipped with complex memory systems, including global memory, texture memory, shared memory, constant memory, and various levels of cache. Where to place the data is important for the performance of a GPU program. However, the decision is difficult for a programmer to make because of architecture complexity and the sensitivity of suitable data placements to input and architecture changes. T...
  • Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures

    Publication Year: 2014, Page(s):101 - 113
    Cited by:  Papers (6)

    Data-parallel architectures must provide efficient support for complex control-flow constructs to support sophisticated applications coded in modern single-program multiple-data languages. As these architectures have wide data paths that process a single instruction across parallel threads, a mechanism is needed to track and sequence threads as they traverse potentially divergent control paths thr...
  • Managing GPU Concurrency in Heterogeneous Architectures

    Publication Year: 2014, Page(s):114 - 126
    Cited by:  Papers (22)

    Heterogeneous architectures consisting of general-purpose CPUs and throughput-optimized GPUs are projected to be the dominant computing platforms for many classes of applications. The design of such systems is more complex than that of homogeneous architectures because maximizing resource utilization while minimizing shared resource interference between CPU and GPU applications is difficult. We sh...
  • Load Value Approximation

    Publication Year: 2014, Page(s):127 - 139
    Cited by:  Papers (29)  |  Patents (1)

    Approximate computing explores opportunities that emerge when applications can tolerate error or inexactness. These applications, which range from multimedia processing to machine learning, operate on inherently noisy and imprecise data. We can trade off some loss in output value integrity for improved processor performance and energy-efficiency. As memory accesses consume substantial latency and ...
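
    A minimal sketch of the general load-value-approximation idea (the class and its last-value predictor below are illustrative stand-ins, not the paper's design): on a cache miss the core is handed a predicted value immediately, and the predictor is trained once the true value returns from memory.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical last-value load approximator (illustrative stand-in).
// approximate() supplies a guess for a missing load so the core can
// continue; train() updates the table when the true value arrives.
class LoadValueApproximator {
public:
    uint32_t approximate(uint64_t load_pc) const {
        auto it = last_value_.find(load_pc);
        return it != last_value_.end() ? it->second : 0;  // default guess: 0
    }
    void train(uint64_t load_pc, uint32_t true_value) {
        last_value_[load_pc] = true_value;
    }
private:
    std::unordered_map<uint64_t, uint32_t> last_value_;  // load PC -> last value
};
```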
  • Arbitrary Modulus Indexing

    Publication Year: 2014, Page(s):140 - 152
    Cited by:  Papers (8)

    Modern high performance processors require memory systems that can provide access to data at a rate that is well matched to the processor's computation rate. Common to such systems is the organization of memory into local high speed memory banks that can be accessed in parallel. Associative lookup of values is made efficient through indexing instead of associative memories. These techniques lose ...
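
    A minimal sketch of the standard motivation for non-power-of-two bank indexing (assumed here for illustration; the paper's concern is computing such moduli cheaply in hardware, which this does not model): a stride that is a multiple of a power-of-two bank count maps every access to the same bank, while a prime modulus spreads the same stream across all banks.

```cpp
#include <cstdio>

// Counts how many distinct banks a strided access stream touches for a
// given bank count (the banking function is simply address mod num_banks).
int distinct_banks(unsigned stride, unsigned accesses, unsigned num_banks) {
    bool touched[64] = {false};           // supports up to 64 banks
    int count = 0;
    for (unsigned i = 0; i < accesses; ++i) {
        unsigned bank = (i * stride) % num_banks;
        if (!touched[bank]) { touched[bank] = true; ++count; }
    }
    return count;
}

int main() {
    // Stride-16 stream, 64 accesses: a power-of-two bank count collapses
    // onto one bank, while a prime bank count uses all of them.
    std::printf("16 banks: %d distinct\n", distinct_banks(16, 64, 16));  // prints 1
    std::printf("17 banks: %d distinct\n", distinct_banks(16, 64, 17));  // prints 17
    return 0;
}
```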
  • FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems

    Publication Year: 2014, Page(s):153 - 165
    Cited by:  Papers (24)

    Byte-addressable nonvolatile memories promise a new technology, persistent memory, which incorporates desirable attributes from both traditional main memory (byte-addressability and fast interface) and traditional storage (data persistence). To support data persistence, a persistent memory system requires sophisticated data duplication and ordering control for write requests. As a result, applicat...
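
    A minimal sketch of the write ordering such persistent-memory applications perform (generic undo-logging with x86 flush/fence intrinsics, assumed for illustration; it is not FIRM's mechanism): the old value is logged and made durable before the in-place update, which is why persistent applications issue frequent, strictly ordered write bursts to the memory controller.

```cpp
#include <emmintrin.h>   // _mm_clflush, _mm_sfence

// Generic undo-log update: make the log entry durable before the in-place
// write so the old value can be recovered after a crash. Names are
// hypothetical; the flush/fence pair is the conservative x86 recipe.
struct UndoLogEntry { long* addr; long old_value; };

void persistent_update(long* data, long new_value, UndoLogEntry* log_slot) {
    log_slot->addr = data;               // 1. log the old value
    log_slot->old_value = *data;
    _mm_clflush(log_slot);               //    flush the log entry to memory
    _mm_sfence();                        //    ensure the log is durable first
    *data = new_value;                   // 2. update the data in place
    _mm_clflush(data);                   //    flush the new data
    _mm_sfence();
}
```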