2012 Third Workshop on Applications for Multi-Core Architecture

24-25 Oct. 2012

  • [Back cover]

    Publication Year: 2012, Page(s): C4
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2012, Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2012, Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2012, Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2012, Page(s): v-vi
    Freely Available from IEEE
  • Message from the Organizers

    Publication Year: 2012, Page(s): vii
    Freely Available from IEEE
  • Committees

    Publication Year: 2012, Page(s): viii
    Freely Available from IEEE
  • Program Committee

    Publication Year: 2012, Page(s): ix
    Freely Available from IEEE
  • A Load Distribution Algorithm Based on Profiling for Heterogeneous GPU Clusters

    Publication Year: 2012, Page(s): 1-6
    Cited by: Papers (1)

    Clusters of GPUs are increasingly used to execute computationally demanding applications. Because GPU architectures change frequently, many clusters contain heterogeneous types of GPUs, which makes distributing the load among the machines a problem. In this work, we propose a load distribution algorithm for scientific applications executed on heterogeneous GPU clusters. The algorithm find...
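
    The abstract is cut off before it explains how profiling drives the distribution. Purely as an illustration, and not as the authors' algorithm, a common baseline is to split a divisible workload among GPUs in proportion to the throughput each one achieved in a short profiling run. The function `distribute_load` and the throughput figures below are hypothetical.

```python
from typing import Dict


def distribute_load(total_items: int, throughputs: Dict[str, float]) -> Dict[str, int]:
    """Split total_items among devices in proportion to profiled throughput.

    throughputs maps a (hypothetical) device name to the items/second it
    processed during a profiling run. The largest-remainder method is used
    so that the integer shares always sum exactly to total_items.
    """
    total_rate = sum(throughputs.values())
    if total_rate <= 0:
        raise ValueError("throughputs must sum to a positive value")
    exact = {dev: total_items * rate / total_rate for dev, rate in throughputs.items()}
    shares = {dev: int(x) for dev, x in exact.items()}
    leftover = total_items - sum(shares.values())
    # Give the remaining items to the devices with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for dev in by_fraction[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical profiling results for three heterogeneous GPUs (items/s).
    profile = {"gpu_a": 300.0, "gpu_b": 150.0, "gpu_c": 50.0}
    print(distribute_load(1000, profile))  # {'gpu_a': 600, 'gpu_b': 300, 'gpu_c': 100}
```

    Largest-remainder rounding keeps the shares summing exactly to the workload size; a real scheme would also have to account for data-transfer costs, which this sketch ignores.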

  • A Study on Mixed Precision Techniques for a GPU-based SIP Solver

    Publication Year: 2012, Page(s): 7-12

    This article presents the study and application of mixed-precision techniques to accelerate a GPU-based implementation of the Strongly Implicit Procedure (SIP) for solving hepta-diagonal linear systems. In particular, two different options for incorporating mixed precision into the GPU implementation are discussed, and one of them is implemented. The experimental evaluation of our proposal demonstrates tha...
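
    The abstract does not say which two options the authors considered. One widely used way to combine precisions, shown here only as a generic sketch, is mixed-precision iterative refinement: solve the correction equation cheaply in single precision while computing residuals and accumulating the solution in double precision. A few Jacobi sweeps stand in for SIP, the test system is a small diagonally dominant tridiagonal matrix rather than a hepta-diagonal one, and single precision is emulated with Python's `struct` module; the names `jacobi_f32` and `refine` are hypothetical.

```python
import struct
from typing import List

Matrix = List[List[float]]


def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec(A: Matrix, x: List[float]) -> List[float]:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def jacobi_f32(A: Matrix, b: List[float], sweeps: int = 50) -> List[float]:
    """Low-precision inner solver (a stand-in for SIP): Jacobi sweeps whose
    results are rounded to float32. Intermediate sums are formed in float64
    and then rounded, so this only emulates single-precision arithmetic.
    Convergence assumes A is strictly diagonally dominant."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x = [
            f32((f32(b[i]) - f32(sum(f32(A[i][j]) * x[j] for j in range(n) if j != i)))
                / f32(A[i][i]))
            for i in range(n)
        ]
    return x


def refine(A: Matrix, b: List[float], steps: int = 5) -> List[float]:
    """Mixed-precision iterative refinement: float64 residuals and updates,
    correction equations solved in (emulated) float32."""
    x = [0.0] * len(b)
    for _ in range(steps):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # residual in float64
        d = jacobi_f32(A, r)                                  # cheap low-precision correction
        x = [xi + di for xi, di in zip(x, d)]                 # update in float64
    return x
```

    Jacobi is used only because it fits in a few lines and converges for diagonally dominant matrices; in principle any low-precision inner solver could take its place inside `refine`.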

  • Architecture of Request Distributor for GPU Clusters

    Publication Year: 2012, Page(s): 13-18
    Cited by: Papers (1)

    The advent of GPU computing has enabled the development of many strategies for accelerating different kinds of simulations. Furthermore, instead of processing an application on just one GPU, it is common to use a collection of GPUs. These GPUs can be located in the same machine, on the same network, or even across a wide area network. Unfortunately, distribution and management of GPUs requi...

  • A Hybrid CPU-GPU Local Search Heuristic for the Unrelated Parallel Machine Scheduling Problem

    Publication Year: 2012, Page(s): 19-23
    Cited by: Papers (2)

    This work addresses the development of a hybrid CPU-GPU local search heuristic for the unrelated parallel machine scheduling problem. In this scheduling problem, setup times are both sequence-dependent and machine-dependent. The objective is to minimize the maximum completion time of the schedule, known as the makespan. Since the problem is NP-hard, there is no known polynomial-time a...
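
    The objective named in the abstract can be made concrete. The sketch below, with hypothetical names, evaluates the makespan of a schedule under machine-dependent processing times and sequence- and machine-dependent setup times, then improves it with a plain CPU-only first-improvement local search over job-reinsertion moves. It assumes the first job on a machine needs no setup; the paper's actual neighborhood and its CPU-GPU split are not described in the truncated abstract.

```python
from typing import Iterator, List, Tuple

Schedule = List[List[int]]  # schedule[k] is the ordered list of jobs on machine k


def machine_time(k: int, seq: List[int], p, s) -> int:
    """Completion time of machine k.

    p[k][j]:    processing time of job j on machine k (machine-dependent)
    s[k][i][j]: setup time on machine k when job j directly follows job i
                (sequence- and machine-dependent)
    Assumption for this sketch: the first job on a machine needs no setup.
    """
    total, prev = 0, None
    for j in seq:
        if prev is not None:
            total += s[k][prev][j]
        total += p[k][j]
        prev = j
    return total


def makespan(schedule: Schedule, p, s) -> int:
    """Maximum completion time over all machines (the value to minimize)."""
    return max(machine_time(k, seq, p, s) for k, seq in enumerate(schedule))


def neighbors(schedule: Schedule) -> Iterator[Schedule]:
    """Every schedule reachable by removing one job and reinserting it anywhere."""
    for src, seq in enumerate(schedule):
        for pos, job in enumerate(seq):
            for dst in range(len(schedule)):
                # Machine dst holds one job fewer after the removal when dst == src.
                slots = len(schedule[dst]) - (1 if dst == src else 0) + 1
                for ins in range(slots):
                    cand = [list(m) for m in schedule]
                    del cand[src][pos]
                    cand[dst].insert(ins, job)
                    yield cand


def local_search(schedule: Schedule, p, s) -> Tuple[Schedule, int]:
    """First-improvement descent: move to any strictly better neighbor until none exists."""
    current, cost = schedule, makespan(schedule, p, s)
    while True:
        for cand in neighbors(current):
            c = makespan(cand, p, s)
            if c < cost:
                current, cost = cand, c
                break
        else:
            return current, cost


if __name__ == "__main__":
    p = [[2, 4, 3], [3, 2, 4]]  # 2 machines, 3 jobs (made-up data)
    s = [[[0 if i == j else 1 for j in range(3)] for i in range(3)] for _ in range(2)]
    start = [[0, 1, 2], []]
    print(makespan(start, p, s))      # 11
    print(local_search(start, p, s))  # ([[2], [1, 0]], 6)
```

    Because every neighbor is evaluated independently, the neighbor-evaluation loop is the natural candidate for offloading to a GPU in a hybrid scheme, though that is an inference rather than something the abstract states.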

  • A High-Level Implementation of STM Haskell with Write/Write Conflict Detection

    Publication Year: 2012, Page(s): 24-29

    This paper describes a high-level implementation of Software Transactional Memory (STM) for the Haskell language. The library is implemented entirely in Haskell and, unlike all other implementations of STM Haskell, it features early detection of write/write conflicts. Preliminary performance measurements using the Haskell STM benchmark show that the library performs much better than a TL2...

  • Optimal Virtual Channel Insertion for Contention Alleviation and Deadlock Avoidance in Custom NoCs

    Publication Year: 2012, Page(s): 30-35

    Deadlock and contention can be avoided in an NoC architecture by employing virtual channels (VCs). However, VC insertion can increase power and chip area while yielding little performance improvement. We present a novel VC insertion technique for deadlock avoidance and contention relief in irregular NoC architectures that avoids significant power and area increases. Given a resource pool of VCs, deadlock/c...

  • A Novel Virtual Channel Implementation Technique for Multi-core On-chip Communication

    Publication Year: 2012, Page(s): 36-41
    Cited by: Papers (1)

    In this paper, a new approach for implementing virtual channels (VC) for multi-core interconnection networks is presented. In this approach, the flits of different packets interleave in a channel with a single buffer of nominal depth by using a rotating flit-by-flit arbitration. The routing path of each flit is guaranteed because the flits belonging to the same packet are attached with an ID tag a...
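
    The mechanism described in the abstract, rotating flit-by-flit arbitration with an ID tag on every flit, can be simulated in a few lines. This toy model, with hypothetical names, ignores buffer depth, credits, and routing; it only shows that tagged flits of different packets can share one channel and still be regrouped in order at the receiver.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Flit = Tuple[int, str]  # (packet ID tag, payload)


def interleave(packets: Dict[int, List[str]]) -> List[Flit]:
    """Rotating (round-robin) flit-by-flit arbitration onto one physical channel.

    Every flit carries its packet's ID tag, so flits of different packets
    can be mixed on the channel without being confused downstream.
    """
    rotation: Deque[Tuple[int, Deque[str]]] = deque(
        (pid, deque(flits)) for pid, flits in packets.items() if flits
    )
    channel: List[Flit] = []
    while rotation:
        pid, pending = rotation.popleft()         # grant the channel to the next packet
        channel.append((pid, pending.popleft()))  # send exactly one tagged flit
        if pending:
            rotation.append((pid, pending))       # rejoin the back of the rotation
    return channel


def reassemble(channel: List[Flit]) -> Dict[int, List[str]]:
    """Receiver side: regroup flits by ID tag, preserving per-packet order."""
    out: Dict[int, List[str]] = {}
    for pid, payload in channel:
        out.setdefault(pid, []).append(payload)
    return out


if __name__ == "__main__":
    pkts = {1: ["H1", "B1", "T1"], 2: ["H2", "T2"]}  # head/body/tail flits
    ch = interleave(pkts)
    print(ch)  # [(1, 'H1'), (2, 'H2'), (1, 'B1'), (2, 'T2'), (1, 'T1')]
    print(reassemble(ch) == pkts)  # True
```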

  • Autotuning Wavefront Abstractions for Heterogeneous Architectures

    Publication Year: 2012, Page(s): 42-47

    We present our autotuned heterogeneous parallel programming abstraction for the wavefront pattern. An exhaustive search of the tuning space indicates that the correct setting of tuning factors can yield an average 37x speedup over a sequential baseline. Our best automated, machine-learning-based heuristic obtains 92% of this ideal speedup, averaged across our full range of wavefront examples.
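
    In a typical wavefront computation, each cell of a grid depends on its north, west, and north-west neighbors, so all cells on one anti-diagonal can be computed concurrently. The sequential sketch below shows that ordering, using longest-common-subsequence length as the example kernel; the kernel and the function name are our choice, and the abstract does not list the tuning factors the autotuner sets.

```python
def wavefront_lcs(a: str, b: str) -> int:
    """Longest-common-subsequence length computed in wavefront (anti-diagonal) order.

    Cell (i, j) depends on (i-1, j), (i, j-1) and (i-1, j-1), so every cell on
    anti-diagonal d = i + j depends only on diagonals d-1 and d-2. The cells of
    one diagonal are therefore independent and could run in parallel; here
    they are computed one after another to show the ordering.
    """
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    for d in range(2, n + m + 1):              # interior cells have i, j >= 1
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[n][m]


if __name__ == "__main__":
    print(wavefront_lcs("ABCBDAB", "BDCABA"))  # 4
```

    A parallel version would distribute the inner loop over `i` for each diagonal `d`; grouping cells into tiles, with tile size as one plausible tuning factor, is a common refinement but is an assumption here.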

  • Time-to-Solution and Energy-to-Solution: A Comparison between ARM and Xeon

    Publication Year: 2012, Page(s): 48-53
    Cited by: Papers (3)

    Most High Performance Computing (HPC) systems today are known as "power hungry" because they aim at computing speed regardless of energy consumption. Some scientific applications still demand more speed, and the community expects to reach exascale by the end of the decade. Nevertheless, to reach exascale we need to search for alternatives that cope with energy constraints. A promising step forward in this...

  • Scheduling Cyclic Task Graphs with SCC-Map

    Publication Year: 2012, Page(s): 54-59

    The dataflow execution model has been shown to be a good way of exploiting thread-level parallelism (TLP), making parallel programming easier. In this model, tasks must be mapped to processing elements (PEs) considering the trade-off between communication and parallelism. Previous work on scheduling dependency graphs has mostly focused on directed acyclic graphs, which are not suitable for dataflow (loops in the code beco...
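
    Loops in dataflow code turn the task graph into a cyclic graph. Assuming, as the title suggests, that SCC refers to strongly connected components, the sketch below finds them with Tarjan's algorithm and collapses each one into a single node, which always yields a DAG that DAG-oriented schedulers could handle. How SCC-Map actually maps components to PEs is not stated in the truncated abstract; the names here are hypothetical.

```python
from typing import Dict, List, Set


def strongly_connected_components(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's algorithm: return the SCCs of a directed task graph.

    graph maps each task to the tasks that consume its output; every task
    must appear as a key (use an empty list for sinks).
    """
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    sccs: List[List[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop the whole component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def condense(graph: Dict[str, List[str]], sccs: List[List[str]]) -> Dict[int, Set[int]]:
    """Collapse each SCC into one node; the resulting graph is always acyclic."""
    comp_of = {v: i for i, comp in enumerate(sccs) for v in comp}
    dag: Dict[int, Set[int]] = {i: set() for i in range(len(sccs))}
    for v, succs in graph.items():
        for w in succs:
            if comp_of[v] != comp_of[w]:
                dag[comp_of[v]].add(comp_of[w])
    return dag


if __name__ == "__main__":
    # A loop in dataflow code: body -> cond -> body forms a cycle.
    g = {"load": ["body"], "body": ["cond"], "cond": ["body", "store"], "store": []}
    comps = strongly_connected_components(g)
    print(comps)               # [['store'], ['cond', 'body'], ['load']]
    print(condense(g, comps))  # {0: set(), 1: {0}, 2: {1}}
```

    Tarjan's algorithm emits components in reverse topological order of the condensed graph. The recursive form is fine for small graphs; very deep graphs would need an iterative variant to stay under Python's recursion limit.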

  • Author index

    Publication Year: 2012, Page(s): 60
    Freely Available from IEEE
  • [Publisher's information]

    Publication Year: 2012, Page(s): 62
    Freely Available from IEEE