2014 LLVM Compiler Infrastructure in HPC

17 Nov. 2014

  • Title Page iii

    Publication Year: 2014, Page(s): i
  • Copyright Page

    Publication Year: 2014, Page(s): ii
  • Table of Contents

    Publication Year: 2014, Page(s): iii
  • Foreword

    Publication Year: 2014, Page(s): iv
  • Program Committee Members

    Publication Year: 2014, Page(s): v
  • PACXX: Towards a Unified Programming Model for Programming Accelerators Using C++14

    Publication Year: 2014, Page(s): 1 - 11

    We present PACXX, a unified programming model for programming many-core systems that comprise accelerators like Graphics Processing Units (GPUs). One of the main difficulties of current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with a restricted, C-like API for memory management, while the device code for the GPU has ...
  • Coordinating GPU Threads for OpenMP 4.0 in LLVM

    Publication Year: 2014, Page(s): 12 - 21
    Cited by: 4 papers

    GPU devices are becoming critical building blocks of high-performance platforms for reasons of performance and energy efficiency. As a consequence, parallel programming environments such as OpenMP were extended to support offloading code to such devices. OpenMP compilers face the challenge of providing an efficient implementation of device-targeting constructs. One main issue in implementing OpenMP on a GPU ...
  • SamplePGO - The Power of Profile Guided Optimizations without the Usability Burden

    Publication Year: 2014, Page(s): 22 - 28

    Profile-guided optimization (PGO) offers optimization opportunities that are typically hard to obtain with static heuristics and techniques. In several application domains, significant performance can be gained by using runtime profiles to guide optimization. However, traditional PGO techniques that rely on compiler instrumentation are difficult enough to use that they have not become very po...
  • Architecture-Independent Modeling of Intra-Node Data Movement

    Publication Year: 2014, Page(s): 29 - 39

    A primary concern of future high-performance systems is the way data movement is managed; the sheer scale of data to be processed directly affects the performance these systems can attain. However, the increasingly complex but inherently symbiotic relationships between upcoming scientific applications and high-performance architectures necessitate increasingly informative and flexible t...
  • Towards Providing Low-Overhead Data Race Detection for Large OpenMP Applications

    Publication Year: 2014, Page(s): 40 - 47

    Neither static nor dynamic data race detection methods have, by themselves, proven sufficient for large HPC applications, as they often incur high runtime overhead, low race-checking accuracy, or both. While combined static and dynamic approaches can fare better, building such combinations in practice requires attention to many details. Specifically, existing state-of-the-art dynamic rac...
  • Author Index

    Publication Year: 2014, Page(s): 48
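
The PACXX abstract listed above contrasts a host-side, C-like memory API with a separate device programming model. The C++14 sketch below only illustrates the single-source idea in spirit: data lives in a `std::vector` and the kernel is a generic lambda in the same translation unit as the host code. The `launch` helper and `vector_add` are hypothetical names that simulate the device sequentially on the host; they are not the PACXX interface, which the truncated abstract does not describe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical launcher (not the PACXX API): it runs the kernel once per
// thread index, sequentially, on the host. A real single-source system
// would compile the same lambda for the accelerator instead.
template <typename Kernel>
void launch(std::size_t num_threads, Kernel&& kernel) {
    for (std::size_t tid = 0; tid < num_threads; ++tid) {
        kernel(tid);
    }
}

// Element-wise vector addition written once in plain C++14: memory is
// owned by std::vector instead of being managed through a C-style API,
// and the kernel body is an ordinary generic lambda.
std::vector<int> vector_add(const std::vector<int>& a,
                            const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> c(a.size());
    launch(a.size(), [&](auto tid) { c[tid] = a[tid] + b[tid]; });
    return c;
}
```

For example, `vector_add({1, 2, 3}, {10, 20, 30})` yields `{11, 22, 33}`.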
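
The paper "Coordinating GPU Threads for OpenMP 4.0 in LLVM" listed above concerns how device-targeting constructs are implemented on GPUs. As a minimal sketch (not code from the paper), the C++ function below shows the kind of OpenMP 4.0 target region such a compiler must handle: sequential code followed by a nested parallel loop. The function name `scale_on_device` is hypothetical, and the comments describe only standard OpenMP semantics, not the paper's coordination scheme, which the truncated abstract does not state.

```cpp
#include <cassert>
#include <vector>

// Scales a vector inside an OpenMP 4.0 target region. Without an offloading
// compiler (or without -fopenmp), the region simply runs on the host.
std::vector<double> scale_on_device(const std::vector<double>& in,
                                    double factor) {
    const int n = static_cast<int>(in.size());
    std::vector<double> out(in.size());
    const double* src = in.data();
    double* dst = out.data();

    #pragma omp target map(to: src[0:n]) map(from: dst[0:n])
    {
        // Sequential part of the target region: by OpenMP semantics only the
        // initial thread executes it, so a GPU implementation must ensure
        // that just one of its many hardware threads runs this code.
        const double scale = 2.0 * factor;

        // Nested parallel region: its threads share the loop iterations.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            dst[i] = scale * src[i];
        }
    }
    return out;
}
```

Calling `scale_on_device({1.0, 2.0, 3.0}, 1.5)` returns `{3.0, 6.0, 9.0}` whether the region is offloaded or executed on the host.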
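
As its name suggests, the SamplePGO paper listed above builds profiles from sampling rather than from compiler instrumentation. The core aggregation step of any sample-based profile can be sketched in C++: sampled program-counter values are attributed to source lines and counted, so hot lines receive many samples. The `LineRange` table and `line_counts` function are hypothetical stand-ins for the debug information and conversion tooling a real workflow uses; actual sample-profile formats are more involved. In Clang, a sample profile is supplied to the compiler with `-fprofile-sample-use=<file>`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical address-range-to-source-line table, standing in for the
// debug information a real profile converter would read.
struct LineRange {
    std::uint64_t begin;  // first address of the range (inclusive)
    std::uint64_t end;    // one past the last address of the range
    int line;             // source line the range belongs to
};

// Aggregates sampled program-counter values into per-line sample counts.
// Samples that fall outside every known range are ignored.
std::map<int, int> line_counts(const std::vector<std::uint64_t>& samples,
                               const std::vector<LineRange>& table) {
    std::map<int, int> counts;
    for (std::uint64_t pc : samples) {
        for (const LineRange& r : table) {
            if (pc >= r.begin && pc < r.end) {
                ++counts[r.line];
                break;  // each sample is attributed to one line only
            }
        }
    }
    return counts;
}
```

With ranges `[0x100, 0x110)` for line 10 and `[0x110, 0x130)` for line 11, the samples `0x100, 0x112, 0x120, 0x12f, 0x200` give one sample for line 10 and three for line 11; `0x200` is dropped.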
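
The race-detection paper listed above targets bugs in large OpenMP applications. For readers new to the topic, the C++ sketch below shows a classic OpenMP data race (in the comment) and its standard fix with a `reduction` clause. It illustrates the kind of defect such tools look for; it is not taken from the paper, and `parallel_sum` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Racy version (do not use): every thread updates the shared `sum`
// without synchronization, so concurrent read-modify-write operations
// can lose updates.
//
//   #pragma omp parallel for
//   for (long i = 0; i < n; ++i) sum += v[i];
//
// Race-free version: the reduction clause gives each thread a private
// partial sum and combines the partial sums when the loop ends.
long long parallel_sum(const std::vector<int>& v) {
    long long sum = 0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}
```

The racy loop may return different totals from run to run when compiled with `-fopenmp`, whereas `parallel_sum({1, 2, 3, 4})` always returns 10.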