2015 IEEE 22nd International Conference on High Performance Computing (HiPC)

16-19 Dec. 2015

  • [Front cover]

    Publication Year: 2015, Page(s): C4
  • [Title page i]

    Publication Year: 2015, Page(s): i
  • [Title page iii]

    Publication Year: 2015, Page(s): iii
  • [Copyright notice]

    Publication Year: 2015, Page(s): iv
  • Table of contents

    Publication Year: 2015, Page(s):v - ix
  • Message from the General Co-chairs and Vice-General Co-chairs

    Publication Year: 2015, Page(s):x - xi
  • Message from the Program Chair

    Publication Year: 2015, Page(s): xii
  • Message from the Steering Chair

    Publication Year: 2015, Page(s): xiii
  • HiPC 2015 Committees

    Publication Year: 2015, Page(s):xiv - xviii
  • HiPC 2015 Technical Program

    Publication Year: 2015, Page(s): xix
  • Scale-out Beyond Map-Reduce

    Publication Year: 2015, Page(s): 1

    Summary form only given. Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous, and the default increasingly is to capture and store any and all data, in anticipation of potential future strategic value. These differences in data heterogeneity, scale and u...

  • Which Verification for Soft Error Detection?

    Publication Year: 2015, Page(s):2 - 11
    Cited by:  Papers (2)

    Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with a given cost and recall (fraction of all errors that are actually detected). The main contribution of this paper is to characterize the optimal computational pattern for an application: which detector(s) to use, how many detectors of each type to use, together with the length of the...
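
    The cost/recall trade-off described above can be illustrated with a small calculation. This is a minimal sketch, assuming (as a simplification of this example, not a model taken from the paper) that detectors miss errors independently, so the combined recall of several detectors is one minus the product of their miss probabilities. The detector costs and recalls below are hypothetical.

```python
from math import prod


def combined_recall(recalls):
    """Probability that at least one detector catches a given error.

    Assumes (hypothetically) that detectors miss errors independently.
    """
    return 1.0 - prod(1.0 - r for r in recalls)


def cost_and_recall(detectors):
    """Total cost and combined recall for a list of (cost, recall) pairs."""
    total_cost = sum(cost for cost, _ in detectors)
    return total_cost, combined_recall([recall for _, recall in detectors])


# Hypothetical detectors: (cost in seconds, recall)
partial = (1.0, 0.5)      # cheap, catches half of the errors
guaranteed = (10.0, 1.0)  # expensive, catches every error

print(cost_and_recall([partial, partial]))  # (2.0, 0.75)
print(cost_and_recall([guaranteed]))        # (10.0, 1.0)
```

    With these made-up numbers, two cheap partial detectors reach 75% recall at a fifth of the cost of the guaranteed one; choosing which mix of detectors to use is the kind of question the paper studies.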

  • Throughput Regulation in Shared Memory Multicore Processors

    Publication Year: 2015, Page(s):12 - 20
    Cited by:  Papers (2)

    Performance scaling is now synonymous with scaling the number of cores. One of the consequences of this shift is the increasing difficulty of designing processors with predictable and controllable performance. To address this challenge this paper proposes a chip-scale throughput regulation technique that is based on dynamic tracking of instruction execution dynamics in each core. A new variable ga...

  • Application Taxonomy via Algorithmic Commonality for Domain-Specific Architecture Design

    Publication Year: 2015, Page(s):21 - 29

    In this paper, we propose an approach to application taxonomy from the perspective of algorithmic commonality. The taxonomy exploits algorithm-inherent characterization to imply a categorization of domain-specific architecture in the initial phase of architecture design. First, we introduce both metrics and a graph-based mining algorithm to evaluate the commonality across multiple applications. Second...

  • FlexCore: A Reconfigurable Processor Supporting Flexible, Dynamic Morphing

    Publication Year: 2015, Page(s):30 - 39

    In the realm of desktop and server class processors, the prevailing trend is to use out-of-order superscalar cores that exploit the hidden instruction-level parallelism in a program. In superscalar designs, the performance (as measured by the IPC, instructions committed per clock cycle) does not go up linearly with the dispatch width, say, n, due to dependencies in the program and higher branching...

  • High Efficiency Generalized Parallel Counters for Xilinx FPGAs

    Publication Year: 2015, Page(s):40 - 46

    Generalized Parallel Counters (GPCs) are frequently used in constructing high speed compressor trees. Prior work on GPC synthesis using FPGAs has focused on utilizing the fast carry chain and mapping the logic onto LUTs. This mapping is not optimal in the sense that the LUT fabric is not fully utilized. This results in low efficiency GPCs. Modern day Xilinx FPGAs support 6-input LUTs that can be u...
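
    For readers new to GPCs: a GPC written (k_{n-1}, ..., k_0; m) takes k_i input bits of weight 2^i in each column i, adds them all, and emits the sum as an m-bit binary number; a (6;3) counter, for example, compresses six equal-weight bits into a 3-bit count. The sketch below is a hypothetical behavioral model of that arithmetic only; it says nothing about the LUT and carry-chain mapping the paper optimizes, and the function names are this example's own.

```python
def output_width(shape):
    """Output bits m for a GPC with shape[i] inputs in column i (LSB first)."""
    max_sum = sum(k << i for i, k in enumerate(shape))
    return max_sum.bit_length()


def gpc(columns):
    """Behavioral model of a generalized parallel counter.

    columns[i] lists the input bits of weight 2**i (column 0 is the LSB).
    Returns the sum of all weighted inputs as output bits, LSB first.
    """
    m = output_width([len(bits) for bits in columns])
    total = sum(sum(bits) << i for i, bits in enumerate(columns))
    return [(total >> j) & 1 for j in range(m)]


# (6;3) counter: six bits of weight 1, four of them set -> 4 = 0b100
print(gpc([[1, 1, 0, 1, 0, 1]]))    # [0, 0, 1]
# (1,5;3) counter: five bits of weight 1 and one bit of weight 2 -> 3 + 2 = 5
print(gpc([[1, 1, 1, 0, 0], [1]]))  # [1, 0, 1]
```

    In a compressor tree, many such counters typically work in parallel to reduce the partial-product bits of each column until only two rows remain for a final carry-propagate adder.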

  • 2QW-Clock: An Efficient SSD Buffer Management Algorithm

    Publication Year: 2015, Page(s):47 - 53

    A modern solid state disk (SSD) has an SDRAM buffer that stores frequently used data and mapping information likely to be needed in the near future. Managing this buffer efficiently is an important factor in SSD performance. Flash read and write speeds are asymmetric, so SSD buffer management algorithms must take this characteristic of flash into account. Current page mapping SSD buffer management ...
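
    The algorithm's name suggests it builds on CLOCK replacement, but the abstract is truncated before describing 2QW-Clock itself. As background only, the sketch below is a plain CLOCK (second-chance) buffer, not 2QW-Clock; it records which dirty pages would have to be written back to flash on eviction, since flash writes are slower than reads. The class, method, and field names are hypothetical.

```python
class ClockBuffer:
    """Page buffer using CLOCK (second-chance) replacement.

    Generic background sketch, not the 2QW-Clock algorithm. Evicting a
    dirty page is recorded in `writebacks`, standing in for a flash write.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.frames = []      # each frame: [page, referenced, dirty]
        self.slot_of = {}     # page -> index into frames
        self.hand = 0         # clock hand position
        self.writebacks = []  # dirty pages flushed to flash on eviction

    def access(self, page, write=False):
        """Read or write `page`; return True on a buffer hit."""
        slot = self.slot_of.get(page)
        if slot is not None:
            frame = self.frames[slot]
            frame[1] = True               # hit: set the reference bit
            frame[2] = frame[2] or write  # a write makes the page dirty
            return True
        if len(self.frames) < self.capacity:  # a free frame is available
            self.slot_of[page] = len(self.frames)
            self.frames.append([page, True, write])
            return False
        slot = self._find_victim()
        victim_page, _, victim_dirty = self.frames[slot]
        if victim_dirty:
            self.writebacks.append(victim_page)
        del self.slot_of[victim_page]
        self.frames[slot] = [page, True, write]
        self.slot_of[page] = slot
        self.hand = (slot + 1) % self.capacity
        return False

    def _find_victim(self):
        """Sweep the hand, clearing reference bits, until one is already clear."""
        while True:
            frame = self.frames[self.hand]
            if not frame[1]:
                return self.hand
            frame[1] = False  # second chance
            self.hand = (self.hand + 1) % self.capacity


buf = ClockBuffer(2)
for page, write in [(1, False), (2, True), (1, False), (3, False), (4, False)]:
    buf.access(page, write)
print(sorted(buf.slot_of), buf.writebacks)  # [3, 4] [2]
```

    One write-aware tweak, offered purely as an illustration, would be to make `_find_victim` prefer clean frames so that fewer evictions cost a flash write; how 2QW-Clock actually handles this is described in the paper.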

  • Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures

    Publication Year: 2015, Page(s):54 - 63
    Cited by:  Papers (3)

    Recent studies have shown the potential of task-based programming paradigms for implementing robust, scalable sparse direct solvers for modern computing platforms. Yet, designing task flows that efficiently exploit heterogeneous architectures remains highly challenging. In this paper we first tackle the issue of data partitioning using a method suited for heterogeneous platforms. On the one hand, ...

  • Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

    Publication Year: 2015, Page(s):64 - 74
    Cited by:  Papers (5)

    Sparse matrix vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. Efficient CSR-based SpMV obviates the need for other GPU-specific storage formats, thereby saving runtime and storage overheads. However, existing...
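
    To make the CSR format concrete: CSR stores the nonzeros row by row in two arrays (the values and their column indices) plus a row-pointer array marking where each row starts. The minimal serial reference below computes y = Ax from those three arrays; it is not CSR-Adaptive, whose GPU work-distribution scheme is the subject of the paper, and the variable names are this sketch's own.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A x for a sparse matrix A stored in CSR form.

    The nonzeros of row i are values[row_ptr[i]:row_ptr[i + 1]], and
    col_idx holds the column index of each stored value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[10, 0, 2],
#      [ 0, 0, 0],
#      [ 0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [10.0, 2.0, 3.0, 4.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [12.0, 0.0, 7.0]
```

    Here row 1 is empty while the others hold two nonzeros each; in real irregular matrices the spread of row lengths is far wider, and that imbalance is one reason a naive one-row-per-thread GPU mapping performs poorly.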

  • On the Resilience of Parallel Sparse Hybrid Solvers

    Publication Year: 2015, Page(s):75 - 84
    Cited by:  Papers (1)

    As the computational power of high performance computing (HPC) systems continues to increase by using a huge number of CPU cores or specialized processing units, extreme-scale applications are increasingly prone to faults. Consequently, the HPC community has proposed many contributions to design resilient HPC applications. These contributions may be system-oriented, theoretical or numerical. In th...

  • New Tridiagonal Systems Solvers on GPU Architectures

    Publication Year: 2015, Page(s):85 - 94

    Modern GPUs (Graphics Processing Units) offer very high computing power at relatively low cost. Nevertheless, designing efficient algorithms for the GPUs usually requires additional time and effort, even for experienced programmers. On the other hand, tridiagonal systems solvers are an important building block for a wide range of applications. In this paper, we present a new tuning parallel propos...
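
    For context, the classic sequential method for a tridiagonal system is the Thomas algorithm (Gaussian elimination specialized to three diagonals), which runs in O(n) time. Each elimination step depends on the previous one, which is why GPU solvers usually turn to other formulations such as cyclic reduction; the paper's own method is not reproduced here. The sketch below assumes the system needs no pivoting, for example because the matrix is diagonally dominant.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system A x = d with the Thomas algorithm.

    a: sub-diagonal (a[0] is ignored), b: main diagonal,
    c: super-diagonal (c[-1] is ignored), d: right-hand side.
    No pivoting is performed, so this assumes a stable case such as a
    diagonally dominant matrix.
    """
    n = len(b)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = c[0] / b[0] if n > 1 else 0.0
    d_prime[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# [[2, 1, 0], [1, 2, 1], [0, 1, 2]] x = [4, 8, 8] has exact solution x = [1, 2, 3]
print(thomas_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [4.0, 8.0, 8.0]))
```

    Floating-point rounding may make the printed values differ from [1, 2, 3] in the last digits.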

  • A Stable Parallel Algorithm for Diagonally Dominant Tridiagonal Linear Systems

    Publication Year: 2015, Page(s):95 - 104

    In this work, we present a stable parallel algorithm based on WZ factorization for solving diagonally dominant tridiagonal linear systems of algebraic equations, using a divide-and-conquer approach. Existence results are given and the backward error analysis of the method is presented. Numerical stability of the algorithm is proved. The given parallel algorithm for diagonally dominant tridiagonal lin...
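
    The method above is restricted to diagonally dominant systems. As a small companion sketch (not part of the paper's WZ-based algorithm, which is not reproduced here), the helper below checks row diagonal dominance, |b_i| >= |a_i| + |c_i|, for a tridiagonal matrix stored as a sub-diagonal a, main diagonal b, and super-diagonal c. The truncated abstract does not say whether the paper uses the weak or strict form of the condition, so the `strict` flag is an assumption of this example.

```python
def is_diagonally_dominant(a, b, c, strict=False):
    """Check row diagonal dominance of a tridiagonal matrix.

    a: sub-diagonal (a[0] is ignored), b: main diagonal,
    c: super-diagonal (c[-1] is ignored). Row i is dominant when
    |b[i]| >= |a[i]| + |c[i]| (with > instead of >= if strict=True),
    counting only the off-diagonal entries that exist in that row.
    """
    n = len(b)
    for i in range(n):
        off_diag = (abs(a[i]) if i > 0 else 0.0) + (abs(c[i]) if i < n - 1 else 0.0)
        if abs(b[i]) < off_diag or (strict and abs(b[i]) == off_diag):
            return False
    return True


print(is_diagonally_dominant([0, 1, 1], [2, 2, 2], [1, 1, 0]))               # True
print(is_diagonally_dominant([0, 1, 1], [2, 2, 2], [1, 1, 0], strict=True))  # False
```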

  • Optimizing Approximate Weighted Matching on Nvidia Kepler K40

    Publication Year: 2015, Page(s):105 - 114

    Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms on the other hand generally compute high quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate we...
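
    The truncated abstract does not name the algorithm being implemented. For background only, the classic sequential route to a half-approximation is the greedy algorithm sketched below: scan edges in decreasing weight order and keep an edge whenever both endpoints are still free. It always reaches at least half of the optimal weight, but its global sort and sequential scan are hard to parallelize directly, so it should not be read as the paper's method.

```python
def greedy_matching(edges):
    """Greedy 1/2-approximate maximum weight matching.

    edges: iterable of (weight, u, v). Edges are scanned by decreasing
    weight; an edge is kept if neither endpoint is matched yet.
    Returns a list of (u, v, weight).
    """
    matched = set()
    matching = []
    for weight, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, weight))
            matched.update((u, v))
    return matching


# Path 0-1-2-3 with weights 3, 4, 3: the optimum {0-1, 2-3} weighs 6.
path = [(3, 0, 1), (4, 1, 2), (3, 2, 3)]
result = greedy_matching(path)
print(result, sum(w for _, _, w in result))  # [(1, 2, 4)] 4
```

    On this path greedy returns weight 4 against an optimum of 6, within the guaranteed factor of two.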

  • Improving Communication Throughput by Multipath Load Balancing on Blue Gene/Q

    Publication Year: 2015, Page(s):115 - 124

    Achievable networking performance of applications in a supercomputer depends on the exact combination of the communication patterns of the applications and the routing algorithms used by the supercomputer. In order to achieve the highest networking performance for the applications, the routing algorithms need to be designed optimally for those communication patterns. However, while communication p...

  • Dynamic Adaptation for Elastic System Services Using Virtual Servers

    Publication Year: 2015, Page(s):125 - 134

    A vast majority of legacy runtime systems and middleware prevalent in cluster and supercomputing environments are static in nature. Due to the rising scale and complexity of high-performance computing systems, the static nature of systems software would prospectively impede its scalability and resilience. Traditionally, the mobility of servers is further limited since services are statically bound...
