2010 39th International Conference on Parallel Processing

Date 13-16 Sept. 2010

  • [Front cover]

    Publication Year: 2010, Page(s): C1
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2010, Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2010, Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2010, Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2010, Page(s):v - xi
    Freely Available from IEEE
  • Message from the Program Chair

    Publication Year: 2010, Page(s):xii - xiii
    Freely Available from IEEE
  • Organizing Committee

    Publication Year: 2010, Page(s):xiv - xvi
    Freely Available from IEEE
  • List of Reviewers

    Publication Year: 2010, Page(s):xvii - xxi
    Freely Available from IEEE
  • Panel abstract

    Publication Year: 2010, Page(s): xxii

    Provides an abstract for each of the presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
  • Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors

    Publication Year: 2010, Page(s):1 - 10
    Cited by:  Papers (6)

    Limiting the peak power consumption of chip multiprocessors (CMPs) has recently received a lot of attention. In order to enable chip-level power capping, the peak power consumption of on-chip L2 caches in a CMP often needs to be constrained by dynamically transitioning selected cache banks into low-power modes. However, dynamic cache resizing for power capping may cause undesired long cache access...
  • A Theoretical Framework for Value Prediction in Parallel Systems

    Publication Year: 2010, Page(s):11 - 20
    Cited by:  Papers (3)

    We present here a theoretical framework towards a fundamental understanding of the effects of value prediction. Our framework consists of two parts: first, an identification of the theoretical limit of value prediction and an indication of the potential to improve parallelism through the exploitation of value predictability; second, a demonstration of the feasibility of data prediction and a theor...
  • Heterogeneous Mini-rank: Adaptive, Power-Efficient Memory Architecture

    Publication Year: 2010, Page(s):21 - 29
    Cited by:  Papers (1)  |  Patents (2)

    Memory power consumption has become a big concern in server platforms. A recently proposed mini-rank architecture reduces the memory power consumption by breaking each DRAM rank into multiple narrow mini-ranks and activating fewer devices for each request. However, its fixed and uniform configuration may degrade performance significantly or lose power saving opportunities on some workloads. We pro...
  • Gossamer: A Lightweight Approach to Using Multicore Machines

    Publication Year: 2010, Page(s):30 - 39
    Cited by:  Papers (1)

    The key to performance improvements in the multi-core era is for software to utilize the available concurrency. This paper presents a lightweight programming framework called Gossamer that is easy to use, enables the solution of a broad range of parallel programming problems, and produces efficient code. Gossamer contains (1) a set of high-level annotations that one adds to a sequential program to...
  • Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs

    Publication Year: 2010, Page(s):40 - 50
    Cited by:  Papers (3)

    To exploit the full potential of GPGPUs for general purpose computing, DOACROSS parallelism abundant in scientific and engineering applications must be harnessed. However, the presence of cross-iteration data dependences in DOACROSS loops poses an obstacle to execute their computations concurrently using a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich in DOACROSS p...
  • Exploitation of Dynamic Communication Patterns through Static Analysis

    Publication Year: 2010, Page(s):51 - 60
    Cited by:  Papers (2)

    Collective operations can have a large impact on the performance of parallel applications. However, the ideal implementation of a particular collective communication often depends on both the application and the targeted machine structure. Our approach combines dynamic and static analysis techniques to identify common collective communication patterns expressed as point-to-point calls and transfor...
  • Parallel Exact Inference on a CPU-GPGPU Heterogenous System

    Publication Year: 2010, Page(s):61 - 70
    Cited by:  Papers (4)

    Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. To achieve scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The ...
  • Minimizing Stretch and Makespan of Multiple Parallel Task Graphs via Malleable Allocations

    Publication Year: 2010, Page(s):71 - 80

    Many scientific applications can be structured as Parallel Task Graphs (PTGs), i.e., graphs of data-parallel tasks. Adding data-parallelism to a task-parallel application provides opportunities for higher performance and scalability, but poses scheduling challenges. We study the off-line scheduling of multiple PTGs on a single, homogeneous cluster. The objective is to optimize performance and fair...
  • Efficient PageRank and SpMV Computation on AMD GPUs

    Publication Year: 2010, Page(s):81 - 89
    Cited by:  Papers (9)

    Google's famous PageRank algorithm is widely used to determine the importance of web pages in search engines. Given the large number of web pages on the World Wide Web, efficient computation of PageRank becomes a challenging problem. We accelerated the power method for computing PageRank on AMD GPUs. The core component of the power method is the Sparse Matrix-Vector Multiplication (SpMV). Its perf...
  • Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

    Publication Year: 2010, Page(s):90 - 100
    Cited by:  Papers (5)

    Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. More...
  • Energy Modeling of Wireless Sensor Nodes Based on Petri Nets

    Publication Year: 2010, Page(s):101 - 110
    Cited by:  Papers (3)

    Energy minimization is of great importance in wireless sensor networks in extending the battery lifetime. Accurately understanding the energy consumption characteristics of each sensor node is a critical step for the design of energy saving strategies. This paper develops a detailed probabilistic model based on Petri nets to evaluate the energy consumption of a wireless sensor node. The model fact...
  • Optimal Task Reallocation in Heterogeneous Distributed Computing Systems with Age-Dependent Delay Statistics

    Publication Year: 2010, Page(s):111 - 120
    Cited by:  Papers (1)

    This paper presents a general framework for optimal task reallocation in heterogeneous distributed-computing systems and offers a rigorous analytical model for the stochastic execution time of a workload. The model takes into account the heterogeneity and stochastic nature of the tasks' service and transfer times, servers' failure times, as well as an arbitrary task-reallocation policy. The stocha...
  • Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories

    Publication Year: 2010, Page(s):121 - 130
    Cited by:  Papers (4)

    This paper presents a new design and an implementation of the runtime system of MapReduce for heterogeneous multicore processors with explicitly managed local memories. We advance the state of the art in runtime support for MapReduce using five instruments: (1) A new multi-threaded, event-driven controller for task instantiation, task scheduling, synchronization, and bulk-synchronous execution of ...
  • System-Level, Unified In-band and Out-of-band Dynamic Thermal Control

    Publication Year: 2010, Page(s):131 - 140
    Cited by:  Papers (1)

    High-density computer racks have become increasingly commonplace in supercomputing centers and data centers. With tight integration of high-powered computing components in the racks, hot spots or pockets of elevated temperatures on the chips and system can be easily formed when room air circulation is not effective. Hot spots reduce the reliability of high-density systems and increase the chances of th...
  • Microwiper: Efficient Memory Propagation in Live Migration of Virtual Machines

    Publication Year: 2010, Page(s):141 - 149
    Cited by:  Papers (4)

    Live migration of virtual machines relocates a running VM across physical hosts with unnoticeable service downtime. However, propagating changing VM memory at low cost, especially for write-intensive applications or at relatively low network bandwidth, is still uncovered. This paper presents Microwiper, an improvement of memory propagation in live migration. Our idea is twofold. We propose ordered p...
  • Reliability and Performance Optimization of Pipelined Real-Time Systems

    Publication Year: 2010, Page(s):150 - 159

    We consider pipelined real-time systems, commonly found in assembly lines, consisting of a chain of tasks executing on a distributed platform. Their processing is pipelined: each processor executes only one interval of consecutive tasks. We are therefore interested in minimizing both the input-output latency and the period. For dependability reasons, we are also interested in maximizing the reliab...