IEEE International Symposium on Performance Analysis of Systems and Software, 2009 (ISPASS 2009)

Date: 26-28 April 2009

Displaying Results 1 - 25 of 34
  • IEEE International symposium on performance analysis of systems and software

    Publication Year: 2009 , Page(s): i
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2009 , Page(s): ii
    Freely Available from IEEE
  • Message from the Program Chair

    Publication Year: 2009 , Page(s): iii - iv
    Freely Available from IEEE
  • ISPASS 2009 people

    Publication Year: 2009 , Page(s): v - vi
    Freely Available from IEEE
  • ISPASS 2009 reviewers

    Publication Year: 2009 , Page(s): vii
    Freely Available from IEEE
  • Accelerating architecture research

    Publication Year: 2009 , Page(s): viii

    With the recent demonstration of 32nm processors we have seen Moore's law providing another large increase in the number of transistors. While more transistors provide architects with a great opportunity, I believe we have been observing increasing challenges in finding the most effective uses for these transistors. Design team size, mask costs and fabrication costs are all increasing, thus there...
  • Performance analysis in the real world of on line services

    Publication Year: 2009 , Page(s): ix

    Performance analysis has always been an integral part of a computer architect's agenda. However, the term performance is used largely to measure “speed”. The dictionary defines performance more broadly as “the manner in which or the efficiency with which something reacts or fulfills its intended purpose”. In today's internet-based online computing environment, performa...
  • Table of contents

    Publication Year: 2009
    Freely Available from IEEE
  • [Blank page]

    Publication Year: 2009 , Page(s): xii
    Freely Available from IEEE
  • Differentiating the roles of IR measurement and simulation for power and temperature-aware design

    Publication Year: 2009 , Page(s): 1 - 10
    Cited by:  Papers (14)

    In temperature-aware design, the presence or absence of a heatsink fundamentally changes the thermal behavior with important design implications. In recent years, chip-level infrared (IR) thermal imaging has been gaining popularity in studying thermal phenomena and thermal management, as well as reverse-engineering chip power consumption. Unfortunately, IR thermal imaging needs a peculiar cooling ...
  • User- and process-driven dynamic voltage and frequency scaling

    Publication Year: 2009 , Page(s): 11 - 22
    Cited by:  Papers (8)  |  Patents (5)

    We describe and evaluate two new, independently-applicable power reduction techniques for power management on processors that support dynamic voltage and frequency scaling (DVFS): user-driven frequency scaling (UDFS) and process-driven voltage scaling (PDVS). In PDVS, a CPU-customized profile is derived offline that encodes the minimum voltage needed to achieve stability at each combination of CPU...
  • Accuracy of performance counter measurements

    Publication Year: 2009 , Page(s): 23 - 32
    Cited by:  Papers (10)

    Many experimental performance evaluations depend on accurate measurements of the cost of executing a piece of code. Often these measurements are conducted using infrastructures to access hardware performance counters. Most modern processors provide such counters to count micro-architectural events such as retired instructions or clock cycles. These counters can be difficult to configure, may not b...
  • GARNET: A detailed on-chip network model inside a full-system simulator

    Publication Year: 2009 , Page(s): 33 - 42
    Cited by:  Papers (98)

    Until very recently, microprocessor designs were computation-centric. On-chip communication was frequently ignored. This was because of fast, single-cycle on-chip communication. The interconnect power was also insignificant compared to the transistor power. With uniprocessor designs providing diminishing returns and the advent of chip multiprocessors (CMPs) in mainstream systems, the on-chip netwo...
  • Cetra: A trace and analysis framework for the evaluation of Cell BE systems

    Publication Year: 2009 , Page(s): 43 - 52
    Cited by:  Papers (1)

    The Cell Broadband Engine Architecture (CBEA) is a heterogeneous multiprocessor architecture developed by Sony, Toshiba and IBM. The major implementation of this architecture is the Cell Broadband Engine (Cell for short), a processor that contains one generic PowerPC core and eight accelerators. The Cell is targeted at high-performance computing systems and consumer-level devices that have high c...
  • Zesto: A cycle-level simulator for highly detailed microarchitecture exploration

    Publication Year: 2009 , Page(s): 53 - 64
    Cited by:  Papers (38)

    For academic computer architecture research, a large number of publicly available simulators make use of relatively simple abstractions for the microarchitecture of the processor pipeline. For some types of studies, such as those for multi-core cache coherence designs, a simple pipeline model may suffice. For detailed microarchitecture research, such as those that are sensitive to the exact behavi...
  • Lonestar: A suite of parallel irregular programs

    Publication Year: 2009 , Page(s): 65 - 76
    Cited by:  Papers (7)

    Until recently, parallel programming has largely focused on the exploitation of data-parallelism in dense matrix programs. However, many important application domains, including meshing, clustering, simulation, and machine learning, have very different algorithmic foundations: they require building, computing with, and modifying large sparse graphs. In the parallel programming literature, these ty...
  • Exploring speculative parallelism in SPEC2006

    Publication Year: 2009 , Page(s): 77 - 88
    Cited by:  Papers (5)

    The computer industry has adopted multi-threaded and multi-core architectures since clock rate increases stalled in the early 2000s. It was hoped that the continuous improvement of single-program performance could be achieved through these architectures. However, traditional parallelizing compilers often fail to effectively parallelize general-purpose applications which typically have complex control...
  • Machine learning based online performance prediction for runtime parallelization and task scheduling

    Publication Year: 2009 , Page(s): 89 - 100
    Cited by:  Papers (1)

    With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that ...
  • WARP: Enabling fast CPU scheduler development and evaluation

    Publication Year: 2009 , Page(s): 101 - 112
    Cited by:  Patents (1)

    Developing CPU scheduling algorithms and understanding their impact in practice can be difficult and time-consuming due to the need to modify and test operating system kernel code and measure the resulting performance on a consistent workload of real applications. To address this problem, we have developed WARP, a trace-driven virtualized scheduler execution environment that can dramatically simpl...
  • CMPSched$im: Evaluating OS/CMP interaction on shared cache management

    Publication Year: 2009 , Page(s): 113 - 122
    Cited by:  Papers (9)

    CMPs have now become mainstream and are growing in complexity with more cores, several shared resources (cache, memory, etc.) and the potential for additional heterogeneous elements. In order to manage these resources, it is becoming critical to optimize the interaction between the execution environment (operating systems, virtual machine monitors, etc.) and the CMP platform. Performance analysis of...
  • Understanding the cost of thread migration for multi-threaded Java applications running on a multicore platform

    Publication Year: 2009 , Page(s): 123 - 132
    Cited by:  Papers (4)

    Multicore systems increase the complexity of performance analysis by introducing a new source of additional costs: thread migration between cores. This paper explores the cost of thread migration for Java applications. We first present a detailed analysis of the sources of migration overhead and show that they result from a combination of several factors including application behavior (working set...
  • The data-centricity of Web 2.0 workloads and its impact on server performance

    Publication Year: 2009 , Page(s): 133 - 142
    Cited by:  Papers (3)

    Advances in network performance and browser technologies, coupled with the ubiquity of internet access and proliferation of users, have led to the emergence of a new class of Web applications, called Web 2.0. Web 2.0 technologies enable easy collaboration and sharing by allowing users to contribute, modify, and aggregate content using applications like Wikis, Blogs, Social Networking communities,...
  • Characterizing and optimizing the memory footprint of de novo short read DNA sequence assembly

    Publication Year: 2009 , Page(s): 143 - 152
    Cited by:  Papers (1)

    In this work, we analyze the memory-intensive bioinformatics problem of “de novo” DNA sequence assembly, which is the process of assembling short DNA sequences obtained by experiment into larger contiguous sequences. In particular, we analyze the performance scaling challenges inherent to de Bruijn graph-based assembly, which is particularly well suited for the data produced by “next g...
  • An analytic model of optimistic Software Transactional Memory

    Publication Year: 2009 , Page(s): 153 - 162

    An analytic model is proposed to assess the performance of optimistic software transactional memory (STM) systems with in-place memory updates for write operations. Based on an absorbing discrete-time Markov chain, closed-form analytic expressions are developed, which are quickly solved iteratively to determine key parameters of the STM system. The model covers complex implementation details such ...
  • Analyzing CUDA workloads using a detailed GPU simulator

    Publication Year: 2009 , Page(s): 163 - 174
    Cited by:  Papers (154)  |  Patents (1)

    Modern graphics processing units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight into designing tomorrow's manycore processors, whether those are GPUs or otherwise. The combination of multiple, multithreaded, SIMD cores makes studying these GPUs useful in understanding tradeoffs among memory, data, and thread level parallelism. While mo...