Performance Analysis of Systems and Software, 2004 IEEE International Symposium on - ISPASS

Date 2004

  • Deconstructing commit

    Publication Year: 2004 , Page(s): 68 - 77
    Cited by:  Papers (8)

    Many modern processors execute instructions out of their original program order to exploit instruction-level parallelism and achieve higher performance. However, even though instructions can execute in an arbitrary order, they must eventually commit, or retire from execution, in program order. This constraint provides a safety mechanism to ensure that mis-speculated instructions are not inadvertent...
  • The BlueGene/L pseudo cycle-accurate simulator

    Publication Year: 2004 , Page(s): 36 - 44
    Cited by:  Papers (2)

    The design and development of a new computer system is a lengthy process, with a considerable amount of time elapsed between the beginning of development and first hardware availability. Hence, fast and reasonably accurate simulation of processor architecture has become critical as an enabling mechanism for software engineers to develop and tune system software and applications. In this paper, we ...
  • Spectral analysis for characterizing program power and performance

    Publication Year: 2004 , Page(s): 151 - 160
    Cited by:  Papers (4)  |  Patents (1)

    Performance and power analysis in modern processors requires managing a large amount of complex information across many time-scales. For example, thermal control issues are a power subproblem with relevant time constants of millions of cycles or more, while the so-called dI/dT problem is also a power subproblem but occurs because of current variability on a much finer granularity: tens to hundre...
  • Compiler-directed physical address generation for reducing dTLB power

    Publication Year: 2004 , Page(s): 161 - 168
    Cited by:  Papers (2)  |  Patents (1)

    Address translation using the Translation Lookaside Buffer (TLB) consumes as much as 16% of the chip power on some processors because of its high associativity and access frequency. While prior work has looked into optimizing this structure at the circuit and architectural levels, this paper takes a different approach of optimizing its power by reducing the number of data TLB (dTLB) lookups for da...
  • Performance evaluation of exclusive cache hierarchies

    Publication Year: 2004 , Page(s): 89 - 96
    Cited by:  Papers (15)  |  Patents (5)

    Memory hierarchy performance, specifically cache memory capacity, is a constraining factor in the performance of modern computers. This paper presents the results of two-level cache memory simulations and examines the impact of exclusive caching on system performance. Exclusive caching enables higher capacity with the same cache area by eliminating redundant copies. The experiments presented compa...
  • Architectures and compilers for multimedia

    Publication Year: 2004

    Summary form only given. The article covers architectures and compilers for multimedia systems. Multimedia applications impose real-time constraints on continuous media; they also include a surprisingly wide variety of algorithms. Many multimedia systems also operate under power/energy constraints. As such, multimedia computing systems are an important area of interest for ISPASS. This tutorial ta...
  • A co-phase matrix to guide simultaneous multithreading simulation

    Publication Year: 2004 , Page(s): 45 - 56
    Cited by:  Papers (32)

    Several commercial processors have architectures that include support for simultaneous multithreading (SMT), yet there is still not a validated methodology for estimating the performance of an SMT machine that does not rely on full program simulation. To create an efficient sampling approach for SMT we must determine how far to fast-forward each individual thread between samples. The fast-forwardi...
  • Eccentric and fragile benchmarks

    Publication Year: 2004 , Page(s): 2 - 11
    Cited by:  Papers (7)

    Benchmarks are essential for computer architecture research and performance evaluation. Constructing a good benchmark suite is, however, non-trivial: it must be representative, show different types of behavior and the benchmarks should not be easily tweaked. This paper uses principal components analysis, a statistical data analysis technique, to detect differences in behavior between benchmarks. T...
  • Dynamic pretenuring schemes for generational garbage collection

    Publication Year: 2004 , Page(s): 133 - 140

    Previous research has shown that pretenuring can potentially reduce the copying cost by allocating long-lived objects directly in the mature memory regions. To date, researchers have often employed profiling and static analysis to accurately select the objects that should be pretenured. However, little research effort has been spent on dynamic approaches for pretenuring objects. In this paper,...
  • Effectiveness of simple memory models for performance prediction

    Publication Year: 2004 , Page(s): 98 - 105
    Cited by:  Papers (1)

    Many situations call for an estimation of the execution time of applications, e.g., during design or evaluation of computer systems. In this paper we focus on large applications where the execution times heavily depend on the performance of the memory system. Since such applications are computationally expensive, direct simulation is not an option and an analytical model is called for. This paper ...
  • Cache implications of aggressively pipelined high performance microprocessors

    Publication Year: 2004 , Page(s): 123 - 132

    One of the major design decisions when developing a new microprocessor is determining the target pipeline depth and clock rate since both factors interact closely with one another. The optimal pipeline depth of a processor has been studied before, but the impact of the memory system on pipeline performance has received less attention. This study analyzes the effect of different level-1 cache desig...
  • Selective profiling of Java applications using dynamic bytecode instrumentation

    Publication Year: 2004 , Page(s): 141 - 150
    Cited by:  Papers (1)  |  Patents (3)

    Instrumentation-based profiling provides a number of benefits, but can also cause high performance overhead. The negative impact of this overhead could be mitigated considerably if only a small part of the target application (e.g. one that has previously been identified as a bottleneck) is instrumented, possibly for a short time only, while the rest of the application code runs at full speed. In t...
  • Characterization of the data access behavior for TPC-C traces

    Publication Year: 2004 , Page(s): 115 - 122

    In this paper, we look into the characteristics of the reference stream of TPC-C workloads from the buffer pool point of view. We analyze a trace coming from DB2 UDB version 8.1 fix pack 4 and compare it to a trace from DB2 UDB version 8.1 GA. We perform three types of analysis. A static analysis of the number of reads and writes for index and data pages. We conclude that index pages receive less ...
  • StatCache: a probabilistic approach to efficient and accurate data locality analysis

    Publication Year: 2004 , Page(s): 20 - 27
    Cited by:  Papers (20)  |  Patents (3)

    The widening memory gap reduces performance of applications with poor data locality. Therefore, there is a need for methods to analyze data locality and help application optimization. In this paper we present StatCache, a novel sampling-based method for performing data-locality analysis on realistic workloads. StatCache is based on a probabilistic model of the cache, rather than a functional cache...
  • Structures for phase classification

    Publication Year: 2004 , Page(s): 57 - 67
    Cited by:  Papers (38)

    Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed algorithms automatically group these similar intervals of execution into phases, where all the intervals in a phase have homogeneous behavior and similar resource requirements. In this paper we examine different program structures for capturing phase behavior. The goal is to compare the size and ...
  • Dynamically reducing pressure on the physical register file through simple register sharing

    Publication Year: 2004 , Page(s): 78 - 87
    Cited by:  Papers (8)

    Using register renaming and physical registers, modern microprocessors eliminate false data dependences from reuse of the instruction set defined registers (logical registers). High performance processors that have longer pipelines and a greater capacity to exploit instruction-level parallelism have more instructions in-flight and require more physical registers. Simultaneous multithreading archit...
  • Using cache mapping to improve memory performance of handheld devices

    Publication Year: 2004 , Page(s): 106 - 114
    Cited by:  Papers (1)

    Processors such as the Intel StrongARM SA-1110 and the Intel XScale provide flexible control over the cache management to achieve better cache utilization. Programs can specify the cache mapping policy for each virtual page, i.e. mapping it to the main cache, the mini-cache, or neither. For the latter case, the page is marked as non-cacheable. In this paper, we use memory profiling to guide such p...
  • Communication breakdown: analyzing CPU usage in commercial Web workloads

    Publication Year: 2004 , Page(s): 12 - 19
    Cited by:  Papers (1)

    There is increasing concern among developers that future Web servers running commercial workloads may be limited by network processing overhead in the CPU as 10Gb Ethernet becomes prevalent. We analyze CPU usage of real hardware running popular commercial workloads, with an emphasis on identifying networking overhead. Contrary to much popular belief, our experiments show that network processing is...
  • Sockets Direct Protocol over InfiniBand in clusters: is it beneficial?

    Publication Year: 2004 , Page(s): 28 - 35
    Cited by:  Papers (16)

    The Sockets Direct Protocol (SDP) has been proposed recently in order to enable sockets-based applications to take advantage of the enhanced features provided by the InfiniBand architecture. In this paper, we study the benefits and limitations of an implementation of SDP. We first analyze the performance of SDP based on a detailed suite of micro-benchmarks. Next, we evaluate it on two different real a...
  • Efficient architectural design of high performance microprocessors

    Publication Year: 2004

    Summary form only given. Designing a high performance microprocessor is extremely time-consuming, taking at least several years. An important part of this design effort is architectural simulation which defines the microarchitecture or the organization of the microprocessor. The reason why these simulations are so time-consuming is fourfold: (i) the architectural design space is huge; (ii) the numb...
  • 2004 IEEE International Symposium on Performance Analysis of Systems and Software (IEEE Cat. No.04EX818)

    Publication Year: 2004
    Freely Available from IEEE
  • Copyright

    Publication Year: 2004 , Page(s): ii
    Freely Available from IEEE
  • General Chairs' message

    Publication Year: 2004 , Page(s): iii
    Freely Available from IEEE
  • Program Chair's message

    Publication Year: 2004 , Page(s): iv
    Freely Available from IEEE
  • ISPASS Reviewers

    Publication Year: 2004 , Page(s): vi
    Freely Available from IEEE