Input-Sensitive Profiling


Abstract:

In this article we present a building-block technique and a toolkit towards automatic discovery of workload-dependent performance bottlenecks. From one or more runs of a program, our profiler automatically measures how the performance of individual routines scales as a function of the input size, yielding clues to their growth rate. The output of the profiler is, for each executed routine of the program, a set of tuples that aggregate performance costs by input size. The collected profiles can be used to produce performance plots and derive trend functions by statistical curve fitting techniques. A key feature of our method is the ability to automatically measure the size of the input given to a generic code fragment: to this aim, we propose an effective metric for estimating the input size of a routine and show how to compute it efficiently. We discuss several examples, showing that our approach can reveal asymptotic bottlenecks that other profilers may fail to detect and can provide useful characterizations of the workload and behavior of individual routines in the context of mainstream applications, yielding several code optimizations as well as algorithmic improvements. To prove the feasibility of our techniques, we implemented a Valgrind tool called aprof and performed an extensive experimental evaluation on the SPEC CPU2006 benchmarks. Our experiments show that aprof delivers comparable performance to other prominent Valgrind tools, and can generate informative plots even from single runs on typical workloads for most algorithmically-critical routines.
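To illustrate the kind of post-processing that such profiles enable, the following Python sketch fits a power-law trend to a set of (input size, cost) tuples of the form the abstract describes. This is only a minimal illustration, not the paper's implementation: the sample data is invented, and the choice of least-squares fitting in log-log space is one possible curve fitting technique among those the authors allude to.

# Minimal sketch (not aprof's actual implementation): given tuples that
# aggregate a routine's performance cost by input size, estimate a
# power-law trend cost ~ c * n^k by least squares in log-log space.
import math

def fit_power_law(tuples):
    """tuples: iterable of (input_size, cost) pairs, all values positive."""
    xs = [math.log(n) for n, _ in tuples]
    ys = [math.log(cost) for _, cost in tuples]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least-squares slope and intercept on the log-log points.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope   # cost is approximately c * n**k

# Hypothetical profile of a routine whose cost grows roughly quadratically.
profile = [(10, 105), (100, 9800), (1000, 1.02e6), (10000, 0.98e8)]
c, k = fit_power_law(profile)
print(f"estimated trend: {c:.2f} * n^{k:.2f}")

Running the sketch on the made-up profile above reports an exponent close to 2, which is the kind of clue to a routine's growth rate that the derived trend functions are meant to provide.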
Published in: IEEE Transactions on Software Engineering ( Volume: 40, Issue: 12, 01 December 2014)
Page(s): 1185 - 1205
Date of Publication: 17 July 2014


1 Introduction

Performance profiling plays a crucial role in software development, allowing programmers to test the efficiency of an application and discover possible performance bottlenecks. Traditional profilers associate performance metrics with nodes or paths of the control flow or call graph by collecting runtime information on specific workloads [2], [3], [4], [5]. These approaches provide valuable information for studying the dynamic behavior of a program and for guiding optimizations to the portions of the code that consume the most resources on the considered inputs. However, they may fail to characterize how the performance of a program scales as a function of the input size, which is crucial for the efficiency and reliability of software. Seemingly benign fragments of code may be fast on some testing workloads, passing unnoticed in traditional profilers, and yet suddenly become major performance bottlenecks when deployed on larger inputs.

As an anecdotal example, we report a story [6] related to the COSMOS circuit simulator, originally developed by Randal E. Bryant and his colleagues at CMU [7]. When the project was adopted by a major semiconductor manufacturer, it underwent a major performance tuning phase, including a modification to a function in charge of mapping signal names to electrical nodes, which appeared to be especially time-consuming: by hashing on bounded-length name prefixes rather than on entire names, the simulator became faster on all benchmarks. However, when circuits later grew larger and adopted hierarchical naming schemes, many signal names ended up sharing long common prefixes and thus hashed to the same buckets. As a result, the simulator startup time became intolerable, taking hours for what should have required a few minutes. Identifying the problem, which had been introduced several years earlier in an effort to optimize the program, required several days of analysis. There are many other examples of large software projects where this sort of problem occurred [8].

