Profile-guided optimization has great potential to reduce costs in datacenters. Hardware performance monitoring units enable profiling with negligible overhead, and they have proven effective at helping programmers find code regions to optimize by continuously monitoring datacenter applications on live traffic. However, these hardware features are inflexible and often buggy, which limits the types of data that can be gathered. Instrumentation-based profiling can complement or replace hardware functionality by providing more flexible and targeted information gathering. Unfortunately, the overhead of existing instrumentation mechanisms prevents their use in production runs. To be usable in datacenters, a profiling mechanism must impose overheads of less than a few percent, in terms of both throughput and latency, while still generating meaningful profile data. This paper presents instant profiling, an instrumentation sampling technique based on dynamic binary translation. Instead of instrumenting the entire execution, instant profiling periodically interleaves native execution and instrumented execution according to configurable profiling duration and frequency parameters. It further reduces the latency degradation of initial profiling phases by pre-populating a software code cache. We evaluate the performance and effectiveness of this new profiling technique on the SPEC CINT2006 benchmark suite and two datacenter application benchmarks. We show that it is well suited for deployment in datacenters, incurring less than 6% slowdown and 3% computational overhead on average.
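The idea of interleaving native and instrumented execution according to a duration and a frequency can be illustrated with a small scheduling sketch. This is not the paper's implementation: the class `SamplingSchedule`, its fields `period_ms` (taken here as the reciprocal of the profiling frequency) and `duration_ms` (the profiling duration), and the first-order slowdown estimate are all hypothetical. The estimate assumes the duty cycle equals the fraction of the program's work that runs instrumented, and it ignores mode-switching and code-translation costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingSchedule:
    """Hypothetical parameters for periodic instrumentation sampling."""
    period_ms: float    # time between starts of consecutive profiling windows (1 / frequency)
    duration_ms: float  # length of each instrumented window

    def __post_init__(self) -> None:
        # A window cannot be longer than the period that contains it.
        if self.period_ms <= 0 or not (0 <= self.duration_ms <= self.period_ms):
            raise ValueError("require period_ms > 0 and 0 <= duration_ms <= period_ms")

    def is_instrumented(self, t_ms: float) -> bool:
        """True if time t_ms falls inside an instrumented window; otherwise run natively."""
        return (t_ms % self.period_ms) < self.duration_ms

    def duty_cycle(self) -> float:
        """Fraction of time spent in instrumented mode."""
        return self.duration_ms / self.period_ms

    def estimated_slowdown(self, instr_slowdown: float) -> float:
        """First-order estimate, assuming the duty cycle is the fraction of work
        run instrumented and ignoring switching/translation costs:
        total time = (1 - d) * 1 + d * instr_slowdown = 1 + d * (instr_slowdown - 1)."""
        return 1.0 + self.duty_cycle() * (instr_slowdown - 1.0)


if __name__ == "__main__":
    # Hypothetical setting: a 10 ms profiling window once per second.
    schedule = SamplingSchedule(period_ms=1000.0, duration_ms=10.0)
    print(schedule.is_instrumented(5.0))    # True
    print(schedule.is_instrumented(500.0))  # False
    print(schedule.duty_cycle())            # 0.01
    # With a hypothetical 3x slowdown for fully instrumented code: about 1.02.
    print(schedule.estimated_slowdown(3.0))
```

The sketch makes the design trade-off explicit: shortening `duration_ms` or lengthening `period_ms` lowers the duty cycle and therefore the expected overhead, at the cost of collecting fewer profile samples per unit of time.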