The architectures that support modern supercomputing machinery are as diverse today as at any point in the last twenty years. The variety of processor core arrangements, threading strategies, and the arrival of heterogeneous computation nodes are driving modern-day solutions to petaflop speeds. The increasing complexity of such systems, as well as of the codes written to take advantage of their new computational capabilities, poses significant challenges for existing techniques that aim to model and analyze the performance of such hardware and software. In this paper we demonstrate the use of post-execution analysis of trace-based profiles to support the construction of simulation-based models. This involves combining the runtime capture of call-graph information with computational timings, which in turn allows representative models of code behavior to be extracted. The main advantage of this technique is that it largely automates performance model development, a burden associated with existing approaches. We demonstrate the capabilities of our approach using both the NAS Parallel Benchmark suite and a real-world supercomputing benchmark developed by the United Kingdom Atomic Weapons Establishment. The resulting models, each developed in less than two hours, exhibit a good degree of predictive accuracy. We also show how one of these models can be used to explore the performance of the code on over 16,000 cores, demonstrating the scalability of our solution.