Understanding application-level power/performance tradeoffs on current computer systems requires runtime monitoring capabilities. In particular, very fine-grained monitoring is needed to gain detailed insight into power and performance behavior. Fine-grained application-level characterization not only helps fine-tune application code but also increases the chances of detecting optimization opportunities for next-generation systems. In this paper, we describe a new experimental technique for automatic fine-grained power and performance characterization of applications on the IBM Blue Gene®/Q platform. We use it to perform high-resolution measurements and attendant characterizations of key benchmarks for high-performance computing systems: the Tier-1 Sequoia suite and Linpack. The characterization shows that these benchmarks exhibit long time periods in which the memory and network resources are underutilized. We quantify these periods to predict the performance gains of shifting power from the underutilized resources (the network and the memory) to the processor. We also explore the potential improvements in energy efficiency if power-saving and power-shifting mechanisms are implemented in future-generation systems.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.