Balancing the workload in parallel applications is a difficult task, even in conventional cases. Many computing cycles are wasted when the load is not evenly balanced across processing nodes. A global load balance analysis may report that an application is well balanced when it in fact contains hidden inefficiencies. In this paper, we consider the load balance of parallel applications that present unique challenges for the analysis process. We use trace analysis and simulation to demonstrate the existence of performance issues that would otherwise go undiscovered. We also demonstrate that by collecting dynamic phase profiles, we can approximate the results of trace analysis and simulation, and represent the performance behavior of complex parallel applications more accurately than with flat or callpath profiles alone.
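As a minimal illustration of how a global view can hide imbalance, the sketch below compares a whole-run imbalance figure with per-phase figures. Everything in it is an assumption made for illustration: the timings are invented, the metric (maximum over mean, minus one) is one common definition of imbalance rather than one taken from this work, and the ranks are assumed to synchronize at the end of each phase.

```python
# Hypothetical per-phase compute times in seconds, one inner list per phase,
# one entry per rank. These numbers are made up for illustration only.
PHASE_TIMES = [
    [3.0, 1.0, 3.0, 1.0],  # phase 0: ranks 0 and 2 are overloaded
    [1.0, 3.0, 1.0, 3.0],  # phase 1: ranks 1 and 3 are overloaded
]


def imbalance(times):
    """Return max/mean - 1 for a list of per-rank times (0.0 means balanced)."""
    mean = sum(times) / len(times)
    return max(times) / mean - 1.0


def global_imbalance(phases):
    """Imbalance of per-rank totals, as a flat whole-run profile would see it."""
    totals = [sum(rank_times) for rank_times in zip(*phases)]
    return imbalance(totals)


def phase_imbalances(phases):
    """Imbalance computed separately for each phase."""
    return [imbalance(times) for times in phases]


def synchronized_time(phases):
    """Wall time if all ranks wait for the slowest rank after every phase."""
    return sum(max(times) for times in phases)


def ideal_time(phases):
    """Wall time if the work in every phase were spread evenly over the ranks."""
    return sum(sum(times) / len(times) for times in phases)


if __name__ == "__main__":
    print("global imbalance:   ", global_imbalance(PHASE_TIMES))
    print("per-phase imbalance:", phase_imbalances(PHASE_TIMES))
    print("synchronized time:  ", synchronized_time(PHASE_TIMES))
    print("ideal time:         ", ideal_time(PHASE_TIMES))
```

With these invented timings, every rank does the same total amount of work, so the whole-run figure shows no imbalance, yet each phase is imbalanced and the synchronized run takes longer than the evenly spread one. Under the stated assumptions, this is the kind of effect that a per-phase view exposes and an aggregate view does not.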