Resource sharing and the implementation of the software stack for emerging multicore processors introduce performance and scaling challenges for large-scale scientific applications, particularly on systems with thousands of processing elements. Traditional performance optimization, tuning, and modeling techniques that rely on a uniform representation of computation and communication requirements are only partially useful because of the complexity of the applications and of the underlying system and software architectures. In this paper, we propose a workload modeling methodology that allows application developers to capture and represent the hierarchical decomposition and distribution of their applications, thereby allowing them to explore and identify an optimal mapping of a workload onto a target system. We demonstrate the proposed methodology on a Teraflops-scale fusion application developed using the message-passing (MPI) programming paradigm. Using our analysis and projection results, we obtain insight into the performance characteristics of the application on a quad-core system and identify an optimal mapping on a Teraflops-scale platform.
Published in: 20th International Symposium on Computer Architecture and High Performance Computing, 2008 (SBAC-PAD '08)
Date of Conference: Oct. 29, 2008 to Nov. 1, 2008