The gyrokinetic toroidal five-dimensional Eulerian code GT5D [Y. Idomura et al., Comput. Phys. Commun. 179, 391 (2008)] is ported to five advanced massively parallel platforms, and comprehensive benchmark tests are performed. The sustained performance of the GT5D kernel and its dependence on memory bandwidth are discussed. By using a novel multi-layer hybrid parallelization model, the size of MPI communicators can be kept below ~100 up to ~107 cores, and scalability is improved on multi-core platforms. In strong scaling tests, good scalability is confirmed up to several thousand cores on every platform, and a maximum sustained performance of ~19.4 Tflops (a peak ratio of ~10.1%) is achieved using 16384 cores of BX900.
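The headline figures can be cross-checked with simple arithmetic. The sketch below is only a back-of-envelope calculation: it takes the three numbers quoted above (sustained Tflops, peak ratio, core count) and derives the aggregate and per-core peak they imply. The function names are hypothetical, and the derived values are inferences from the rounded figures, not numbers reported by the authors.

```python
# Back-of-envelope check of the quoted figures (all inputs are approximate).
SUSTAINED_TFLOPS = 19.4   # quoted maximum sustained performance
PEAK_RATIO = 0.101        # quoted peak ratio (~10.1%)
CORES = 16384             # quoted core count on BX900


def implied_peak_tflops(sustained: float, ratio: float) -> float:
    """Aggregate peak implied by sustained performance and peak ratio."""
    return sustained / ratio


def per_core_gflops(tflops: float, n_cores: int) -> float:
    """Convert an aggregate Tflops figure to Gflops per core."""
    return tflops * 1000.0 / n_cores


if __name__ == "__main__":
    peak = implied_peak_tflops(SUSTAINED_TFLOPS, PEAK_RATIO)
    print(f"implied aggregate peak : {peak:.0f} Tflops")
    print(f"implied peak per core  : {per_core_gflops(peak, CORES):.1f} Gflops")
    print(f"sustained per core     : {per_core_gflops(SUSTAINED_TFLOPS, CORES):.2f} Gflops")
```

Because the inputs are rounded, the derived per-core values are indicative only.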
Published in:
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Date of Conference: 12-18 Nov. 2011