In cutting-edge CPU/GPU hybrid clusters, such as Tianhe-1A, the aggregate CPU computing capability can reach up to 1/3 of the aggregate GPU computing capability. The CPUs and GPUs should therefore jointly carry out the computational work. However, using both hardware components effectively and simultaneously requires great care when developing the parallel implementations. The challenges include (1) finding a balanced division of the workload between the CPU and GPU sides, and (2) hiding various overheads by overlapping computations with CPU-GPU data transfers and/or MPI communications. We study these issues in the context of real-world sedimentary basin simulations. Numerical experiments show that an appropriately devised CPU-GPU hybrid implementation can handle a global mesh resolution of 131,072 × 131,072, and that a double-precision rate of 62 TFlops is achieved using 1024 GPUs and 12,288 CPU cores on Tianhe-1A. Such computing capability will be of great importance for carrying out high-resolution, continental-scale stratigraphic simulations in the future.
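As a rough illustration of challenge (1), the sketch below divides the mesh rows of one subdomain between the CPU and GPU sides in proportion to their relative throughput, so that both sides would finish at about the same time. It is a minimal sketch, not the implementation used in this work: the function name `split_rows`, the 4096-row subdomain, and the idea of splitting by rows are all assumptions made for illustration. Only the 1:3 CPU-to-GPU ratio is taken from the text above, and in practice the rates would have to be measured, since the overheads named in challenge (2) shift the balance.

```python
def split_rows(total_rows, cpu_rate, gpu_rate):
    """Split mesh rows between CPU and GPU in proportion to throughput.

    Hypothetical helper: cpu_rate and gpu_rate are relative throughputs
    (any consistent unit, e.g. rows per second), assumed to be measured.
    Returns (cpu_rows, gpu_rows), which always sum to total_rows.
    """
    if total_rows < 0 or cpu_rate < 0 or gpu_rate <= 0:
        raise ValueError("need total_rows >= 0, cpu_rate >= 0, gpu_rate > 0")
    # The CPU share is its fraction of the combined throughput.
    cpu_rows = round(total_rows * cpu_rate / (cpu_rate + gpu_rate))
    # The GPU takes the remainder, so no row is lost to rounding.
    return cpu_rows, total_rows - cpu_rows


# Assumed example: CPU capability is 1/3 of the GPU's, 4096 rows per subdomain.
cpu_rows, gpu_rows = split_rows(4096, cpu_rate=1.0, gpu_rate=3.0)
print(cpu_rows, gpu_rows)  # 1024 3072
```

With a 1:3 ratio the CPU side receives one quarter of the rows, because its share is 1 / (1 + 3) of the combined throughput rather than 1/3 of the total.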