Computational fluid dynamics (CFD) applications place an ever-growing demand on high performance computing (HPC) infrastructure. Many CFD simulations have benefited from recently adopted GPU clusters. However, few of them exploit both the CPU and the GPU computational resources of heterogeneous HPC platforms. In this paper, we demonstrate an approach that lets large-scale CFD applications benefit from GPU clusters. Taking the NAS Parallel Benchmarks (NPB) as an example, we implement several CFD kernels with our hybrid programming pattern MOC and run them on the TianHe-1A supercomputer. Experimental results show that: (1) CFD applications can achieve significant performance improvement on GPU clusters, even memory-bound ones such as CG; (2) embarrassingly parallel applications scale well with the number of compute nodes; and (3) overlapping data transfer over the PCI-E bus with kernel execution can greatly increase the performance and scalability of CFD applications.