This paper describes a parallel, distributed CUDA implementation of a Lattice Boltzmann Method for multiphase flows with large density ratios. Validation runs studying the terminal velocity of a bubble rising under gravity show good agreement with the expected theoretical values. The code is benchmarked against a typical CPU implementation of the same algorithm on both AMD and Intel platforms: a single GPU is observed to perform up to 10X faster than a quad-core CPU socket, or a 40X speedup with respect to a single core. The code is also shown to scale well when executed on multiple GPUs, which makes the port to CUDA valuable even when compared to parallel CPU implementations.
Published in:
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Date of Conference: 26-30 Sept. 2011