Skip to Main Content
Finding the all-pairs shortest-paths on a large graph is a fundamental problem in many practical applications such as bioinformatics, internet node traffic and network routing. In this paper, we present the designs of two efficient parallel algorithms for many-core GPUs using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using the global scope of the GPU's global memory, coalescing the global memory reads and writes, and avoiding on-chip shared memory bank conflicts, we are able to achieve a large performance benefit with a speed-up of 2,500x on a desktop computer in comparison with a single core program. Our algorithms are scalable, which can handle graphs with size larger than the memory available on the GPUs and when multiple GPUs are added into the system.
Date of Conference: 12-14 April 2010