Skip to Main Content
Finding the all-pairs shortest-paths on a large graph is a fundamental problem in many practical applications such as bioinformatics, internet node traffic and network routing. In this paper, we present the designs of two efficient parallel algorithms for many-core GPUs using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using the global scope of the GPU's global memory, coalescing the global memory reads and writes, and avoiding on-chip shared memory bank conflicts, we are able to achieve a large performance benefit with a speed-up of 2,500x on a desktop computer in comparison with a single core program. Our algorithms are scalable, which can handle graphs with size larger than the memory available on the GPUs and when multiple GPUs are added into the system.