Skip to Main Content
Due to the memory intensive workload and the erratic access pattern, irregular graph algorithms are notoriously hard to implement and optimize for high performance on distributed-memory systems. Although the PGAS paradigm proposed recently improves ease of programming, no high performance PGAS implementation of large-scale graph analysis is known. We present the first fast PGAS implementation of graph algorithms for the connected components and minimum spanning tree problems. By improving memory access locality, compared with the naive implementation, our implementation exhibits much better communication efficiency and cache performance on a cluster of SMPs. With additional algorithmic and PGASspecific optimizations, our implementation achieves significant speedups over both the best sequential implementation and the best single-node SMP implementation for large, sparse graphs with more than a billion edges.