Graph algorithms are becoming increasingly important for biology, transportation, business intelligence, and a wide range of commercial workloads. Most graph algorithms stress various architectural aspects of conventional machines to the limit. Their memory access patterns are irregular, with little spatial locality and data reuse. The amount of computation per loaded byte is very small and typically involves bit manipulation, and pointer chasing is often the norm. Likewise, the generated network traffic comprises small packets sent to random destinations at a very high messaging rate. With our recent winning Graph 500 submissions in November 2010, June 2011, and November 2011, we have demonstrated the versatility of the IBM Blue Gene® family of supercomputers and the possibility of using them to parallelize demanding data-intensive applications. In this paper, we describe the algorithmic techniques that we used to map the Graph 500 breadth-first search (BFS) exploration onto the IBM Blue Gene®/Q, achieving a performance of 254 billion traversed edges per second.
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.