Abstract:
It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, their high disk I/O overhead can significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model in which each worker processes one partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve communication performance. GraphH can efficiently process big graphs in small clusters or even on a single commodity server. Extensive evaluations show that GraphH can be up to 7.8x faster than popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs.
Date of Conference: 05-08 September 2017
Date Added to IEEE Xplore: 25 September 2017
ISBN Information:
Electronic ISSN: 2168-9253
- IEEE Keywords
- Index Terms
- Small Clusters
- Big Graph
- Computational Model
- Distribution Process
- Graph Partitioning
- Input Graph
- Single Server
- Graph Processing
- Edge Caching
- Parallelization
- Memory Processes
- GB Memory
- Mode Of Communication
- PageRank
- Limited Memory
- Memory State
- Updated Values
- Dense Array
- Graph Size
- Memory Data
- Source Vertex
- Cache Hit
- Suitable Mode
- Density Modulation
- Edge List
- Single Vertex
- Tile Size
- Unweighted Graph
- Multiple Servers
- Single Node