I. Introduction
Graph processing extracts hidden information from graphs by traversing the connections between objects. It is used in numerous practical applications, including genomics [1], social networks [2], and machine learning [3]. The irregular structure of graphs, however, poses great challenges to the performance of graph processing. These challenges are mainly reflected in two aspects: (1) the irregular connections in graphs lead to irregular memory accesses, and (2) graph processing typically exhibits a low computation-to-communication ratio. That is, during graph processing, the processor may access an arbitrary vertex yet perform only a small amount of computation on it. These inherent characteristics make graph processing inefficient on traditional computing architectures such as central processing units (CPUs) and graphics processing units (GPUs). Dedicated graph processing accelerators have therefore been proposed to address this inefficiency (e.g., [4], [5], [6], [7], [8], [9]).
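The two characteristics above can be made concrete with a small sketch. The code below is an illustrative example only, not a method taken from the cited works: it assumes a compressed sparse row (CSR) layout and a simple push-style update, and the names `to_csr` and `push_iteration` as well as the five-vertex edge list are hypothetical. It records the order in which destination vertices are touched and counts the arithmetic performed per edge.

```python
def to_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx) from directed (src, dst) edges."""
    row_ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]          # prefix sum of out-degrees
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1].copy()              # next free slot per source vertex
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx


def push_iteration(row_ptr, col_idx, values):
    """One push-style pass: each vertex adds its value to every out-neighbor.

    Returns the accumulated values, the order in which destination
    vertices were touched, and the number of additions performed.
    """
    acc = [0.0] * len(values)
    touched = []
    adds = 0
    for v in range(len(values)):
        for e in range(row_ptr[v], row_ptr[v + 1]):
            dst = col_idx[e]                  # data-dependent index
            acc[dst] += values[v]             # a single addition per edge
            touched.append(dst)
            adds += 1
    return acc, touched, adds


# Hypothetical 5-vertex graph used only for illustration.
edges = [(0, 3), (0, 1), (1, 4), (2, 0), (3, 2), (4, 1), (4, 0)]
row_ptr, col_idx = to_csr(5, edges)
acc, touched, adds = push_iteration(row_ptr, col_idx, [1.0] * 5)
print(touched)  # [3, 1, 4, 0, 2, 1, 0]
print(adds)     # 7
```

In this toy graph the edge array `col_idx` is scanned sequentially, but the destination indices it yields jump across the vertex array, which is the irregular access pattern described above. Each edge triggers several memory operations (reading the edge, reading the source value, updating the destination) for only one addition, which illustrates the low computation-to-communication ratio.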