
GCN Inference Acceleration using High-Level Synthesis



Abstract:

GCN (Graph Convolutional Network) has become a promising solution for many applications, such as recommendation systems and social data mining. Many of these applications require low-latency GCN inference. In this paper, we provide a case study of GCN inference acceleration on FPGA. We explore the high-level synthesis (HLS) programming model to achieve low-latency inference. First, we propose a partition-centric mapping strategy to map the execution tasks of GCN onto the FPGA to exploit data reuse, which reduces external memory access overhead. Second, we provide an HLS-based kernel design with improved memory performance that achieves massive data parallelism. Third, we perform design space exploration to facilitate a feasible pre-placement, which avoids potential Place-and-Route (PnR) failures. We evaluate our design on a state-of-the-art FPGA platform using three commonly used datasets: Reddit, Yelp and Amazon-2M. We compare our design with two state-of-the-art libraries, PyTorch-Geometric (PyG) and Deep Graph Library (DGL), running on a high-end CPU and GPU, evaluating their latency and energy efficiency for full-batch GCN inference on a two-layer Vanilla-GCN model. Compared with the PyG CPU version, our design reduces latency by 59.95× and is 96.22× more energy efficient on average. Compared with the DGL CPU version, our design achieves a 2.9×–6.4× speedup and is 5.87× more energy efficient. Compared with the DGL GPU version, although the latency of our design is 1.67×–2.5× that of DGL on GPU, our design is 1.8× more energy efficient.
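For reference, the two-layer Vanilla-GCN used in the evaluation follows the standard formulation of Kipf and Welling, and full-batch inference evaluates it over the entire graph at once:

    Z = A_hat * ReLU(A_hat * X * W0) * W1,  with  A_hat = D~^(-1/2) (A + I) D~^(-1/2)

where A is the graph adjacency matrix, D~ is the degree matrix of A + I, X is the input node feature matrix, and W0 and W1 are the weight matrices of the two layers.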
Date of Conference: 20-24 September 2021
Date Added to IEEE Xplore: 01 December 2021
Conference Location: Waltham, MA, USA

I. Introduction

Graph Convolutional Networks (GCNs) have become a popular solution for many cloud-based applications, such as e-commerce [1] and recommendation systems [2]. Most GCN applications, like recommendation systems, are deployed in the cloud. To achieve real-time performance, GCN acceleration has been studied on application-specific integrated circuits (ASICs) [3] and GPU platforms [4]. FPGAs in the cloud are a promising alternative in terms of performance, energy efficiency and flexibility. Deploying GCNs on cloud-based FPGAs poses several challenges:

(1) Heterogeneity of the GCN workload: A GCN has two major computation kernels [5]: aggregation and transformation. The aggregation kernel performs graph traversal and involves a large number of irregular memory accesses. The transformation kernel, in contrast, performs regular neural network computation, such as a multilayer perceptron (MLP). Thus, a GCN accelerator must both utilize external memory bandwidth efficiently and achieve massive computation parallelism.

(2) Time to market: While GCNs are widely used, their models evolve rapidly [6]–[8]. RTL-based accelerators [3], [9] are hard to adapt to new GCN models and require significant development effort. HLS-based kernel designs can be adapted to evolving GCN models easily, but require careful optimization to achieve high performance.

(3) Architectural constraints: FPGAs contain massive on-chip resources and are therefore well suited to GCN acceleration, which demands high memory bandwidth and computation parallelism. However, state-of-the-art FPGAs usually consist of multiple dies with limited inter-die wire connections, and on-chip resources such as memory ports, block RAMs and DSPs are unevenly distributed across the dies [10]. Thus, placing a large GCN design on a state-of-the-art FPGA frequently causes PnR failures and timing violations.
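To make the workload heterogeneity in challenge (1) concrete, below is a minimal sketch, in plain C++ with Vitis-HLS-style pragmas, of the two kernels. The CSR graph layout, the fixed feature widths and the pragma placement are illustrative assumptions, not the kernel design proposed in this paper.

constexpr int FEAT_LEN = 64;   // hypothetical input feature length
constexpr int OUT_LEN  = 16;   // hypothetical output feature length

// Aggregation: irregular, data-dependent gathers driven by the graph
// structure (CSR neighbor lists). This is the memory-bound kernel.
void aggregate(int num_nodes, const int* row_ptr, const int* col_idx,
               const float* edge_w, const float* feat_in, float* feat_out) {
    for (int v = 0; v < num_nodes; ++v) {
        float acc[FEAT_LEN] = {0.0f};
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
            int u = col_idx[e];                  // irregular neighbor gather
            for (int f = 0; f < FEAT_LEN; ++f) {
#pragma HLS UNROLL
                acc[f] += edge_w[e] * feat_in[u * FEAT_LEN + f];
            }
        }
        for (int f = 0; f < FEAT_LEN; ++f)
            feat_out[v * FEAT_LEN + f] = acc[f];
    }
}

// Transformation: a regular dense matrix multiply (per-node MLP layer).
// This is the compute-bound kernel with abundant data parallelism.
void transform(int num_nodes, const float* feat, const float* weight,
               float* out) {
    for (int v = 0; v < num_nodes; ++v) {
        for (int o = 0; o < OUT_LEN; ++o) {
#pragma HLS PIPELINE II=1
            float acc = 0.0f;
            for (int f = 0; f < FEAT_LEN; ++f)
                acc += feat[v * FEAT_LEN + f] * weight[f * OUT_LEN + o];
            out[v * OUT_LEN + o] = acc > 0.0f ? acc : 0.0f;  // ReLU
        }
    }
}

The contrast is visible in the access patterns: aggregate() issues data-dependent gathers through col_idx, which is why its performance is bound by external memory bandwidth, while transform() is a dense matrix multiply whose loops can be pipelined and unrolled for parallelism.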
