Abstract:
ReRAM-based computing-in-memory (CiM) architectures have been considered a promising solution for high-efficiency neural network accelerators, as they conduct in-situ matrix multiplications and eliminate the movement of neural parameters from off-chip memory to computing units. However, we observe that specific features of graph convolutional network (GCN) tasks pose design challenges for implementing a high-efficiency ReRAM GCN accelerator: the ultra-large input feature data in some GCN tasks incur massive data movements; the extremely sparse adjacency matrix and input feature data involve little valid computation; and the super-large adjacency matrix that exceeds the available ReRAM capacity causes frequent, expensive write operations. To address these challenges, we propose TARe, a task-adaptive CiM architecture that consists of a hybrid in-situ computing mode to support input-feature-in-crossbar computing, a compact mapping scheme for efficient sparse matrix computation, and a write-free mapping that eliminates write activities in computations with the super-large adjacency matrix. Additionally, TARe is equipped with a task-adaptive selection algorithm that generates optimized design schemes for graph neural network (GNN) tasks with various operand sizes and data sparsity. We evaluate TARe on 11 diverse GNN tasks and compare it with different design counterparts; the results show that TARe achieves a 168.06× speedup and a 10.95× reduction in energy consumption on average over the baseline on common GCN workloads.
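To make the workload characteristics concrete, a GCN layer computes X' = sigma(A_hat · X · W), where A_hat is the (typically extremely sparse) normalized adjacency matrix, X the input feature matrix, and W a small dense weight matrix. The following is a minimal illustrative sketch of this reference computation in Python with SciPy, not TARe's ReRAM mapping; the matrix sizes and densities are hypothetical, chosen only to show why dense crossbar mapping of such operands wastes cells and writes.

# Sketch of one GCN layer, X' = relu(A_hat @ X @ W), illustrating the
# operand shapes and sparsity the abstract refers to. Generic reference
# computation only; sizes/densities are hypothetical, not TARe's design.
import numpy as np
import scipy.sparse as sp

num_nodes, in_feats, out_feats = 10000, 1433, 16  # illustrative sizes

# Extremely sparse adjacency matrix: most entries are zero, so most
# crossbar multiply-accumulates on a dense mapping would be invalid work.
A_hat = sp.random(num_nodes, num_nodes, density=1e-4, format="csr")

# Sparse input features X and a small dense weight matrix W.
X = sp.random(num_nodes, in_feats, density=0.01, format="csr")
W = np.random.randn(in_feats, out_feats).astype(np.float32)

# Combination (X @ W) first keeps the intermediate small; aggregation
# (A_hat @ ...) then mixes neighbor features across the graph.
H = np.asarray(A_hat @ (X @ W))
H = np.maximum(H, 0.0)  # ReLU
print(H.shape)  # (num_nodes, out_feats)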
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 43, Issue: 9, September 2024)