Abstract:
Due to stringent energy and performance constraints, edge AI computing often employs heterogeneous systems that utilize both general-purpose CPUs and accelerators. Analog in-memory computing (AIMC) is a well-known AI inference solution that overcomes computational bottlenecks by performing matrix-vector multiplication (MVM) operations in constant time. However, the tiles of AIMC-based accelerators are limited in the number of weights they can hold. State-of-the-art research often sizes neural networks to AIMC tiles (or vice versa), but does not consider cases where AIMC tiles cannot cover the whole network due to a lack of tile resources or the network's size. In this work, we study the trade-offs among available AIMC tile resources, neural network coverage, AIMC tile proximity to compute resources, and multi-core load balancing techniques. We first study the single-layer performance and energy scalability of AIMC tiles for the two most typical AIMC acceleration targets: dense/fully-connected layers and convolutional layers. This study guides our methodology for allocating parameters to AIMC tiles in large edge neural networks, both where AIMC tiles are close to the CPU (tightly coupled) and cannot share resources across the system, and where AIMC tiles are far from the CPU (loosely coupled) and can employ workload stealing. We explore the performance and energy trends of six modern CNNs using different load balancing methods on differently coupled system configurations with variable AIMC tile resources. We show that, by properly distributing workloads, AIMC acceleration can be made highly effective even on under-provisioned systems. For example, we measure a 5.9x speedup and 5.6x energy savings on an 8-core system that covers only 41% of the neural network parameters.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 35, Issue: 10, October 2024)