Abstract:
Due to stringent energy and performance constraints, edge AI computing often employs heterogeneous systems that utilize both general-purpose CPUs and accelerators. Analog in-memory computing (AIMC) is a well-known AI inference solution that overcomes computational bottlenecks by performing matrix-vector multiplication (MVM) operations in constant time. However, the tiles of AIMC-based accelerators are limited in the number of weights they can hold. State-of-the-art research often sizes neural networks to AIMC tiles (or vice versa), but does not consider cases where AIMC tiles cannot cover the whole network due to a lack of tile resources or the network's size. In this work, we study the trade-offs among available AIMC tile resources, neural network coverage, AIMC tile proximity to compute resources, and multi-core load balancing techniques. We first study the single-layer performance and energy scalability of AIMC tiles for the two most typical AIMC acceleration targets: dense/fully-connected layers and convolutional layers. This study guides our methodology for allocating parameters to AIMC tiles in large edge neural networks, both where AIMC tiles are close to the CPU (tightly coupled) and cannot share resources across the system, and where AIMC tiles are far from the CPU (loosely coupled) and can employ workload stealing. We explore the performance and energy trends of six modern CNNs using different load balancing methods on differently coupled system configurations with variable AIMC tile resources. We show that, by properly distributing workloads, AIMC acceleration can be made highly effective even on under-provisioned systems. For example, we measure a 5.9x speedup and 5.6x energy savings on an 8-core system that covers only 41% of the neural network parameters.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 35, Issue: 10, October 2024)