CSCMAC - Cyclic Sparsely Connected Neural Network Manycore Accelerator | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper presents an energy-efficient, domain-specific manycore accelerator, the Cyclic Sparsely Connected Neural Network Manycore Accelerator (CSCMAC), which effectively maps and executes deep neural networks (DNNs) compressed with cyclic sparsely connected (CSC) architectures. CSC layers structurally compress and sparsify DNNs: they reduce the memory footprint of fully connected (FC) layers from $O(N^{2})$ to $O(N \log N)$ with respect to the number of layer nodes $N$, and they have been shown to be hardware-friendly. We implement CSC layers for inference on a manycore platform, exploit their cyclic structure, and show that implementing them in software is straightforward even on a parallel-computing processor. To further exploit this simplicity, we propose custom instructions for the manycore that fuse frequently used sequences of machine instructions, and we evaluate the optimization gained from this customization. Our experimental results using LeNet-300-100 on MNIST and a Multi-Layer Perceptron (MLP) on Physical Activity Monitoring indicate that replacing FC layers with CSC layers achieves $46\times$ and $6\times$ compression, respectively, within a 2% accuracy loss. A 64-cluster CSCMAC is fully placed and routed in TSMC $65\,\mathrm{nm}$ CMOS technology. The layout of each cluster occupies $0.73\,\mathrm{mm}^{2}$ and consumes $230.2\,\mathrm{mW}$ at a $980\,\mathrm{MHz}$ clock frequency. The proposed CSCMAC achieves $1.48\times$ higher throughput and $1.49\times$ lower energy than its equivalent predecessor manycore (PENC). The CSCMAC also achieves $85\times$ higher throughput and $66.4\times$ lower energy than a CPU implementation on the NVIDIA Jetson TX2 platform.
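The abstract states the asymptotic saving but not the wiring that produces it. The Python sketch below illustrates one way an $N$-to-$N$ FC layer can be replaced by $\log_F N$ sparse sublayers in which every node has fan-in $F$, giving $N F \log_F N$ weights instead of $N^{2}$. The cyclic stride pattern used here (output $j$ of sublayer $\ell$ reads inputs $(j + kF^{\ell}) \bmod N$ for $k = 0,\dots,F-1$) is an assumption chosen for illustration, not necessarily the exact CSC connectivity used in CSCMAC, and the helpers `csc_depth`, `csc_inputs`, `csc_forward`, and `param_counts` are hypothetical names.

```python
import random


def csc_depth(n: int, fan: int) -> int:
    """Number of sparse sublayers (log_fan n) needed so each output can see every input."""
    depth, reach = 0, 1
    while reach < n:
        reach *= fan
        depth += 1
    if reach != n:
        raise ValueError("this sketch assumes n is a power of fan")
    return depth


def csc_inputs(j: int, level: int, n: int, fan: int) -> list[int]:
    """Hypothetical cyclic wiring (an assumption, not taken from the paper):
    output j of sublayer `level` reads inputs j, j + s, ..., j + (fan - 1) * s (mod n),
    where the stride s = fan ** level grows by a factor of fan per sublayer."""
    stride = fan ** level
    return [(j + k * stride) % n for k in range(fan)]


def csc_forward(x: list[float], weights: list[list[list[float]]], fan: int) -> list[float]:
    """Apply the stacked sparse sublayers; weights[level][j][k] scales the k-th input of output j."""
    n = len(x)
    y = list(x)
    for level, w in enumerate(weights):
        # Each output touches only `fan` inputs, so one sublayer costs n * fan MACs, not n * n.
        y = [
            sum(w[j][k] * y[i] for k, i in enumerate(csc_inputs(j, level, n, fan)))
            for j in range(n)
        ]
    return y


def param_counts(n: int, fan: int) -> tuple[int, int]:
    """(dense FC weights, CSC-style weights) for an n-to-n layer, ignoring biases."""
    return n * n, n * fan * csc_depth(n, fan)


if __name__ == "__main__":
    n, fan = 16, 2
    depth = csc_depth(n, fan)
    rng = random.Random(0)
    weights = [[[rng.uniform(-1, 1) for _ in range(fan)] for _ in range(n)] for _ in range(depth)]
    x = [rng.uniform(-1, 1) for _ in range(n)]
    print(csc_forward(x, weights, fan)[:4])
    print(param_counts(1024, 2))  # -> (1048576, 20480)
```

With all weights set to 1, every output of the stacked sublayers equals the sum of all inputs, which confirms that each output still depends on every input even though each individual sublayer is sparse. The regular index arithmetic in `csc_inputs` is one plausible reason such layers are simple to express in software on a manycore, since each core can compute its outputs from a fixed, predictable set of indices.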
Date of Conference: 25-26 March 2020
Date Added to IEEE Xplore: 09 July 2020
Print on Demand (PoD) ISSN: 1948-3287
Conference Location: Santa Clara, CA, USA
