Abstract:
The pipeline is an efficient solution to boost performance in non-volatile memory-based computing-in-memory (nvCIM) convolutional neural network (CNN) accelerators. However, previous works seldom optimize the pipeline from a whole-system perspective, in particular overlooking the effect of buffer access. In this work, we propose a high-performance NVM-based CNN accelerator with a balanced pipeline design that takes into account both macro computing and buffer access. At the operator level, a matrix-based weight mapping method is proposed to reduce buffer access delay. At the macro level, a decoupled access-and-execution design is introduced to shorten single-layer latency. At the system level, a hybrid inter/intra-tile design is presented to balance the overall latency across CNN layers. Through the collaboration of these three methods, we construct a well-balanced pipeline for the nvCIM accelerator at a small hardware cost. Experiments show that our pipeline design achieves 3.7×, 7.5×, and 3.5× throughput improvement for ImageNet recognition with the ResNet18, VGG19, and ResNet34 models, respectively.
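The balanced-pipeline idea in the abstract can be summarized with a simple analytical model: in a layer-wise pipeline, steady-state throughput is bounded by the slowest stage, and with decoupled access and execution a stage's effective latency is the maximum of its macro computation time and its buffer-access time. The Python sketch below illustrates this intuition; all cycle counts and function names are hypothetical assumptions for illustration, not values or APIs from the paper.

```python
# Illustrative model of a layer-wise nvCIM pipeline (hypothetical numbers).
# Each stage overlaps nvCIM macro computation with input/output buffer access,
# so its effective latency is bounded by whichever of the two takes longer.

def stage_latency(macro_cycles: int, buffer_cycles: int) -> int:
    """Effective latency of one pipeline stage with decoupled access/execute."""
    return max(macro_cycles, buffer_cycles)

def pipeline_throughput(stages: list[tuple[int, int]]) -> float:
    """Steady-state throughput (results per cycle) of the whole pipeline.

    In steady state the pipeline delivers one result every `bottleneck`
    cycles, where the bottleneck is the slowest stage.
    """
    bottleneck = max(stage_latency(m, b) for m, b in stages)
    return 1.0 / bottleneck

# Unbalanced design: buffer access dominates some layers (hypothetical cycles).
unbalanced = [(100, 260), (120, 90), (110, 300)]
# Balanced design: weight mapping and inter/intra-tile scheduling shrink
# buffer access so no stage is far slower than the others.
balanced = [(100, 110), (120, 90), (110, 120)]

print(f"unbalanced: {pipeline_throughput(unbalanced):.5f} results/cycle")
print(f"balanced:   {pipeline_throughput(balanced):.5f} results/cycle")
```

Under these assumed numbers, shrinking buffer access on the bottleneck layers raises throughput even though total macro compute is unchanged, which is the motivation for balancing the pipeline across operator, macro, and system levels.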
Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC)
Date of Conference: 09-13 July 2023
Date Added to IEEE Xplore: 15 September 2023