Abstract:
The proposed saturation RRAM for in-memory computing of pre-trained Convolutional Neural Network (CNN) inference imposes a limit on the maximum analog value output from each bitline in order to reduce analog-to-digital (A/D) conversion costs. The proposed scheme uses term quantization (TQ) to enable flexible bit annihilation at any position for a value in the context of a group of weight values in RRAM. This enables a drastic reduction in the required ADC resolution while maintaining CNN model accuracy. Specifically, we show that the A/D conversion errors after TQ have minimal impact on the classification accuracy of the inference task. For instance, for a 64×64 RRAM, reducing the ADC resolution from 6 bits to 4 bits enables a 1.58× reduction in total system power without a significant impact on classification accuracy.
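To make the group-level bit-annihilation idea concrete, the following is a minimal sketch of term quantization over a group of integer weights; the function name `term_quantize` and the group and budget sizes are illustrative assumptions, not the paper's implementation. Each weight is decomposed into signed power-of-two terms, and only the largest-magnitude terms within a per-group budget are kept, so bits can be annihilated at any position in any value of the group.

```python
import numpy as np

def term_quantize(group, term_budget):
    """Hypothetical sketch of group-based term quantization (TQ).

    Decomposes each integer weight into signed power-of-two terms,
    keeps only the `term_budget` largest-magnitude terms across the
    whole group, and annihilates (drops) the rest.
    """
    terms = []  # (signed power-of-two term, index of owning weight)
    for i, w in enumerate(group):
        sign = -1 if w < 0 else 1
        mag = abs(int(w))
        for b in range(mag.bit_length()):
            if (mag >> b) & 1:
                terms.append((sign * (1 << b), i))
    # Keep the largest terms within the group-wide budget.
    terms.sort(key=lambda t: abs(t[0]), reverse=True)
    kept = terms[:term_budget]
    # Reassemble each weight from its surviving terms.
    out = np.zeros(len(group), dtype=int)
    for val, i in kept:
        out[i] += val
    return out

# Example: a group of 4 weights reduced to a budget of 6 terms
print(term_quantize([23, -9, 5, 14], term_budget=6))  # -> [20 -8 4 12]
```

In this toy example the group originally contains 11 nonzero terms; under a budget of 6, the low-order bits are dropped wherever they occur, which is the flexibility that lets the bitline output stay within a saturation limit servable by a lower-resolution ADC.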
Date of Conference: 22-28 May 2021
Date Added to IEEE Xplore: 27 April 2021
Print ISBN: 978-1-7281-9201-7
Print ISSN: 2158-1525