
Hardware-Aware Bayesian Neural Architecture Search of Quantized CNNs




Abstract:

Advances in neural architecture search (NAS) now provide a crucial assistance to design hardware-efficient neural networks (NNs). This letter presents NAS for resource-efficient, weight-quantized convolutional NNs (CNNs), under computational complexity constraints (model size and number of computations). Bayesian optimization is used to efficiently search for traceable CNN architectures within a continuous embedding space. This embedding is the latent space of a neural architecture autoencoder, regularized with a maximum mean discrepancy penalization and a convex latent predictor of parameters. On CIFAR-100, and without quantization, we obtain 75% test accuracy with less than 2.5M parameters and 600M operations. NAS experiments on STL-10 with 32, 8, and 4 bit weights outperform some high-end architectures while enabling drastic model size reduction (6 Mb–840 kb). It demonstrates our method’s ability to discover lightweight and high-performing models, while showcasing the importance of quantization to improve the tradeoff between accuracy and model size.
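As context for the maximum mean discrepancy (MMD) regularization mentioned in the abstract, the sketch below shows a generic WAE-MMD penalty with an RBF kernel between encoded architectures and a standard Gaussian prior. This is an illustration under assumed choices (PyTorch, RBF kernel, bandwidth `sigma`), not the letter's exact implementation.

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between two batches of latent codes."""
    d2 = torch.cdist(x, y, p=2) ** 2          # pairwise squared distances, shape (n, m)
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_penalty(z_encoded, sigma=1.0):
    """Biased (V-statistic) estimate of MMD^2 between encoded architectures
    and an N(0, I) latent prior, in the style of WAE-MMD training."""
    z_prior = torch.randn_like(z_encoded)            # samples from the assumed Gaussian prior
    k_qq = rbf_kernel(z_encoded, z_encoded, sigma)   # code-code terms
    k_pp = rbf_kernel(z_prior, z_prior, sigma)       # prior-prior terms
    k_qp = rbf_kernel(z_encoded, z_prior, sigma)     # cross terms
    return k_qq.mean() + k_pp.mean() - 2.0 * k_qp.mean()
```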
Published in: IEEE Embedded Systems Letters (Volume: 17, Issue: 1, February 2025)
Page(s): 42 - 45
Date of Publication: 26 July 2024

I. Introduction

Compact neural networks (NNs) must run on low-power devices whose memory and computational resources are limited [1], [2]. To provide hardware-efficient models, neural architecture search (NAS) can take into account estimates of latency and energy [3], [4], [5], [6], parameter count [7], [8], [9], number of operations [7], [8], [10], as well as weight [11], [12] and activation [13], [14], [15] bit widths. Indeed, model quantization enables drastic memory savings while maintaining remarkable performance, especially with quantization-aware training (QAT) [16]. This letter thus introduces LBQ-NAS, for latent Bayesian quantized NAS, which optimizes the architecture of convolutional NN (CNN) image classifiers with quantized weights by combining hardware-aware cost functions and QAT. First, LBQ-NAS trains a Wasserstein autoencoder (WAE) [17] to encode and decode CNN architectures through a low-dimensional, continuous latent space embedding (LSE). Second, Bayesian optimization (BO) is used to discover efficient weight-quantized models in this LSE (Fig. 1), with a cost function accounting for accuracy, parameter count, and operation count. Finally, we perform an architecture retraining (AR) phase with the best candidates to achieve competitive performance.
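The letter does not spell out the form of this cost function here; the sketch below shows one plausible way to aggregate validation accuracy, parameter count, and operation count into a single scalar. The budgets echo the CIFAR-100 figures quoted in the abstract, while `nas_cost`, `alpha`, and `beta` are illustrative assumptions rather than the values used in the letter.

```python
def nas_cost(val_accuracy, n_params, n_ops,
             params_budget=2.5e6, ops_budget=600e6,
             alpha=0.1, beta=0.1):
    """Illustrative hardware-aware cost (lower is better).

    val_accuracy : validation accuracy in [0, 1] after QAT.
    n_params     : parameter count of the decoded CNN.
    n_ops        : operation count at inference.
    The budgets echo the CIFAR-100 targets quoted in the abstract
    (<2.5M parameters, <600M operations); alpha and beta are assumed weights.
    """
    error = 1.0 - val_accuracy
    params_term = alpha * max(0.0, n_params / params_budget - 1.0)  # penalize budget overshoot
    ops_term = beta * max(0.0, n_ops / ops_budget - 1.0)
    return error + params_term + ops_term
```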

Given a pretrained LSE, our NAS works as follows: at iteration i, a latent point z_i is decoded into a neural architecture A_i, which undergoes QAT. Its validation accuracy Acc_i, parameter count #P_i, and operation count #Op_i are then measured. Finally, these metrics are aggregated in the cost function value f_i, which is fed to the BO algorithm to determine the next point z_(i+1).
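A minimal sketch of this search loop, assuming an off-the-shelf Gaussian-process optimizer (`skopt.gp_minimize`) stands in for the BO algorithm; `decode_architecture`, `quantization_aware_train`, and `count_operations` are hypothetical placeholders for the WAE decoder, the QAT routine, and an operation counter, and the latent bounds and call budget are assumptions.

```python
from skopt import gp_minimize  # Gaussian-process-based Bayesian optimization

LATENT_DIM = 16  # assumed dimensionality of the latent space embedding (LSE)

def objective(z):
    """One NAS iteration: decode z, run QAT, and score the candidate."""
    arch = decode_architecture(z)                     # hypothetical: WAE decoder, z -> CNN spec
    model, val_acc = quantization_aware_train(arch)   # hypothetical: QAT on the training split
    n_params = sum(p.numel() for p in model.parameters())
    n_ops = count_operations(model)                   # hypothetical: operation counter
    return nas_cost(val_acc, n_params, n_ops)         # aggregate cost (see earlier sketch)

# BO over the continuous latent space: each call proposes the next latent point
result = gp_minimize(objective,
                     dimensions=[(-3.0, 3.0)] * LATENT_DIM,  # assumed latent bounds
                     n_calls=50)
best_z = result.x  # best latent point found; decode and retrain it in the AR phase
```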

