I. Introduction
Compact neural networks (NNs) must run on low-power devices with limited memory and computational resources [1], [2]. To provide hardware-efficient models, neural architecture search (NAS) can account for estimates of latency and energy [3], [4], [5], [6], parameter count [7], [8], [9], number of operations [7], [8], [10], as well as weight [11], [12] and activation [13], [14], [15] bit widths. Indeed, model quantization enables drastic memory savings while maintaining remarkable performance, especially with quantization-aware training (QAT) [16]. This letter therefore introduces LBQ-NAS (latent Bayesian quantized NAS), which optimizes the architecture of convolutional NN (CNN) image classifiers with quantized weights by combining hardware-aware cost functions and QAT. First, LBQ-NAS trains a Wasserstein autoencoder (WAE) [17] to encode and decode CNN architectures through a low-dimensional, continuous latent space embedding (LSE). Second, Bayesian optimization (BO) is used to discover efficient weight-quantized models in this LSE (Fig. 1), with a cost function accounting for accuracy, parameter count, and operation count. Finally, we perform an architecture retraining (AR) phase on the best candidates to achieve competitive performance.
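For concreteness, one plausible way to aggregate these three terms is a weighted sum; the exact cost function of LBQ-NAS is not reproduced in this excerpt, so the weights $\lambda_{P}$ and $\lambda_{Op}$ below are illustrative assumptions:
$$
C_i \;=\; -\,\mathrm{Acc}_i \;+\; \lambda_{P}\,\#P_i \;+\; \lambda_{Op}\,\#Op_i ,
$$
where $\mathrm{Acc}_i$, $\#P_i$, and $\#Op_i$ denote the validation accuracy, parameter count, and operation count of the $i$-th candidate. Minimizing $C_i$ then rewards accuracy while penalizing model size and compute.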
Given a pretrained LSE, our NAS works as follows: at iteration $i$, the latent point $z_i$ is decoded into a neural architecture $\mathcal{A}_i$, which undergoes QAT. Its validation accuracy $\mathrm{Acc}_i$, parameter count $\#P_i$, and operation count $\#Op_i$ are then measured. Finally, these metrics are aggregated into the cost function $C_i$, whose value is fed to the BO algorithm to determine the next point $z_{i+1}$.
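The following Python sketch mirrors this decode, QAT, evaluate, and propose cycle under stated assumptions: decode_architecture, train_with_qat, and count_params_and_ops are hypothetical stand-ins rather than the authors' code, the latent dimensionality, bounds, and cost weights are illustrative, and scikit-optimize's ask/tell interface is only an assumed BO backend.

```python
# Minimal sketch of the per-iteration search loop described above.
# All helper functions and constants are hypothetical placeholders.
import numpy as np
from skopt import Optimizer  # assumed BO backend (ask/tell interface)

LATENT_DIM = 16                      # assumed dimensionality of the WAE latent space
LAMBDA_P, LAMBDA_OP = 1e-7, 1e-10    # assumed trade-off weights for #P and #Op


def decode_architecture(z):
    # Hypothetical stand-in for the WAE decoder: latent point z_i -> architecture A_i.
    return {"depth": 3 + int(5 * abs(z[0])), "width": 16 + int(48 * abs(z[1]))}


def train_with_qat(arch):
    # Hypothetical stand-in for quantization-aware training; returns a dummy
    # validation accuracy Acc_i in [0, 1].
    return max(0.0, 0.95 - 0.005 * arch["depth"] - 2.0 / arch["width"])


def count_params_and_ops(arch):
    # Hypothetical proxies for the parameter count #P_i and operation count #Op_i.
    n_params = 9 * arch["depth"] * arch["width"] ** 2
    n_ops = n_params * 32 * 32       # e.g., one multiply-accumulate per pixel of a 32x32 input
    return n_params, n_ops


def cost(z):
    # Aggregate accuracy, #P_i, and #Op_i as in the illustrative weighted sum above.
    arch = decode_architecture(z)
    acc = train_with_qat(arch)
    n_params, n_ops = count_params_and_ops(arch)
    return -acc + LAMBDA_P * n_params + LAMBDA_OP * n_ops


opt = Optimizer(dimensions=[(-3.0, 3.0)] * LATENT_DIM)   # assumed bounds on the LSE
for _ in range(50):                                      # BO iterations
    z_i = opt.ask()                                      # propose the next latent point z_i
    opt.tell(z_i, cost(np.asarray(z_i)))                 # report C_i to update the surrogate
best_z = opt.Xi[int(np.argmin(opt.yi))]                  # best candidate kept for architecture retraining
```

In an actual run, the best latent points found by this loop would be decoded and passed to the AR phase for full retraining.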