I. Introduction
Over the years, Internet-of-Things (IoT) devices have become increasingly intelligent by employing deep neural network (DNN) models for learning-based processing. Due to the stringent resource constraints of such edge devices, the DNN models deployed on them must be small. To address this challenge, various DNN model compression techniques (e.g., weight pruning [1], quantization [2], knowledge distillation [3]) and compact DNN architectures [4] have been proposed in the literature for edge computing.