1. INTRODUCTION
While deep neural networks (DNNs) have achieved remarkable success in computer vision, most well-performing models are difficult to deploy on edge devices in practical scenarios due to their high computational cost. To alleviate this, lightweight DNNs have been extensively studied. Typical approaches include parameter quantization [1], network pruning [2], and knowledge distillation (KD) [3]. Among them, KD has gained increasing popularity across various vision tasks because it can be easily integrated into other model compression pipelines.