I. Introduction
In the last decade, artificial intelligence (AI) software, especially software based on deep neural networks (DNNs), has attracted considerable attention and had a significant impact [1]. AI software, represented by DNNs, is now recognized as an emerging type of software artifact (sometimes called “software 2.0” [2]). Notably, the size of DNN-based AI software has grown rapidly in recent years, mostly due to the trained deep neural network model it contains. For instance, a state-of-the-art computer vision model contains more than 15 billion parameters [3], and a recent natural language model, GPT-3, is even larger, surpassing 175 billion parameters and requiring nearly 1 TB of space to store the model alone [4]. Such large models hinder realistic applications such as autonomous driving, where the software must be deployed on resource-restricted devices such as wearable devices or edge nodes. To this end, a new branch has been derived from the traditional area of software compression [5], [6], called AI software compression (especially DNN model compression
In the rest of the article, where no ambiguity arises, we use ‘model compression’ to refer to ‘DNN model compression in AI software’ for brevity.
), and has attracted substantial research interest.
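The storage figures quoted above follow from simple back-of-envelope arithmetic: the raw size of a model is roughly its parameter count times the bytes used per parameter. The sketch below illustrates this, assuming 32-bit floating-point weights (the function name and the 4-bytes-per-parameter default are our own illustrative assumptions, not taken from the cited works):

```python
def model_storage_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Raw storage for the parameters alone, in gigabytes.

    Assumes each parameter occupies `bytes_per_param` bytes
    (4 bytes for 32-bit floats); optimizer state, activations,
    and serialization overhead are ignored.
    """
    return num_params * bytes_per_param / 1e9

# GPT-3-scale model: 175 billion parameters as 32-bit floats.
print(model_storage_gb(175e9))     # 700.0 GB for the weights alone

# The same model quantized to 8-bit integers (one compression approach).
print(model_storage_gb(175e9, 1))  # 175.0 GB
```

Even this crude estimate makes clear why an unmodified model of this scale cannot fit on a wearable device or edge node, motivating the compression techniques surveyed below.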