1 Introduction
Deep neural networks (DNNs) have become the fundamental infrastructure of today's artificial intelligence (AI) systems. Different types of tasks have typically involved different types of networks. For example, the multi-layer perceptron (MLP), or fully connected (FC) network, is the classical type of neural network, composed of multiple linear layers and nonlinear activations stacked together [1], [2]. Convolutional neural networks (CNNs) introduce convolutional and pooling layers to process shift-invariant data such as images [3], [4]. Recurrent neural networks (RNNs) utilize recurrent cells to process sequential or time-series data [5], [6]. The Transformer is a newer type of neural network. It mainly utilizes the self-attention mechanism [7], [8] to extract intrinsic features [9] and shows great potential for extensive use in AI applications.
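To make the self-attention mechanism concrete, the following is a minimal NumPy sketch of scaled dot-product self-attention as described in [7]; the function and variable names are illustrative, not taken from any particular library, and real implementations add multiple heads, masking, and learned projections trained end-to-end.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (n, d)."""
    q = x @ w_q                               # queries, shape (n, d_k)
    k = x @ w_k                               # keys,    shape (n, d_k)
    v = x @ w_v                               # values,  shape (n, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)           # pairwise similarities, shape (n, n)
    # row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                        # each output mixes all value vectors

# Toy example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token in one step, self-attention captures long-range dependencies directly, unlike the local receptive fields of CNNs or the step-by-step recurrence of RNNs.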