A Kind of Wireless Modulation Recognition Method Based on DenseNet and BLSTM

Deep learning has achieved remarkable results in various fields, such as image recognition and classification. However, in the recognition of radio modulation methods, the results of applying deep learning to signals with different modulation methods are still not satisfactory. In this paper, we propose a densely connected convolutional network combined with a bidirectional recurrent neural network to identify radio signals of 11 different modulation methods. The final results show that our method is more accurate than the traditional convolutional neural network in modulation recognition.


I. INTRODUCTION
With the development of communication technology, the wireless communication environment is becoming more and more complex. There are many communication signals with different modulation methods in the air, and how to identify and monitor the modulation of these unknown signals arouses much interest. Modulation recognition technology has been widely used in military and civil fields. For example, we can monitor an enemy's communication radio stations by recognizing modulated signals in battle. We can also use modulation recognition for interference identification and spectrum monitoring in daily life.
Signal modulation recognition can be divided into two categories: one is based on likelihood ratio decision theory [1], and the other is based on statistical pattern recognition [2]. The former has higher classification accuracy but higher computational complexity [3]. The latter has lower computational complexity and is easy to calculate and implement, but has lower classification accuracy [4].
Deep learning technology has been widely used in image processing [5] and natural language processing [6], [7]. In recent years, it has also been applied to radio modulation recognition. In [8], the one-dimensional radio signal was transformed into a spectrum image by the Short-Time Fourier Transform (STFT) and input into convolutional neural networks (CNN). In [9], the authors used in-phase and quadrature (IQ) samples to train a CNN, and obtained good results in low-SNR QAM signal classification. The long-term dependency features captured by LSTM are helpful for the feature extraction of wireless signals. However, as the number of neural network layers increases, the number of parameters grows and the recognition time is prolonged.
To extract more time-series features, the above works either broaden the width or increase the depth of the neural network, but the scale of the network is often limited by hardware conditions. In order to extract deeper features of the modulated radio signal without overfitting, we propose a new network structure for wireless signal recognition, combining DenseNet (Densely Connected Convolutional Networks) [10] and BLSTM (Bidirectional LSTM). The former can deepen the neural network while resisting overfitting, and the latter can improve the classification accuracy due to its robustness. Finally, we test and prove the effectiveness of our algorithm by using the RadioML2016.10a dataset [11].

II. RELATED WORKS
A. DENSENET NEURAL NETWORK
DenseNet is a neural network structure proposed at the IEEE Conference on Computer Vision and Pattern Recognition in 2017. It has been widely used in many fields. For example, Zhang et al. applied it to remote sensing scene classification with high classification accuracy [12]. Liang et al. used it to predict glioma genotypes [13], achieving end-to-end training with high accuracy. Yang et al. used it for JPEG steganography [14], which reduced the training parameters. Song et al. applied it to finger vein recognition [15], with better anti-noise ability. Hou and Zheng used a DenseNet neural network for modulation classification [16], but the network depth was still small. This paper expands the depth of the DenseNet neural network to take full advantage of it.

B. BLSTM
BLSTM [17] is a bi-directional neural network proposed in 2015, and it also plays a very important role in many fields. For example, Chen et al. applied it to sentence type classification to improve sentiment analysis [18]; experiments proved that it has good performance. Luo et al. used it for document-level chemical named entity recognition [19], which makes the neural network model more robust. Liang et al. used it for facial expression recognition to better learn spatio-temporal information [20]. Stymne et al. used it in cross-language pronoun prediction with very high accuracy [21]. Reference [22] applied it to positive blood culture detection, where it better captures spatio-temporal information.

C. CNN BASED MODULATION RECOGNITION
Fan et al. used CNN for radio modulation recognition [23], with good robustness. Rajendran et al. used a two-layer LSTM structure to process and extract temporal information [24]. West and O'Shea used a CLDNN (Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Network) [25], which can not only extract features from data at multiple time points and reduce invalid data, but also handle time-related data.
Zhu et al. used an improved AlexNet neural network [26], which achieved performance similar to recognition methods based on likelihood ratio decision theory, with strong robustness. Zhang et al. used a HetNet neural network [27], which achieved a higher transmission rate without increasing the overhead of the system. Shah and Dang combined DNN and RBFN [28], which improved accuracy compared with K-Nearest Neighbor. Peng et al. used the GoogLeNet neural network [29] proposed by Google, which improved classification accuracy compared with the Support Vector Machine (SVM). Based on CNN, Guo used max pooling layers to increase the number of convolution layers, with good robustness and accuracy [30]. Reference [31] combined semi-supervised learning and a CNN to identify radio modulation modes, which was also robust. Li et al. used a deep learning network consisting of stacked autoencoders and softmax regression for recognition [32], which still performed well at low signal-to-noise ratio. However, none of the above methods used DenseNet, and the parameters in their training processes are larger.
In this paper, we propose a new structure combining DenseNet and BLSTM, and use it for radio modulation recognition, which addresses the problem of low classification accuracy.

III. NEURAL NETWORK MODEL
A. CNN MODEL
CNN has the advantages of local perception, weight sharing and shift invariance, which allow it to spatially share the local correlation of adjacent layers. A basic hypothesis of CNN is that the input data are localized and shift-invariant, and wireless signal sampling data accord with this. The general architecture of the CNN is shown in Figure 1. Each convolution layer extracts different features from the input data. Finally, the classification function outputs a vector of size 1 × p, which represents the probabilities of the p modulation methods for the input data.
Radio modulation recognition can be regarded as a classification problem. We set the input signal as x and the output as y, where x is a vector of size 1 × m × n and y is a vector of size 1 × p, representing p kinds of different modulation methods. A neural network is similar to a huge non-linear function, which can approximate the real model arbitrarily closely by constantly adjusting its parameters. Figure 2 is an example of a neural network with 4 layers in CNN. Let $w^l_{jk}$ be the weight connecting the k-th neuron of layer l − 1 to the j-th neuron of layer l, $b^l_j$ the bias of the j-th neuron of layer l, $z^l_j$ the input of the activation function of the j-th neuron of layer l, and $a^l_j$ the output of that activation function. We can express the network structure as

$z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j$, (1)

$a^l_j = \sigma(z^l_j)$, (2)

where σ is the activation function and Equations (1) and (2) are computed layer by layer. Assuming the outputs of the last layer are $y_1$ to $y_p$, we use the softmax function as the output layer to calculate the probability of each of the p kinds of modulation methods. Let $s_i$ denote the i-th output of the softmax function, which can be expressed as

$s_i = \frac{e^{y_i}}{\sum_{k=1}^{p} e^{y_k}}$, (3)

where e is the natural constant.
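The layer-by-layer forward pass and the softmax output above can be sketched in NumPy. This is a minimal illustration: the layer sizes, ReLU as the activation σ, and the random initialization are assumptions for the toy example, not values from the paper.

```python
import numpy as np

def softmax(y):
    """s_i = e^{y_i} / sum_k e^{y_k}; subtracting max(y) is for stability."""
    e = np.exp(y - y.max())
    return e / e.sum()

def forward(x, weights, biases):
    """Layer by layer: z^l = W^l a^{l-1} + b^l, a^l = sigma(z^l)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, W @ a + b)        # hidden layers, sigma = ReLU
    return softmax(weights[-1] @ a + biases[-1])  # output layer -> softmax

# Toy 4-layer network: 8 inputs -> 16 -> 16 -> 11 modulation classes
rng = np.random.default_rng(0)
sizes = [8, 16, 16, 11]
Ws = [rng.normal(size=(m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
probs = forward(rng.normal(size=8), Ws, bs)
print(probs.shape)  # an 11-way probability vector that sums to 1
```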
The cross-entropy loss function is used to process the output of softmax, and the cross-entropy loss C is

$C = -\sum_{i} \hat{y}_i \ln s_i$, (4)

where $\hat{y}_i$ is the true (one-hot) label of category i. Cross entropy measures the difference between two probability distributions: the smaller its value, the smaller the difference between the real value and the predicted value, and the closer the two probability distributions are. After the loss is obtained, the parameters can be updated.
We first compute the gradients $\partial C / \partial w^l_{jk}$ and $\partial C / \partial b^l_j$ by the chain rule. Then the parameters w and b are updated with α as the learning rate:

$w^l_{jk} \leftarrow w^l_{jk} - \alpha \frac{\partial C}{\partial w^l_{jk}}$,

$b^l_j \leftarrow b^l_j - \alpha \frac{\partial C}{\partial b^l_j}$.

The weights and biases of the upper layers are then updated with the same method, which is the process of CNN back propagation.
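The cross-entropy loss and the gradient-descent update above can be illustrated with a single softmax layer in NumPy (a toy stand-in for the full CNN; the input size, learning rate and target class are arbitrary assumptions). For softmax followed by cross entropy, the gradient with respect to the pre-softmax input is simply s − y, which keeps the sketch short.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(s, y):
    """C = -sum_i y_i ln(s_i), with y the one-hot true label."""
    return -np.sum(y * np.log(s + 1e-12))

rng = np.random.default_rng(1)
x = rng.normal(size=8)
y = np.zeros(11); y[3] = 1.0                  # one-hot target class
W = rng.normal(size=(11, 8)) * 0.1
b = np.zeros(11)
alpha = 0.5                                    # learning rate

s = softmax(W @ x + b)
loss_before = cross_entropy(s, y)

# For softmax + cross entropy, dC/dz = s - y, hence
# dC/dW = (s - y) x^T and dC/db = s - y
dz = s - y
W -= alpha * np.outer(dz, x)                   # w <- w - alpha dC/dw
b -= alpha * dz                                # b <- b - alpha dC/db

loss_after = cross_entropy(softmax(W @ x + b), y)
print(loss_after < loss_before)  # the update reduces the loss on this sample
```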

B. DENSENET NEURAL NETWORK MODEL
To solve the problem of gradient vanishing caused by overly deep network layers, we introduce the DenseNet neural network, which reduces the parameters by adding bypass multiplexing, so that the whole neural network can be deep enough to extract sufficient features. Figure 3 shows the network structure of DenseNet. As can be seen in Figure 3, the input of each layer comes from the outputs of all previous layers: the output $X_0$ is the input of $H_0$, the outputs $X_0$ and $X_1$ are the input of $H_1$, and so on. Each layer extracts features from its input data, and the features become more distinctive as the depth increases. Traditional neural networks improve performance by deepening the hierarchy and widening the structure, whereas DenseNet greatly reduces the number of parameters through feature reuse and bypass connections, and alleviates gradient vanishing to a certain extent. In the DenseNet neural network, if the output of layer i is represented by $X_i$ and the composite non-linear function is represented by H, $X_i$ can be represented by Equation (10) below:

$X_i = H([X_0, X_1, \ldots, X_{i-1}])$, (10)

where $[X_0, X_1, \ldots, X_{i-1}]$ denotes the concatenation of the outputs of all previous layers.
DenseNet makes the extracted data more complete through such information reuse, which is also the reason why DenseNet has a better effect in data feature extraction.
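The feature reuse of Equation (10) can be sketched in a few lines of NumPy. Here H is a toy stand-in (ReLU of a random linear map) for the real batch normalization + activation + convolution composite, and the layer count and growth rate are illustrative assumptions.

```python
import numpy as np

def H(x, W):
    """Toy composite function H: ReLU(W x), standing in for
    BN + activation + 1-D convolution in a real DenseLayer."""
    return np.maximum(0.0, W @ x)

def dense_block(x0, num_layers=4, growth=3, seed=0):
    """X_i = H([X_0, X_1, ..., X_{i-1}]): each layer sees the
    concatenation of all earlier feature vectors (Equation (10))."""
    rng = np.random.default_rng(seed)
    feats = [x0]
    for _ in range(num_layers):
        cat = np.concatenate(feats)            # reuse all earlier features
        W = rng.normal(size=(growth, cat.size)) * 0.1
        feats.append(H(cat, W))
    return np.concatenate(feats)               # block output keeps everything

out = dense_block(np.ones(5))
print(out.size)  # 5 input features + 4 layers * growth 3 = 17
```

Note how each layer adds only `growth` new features while its input keeps widening; this is why DenseNet stays parameter-efficient at depth.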

C. BLSTM
The traditional LSTM neural network is widely used to learn persistent features from time-series data, which benefits from the special gate structure of LSTM shown in Figure 4. The core of LSTM is the neuron state, the path from $C_{t-1}$ to $C_t$ in Figure 4. LSTM consists of a forgetting gate, an input gate and an output gate. The forgetting gate $f_t$ determines which information is discarded or retained in the neuron state according to $h_{t-1}$ and $x_t$. The input gate decides which information is updated through $h_{t-1}$ and $x_t$; a tanh layer produces candidate information from $h_{t-1}$ and $x_t$, which may be added to the new neuron state. The output gate decides which information of the neuron state is output according to $h_{t-1}$ and $x_t$. These three gates can be expressed by equations (11) to (16), where $x_t$ represents the input at time t and $h_t$ represents the output at time t.
Forgetting gate:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (11)

Input gate:

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (12)

$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ (13)

Output gate:

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (14)

$h_t = o_t * \tanh(C_t)$ (15)

Neuron state update:

$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ (16)

The traditional recurrent neural network (RNN) and the LSTM neural network can only predict the output of the next time step according to the sequence information of previous time steps. However, in some cases, the output of the current time is related not only to the previous state, but also to the future state. The basic idea of the bi-directional RNN is that each training sequence is processed by two recurrent neural networks (forward and backward) which are connected to one output layer. This structure provides the output layer with both past and future context information for each point in the input sequence. Figure 5 shows a bidirectional RNN, in which the six weights (w1-w6) are reused at each time step. In a bidirectional RNN, there are two hidden states: one is a forward sequence, shown by the left-to-right solid line in Figure 5; the other is a reverse sequence, shown by the right-to-left solid line. When calculating the state at time t, for the forward sequence the input is the hidden state of the previous time step and the system input of the current time; for the reverse sequence, the input is the hidden state of the next time step and the system input of the current time. BLSTM is an optimized LSTM, which trains forward and backward time series respectively, so the output can obtain context information.
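Equations (11)-(16) and the bidirectional pass can be sketched in NumPy. The hidden size, sequence length and random weights are illustrative assumptions; in practice a BLSTM layer would come from a deep learning framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, C, p):
    """One LSTM step following equations (11)-(16)."""
    hx = np.concatenate([h, x])
    f = sigmoid(p["Wf"] @ hx + p["bf"])         # (11) forgetting gate
    i = sigmoid(p["Wi"] @ hx + p["bi"])         # (12) input gate
    C_tilde = np.tanh(p["Wc"] @ hx + p["bc"])   # (13) candidate information
    C = f * C + i * C_tilde                     # (16) neuron state update
    o = sigmoid(p["Wo"] @ hx + p["bo"])         # (14) output gate
    h = o * np.tanh(C)                          # (15) output
    return h, C

def run_lstm(xs, p, hidden):
    h, C = np.zeros(hidden), np.zeros(hidden)
    out = []
    for x in xs:
        h, C = lstm_step(x, h, C, p)
        out.append(h)
    return out

def blstm(xs, p_fwd, p_bwd, hidden):
    """BLSTM: one LSTM over the forward sequence, one over the reversed
    sequence, outputs concatenated at each time step."""
    fwd = run_lstm(xs, p_fwd, hidden)
    bwd = run_lstm(xs[::-1], p_bwd, hidden)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

def make_params(hidden, n_in, rng):
    d = hidden + n_in
    p = {k: rng.normal(size=(hidden, d)) * 0.1 for k in ("Wf", "Wi", "Wc", "Wo")}
    p.update({k: np.zeros(hidden) for k in ("bf", "bi", "bc", "bo")})
    return p

rng = np.random.default_rng(0)
xs = [rng.normal(size=2) for _ in range(6)]     # 6 time steps of I/Q pairs
ys = blstm(xs, make_params(4, 2, rng), make_params(4, 2, rng), hidden=4)
print(len(ys), ys[0].shape)  # 6 steps, each a 2*4 = 8-dim context vector
```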

IV. MODULATION RECOGNITION MODEL BASED ON DENSENET + BLSTM + DNN
In this section, we propose a neural network structure of DenseNet + BLSTM + DNN, shown in Figure 6. The signal data first go through a batch normalization and ReLU layer, then through the Dense layers of DenseNet, then two BLSTM layers, and finally the softmax function is used for classification.
The format of the input data is (batch, 1, 2, 128), and the parameter value of nb_filter is 10. The data are first input to a 1*5 convolution layer, and the output of the convolution layer is processed by batch normalization. The convolution layer is followed by four DenseBlocks, with a TransitionLayer between each pair of adjacent DenseBlocks. There are 10 DenseLayers in each DenseBlock, and the DenseLayers are connected according to Equation (10). A DenseLayer consists of two convolution layers; before each convolution layer, batch normalization is applied to the input data. The data flow of a DenseLayer is shown in Figure 7: the input data first go through batch normalization, then the LeakyReLU activation function, then a convolution layer, then batch normalization and LeakyReLU again, and finally the output is obtained from a second convolution layer.
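The data flow just described can be traced at the shape level with a short Python sketch. The block and layer counts follow the text; the growth rate, the transition-layer compression factor, the BLSTM hidden size and 'same' padding are assumptions made for illustration, not values from the paper.

```python
# Shape-level sketch of the DenseNet + BLSTM + DNN pipeline.
def trace_shapes(batch=200, length=128, nb_filter=10, growth=4,
                 blocks=4, layers_per_block=10, blstm_hidden=64, classes=11):
    shapes = [("input (I/Q)", (batch, 1, 2, length))]
    ch = nb_filter                          # channels after the initial 1*5 conv
    shapes.append(("conv 1*5 + BN", (batch, ch, 2, length)))
    for b in range(blocks):
        ch += layers_per_block * growth     # each DenseLayer adds `growth` maps
        shapes.append((f"DenseBlock {b + 1}", (batch, ch, 2, length)))
        if b < blocks - 1:                  # a TransitionLayer between blocks
            ch //= 2                        # assumed 0.5 channel compression
            shapes.append((f"Transition {b + 1}", (batch, ch, 2, length)))
    # two BLSTM layers over the time axis, forward + backward concatenated
    shapes.append(("2 x BLSTM", (batch, length, 2 * blstm_hidden)))
    shapes.append(("softmax", (batch, classes)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:>14}: {shape}")
```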

V. EXPERIMENTS AND EVALUATION
A. CLASSIFICATION ACCURACY
The experiments were done using the RML2016.10a_dict dataset, to compare the classification accuracy against models such as CLDNN, Inception and ResNet from Reference [33]. From the experimental results shown in Figure 10, it can be seen that the proposed scheme (DenseNet + BLSTM + DNN) has a high classification accuracy: approximately 5% more than CLDNN, 10% more than CNN, 15% more than Inception, and 35% more than ResNet, with slightly higher accuracy at low SNR than the other models.
Here, we will analyze the classification accuracy of the proposed scheme and CLDNN for different modulation modes under different SNR. The confusion matrices when SNR = 18dB and SNR = 0dB are shown in Figure 8 and Figure 9 respectively. It can be seen in the figures that the classification accuracy of AM-SSB, BPSK, CPFSK and GFSK is above 90%, and the accuracy of the proposed scheme is slightly higher than that of CLDNN. The classification accuracy of other modulation modes is shown in Table 1.
The confusion matrices of SNR = −14dB are shown in Figure 11. We can see that neither method can recognize the modulation mode correctly, because it is difficult to recognize the features of noise and signal at low SNR.

B. THE INFLUENCE OF DENSENET CONVOLUTION KERNEL SIZE
In the network structure shown in Figure 6, DenseNet can reduce the parameters and increase the network depth. In this section, we analyze the influence of different convolution kernel sizes on the classification accuracy of DenseNet, and design DenseNet neural network models with different convolution kernel sizes under a fixed network structure. The classification accuracy is evaluated using the RML2016.10a_dict dataset under different SNRs. In this experiment, six convolution kernels of different sizes are used: under the same network structure, the convolution kernels of the DenseBlocks in DenseNet are set to 1*3, 1*5, 1*7, 1*9, 1*11 and 1*13, respectively. The batch size is 200, the number of epochs is set to 15, and the Adam optimizer is used. Figure 12 shows the classification accuracy after the networks converge over 15 epochs. It can be seen that at high SNR, the classification accuracy of the 1*5 and 1*9 kernels is almost the same, reaching more than 70%, while the classification accuracy of the other kernel sizes is about 65%. When the SNR decreases from 5dB to −10dB, the difference in classification accuracy between the kernel sizes gradually decreases, and the accuracy of the 1*5 and 1*9 sizes is still slightly higher. At low SNR, the classification accuracy of the 1*3 kernel is about 20%, while that of the 1*5 and 1*9 kernels is about 15%, and that of the other kernels is between 10% and 15%. Figure 13 shows the confusion matrices with SNR = 18dB. We further display the classification accuracy of different convolution kernel sizes in Table 2. From the table, we can see that different convolution kernel sizes lead to different classification accuracy for different modulation modes.
Kernel size 1*3 has high accuracy for AM-DSB; kernel size 1*5 has high accuracy for AM-DSB, BPSK, GFSK and PAM4; and kernel size 1*9 has high accuracy for BPSK and CPFSK. Combining the results of Figure 13 with those of Figure 12, it can also be observed that a larger convolution kernel size does not lead to better accuracy: when the convolution kernels are 1*11 and 1*13, the classification accuracy decreases.
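One way to see why the larger kernels add cost without adding accuracy is to count the weights of a single 1*k convolution. The channel numbers below are illustrative assumptions, not values from the paper.

```python
# Parameter count of one 1*k convolution layer: in_ch * out_ch * k weights
# plus out_ch biases. Larger kernels grow the model roughly linearly in k.
def conv_params(in_ch, out_ch, k):
    return in_ch * out_ch * k + out_ch

for k in (3, 5, 7, 9, 11, 13):
    print(f"1*{k:>2}: {conv_params(in_ch=50, out_ch=4, k=k)} parameters")
```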

C. INFLUENCE OF BLOCK SIZE
The network structure of DenseNet is composed of multiple blocks. In this section, we discuss the influence of different numbers of blocks on classification accuracy. Four DenseNet configurations with different numbers of blocks are used in the experiment, and the classification accuracy is calculated under different SNRs. The number of epochs is set to 15, the batch size is 200, and the convolution kernel size of the dense block is 1*9. Figure 14 shows the comparison of the results after the networks converge over 15 epochs. It can be seen that as the number of blocks increases, the classification accuracy also shows an increasing trend. At high SNR, the classification accuracy of the DenseNet neural network with 4 blocks is 5 to 10% higher than those with other numbers of blocks; at low SNR, the classification accuracy of the networks with different numbers of blocks shows little difference, at about 13%.
Further, the recognition results are listed in Table 3. It can be seen that the DenseNet neural network with 2 blocks has higher classification accuracy on AM-DSB, AM-SSB and BPSK, while the network with 4 blocks has higher classification accuracy on CPFSK, GFSK, PAM4, QAM64 and QPSK.

VI. CONCLUSION
In order to solve the problem that the classification accuracy of traditional radio modulation recognition methods is not high, a deep learning method is used to recognize radio modulation modes, and a neural network structure of DenseNet + BLSTM + DNN is proposed. In this combined structure, the signals first pass through a convolution layer, then a batch normalization layer and a LeakyReLU layer, followed by a DenseNet neural network, which deepens the network through the bypass multiplexing of DenseNet. Then, the signals are input to two BLSTM layers to extract the time sequence information, and finally input to softmax for classification. By adding bypass multiplexing to reduce parameters, the depth of the neural network is increased and deeper information is extracted.
Compared with the existing neural network models, the overall classification accuracy of our proposed scheme is improved by 8%. In addition, through the experiments we can observe that, by changing the convolution kernel size and the number of blocks, different modulation modes achieve different classification accuracy. In our future work, we will consider changing the convolution kernel size adaptively to recognize different modulation modes.