Convolutional Neural Network-Based Radar Jamming Signal Classification With Sufficient and Limited Samples

Jamming is a major threat to the survival of radar systems, and anti-jamming is part of the solution. The classification of radar jamming signals is the first step toward anti-jamming. Recently, as an important part of deep learning, convolutional neural network (CNN) based methods have shown their capability in discriminant feature extraction and accurate classification. In this study, in order to harness the power of deep learning, CNN-based methods are proposed to classify radar jamming signals acting on pulse compression radar. Specifically, a 1D-CNN is designed for radar jamming signal classification under the condition of sufficient training samples. Furthermore, because collecting sufficient training samples is time-consuming and expensive, a CNN-based siamese network is proposed for radar jamming signal classification to deal with the issue of limited training samples. The experimental results with sufficient and limited training samples show that the CNN-based classification methods obtain good classification accuracy and show great potential for radar jamming signal classification.


I. INTRODUCTION
Radar plays an important role because it is widely used in civilian and military areas. Recently, electronic warfare has become one of the most important parts of modern warfare, and radar is crucial to the victory of a war. In order to disturb the enemy radar system and degrade its target detection, identification, and tracking ability, various radar jamming techniques have been designed.
According to the jamming mechanism, jamming signals can be divided into suppression jamming and deception jamming [1]. Suppression jamming covers the target signal by transmitting high-power jamming signals. Among them, noise jamming is the most widely used [2], and it can be divided into aiming jamming, blocking jamming, and sweeping jamming according to the ratio between the spectrum width of the jamming and the passband of the receiver. Deception jamming actively transmits radio waves of a certain phase and frequency to the enemy radar to imitate the echo of the target, so that the enemy radar gets wrong target information [3]. Typical deception jamming includes interrupted sampling repeater jamming (ISRJ) [4], distance deception jamming [5], dense false target jamming [6], etc.
The associate editor coordinating the review of this manuscript and approving it for publication was Choon Ki Ahn.
At present, the increasingly complex electromagnetic environment and jamming technology seriously affect the survival and effectiveness of radar systems. Therefore, radar anti-jamming technology has been developed. A complete anti-jamming process includes radar jamming signal classification, anti-jamming strategy selection, and anti-jamming performance evaluation. As the first step of anti-jamming, the accurate classification of radar jamming signals is a core part of an anti-jamming system.
In recent years, the methods of radar jamming signal classification have mainly included likelihood-based methods and feature-based methods. The likelihood-based methods calculate the likelihood function of the jamming signal and compare it with a certain threshold to determine the type of jamming. For example, Greco et al. [7] presented an adaptive coherent estimator and a generalized likelihood ratio test to solve the problem of detecting deception jamming based on digital radio frequency memory. Zhao et al. [8] proposed a generalized likelihood ratio test target discriminator based on the classical linear model to distinguish between target and deception jamming. However, the likelihood-based methods need prior information and expert experience, so their scope of application is limited.
The feature-based methods include feature extraction and classifier design. Feature extraction is the key to radar jamming signal classification. Different radar jamming signals can be transformed in the time domain, frequency domain, and time-frequency domain so that the characteristics of the signals are clearly distinguished [9]-[11]. In addition, the amplitude, phase, and frequency of different jamming signals also differ. Therefore, it is possible to extract the statistical features of jamming signals in different domains to distinguish different radar jamming signals. For example, Li et al. [12] proposed feature extraction methods based on amplitude fluctuations, high order cumulants, and bispectrum to detect deception jamming. Ma et al. [13] studied statistical algorithms to extract deception jamming features. Liu et al. [14] studied the polarization scattering characteristics of chaff jamming and classified them by support vector machines (SVM). However, the feature-based methods mainly rely on manual feature extraction, which has high computational complexity and requires a lot of manpower.
In recent years, deep learning-based methods have been proposed and have achieved outstanding performance in image, text, and speech processing [15]. As a representative of deep learning, CNNs have great advantages in extracting discriminant and invariant features of inputs [16]. The powerful feature extraction ability of CNN is inspired by neuroscience [17], which makes it reasonable in theory. Furthermore, CNN has been successfully applied in the field of radar jamming signal classification. For example, Yun et al. [18] proposed a new CNN-based method of barrage jamming detection and classification for SAR. Wang et al. [19] designed a CNN to classify active jamming. However, the existing CNN-based radar jamming signal classification methods can distinguish only a few types of jamming signals, and with the development of electronic technology, more and more jamming patterns are emerging. Therefore, it is of great significance to design a reliable CNN model which can distinguish various radar jamming signals.
Furthermore, it should not be ignored that CNN-based methods usually need a large number of training samples. If the training samples are limited, the CNN model will overfit. In fact, due to the complex electromagnetic environment, the acquisition of jamming samples is often very difficult, and the labeling of jamming samples is also relatively tedious. Therefore, it is of great practical significance to design a model that can accurately classify radar jamming signals under the condition of limited training samples. At present, the siamese network is widely used to solve the problem of insufficient training samples. For example, Wang and Wang [20] proposed an improved siamese network to solve the problem of leaf classification in the case of insufficient samples. With limited training samples, Sun et al. [21] realized the efficient identification of voltage sag sources by designing siamese networks. Therefore, in this paper, in order to address the problem of limited jamming training samples, an improved siamese network combined with CNN is also proposed.
The main contributions of this study are listed as follows:
1) A new classification model based on 1D-CNN is proposed for radar jamming signals. The proposed model obtains good classification performance in terms of overall accuracy under the condition of sufficient training samples.
2) To deal with the practical problem of insufficient radar jamming signal samples, an improved Siamese-CNN (S-CNN) is proposed for radar jamming signal classification.
3) The experimental results on 12 typical radar jamming signals show that the proposed methods can effectively classify radar jamming signals with sufficient and limited samples.
The rest of this paper is organized as follows. Section II and Section III introduce 1D-CNN and S-CNN for radar jamming signal classification, respectively. Section IV introduces the experimental details and experimental results and analyzes them comprehensively. Section V gives the conclusion of the whole work.

II. CNN-BASED RADAR JAMMING CLASSIFICATION
This section introduces CNN-based radar jamming signal classification with relatively sufficient training samples. In view of the time domain characteristics of radar jamming signals, a 1D-CNN is adopted to extract the hierarchical features of jamming signals. Through the feature extraction of the convolutional layers and the pooling layers, discriminant and invariant features are finally obtained, which are critical for radar jamming classification.

A. CNN-BASED FEATURE EXTRACTION
CNN mainly includes three basic parts: convolutional layers, nonlinear transformation, and pooling layers [22]. It is worth mentioning that deep CNNs can extract the input features hierarchically. With the help of local connections and shared weights, the features extracted by CNN are often invariant and robust.
A convolutional layer and the nonlinear transformation are defined in eq. (1) and eq. (2), respectively.
$$x_j^k = f\left(\sum_{i=1}^{N} x_i^{k-1} * w_{ij}^k + b_j^k\right) \quad (1)$$
$$f(x) = \max(0, x) \quad (2)$$
Vector $x_i^{k-1}$ is the ith feature vector of the previous (k-1)th layer, $x_j^k$ is the jth feature vector of the current kth layer, and N is the number of input feature vectors. $w_{ij}^k$ and $b_j^k$ represent the weight and bias of the neuron, respectively, and * is the convolution operation. f(x) is a rectified linear unit (ReLU), which is used to increase the nonlinear expression ability of the network. Moreover, it can extract sparse features faster.
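The convolution and ReLU operations described above can be illustrated with a small pure-Python sketch that computes one output feature vector from N input feature vectors. The function names, the unpadded ("valid") boundary handling, and the use of cross-correlation (the usual deep learning convention for a convolutional layer) are assumptions made for illustration and are not details taken from the paper.

```python
def relu(values):
    """Rectified linear unit applied element-wise: f(x) = max(0, x)."""
    return [max(0.0, v) for v in values]


def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding ('valid' mode).

    Implemented as cross-correlation, the convention most deep learning
    frameworks use for a convolutional layer (an assumption here).
    """
    width = len(kernel)
    return [
        sum(signal[start + t] * kernel[t] for t in range(width))
        for start in range(len(signal) - width + 1)
    ]


def conv_layer_output(inputs, kernels, bias):
    """Compute one output feature vector of the current layer.

    inputs  : N feature vectors from the previous layer
    kernels : N kernels, one per input feature vector
    bias    : scalar bias shared by the whole output vector
    """
    responses = [conv1d_valid(x, w) for x, w in zip(inputs, kernels)]
    summed = [sum(column) + bias for column in zip(*responses)]
    return relu(summed)


features = conv_layer_output(
    inputs=[[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.0, 1.0]],
    kernels=[[-1.0, 1.0], [0.5, 0.5]],
    bias=-1.0,
)
print(features)  # [0.0, 0.0, 0.5]
```

The first two positions sum to a negative value and are clipped to zero by the ReLU, which is the sparsity effect mentioned in the text.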

B. THE DESIGN OF THE 1D-CNN MODEL
The designed 1D-CNN model for the classification of radar jamming signals is shown in Fig. 1. In order to fully demonstrate the feature extraction ability of CNN, two 1D-CNNs are designed to extract the features of the real and imaginary parts of the radar jamming data. Through convolution and pooling, the deep features of the real and imaginary parts of the radar jamming data are extracted. Finally, we concatenate these features and send them to the softmax classifier to obtain the jamming category information. Due to the high initial dimension of radar jamming data, the network is prone to overfitting. In order to alleviate this phenomenon, dropout [22] and global average pooling (GAP) [23] are adopted in this work. Dropout sets each activation value to zero with a certain probability p, which helps the model generalize. In order to better match the jamming signal category with the feature maps of the last convolutional layer, GAP is used to replace the traditional fully connected layers in CNN. Furthermore, GAP sums out the global spatial information, and thus it is more robust to spatial translation of the jamming signal.
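As a rough illustration of the two regularization tools named above, the following sketch applies GAP to hypothetical feature maps from the real-part and imaginary-part branches, concatenates the pooled features, and then applies dropout. The feature-map values are made up, and the inverted-dropout rescaling by 1 / (1 - p) is an assumed variant that the paper does not specify.

```python
import random


def global_average_pooling(feature_maps):
    """Reduce each 1D feature map to its mean, giving one value per map."""
    return [sum(fmap) / len(fmap) for fmap in feature_maps]


def dropout(activations, p, rng):
    """Inverted dropout (an assumed variant): zero each activation with
    probability p and rescale the survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


# Hypothetical feature maps from the real-part and imaginary-part branches.
real_maps = [[0.0, 1.0, 3.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
imag_maps = [[0.0, 0.0, 1.0, 3.0], [1.0, 0.0, 1.0, 0.0]]

# Concatenate the pooled features of both branches before the classifier.
pooled = global_average_pooling(real_maps) + global_average_pooling(imag_maps)
print(pooled)  # [1.0, 2.0, 1.0, 0.5]

# Training-time dropout with p = 0.5.
print(dropout(pooled, p=0.5, rng=random.Random(0)))
```

The first real-part map and the first imaginary-part map are shifted copies of each other and pool to the same value, which is the translation robustness attributed to GAP.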
Meanwhile, in order to accelerate the training process, batch normalization (BN) is adopted in the 1D-CNN model. BN can maintain the same distribution of inputs at each layer of the neural network during the training process and accelerate the convergence of the network [24].
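For the activation value of each neuron in the hidden layer, the BN mechanism can be formulated as follows:
$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}} \quad (3)$$
$$y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)} \quad (4)$$
where $\mathrm{E}[x^{(k)}]$ is the mini-batch mean and $\mathrm{Var}[x^{(k)}]$ is the mini-batch variance. After the transformation in eq. (3), the activation $\hat{x}$ of a certain neuron follows a standardized distribution with a mean of 0 and a variance of 1. In order to enhance the network expression, eq. (4) is carried out for the transformed activation. $\gamma^{(k)}$ and $\beta^{(k)}$ represent learnable parameters (scale and shift) [24].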
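The BN mechanism described above can be sketched for the activations of a single neuron over one mini-batch as follows. The small constant `eps` added to the variance for numerical stability is an assumption borrowed from common practice, and the batch values are illustrative.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one neuron's activations over a mini-batch, then apply
    the learnable scale (gamma) and shift (beta)."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    normalized = [(x - mean) / math.sqrt(variance + eps) for x in batch]
    return [gamma * x_hat + beta for x_hat in normalized]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
print(normalized)

# With gamma = 2 and beta = 1 the output is rescaled and shifted.
print(batch_norm(activations, gamma=2.0, beta=1.0))
```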

III. S-CNN-BASED RADAR JAMMING CLASSIFICATION
This section introduces S-CNN-based radar jamming signal classification with relatively limited training samples. CNNs have a powerful feature extraction capability when training samples are sufficient. However, the lack of adequate training samples is a common problem in radar jamming classification. If the training samples are insufficient, CNN often overfits, which reduces the classification accuracy on the test samples. S-CNN exploits the similarity between intraclass samples and the differences between interclass samples, and it classifies different jamming signals with limited samples by learning the similarity between two inputs.

A. S-CNN-BASED FEATURE EXTRACTION
The S-CNN is used to measure the similarity between the two inputs. S-CNN has two sub-networks with the same structure and weights. During training, two sub-networks extract features from two inputs, while connected neurons measure the distance between two feature vectors. The traditional classification model requires a lot of samples with labels. To some extent, S-CNN realizes the reuse of training samples by paired training. Therefore, in the case of limited training samples, S-CNN can be considered for classification. S-CNN measures the similarity of inputs through distance space, such as Manhattan distance (L1 distance) and Euclidean distance (L2 distance), and compares the new samples by learning similarity to determine the category.
S-CNN maps inputs to feature vectors by using CNN, and uses the distance between the vectors to represent the differences between the inputs. Since paired samples are used to train S-CNN, it should be noted that the number of pairs of intraclass samples is consistent with the number of pairs of interclass samples. Only in this way can the S-CNN fully learn the similarity of intraclass samples and the difference of interclass samples. By training the S-CNN, the distance of the same class in the feature space is continuously reduced, and the distance of the different class is continuously increased.
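The requirement that intraclass and interclass pairs be equally frequent can be sketched as below. This is one possible pairing scheme under stated assumptions: the function name, the toy sample IDs, and the choice to subsample the larger group at random are all hypothetical, since the paper does not describe how the pairs were drawn.

```python
import itertools
import random


def make_balanced_pairs(samples_by_class, rng):
    """Build (sample_a, sample_b, label) triples with label 1 for intraclass
    pairs and 0 for interclass pairs, keeping both kinds equally frequent."""
    intraclass = [
        (a, b, 1)
        for samples in samples_by_class.values()
        for a, b in itertools.combinations(samples, 2)
    ]
    interclass = [
        (a, b, 0)
        for class_a, class_b in itertools.combinations(samples_by_class, 2)
        for a in samples_by_class[class_a]
        for b in samples_by_class[class_b]
    ]
    # Subsample the larger group so both groups have the same size.
    count = min(len(intraclass), len(interclass))
    pairs = rng.sample(intraclass, count) + rng.sample(interclass, count)
    rng.shuffle(pairs)
    return pairs


# Toy data set: three classes with three hypothetical sample IDs each.
toy_set = {
    "aiming": ["a1", "a2", "a3"],
    "blocking": ["b1", "b2", "b3"],
    "sweeping": ["s1", "s2", "s3"],
}
pairs = make_balanced_pairs(toy_set, random.Random(0))
print(len(pairs), sum(label for _, _, label in pairs))  # 18 9
```

With three samples per class, each training sample appears in several pairs, which is the reuse of training samples mentioned in the text.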
S-CNN determines the category of a test sample in the following way. Since the labels of the training samples are known, the test samples and training samples are paired and fed into the trained S-CNN, and S-CNN outputs the similarity probability between the test samples and the training samples. Therefore, the similarity probability between the test samples and the various types of training samples can be obtained, and the category of a test sample is the category of the training sample corresponding to the maximum similarity probability.
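The decision rule described above amounts to a maximum-similarity search over the labeled training samples. In the sketch below, `l1_similarity` is only a toy stand-in for the trained S-CNN, and the three-dimensional vectors are made-up examples rather than real jamming features.

```python
def l1_similarity(a, b):
    """Toy stand-in for the trained S-CNN: maps L1 distance into (0, 1]."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


def classify(test_sample, labeled_samples, similarity):
    """Return the label of the training sample most similar to test_sample."""
    best_label, best_score = None, float("-inf")
    for label, sample in labeled_samples:
        score = similarity(test_sample, sample)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training samples, one per class.
support = [
    ("ISRJ", [1.0, 0.0, 0.0]),
    ("chaff", [0.0, 1.0, 0.0]),
    ("sweeping", [0.0, 0.0, 1.0]),
]
print(classify([0.1, 0.8, 0.2], support, l1_similarity))  # chaff
```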

B. THE DESIGN OF THE S-CNN MODEL
The S-CNN model designed for the characteristics of radar jamming signals is shown in Fig. 2. First, in order to fully extract radar jamming signal features, the real part and imaginary part of the paired input signals are separated, four 1D-CNNs with shared parameters are used to extract features, and the real-part features and imaginary-part features are concatenated. Second, through the fully connected (FC) layer, each of the paired inputs is finally represented by a feature vector. Then, the L1 distance between the feature vectors is calculated, and the result is fed into the sigmoid activation function to obtain the similarity probability P (the larger the value of P, the more similar the two input signals). Finally, S-CNN determines the category of the unknown jamming signal by the similarity probability P. Table 1 shows the algorithm for the designed S-CNN model.
In training the model, the L1 distance is used to calculate the distance between jamming features extracted by CNN. Let {p1, p2} denote the input jamming signal pair. They go through the weight-sharing CNN to obtain the feature representations f(p1) and f(p2). Finally, the L1 distance D(p1, p2) is calculated as follows:
$$D(p_1, p_2) = \sum_{i=1}^{n} |x_i - y_i| \quad (5)$$
where n represents the dimension of f(p1) and f(p2), and $x_i$ and $y_i$ represent the ith elements of f(p1) and f(p2), respectively. The loss function of the proposed S-CNN is defined as:
$$L(p_1, p_2) = -\left[ y(p_1, p_2) \log \sigma(D(p_1, p_2)) + (1 - y(p_1, p_2)) \log\left(1 - \sigma(D(p_1, p_2))\right) \right] \quad (6)$$
where y(p1, p2) denotes the label information for the training jamming signal pair. Suppose y(p1, p2) = 1 whenever p1 and p2 are from the same jamming signal class and y(p1, p2) = 0 otherwise. σ(·) is the sigmoid activation function:
$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (7)$$
Due to the limited training samples for the S-CNN model, the network overfits easily. In order to avoid overfitting and accelerate the training process, L2 regularization and BN operations are adopted in the S-CNN model.
L2 regularization improves the generalization ability of the S-CNN model by penalizing the weights of unimportant features [25]. By introducing L2 regularization, the loss function in this work is defined as follows:
$$L(p_1, p_2) = -\left[ y(p_1, p_2) \log \sigma(D(p_1, p_2)) + (1 - y(p_1, p_2)) \log\left(1 - \sigma(D(p_1, p_2))\right) \right] + \lambda \|w\|_2 \quad (8)$$
where λ is the L2 regularization coefficient, which can reduce the complexity of the model and alleviate the overfitting problem caused by limited samples, $\|\cdot\|_2$ represents the L2 norm, and w denotes the weights of features.
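The quantities used in training (L1 distance, sigmoid, pair loss, and L2 penalty) can be put together in a short sketch. Several details here are assumptions rather than the paper's implementation: the `scale` and `bias` parameters are hypothetical stand-ins for a learned output layer, chosen so that a smaller distance gives a larger P as the text describes, and the feature vectors and weights are made-up values.

```python
import math


def l1_distance(f1, f2):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(f1, f2))


def sigmoid(z):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def similarity_probability(f1, f2, scale=1.0, bias=2.0):
    """Map distance to P in (0, 1). `scale` and `bias` are hypothetical
    stand-ins for learned output-layer parameters."""
    return sigmoid(bias - scale * l1_distance(f1, f2))


def regularized_loss(prob, same_class, weights, lam=2e-4):
    """Cross-entropy on the pair label plus an L2-norm penalty on the weights."""
    eps = 1e-12  # guards against log(0)
    cross_entropy = -(
        same_class * math.log(prob + eps)
        + (1 - same_class) * math.log(1.0 - prob + eps)
    )
    return cross_entropy + lam * math.sqrt(sum(w * w for w in weights))


weights = [0.5, -0.3, 0.1]  # made-up network weights
p_close = similarity_probability([1.0, 2.0], [1.1, 2.1])
p_far = similarity_probability([1.0, 2.0], [4.0, -1.0])

# For a same-class pair, nearby features give a lower loss than distant ones.
print(regularized_loss(p_close, 1, weights))
print(regularized_loss(p_far, 1, weights))
```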

IV. EXPERIMENT AND RESULTS
In this section, first, the radar jamming signal data set and the settings of the comparative experiments are introduced. Second, the settings of the 1D-CNN and S-CNN models are introduced in detail. Third, the experimental results of 1D-CNN under the condition of sufficient training samples are presented. Finally, we analyze and summarize the results of S-CNN under the condition of limited training samples.

A. DATA DESCRIPTION
In this study, we considered typical radar jamming signals acting on the linear frequency modulation (LFM) pulse signal emitted by pulse compression radar [26]. The LFM signal had a pulse width of T = 20 µs, a bandwidth of B = 10 MHz, and a sampling rate of 20 MHz. Then, typical radar jamming signals simulated by experts were used to evaluate the performance of the proposed method. There were 12 typical types of radar jamming signals, including suppression jamming such as aiming jamming, blocking jamming, and sweeping jamming; ISRJ, distance deception jamming, dense false target jamming, and smart noise jamming; and typical passive jamming such as chaff jamming [14]. Finally, the data set also included additive compound jamming such as ISRJ + chaff, dense false target + smart noise, and distance deception + sweeping. In the jamming data set, 500 samples were simulated for each kind of radar jamming signal, and each sample contained 2000 complex sampling points (2000 real points + 2000 imaginary points); the real part and the imaginary part were separated and assembled into a row vector. Some simulation parameters and time domain waveforms of the jamming signals were shown in Table 2 and Fig. 4, respectively. Among them, JNR represented the jamming-to-noise ratio and the others represented the corresponding radar jamming signal simulation parameters. All simulated jamming signals suppressed or deceived the target signal.
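To make the signal parameters and the sample layout concrete, the sketch below generates an LFM pulse with the stated pulse width, bandwidth, and sampling rate, and then assembles its real and imaginary parts into one row vector. The baseband chirp form exp(jπ(B/T)t²) centered at t = 0 is a common textbook form assumed here; it is not necessarily the exact expression used by the authors, and the function names are hypothetical.

```python
import cmath
import math


def lfm_pulse(pulse_width, bandwidth, sample_rate):
    """Baseband chirp exp(j*pi*(B/T)*t^2) sampled on -T/2 <= t < T/2.

    This textbook form is an assumption, not the paper's own definition.
    """
    n = round(pulse_width * sample_rate)
    chirp_rate = bandwidth / pulse_width
    times = [(i - n / 2) / sample_rate for i in range(n)]
    return [cmath.exp(1j * math.pi * chirp_rate * t * t) for t in times]


def to_row_vector(samples):
    """Separate real and imaginary parts and place them in one row vector."""
    return [s.real for s in samples] + [s.imag for s in samples]


# T = 20 us, B = 10 MHz, sampling rate = 20 MHz, as stated in the text.
pulse = lfm_pulse(20e-6, 10e6, 20e6)
row = to_row_vector(pulse)
print(len(pulse), len(row))  # 400 800
```

The pulse alone spans 400 samples, and a row vector built this way has twice as many entries as the complex sample has points.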
In order to satisfy the needs of the comparative experiments, experts extracted the features of the jamming signals. The extracted statistical features (SF) included skewness, kurtosis [27], normalized instantaneous amplitude frequency maximum, frequency smoothness, envelope fluctuation parameter [28], mean, and variance, for a total of seven features.
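Four of the seven statistical features (mean, variance, skewness, and kurtosis) have standard definitions and can be sketched as below. The population (biased) estimators and the non-excess form of kurtosis are assumptions, since the paper does not state which variants were used. The remaining three features are omitted because their definitions are not given in the text.

```python
def statistical_features(values):
    """Mean, variance, skewness, and kurtosis of a real-valued sequence,
    using population (biased) estimators and non-excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    skewness = sum(((v - mean) / std) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Illustrative amplitude values, not real jamming data.
print(statistical_features([1.0, 2.0, 3.0, 4.0, 5.0]))
```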

B. EXPERIMENTAL PARAMETERS SETTINGS
1) COMPARATIVE EXPERIMENTS
In order to explore the classification performance of CNN and S-CNN on jamming signals, a series of comparative experiments were designed in this paper. We used the feature data set to train a support vector machine (SVM), a decision tree classifier, logistic regression, and random forest (RF). In terms of parameter settings, the kernel function of SVM was the radial basis function, and we searched for the best parameters over exponentially growing sequences of C and γ (the range of C and γ was $10^{-3}$ to $10^{3}$). The classification mode of logistic regression was one-vs-rest, and the classification and regression tree algorithm was chosen to train the decision tree. At the same time, the number of decision trees in RF was 200 and the number of features to consider when looking for the best split was set to 10. Furthermore, we compared the designed algorithm with the current 2D-CNN method applied to radar jamming classification (we referred to the network structure of [19]). Finally, we used overall accuracy (OA) and the kappa coefficient (K) [29] to compare and estimate the capabilities of the proposed models. The OA was computed by eq. (10).
$$\mathrm{OA} = \frac{\text{number of correctly classified samples}}{\text{total number of test samples}} \times 100\% \quad (10)$$
Suppose the number of test samples in each class was $a_1, a_2, \ldots, a_n$, and the number of samples predicted as each class was $b_1, b_2, \ldots, b_n$. The kappa coefficient was computed by eq. (11).
$$\mathrm{kappa} = \frac{a_1 \times b_1 + a_2 \times b_2 + \ldots + a_n \times b_n}{(\text{total number of test samples})^2} \quad (11)$$
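The two evaluation metrics can be sketched as follows. OA follows eq. (10) directly. For the kappa coefficient, the sketch assumes the standard Cohen's kappa, (p_o - p_e) / (1 - p_e), in which the ratio written in eq. (11) plays the role of the chance-agreement term p_e; this reading is an assumption about the intended definition, and the labels are toy values.

```python
def overall_accuracy(true_labels, predicted_labels):
    """Percentage of test samples whose predicted label is correct."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


def kappa_coefficient(true_labels, predicted_labels):
    """Cohen's kappa (assumed definition): agreement corrected for chance."""
    total = len(true_labels)
    classes = set(true_labels) | set(predicted_labels)
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / total
    # Chance agreement: sum over classes of a_i * b_i, divided by total^2.
    expected = sum(
        true_labels.count(c) * predicted_labels.count(c) for c in classes
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)


truth = [0, 0, 1, 1]
prediction = [0, 0, 1, 0]
print(overall_accuracy(truth, prediction))   # 75.0
print(kappa_coefficient(truth, prediction))  # 0.5
```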

2) 1D-CNN
In this experiment, we split the jamming data set into three subsets (i.e., training, validation, and test samples). We randomly chose 50, 100, and 150 samples from each kind of jamming signal as the training set. The validation samples were drawn from the radar jamming data set outside the training set, with 50 samples randomly selected from each kind of jamming signal. Then, the rest of the samples were used as the test set. The architecture of the 1D-CNNs for the jamming data set was shown in Table 3. In the training process, the size of the mini-batch was set to 64, the dropout ratio was set to 0.5, and the number of training epochs was set to 300 for the jamming data set. At the same time, the initial learning rate of all 1D-CNNs was set to 0.005, and the learning rate decreased with a step size of 75 epochs.
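The per-class split and the step-wise learning-rate schedule described above can be sketched as below. The function names are hypothetical, and the decay factor of 0.1 per step is an assumption, because the text states only the initial rate and the step size.

```python
import random


def split_indices(samples_per_class, n_train, n_val, rng):
    """Randomly split one class's sample indices into train, validation,
    and test subsets; the test subset takes whatever remains."""
    indices = list(range(samples_per_class))
    rng.shuffle(indices)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, validation, test


def step_learning_rate(epoch, initial=0.005, step=75, factor=0.1):
    """Multiply the learning rate by `factor` (assumed value) every `step` epochs."""
    return initial * factor ** (epoch // step)


# 500 samples per class, 150 for training and 50 for validation.
train, validation, test = split_indices(500, 150, 50, random.Random(0))
print(len(train), len(validation), len(test))  # 150 50 300

for epoch in (0, 74, 75, 150, 299):
    print(epoch, step_learning_rate(epoch))
```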

3) S-CNN
In this experiment, the data set was split in the same way as in the previous setting. However, the number of training samples for each class of radar jamming signals was set to three, four, and five. The architecture of the S-CNN for the radar jamming data set was shown in Table 4. In the training process, the number of training iterations was 300 and the number of sample pairs used to train S-CNN in one iteration was 12.
Meanwhile, the size of the mini-batch was set to 12 and the L2 regularization weight was set to $2 \times 10^{-4}$. At the same time, the S-CNN model adopted the Adam algorithm [30], and its learning rate was set to 0.0001.

4) EXPERIMENTAL ENVIRONMENT
The platform for jamming data generation was MATLAB 2018. All the experiments were run on pycharm-community-2017.
... condition of 150 training samples for each type of jamming signals. For example, when the number of training samples of each class was 150, most classification models classified the smart noise jamming poorly, but the OA of 1D-CNN was 98.27% ± 2.81%.
Since the range of JNR in the simulated radar jamming data set was 30-60 dB, the jamming signals were not affected much by the noise. However, in a complex electromagnetic environment, the jamming signals are mixed with a lot of noise, which affects the performance of the classification model. Therefore, in order to explore the anti-noise ability of the designed 1D-CNN model, the JNR of the radar jamming data was changed, and the experimental results were shown in Fig. 5. Under the condition of different training samples, the JNR was reduced to 5 dB and 10 dB (white Gaussian noise was mainly introduced), and the classification accuracy of the 1D-CNN model did not decrease significantly. For example, when the number of training samples of each class was 150, the OA of the designed 1D-CNN with a JNR of 5 dB and 10 dB was 3.02% and 0.95% lower, respectively, than that of the designed 1D-CNN model without changing the JNR. This showed that the designed 1D-CNN model in this paper had good anti-noise ability.
Table 6 showed the confusion matrix for the proposed 1D-CNN at a JNR of 30-60 dB. It can be seen from the table that the proposed 1D-CNN algorithm had good classification performance for radar jamming signals when the training samples were sufficient. According to the confusion matrix, classification errors mainly occurred on jamming signals with similar jamming mechanisms and simulation parameters, such as distance deception jamming and dense false target jamming.
This was mainly because the simulation range of false target parameters of the two types of radar jamming signals was consistent (only the number of false targets was different), which led to correlation and confusion between the two types of jamming signals.

D. THE CLASSIFICATION RESULTS OF THE S-CNN MODEL
Table 7 showed the classification results of different classification models with limited training samples. Among all classification models, the designed S-CNN achieved the best OA and K. For example, when the number of training samples of each class was 3, 4, and 5, the OA of the designed S-CNN model was 82.73% ± 3.67%, 83.99% ± 2.49%, and 84.55% ± 2.08%, which was 4.16%, 4.13%, and 3.54% higher than the best comparative experimental results, respectively. At the same time, among all classification models, the OA of the designed S-CNN for five kinds of jamming signals reached the optimal value under the condition of 5 training samples for each type of jamming signal. For example, most classification models had poor classification accuracy for sweeping jamming signals, but the OA of S-CNN was 100% ± 0.00%.
Meanwhile, the anti-noise ability of the designed S-CNN model was analyzed. The change in JNR was consistent with the previous experiments, and the experimental results were shown in Fig. 6. Under the condition of different training samples, the JNR was reduced to 5 dB and 10 dB, and the classification accuracy of the S-CNN model did not decrease significantly. For example, when the number of training samples of each class was 5, the OA of the designed S-CNN with a JNR of 5 dB and 10 dB was 0.73% and 0.48% lower, respectively, than that of the designed S-CNN model without changing the JNR. This showed that the designed S-CNN model in this study had good anti-noise ability. Table 8 showed the confusion matrix for the proposed S-CNN at a JNR of 30-60 dB. It can be seen from the table that the proposed S-CNN algorithm had good classification performance for radar jamming signals. According to the confusion matrix, classification errors mainly occurred on jamming signals with similar jamming mechanisms. Since smart noise was generated by adding noise frequency modulation on the basis of ISRJ, the signals ISRJ, smart noise, chaff + ISRJ, and dense false target + smart noise were easily confused in the time domain. Compared with the confusion matrix for the proposed 1D-CNN, this phenomenon was more obvious when the training samples were limited.
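The JNR reductions used in the anti-noise experiments can be sketched as adding complex white Gaussian noise whose power is set by the target JNR. The function names are hypothetical, and the unit-power complex exponential below is only a stand-in for a simulated jamming sample.

```python
import math
import random


def mean_power(samples):
    """Average power of a complex sequence."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def add_noise_at_jnr(jamming, jnr_db, rng):
    """Add complex white Gaussian noise so that the jamming-to-noise
    power ratio equals jnr_db (in dB). Returns (noisy signal, noise)."""
    noise_power = mean_power(jamming) / (10.0 ** (jnr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # per real/imaginary component
    noise = [
        complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in jamming
    ]
    noisy = [j + n for j, n in zip(jamming, noise)]
    return noisy, noise


# Stand-in jamming sample: a unit-power complex exponential, 2000 points.
jamming = [complex(math.cos(0.3 * i), math.sin(0.3 * i)) for i in range(2000)]
noisy, noise = add_noise_at_jnr(jamming, jnr_db=5.0, rng=random.Random(0))

measured = 10.0 * math.log10(mean_power(jamming) / mean_power(noise))
print(round(measured, 2))  # close to the 5 dB target
```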
To better understand the classification power of the designed S-CNN model, we randomly selected 60 labeled samples per class from the jamming data set and used the t-SNE [32] algorithm to reduce the dimensionality of the inputs and of the features extracted by S-CNN to two. The results were visualized in Fig. 7, where different colors represented different classes in the jamming data set. It was obvious from Fig. 7(a) that the original inputs of different jamming classes were confused with each other. Then, these labeled samples were input into the trained S-CNN to extract the features of the fully connected layer. The feature visualization of the radar jamming signals extracted by the designed S-CNN was shown in Fig. 7(b). It can be clearly seen that the features extracted by S-CNN significantly reduced the distance between intraclass samples and further increased the separability between interclass samples.

V. CONCLUSION
In this paper, we proposed a new radar jamming signal classification model based on 1D-CNN and made full use of the hierarchical feature extraction ability of 1D-CNN. In the experiments, the method based on 1D-CNN showed strong ability in feature extraction and accurate classification when the training samples were sufficient. At the same time, in order to solve the problem of limited training samples, a radar jamming signal classification model based on S-CNN was proposed. The proposed S-CNN showed the ability to measure similarities between jamming signals. Compared with other classification models, both the designed 1D-CNN model and the S-CNN model achieved the best classification performance for a variety of typical radar jamming types. Meanwhile, the proposed radar jamming signal classification models had good anti-noise ability. The experimental results indicated the effectiveness of the deep learning models and their great potential in radar jamming signal classification.