Deep-Learning for Radar: A Survey

A comprehensive and well-structured review on the application of deep learning (DL) based algorithms, such as convolutional neural networks (CNN) and long short-term memory (LSTM), in radar signal processing is given. The following DL application areas are covered: i) radar waveform and antenna array design; ii) passive or low probability of interception (LPI) radar waveform recognition; iii) automatic target recognition (ATR) based on high range resolution profiles (HRRPs), Doppler signatures, and synthetic aperture radar (SAR) images; and iv) radar jamming/clutter recognition and suppression. Although DL is unanimously praised as the ultimate solution to many bottleneck problems in most existing works on similar topics, both the positive and the negative sides of DL are considered in this work. Specifically, two limiting factors of the real-life performance of deep neural networks (DNNs), limited training samples and adversarial examples, are thoroughly examined. By investigating the relationship between the DL-based algorithms proposed in various papers and linking them together to form a full picture, this work serves as a valuable source for researchers who are seeking potential research opportunities in this promising research field.


I. INTRODUCTION
In recent years, researchers around the world have increasingly resorted to deep learning (DL) based algorithms to solve bottleneck problems in the field of radar signal processing [1], [2]. The number of publications on "deep learning for radar" has been increasing rapidly. To illustrate radar engineers' soaring interest in DL, the number of publications on the topic of "deep learning for radar" from 2016 to 2020 is plotted in Fig. 1 (IEEE Xplore database).
Specifically, a comprehensive survey of machine learning algorithms applied to radar signal processing is given in [3], where six aspects are considered: i) radar radiation sources classification and recognition; ii) radar image processing; iii) anti-jamming & interference mitigation; iv) application of machine learning in research fields other than i)-iv); v) promising research directions. In [4], Zhu et al. provided a comprehensive review on deep learning in remote sensing, which focuses on automatic target recognition (ATR) and terrain surface classification based on synthetic aperture radar (SAR) images. In [5], Zhang et al. presented a technical tutorial on the advances in deep learning for remote sensing and geosciences, which also focuses on image classification.
In this work, we conduct a comprehensive review on the application of DL-based algorithms in radar signal processing, which includes the following aspects: a) DL for waveform and array design, which is an enabling technology for cognitive radar & spectrum sharing; b) DL-based radar waveform recognition, which could potentially 1) boost the possibility of intercepting and recognizing the signals transmitted from the low probability of interception (LPI) radar; and 2) improve the direct-path signal estimation accuracy for passive radar applications; c) DL-based ATR using high range resolution profiles (HRRPs), Doppler signatures, and SAR images, together with two limiting factors of DNN performance, limited training samples and adversarial examples (note that these two factors are also applicable to DNNs in other application areas in addition to ATR); d) DL-based algorithms for jamming/clutter identification and suppression. The major contributions of this work are summarized as follows:
• A comprehensive review of various DL-based algorithms for radar signal processing is provided. The papers reviewed in this work are "hand-picked" high-quality research works and are neatly grouped based on the pre-processing methods, DNN structure, main features, dataset, etc.
• Both the positive and the negative sides of DL are examined. In contrast, in many existing reviews/surveys on this topic, DL has been unanimously praised as a "marvelous" tool that can overcome all the barriers preventing radar systems from reaching the ideal performance goal. In this work, considerable space is devoted to the "negative" side, e.g. the devastating effects of carefully crafted adversarial examples on an otherwise "well-trained" DL network.
• The relationship between the algorithms proposed in various papers is thoroughly investigated. Generally, a "novel" algorithm does not come out of nowhere. By analyzing the evolution from one algorithm to another and comparing different research works, rather than taking the contribution claims made in each paper at face value, one can gain much deeper insight into the problem at hand and the real contribution of a paper.
Specifically, in the field of DL, open-source MATLAB/Python code is freely available for download on many websites. The true value of a specific research paper can only be determined by linking everything together as a full picture and then making observations regarding the position of this particular paper within the whole picture. The general structure of this review paper is plotted in Fig. 2. The techniques/applications investigated in this work are listed in Fig. 3, with the most popular network architecture and application highlighted. The rest of this work is organized as follows. In Section II, a couple of DL-based radar waveform & array design algorithms are reviewed. In Section III, we focus on the radar signal recognition problem for LPI radar and passive radar. In Section IV, automatic target recognition based on radar HRRP, micro-Doppler signature, and SAR images with DL-based algorithms is investigated. Moreover, two challenging problems for DL-based radar signal processing, namely the lack of training data and the adversarial attacks, are also analyzed. In Section V, various DL-based jamming and clutter suppression algorithms are compared and analyzed. Some final remarks are offered in Section VI.

A. DL FOR SPECTRUM-SHARING
With the ever-increasing demand for spectrum resources from wireless communications systems, technologies enabling spectrum sharing between radar and communications systems have attracted the attention of researchers from both fields. In [7]-[11], DL-based algorithms have been employed to prevent mutual [...]
DL-based algorithms are also increasingly adopted in the field of radar waveform optimization under specific constraints, especially for MIMO radar. In order to separate the echo signals caused by the illuminating signals from different transmitting facilities of MIMO radar for further processing at the receiving end and to achieve the waveform diversity gain, the waveforms from different transmitting antennas have to be near-orthogonal [12]. Hence the cross-correlations between waveforms from different transmitting antennas are to be minimized. To minimize the auto-/cross-correlation sidelobes while meeting the constant modulus constraint, Hu et al. designed a deep residual neural network consisting of 10 residual blocks, each of which is made of dual layers of 128 neurons [13]. Later, a deep residual network similar to the one in [13] was adopted in [14] to synthesize desired beampatterns while minimizing the cross-correlation sidelobes under the constant modulus constraint. In [15], Zhong et al. proposed a feedforward neural network with ten hidden layers to maximize the SINR of MIMO radar under the constraints of constant modulus and low sidelobe levels. Many research works are focused on the problem of minimizing the cross-correlation sidelobe levels. In [16], the problem of multitarget detection was considered assuming unknown target positions, where a deep reinforcement learning based strategy was adopted for waveform synthesis to maximize the detection capabilities of MIMO radar.
Finally, the waveform generation and selection problem for multimission airborne weather radar was discussed in [17], where a feedforward neural network with a varying number of hidden layers was designed to synthesize nonlinear frequency modulated waveforms (NFMW) with pre-determined bandwidth and pulse length.
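To make the sidelobe criteria discussed above concrete, the following minimal Python sketch evaluates the peak autocorrelation sidelobe and the peak cross-correlation of a set of constant-modulus waveforms. It is not the method of [13]-[15]: the random-phase codes, the number of antennas, the code length, and the function name peak_correlation_levels are all assumptions made for illustration.

```python
import numpy as np

def peak_correlation_levels(waveforms):
    """Return (peak autocorrelation sidelobe, peak cross-correlation), normalized by code length.

    waveforms: complex array of shape (num_tx, code_length), one row per transmit antenna.
    """
    num_tx, n = waveforms.shape
    peak_auto, peak_cross = 0.0, 0.0
    for i in range(num_tx):
        for j in range(num_tx):
            # Aperiodic correlation over all 2n-1 lags (np.correlate conjugates its second argument).
            corr = np.abs(np.correlate(waveforms[i], waveforms[j], mode="full")) / n
            if i == j:
                corr[n - 1] = 0.0  # discard the zero-lag mainlobe
                peak_auto = max(peak_auto, corr.max())
            else:
                peak_cross = max(peak_cross, corr.max())
    return peak_auto, peak_cross

rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 2.0 * np.pi, size=(3, 64))  # hypothetical: 3 antennas, 64 chips
waveforms = np.exp(1j * phases)                       # unit-magnitude (constant-modulus) samples
auto_psl, cross_peak = peak_correlation_levels(waveforms)
print(f"peak auto sidelobe: {auto_psl:.3f}, peak cross-correlation: {cross_peak:.3f}")
```

A waveform design network of the kind reviewed above would typically be trained to drive both returned quantities down while every sample keeps unit magnitude, which is why the phases, rather than the complex samples, are the natural optimization variables.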

C. DL FOR ARRAY DESIGN
DL-based algorithms have also been employed to realize cognitive selection and intelligent partition of antenna subarrays. For example, in [18], a CNN with multiple convolutional layers, pooling layers and fully connected layers (referred to as "Conv", "POOL", and "FC", respectively, for simplicity in the rest of this work) was utilized for cognitive transmit/receive subarray selection based on the development of the surrounding environment. Moreover, DL-based algorithms could potentially boost the performance of subarray-based MIMO (Sub-MIMO) radar, which could be regarded as a hybrid of phased-array radar and MIMO radar. The essence of Sub-MIMO radar is to transmit correlated waveforms within the same subarray, which resembles the working mechanism of the conventional phased array, while the waveforms from different subarrays are designed to be orthogonal, so that they can be separated at the receiving end for waveform diversity gain [19]. It follows naturally that the partition of subarrays for Sub-MIMO radar plays a key role in deciding the balance between the coherent processing gain and the waveform diversity gain. In [20], a novel CNN was proposed for interleaved sparse array design for phased-MIMO radar. Specifically, the parallel lightweight structure (i.e. PL module), which is based on the MobileNet-V2 structure, was used to divide feature matrices into parallel branches. Meanwhile, the scale reduced convolution structure (i.e. SR module) was used as an alternative to the conventional pooling layer for feature matrix dimension reduction. Simulation results show that, compared with uniform antenna array partition, the proposed CNN provides transmit beampatterns with a narrower mainlobe and lower sidelobes, more accurate direction of arrival (DOA) estimation, and higher output SINR.
The structures of the DNNs proposed in [7]-[20] and their distinctive features are summarized in TABLE 1.

III. DL FOR LPI OR PASSIVE RADAR WAVEFORM RECOGNITION
DL-based radar waveform recognition has also been gaining popularity in recent years. Various neural networks and algorithms have been developed, which include deep convolutional neural networks (CNNs) [21]-[23], autoencoders [24]-[26], and recurrent neural networks (RNNs) [27]-[29]. These techniques could potentially 1) boost the possibility of intercepting and recognizing the signals transmitted from the low probability of interception (LPI) radar [30]-[31]; and 2) improve the direct-path signal estimation accuracy for passive radar applications [43]-[45]. However, as is pointed out in [46], [47], DL-based signal classification algorithms are vulnerable to adversarial attacks, which are expected to be more powerful than classical jamming attacks.

A. DL FOR LPI RADAR
Most modern radar systems have been designed to emit LPI waveforms to avoid interception and detection by enemies. Therefore, automatic radar LPI waveform recognition has become a key counter-countermeasures technology. In the literature, dozens of DL-based waveform recognition techniques have been proposed within the past five years. Usually, the raw radar data are first preprocessed with time-frequency analysis (TFA) techniques, such as the Choi-Williams distribution (CWD) [30]-[35], the Fourier-based synchrosqueezing transform (FSST) [36], the Wigner-Ville distribution (WVD) [37], and the short-time Fourier transform (STFT) [38]-[40], to obtain the time-frequency images. After that, various DNN structures, mostly CNNs, can be designed for feature extraction and waveform classification.
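The preprocessing step described above can be sketched with the STFT, the simplest of the listed TFA techniques. The following NumPy example turns a noisy LFM pulse into a normalized time-frequency image of the kind a CNN classifier would take as input. The sampling rate, bandwidth, window length, hop size, noise level, and the function name stft_image are assumptions chosen for illustration and are not taken from [38]-[40].

```python
import numpy as np

def stft_image(signal, window_length=64, hop=16):
    """Return a magnitude time-frequency image (frequency bins x time frames)."""
    window = np.hanning(window_length)
    starts = range(0, len(signal) - window_length + 1, hop)
    frames = np.stack([signal[s:s + window_length] * window for s in starts])
    # FFT of each windowed frame; fftshift puts zero frequency in the middle row.
    spectrum = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    image = np.abs(spectrum).T
    return image / image.max()  # scale to [0, 1] like a grayscale image

num_samples = 1000
fs = 1.0e6                   # hypothetical sampling rate, Hz
duration = num_samples / fs  # pulse width, s
bandwidth = 2.0e5            # hypothetical swept bandwidth, Hz
t = np.arange(num_samples) / fs
# Complex LFM pulse whose instantaneous frequency sweeps from -bandwidth/2 to +bandwidth/2.
lfm = np.exp(1j * np.pi * (bandwidth / duration) * (t - duration / 2) ** 2)

rng = np.random.default_rng(1)
noise = (rng.standard_normal(num_samples) + 1j * rng.standard_normal(num_samples)) / np.sqrt(2)
tf_image = stft_image(lfm + 0.5 * noise)
print(tf_image.shape)
```

The LFM pulse appears as a rising ridge in tf_image. Other modulation types leave different ridge patterns, which is the property the image-based classifiers rely on.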
In [30]-[35], the CWD was used to generate time-frequency images in the pre-processing step. In [30], the sample averaging technique (SAT) was adopted for signal pre-processing to reduce the computational cost, after which a 9-layer CNN was proposed. In [31], a 7-layer CNN along with a novel tree structure-based process optimization tool (TPOT) classifier was designed. In [32], Ma et al. employed two different DNN structures to approach the waveform classification problem: an 11-layer CNN and a bidirectional LSTM, with the former exhibiting better performance. In [33], transfer learning was employed to counter the problem of limited training data. The network was pretrained with five different existing high-performance CNN architectures: VGG16, ResNet50, Inception-ResNetV2, DenseNet, and MobileNetV2, with VGG16 proving to offer the highest classification accuracy.
Twelve different types of radar waveforms have been used to test the performance of the various CNN structures proposed in [30]-[33], which include the linear frequency modulated (LFM) waveform, the BPSK waveform, the Frank-coded waveform, the Costas-coded waveform, the P1-P4 phase-coded waveforms, and the T1-T4 time-coded waveforms. Although the performances of the different DNNs in [30]-[33] are not directly comparable due to differences in the training/test data, the classification accuracy offered by these DNNs at SNR = -4 dB is higher than 90% in all cases. In [34]-[35], the performances of the DNNs were tested with fewer than 8 different types of waveforms. In [34], networks (Inception-v3 and ResNet-152) pretrained with ImageNet were used to reduce the training cost. In [35], the instantaneous autocorrelation function (IAF) was used for denoising via atomic norm as a preprocessing step, following which a CNN structure was proposed for the classification of the LFM, the Costas-coded, and the P2-P4 coded waveforms.
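Two of the waveform classes listed above, the BPSK and the Frank-coded waveforms, can be generated in a few lines. The sketch below uses the standard Frank-code phase rule and a 13-chip Barker sequence as the BPSK code. The code lengths, the number of samples per chip, and the helper names are assumptions for illustration, since the source does not state the parameters used in [30]-[33].

```python
import numpy as np

def frank_code_phases(m):
    """Phases of a Frank code of length m*m: phi[p, q] = 2*pi*p*q/m for p, q = 0..m-1."""
    p, q = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    return (2.0 * np.pi * p * q / m).ravel()

def phase_coded_pulse(phases, samples_per_chip):
    """Hold each chip phase for samples_per_chip samples; the result has constant modulus."""
    return np.exp(1j * np.repeat(phases, samples_per_chip))

# 13-chip Barker sequence used here as the binary (0 or pi) phase code.
barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1])
bpsk_phases = np.where(barker13 > 0, 0.0, np.pi)

bpsk_pulse = phase_coded_pulse(bpsk_phases, samples_per_chip=8)            # hypothetical oversampling
frank_pulse = phase_coded_pulse(frank_code_phases(4), samples_per_chip=8)  # 16-chip Frank code
```

In a training pipeline, each such pulse would be corrupted with noise at the desired SNR and passed through a TFA step before being fed to the classifier.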
Although the CWD is a widely adopted TFA technique, it also involves high computational complexity, which has led researchers to seek computationally efficient alternatives. The FSST was used in [36] as a substitute for the CWD in the preprocessing step, following which a multi-resolution CNN with three different kernel sizes was proposed. In [37], the WVD was adopted, and a VGG16 variant pretrained with ImageNet was used to reduce the training cost. Moreover, the STFT was adopted in [38]-[40] to obtain the time-frequency diagram of radar data. In [38], Ghadimi et al. proposed two CNN structures based on GoogLeNet and AlexNet, respectively, for the classification of LFM, P2-P4, and T1-T4 waveforms. In [39], Wei et al. proposed a novel squeeze-and-excitation network for feature extraction in the time, frequency, and time-frequency domains, and the recognition results of all the domains are fused subsequently. In [40], a simple CNN with three convolution layers and one fully connected layer was used to classify 20 different types of signals, which include frequency-modulated waveforms with various bandwidths and pulse widths and phase-modulated waveforms. Finally, it is worth mentioning that some research works on this topic did not employ TFA techniques for signal preprocessing. For example, in [41], an adaptive 1D CNN with four hidden layers and two dense layers was proposed for the classification of continuous and pulsed waveforms (sinusoidal, LFM, bi-phase coded, frequency-stepped).
The preprocessing procedures, the DNN structures, and the radar waveforms used for performance evaluation in [30]-[40] are summarized in TABLE 2.

B. DL FOR PASSIVE RADAR
Another potential application area for the DL-based automatic waveform recognition algorithms is passive radar. Passive radar utilizes the signals from illuminators of opportunity (IOs) (e.g. base stations of wireless communications systems) for target detection, imaging, and tracking, which could increase the radar coverage area while avoiding the high infrastructure cost and the spectrum crowding caused by the construction of new dedicated radar transmitters. However, since the waveforms from the IOs are usually unknown to radar receivers, the performance of passive radar is usually much worse than that of conventional active radar [42]. In [43]-[44], DL was used to realize simultaneous waveform estimation and image reconstruction for passive SAR composed of a ground-based IO at a known position and an airborne receiver. A recurrent neural network (RNN) was designed, with which the scene reflectivity was recovered via forward propagation, while the waveform coefficients were reconstructed via backpropagation. Simulation results show that the proposed RNN could learn the characteristics of quadrature phase-shift keying (QPSK) signals [43] and OFDM signals transmitted from DVB-T [44], and perform the SAR image reconstruction with low error. It was also shown that as the number of layers of the RNN increases, the image contrast improves at the cost of increased reconstruction error. In [45], Wang et al. developed a novel DNN consisting of a two-channel CNN and a bidirectional LSTM, termed TCNN-BL, for waveform recognition for cognitive passive radar, which could modify the sampling rate adaptively to suit the task at hand. Moreover, a parameter transfer approach was utilized to improve the network training efficiency.

C. CHALLENGES
According to [46], DNNs are highly vulnerable to adversarial attacks. Depending on the information that is available to the attackers, adversarial attacks could be classified as white-box attacks (the model structure and the parameters of the network are completely known a priori), grey-box attacks (known model structure & unknown parameters), and black-box attacks (unknown model structure & parameters). In most cases, the detailed information regarding DNNs is unknown to the attacker, who can only get access to the classification results of the network. Although the black-box attack is more common and less devastating than the other two types of attacks, the white-box attack is often used in research works to evaluate the worst-case scenario. In [47], Sadeghi et al. showed that a black-box attack can be designed to be approximately as effective as a white-box attack, which could lead to dramatic performance degradation in DL-based radio signal classification. It is worth noting that most research works on the topic of signal/waveform misclassification caused by adversarial attacks target wireless communication systems rather than radar. Nevertheless, the theory and [...] In [48], the 1D CNN used as an RF signal classifier was pre-trained with an autoencoder to mitigate the deceiving effects of adversarial examples, which has the potential to be extended to the 2D image classification problem. In [49], two statistical tests were proposed for the detection of adversarial examples.
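The white-box setting described above can be illustrated with a one-step gradient-sign perturbation, commonly known as the fast gradient sign method (FGSM). FGSM is not named in the source and is used here only as a familiar example of a white-box attack. The logistic-regression model standing in for a trained DNN, the input dimension, and the perturbation budget epsilon are likewise assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, label, weights, bias, epsilon):
    """One-step white-box attack: move x by epsilon along the sign of the loss gradient."""
    prob = sigmoid(x @ weights + bias)
    grad_x = (prob - label) * weights  # gradient of the cross-entropy loss with respect to x
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(2)
weights = rng.standard_normal(256)  # stand-in for a trained model, fully known to the attacker
bias = 0.0
# Hypothetical input that the model confidently assigns to class 1.
x = 0.05 * np.sign(weights) + 0.01 * rng.standard_normal(256)
x_adv = fgsm_perturb(x, label=1.0, weights=weights, bias=bias, epsilon=0.1)

print("clean score:", sigmoid(x @ weights + bias))
print("adversarial score:", sigmoid(x_adv @ weights + bias))
```

Each input sample is changed by at most epsilon, yet the perturbations add up coherently across all 256 dimensions and push the score across the decision boundary. The same mechanism makes high-dimensional inputs such as time-frequency images susceptible.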

IV. DL FOR ATR
Machine learning (such as k-nearest neighbor and dictionary learning) had been employed for ATR long before the emergence of DL [50], [51]. After AlexNet (one of the most popular deep CNNs) won the ILSVRC'12 contest [52], DL for radar ATR has become an intensively researched subject. Based on the amount of labeled data in the dataset used for training the network, DL could be classified as unsupervised learning, supervised learning, and semi-supervised learning (SSL), with SSL lying halfway between the other two. According to [53], in common cases, 1%-10% of the data used for SSL training are labeled, while the rest are unlabeled samples. Since most of the existing DL-based radar ATR methods are supervised, the recognition/classification accuracies of these methods are heavily limited by the amount of labeled training data. In this section, we provide a comprehensive review of DL-based ATR methods proposed in recently published research works, which includes i) ATR using the HRRP; ii) ATR using the micro-Doppler signatures; iii) ATR for SAR; and iv) major challenges for DL-based ATR.
Some researchers used measured HRRP data for performance evaluation. For example, the HRRP data from Yak-42 (large jet), Cessna Citation S/II (small jet), and An-26 (twin-engine turboprop) were used in [54]-[58]; the HRRP data from Airbus A319, A320, A321, and Boeing B738 were used in [59]; the HRRP data from seven types of ships of different sizes (length from 89.3 m to 182.8 m) were used in [60]; the HRRP data from various types of ground vehicles were used in [62], [66], [67]. Since most researchers only have access to a limited amount of HRRP measurement data associated with a handful of vehicles, many of them resort to simulated HRRP data generated by software based on the specific CAD models of vehicles for research purposes. For example, in [63], Lundén et al. generated HRRP data for 8 fighters (F-35, Eurofighter, etc.) with POFACETS & 3D facet models of the aircraft. In [64], the HRRP data for 6 military and 4 civilian ship targets were simulated based on CAD models assuming X-band maritime radar. Another feasible alternative is data augmentation with a generative adversarial network (GAN). Specifically, in [62], a GAN was adopted to address the problem of unbalanced training samples, i.e. the labeled training samples for some classes (majority classes) significantly outnumber those for the other classes (minority classes).
The DNN structures of the DL-based ATR methods proposed in [54]-[65] along with their distinctive features are summarized in TABLE 3. The preprocessing procedures and the datasets used for performance evaluation have also been noted in the table. It is worth mentioning that some simulation results regarding target recognition using supervised DL based on the HRRPs collected with MIMO radar have also been presented [68]. However, since the DNN used to obtain the results in [68] was not detailed, it is not included in TABLE 3.

B. DL-BASED ATR USING MICRO-DOPPLER SIGNATURES
DL-based target detection/classification based on micro-Doppler signatures has been gaining ground rapidly in the fields of automatic ground moving human/animal/vehicle target recognition [69]-[73] and drone classification [74]-[77]. In [69], the MAFAT dataset, which contains the echo signals from humans and animals collected by different pulse-Doppler radars at different locations, terrains, and SNRs, was used for the training of a six-layer CNN. To achieve higher classification accuracy, the data were further augmented via random frequency/time shifting, noise adding, and vertical/horizontal image flipping. In [70], a CNN composed of 5 dense blocks (i.e. 3 × 3 Conv followed by 1 × 1 Conv) and 5 transition blocks (i.e. 1 × 1 Conv followed by 2 × 2 POOL) was proposed for human motion classification based on micro-Doppler signatures, the performance of which was tested with two datasets containing the echoes associated with six human motions (walking, running, crawling, forward jumping, creeping, and boxing) obtained via simulation and measurement, respectively. The major feature of the human motion recognition algorithm in [70] is that the proposed network is more robust to variations in the target aspect angle than most classic CNN models, such as VGGNet, ResNet, and DenseNet. In [71]-[73], Hadhrami et al. investigated the problem of single-person/group/vehicle recognition based on micro-Doppler signatures with DL. Pretrained classic CNN models (such as VGG16, VGG19, and AlexNet) and transfer learning were adopted to improve the network training efficiency. The RadEch human/vehicle target tracking data collected with a Ku-band pulse-Doppler radar, which cover typical scenarios such as single-person/group walking/running and truck moving, were used to test the performance of the proposed network. Moreover, data augmentation (×16) with vertical image flipping and circular shifting was employed to compensate for the limited training data.
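The augmentation operations mentioned above (circular frequency/time shifting, flipping, and noise adding) are straightforward to apply to a micro-Doppler image. The following sketch generates 16 augmented copies of a single spectrogram, mirroring the ×16 factor quoted for [71]-[73]. The exact procedures of [69] and [71]-[73] are not given in the source, so the random shift ranges, the flip probabilities, the noise level, and the function name augment_spectrogram are assumptions.

```python
import numpy as np

def augment_spectrogram(image, rng, noise_std=0.05):
    """Return one randomly augmented copy of a (Doppler bins x time frames) image."""
    out = np.roll(image, rng.integers(image.shape[0]), axis=0)  # circular frequency shift
    out = np.roll(out, rng.integers(image.shape[1]), axis=1)    # circular time shift
    if rng.random() < 0.5:
        out = out[::-1, :]                                      # vertical flip (Doppler axis)
    if rng.random() < 0.5:
        out = out[:, ::-1]                                      # horizontal flip (time axis)
    return out + noise_std * rng.standard_normal(out.shape)     # additive Gaussian noise

rng = np.random.default_rng(3)
spectrogram = rng.random((128, 64))  # placeholder for a measured micro-Doppler image
augmented = np.stack([augment_spectrogram(spectrogram, rng) for _ in range(16)])
print(augmented.shape)
```

Whether a given operation preserves the class label depends on the task. For instance, a vertical flip reverses the sign of the Doppler shift, which may or may not be acceptable for a particular set of classes.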
In [74] and [75], pretrained classic CNN models (e.g. GoogLeNet) were used for drone classification. Specifically, in [74], the micro-Doppler signatures and the cadence-velocity diagrams obtained by a 14 GHz frequency modulated continuous wave (FMCW) radar in indoor/outdoor experiments were merged as Doppler images, based on which drones with different numbers of motors were classified. In [75], both the pretrained GoogLeNet and a deep series CNN with 34 layers were employed for in-flight drone/bird classification. The RGB and the grayscale echo signal datasets collected by 24 GHz and 94 GHz FMCW radars were used to train the two networks, respectively. One distinctive feature of the networks presented in [75] is that clutter and noise have been treated as two separate sub-classes. In [76] and [77], Mendis et al. proposed a deep belief network (DBN) formed by stacking the conventional RBM and the Gaussian Bernoulli RBM (GBRBM), which is similar to the one proposed in [54], to address the problem of micro-drone detection and classification. The classification was based on the Doppler signatures of the targets of interest and their spectral correlation function (SCF) (i.e. Fourier transform of the autocorrelation function) signature patterns. The performance of the proposed DBN was tested with the echo signals collected from three micro-drones (available at supermarkets at a price lower than $100) by an S-band CW Doppler radar. The micro-Doppler signature based target detection and classification approaches proposed in [69]-[77] are summarized in TABLE 4. Finally, it is worth mentioning that a comprehensive review on the application of DL for UAV detection and classification was provided in [78].
Although [78] covers the general topic of drone detection with multiple types of sensors (which include electro-optical, thermal, sonar, radar, and radio frequency sensors) and does not focus specifically on drone classification using the Doppler signatures collected by radar, it still serves as a good reference work for readers who are interested in the topic of drone/bird detection and classification.
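The cadence-velocity diagram (CVD) mentioned in connection with [74] is commonly obtained by taking a Fourier transform of the spectrogram along the time axis, so that periodic modulations such as rotor rotation show up as peaks at their repetition rate. The sketch below follows this common definition. The synthetic spectrogram, the frame rate, the modulation rate, and the function name cadence_velocity_diagram are assumptions and do not reproduce the processing chain of [74].

```python
import numpy as np

def cadence_velocity_diagram(spectrogram):
    """FFT of each Doppler bin's time history; rows stay Doppler bins, columns become cadence bins."""
    magnitude = np.abs(spectrogram)
    magnitude = magnitude - magnitude.mean(axis=1, keepdims=True)  # remove the per-bin DC level
    return np.abs(np.fft.rfft(magnitude, axis=1))

num_doppler_bins, num_frames = 64, 256
frame_rate = 256.0       # hypothetical spectrogram frames per second
modulation_rate = 16.0   # hypothetical rotor-induced modulation rate, Hz
time = np.arange(num_frames) / frame_rate

# Synthetic spectrogram: one Doppler bin whose magnitude is periodically modulated.
spectrogram = np.zeros((num_doppler_bins, num_frames))
spectrogram[40] = 1.0 + 0.5 * np.cos(2.0 * np.pi * modulation_rate * time)

cvd = cadence_velocity_diagram(spectrogram)
cadence_axis = np.fft.rfftfreq(num_frames, d=1.0 / frame_rate)
dominant_cadence = cadence_axis[np.argmax(cvd[40])]
print(dominant_cadence)
```

Because the CVD discards the time origin of the modulation, it is insensitive to when the observation window starts, which is one reason it is a useful companion to the spectrogram itself.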

C. DL-BASED ATR FOR SAR AND VIDEO SAR
In 2020, Majumder, Blasch, and Garren published a book summarizing recently proposed DL-based approaches for radar ATR, where DL for single and multi-target classification in SAR imagery was considered [79].
Specifically, this book focused on the ATR performances of various DNNs evaluated with the popular MSTAR dataset, where MSTAR stands for Moving and Stationary Target Acquisition and Recognition. The public release of the MSTAR dataset, which was collected by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), consists of 20,000 SAR image chips covering 10 target types from the former Soviet Union. It should be noted that, although the MSTAR dataset has long been widely adopted in research works to evaluate the performance of traditional machine-learning based algorithms (e.g. SVM), by which a classification rate of 97%-100% had been reached, it has been shown in some papers that the ATR performance of the algorithms trained/tested merely on the MSTAR dataset usually degrades when they are trained/tested using other datasets (e.g. the QinetiQ dataset [80], [81]). Nevertheless, in this section, we will give a brief review of recently proposed DNNs for ATR employing the MSTAR dataset [82]-[92] and other SAR image datasets (e.g. TerraSAR-X). The limitations of the MSTAR dataset and the possible solutions will be covered later in Section IV-D.
In [82], Chen et al. proposed an all-convolutional network (A-ConvNet) composed of 5 Conv and 3 POOL. Since only sparsely connected Conv were used and the FC was omitted, A-ConvNet is highly computationally efficient. The performance of A-ConvNet was evaluated under both the standard operating condition (SOC) and the extended operating condition (EOC) (e.g. substantial variation in depression angle/target articulation), which have been widely adopted as the performance benchmark in research papers. In [83], a normal multiview deep CNN (DCNN) was proposed, which [...] [88] are also worth mentioning briefly. In [86], morphological operations were used to smooth edges, remove blurred pixels, and amend cracks, and the large-margin softmax batch normalization was employed. In [87] and [88], the database was extended with affine transformation in range, and a couple of SVMs were used to replace the FC in the CNN for final classification. ATR based on SAR image sequences obtained from, for example, single-radar observations along a circular orbit over time or joint observation from different angles by multiple airborne radars, has also been investigated in research works. Considering that the sub-images in the SAR image sequence obtained by the imaging radar over a period of time from the same target often exhibit conspicuous variations, a spatial-temporal ensemble convolutional network (STEC-Net) consisting of 4 convolutional layers and 4 pooling layers was proposed in [90]. Dilated 3D convolution was used to extract spatial and temporal features simultaneously, which were progressively fused and represented as the ensemble feature tensors. To reduce the training time, compact connections were used rather than fully connected layers. In [91], Zhang et al. proposed a multi-aspect-aware bidirectional LSTM network (MA-BLSTM) consisting of the feature extraction block, the feature dimension reduction block, and a 3-layer LSTM block.
The feature extraction block utilizes the Gabor filter (orientation and rotation sensitive) in combination with the three-patch local binary pattern (TPLBP) operator (rotation invariant) to obtain global & local features, while a 3-layer MLP was employed for feature dimension reduction. In [92], Bai et al. proposed a bidirectional LSTM network, the performance of which was evaluated for two cases: clutter-present and clutter-free. Surprisingly, the presence of clutter led to higher classification accuracy than the clutter-free case. All the DNNs proposed in [90]-[92] reported a target recognition accuracy higher than 99.9%, but the performance is expected to degrade in real-life application scenarios (note the remark by J. Pearl that "a machine trained in one environment cannot be expected to perform well when environmental conditions change" [93]).
According to [91] and [92], the LSTM network outperforms the hidden Markov models (HMMs), which had been widely adopted to model multi-aspect SAR images until the 2000s [94], in modeling stochastic sequences, especially when the initial probability of states is unknown. However, the LSTM is notoriously time-consuming to train (not to mention that the training time of MA-BLSTM increases by 5 times with the decrease of training data [91]). Moreover, auto-extracted features obtained with CNNs or other types of unsupervised neural networks are not necessarily better than the hand-crafted ones designed by human experts. Actually, many well-established researchers have doubts about the "black-box" process of "automatic" feature extraction, which makes a network extremely vulnerable to adversarial attacks (more details regarding this problem will be provided in Section IV-D).
Apart from the CNNs and the LSTM networks mentioned above, other DL-based networks such as autoencoders and Capsule Networks (CapsNets) have also been investigated as feasible solutions to the ATR problem. In [95], Deng et al. proposed a network composed of stacked autoencoders (SAE). To avoid overfitting, a restriction based on Euclidean distance was implemented (i.e. samples from the same target at different aspect angles have a shorter distance in feature space) and a dropout layer was added to the network. In [96] and [97], Geng et al. proposed a deep supervised & contractive neural network (DSCNN), which consists of 4 layers of supervised and contractive autoencoders. Multiscale patch-based feature extraction was performed with three filters: the gray-level gradient co-occurrence matrix (GLGCM) filter, the Gabor filter, and the histogram of oriented gradient (HOG) filter. Graph-cut-based spatial regularization was applied to smooth the results. Moreover, unlike the other networks discussed in this subsection, which have all been trained and tested using the MSTAR dataset, the DSCNN was tested with three datasets: the TerraSAR-X, the Radarsat-2, and the ALOS-2 data. A comprehensive review of the autoencoder and its variants for target recognition in SAR images can be found in [98]. In [99]-[102], various CapsNets were proposed to address two problems in SAR-image based ATR: limited training data and depression angle variance. CapsNets are composed of capsules, which are vectors of information about the input data, with the magnitude representing the probability of the presence of an entity and the direction representing the pose and position of the entity. Due to page limitations, this minority group of CapsNet-based networks will not be detailed here. The DNNs discussed in this section for ATR using SAR images are summarized in TABLE 5.
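The statement that a capsule's magnitude encodes a probability while its direction encodes pose can be made concrete with the "squash" nonlinearity commonly associated with capsule networks. The source does not say which activation the networks in [99]-[102] use, so the function below, its eps safeguard, and the two example capsules are assumptions for illustration.

```python
import numpy as np

def squash(capsules, eps=1e-9):
    """Rescale each capsule vector (last axis) to a length in [0, 1) without changing its direction."""
    squared_norm = np.sum(capsules ** 2, axis=-1, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)
    # eps avoids division by zero for an all-zero capsule.
    return scale * capsules / np.sqrt(squared_norm + eps)

raw = np.array([[0.1, 0.0, 0.0],   # weak evidence for an entity
                [3.0, 4.0, 0.0]])  # strong evidence for an entity
activated = squash(raw)
presence = np.linalg.norm(activated, axis=-1)  # interpreted as presence probabilities
print(presence)
```

Short input vectors are shrunk toward zero length and long ones approach unit length, so the vector length can be read as a probability while the orientation of each vector is left untouched.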
Finally, note that DL could also be used for video-SAR moving target indication. Specifically, Ding et al. proposed a faster region-based CNN in [103], which is a variant of the algorithm proposed by Ren et al. in [104]. To reduce the training burden, the features were extracted with pretrained CNN models such as AlexNet, VGGNet, and ZFNet. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm was employed to reduce false alarms, and the Bi-LSTM was used to improve the detection probability. The performance of the proposed network was evaluated with both simulated video SAR data and real data released by Sandia National Laboratories, which was further augmented with rotation and cropping.
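As an illustration of how density-based clustering can suppress false alarms, a minimal pure-Python DBSCAN sketch is given below. How DBSCAN is integrated into the detector of [103] is not detailed in this survey, so the setup is an assumption: each detection is reduced to a 2-D center point, and detections labelled as noise (label -1) are treated as isolated false alarms. The function name dbscan and the parameters eps and min_pts are illustrative.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise (isolated points)."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Indices of all points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = noise  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


if __name__ == "__main__":
    # Hypothetical detection centers: four neighboring detections and one isolated one.
    detections = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    labels = dbscan(detections, eps=1.5, min_pts=3)
    kept = [p for p, lab in zip(detections, labels) if lab != -1]
    print(labels, kept)
```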

D. MAJOR CHALLENGES FOR DL-BASED ATR
In Section IV-C, we reviewed many DNNs trained and tested with the MSTAR dataset. In this subsection, we will look into two limiting factors that have kept DNNs from being widely adopted for radar ATR tasks on the battlefield: the limited amount of training data and the potential security risk posed by carefully crafted adversarial attacks.

(1) Lack of training data
Although classification rates higher than 99% have been reported in many papers covering DNNs trained for radar ATR using the MSTAR dataset, the accuracies of these networks are expected to degrade dramatically when they are tested with SAR images taken at depression angles that are very different from the ones used to obtain the training dataset, or when they are tested with other SAR image datasets, e.g. the QinetiQ dataset [80], [81]. As pointed out by J. Pearl, neural networks usually cannot perform well if the environment they are tested in is different from the one they are trained with [93]. However, the DL-based approaches will simply lose all their glamor if we must train the network from the very beginning with a large amount of qualified training data for every new classification task. What's worse, unlike other ordinary image classification tasks (e.g. cat/dog classification), the SAR images used for radar ATR are usually very scarce, especially when the targets are military vehicles employed by other countries. Therefore, machine learning with small training data sets is key to the success of radar ATR using SAR images. In the following, we will examine various neural networks that are designed to meet this challenge.
Since these networks have all been trained using the MSTAR dataset, the classification accuracies of these networks and the number of samples involved in the training process are comparable. Before we move on, we will first provide some details on the MSTAR dataset, so that the readers can get a clear picture of what is happening. As was mentioned before, the MSTAR dataset consists of 20,000 SAR image chips covering 10 target types from the former Soviet Union (BMP2, BTR70, T72, BTR60, 2S1, BRDM2, D7, T62, ZIL131, ZSU23/4). These targets were measured over the full 360° of azimuth angles and over multiple depression angles (15°, 17°, 30°, and 45°), and the SAR images are 128 × 128 pixels in size and of 1 foot × 1 foot resolution. In most papers, to demonstrate the robustness of the proposed networks to the variation of angles, the SAR images used for training and testing usually correspond to two different depression angles (e.g. 15° and 17°).
Supervised learning: For comparison purposes, we first look at the application of traditional machine learning methods to address this problem. The topic has been thoroughly reviewed in [105]. More recently, in [106], Clemente et al. utilized K-nearest neighbor for ATR against compound Gaussian noise, which was added to the MSTAR dataset manually. The features were represented by Krawtchouk moments, and the selection of testing/training samples was randomized in each Monte Carlo run. Using only 191 training samples, the classifier proposed in [106] reached an accuracy of 93.86%.
Semi-supervised learning: Since the manual feature extraction usually induces high computational complexity while the auto feature extraction is a time-consuming

Unsupervised learning: One way to realize unsupervised learning with limited training data samples is to employ transfer learning. In [110], Huang et al. proposed a DNN composed of stacked convolutional autoencoders, which was trained with unlabeled SAR images for the subsequent transfer learning rather than the commonly used ImageNet, which contains optical images that are far different from SAR images. In [111]-[118], data augmentation was performed to boost the training dataset in addition to transfer learning to further improve the classification accuracy. Specifically, in [111], Zhong et al. employed three classic CNNs, namely CaffeNet, VGG-F, and VGG-M, that had been pretrained with the ImageNet dataset. The data augmentation method used in [82] was adopted, and 2700 images for each class were obtained by randomly sampling 88 × 88 patches from the 128 × 128 SAR image chips. With network pruning (a maximum of 80% of filters pruned) and recovery employed, the networks presented in [111] are 3.6 times faster than the A-ConvNets proposed in [82] at the cost of a 1.42% decrease in accuracy. In [112], Ding et al. proposed an all-in-one 6-layer CNN, and three types of data augmentation, namely posture synthesis, translation, and noise-adding, were combined. With training samples augmented to 1000 per class, the network in [112] reached a test accuracy of 93.16%. In [113], Yu et al. proposed a 13-layer CNN, with the input data preprocessed with Gabor filters. The center 88 × 88 pixels of the SAR images were cropped to reduce the computational burden, and the training dataset was augmented with the approach proposed in [112]. By replacing 1%-15% of the pixels in the target scene with randomly generated samples, the anti-noise performance of the proposed network was demonstrated.
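The random-patch augmentation mentioned above (88 × 88 patches sampled from 128 × 128 chips) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names random_crop and augment, the list-of-lists image representation, and the fixed random seed are assumptions and are not taken from [82] or [111]. Note that a single 128 × 128 chip admits (128 − 88 + 1)² = 1681 distinct 88 × 88 crops.

```python
import random


def random_crop(chip, size, rng):
    """Cut a size x size patch from a 2-D list `chip` at a uniformly random offset."""
    rows, cols = len(chip), len(chip[0])
    top = rng.randrange(rows - size + 1)
    left = rng.randrange(cols - size + 1)
    return [row[left:left + size] for row in chip[top:top + size]]


def augment(chip, size, count, seed=0):
    """Generate `count` randomly cropped patches from one image chip."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [random_crop(chip, size, rng) for _ in range(count)]


if __name__ == "__main__":
    # Dummy chip whose pixel value encodes its own position (row * 128 + column).
    chip = [[r * 128 + c for c in range(128)] for r in range(128)]
    patches = augment(chip, size=88, count=5)
    print(len(patches), len(patches[0]), len(patches[0][0]))
```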
In [114], data augmentation was performed by first using improved Lee sigma filtering to remove speckles and then adding random noises. The proposed 9-layer CNN reached a high accuracy of 98.7% with 1900 training samples.
In [115] and [116], Lewis and Scarnati pointed out that the synthetic SAR images obtained by simply manipulating the real SAR images in the same way as ordinary optical images are of poor quality (despite the resemblance between them in "appearance"), and using only the synthetic data in the training process could lead to dramatic performance degradation. For example, the SAR ATR CNN in [117] achieved only a 19.5% accuracy when trained with synthetic data and tested with real data. Therefore, in [115] and [116], 3D CAD models of targets were used to synthesize the Synthetic and Measured Paired and Labeled Experiment (SAMPLE) dataset. The input data was preprocessed with t-SNE for dimension reduction, and variance-based joint sparsity was employed for denoising. Moreover, the clutter was transferred from real to synthetic SAR images via task masks. With 50% real data from the MSTAR dataset and 50% synthesized data generated with the GAN, the modified DenseNet proposed in [115] reached an accuracy of 92%. In [118], a dual parallel GAN (DPGAN) made of a generator with 4 convolution layers and 4 deconvolution layers and a discriminator with 4 convolution layers was proposed. The raw images with opposite azimuth were merged together for shadow compensation. With 300 GAN-augmented training samples, the 5-layer CNN proposed in [118] reached a high accuracy of 99.3%.
The networks proposed in [106]-[118], along with the number of MSTAR samples used for training and the corresponding accuracies, are summarized in TABLE 6, where "AUG" represents training data augmentation. Since transfer learning plays a key role in improving the accuracy of DNNs with limited training data while reducing the training time, the readers are also referred to [119], in which how to apply transfer learning in SAR ATR was discussed in detail (note that it was concluded in [119] that simple "domain adaption based transfer learning" by applying a DNN model pretrained with natural optical images, e.g. ImageNet, directly to the problem of SAR image classification/recognition does not work well). Finally, although the MSTAR dataset has been widely used for the training of SAR ATR DNNs [106]-[118], some researchers resort to a few SAR image datasets obtained by TerraSAR-X that have been made available to the public, which include the landscape mapping dataset [120], the ship detection dataset [121], [122], and the vehicle detection dataset [123].

(2) Adversarial attacks
According to the literature, one of the most intriguing features of adversarial attacks is that by slightly changing some pixels of a picture (changes so trivial that humans can't even notice), a DL-based image classification algorithm can be fooled into making unbelievable mistakes. For example, if we add a toaster sticker to a banana, it could be misclassified as a toaster by a DL-based classifier [46]. Based on the adversary's knowledge of the network to be attacked, adversarial attacks could be classified as white-box, grey-box, and black-box attacks (see Section III-C for details). Moreover, an adversarial attack is said to be "targeted" if the adversarial examples have been designed to be misclassified as a specific type of target and "nontargeted" otherwise. The research in the field of adversarial attacks resembles a cat-and-mouse game: many algorithms are designed to misguide the existing DNNs into misclassification, while others are developed to improve the robustness of the DNNs to adversarial examples via adversarial training, adversarial detection, gradient-masking, etc. In this subsection, we will give a brief introduction to several highly-cited adversarial attack algorithms proposed in recent years. Before we move on to introduce original research works on this topic, we will first provide some background information on commonly used attack methods that are readily available as Python toolboxes free for download [124].
The adversarial attacks widely adopted by DNN attackers generally belong to three categories: the gradient-based attacks, the score-based attacks, and the decision-based attacks.
The gradient-based attacks utilize the input gradients to obtain perturbations that the model predictions for a specific class are most sensitive to. The fast gradient sign method (FGSM), the Basic Iterative Method (BIM), the iterative least-likely class method (ILCM), the Projected Gradient Descent (PGD) and the DeepFool are some of the most famous attack methods belonging to this group [124]. The FGSM proposed by Goodfellow et al. [126] utilizes the gradient of the loss function with respect to the input to create an adversarial example that maximizes the loss so that it will be misclassified. The BIM, which is also referred to in the literature as the iterative fast gradient sign attack method (I-FGSM), and the ILCM were both proposed by Kurakin et al. in [127]. The BIM is a straightforward extension of the FGSM, which seeks to maximize the cost of the true class along small steps in the gradient direction in an iterative manner. In contrast, the ILCM iteratively maximizes the probability of a specific false target class, namely the one with the lowest confidence score for the clean image. The PGD-based attack method [128] is essentially the same as the BIM except that for PGD, the example is initialized at a random point in the ball of interest determined by the l∞ norm. The DeepFool method proposed by Moosavi-Dezfooli et al. [129] first computes the minimum distance it takes to reach the class boundary assuming that the classifier is linear, then makes corresponding steps towards that direction.
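The relationship between the FGSM, the BIM, and the PGD can be illustrated with a minimal Python sketch. To keep the example self-contained, the attacked model is assumed to be a simple logistic-regression classifier whose input gradient is available in closed form; this toy model, the function names, and the parameter values are assumptions for illustration and do not reproduce the setups of [126]-[128].

```python
import math
import random


def loss_and_grad(w, b, x, y):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for this toy model
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    _, g = loss_and_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def iterative_attack(w, b, x, y, eps, alpha, steps, random_start=False, rng=None):
    """BIM (random_start=False) or PGD-style attack (random_start=True)."""
    rng = rng or random.Random(0)
    if random_start:  # PGD: start from a random point inside the l-infinity ball
        x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    else:             # BIM: start from the clean sample
        x_adv = list(x)
    for _ in range(steps):
        _, g = loss_and_grad(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back onto the l-infinity ball of radius eps around x.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    w, b, x, y = [0.8, -0.5, 0.3], 0.1, [1.0, 0.5, -0.2], 1
    print("clean loss:", loss_and_grad(w, b, x, y)[0])
    print("FGSM loss :", loss_and_grad(w, b, fgsm(w, b, x, y, 0.5), y)[0])
    x_pgd = iterative_attack(w, b, x, y, 0.5, 0.1, 10, random_start=True)
    print("PGD loss  :", loss_and_grad(w, b, x_pgd, y)[0])
```

In all three cases the perturbation stays within an l∞ distance of eps from the clean sample; the methods differ only in the number of steps and in the starting point.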
The score-based attacks do not require gradients of the model or other internal knowledge about the networks to be attacked, but need to know the probability that the input samples belong to a certain class, i.e. the probability labels. They are less popular than the gradient-based attacks. The single-pixel attack proposed by Narodytska and Kasiviswanathan [130] in 2017 is a typical score-based attack. It probes the weakness of a DNN by changing single pixels to white or black one at a time. In 2019, an alternative single-pixel based approach was proposed in [131], which relies on the differential evolution algorithm and achieved a high success rate in misguiding the network by modifying fewer than 5 image pixels. In contrast, the decision-based attacks rely only on the class decision made by the targeted networks and do not require any knowledge regarding gradients or probabilities. This last category of adversarial attacks includes the boundary attack [132], the noise attack, and the blur attack (for images only) [124].
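A score-based attack of the single-pixel type can be sketched as follows. The sketch only queries the class scores of the model and never its gradients. The toy classifier brightness_scores (which separates "dark" from "bright" images) and the exhaustive search over all pixels are assumptions made for illustration; they are not the models or search strategies used in [130] or [131].

```python
import math


def brightness_scores(image):
    """Toy classifier (assumed): returns [P(dark), P(bright)] from the mean pixel value."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    p_bright = 1.0 / (1.0 + math.exp(-10.0 * (mean - 0.5)))
    return [1.0 - p_bright, p_bright]


def single_pixel_attack(score_fn, image, true_class):
    """Set one pixel at a time to black (0.0) or white (1.0) and keep the
    single change that lowers the true-class score the most."""
    best_img, best_score = image, score_fn(image)[true_class]
    for i in range(len(image)):
        for j in range(len(image[0])):
            for value in (0.0, 1.0):
                candidate = [row[:] for row in image]  # copy, then change one pixel
                candidate[i][j] = value
                score = score_fn(candidate)[true_class]
                if score < best_score:
                    best_img, best_score = candidate, score
    return best_img, best_score


if __name__ == "__main__":
    image = [[0.9, 0.3], [0.4, 0.6]]  # classified as "bright" (class 1)
    adv, score = single_pixel_attack(brightness_scores, image, true_class=1)
    print(adv, score)
```

The exhaustive search issues two queries per pixel, which is affordable only for small inputs; this is one reason why [131] resorts to differential evolution instead.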
In the following, we will concentrate on the application of adversarial attacks in radar ATR. In [133], Huang et al. proposed four algorithms to misguide multi-layer perceptron (MLP) and CNN designed for radar ATR using HRRP. Two of them are fine-grained perturbations (i.e. the adversarial sample to be updated according to the input), while the other two are universal perturbations (i.e. image-  [136], the nontargeted black-box universal adversarial perturbation (UAP) was employed to fool the CNNs, for which the success rate in misguiding the network was higher than 80%. As was mentioned before, although the mainstream research in the field of adversarial examples aims to "attack", a considerable number of researchers work on the "defence" side, i.e. to improve the robustness of the DNNs to adversarial examples via adversarial training, adversarial detection, gradient-masking, etc. For example, in [138], the competitive overcomplete output layer (COOL) was designed to replace the commonly used softmax layer for improved robustness of the CNN against the adversarial examples generated by DeepFool.

V. DL FOR RADAR INTERFERENCE SUPPRESSION
Jamming and clutter are two types of interference that limit the performance of modern radar systems. In this section, various DL-based jamming recognition and anti-jamming algorithms are reviewed. The technical trends in using DNNs to address the challenging problem of marine target detection in sea clutter are also discussed.

A. JAMMING
In [145]-[151], various DNNs were designed for jamming signal classification, with the majority of them being CNNs. The main features of these networks are summarized in TABLE 8, along with the types of jamming signals that have been used for network training and performance testing. Specifically, in [146] and [147], an improved Siamese-CNN (S-CNN) was proposed, which is composed of two 1-D CNNs for feature extraction from the real and the imaginary parts of the data, respectively. This network only needs 500 training samples for each target class, and its performance was compared with various machine learning methods (e.g. the SVM). In [148] and [149], the 1-D jamming signals were transformed to 2-D time-frequency images via time-frequency analysis so that they could be processed with CNNs. In [149], a DNN based on the bilinear EfficientNet-B3 and the attention mechanism was proposed. The model parameters of EfficientNet-B3 obtained in the pretraining process using the ImageNet dataset were used as the initial weights of the proposed network. Note that EfficientNet-B3 belongs to a large family of EfficientNet algorithms (named EfficientNet-B0 to B7) [150]. Although the accuracy of EfficientNet-B3 is 4% lower than that of EfficientNet-B7, the number of model parameters involved in the former is only 1/5 of that in the latter, which indicates less training time. In [151], a VGG-16 variant was developed for barrage jamming

In addition to the works discussed above, the use of DL-based approaches to perform target classification in the presence of jamming [152], to choose the optimum anti-jamming strategy for radar [153], [154], to analyze the probability of radar being jammed [155], and to adaptively select the best method to jam an enemy radar [157] has also been investigated. The DNN structures proposed in these works and their distinctive features are summarized in TABLE 8.
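The conversion of a 1-D complex signal into a 2-D time-frequency image, as performed before CNN processing in [148] and [149], can be illustrated with a short-time Fourier transform (STFT). The specific time-frequency transform used in those works is not stated here, so the STFT, the Hann window, and the names stft_magnitude and complex_tone in the Python sketch below are assumptions for illustration.

```python
import cmath
import math


def complex_tone(bin_index, win_len, n_samples):
    """Test signal (assumed): complex exponential centered on one DFT bin."""
    return [cmath.exp(2j * math.pi * bin_index * n / win_len) for n in range(n_samples)]


def stft_magnitude(signal, win_len, hop):
    """Return a 2-D list: one row per time frame, one column per frequency bin."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    image = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(win_len)]
        # Direct DFT of the windowed frame (O(N^2), sufficient for a sketch).
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n in range(win_len)))
            for k in range(win_len)
        ]
        image.append(spectrum)
    return image


if __name__ == "__main__":
    signal = complex_tone(bin_index=3, win_len=16, n_samples=64)
    image = stft_magnitude(signal, win_len=16, hop=8)
    print(len(image), "frames x", len(image[0]), "bins")
```

The resulting magnitude array can be fed to a 2-D CNN as a single-channel image, whereas the real and imaginary parts of the raw samples would instead form the two 1-D inputs of a network such as the S-CNN of [146] and [147].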
Finally, a detailed discussion regarding the application of artificial intelligence in electronic warfare systems was presented in [158], which is also recommended for readers who are interested in the recent trends of DL-based jamming/anti-jamming techniques.

B. CLUTTER
Marine target detection is a much more challenging task for radar than ground moving target detection due to the highly nonhomogeneous and time-varying clutter incurred by the sea. An early attempt of using machine learning methods for target detection in the presence of sea clutter was made in [159], where k-Nearest-Neighbor and SVM were used for marine target/clutter classification using the data collected by the S-band NetRAD system jointly developed by the University College London and the University of Cape Town [160].
With DL gaining popularity in recent years, many researchers resort to DNNs to further improve the detection performance of marine radars [161]-[164]. Specifically, in [161], Pan et al. used the Faster R-CNN proposed by Ren et al. in [104] for target detection using the sea clutter dataset collected with the X-band ground-based Fynmeet marine radar by the Council for Scientific and Industrial Research (CSIR). In [162], Chen et al. proposed a dual-channel convolutional neural network (DCCNN) made of LeNet and VGG16, for which the amplitude and the time-frequency information were used as two inputs, and the features extracted from the two channels were fused at the FC layer. One distinctive characteristic of [162] is that a softmax classifier with variable threshold and an SVM classifier with controllable false alarm rates were designed. The performance of the proposed network was tested with two datasets: the Intelligent PIXel processing radar (IPIX) dataset collected by the fully coherent dual-pol X-band radar for floating targets and the CSIR dataset for maneuvering marine targets. In [163], a fully convolutional network (FCN) with 20 layers was proposed for ship detection in SAR images collected by Gaofen-3 and TerraSAR-X. It is worth mentioning that pixel truncation was implemented as a preprocessing procedure assuming that the potential ship pixels are brighter than the clutter, which is not necessarily true. Finally, in [164], a DL-based empirical clutter model named the multi-source input neural network (MSINN) was proposed to predict the sea clutter reflectivity. This model was tested with the sea clutter data collected by a ground-based UHF band polarized radar and was shown to fit the measurement data better than the existing empirical sea clutter models.
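One simple way to obtain a controllable false alarm rate from a classifier that outputs a target score (such as a softmax probability) is to set the decision threshold empirically from clutter-only scores. The Python sketch below illustrates this idea only; it is an assumption made for illustration and not necessarily the procedure adopted in [162], and the names threshold_for_pfa and detect are hypothetical.

```python
def threshold_for_pfa(clutter_scores, pfa):
    """Choose a threshold such that at most a fraction `pfa` of the
    clutter-only scores exceeds it (empirical false alarm rate control)."""
    ranked = sorted(clutter_scores, reverse=True)
    allowed = int(pfa * len(ranked))  # number of tolerated false alarms
    if allowed >= len(ranked):
        return float("-inf")          # every clutter sample may be declared a target
    return ranked[allowed]


def detect(score, threshold):
    """Declare a target only if the score strictly exceeds the threshold."""
    return score > threshold


if __name__ == "__main__":
    # Hypothetical target scores produced by a classifier on clutter-only cells.
    clutter_scores = [i / 20 for i in range(20)]
    thr = threshold_for_pfa(clutter_scores, pfa=0.25)
    false_alarms = sum(detect(s, thr) for s in clutter_scores)
    print(thr, false_alarms)
```

Lowering pfa raises the threshold, which trades detection probability for fewer false alarms.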
Although most research papers in this field focus on sea clutter, DNNs have also been designed to address other types of clutter. For example, in [165], Cifola et al. considered the problem of clutter/target recognition for drone signals polluted by wind turbine returns. A denoising adversarial autoencoder was designed, the performance of which was tested with the micro-Doppler signatures of drones and wind turbines measured with X-band CW radar. In [166], Lepetit et al. used U-Net, a CNN variant that was originally proposed for medical image segmentation, to remove clutter from precipitation echoes collected by weather radar. 150,000 images collected by the Trappes polarimetric ground weather radar of Météo-France were used for network training.

1. Deep neural network: 4 hidden layers; 2. LSTM: (2 × LSTM + 1 × FC)
Jamming signal types: ① pure noise; ② interrupted sampling repeater jamming (ISRJ); ③ aiming; ④ blocking; ⑤ sweeping; ⑥ distance deception; ⑦ dense false targets; ⑧ smart noise; ⑨ chaff; ⑩ noise amplitude modulation jamming (AM); ⑪ noise frequency modulation jamming (FM); ⑫ noise convolution jamming (CN); ⑬ noise product jamming (CP); ⑭ smeared spectrum jamming (SMSP); ⑮ chopping and interleaving jamming (C&I); ⑯ comb spectral jamming (COMB); ⑰ single frequency; ⑱ narrowband barrage; ⑲ wideband barrage; ⑳ rectangular wave convolution jamming.
The DNN structures presented in [161]-[166] and their main features are summarized in TABLE 9. Note that in addition to the works mentioned above, deep convolutional autoencoders were proposed for target detection in sea clutter in [167], [168], and an LSTM-based network was designed for sea clutter prediction in [169]. Since these networks were tested only with simulated data, they are expected to exhibit noticeable performance degradation in real-life detection scenarios.

VI. CONCLUSION
In this work, we consider the application of DL algorithms in radar signal processing. With DL gaining popularity rapidly in recent years, DL for radar signal recognition, DL for ATR based on HRRP/Doppler signatures/SAR images, and DL for radar jamming recognition & clutter suppression have been explored thoroughly by many researchers. Although classification accuracies of 98%-100% have been reported in many research works on radar ATR with DL networks using the MSTAR dataset, it should be emphasized that there is a long way to go before the DL approaches become qualified substitutes for the classic radar ATR methods. Firstly, DL networks demand large amounts of training data. Unlike the typical problem of image classification, for which large amounts of training data are available online, representative real-world HRRPs and SAR images that are labelled with accurately verified targets are simply not readily available to everyone on demand. Moreover, a network trained under a specific environment does not work the same way when the environment changes. Secondly, although some DL networks reach high accuracies with limited training data, most of them were tested with only the MSTAR dataset, which was also used 20 years ago to demonstrate the high-accuracy performance (above 97%) of traditional machine learning based ATR methods. Finally, the ever-evolving adversarial attacks also pose a great security risk to the DNNs. This work provides a full picture of the numerous potential research opportunities and grave challenges in applying the DL-based approaches to address the existing problems in radar signal processing, and it serves as a good reference work for researchers interested in this field.
Cifola et al. [165]
Separation of drone signals from wind turbine clutter
Denoising adversarial autoencoder: discriminator (6 × Conv) + autoencoder
• Autoencoder trained to reconstruct spectrogram containing drone-only signal
• Discriminator is used to distinguish drone-only signal from clutter

Lepetit et al. [166]
Clutter removal from echoes collected by weather radar
U-Net
• Two sets of data used: weather-plus-clutter images & clutter-only images (clean images not required for training)