Multi-View HRRP Generation With Aspect-Directed Attention GAN

In radar automatic target recognition (RATR), the high-resolution range profile (HRRP) has received intensive attention due to its low computational cost. As the HRRP is sensitive to the aspect of the target, a training set covering sufficient aspects is essential to the success of an RATR model, which is, however, intractable in complex environments with noncooperative targets. In this article, an aspect-directed attention generative adversarial network is proposed to generate multiview HRRPs using real samples from only a few aspects. The key observation is that HRRPs from similar targets share the same aspect variation pattern. Hence, an HRRP is decomposed into its identity and aspect features via an aspect-directed disentangled representation network with self-attention modules. In the training stage, the decomposition network and the aspect variation pattern are learned from full-aspect samples of cooperative targets. During generation, the desired multiview HRRPs of the noncooperative target are synthesized from its identity features, extracted from samples of a few aspects, and the learned aspect variation pattern. Three types of experiments on simulated and measured datasets demonstrate the generation performance of our method. First, the generated HRRPs are visually compared with the ground truth. Second, the similarity of the scattering center power and handcrafted feature distributions is quantitatively evaluated. Finally, recognition experiments verify the feasibility of data augmentation with the generated HRRPs. Extensive results show the superior performance of our method over other state-of-the-art methods.


I. INTRODUCTION
High-resolution radar plays a crucial role in target recognition owing to its capability of acquiring fine target characteristics [1], [2], [3], [4], [5]. In general, the fundamental characteristic is the high-resolution range profile (HRRP), which represents the distribution of target scattering centers along the line of sight [1], [6], [7], [8]. Due to the low requirements for its acquisition, storage, and processing, HRRP recognition has received intensive attention and has been employed in various applications, such as drone detection and perimeter surveillance [9], [10].
Although fruitful achievements have been made, HRRP recognition is still a nontrivial task [11], [12], [13], [14]. One difficulty originates from the aspect sensitivity of the HRRP, namely the change of the HRRP with the relative aspect between radar and target. In [15], it was shown that even a 0.2° change of aspect could result in marked variation of the HRRPs. In Fig. 1, four HRRPs from a public dataset with an aspect interval of 0.5° are shown to demonstrate the aspect sensitivity. Although the major envelope changes slightly, the fine structures, such as the number, positions, and amplitudes of the scattering centers, vary significantly. This phenomenon is the primary cause of the degradation of target detection and recognition performance. Therefore, dealing with aspect sensitivity is essential to HRRP recognition in a noncooperative environment.
Methods for resolving aspect sensitivity have been widely discussed. The existing methods can be divided into three categories: feature-level; model-level; and data-level. The feature-based methods aim to extract aspect-invariant features, such as scattering centers, weighted profiles, and high-order spectral features [2], [16], [17], [18], [19]. The model-level methods improve the recognition model to reduce the influence of aspect sensitivity [20], [21], [22], [23], [24], [25]. The data-level methods supplement the training datasets with artificially generated HRRPs for the unacquired aspects [26], [27], [28], [29]. The generation can be achieved by either electromagnetic simulation or some statistical model. Compared with the other two categories, the data-level method is considered the fundamental one, since it requires no modification of the processing algorithms. However, multiview HRRP generation remains extremely challenging for a noncooperative target, since precise knowledge of its electromagnetic or statistical model is hard to obtain.
In this article, the multiview HRRP generation task is addressed by deep generative networks (DGNs). Recently, DGNs, especially generative adversarial networks (GANs), have proven their ability to generate new realistic-looking samples [30], [31], [32], [33], [34], [35], [36]. In radar, the GAN has been applied in various applications, such as meteorological radar extrapolation [37], synthetic aperture radar (SAR) image enhancement [38], and optical and SAR image matching [39]. As for radar data generation, most existing works focus on SAR image synthesis. The feasibility was first demonstrated in [40]. Thereafter, a few efforts have been made to improve the convergence, stability, and quality of the model [41]. In [42], the Wasserstein GAN is used for training data augmentation in SAR target recognition with faster convergence and higher quality samples. Recently proposed models also show that class (or label) information is helpful to improve the quality of generation [43], [44]. There are a few works on HRRP generation as well [45], [46]. In [45], DCGAN is used for imbalanced HRRP recognition and the quality is comprehensively evaluated on the data, feature, and recognition levels. Shi proposed to use a pretrained GAN and transferred features for generation from only one known sample [46]. Although the existing approaches have demonstrated promising results in radar data generation, the aspect information is not integrated into the generation model and the generation quality for the unacquired aspects still needs improvement. In [47], the pose label is concatenated with the class label to supervise the SAR image generation model. The pose is represented by the sine and cosine values of the view angle, and the class label is coded as a one-hot vector. This provides the capability to generate high-quality images for specified pose angles.
This article proposes an aspect-directed attention GAN (ADA-GAN) to generate multiview HRRPs for unacquired aspects. We argue that the challenge of multi-view HRRP generation lies in the following three points.
1) The variation of HRRP with respect to the aspect is hard to capture, since aspect features are coupled with the identity features.
2) Supervision by the aspect angle is difficult to model, since current GAN structures lack a continuous control variable.
3) The number of available aspects of a noncooperative target is very limited.
To address the above issues, the key component of ADA-GAN is an attention-based disentangling module that decouples an HRRP into its aspect feature and identity feature. The aspect angle is used to supervise the representation of the aspect feature via aspect regression, and also the generation process via a continuous loss function. Meanwhile, the identity label is used to supervise the representation of the identity feature. The input of the generator includes the identity feature and the aspect feature corresponding to the given aspect angle. The ADA-GAN is first trained using full-aspect samples of targets whose structures are similar to the desired target. Then, the model is fine-tuned using the limited-aspect samples of the desired target.
The contributions of this article are summarized as follows.
1) To the best of our knowledge, the ADA-GAN is the first approach to learn the aspect and identity features jointly in HRRP generation. The proposed disentangled representation framework is able to handle the coupling of the aspect and identity and thus improves the stability and quality of generation.
2) The ADA-GAN is supervised by the aspect information and it is able to generate HRRPs with specified continuous aspect angles.
3) The generation quality is evaluated on three levels: visualized performance; feature evaluation; and recognition performance. The evaluation methods offer a set of candidate standard metrics for HRRP generation.
The remainder of this article is organized as follows. In Section II, the conditional GAN is briefly introduced. Then, a detailed description of the proposed ADA-GAN is presented in Section III. Section IV introduces the experiments on both simulated and measured HRRP datasets to validate the effectiveness of the ADA-GAN. Finally, the conclusion is summarized in Section V.

II. CONDITIONAL GENERATIVE ADVERSARIAL NETWORK
As an important branch of generative models, the GAN has achieved state-of-the-art results in producing real-like data. The GAN consists of two networks, the generator G and the discriminator D, pitting one against the other. The generator captures the distribution of the acquired data by fitting a mapping from the "latent space" to the "data space," while the discriminator discerns whether the given data are real or fake. The training stage involves a two-player minimax game. The generator tries to produce samples with a distribution close to the acquired data, and the discriminator aims to distinguish the generated samples from the real ones. This game stops at a Nash equilibrium, i.e., when the generated data can deceive the discriminator, which can be formulated by the following objective function:

min_{θ_G} max_{θ_D} V(D, G) = E_{x∼P_data}[log D(x)] + E_{z∼P_z}[log(1 − D(G(z)))]   (1)

where θ_G and θ_D denote the parameters of G and D, E denotes the expectation, x represents data taken from the real data distribution P_data, and z stands for a noise vector from a Gaussian prior distribution P_z. In practice, (1) may not provide sufficient gradient for G to learn well. Thus, it is better for G to maximize log(D(G(z))) instead of minimizing log(1 − D(G(z))) [48]. Therefore, the adversarial loss functions of G and D can be formulated as follows:

L_G = −E_{z∼P_z}[log D(G(z))]
L_D = −E_{x∼P_data}[log D(x)] − E_{z∼P_z}[log(1 − D(G(z)))]

Despite the astonishing performance, the basic GAN is unable to generate data with a specified label for lack of supervision. To this end, the conditional GAN (CGAN) adds conditional information to both the generator and the discriminator to supervise the generation process. In the basic CGAN, the conditional information is combined with the latent vector as the input of the generator, as shown in Fig. 2. This modification has proven effective and has been widely used in various applications. The resulting objective function is

min_{θ_G} max_{θ_D} V(D, G) = E_{x∼P_data}[log D(x|y)] + E_{z∼P_z}[log(1 − D(G(z|y)))]

where y denotes the conditional label. Compared with the GAN, the CGAN adds the conditional label y.
The adversarial loss functions can be formulated as follows:

L_G = −E_{z∼P_z}[log D(G(z|y))]
L_D = −E_{x∼P_data}[log D(x|y)] − E_{z∼P_z}[log(1 − D(G(z|y)))]

The traditional CGAN assumes that one sample is related to only one conditional label. However, HRRPs are related not only to the identity, but also to the aspect. Furthermore, the coupling of the two features often leads to poor generation performance. Therefore, exploiting the identity and aspect features simultaneously is the key to the success of employing the CGAN for multiview HRRP generation.
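To make the adversarial objectives above concrete, the following minimal numpy sketch evaluates the non-saturating generator loss and the discriminator loss from the discriminator's probability outputs. The array values and function names are illustrative, not from the paper; the discriminator is assumed to output sigmoid probabilities of "real."

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: -E[log D(x|y)] - E[log(1 - D(G(z|y)))]."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: -E[log D(G(z|y))]."""
    return -np.mean(np.log(d_fake))

# Toy batch of discriminator outputs (probabilities of being real).
d_real = np.array([0.9, 0.8, 0.95])   # scores on real samples
d_fake = np.array([0.1, 0.2, 0.05])   # scores on generated samples
ld = d_loss(d_real, d_fake)
lg = g_loss(d_fake)
```

As expected, the generator loss grows as the discriminator becomes more confident that the generated samples are fake, which is what drives the minimax game.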

III. ASPECT-DIRECTED ATTENTION GAN
The proposed ADA-GAN employs the core concept of CGAN that the data generation is guided by the additional information. In this way, the aspect of the generated HRRP can be manually set. The conundrum is then the design of the "latent space" which can capture the intrinsic features of the HRRPs. For this purpose, the generator in ADA-GAN is built in an encoder-decoder style, and an aspect directed attention mechanism is employed to extract the intrinsic features. The major novelty is that the encoder is formed by an aspect directed attention-based disentangling module that decomposes an HRRP into two features: the aspect feature and the identity feature. The aspect feature implies the aspect variation pattern, while the identity feature is inherently related to the class label. To supervise the decomposition process, an aspect regression task and an identity classification task are jointly incorporated. For the decoder part, the aspect and the identity feature are used as the conditional information in CGAN.

A. Network Structure
The generator G is built in an encoder-decoder style. The encoder is applied to obtain the decomposition of an HRRP, i.e., the identity features f_i and the aspect features f_a, and the decoder fits the mapping from the view aspects and the identity features to the HRRPs. As in the CGAN, the discriminator D is applied to distinguish generated HRRPs from real HRRPs. As shown in Fig. 3, the overall structure of ADA-GAN consists of three components: the aspect-directed attention encoder; the continuous aspect embedded decoder; and the convolutional network based discriminator.
In the first component, given an N-element HRRP vector x ∈ R^N with label y = {y_a, y_i}, where y_a and y_i stand for the aspect and identity labels, a convolutional module is applied to extract the high-level features f_m. This feature is regarded as a mixture of the identity and aspect information. Thus, an attention-based feature factorization module based on aspect-directed self-attention is employed to extract the identity features f_i and the aspect features f_a, respectively; this module is introduced in Section III-B. Because an HRRP always consists of hundreds of range resolution cells, the self-attention module proposed in [45] is introduced to efficiently model relationships between long-range separated cells. Afterward, the two types of features are mapped to the aspect label y_a and the identity label y_i through the aspect regression module G_A and the identity classification module G_I. The two mappings are supervised by an aspect regression task and an identity classification task, respectively.
The second component is applied to generate an HRRP. The input of the decoder includes the aspect angle θ, the high-level features f_m, and the extracted identity feature f_i. An FC network, reversed with respect to that in the first component, is used to project the input aspect angle into the corresponding aspect feature f_θ. To avoid information loss and the vanishing gradient [50], the mixed features are utilized via a shortcut path. The combination of the aspect, identity, and mixed features enables a deconvolutional network to synthesize an HRRP. This structure offers the flexibility to control the aspect and identity of the generated HRRPs.
The third component uses a convolutional network as the discriminator. In the convolution block, each convolution layer is followed by a batch normalization layer, which effectively accelerates network training. To reduce the dimensions of the result, another convolution layer with a 1 × 1 filter size is used after the convolution blocks. In the classification block, the result of the convolution block is flattened into a 1-D feature. A Softmax layer follows to estimate the probability of the input sample being real.

B. Attention-Based Feature Factorization Module
From the formation process, it can be concluded that an HRRP contains intrinsic identity and aspect information. The key is then to determine a joint representation framework that can embed the identity and aspect features. In human face recognition, there are several factorization methods to decompose mixed features. In [51], the summation model (i.e., f_m = f_i + f_a) is used to describe the mixed features and a decoupling method is proposed for the decomposition. In [52], the mixture is modeled as the multiplication of the features (i.e., f_m = f_i × f_a) and the decomposed elements can be expressed as ||f_m||_2 and f_m / ||f_m||_2. However, these two methods ignore the long-range dependence within the internal representation of the samples, and the decomposition is unsupervised, which may lead to unstable training.
Recently, an attention-based decomposition method was proposed to address the above drawbacks [53]. It decomposes the mixed features by

f_a = λ(f_m) ∘ f_m,   f_i = f_m − f_a

where ∘ denotes element-wise multiplication and λ(·) represents a certain attention mechanism. This manipulation factorizes the mixed features in the semantic space to alleviate the coupling in the data space. Inspired by this work, we propose a joint representation framework using a self-attention mechanism. As shown in Fig. 4, the aspect-related features f_a in the mixed features are separated through the attention module supervised by an aspect regression task, and the residual part, regarded as the identity-related features f_i, is supervised by an identity classification task. As a result, the attention mechanism constrains the decomposition module, making it better at extracting the aspect-related features from the mixed feature maps. In the proposed module, a cascade of self-attention layers is used as the feature extractor. According to [54], attention mechanisms can be divided into three categories: distinctive, co-attention, and self-attention. The first two categories concern the difference between multiple inputs, while self-attention aims to extract the inner information of one input. In HRRP generation, the performance relies on the representation of this inner information; thus, self-attention is selected. Furthermore, the architecture in Fig. 4 is a classic self-attention that excels at extracting the relations between different points within one input. An HRRP can be regarded as a combination of different scattering centers, and the relation between scattering centers is important for HRRP generation, so this architecture is a suitable choice.
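The attention-gated split described above can be sketched in a few lines of numpy. This is a toy illustration, not the paper's network: the attention mask λ(·) is stood in for by a simple sigmoid linear gate, and the identity part is taken as the residual of the mixed features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def factorize(f_m, w):
    """Split mixed features into aspect and identity parts.

    lam plays the role of the attention mechanism lambda(f_m);
    here it is a hypothetical linear gate for illustration only.
    """
    lam = sigmoid(f_m @ w)          # attention weights in (0, 1)
    f_a = lam * f_m                 # aspect-related features
    f_i = f_m - f_a                 # residual: identity-related features
    return f_a, f_i

rng = np.random.default_rng(0)
f_m = rng.normal(size=(4, 8))       # batch of 4 mixed feature vectors
w = rng.normal(size=(8, 8)) * 0.1   # toy gate parameters
f_a, f_i = factorize(f_m, w)
```

By construction the two parts sum back to the mixed feature, which is the property the supervised regression and classification heads then exploit separately.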
The process of the self-attention operator is shown in Fig. 4. Three matrices, the query matrix Q(f_m), the key matrix K(f_m), and the value matrix V(f_m), are obtained by multiplying the input matrix, which stands for the original feature map f_m, with three weight matrices, the weight query matrix W_Q, the weight key matrix W_K, and the weight value matrix W_V, which are learned during training. Then, the scores representing the weights of the input matrix are acquired by multiplying the query matrix Q(f_m) with the key matrix K(f_m). Moreover, the Softmax operation is employed to normalize the scores, resulting in an attention map. Finally, an output matrix named the self-attention feature map o is obtained by multiplying the attention map by the value matrix V(f_m).
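The query/key/value steps above amount to standard scaled dot-product self-attention over range-cell positions. A minimal numpy sketch (toy dimensions, randomly initialized weights; the scaling by sqrt(d) is a common stabilization choice not spelled out in the text):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(f_m, W_Q, W_K, W_V):
    """Scaled dot-product self-attention over the cells of one input."""
    Q, K, V = f_m @ W_Q, f_m @ W_K, f_m @ W_V
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise cell similarities
    attn = softmax(scores, axis=-1)           # attention map, rows sum to 1
    o = attn @ V                              # self-attention feature map
    return o, attn

rng = np.random.default_rng(1)
L, d = 16, 8                                  # range cells x feature dim
f_m = rng.normal(size=(L, d))
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))
o, attn = self_attention(f_m, W_Q, W_K, W_V)
```

Because every row of the attention map spans all cells, a scattering center at one end of the profile can directly influence the representation at the other end, which is the long-range modeling motivation cited above.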

C. Loss Function
The total loss function consists of the adversarial loss similar to that in the CGAN and three additional terms corresponding to the proposed encoder-decoder styled generator.
The adversarial loss is formulated as

L_ADV = E_{x∼P_data}[log D(x)] + E_{x∼P_data, θ}[log(1 − D(G(x, θ)))]

where x represents data taken from the real data distribution P_data and θ denotes the input aspect angle. Two additional terms are related to the aspect-directed attention encoder, where an aspect regression task and an identity classification task are used together to supervise the decomposition of the mixed features. Thus, the loss function herein is defined as

L_ENC = λ_ASP L_ASP + λ_ID L_ID

where L_ASP and L_ID are the loss functions for the aspect regression and the identity classification, respectively, and λ_ASP and λ_ID are the weighting coefficients. The mean squared error is employed for the aspect regression. Let G_A(f_a) stand for the aspect regression module that estimates the aspect from the aspect features f_a. The loss function is defined as

L_ASP = || y_a − G_A(f_a) ||_2^2

where ||·||_2 is the l_2-norm.
The identity classification module G_I(f_i) obtains the identity label from the identity features f_i, with a Softmax layer at the output. Therefore, the cross entropy (CE) is applied as the loss function, which is defined as

L_ID = − Σ_{c=1}^{n} y_i[c] log( G_I(f_i)[c] )

where x[c] denotes the cth element of a vector x and n represents the length of the label vector.
The third and final term corresponds to the continuous aspect embedded decoder. A perceptual loss is added to improve the quality of the generated HRRPs. A pretrained 1-D VGG network φ(·) is adopted to extract the perceptual features, and the loss is measured as the difference between generated and real HRRPs in the perceptual feature space, i.e.,

L_PE = || φ(G(x, θ)) − φ(x) ||_2^2

The total combined loss to optimize the generator and the discriminator can be written as

L = λ_ADV L_ADV + λ_ASP L_ASP + λ_ID L_ID + λ_PE L_PE

where λ_* controls the balance of the different loss terms.
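The weighted combination of the four terms can be checked numerically with a small sketch. The scalar inputs below are placeholders, not values from the paper; the helper names (`aspect_loss`, `identity_loss`, `perceptual_loss`) are hypothetical stand-ins for the corresponding modules.

```python
import numpy as np

def aspect_loss(y_a, y_a_hat):
    """MSE between true and regressed aspect angles (L_ASP)."""
    return float(np.sum((y_a - y_a_hat) ** 2))

def identity_loss(y_onehot, p):
    """Cross entropy between one-hot identity label and Softmax output (L_ID)."""
    return float(-np.sum(y_onehot * np.log(p)))

def perceptual_loss(phi_fake, phi_real):
    """Squared l2 distance in the perceptual feature space (L_PE)."""
    return float(np.sum((phi_fake - phi_real) ** 2))

def total_loss(l_adv, l_asp, l_id, l_pe,
               lam_adv=1.0, lam_asp=1.0, lam_id=1.0, lam_pe=1.0):
    """Weighted sum of the four loss terms."""
    return lam_adv * l_adv + lam_asp * l_asp + lam_id * l_id + lam_pe * l_pe

l_total = total_loss(
    l_adv=0.7,
    l_asp=aspect_loss(np.array([2.0]), np.array([2.5])),
    l_id=identity_loss(np.array([0, 1, 0]), np.array([0.1, 0.8, 0.1])),
    l_pe=perceptual_loss(np.zeros(4), np.full(4, 0.5)),
)
```

With all weights at their default of 1, the total is simply the sum of the four terms; the λ weights are then tuned as described in the loss weight evaluation section.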

D. Working Procedure
Since samples from very few aspects can be acquired for noncooperative target, it is difficult to train ADA-GAN directly based on the limited data. To this end, a three-phase strategy is elaborately designed to apply ADA-GAN for multi-view HRRP generation, including pretraining, fine-tuning and generation.
In the pretraining phase, HRRPs from full aspects of several targets are collected as the training dataset (named full data). These samples can be either simulated by electromagnetic computation or measured from cooperative targets. To optimize the generation performance, it is strongly encouraged that the targets in the full data be similar to those in the generation phase. During pretraining, the ADA-GAN is optimized in an alternating manner as in the CGAN. For a fixed discriminator, the encoder and decoder are optimized through the loss function L_G via backpropagation. Note that the input HRRPs and the input aspects are both randomly selected. Afterward, the discriminator is optimized with respect to the loss function L_D. The details of the pretraining phase are described in the following Algorithm 1.
In the fine-tuning phase, the adjustment differs from pretraining in the input data and in some fixed parameters. The HRRPs from very few aspects of the desired target (named limited data) are used to adjust ADA-GAN. The parameters of the encoder Conv layers, the decoder Deconv layers, and the discriminator Conv layers are fixed during the adjustment. The details of the fine-tuning phase are shown in the following Algorithm 2.
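The partial freezing in the fine-tuning phase can be illustrated with a toy parameter registry. The layer names below are hypothetical labels chosen to mirror the text (conv/deconv backbones frozen, attention and FC heads adjustable); a PyTorch implementation would set `requires_grad = False` on the corresponding parameters instead.

```python
# Toy registry: each entry records whether the parameter group is trainable.
params = {
    "encoder.conv":   {"trainable": True},
    "encoder.attn":   {"trainable": True},
    "encoder.fc_asp": {"trainable": True},
    "decoder.deconv": {"trainable": True},
    "decoder.fc":     {"trainable": True},
    "disc.conv":      {"trainable": True},
    "disc.fc":        {"trainable": True},
}

FROZEN_PREFIXES = ("encoder.conv", "decoder.deconv", "disc.conv")

def freeze_for_finetuning(params, frozen_prefixes):
    """Fix the conv/deconv backbones; leave the remaining layers adjustable."""
    for name, p in params.items():
        if name.startswith(frozen_prefixes):
            p["trainable"] = False
    return params

params = freeze_for_finetuning(params, FROZEN_PREFIXES)
```

Freezing the feature backbones preserves the aspect variation pattern learned from the full data, while the small adjustable heads adapt to the limited data of the desired target.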
In the generation phase, the HRRPs for the desired target and the desired aspects are synthesized using the encoder-decoder styled generator. Specifically, the input HRRPs are randomly selected from the "limited data" to ensure the consistency of the identity features. The input aspects are manually configured to the desired ones. The output of the decoder is then the expected HRRPs. The above process is detailed in Algorithm 3.

IV. EXPERIMENTS

A. Datasets and Implementation

1) Datasets:

a) AFRL Dataset: The X-band simulated HRRP dataset released by the AFRL contains ten classes of vehicles. The simulations cover the elevation angles from 30° to 60° over the whole azimuth, as shown in Table I.

In our experiment, the full data contain nine classes, and the limited data are formed by the remaining class. Furthermore, only the azimuth change is considered. Specifically, the azimuth angles of the full data are set from 0.5° to 10° at an interval of 0.5°, which totals 20 angles. In the limited data, three azimuth angles (0.5°, 5°, and 10°) are selected. For all the azimuth angles above, four elevation angles (30°, 40°, 50°, and 60°) are used.
The synthetic bandwidth of all HRRPs is set to 1.5 GHz, resulting in a range resolution of 0.1 m. White Gaussian noise is added to the simulated HRRPs with a peak signal-to-noise ratio of 20 dB. For each aspect, ten samples with different noise realizations are used.
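As a sketch of this noising step, the following calibrates the noise variance so that the ratio of the peak power to the mean noise power equals the stated 20 dB. The toy profile and the helper name are illustrative, not from the dataset; complex circular Gaussian noise is assumed, with the variance split between the real and imaginary channels.

```python
import numpy as np

def add_noise_psnr(hrrp, psnr_db, rng):
    """Add white Gaussian noise at a given peak signal-to-noise ratio.

    sigma2 is chosen so that peak_power / noise_power = 10^(psnr_db/10).
    """
    peak_power = np.max(np.abs(hrrp)) ** 2
    sigma2 = peak_power / 10.0 ** (psnr_db / 10.0)
    noise = rng.normal(scale=np.sqrt(sigma2 / 2), size=(2,) + hrrp.shape)
    return hrrp + noise[0] + 1j * noise[1]

rng = np.random.default_rng(42)
clean = np.zeros(192, dtype=complex)
clean[[40, 90, 150]] = [3.0, 2.0, 1.5]        # three toy scattering centers
noisy = add_noise_psnr(clean, psnr_db=20.0, rng=rng)
```

Generating ten draws with different seeds reproduces the "ten samples with different noise realizations per aspect" setup.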
b) GTRI Dataset: The GTRI dataset contains the measured raw echoes of a T72 tank covering the elevation angles from 28°to 32°with the whole azimuth as shown in Table I. The bandwidth was 660 MHz, resulting in a range resolution of 0.3 m.
In our experiment, HRRPs from ten elevation angles are used and regarded as ten "different classes" to fit the structure of ADA-GAN. Similar to the AFRL dataset, there are nine "classes" in full data and one "class" in the limited data. The azimuth angles of the full data are set from 0.2°to 4°at an interval of 0.2°. In the limited data, three azimuth angles (0.2°, 2°, and 4°) are selected.
2) Implementation Details: Since the HRRP is a complex signal, the real and imaginary parts are used as two channels. The length of the HRRP is set to 192.
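The complex-to-two-channel mapping is straightforward; a minimal sketch (function names are illustrative) and its inverse, useful when inspecting generated profiles:

```python
import numpy as np

def to_two_channel(hrrp):
    """Stack real and imaginary parts as a (2, N) real-valued network input."""
    return np.stack([hrrp.real, hrrp.imag])

def to_complex(two_ch):
    """Inverse mapping back to a complex HRRP."""
    return two_ch[0] + 1j * two_ch[1]

hrrp = np.exp(1j * np.linspace(0, np.pi, 192))   # toy unit-modulus profile
x = to_two_channel(hrrp)
```

The round trip is lossless, so no information is discarded by feeding the network real-valued channels.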
In the aspect-directed attention encoder, a standard VGG-16, with its structure modified to 1-D, is applied to extract the mixed features. An attention module consisting of six self-attention layers is employed to decouple the identity and aspect features. Then, the aspect regression module consists of six fully connected layers, and the identity classification module is a combination of five fully connected layers and a Softmax layer. In the continuous aspect embedded decoder, a converse VGG-16 is applied to map the combination of features to the HRRP. In the convolutional discriminator, a VGG-16 identical to that in the encoder, followed by a Softmax layer, is applied to distinguish the real HRRPs from the generated ones.
In the experiment, the iterations of the pretraining and fine-tuning phases are set to 1000 and 100, respectively, and the batch size is 64. Training uses stochastic gradient descent with adaptive moment estimation, and early termination is applied to halt the training process before overfitting occurs. The learning rates of the generator and the discriminator are both 0.0002, and the momentum of the parameter updates is 0.9. Our framework is implemented with a PyTorch backend and runs on an NVIDIA GeForce RTX 3090 with 24 GB of memory.

B. Visualized Evaluation
The fidelity of the generated HRRPs is first subjectively evaluated by visual comparison in the data space. For this purpose, the methods used for comparison should be able to control the aspect during the generation process. Therefore, we compared the proposed ADA-GAN with four relevant methods: interpolation [55], conditional variational autoencoder (CVAE) [31], [56], disentangled representation GAN (DRGAN) [57], and PeaceGAN [48]. In the interpolation method, the HRRP of the desired aspect is approximated by the weighted combination of adjacent aspects.
For the interpolation method, the generated HRRPs are highly related to the closest acquired aspect because of its larger weight; for example, in Fig. 6, the generated 0.8° HRRP most resembles the nearest acquired aspect. In Fig. 5, there are two groups of peaks in the real 2° HRRP. It can be seen that the weaker group is nearly missing in the CVAE result, while both groups are well presented by ADA-GAN.
DRGAN and PeaceGAN generally perform much better than interpolation and CVAE. The superiority of ADA-GAN mainly lies in its higher similarity in details. Take the 6.5° result in Fig. 5 as an example: the positions and amplitudes of both strong and weak scattering centers are well preserved by ADA-GAN, whereas DRGAN misrepresents the relative amplitudes and PeaceGAN loses the weak scattering centers. Another example is the 2.6° result in Fig. 6: the HRRP generated by ADA-GAN captures the two strong scattering centers, while the results of DRGAN and PeaceGAN both show only one strong peak.
In summary, the proposed ADA-GAN outperforms the four most relevant methods by visual inspection.

C. Quantitative Evaluation
In this section, the Kullback-Leibler (KL) divergence is applied to measure the similarity of the distributions of scattering center power and hand-crafted features between the generated HRRPs and real HRRPs. The methods used for comparison are the same as in the visualized evaluation. In our experiment, the histogram is used as the approximation to the distribution. Hence, the discrete form of the KL divergence is employed

D_KL(P || Q) = Σ_i P(i) log( P(i) / Q(i) )

where P(i) and Q(i) are probability distributions.
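The histogram-based KL computation above can be sketched as follows. The bin layout and the small epsilon smoothing (to avoid division by empty bins) are assumptions for illustration, not specified in the text.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence between two histograms (normalized inside)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy stand-ins for real and generated scattering-center power samples.
rng = np.random.default_rng(7)
real = rng.normal(0.0, 1.0, 5000)
fake = rng.normal(0.1, 1.1, 5000)
bins = np.linspace(-5, 5, 41)
p, _ = np.histogram(real, bins=bins)
q, _ = np.histogram(fake, bins=bins)
d = kl_divergence(p.astype(float), q.astype(float))
```

A divergence of zero indicates identical histograms; the closer the generated distribution is to the real one, the smaller d becomes.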

1) Scattering Center Power:
The strong scattering centers are selected for evaluation to alleviate the influence of noise and ground clutter. The real HRRP of the first aspect is used to determine the range cells of the selected scattering centers. To automatically localize the range cells, the ordered-statistic constant false alarm rate method is applied to deal with the multiple scattering center problem [58], [59]. Fig. 7 shows the real HRRPs and the corresponding selected range cells for the AFRL and GTRI datasets.
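A minimal sketch of the ordered-statistic CFAR selection follows. The window sizes, ordered-statistic rank, and scaling factor are illustrative choices, not the paper's settings; the idea is that each cell is compared against a threshold built from the k-th smallest training-cell power, which is robust when several scattering centers fall in the same window.

```python
import numpy as np

def os_cfar(power, n_train=8, n_guard=2, k=6, alpha=3.0):
    """Ordered-statistic CFAR detector over a 1-D power profile.

    For each cell, collect up to n_train cells on each side (guard cells
    excluded), take the k-th smallest as the noise estimate, and declare
    a detection when the cell exceeds alpha times that estimate.
    """
    n = len(power)
    detections = []
    for i in range(n):
        cells = []
        for j in range(i - n_guard - n_train, i + n_guard + n_train + 1):
            if 0 <= j < n and abs(j - i) > n_guard:
                cells.append(power[j])
        if len(cells) >= k:
            threshold = alpha * np.sort(cells)[k - 1]
            if power[i] > threshold:
                detections.append(i)
    return detections

rng = np.random.default_rng(3)
power = rng.exponential(1.0, 192)           # noise-like background power
power[[40, 90, 150]] += 60.0                # strong scattering centers
hits = os_cfar(power)
```

The ordered statistic, unlike a simple mean, is barely perturbed when a neighboring strong scattering center falls inside the training window, which is exactly the multiple-scattering-center problem cited above.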
Afterward, the power distribution of each selected range cell is denoted by the histogram calculated from the results of all aspects. Then, the KL divergence between real and generated HRRPs is computed using the histograms. Fig. 8(a) and (b) shows the results of the AFRL and GTRI datasets, respectively. For the AFRL dataset, CVAE obtains higher KL divergence than others, because it may miss detailed structures as observed in Fig. 5. The KL divergences of weaker scattering centers (selected range cells from 10 to 15) increase for all methods. The proposed ADA-GAN obtains the lowest KL divergence for almost all selected range cells. For the GTRI dataset, the KL divergences of ADA-GAN are the lowest as well. Further, the results of ADA-GAN show less fluctuation with respect to the range cells compared with other methods. In short, the ADA-GAN outperforms other methods in terms of the fidelity of the scattering center power distribution.
2) Hand-Crafted Features: Hand-crafted features were widely used for HRRP recognition in the pre-deep-learning era [13], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69]. The commonly applied features can be divided into three categories: geometric; structural; and power features. In our experiment, a total of eight features are selected for evaluation. The geometric category contains three features: the number of scattering centers [60]; the length of the target [61]; and the number of half-power scattering centers, which counts the scattering centers with power above half of the strongest one [62], [63]. The structural category contains two features: the symmetry, which measures the power ratio of the two halves [64], and the dispersibility, which calculates the average distance between two adjacent scattering centers [65]. The power category contains three features: the fluctuation, which denotes the amplitude ratio of the peaks to the points around them [66]; the contrast, which computes the ratio of the variation to the mean [67]; and the half-power ratio, which is the ratio of the power of the half-power scattering centers to the whole power [68].
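To make a few of these features concrete, the following sketches the target length, contrast, and symmetry on a toy power profile. The threshold choice in `target_length` and the exact normalizations are illustrative assumptions; the cited papers define each feature precisely.

```python
import numpy as np

def target_length(power, range_res, thresh_ratio=0.1):
    """Geometric feature: extent between the first and last cell above threshold."""
    above = np.flatnonzero(power > thresh_ratio * power.max())
    return (above[-1] - above[0]) * range_res

def contrast(power):
    """Power feature: ratio of the variation to the mean."""
    return float(np.std(power) / np.mean(power))

def symmetry(power):
    """Structural feature: power ratio of the two halves of the profile."""
    half = len(power) // 2
    return float(power[:half].sum() / power[half:].sum())

power = np.zeros(192)
power[[40, 90, 150]] = [9.0, 4.0, 2.25]       # toy scattering-center powers
length = target_length(power, range_res=0.1)  # 0.1 m per cell, as in AFRL
```

Evaluating such features on both real and generated sets, then comparing their histograms via the KL divergence, yields the feature-level scores reported in Table II.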
The KL divergences of the eight features between the generated HRRPs and real HRRPs are given in Table II. The ADA-GAN obtains the lowest KL divergence on most features, which demonstrates its superiority over the other methods.

D. Recognition Experiment
Since the proposed work aims at improving recognition performance, a recognition experiment is used to evaluate the generated data. Besides the four methods used previously, three additional methods without aspect supervision are employed for comparison as well: CGAN [69]; ACGAN [35], [70]; and CWGAN [41]. The 1-D VGG-16 network is applied as the classifier.
We designed two experiments to evaluate the similarity between generated and real HRRPs and the benefit of data augmentation with generated HRRPs, respectively.
1) In the first experiment, the generated HRRPs of the noncooperative target are treated as the testing set. The detailed settings are given in Table III. The training set for the classifier contains real HRRPs with full aspects for both the cooperative targets and the noncooperative target. The recall rate of the noncooperative target is selected as the evaluation metric.
2) In the second experiment, the generated HRRPs of the noncooperative target are treated as the training set. It consists of two steps: a basic experiment and a data augmentation experiment. The basic experiment obtains the recognition performance with insufficient aspects; the detailed settings are given in Table IV. The training set for the classifier contains six-tenths of the samples randomly drawn from the real HRRPs. All aspects are covered for the cooperative targets, and only three aspects are selected for the noncooperative target. In the data augmentation experiment, the generated HRRPs with the missing aspects are added to the training set for the noncooperative target. The detailed settings are given in Table V. The overall accuracy and the recall rate of the noncooperative target are selected as the performance indicators.
As given in Tables VI and VII, the proposed method outperforms the other methods in both experiments. For the first experiment, the recall rates of the HRRPs generated by ADA-GAN are around 98% for both the AFRL and GTRI datasets, indicating that most generated HRRPs can be correctly recognized. The supervision of aspect is essential, since the recall rates of the aspect-supervised methods are much higher. Among this group of methods, the proposed ADA-GAN presents an increase of 0.3% over PeaceGAN for the AFRL dataset and 4.8% over DRGAN for the GTRI dataset.
This implies that GAN-based models hold greater potential than CVAE-based models and that the proposed method generates HRRPs closer to the real ones than the other state-of-the-art methods.
For the second experiment, the proposed method still achieves the best performance. The overall accuracy reaches 84.8% on the AFRL dataset and 85.7% on the GTRI dataset, an improvement of more than 5% in both cases. The recall rate of the noncooperative target is improved by 46.3% on the AFRL dataset and by 41.1% on the GTRI dataset.
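The two performance indicators used above can be computed directly from classifier predictions. The following is a minimal NumPy sketch (the function names and toy labels are ours, with class 2 standing in for the noncooperative target):

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def class_recall(y_true, y_pred, target_class):
    """Recall for one class: correct hits / samples of that class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = y_true == target_class
    return float(np.mean(y_pred[mask] == target_class))

# toy example: class 2 plays the role of the noncooperative target
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
print(overall_accuracy(y_true, y_pred))              # 0.75
print(class_recall(y_true, y_pred, target_class=2))  # 0.75
```

The recall rate isolates how well the noncooperative class itself is recognized, which is why it is reported alongside the overall accuracy in Tables VI and VII.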

E. Ablation Experiment
In this section, ablation experiments are designed to analyze the effectiveness of the proposed attention-based feature factorization module. Three factors may influence the generation performance: the basic feature factorization structure, the self-attention layers, and the aspect regression task. Thus, we conducted three ablation experiments by adding these modules progressively. The corresponding variants of our method are as follows.
1) There is no decomposition module, and the supervision of aspect is modeled as a classification task, i.e., the mixed feature f_m is directly used for both aspect and identity classification.
2) The decomposition module is constructed with a convolutional network, and the supervision of aspect is modeled as a classification task.
3) The decomposition module is the same as that in ADA-GAN, and the supervision of aspect is modeled as a classification task.

The generated HRRPs are regarded as the training set, and the real HRRPs are regarded as the test set. The results are given in Table VIII. The basic feature factorization structure contributes the most: it improves the recall rate by about 40% on the GTRI dataset. The self-attention layers and the aspect regression task also improve the recall rate, by about 10% and 8%, respectively, on the AFRL dataset.
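The self-attention layers studied above presumably follow the standard scaled dot-product form; the sketch below illustrates a single-head version over a 1-D feature sequence in NumPy (toy sizes, with random projection matrices as placeholders for the learned weights):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    x: (L, d) feature sequence; wq/wk/wv: (d, d) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])  # (L, L) pairwise affinities
    return softmax(scores, axis=-1) @ v     # each position mixes in global context

rng = np.random.default_rng(0)
L, d = 16, 8  # 16 range cells, 8 feature channels (toy sizes)
x = rng.standard_normal((L, d))
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (16, 8)
```

Such a layer lets each range cell attend to scattering structure anywhere along the profile, which is plausibly why it helps the factorization network separate aspect-dependent from identity-dependent features.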

F. Loss Weight Evaluation
In this section, the influences of the four loss weights in (15) and (16) are evaluated, and the overall accuracy in recognition experiment 2 is used as the performance index. The default value of every weight is set to 1, and each weight is evaluated in turn with the values 1, 2, and 5. The results are given in Table IX. For the weights λ_ASP and λ_ADV, the accuracy declines as the value increases, whereas λ_ID behaves oppositely. For λ_PE, the accuracy is best at a value of 2. Thus, the optimal values of the loss weights are selected as 1, 5, 2, and 1.

G. Computational Complexity
We use the number of parameters and the running time of a forward pass to compare the computational complexity. As given in Table X, although the methods without aspect supervision have fewer parameters and shorter forward times, their overall accuracies are lower than 72.4%. The CVAE has the fewest parameters; however, its overall accuracy is 10% lower than that of ADA-GAN. DRGAN, PeaceGAN, and ADA-GAN have similar forward times, and the overall accuracy of ADA-GAN is 5% higher than those of DRGAN and PeaceGAN.
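Both complexity measures can be obtained generically: parameters are counted by summing the sizes of all weight arrays, and forward time is averaged over repeated timed passes after a warm-up. A minimal sketch with a toy two-layer network standing in for the actual models (all names and sizes here are ours):

```python
import time
import numpy as np

def count_params(weights):
    """Total number of learnable scalars across all weight arrays."""
    return sum(w.size for w in weights)

def forward_time_ms(forward, x, repeats=10):
    """Mean wall-clock time of one forward pass, in milliseconds."""
    forward(x)  # warm-up pass, excluded from timing
    t0 = time.perf_counter()
    for _ in range(repeats):
        forward(x)
    return (time.perf_counter() - t0) / repeats * 1e3

rng = np.random.default_rng(0)
w1 = rng.standard_normal((256, 128))
w2 = rng.standard_normal((128, 10))
forward = lambda x: np.maximum(x @ w1, 0) @ w2  # ReLU MLP stand-in

n_params = count_params([w1, w2])               # 256*128 + 128*10 = 34048
ms = forward_time_ms(forward, rng.standard_normal((64, 256)))
```

In practice the timing should be measured on the same hardware for all compared methods, since the forward times in Table X are only meaningful relative to one another.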

V. CONCLUSION
In this article, we propose a novel HRRP data generation method termed ADA-GAN. The ADA-GAN decomposes an HRRP into aspect and identity features using an aspect-directed attention network and generates HRRPs with the desired aspect. Specifically, it is trained on the known HRRP dataset to learn the pattern of aspect variation, after which the HRRPs of noncooperative targets at the missing aspects can be generated. Three types of experiments demonstrate that ADA-GAN outperforms other state-of-the-art generation methods. First, the generated HRRPs are similar to the real HRRPs in terms of the scattering center and power distribution in the visual evaluation. Second, quantitative evaluation of the power distribution shows that the generated HRRPs are also close to the real HRRPs in the power, geometric, and structural feature domains. Furthermore, the recognition experiments indicate three points: the HRRPs generated by ADA-GAN can be correctly categorized; even though most aspects of the noncooperative target are missing, the overall accuracy reaches about 85%; and when the known dataset is extended with the generated HRRPs, the recall rate of the noncooperative target improves by over 40%.