SAR Target Recognition via Random Sampling Combination in Open-World Environments

Target recognition in SAR images has been widely studied over the years. Most existing works assume that the targets in the test set belong to a limited set of classes. In practical scenarios, however, it is common to encounter various kinds of new targets, so it is more meaningful to study target recognition in open-world environments. In these scenes, unknown classes must be rejected while the classification performance on known classes is maintained. In past years, few works have been devoted to open set target recognition. Although preceding works improve the detection performance on unknown targets to a certain extent, most detection schemes are independent of the pretrained feature extractor, leading to potential open space risks. Besides, the model architectures are complicated, resulting in a huge computational cost. To solve these problems, a family of new methods for open set target recognition is proposed. Targets indistinguishable from known classes are constructed by a random sampling combination strategy and are then sent into the classifier for feature learning. The original open-world environment is thereby transformed into a closed-world environment containing the unknown class. Moreover, the special implication of the generated unknown targets is highlighted and used to realize unknown detection. Extensive experimental results on the MSTAR benchmark dataset illustrate the effectiveness of the proposed methods.


I. INTRODUCTION
Designing target recognition systems for synthetic aperture radar (SAR) data in real-world environments has received considerable critical attention. With the characteristics of active coherent imaging, SAR acquires high-resolution surface image data day and night and in all weather, and plays an irreplaceable role in modern high-tech information warfare. Recent developments in remote sensing technology have enabled more and more SAR images to be acquired. The interpretation of large-scale SAR images is an increasingly important area, in which target recognition is one of the research hotspots [1], [2], [3]. Most current research on SAR target recognition focuses only on a closed-world environment, that is, a scenario in which the classes in the training set are consistent with the classes in the test set. The classes included in the training set are called known classes, and all targets appearing in the test set belong to known classes. The main task of target recognition in a closed-world environment is to accurately assign each target to one of the known classes, which is called closed-set recognition (CSR). Traditional CSR technology mainly includes three stages: data preprocessing, feature extraction, and classification and recognition. Because manually designing feature extractors depends heavily on a large amount of professional knowledge and prior information, these techniques have high computational complexity and poor generalization performance. With the continuous development of deep-learning theory, various methods based on automatic feature extraction by neural networks have shown significant advantages and become mainstream.
Because SAR image data are scarce while the learning process of a CNN requires a large amount of data, some scholars propose to augment the training sample set to improve recognition performance. For example, Ding et al. [4] extracted the attributed scattering centers of original SAR images to reconstruct targets and expand the database. Wang et al. [5] designed a semisupervised learning framework including a self-consistent augmentation rule, mixup-based mixture, and weighted loss, which allows a classification network to utilize unlabeled data during training. Similarly, Zheng et al. [6] proposed generating new samples with the help of a generative adversarial network; these unlabeled generated images are input to the CNN together with the labeled images for semisupervised recognition. The expansion of the sample set effectively prevents the model overfitting caused by a small amount of training data. However, the quality of these augmented samples is difficult to guarantee. When the augmented features are not representative, the existing classification performance is degraded. What is more, some CSR methods optimize classification models by combining a CNN with other models such as autoencoders and SVMs. For example, Wagner [7] suggested replacing the fully connected layers of a CNN with a collection of SVMs for the final classification. In addition, elastic deformation and affine transformation are used to expand the training set. By optimizing the algorithm structure, such methods aim to reduce network complexity while improving classification accuracy. However, their generalization ability is relatively poor when dealing with small training datasets. Besides, some target recognition methods based on multifeature fusion are also popular. For instance, Chen et al. [8] used convolutional kernels of different sizes to extract multikernel-size deep features, and these features are then fused in an optimal way to obtain the lowest loss. Such methods fully use feature information of different dimensions to achieve a complete feature representation. But they rely heavily on network parameters and do not generalize well across various types of datasets.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Fig. 1. Difference between open-world environments and closed-world environments in the SAR target recognition problem. The distribution of the original dataset contains the targets belonging to three known classes together with various unknown targets found in real open-world environments. (a) Closed-set recognition problem: the decision boundary of each class misclassifies unknown targets as known. (b) Open-set recognition problem: the decision boundaries limit the scope of known classes, reserving space for unknown classes.
In real-world target recognition problems, many unseen classes not included in the training set are likely to appear; these are called unknown classes. In CSR, such unknown targets are misjudged as known classes. This misjudgment is defined as open space risk and seriously affects the military application of recognition systems [9]. Therefore, there is an urgent need to address the risk caused by the assumption of closed-world environments. The concept of open-world environments is proposed to describe the scenario in which the test set contains various targets not belonging to known classes. In such an open setting, there are two primary goals. First, the system ought to correctly classify all targets belonging to known classes. Second, the targets not belonging to any known class must be identified as the unknown class and promptly rejected. This problem setting is referred to as open set recognition (OSR) for SAR targets. Compared with CSR, OSR additionally rejects unknown classes, which is the significant difference between them. Fig. 1 describes the tasks that OSR and CSR need to achieve, respectively.

A. Traditional Strategies
Early efforts toward efficient OSR methods for SAR targets relied on traditional techniques, as shown in Fig. 2. For example, Scherreik and Rigling [10] proposed a support vector machine (SVM)-based method, which realizes classification with a rejection option using the W-SVM and POS-SVM. Besides, some scholars proposed classification models based on artificially generated unknown targets. For example, in [11], two template-based open set recognizers using synthetic images as unknown classes are adopted. Later, Song et al. [12] regarded physics-based electromagnetic (EM) simulated images under different azimuth angles as the unseen targets and designed a ZSL model. What is more, some methods using extreme value theory (EVT) are also representative. For instance, Dong et al. [13] proposed selecting the edge exemplars by edge pattern selection and herding, and then fitting the probabilistic distributions with EVT. The unknown class is rejected by thresholding distribution similarities.
Fig. 2. Summary of OSR methods for SAR targets. "SVM-based" denotes support vector machine-based models, while "EVT-based" indicates that statistical EVT is applied to the models. Modern methods based on deep learning improve OSR performance from both discriminative and generative perspectives.

B. Modern Strategies
Recently, with the emergence of deep neural networks, deep-learning-based OSR methods have developed and achieved superior performance [14]. Most studies focus on discriminative models, which quantize the output distance or probability to constrain the decision boundary. For instance, Hendrycks and Gimpel [15] compared the output probabilities of a pretrained classifier to a threshold. An instance is assigned to the unknown class when the maximum probability is lower than the threshold. Bendale and Boult [16] defined the scores from the penultimate layer of the deep network as activation scores, which were then combined with meta-recognition to estimate whether the input data belong to the unknown class. However, these methods mostly serve as postprocessing on the resulting CNN features. The division of decision boundaries depends heavily on the information obtained after training. When the extracted features are not rich enough, the recognition effect is greatly affected. Apart from these discriminative models, some generative models have also gained a lot of attention. They learn the decision boundaries by generating known targets or unknown targets using a generative adversarial network (GAN) [17], an autoencoder [18], or a flow-based model [19]. An example can be drawn from [20], which learned the distribution of known classes with a GAN and its discriminator. Because the unknown samples did not fit the learned distribution, unknown classes were identified by thresholding the output scores of the discriminator.

C. Our Solution
Compared with discriminative methods, generative models are theoretically elegant and straightforward. However, in existing methods the classifiers are trained independently of the target generation process. The deep unknown distribution learned by the classifier is ignored, resulting in potential open space risk. Besides, the techniques used to generate targets are usually complicated. Motivated by these problems, a family of novel OSR methods, generative models via random sampling combination (GvRSC), is proposed in this article. Two technical routes are designed to estimate the distribution of unknown classes indistinguishable from known classes. The original open environment is then converted into a closed environment. Moreover, the different implications of the prior information on determined known classes and on simulated unknown classes are used to detect unknown targets. The main contributions of this article are as follows.
1) The designed generation process of unknown targets is straightforward and effective. Furthermore, the classifier is trained in the feature space augmented by generated targets, making the model more general.
2) A rich deep feature space is further learned, so that the decision boundaries of known classes are pushed away significantly. Meanwhile, the randomness of generation allows diverse novel features to be constructed continuously, effectively avoiding overfitting.
3) The proposed spatial clipping effectively suppresses noise interference while retaining important details. In this way, the features used for classification in SAR images are optimized in a strongly targeted manner.

II. BACKGROUND
Our work is mainly related to the concatenation and interpolation between known targets, aiming to simulate unknown targets. In this section, the challenges in OSR and the algebraic model of known class spaces are briefly discussed.

A. Challenges in OSR
Rejecting unknown targets while correctly classifying all known targets is precisely the problem that OSR aims to solve, even if these unknown targets come from a variety of categories. In a feature space, the positive half space for each known class is considered to be relatively bounded. The prior information used to learn the known distribution is limited but sufficient. However, the distribution space of unknown classes is unbounded. The mixed unknown targets are roughly divided into two types: unknown targets far away from the clusters of all known classes, and unknown targets close to some clusters of known classes, as shown in Fig. 3. Considering that the identification process is essentially the process of finding the best match of feature information, the distribution of unknown targets is determined by the feature similarity between them and the known classes. The first type of unknown targets shares few features with any known class, whereas the second type has a high feature similarity with known classes.
Fig. 3. Distribution of unknown targets in a feature space containing all targets belonging to three known classes. Black denotes the unknown targets similar to some known classes, and blue denotes unknown targets that have nothing to do with any known class.
Unknown targets far away from clusters of all known classes: The positive half space of each class is identified after a classifier is trained. When a sample appears deeper in an identified positive half space, the probability of belonging to the corresponding class is large. On the contrary, the probability tends to decline gradually as the sample is further away from the identified space. Hence, the output probabilities of such unknown targets under all classes are all low. In this case, these unknown targets are rejected directly by thresholding the maximum output probability.
Unknown targets close to some clusters of known classes: Notably, the difficulty lies precisely in identifying such unknown targets, which have a component or region semantically similar to that of some known classes. The existence of common features leads to a high probability for these unknown targets under the nearest known class. This probability is as high as the probability of a known target under its true class. As a result, simply thresholding probabilities fails to solve the rejection problem for these unknown targets.

B. Algebraic Model of Known Class Spaces
For all targets of the same class $X_k = \{x_{k,1}, x_{k,2}, \ldots, x_{k,n}\}$, it is generally considered that they span a linear subspace of the class [21], [22]

$$\mathrm{Span}(X_k) = \alpha_{k,1} x_{k,1} + \alpha_{k,2} x_{k,2} + \cdots + \alpha_{k,n} x_{k,n} \tag{1}$$

where $\alpha_k = [\alpha_{k,1}, \alpha_{k,2}, \ldots, \alpha_{k,n}]$ is the coefficient vector. Each group of imaging data for this class is regarded as a specific element of the linear subspace. On the contrary, a linear combination of two or more groups of imaging data from different classes theoretically does not belong to any related subspace spanned by a known class. Likewise, each image, such as the $i$th SAR image of the $k$th class, is quantified into a limited set of discrete features. Because the targets from different classes span their respective linear subspaces, the zero vector is the only common vector. This algebraic model indicates that the subspaces of different classes are not interconnected. Hence, the feature sets of two or more targets from different classes are disjoint. Any feature set composed of some discrete features from different classes does not belong to any relevant known class. Targets constructed as above are obtained through feature transformation of known targets. Consequently, these targets are similar to known classes while not belonging to any known class. Inspired by that, known targets not in the same class are randomly combined to approximate the distribution of indistinguishable unknown targets in our study.
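The subspace argument can be illustrated numerically. The following is only a toy sketch under assumed conditions: the class samples are random vectors in a 20-dimensional space (hypothetical data, not SAR images), and `subspace_residual` is an illustrative helper name. It measures how much of a vector lies outside the span of a class; a within-class combination leaves no residual, while a cross-class combination leaves a clear residual with respect to both classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def subspace_residual(basis, v):
    """Norm of the component of v that lies outside span(columns of basis)."""
    coeffs, *_ = np.linalg.lstsq(basis, v, rcond=None)
    return float(np.linalg.norm(v - basis @ coeffs))

# Hypothetical toy data: two classes, each spanning a 3-D subspace of R^20.
X1 = rng.standard_normal((20, 3))  # columns stand in for x_{1,1}, x_{1,2}, x_{1,3}
X2 = rng.standard_normal((20, 3))  # columns stand in for x_{2,1}, x_{2,2}, x_{2,3}

in_class = X1 @ np.array([0.5, -1.0, 2.0])  # combination within class 1
mixed = 0.5 * X1[:, 0] + 0.5 * X2[:, 0]     # combination across classes

res_in = subspace_residual(X1, in_class)    # numerically zero
res_mixed_1 = subspace_residual(X1, mixed)  # clearly nonzero
res_mixed_2 = subspace_residual(X2, mixed)  # clearly nonzero
print(res_in, res_mixed_1, res_mixed_2)
```

Real SAR features are not exactly linear subspaces, so this only mirrors the idealized model in (1).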

III. PROPOSED METHODS
Assuming the original training set contains K known classes, OSR can be regarded as simultaneously classifying the K known classes correctly and identifying unknown targets as the unknown class [9], [23]. That is, OSR is a (K + 1)-class classification problem containing prior information on only K known classes. We concatenate known targets to simulate the prior information on indistinguishable unknown targets, and let these unknown targets participate in classifier training. Consequently, the original open space is transformed into a closed space containing prior information on K + 1 classes. Specifically, two technical routes for unknown sample generation are included. In the first, some known targets not in the same class are randomly cropped and spliced in the input layer. This technique is defined as spatial clipping used to generate unknown samples (SCG). In the second, a random weighted combination of some known targets, again not in the same class, is made in the middle hidden layer. This technique is named weighting used to generate unknown samples (WG).
In this section, we first propose the overall frameworks of the family of generative models in detail, and then explain the implementation process of unknown detection and known classification, followed by motivation.
A. Generative Model Based on SCG
1) Modeling: As an underlying support, the network architecture of our classification model can be chosen arbitrarily according to practical requirements. We denote the selected network as $f(x; \theta)$ with parameters $\theta$, which takes an image $x$ as input and outputs a logit vector over the limited set of classes. Furthermore, the classification network is considered to be composed of an embedding function and a linear classifier, described as

$$f(x; \theta) = W^{\top} \phi(x). \tag{5}$$

In (5), $\phi(x): \mathbb{R}^{D} \to \mathbb{R}^{d}$ denotes the abstract embedding function for extracting features, where $D$ refers to the dimensionality of each input image and $d$ refers to the dimensionality after mapping. $W \in \mathbb{R}^{d \times (K+1)}$ represents the weight matrix of the fully connected (FC) layer for linear classification, where $K + 1$ indicates that the network output has a total of $K + 1$ classification nodes, with the $(K + 1)$th node corresponding to the unknown class.
2) Implementation: Assume a labeled training set $D_{tr}$. In the input layer, we first randomly sample four different targets not in the same class from $D_{tr}$ for pairing. Note that we do not pair $(x_i, x_j, x_k, x_l)$ over all possible combinations. Instead, combinations are produced within mini-batches. Given a training batch of size $B$, four orderings of the training targets are obtained by shuffling the mini-batch. Pairs containing the same target or targets belonging to the same class are then discarded, and the remaining pairs are used for unknown target construction.
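The in-batch pairing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: `make_quadruples` is a hypothetical helper, and it assumes the strict reading that all four targets in a kept group must come from pairwise different classes (which also guarantees that no target is repeated).

```python
import numpy as np

def make_quadruples(labels, rng):
    """Return index quadruples (i, j, k, l) whose four class labels all differ."""
    labels = np.asarray(labels)
    # Four independent orderings of the same mini-batch, one per column.
    orders = np.stack([rng.permutation(len(labels)) for _ in range(4)], axis=1)
    keep = []
    for quad in orders:
        # Distinct labels imply distinct targets, so one check covers both rules.
        if len(set(labels[quad].tolist())) == 4:
            keep.append(quad)
    return np.array(keep, dtype=int).reshape(-1, 4)

rng = np.random.default_rng(0)
batch_labels = np.arange(32) % 8  # hypothetical batch of B = 32 with 8 classes
print(make_quadruples(batch_labels, rng).shape)
```

Because groups are formed only inside a mini-batch, the cost grows with the batch size B rather than with the number of possible four-way combinations in the training set.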
After target pairing, random cropping and stitching are performed to simulate novel targets. Within the length and width of the original training image, two values are sampled from a Beta distribution to construct the boundary coordinate in every training step [23]. Supposing the size of the training image is $a \times b$, the boundary coordinate is denoted as $(w, h)$. To guarantee the difference between generated images and original images, the boundary coordinates are required to fall on the target center area with a higher probability and on the boundary area of the original image with a lower probability. Because the Beta distribution models the distribution of a probability value, this characteristic is used to constrain the probability distribution of selecting boundary coordinates. We set $\alpha_1 = \beta_1 = 2 > 1$, and the corresponding probability distribution shape is shown in Fig. 6. By drawing a horizontal line and a vertical line at the position $(w, h)$, the original shape $a \times b$ is divided into four new rectangles. We denote the shapes of these four rectangles as $\nu_n = [w_n, h_n]$, $n = 1, 2, 3, 4$, where $[\cdot, \cdot]$ represents the length and width of a rectangle. The paired four-column images are sequentially cropped according to the shape $\nu_n$. Specifically, the starting positions for cropping, denoted as $(a_n, b_n)$, $n = 1, 2, 3, 4$, are randomly produced from a Beta distribution within a certain range

$$a_n = \tilde{a}_n (a - w_n), \quad \tilde{a}_n \sim \mathrm{Beta}(\alpha_2, \beta_2)$$

with $b_n$ obtained in the same way along the other dimension. They are taken as the upper left corners of the cropped areas. To prevent a cropped area from containing a complete known target, the starting positions should fall in the upper left region of the central target as rarely as possible. Therefore, we also use the Beta distribution to constrain the selection probability and set the parameters as $\alpha_2 = 1$, $\beta_2 = 4$, resulting in the distribution shape shown in Fig. 6.
Finally, the four cropped images in each pair are spliced around the boundary coordinates $(w, h)$ to form a novel unknown image with the same size as the known image. The process is illustrated in Fig. 4. The targets generated in each batch are added to the corresponding batch of $D_{tr}$ with class label $K + 1$. The augmented dataset is then sent to the classification network for joint training under a $(K + 1)$-class classification loss. The output distribution of the network is optimized to match the one-hot encoded distribution of the true labels, leading the generated targets to approximate the unknown class. After the network finishes iterative optimization, a $(K + 1)$-class classifier $C_{K+1}$ is formed. The complete process of the SCG-based model is shown in Fig. 5.
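The cropping and stitching procedure can be sketched in NumPy as below. This is a hedged illustration rather than the authors' code: `scg_sample` is a hypothetical name, the rounding of the boundary coordinate and the assignment of the four rectangles to the four quadrants are assumptions, and the Beta parameters follow the values stated in the text (2, 2 for the boundary and 1, 4 for the crop offsets).

```python
import numpy as np

def scg_sample(images, rng, alpha1=2.0, beta1=2.0, alpha2=1.0, beta2=4.0):
    """Stitch crops of four (a, b) images into one synthetic unknown image."""
    a, b = images[0].shape[:2]
    # Boundary coordinate (w, h); Beta(2, 2) favours the image centre.
    w = int(round(rng.beta(alpha1, beta1) * a))
    h = int(round(rng.beta(alpha1, beta1) * b))
    # Shapes of the four rectangles created by the two cut lines (assumed order).
    shapes = [(w, h), (a - w, h), (w, b - h), (a - w, b - h)]
    crops = []
    for img, (wn, hn) in zip(images, shapes):
        # Upper left corner of the crop; Beta(1, 4) skews the offset towards zero.
        an = int(rng.beta(alpha2, beta2) * (a - wn))
        bn = int(rng.beta(alpha2, beta2) * (b - hn))
        crops.append(img[an:an + wn, bn:bn + hn])
    left = np.concatenate([crops[0], crops[1]], axis=0)    # shape (a, h)
    right = np.concatenate([crops[2], crops[3]], axis=0)   # shape (a, b - h)
    return np.concatenate([left, right], axis=1)           # shape (a, b)

rng = np.random.default_rng(0)
# Hypothetical inputs: four constant 128 x 128 images standing in for four classes.
imgs = [np.full((128, 128), float(i)) for i in range(4)]
unknown = scg_sample(imgs, rng)
print(unknown.shape)
```

In training, each stitched image would be appended to the mini-batch with label K + 1 before the usual classification loss is computed.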
B. Generative Model Based on WG
1) Modeling: Considering that the middle hidden layer carries higher dimensional information than the input layer, the hidden representations are used to construct unknown targets in this technique. We divide the network structure into two parts with the middle hidden layer as the boundary. The embedding function $\phi(x)$ can be further expressed as

$$\phi(x) = \phi_{pos}(\phi_{pre}(x)). \tag{10}$$

In (10), $\phi_{pre}(\cdot)$ represents the embedding function corresponding to the layers before the middle layer, mapping input data into hidden representations. $\phi_{pos}(\cdot)$ corresponds to the remaining layers of the feature extraction network, mapping the hidden representations into output features. Then, the classification network is described as

$$f(x_i) = W^{\top} \phi_{pos}(\phi_{pre}(x_i)) \tag{11}$$

where $\phi_{pre}(x_i)$ refers to the high-dimensional hidden representation of the input $x_i$.
2) Implementation: Similarly, known targets in $D_{tr}$ are first sampled randomly for pairing, discarding the pairs containing the same target or targets in the same class. Denoting one of the obtained pairs as $(x_i, x_j, x_k, x_l)$, the four known targets are separately put into the first part of the feature extraction network. The corresponding hidden representations are expressed as $(\phi_{pre}(x_i), \phi_{pre}(x_j), \phi_{pre}(x_k), \phi_{pre}(x_l))$. A linear weighted combination of these representations is performed to construct a novel unknown target, as shown in Fig. 7. The corresponding formula is as follows:

$$\tilde{x}_u = \lambda_1 \phi_{pre}(x_i) + \lambda_2 \phi_{pre}(x_j) + \lambda_3 \phi_{pre}(x_k) + \lambda_4 \phi_{pre}(x_l). \tag{12}$$

The weight coefficients $\lambda_1, \ldots, \lambda_4$ are also drawn from a Beta distribution and are set so that they sum to 1. Then, $\tilde{x}_u$ goes through the remaining part of the feature extraction network, corresponding to $\phi_{pos}(\cdot)$. The final output is represented as $f(\tilde{x}_u)$. An additional loss function $l_u$ is constructed for the generated targets. $l_u$ is used to optimize the output distribution of generated targets, leading these targets to simulate indistinguishable unknown targets as closely as possible. As a result, the unknown distribution in open-world environments is constrained within a limited range.
The original training set $D_{tr}$ is directly input to the complete classification network $f(x) = W^{\top} \phi(x)$ to obtain the output distribution of known targets. The corresponding classification loss between known classes is denoted as $l_k$. Finally, the overall loss is obtained by the weighted summation of $l_u$ and $l_k$. A $(K + 1)$-class classifier $C_{K+1}$ is also formed after optimization. The complete process of the WG-based model is shown in Fig. 8.
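The hidden-layer combination can be sketched as follows. This is an illustration under stated assumptions: `wg_combine` is a hypothetical helper, the Beta parameters (2, 2) are placeholders, and dividing the raw draws by their sum is just one possible way to make the weights sum to 1. The hidden representations are random arrays standing in for the outputs of the pre-layers.

```python
import numpy as np

def wg_combine(hidden, rng, alpha=2.0, beta=2.0):
    """Blend four hidden representations with random weights that sum to 1."""
    hidden = np.asarray(hidden, dtype=float)        # shape (4, ...)
    raw = rng.beta(alpha, beta, size=len(hidden))   # one Beta draw per target
    weights = raw / raw.sum()                       # assumed normalisation step
    # Weighted sum over the first axis yields the synthetic representation.
    return np.tensordot(weights, hidden, axes=1), weights

rng = np.random.default_rng(0)
# Hypothetical hidden representations of four targets from different classes.
hidden = rng.standard_normal((4, 8, 5, 5))
x_u, weights = wg_combine(hidden, rng)
print(x_u.shape, weights.sum())
```

The blended representation would then be passed through the remaining layers and scored against label K + 1, while the original batch contributes the known-class loss.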

C. Implementation of the Identification Process
The difference between the output distributions of unknown and known targets is significantly enlarged after training. For unknown targets similar to some known classes, the output probability of class K + 1 is relatively high, although a high output probability for the similar known class may also appear. For known targets, the highest probability appears at their true class label while the probability of class K + 1 is low. However, as mentioned in Section II-B, the variety of such unknown targets is effectively unlimited. The finiteness of the learned unknown features is therefore normal and realistic, and the difference from known targets gives the output probability of class K + 1 a distinct meaning. Hence, we do not treat the output probability of the unknown class in the same way as those of the known classes, so as to make full use of the limited feature information of the generated unknown targets.
During the recognition of test instances, the output probability of class K + 1 is not ignored merely because it is not the maximum probability over all classes. When this probability is large in absolute terms, it indicates that the instance belongs to the indistinguishable unknown class with a high probability.
First of all, we reject the targets far away from all the clusters of the $K + 1$ classes by thresholding, which corresponds to unknown targets having few features similar to any known class

$$\max_{k = 1, 2, \ldots, K+1} p_k < \varepsilon_1$$

where $p_k$ denotes the output probability of class $k$ and $\varepsilon_1$ is the threshold. If the above formula is satisfied, the instance is directly judged as an unknown target, expressed as $y = K + 1$.
Otherwise, further judgment is needed. The rejection of unknown targets similar to some known classes is achieved by thresholding the output probability of class $K + 1$. Specifically, if this probability is greater than the threshold $\varepsilon_2$, that is, $p_{K+1} > \varepsilon_2$, the instance is judged as the indistinguishable unknown class: $y = K + 1$. If not, it is identified as the known class with the largest probability among the first $K$ classes, $y = \arg\max_{k = 1, 2, \ldots, K} p_k$.
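The two-stage decision described above can be written compactly. The sketch below is illustrative only: `osr_decide` is a hypothetical name, the threshold values in the usage lines are arbitrary, and classes are indexed from 0 so that index K stands for class K + 1 (unknown).

```python
import numpy as np

def osr_decide(probs, eps1, eps2):
    """Two-stage rule over K + 1 output probabilities; index K means unknown."""
    probs = np.asarray(probs, dtype=float)
    unknown = len(probs) - 1                  # 0-based index of class K + 1
    if probs.max() < eps1:                    # far from every cluster
        return unknown
    if probs[unknown] > eps2:                 # near a known class, yet unknown
        return unknown
    return int(np.argmax(probs[:unknown]))    # best of the K known classes

# Hypothetical outputs for K = 3 known classes plus the unknown class.
print(osr_decide([0.30, 0.30, 0.20, 0.20], eps1=0.5, eps2=0.25))  # 3 (stage 1)
print(osr_decide([0.60, 0.05, 0.05, 0.30], eps1=0.5, eps2=0.25))  # 3 (stage 2)
print(osr_decide([0.80, 0.10, 0.05, 0.05], eps1=0.5, eps2=0.25))  # 0 (known)
```

The second example shows why the probability of class K + 1 is inspected even when it is not the maximum: the instance would otherwise be assigned to the first known class.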

D. Motivation
In this section, the effectiveness of GvRSC is analyzed theoretically and expounded in detail. There are three main reasons why the unknown detection performance improves while the known classification effect is preserved: efficient simulation of unknown targets, boundary compactness of known classes, and suppression of SAR noise interference.

1) Generative Model Based on SCG:
The target generation process of SCG uses only image cropping and stitching and adds no extra time complexity. Besides, the diversity of feature combinations is increased by virtue of randomness. Thus, the generated targets are led to closely approximate indistinguishable unknown targets, which is verified by the t-SNE visualization in Fig. 9(a). In addition, patching creates new global features in the generation process. This keeps the neural network from overfitting to specific features [24].
Targets generated by SCG always carry local features included in the original dataset, from which deep and comprehensive known features are further extracted. Many patch details in known classes are learned repeatedly. As a result, the decision boundary is moved away from the generated targets, resulting in a compact embedding space.
Furthermore, the influence of background noise [25], [26] on recognition is reduced effectively, increasing model stability. We assume the feature set corresponding to each target consists of the main features used for classification and various noise. Because the background noise is random, the noise distribution across the classes is also random. By cropping and splicing, random noise from different images is randomly sampled into a single image, so that the same noise distribution exists in the generated targets and in the K known classes. The uniformity of the noise distribution across classes is thereby greatly enhanced. During network training, the weights of all features with respect to the target classes are determined. Therefore, when the noise tends to be evenly distributed across all classes, the information gain or weight assigned to the noise is weakened to nearly 0. On the contrary, the main features, which are inherent properties of each known class, have a regular distribution. The information gain assigned to each main feature is continuously strengthened during training. In summary, without losing SAR image texture information, the interference of random noise with image classification is effectively suppressed. During testing, even if unseen random noise is mixed in, the weights of these main features still play a dominant role, and the stability of the classification effect is clearly increased.
2) Generative Model Based on WG: From (12), we find that this generation process also incurs no extra time complexity. WG builds completely new pixel-level features that the original known targets do not include. Therefore, the variety of learned unknown features is enriched drastically.
Since the network learning process is regarded as a parameter learning process, each layer holds a deeper abstract representation than the previous layer. As the parameters are updated through the network, targets generated in the middle hidden layer are continuously optimized. Thus, the generated targets better represent the unknown targets similar to known classes, as the t-SNE visualization in Fig. 9(b) shows. In addition, the combinations in the middle hidden layer prevent generated targets from being confused with other known targets, which is prone to occur at the input layer. This means the distinguishability between known classes is effectively guaranteed.
Weighted combination is understood as the establishment of a linear interpolation function, which makes the discrete sample space continuous. Verma et al. [27] proved the learning of unknown targets located at the interpolated position pushes the decision boundaries away in all directions, smoothing the decision boundaries. This characteristic is conducive to improving the generalization ability of the WG-based model. With the above characteristics, testing unknown instances are gathered around the cluster center of class K + 1.

IV. EXPERIMENT AND RESULTS
To evaluate the performance of GvRSC, we conduct experiments on the MSTAR public database. Its SAR images of multiple targets are acquired in the X-band with HH polarization at 0.3-m resolution. There are 10 classes of vehicle targets with a pixel size of 128×128, i.e., BMP2 (tank), BTR70 (armored car), T72 (tank), BTR60 (armored car), 2S1 (cannon), BRDM2 (truck), D7 (bulldozer), T62 (tank), ZIL131 (truck), and ZSU23/4 (cannon). Each target was captured at 190 to 300 different aspect angles, covering the full 360° range. According to the recommended configuration [8], [28], the images with a depression angle of 17° are used to train the network, and the images with a depression angle of 15° are used for testing. The number of images for each of the ten classes and some imaging parameters are shown in Table I. Because the data in MSTAR are insufficient, a set of data augmentation strategies is applied in all the experiments of this article.
In this section, we first compare open-world and closed-world recognition, reflecting the research significance of OSR. Then, we evaluate the performance in unknown detection and carry out an extended experiment with changing openness. Subsequently, the performance on the OSR task is compared with other state-of-the-art OSR methods. Finally, an ablation study is conducted to further analyze the contribution of each part of our model. Notably, results are quoted directly from the relevant references where they exist. Otherwise, where no directly citable results are available, we keep the same configuration as the original reference and use the parameter values recommended in the literature to reproduce the method. Details on the recommended values can be found in these references.

A. Comparison Between Open-World and Closed-World
To reflect the difference between open-world and closed-world environments, we conduct a brief experimental comparison on the MSTAR dataset. Eight of the ten classes are randomly selected as known and labeled 0 ∼ 7. The other two classes are merged into class 8 as the unknown class.
The comparative experiments consist of three parts: the plain CSR CNN in closed-world environments, the plain CSR CNN in open-world environments, and the SCG-based model in open-world environments. The experimental results are shown by the confusion matrices in Fig. 10. In Fig. 10(a), the CSR method ensures the accurate classification of known classes. In the real open-world environment shown in Fig. 10(b), however, the CSR method misjudges unknown targets as one of the known classes, leading to serious open space risks. In contrast, the proposed OSR method effectively addresses these misjudgments. In Fig. 10(c), most unknown targets are correctly identified as the unknown class, and the classification accuracy of known classes is also maintained.
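The labeling protocol of this comparison can be sketched as follows. The class names are those listed for MSTAR above; the function name `split_known_unknown` and the fixed seed are assumptions introduced only for illustration.

```python
import random

MSTAR_CLASSES = ["BMP2", "BTR70", "T72", "BTR60", "2S1",
                 "BRDM2", "D7", "T62", "ZIL131", "ZSU23/4"]


def split_known_unknown(classes, num_known, seed=0):
    """Randomly pick num_known classes as known and merge the rest into one unknown label.

    Known classes receive labels 0 .. num_known - 1; every remaining class
    receives the single label num_known (class 8 when num_known is 8).
    """
    rng = random.Random(seed)  # assumed seed, only for reproducibility of the sketch
    known = rng.sample(classes, num_known)
    label_map = {name: idx for idx, name in enumerate(known)}
    for name in classes:
        label_map.setdefault(name, num_known)
    return label_map


if __name__ == "__main__":
    for name, label in split_known_unknown(MSTAR_CLASSES, 8).items():
        print(f"{name}: {label}")
```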

B. Unknown Detection
1) Experimental Datasets:
From the MSTAR dataset, we randomly select eight classes as known classes, leaving the other two classes as the unknown class. The training set is composed of the corresponding eight classes of the MSTAR training images, while two test sets are set up. The first test set contains all 10 classes of the MSTAR testing images. The second test set contains the eight known classes of the MSTAR testing images and two unknown classes of MSTAR-noise images. These MSTAR-noise images are randomly generated from the MSTAR images using a GAN and are similar to the original MSTAR images.

2) Evaluation Metrics:
In a real open-world environment, it is not known how rare or common the unknown targets are. An independent and flexible threshold is therefore required for detecting unknown classes. The receiver operating characteristic (ROC) curve characterizes the performance of a detector as the threshold changes from zero recall to complete recall. For this reason, we choose the ROC curve and the area under the ROC curve (AUC) as metrics, which provide calibration-free measures of detection performance. Apart from this, any OSR method should remain capable of standard closed-set classification while distinguishing known from unknown targets. We therefore choose the closed-set classification accuracy as another metric, which indicates whether the classifier still works when applied to the known subset of classes.
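As a minimal sketch of these threshold-free metrics, the code below computes ROC points and the AUC from detection scores, where a higher score is assumed to mean "more likely unknown" (for example, the output probability of class K + 1). The AUC is computed through its pairwise interpretation, i.e., the probability that an unknown instance scores higher than a known one; the function names are illustrative.

```python
def roc_auc(scores_known, scores_unknown):
    """AUC as the fraction of (unknown, known) pairs ranked correctly; ties count half."""
    if not scores_known or not scores_unknown:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for u in scores_unknown:
        for k in scores_known:
            if u > k:
                wins += 1.0
            elif u == k:
                wins += 0.5
    return wins / (len(scores_known) * len(scores_unknown))


def roc_points(scores_known, scores_unknown):
    """(FPR, TPR) pairs obtained by sweeping the threshold over all observed scores."""
    thresholds = sorted(set(scores_known) | set(scores_unknown), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged as unknown
    for t in thresholds:
        tpr = sum(s >= t for s in scores_unknown) / len(scores_unknown)
        fpr = sum(s >= t for s in scores_known) / len(scores_known)
        points.append((fpr, tpr))
    return points
```

Because every threshold is swept, neither quantity depends on how common unknown targets are in deployment, which is the property motivating their use here.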
3) Network Architecture: The classification network for this experiment follows the classifier32 network used in [29], with some changes. Considering that SAR images are very sensitive to factors such as the imaging azimuth, learning deeper high-dimensional features is the key to improving recognition accuracy. We add two combined layers, each composed of a convolutional layer and an activation function layer, to the middle hidden layers of the original neural network. As a result, the network depth is increased and more abstract features are obtained. Stochastic gradient descent with momentum (Momentum SGD) is used to optimize the network, with an initial learning rate of 0.1.

4) Result Comparisons: As shown in Table II, the two proposed models are compared with recently proposed methods under the same conditions. Among them, Softmax is regarded as the baseline, which uses the highest output probability as the confidence score for detection. We report the mean AUC results over five trials in Table II. It is apparent from Table II that the proposed methods significantly improve the recognition performance. Specifically, in the experiments on the MSTAR dataset, both the SCG-based model and the WG-based model improve the detection performance by a considerable margin, about 10% over Softmax. Their AUC scores exceed 90%, an excellent detection result. As unknown targets become more indistinguishable in the MSTAR+noise dataset, the SCG-based model shows a more prominent advantage over the other advanced methods. Its detection performance is 12.1% higher than that of Softmax, reaching 85.2%. The WG-based model also slightly improves the performance in unknown detection, by 3.5% compared with the baseline, though the effect is not as good as that of the multitask learning-based model.
Moreover, we draw the ROC curves of the listed methods for performance evaluation in Fig. 11. On the MSTAR dataset, the ROC curves of the SCG-based model and the WG-based model are closer to the upper left corner, which means they have higher identification accuracy. The ROC curve of RPL is closest to the lower right corner, which indicates that RPL performs worst at detecting unknown targets. In addition, the ROC curve of the SCG-based model is still closest to the upper left corner on the MSTAR+noise dataset, followed by that of the multitask learning-based model. The ROC curve of the WG-based model lies to the upper left of those of RPL, ARPL, and GCPL. These results verify the superiority and effectiveness of our proposed methods in detecting unknown classes.
We also provide the closed-set accuracy in Table II. Compared with the baseline, the closed-set recognition performance of the other recently proposed methods is clearly reduced, with accuracy dropping by several percentage points on the different datasets. In contrast, the closed-set classification performance of the SCG-based model and the WG-based model remains essentially unchanged, fluctuating by only a few tenths of a percentage point. From these results, it is concluded that the proposed methods realize accurate unknown detection without sacrificing the discriminative ability in closed-set classification.

5) Extended Research for Openness:
Real open-world environments are complex and unpredictable, and diverse unseen targets may be encountered. The more classes of unknown targets there are, the greater the open space risk in the recognition problem, and the harder it is to detect unknown targets. Therefore, when evaluating the model, the influence of openness [35] on detection performance needs to be observed. Following the protocol given in [36], we quantify the complexity of the open-set task with the simplified openness, defined as
$$O = 1 - \sqrt{\frac{N_{\text{train}}}{N_{\text{test}}}}$$
where $N_{\text{train}}$ denotes the number of classes contained in the training set and $N_{\text{test}}$ denotes the number of classes contained in the test set. As discussed in the preliminaries, we set $N_{\text{train}} = K$. To change the degree of openness, we vary K with a fixed $N_{\text{test}} = 10$ for the MSTAR dataset. The detection results corresponding to a range of increasing openness scores are shown in Fig. 12. As the openness increases from 10.56% to 45.23%, the advantage of our models in unknown detection grows, leading to significant differences from Softmax. When the openness reaches 45.23%, the AUC scores are improved by 24.8% for the SCG-based model and 17.8% for the WG-based model compared with the baseline. What is interesting in Fig. 12 is that the AUC scores of the two proposed models degrade only gently as the openness increases. Their unknown detection performance does not deteriorate as drastically as that of Softmax. In particular, the SCG-based model appears to be almost unaffected by openness. Overall, these results suggest that where Softmax struggles at a high degree of openness, the proposed methods handle these scenarios with stable and excellent performance.
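The openness values quoted above can be checked with a short sketch. It assumes that the simplified openness has the form 1 − sqrt(N_train / N_test); with N_test = 10, this form reproduces the two quoted endpoints, 10.56% for K = 8 and 45.23% for K = 3. The function name is illustrative.

```python
import math


def openness(n_train, n_test):
    """Simplified openness, assumed here to be 1 - sqrt(n_train / n_test)."""
    if not 0 < n_train <= n_test:
        raise ValueError("require 0 < n_train <= n_test")
    return 1.0 - math.sqrt(n_train / n_test)


if __name__ == "__main__":
    # Vary the number of known classes K with the test set fixed at 10 classes.
    for k in range(8, 2, -1):
        print(f"K = {k}: openness = {100 * openness(k, 10):.2f}%")
```

Reducing K from 8 to 3 therefore sweeps the openness across the range examined in Fig. 12.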

6) Visualization Detection Results:
To analyze the feature similarity between testing instances and class K + 1, we measure the L2 norm of the difference between each testing instance and the center point of class K + 1 in the high-dimensional feature space. The corresponding histograms of the high-dimensional feature distribution, with MSTAR as the test set, are shown in Fig. 13. The distribution peaks of the known classes and the unknown class are clearly separated for both the SCG-based model and the WG-based model. The feature distance between unknown instances and the center of class K + 1 is much smaller, while the distance is larger for known instances. This finding confirms that thresholding the output probability of class K + 1 is reliable and effective for unknown detection.
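The distance measurement behind these histograms can be sketched as below. The sketch assumes that the center of class K + 1 is the component-wise mean of the features of the generated unknown targets, which is an assumption made for illustration; the function names are likewise hypothetical.

```python
import math


def class_center(features):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def distances_to_center(features, center):
    """Euclidean (L2) distance from each feature vector to the given center."""
    return [math.dist(f, center) for f in features]
```

Collecting `distances_to_center` separately for known and unknown test instances gives the two sets of values whose histograms are compared.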

C. Open Set Recognition
In addition to detecting unknown targets, another purpose OSR needs to achieve is to accurately classify known targets. In this section, the open-set classification performance of GvRSC on known targets is verified.

1) Experimental Datasets:
To facilitate comparison with other OSR methods, we set up the experimental dataset with reference to [10]. We choose T72, BMP2, and BTR70 to make up the training set. All 10 target classes make up the test set. This means that, at test time, OSR models should classify T72, BMP2, and BTR70 while simultaneously identifying the remaining seven classes as the unknown class.
2) Evaluation Metrics: To analyze the comprehensive performance in both known classification and unknown detection, we introduce recall, precision, and macro-F1. Recall indicates the proportion of instances classified correctly among all positive instances, which measures the ability to identify positive instances. Precision indicates the proportion of instances classified as positive that are actually positive. Macro-F1 is the harmonic mean of recall and precision. In a multiclass problem, these metrics are averaged over the K classes, where K here denotes the number of all classes.

3) Result Comparisons: For the classification network, we still use the network structure from the unknown detection experiments. We compare GvRSC with six other OSR methods in Table III, i.e., Softmax [30], ARPL [32], Openmax [16], W-SVM RBF [37], iCaRL [38], and EVM [39]. Among them, Softmax, ARPL, and Openmax are universal OSR methods migrated from the optical field. Their macro-F1 scores are all below 70%, an unsatisfactory OSR performance. ARPL in particular has almost lost its recognition ability on SAR images, though it performs well on optical datasets. Therefore, OSR for SAR targets needs to take the characteristics of SAR targets into account. Compared with these migrated optical OSR methods, the WG-based model achieves somewhat better recognition performance. Its macro-F1 score is close to 70%, with a recall as high as 73.2%.
In contrast, the macro-F1 scores of iCaRL, EVM, and the SCG-based model are all above 80%, a significant difference from the optical methods. Our SCG-based model has the best macro-F1. Furthermore, the precision of EVM is relatively low, which means that correct predictions account for a low proportion of all its predictions. The recall of the SCG-based model reaches 89.3% and its precision reaches 82.4%. These results indicate that a good trade-off between avoiding missed detections and reducing false detections is achieved. Consequently, it is inferred from Table III that the SCG-based model performs better than the others on both known classification and unknown detection.
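The recall, precision, and macro-F1 used in this subsection can be illustrated with a minimal sketch. It assumes that precision and recall are computed per class (including the unknown class), averaged over all classes, and then combined by the harmonic mean; another common convention averages the per-class F1 scores instead, and the source does not show which one is used. The function name is illustrative.

```python
def macro_scores(y_true, y_pred, num_classes):
    """Return (macro precision, macro recall, macro-F1) for integer class labels."""
    precisions, recalls = [], []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly assigned to class c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly assigned to class c
        fn = sum(t == c and p != c for t, p in pairs)  # class c instances that were missed
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / num_classes
    recall = sum(recalls) / num_classes
    # Assumed convention: harmonic mean of the macro-averaged precision and recall.
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With three known classes and one unknown class, `num_classes` would be 4, so that errors on the unknown class weigh as much as errors on any known class.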

D. Ablation Study
In this section, we conduct an ablation study on the MSTAR dataset and analyze the contribution of each innovative part. We continue to use the training set in Section IV-B, where eight out of ten classes are chosen as known and the other two classes are regarded as unknown. Our method has two innovative parts: a new technical route for simulating indistinguishable unknown classes and a new way of detecting unknown classes. To analyze them, we configure four groups of comparison experiments: Softmax, K+1-Softmax, the SCG-based model, and the WG-based model. Softmax stands for a plain CNN whose network contains K output nodes corresponding to the K known classes participating in training; unknown targets are rejected only by thresholding the maximum output probability over the K classes. K+1-Softmax means that the classification network has K + 1 output nodes, while still only K known classes participate in training; unknown classes are rejected by thresholding the output probability of the (K + 1)th node. With the network output nodes again set to K + 1, the SCG-based model uses SCG to simulate indistinguishable unknown classes, which are added to the original training set and participate in training together with the known targets. The output probability of class K + 1 is then thresholded to complete the OSR task. Similarly, in the WG-based model, WG is used to simulate indistinguishable unknown targets, and these targets are fed into the later part of the network for training. OSR is realized by thresholding the output probability of the unknown class K + 1. We evaluate the recognition performance by closed-set accuracy, AUC, and macro-F1 scores.
The results are shown in Table IV. In addition, ROC curves are presented in Fig. 14 to display the recognition effect of the comparison experiments intuitively.
We can infer from Table IV that thresholding the output probability of class K + 1 improves the OSR ability of the plain CNN by a large margin. On this basis, simulating unknown classes to participate in training further improves recognition. Specifically, the results indicate that both SCG and WG improve recognition performance through effective simulation of the unknown class.
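The two rejection rules compared in this ablation can be sketched as follows. The threshold values and function names are illustrative assumptions; in the (K + 1)-node case, the sketch also assumes that an accepted instance is assigned to the known class with the highest probability among the first K nodes.

```python
def predict_softmax(probs, threshold):
    """Plain K-node Softmax: reject when the maximum probability falls below the threshold.

    Returns the predicted known class index, or K (= len(probs)) for "unknown".
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else len(probs)


def predict_k_plus_one(probs, threshold):
    """(K+1)-node output: reject when the last node's probability exceeds the threshold.

    Returns K (the last index) for "unknown"; otherwise the best of the first K nodes.
    """
    unknown = len(probs) - 1
    if probs[unknown] > threshold:
        return unknown
    return max(range(unknown), key=probs.__getitem__)
```

The first rule infers "unknown" indirectly from low confidence on known classes, whereas the second reads it from a dedicated output node, which is the difference that the Softmax and K+1-Softmax rows isolate.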
V. CONCLUSION
SAR target recognition in open-world environments is important for practical applications, yet few studies have addressed it. In this article, we propose a family of generative models to solve the OSR problem, namely, the SCG-based model and the WG-based model. Because targets are generated by random sampling combination, the generation process adds no extra computational complexity. Meanwhile, the distribution of indistinguishable unknown targets is approximated well by the generated targets, resulting in a compact embedding space for the known classes. In addition, the SCG-based model in particular reduces the interference of background noise with the OSR performance. The targets generated by WG make the discrete space continuous, so that decision boundaries are pushed away in all directions. The smooth decision boundaries further improve the generalization ability of the model. Moreover, the difference between simulated unknown targets and known classes is highlighted and fully exploited in unknown detection. A series of experimental results verify that the proposed GvRSC performs well in both unknown detection and known classification. In particular, the SCG-based model outperforms other state-of-the-art methods. Moreover, there is no significant downward trend in unknown detection performance as the degree of openness increases.
In the practical application of OSR methods for SAR images, it is sometimes not enough to detect unknown targets, and it is necessary to further identify their specific attributes. Therefore, in the future, we will focus on the problem of class-incremental learning, which learns useful information from new targets while retaining the original classification information.