A Coarse-to-Fine Hierarchical Feature Learning for SAR Automatic Target Recognition With Limited Data | IEEE Journals & Magazine | IEEE Xplore


Abstract:

With the rapid advancements in deep learning, Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) has seen significant improvements in performance. However, the effectiveness of even the most advanced deep-learning-based ATR methods is limited by the scarcity of training samples. This challenge has sparked growing interest in SAR ATR under data-constrained conditions in recent years. Most current approaches for SAR ATR with limited data enhance recognition through data augmentation, specialized modules, or contrastive learning-based loss functions. However, effectively utilizing limited supervision signals to identify key features remains a significant challenge that existing methods have not thoroughly addressed. In our research, we introduce a novel coarse-to-fine hierarchical feature learning strategy for SAR ATR with limited data. Starting with a feature extractor that produces multi-level features, we implement a coarse-to-fine gradual feature constraint to optimize each level using limited supervision signals. This approach simplifies parameter search and ensures effective feature utilization from coarse to fine granularity. Additionally, our method enhances the compactness within classes and the separability between classes of features at various levels. This is achieved by capitalizing on the consistency of features across multiple levels, thereby progressively enhancing the features and, in turn, boosting the model's overall performance. To validate our approach, we conducted recognition and comparative experiments on the MSTAR and OpenSARShip datasets. The results demonstrate our method's exceptional performance in limited-sample recognition scenarios. Moreover, ablation studies confirm the robustness of our approach, underscoring its potential in addressing the challenges of SAR ATR with limited data.
Page(s): 13646 - 13656
Date of Publication: 04 July 2024


SECTION I.

Introduction

Synthetic aperture radar (SAR), a pivotal microwave remote sensing system, plays an essential role in both military and civilian contexts [1], [2], [3]. Its implementation in automatic target recognition (ATR) is fraught with challenges but stands as a key element in SAR applications. Numerous esteemed researchers have spearheaded diverse deep learning strategies, leading to substantial advancements in SAR ATR applications [4], [5], [6], [7], [8].

Yet, a common limitation in these algorithms is the need for large amounts of labeled samples per target type for training, a condition that's difficult to meet in real-world applications [9], [10], [11], [12], [13]. In certain scenarios, such as earthquake and sea rescue operations, the volume of SAR images can be sparse, causing potential shortcomings in existing SAR ATR methodologies. As a result, attention has shifted towards ATR methods that operate efficiently with limited SAR data [14], [15], [16], [17], [18], [19]. The focus is on building robust classifiers using minimal labeled SAR images.

The existing strategies for handling SAR ATR with limited data are broadly classified into three main categories: data augmentation, metric-based, and model-based methods [20], [21], [22].

Data augmentation strategies play a critical role in overcoming the challenges posed by sparse data in SAR ATR [4], [6], [14], [23], [24]. These techniques enhance recognition performance by increasing both the number and variety of training SAR images. A notable example is the work of Zheng et al., who introduced a semisupervised SAR ATR method utilizing a multidiscriminator generative adversarial network (GAN). This approach achieved a recognition rate of 85.23% with only 20 samples per target [15]. Similarly, Gao et al. [25] proposed a semisupervised GAN method with multiple generators, attaining a recognition rate of over 92% using approximately 40 samples for each target type. These results show that data augmentation can substantially improve SAR ATR performance, even when dealing with limited data.

Metric-based approaches, in contrast to other methods, focus on learning class representations that can effectively generalize to new, unseen classes [18], [26], [27], [28], [29]. The primary aim of these strategies is to develop more robust classification systems. An example is the work of Wang et al. [30], who introduced a prototypical network. In this network, the classifier is designed based on the Euclidean distance between training samples and the prototype of each class. This distance-centric approach ensures that the classifier is capable of accurately identifying and categorizing novel samples. In a similar vein, a distinct hard task mining technique has been applied to enhance meta-learning. This technique demonstrated significant improvements, achieving absolute gains of 1.7% and 2.3% in 1-shot and 5-shot settings, respectively [31]. These metric-based methods are proving to be effective in creating versatile and generalizable class representations.

Model-based methods, the third category, harness prior knowledge to construct the embedding space and regulate the complexity of the model [19], [32], [33]. These approaches incorporate domain knowledge, past experience, and structural information into the model design.

Despite these advancements, the key challenge in SAR ATR with limited data is determining how to use scarce supervisory information to enable the model to identify effective recognition features and achieve accurate identification. Generally, models undergo an end-to-end training process, but the scarcity of data often prevents the provision of sufficient supervisory information. Consequently, it becomes challenging for the model to find the optimal hypothesis in the entire hypothesis space and to extract effective recognition features from SAR images. Therefore, achieving precise recognition with limited SAR data in an ATR model is quite difficult [27], [34], [35].

In this article, we introduce a novel coarse-to-fine hierarchical feature learning approach for SAR ATR, specifically designed to work with limited data. Our method effectively utilizes the scarce supervisory signals available from limited samples and progressively refines features at various levels, aligning with the granularity of these features. Additionally, by leveraging the consistency across different granularities of features as supplementary supervisory input, our approach aids the model in navigating towards the optimal hypothesis during training, thereby enhancing the feature effectiveness. This strategy facilitates accurate target identification even with a limited number of SAR samples. More specifically, our approach begins by employing a multistage feature extractor to obtain initial multilevel features from limited SAR images, each representing different levels of granularity. We then compute a multilayer deep supervisory signal, which is derived from the consistency between features of varying granularities. This computation includes two primary components: the consistency loss in probability distribution and the consistency loss in feature distribution. Throughout the training process, this multilayer deep supervisory signal is utilized to progressively refine the features at each level, applying different weights to different granularities. This methodology enables precise recognition capabilities even with a constrained set of SAR training data. The key contributions of our method can be summarized as follows.

  1. Our framework introduces an innovative approach to handle the limited SAR training data challenge, focusing on extracting highly discriminative features. This framework is structured around hierarchical feature embedding, implementing a coarse-to-fine gradual constraint mechanism and a dual consistency constraint that targets features of different granularities. This design is meticulously crafted to amplify the effectiveness of feature extraction, even with the limited availability of SAR data.

  2. The core of our method lies in the coarse-to-fine gradual feature constraint. This strategy prioritizes the extraction of broader, more effective regions in SAR images initially, before delving into the extraction of finer, more discriminative features. Furthermore, we propose a consistency constraint aimed at not only improving the compactness within classes and the separability between classes at different granularity levels but also ensuring uniformity across features of various granularities.

  3. Our approach demonstrates strong recognition performance on the MSTAR and OpenSARShip datasets, even with limited training data. Its effectiveness is verified experimentally, and the detailed results and analyses are presented in the subsequent sections of this article.

The rest of this article is organized as follows. Section II provides an in-depth presentation of the proposed method. Section III validates the effectiveness of our approach through various experiments. Finally, Section IV concludes this article.

SECTION II.

Method

In this section, the proposed method is introduced. First, we elucidate the framework of our method. Then, the coarse-to-fine gradual feature constraint is described in detail. The consistency in features of different granularities is also presented.

A. Framework

The motivation behind the proposed method is to leverage limited supervisory signals to progressively enhance the effectiveness of multilevel features in a coarse-to-fine manner. Furthermore, the consistency across multilevel features serves as additional prior information, further addressing the problem of inadequate feature discriminability in small sample scenarios. Given practical constraints, ATR methods often confront a reduction in recognition performance with limited SAR data.

However, in real-world scenarios, there often exists knowledge that can serve as prior information, which can help augment the originally limited supervisory signals.

This can also enhance the utilization of limited supervisory signals, thereby acquiring more discriminative recognition features even with a small sample size. Hence, by exploring the utilization of supervisory signals and effective prior information, the recognition performance of the ATR method with limited SAR data can be improved and achieve state-of-the-art performance.

As shown in Fig. 1, our method can be divided into three stages. First, a multilevel feature extractor is constructed. SAR images are inputted and corresponding multilevel features are obtained.

Fig. 1. Whole framework of the proposed method. The proposed method uses the sparse supervisory signals from limited samples to iteratively optimize multilevel features based on their granularity. It employs consistency across different feature granularities as extra guidance, assisting the model in finding the optimal hypothesis during training and improving feature effectiveness. This leads to precise identification, even with limited SAR samples.

Second, the coarse-to-fine gradual feature constraint is proposed to progressively enhance the effectiveness of the multilevel features from coarse to fine in recognition, ensuring that features at both coarse and fine levels have sufficient recognition effectiveness.

Third, the consistency constraint on features of different levels aligns the multilevel features with one another, working in conjunction with the progressive enhancement provided by the coarse-to-fine gradual feature constraint. It not only filters features at multiple levels, but also enhances the interclass separability and intraclass compactness of the features. Moreover, it limits the search range of the model's hypothesis space, making it easier for the model to find the optimal hypothesis with small sample sizes. The pipeline of our method can be formulated as follows.

Given the training SAR image dataset X=\lbrace x_{11},\ldots, x_{ij}, \ldots,x_{CN}\rbrace, C is the total number of classes, and N is the number of SAR images for each class. x_{ij} is the jth SAR image of the ith class, and y_{ij} is the corresponding class label.

First, the feature extractor is constructed to obtain the multilevel features F_{ij}=\left\lbrace f_{ij}^{1},\ldots,f_{ij}^{K}\right\rbrace from the input SAR image x_{ij}. Here, f_{ij}^{k} represents the features at the kth level of x_{ij}. Then, the coarse-to-fine gradual feature constraint measures the recognition effectiveness of the features at each of the K levels, yielding \left\lbrace L_{g}^{1},\ldots, L_{g}^{K} \right\rbrace. By imposing a progressive optimization constraint on \left\lbrace L_{g}^{1},\ldots, L_{g}^{K} \right\rbrace and calculating the final loss L_{gc} over the K levels, it optimizes the features of each single layer using the scarce supervisory signals provided by limited samples, and progressively enhances the recognition effectiveness of the features as the feature level increases.

To maximize the efficacy of limited supervisory signals, the “consistency in features of different levels” component calculates consistency measures for the features at K levels, focusing on both the feature distribution and the recognition probability distribution. This calculation results in an optimization loss, L_{c}. This approach requires strong consistency between features at adjacent levels, ensuring that recognition-effective features are progressively enhanced through each layer. It works in tandem with the progressive recognition effectiveness loss described above, facilitating better coordination of multilevel features for accurate recognition in scenarios with limited sample sizes.

The total loss of our method is formulated as \begin{equation*} L_{\text{total}} = L_{ce} + L_{gc} + L_{c}. \tag{1} \end{equation*} In this equation, L_{ce} represents the basic recognition loss, L_{gc} is the loss of the coarse-to-fine gradual feature constraint, and L_{c} is the consistency loss.

Our proposed method systematically augments the recognition effectiveness of multilevel features using the constrained supervisory signals. This is achieved through the implementation of multilevel consistency constraints and progressive effectiveness constraints, which are critical for ensuring accurate recognition in contexts characterized by small sample sizes. In the subsequent sections, we delve into the intricacies of the coarse-to-fine gradual feature constraint and the consistency in features of different levels, providing a comprehensive overview of these integral components of our methodology.

B. Coarse-to-Fine Gradual Feature Constraint

The coarse-to-fine gradual feature constraint aims to optimize the features of the K levels individually, using only limited supervisory signals, so that the recognition effectiveness of the features progressively increases with the level.

Most previous methods use the same limited supervisory signals to optimize the entire model simultaneously, which makes it difficult to obtain effective recognition features, for two reasons. 1) The entire model has a large number of parameters, corresponding to a large hypothesis space, and limited supervisory signals are insufficient to guide the search for the optimal hypothesis. 2) Feature extraction by the entire model gradually transitions from coarse to fine levels. When limited supervisory signals are used to optimize the entire model simultaneously, the optimization of single-layer features is not controlled. This might lead to a situation where the features at a certain level have high recognition effectiveness, but those at the next level have low recognition effectiveness. Such locally optimal parameters limit the overall performance of the recognition model. Therefore, we propose the coarse-to-fine gradual feature constraint to ensure the recognition effectiveness of the features at each of the K levels, together with a progressive effectiveness constraint that helps the model gradually learn effective recognition features. This not only improves the way the model searches for the optimal hypothesis in its hypothesis space, but also enhances the recognition performance of the model.

The coarse-to-fine gradual feature constraint method is structured into two primary steps: 1) measurement of single-layer recognition effectiveness; and 2) progressive constraint of recognition effectiveness across K levels. The overarching structure of our proposed approach is depicted in Fig. 1. The detailed methodology is as follows.

We start with SAR image datasets X= \lbrace x_{11},\ldots,x_{ij}, \ldots,x_{CN} \rbrace and extract the features F_{ij}=\left\lbrace f_{ij}^{1},\ldots,f_{ij}^{K}\right\rbrace of each image x_{ij}.

Step 1: Measurement of single-layer recognition effectiveness. This step evaluates the effectiveness of single-layer features in two respects: the separation between interclass features and the accuracy of recognition. Initially, cosine similarity is employed to assess the similarity between the level-k features of two samples \begin{equation*} d(x_{ij},x_{mn})= \frac{f_{ij}^{k}\cdot f_{mn}^{k}}{|| f_{ij}^{k} ||_{2}\,|| f_{mn}^{k} ||_{2}}. \tag{2} \end{equation*}

Here, ||\cdot ||_{2} denotes the L2 norm. Subsequently, for a given sample x_{ij}, we identify the most similar interclass sample x_{ij}^{n} and the least similar intraclass sample x_{ij}^{p}. The separation degree of interclass features for x_{ij} is then calculated \begin{equation*} Ls_{ij}^{k}= \max \left(d \left(x_{ij},x_{ij}^{n}\right)+\theta -d\left(x_{ij},x_{ij}^{p}\right),0\right). \tag{3} \end{equation*}

In this equation, \theta is a margin, a hyperparameter that defines the desired level of feature separation. By measuring the cosine similarity between pairs of features, we can effectively determine the most similar interclass and the least similar intraclass features.
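The per-sample separation term can be illustrated with a minimal sketch in plain Python. It uses the standard cosine similarity (dot product divided by the product of the L2 norms), then picks the most similar sample from another class and the least similar sample from the same class before applying the margin. The function names, the flat list layout of `features` and `labels`, and the default margin of 0.2 are assumptions made for illustration and are not taken from the article.

```python
import math

def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def separation_term(anchor_idx, features, labels, theta=0.2):
    """Hinge term in the spirit of Eq. (3) for one sample at one level.

    features: feature vectors at a single level k (hypothetical layout)
    labels:   integer class labels aligned with `features`
    theta:    margin; the default 0.2 is an assumption, not from the article
    Requires at least one other sample of the same class and one of another class.
    """
    anchor = features[anchor_idx]
    same, other = [], []
    for idx, (feat, lab) in enumerate(zip(features, labels)):
        if idx == anchor_idx:
            continue
        sim = cosine_similarity(anchor, feat)
        (same if lab == labels[anchor_idx] else other).append(sim)
    hardest_negative = max(other)  # most similar sample from another class
    hardest_positive = min(same)   # least similar sample from the same class
    return max(hardest_negative + theta - hardest_positive, 0.0)

# Toy level-k features for four samples from two classes.
feats = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
labs = [0, 0, 1, 1]
print(separation_term(0, feats, labs, theta=1.0))
```

The term is zero once the least similar same-class sample is more similar to the anchor than the most similar other-class sample by at least the margin, so only samples that violate the margin contribute.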

Therefore, the measurement of the degree of separation of interclass features at the single layer can be calculated as follows: \begin{equation*} Ls^{k}= - \sum _{j} \sum _{i} Ls_{ij}^{k}. \tag{4} \end{equation*}

In assessing the accuracy of single-layer feature recognition, we employ the cross-entropy loss as a metric \begin{equation*} L_{ce}^{k} = -\sum _{i} \sum _{j} y_{ij} \log \left(p\left(y_{ij} | f_{ij}^{k}\right)\right). \tag{5} \end{equation*}

Here, p(y_{ij} | f_{ij}^{k}) represents the probability of the feature f_{ij}^{k} being correctly classified into its corresponding class label y_{ij}. To maintain consistency in classification across all granularity levels, we utilize a shared-weight multilayer perceptron (MLP) as the classifier. The overall effectiveness of the recognition using single-layer features is quantified by \begin{equation*} Le^{k} = Ls^{k} + L_{ce}^{k}. \tag{6} \end{equation*}
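As a rough illustration of the per-level cross-entropy in Eq. (5), the sketch below applies one classifier with shared weights to the features of every level and sums the negative log-probability of the true class. For brevity it replaces the shared-weight MLP with a single linear layer followed by softmax, and it assumes that the features of all levels have already been brought to the same dimension. Both simplifications, as well as all function and variable names, are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, feature):
    """One linear layer: a row of `weights` per class, plus a bias per class."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, bias)]

def level_cross_entropy(features, labels, weights, bias):
    """Sum of -log p(y | f) over samples for one level, in the form of Eq. (5)."""
    loss = 0.0
    for feat, lab in zip(features, labels):
        probs = softmax(linear(weights, bias, feat))
        loss -= math.log(probs[lab])
    return loss

# The same (weights, bias) pair is reused for every level: that is the weight sharing.
shared_w = [[1.0, 0.0], [0.0, 1.0]]
shared_b = [0.0, 0.0]
level_features = {
    1: [[0.6, 0.4], [0.3, 0.7]],   # coarse level (toy values)
    2: [[2.0, 0.0], [0.0, 2.0]],   # finer level (toy values)
}
labels = [0, 1]
for k, feats in level_features.items():
    print(k, level_cross_entropy(feats, labels, shared_w, shared_b))
```

Because the classifier is shared, a lower loss at a deeper level reflects more discriminative features rather than a better-fitted classifier head.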

Step 2: Progressive recognition effectiveness constraint for K levels. After calculating the recognition effectiveness of single-layer features, we progressively constrain the features at adjacent, gradually deepening levels. For features at levels k-1 and k, their progressive constraint is calculated as follows: \begin{equation*} L_{g}^{k}= \left(1-\frac{Le^{k-1}- Le^{k}}{Le^{k}}\right)^{p} \times Le^{k} \tag{7} \end{equation*} where p is the parameter that adjusts the degree of progression. Through the proposed coarse-to-fine gradual feature constraint, we use limited supervisory signals to not only optimize the features at each level, but also progressively constrain them layer by layer, completing the progressive enhancement of feature effectiveness. Next, the consistency of features at different levels is described in detail.
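A small numeric sketch may help in reading Eq. (7). The function below evaluates the progressive constraint for one pair of adjacent levels. The function name and the default exponent p = 2 are assumptions chosen for illustration; the article does not give a value for p in this section.

```python
def progressive_constraint(le_prev, le_curr, p=2.0):
    """Evaluate Eq. (7) for adjacent levels k-1 and k.

    le_prev: Le^{k-1}, effectiveness loss of the coarser level
    le_curr: Le^{k},   effectiveness loss of the finer level (must be nonzero)
    p:       degree of progression; 2.0 is an assumed value
    Note: a non-integer p combined with a negative base would give a complex
    number in Python, so this sketch is meant for integer-valued p.
    """
    relative_gain = (le_prev - le_curr) / le_curr
    return (1.0 - relative_gain) ** p * le_curr

# Finer level already better than the coarser one: the loss is scaled down.
print(progressive_constraint(le_prev=1.2, le_curr=1.0))
# Finer level worse than the coarser one: the loss is scaled up.
print(progressive_constraint(le_prev=0.8, le_curr=1.0))
# Equal losses: the factor is 1 and the loss passes through unchanged.
print(progressive_constraint(le_prev=1.5, le_curr=1.5))
```

In these toy cases the scaling factor is 0.64, 1.44, and 1 respectively, so a finer level that falls behind its coarser neighbour is penalized more strongly.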

C. Consistency in Features of Different Granularities

While the coarse-to-fine gradual feature constraint method successfully enhances feature effectiveness incrementally, its performance is limited due to the inadequacy of supervisory signals. This limitation leads to a discrepancy in the features extracted across different layers, resulting in the neglect or loss of effectively extracted features, ultimately impacting the overall performance of the recognition model. To address this, we introduce the “consistency in features of different levels” approach. This method aims to fortify the consistency among features at various levels, diminish the disparity between these features, and thereby augment the collaborative effectiveness of multilevel features, which is crucial for accurate recognition in small sample scenarios.

The “consistency in features of different levels” involves two main steps: 1) measuring the consistency of features across different levels; and 2) assessing the consistency of features at K levels. The process, depicted in Fig. 1, unfolds as follows.

Starting with SAR image datasets X=\lbrace x_{11},\ldots, x_{ij},\ldots, x_{CN}\rbrace and the associated features F_{ij}=\lbrace f_{ij}^{1},\ldots,f_{ij}^{K}\rbrace for each image x_{ij}:

Step 1: We measure the consistency between features at levels t and k for x_{ij} by unifying their scales using lightweight convolution and then employing a mixed method to gauge their similarity, calculated as \begin{equation*} d_{c}\left(f_{ij}^{k}, f_{ij}^{t}\right) = d\left(f_{ij}^{k}, f_{ij}^{t}\right) + \sqrt{\left(f_{ij}^{k}-f_{ij}^{t}\right)^{2}} \tag{8} \end{equation*} where d(f_{ij}^{k}, f_{ij}^{t}) represents the cosine similarity.

Step 2: For the consistency measurement of features at K levels for x_{ij}, we compute \begin{equation*} L_{c}(x_{ij}) = \sum _{k} \sum _{t} \left(\frac{k+|k-t|}{k}\right)^{p} d_{c}\left(f_{ij}^{k}, f_{ij}^{t}\right) \tag{9} \end{equation*} where |\cdot | computes the absolute value.

The final consistency measurement across K levels is thus \begin{equation*} L_{c} = \sum _{i} \sum _{j} L_{c}(x_{ij}). \tag{10} \end{equation*}
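The consistency computation for a single image can be sketched as follows. The code follows Eq. (8) as printed, adding the cosine similarity to a distance term, and applies the level-dependent weight of Eq. (9). Several points are assumptions: the square-root term is read as the Euclidean distance between the two feature vectors, the features are taken to be already scale-unified (the lightweight convolution is omitted), the double sum runs over all ordered pairs with k ≠ t, and all names and the default p = 1 are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mixed_consistency(fk, ft):
    """d_c of Eq. (8): cosine similarity plus (assumed) Euclidean distance."""
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(fk, ft)))
    return cosine_similarity(fk, ft) + euclid

def level_weight(k, t, p):
    """Weight ((k + |k - t|) / k) ** p from Eq. (9); levels are 1-indexed."""
    return ((k + abs(k - t)) / k) ** p

def sample_consistency(levels, p=1.0):
    """L_c(x_ij) of Eq. (9) for one image.

    levels: list of K scale-unified feature vectors, levels[0] being level 1
    p:      exponent of the level weight; 1.0 is an assumed value
    """
    total = 0.0
    num_levels = len(levels)
    for k in range(1, num_levels + 1):
        for t in range(1, num_levels + 1):
            if t == k:
                continue  # assumption: a level is not compared with itself
            total += level_weight(k, t, p) * mixed_consistency(levels[k - 1], levels[t - 1])
    return total

print(level_weight(2, 4, 1.0))   # pair two levels apart, seen from level 2
print(level_weight(2, 3, 1.0))   # adjacent pair, seen from level 2
print(sample_consistency([[1.0, 0.0], [1.0, 0.0]]))
```

The weight grows with the gap |k - t|, so pairs of levels that are further apart contribute more to the sum for a given value of d_c.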

In addition, we use the cross-entropy loss as the basic recognition loss \begin{equation*} L_{\text{ce}} = -\sum _{i} \sum _{j} y_{ij} \log \left(p\left(y_{ij}|x_{ij}\right)\right). \tag{11} \end{equation*} The Kth-level feature of x_{ij}, after processing by a dense layer, is used as the final recognition feature for calculating L_{\text{ce}}.

By employing these methods, our approach leverages limited supervisory signals to optimize and progressively constrain features at K levels while enhancing consistency among level features, effectively utilizing the limited supervisory signals. Diverging from traditional overall optimization methods, our approach to single-layer optimization increases the likelihood of obtaining optimal features at each layer. The progressive level constraints and consistency constraints collectively boost feature effectiveness, enabling accurate recognition even with limited sample sizes. The complete process is summarized in Algorithm 1.

Algorithm 1: Enhanced Recognition With Multilevel Feature Consistency.

SECTION III.

Experimental Results

In this section, we first introduce the datasets selected for our experimental evaluation. Subsequently, to verify the efficacy of our proposed method, we assess its recognition capability under a variety of training sample sizes. These tests further illustrate the robustness and versatility of our method under varying conditions.

A. Dataset and Network Setup

The moving and stationary target acquisition and recognition (MSTAR) dataset serves as a benchmark for evaluating SAR ATR performance. It comprises SAR images of ten distinct ground target classes, with a resolution of 0.3 m × 0.3 m. Fig. 2 displays optical images along with corresponding SAR images for all ten target classes within the MSTAR dataset. The training and testing data both feature the same ten target classes, but differ in terms of depression angles. While the training data were captured at a depression angle of 17^{\circ }, the testing data were acquired at an angle of 15^{\circ }. The distribution of training and testing images is detailed in Table I.

TABLE I Original Number of Images for Different Depression Angles for SOC

Fig. 2. Optical images and corresponding SAR images of ten classes of objects in the MSTAR database (from left to right: BMP2, BTR70, T72, 2S1, BRDM2, ZSU234, BTR60, D7, T62, and ZIL131).

OpenSARShip is a key benchmark dataset in the field of SAR ship recognition. It is specifically designed to facilitate the development of ship detection and classification algorithms that can perform under high interference. The data were collected from 41 Sentinel-1 images under diverse environmental conditions. The dataset encompasses a total of 11 346 ship chips representing 17 types of SAR ships, which are combined with automatic identification system (AIS) information. The labels are reliable because they are based on AIS information [36]. In our experiments, we utilized the ground range detected (GRD) data, which has a resolution of 2.0\,\text{m} \times 1.5\,\text{m} and a pixel size of 10\,\text{m} \times 10\,\text{m} in the azimuth and range directions under the Sentinel-1 IW mode. The lengths of the ships in this dataset range from 92 to 399 m, with widths spanning from 6 to 65 m. The SAR images of six different ship classes within this dataset are illustrated in Fig. 3.

Fig. 3. SAR ship images and corresponding optical ship images of six classes in the OpenSARShip dataset.

The experimental setup and the network configuration are as follows. The SAR input images are resized to 224 \times 224 using bilinear interpolation on the raw data. We set K to 4, which gives four feature granularities. The batch size is set to 32. The learning rate is initially set to 0.01 and is subsequently decreased by a factor of 0.1 every 25 epochs. We also use a warm-up period of 10 epochs prior to actual training. Our proposed method was implemented and tested on a GPU cluster with an Intel Xeon CPU E5-2698 v4 @ 2.20 GHz and eight Tesla V100 units, each equipped with 32 GB memory. The implementation used the open-source PyTorch framework on a single Tesla V100 unit.
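The learning-rate settings above can be written out as a small schedule function. This is a hand-written sketch rather than the authors' training code. The shape of the warm-up (a linear ramp up to the base rate) and the choice to start counting the 25-epoch decay steps after the warm-up ends are assumptions, since the text specifies only the warm-up length, the base rate, and the decay factor and interval.

```python
def learning_rate(epoch, base_lr=0.01, warmup_epochs=10, step_size=25, gamma=0.1):
    """Learning rate for a 0-indexed epoch.

    base_lr, warmup_epochs, step_size and gamma follow the values in the text.
    Assumptions: linear warm-up, and step decay counted from the end of warm-up.
    """
    if epoch < warmup_epochs:
        # Linear ramp: reaches base_lr on the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    steps = (epoch - warmup_epochs) // step_size
    return base_lr * gamma ** steps

for epoch in (0, 9, 10, 34, 35, 60):
    print(epoch, learning_rate(epoch))
```

Under these assumptions the rate ramps up over epochs 0 to 9, stays at 0.01 for epochs 10 to 34, and is multiplied by 0.1 at epoch 35 and again at epoch 60.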

B. Recognition Performance Under MSTAR Dataset

The recognition results of the proposed method are shown in Table II. To validate the effectiveness of our method, the number of training samples per class ranges from 5 to all available samples. With 5 samples per class, our method achieves an overall recognition rate of 74.52%. With 20 and 10 samples per class, the recognition rates are 95.67% and 83.46%, respectively, which indicates that recognition performance improves markedly as the number of labeled samples increases. With 40 samples per class, the overall recognition rate reaches 98.31%, and most target classes are above 98.00%. This shows that the innovations proposed in our method benefit recognition with few labeled samples, although one target class is more sensitive to the reduction in labeled samples. In addition, when the number of labeled samples per class is larger than 40, the recognition rate stays above 98.00%. These results demonstrate the effectiveness of the proposed method in SAR ATR with both few and relatively sufficient labeled SAR samples. When all the training samples are employed, the proposed method also achieves state-of-the-art performance in SAR ATR.

TABLE II Recognition Accuracy (%) With Different Numbers of Labeled Images per Class Under MSTAR

The experiments with different sample numbers on MSTAR illustrate that our method achieves superior recognition performance over a wide range of training sample numbers per class. Next, the recognition performance on the OpenSARShip dataset is presented.

C. Recognition Performance Under OpenSARShip Dataset

In this section, we conduct two sets of recognition experiments using the OpenSARShip dataset to thoroughly assess the efficacy of our method. The OpenSARShip dataset includes several ship classes that are prevalent in, and constitute a significant portion of, the international shipping market [37]. The experiments are carried out for both three-class and six-class settings, in line with [37], [38], and [39]. Specifically, the three-class experiments involve bulk carriers, container ships, and tankers, whereas the six-class experiments additionally include cargo ships, fishing vessels, and general cargo ships, as detailed in Table III. The image preprocessing for these experiments follows that used for the MSTAR dataset.

TABLE III Image Number and Imaging Conditions of Different Targets in OpenSARShip

Table IV showcases our method's results in the three-class OpenSARShip experiment. Impressively, with 200 training samples per class, our method attains an overall recognition rate of 85.61%. Remarkably, the recognition rates only slightly decrease from 82.32% to 80.00% when training samples are reduced from 100 to 60 per class, underscoring our method's robustness and effectiveness. A further reduction in training samples from 30 to 20 leads to a stable recognition rate around 72.99%, a commendable achievement in SAR ship target recognition. Even under stringent conditions with just ten samples per class, the method maintains a robust recognition rate of 69.76%.

TABLE IV Recognition Accuracy (%) of Three Classes Under Different Training Data in OpenSARShip

In the six-class OpenSARShip experiment, shown in Table V, the task is harder because of the high similarity among different ship classes and the significant variation within the same class. Here, with 200 training samples per class, the recognition rate stands at 68.47%. The rates remain relatively stable, dropping marginally from 64.40% to 62.02% as the training data is reduced from 100 to 60 samples per class. This robustness is consistent with the three-class performance. With a further reduction in training samples from 40 to 20 per class, the recognition rate declines only slightly, from 58.84% to 55.70%. Even with as few as ten samples per class, the method achieves a recognition rate of 52.39%, reflecting the typical challenges faced in practical applications.

TABLE V Recognition Accuracy (%) of Six Classes Under Different Training Data in OpenSARShip

Our analysis reveals that for each dataset, the performance initially improves rapidly with an increase in the number of samples. During this phase, the diversity of data supplements the model's learning of key SAR image features, leading to swift performance enhancement. However, as the sample size reaches a certain threshold, the addition of key features becomes less pronounced. The model learns more about the same category in different environments, providing auxiliary information that aids in classification, hence the slower rate of improvement.
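The slowing rate of improvement can be checked directly from the three-class OpenSARShip accuracies quoted earlier in this section. The short Python sketch below is only an illustration: the function name `marginal_gains` is hypothetical, and the 20- and 30-sample settings are left out because the text gives only an approximate figure for them.

```python
# Three-class OpenSARShip accuracies quoted in the text:
# (training samples per class, recognition rate in %).
reported = [(10, 69.76), (60, 80.00), (100, 82.32), (200, 85.61)]


def marginal_gains(points):
    """Accuracy gain (percentage points) per added sample between consecutive settings."""
    gains = []
    for (n0, a0), (n1, a1) in zip(points, points[1:]):
        gains.append(((n0, n1), (a1 - a0) / (n1 - n0)))
    return gains


gains = marginal_gains(reported)
for (n0, n1), gain in gains:
    print(f"{n0:>3} -> {n1:>3} samples/class: {gain:.4f} points per added sample")
```

The gain per added sample falls from roughly 0.20 points (10 to 60 samples) to about 0.06 (60 to 100) and 0.03 (100 to 200), which matches the fast-then-slow pattern described above.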

The quantitative results from the OpenSARShip experiments demonstrate that our method not only achieves high accuracy in SAR ship target recognition but also maintains robustness and effectiveness even with diminishing numbers of training samples.

D. Ablation Experiments

In this section, we assess the effectiveness of our method through ablation experiments conducted on the OpenSARShip and MSTAR datasets.

We first perform these experiments with four configurations of our method, using 40 samples per class for OpenSARShip and ten samples per class for MSTAR. As detailed in Table VII, the configurations are: V1, the basic vanilla model without our proposed innovations; V2, which incorporates the coarse-to-fine gradual feature constraint into the vanilla model; V3, which applies consistency in features of different granularities to the vanilla model; and Ours, the full version of our method. The recognition performances of these configurations are presented in Tables VII and VIII.

TABLE VI Ablation Experiments: Average Recognition Accuracy (%) of Six Classes Under Different Training Data in OpenSARShip With Different Levels of Granularity
TABLE VII Ablation Experiments: Recognition Performance (%) of Different Ablation Configurations Under 40 Training Samples in OpenSARShip
TABLE VIII Ablation Experiments: Recognition Performance (%) of Different Ablation Configurations Under Ten Training Samples in MSTAR

The comparative analysis of these configurations highlights the enhancement in recognition performance due to our innovations. Specifically, the comparison between V1 and V2 demonstrates the effectiveness of the coarse-to-fine gradual feature constraint. Moreover, the performance difference between V2 and V3 indicates that the coarse-to-fine feature constraint is more impactful than consistency in features across different granularities. However, combining the two components in the “Ours” configuration shows that the consistency of features across different granularities works synergistically with the coarse-to-fine feature constraint to significantly enhance recognition performance.
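The four ablation configurations amount to two on/off switches, one per proposed component. The Python sketch below encodes that reading; the class, field, and function names are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """One ablation configuration; field names are illustrative assumptions."""
    name: str
    coarse_to_fine_constraint: bool  # coarse-to-fine gradual feature constraint
    granularity_consistency: bool    # consistency of features across granularities


# V1 is the vanilla model, V2 and V3 each add one component, Ours uses both.
CONFIGS = [
    AblationConfig("V1", False, False),
    AblationConfig("V2", True, False),
    AblationConfig("V3", False, True),
    AblationConfig("Ours", True, True),
]


def enabled_components(cfg):
    """List the proposed components that are switched on in a configuration."""
    components = []
    if cfg.coarse_to_fine_constraint:
        components.append("coarse-to-fine constraint")
    if cfg.granularity_consistency:
        components.append("granularity consistency")
    return components


for cfg in CONFIGS:
    print(cfg.name, enabled_components(cfg) or ["vanilla"])
```

Laid out this way, V1 versus V2 isolates the constraint, V1 versus V3 isolates the consistency term, and "Ours" is the only configuration with both enabled.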

Further, detailed ablation experiments in Table VI for the OpenSARShip model reveal the impact of granularity consistency across various shot settings. The model shows the best average performance at level 4 granularity across different shot settings. Generally, model performance increases with granularity up to level 4, but beyond this point, granularity levels 5 and 6 lead to a decrease in performance due to model overfitting. This pattern validates the necessity of multilevel granularity consistency and, by extension, the efficacy of our approach.

Building on these findings, we next compare our method with other state-of-the-art approaches to SAR ATR under limited data conditions.

E. Comparison

In this section, we compare our method with other state-of-the-art methods on the MSTAR and OpenSARShip datasets.

We compare our approach with other SAR ATR methods, comprising five traditional methods (PCA+SVM, AdaBoost, LC-KSVD, DGM, and Gauss) and several few-shot learning methods. The latter include three data augmentation methods (GAN-CNN, MGAN-CNN, and ARGN), a metric-based method (TMDC-CNNs), and four model-based methods (DNN1, DNN2, CNN, and semisupervised).

Table IX shows that, whether 20, 40, 80, or all available samples of each class are used as training images, our proposed method outperforms the other methods in recognition rate. Even when the training set for each target comprises only 20 samples, our method achieves a recognition rate exceeding 94%, whereas most other methods fall below 86%. These findings attest not only to the effectiveness of our method when leveraging all training images in the dataset but also to its proficiency in few-shot learning scenarios.

TABLE IX Recognition Accuracy (%) Under Different Numbers of Labeled Images

Table X compares our method with several other approaches used for SAR ship recognition. The compared methods are as follows. 1) Semisupervised learning [37], which leverages unlabeled data during training to reduce the reliance on labeled data. 2) Supervised [37], the fully supervised variant of the semisupervised method. 3) CNN [40], a conventional classification framework. 4) CNN+matrix [40], which combines a convolutional neural network with a matrix for improved performance. 5) PFGFE-Net [41], which achieves polarization fusion at the input, feature, and decision levels to address the insufficient utilization of polarization information. 6) MetaBoost [42], which employs a two-stage filtration system focusing on the generation and combination of “good and different” base classifiers. For this comparison, following [42], we divide the number of training images for each class into three bands: 1–50, 51–100, and 101–338.
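The banding of training-set sizes described above can be written as a small lookup. This Python sketch is only an illustration of that protocol; the helper name `training_band` and the choice to raise an error outside the three bands are assumptions.

```python
# Bands of training images per class used for the OpenSARShip comparison (following [42]).
BANDS = [(1, 50), (51, 100), (101, 338)]


def training_band(n_per_class):
    """Return the band label that a per-class training-set size falls into."""
    for low, high in BANDS:
        if low <= n_per_class <= high:
            return f"{low}-{high}"
    raise ValueError(f"{n_per_class} images per class is outside the compared bands")


for n in (20, 80, 200):
    print(n, "->", training_band(n))
```

Under this banding, a 20-shot result is compared within the 1–50 band and an 80-shot result within the 51–100 band.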

TABLE X Comparison of Performance (%) of Three Classes Under OpenSARShip

From Table X, it is evident that our method further enhances the recognition performance across different sample quantity intervals. In the 1–50 quantity range, among the compared methods, semisupervised learning achieves an excellent performance of 61.88% when each class has 20 shots, whereas our method reaches 71.46%. In the 51–100 quantity range, CNN [40] and semisupervised learning [37] both deliver impressive performance, exceeding 68.50% when each class has 80 shots. Our method, on the other hand, reaches a recognition rate of 79.33%. Through comparison with other methods on the OpenSARShip dataset, it is clear that our method can achieve state-of-the-art performance.

From the above comparison, it is evident that our method can attain state-of-the-art recognition performance across a broad spectrum of training sample sizes. More significantly, our method clearly surpasses all other methods in both types of recognition tasks, regardless of the number of classes involved.

SECTION IV.

Conclusion

In conclusion, this article addresses the significant challenges in the increasingly explored field of SAR ATR with limited data, which arise primarily from the scarcity of training samples. We have analyzed the limitations inherent in existing SAR ATR methods, such as those relying on data augmentation, specialized modules, or contrastive learning-based loss functions. Our analysis underscores the importance of effectively utilizing limited supervision signals for precise target identification. We introduce a coarse-to-fine hierarchical feature learning approach tailored for SAR ATR with limited data. Our method is distinguished by a feature extractor that yields multilevel features and by a coarse-to-fine gradual feature constraint. This approach enables individualized optimization of features at each level, significantly reducing the complexity of model parameter search and ensuring the robustness of features across all granularity levels. A key aspect of our methodology is the optimization of intraclass compactness and interclass separability of features at different levels, achieved by leveraging the consistency of features across these levels. This strategy of progressively enhancing feature effectiveness substantially elevates the overall performance of the model. Experiments conducted on the MSTAR and OpenSARShip datasets have demonstrated the superior performance of our approach in scenarios with limited sample sizes. The robustness of our method has been further validated through ablation studies. This research contributes meaningful insights to the ongoing advancements in SAR ATR with limited data, paving the way for more sophisticated, efficient, and effective solutions in the future.
