Scattering-Point Topology for Few-Shot Ship Classification in SAR Images

Synthetic aperture radar (SAR) has emerged as a critical technology for detecting and classifying objects, such as ships, in challenging environments. However, few-shot learning in this setting remains challenging because labeled SAR data are scarce, radar backscatter is complex, and image appearance varies with imaging parameters. In this article, we propose a novel network, scattering-point topology for few-shot ship classification (SPT-FSC), which addresses these challenges by incorporating scattering characteristics into the network learning process through a scattering-point topology (SPT) built on scattering key points. We design a topology encoding branch that encodes the topological information of the scattering points into an SPT embedding, which improves the network's adaptability to the SAR imaging mechanism and reduces the impact of imaging variability. To effectively fuse the SPT embedding with image features extracted by a convolutional neural network, we introduce a novel mechanism named reciprocal feature fusion attention. In addition, to address the limited diversity of the training data, we adopt a fine-tuning-based approach and construct a fine-grained ship classification dataset by combining the OpenSARShip and FUSAR-Ship datasets. Comprehensive experiments on these datasets demonstrate that SPT-FSC achieves high accuracy and robustness in few-shot ship classification of SAR images, outperforming existing methods in this domain.

penetrate through clouds and vegetation for detection and classification of objects, such as ships, vehicles, and buildings [1]. SAR has become an important tool for applications such as maritime surveillance, border control, and disaster response, where its ability to detect and classify targets in challenging environments is crucial for effective decision making [2].
Ship classification in SAR images is a challenging task due to the complexity of radar backscatter from ships and the presence of environmental clutter. Various techniques have been proposed for ship classification, including statistical analysis, texture analysis, and machine learning algorithms. At the same time, different features, such as shape, size, texture, and polarimetric characteristics, have been used to discriminate between different types of ships, including cargo ships, oil tankers, and other vessels [2]. Statistical methods, such as principal component analysis and independent component analysis, have been applied to extract discriminative features from SAR images for ship classification [3]. Texture analysis methods, such as the gray-level co-occurrence matrix and local binary patterns, have also been used to capture the spatial variation of backscattering signals from ships [4]. However, these conventional methods are limited in their ability to handle the complex nature of SAR backscattering signals and the variability of environmental conditions, leading to reduced accuracy and robustness for ship classification in SAR images [5]. Machine learning algorithms, such as support vector machines (SVM), k-nearest neighbors (KNN), and random forests, have been applied to classify ships in SAR images automatically with high accuracy [6], [7], [8]. Some studies have also explored hybrid approaches that combine multiple machine learning algorithms for ship classification, such as feature selection followed by SVM classification [9].
In recent years, deep-learning methods such as convolutional neural networks (CNNs) have shown promising results for ship classification in SAR images, achieving high accuracy and robustness to variations in ship size, shape, and orientation [10]. However, the limited availability of labeled SAR data remains a bottleneck for further progress in this field. Deep-learning models for SAR image classification traditionally require large amounts of labeled data to achieve high performance, yet acquiring such labels for SAR images is time-consuming and expensive, which limits the applicability of these models in practical scenarios. Because SAR datasets are often small and imbalanced, few-shot learning has emerged as a promising way to enable deep-learning models to classify SAR images accurately from a limited amount of labeled data [11], [12].
Few-shot learning is a type of machine learning that aims to train models that can learn from a small number of examples [13]. By leveraging prior knowledge or metalearning, few-shot learning algorithms can generalize effectively to new tasks with limited labeled data [14]. In the context of SAR image classification, few-shot learning approaches have been proposed to extract and leverage the underlying features of SAR images so that images can be classified from a limited number of labeled examples [15]. Few-shot learning methods developed in recent years include data augmentation, metalearning, and transfer learning approaches. Data augmentation generates additional training data by applying various transformations to the existing data [16]. In few-shot learning, it can be used to generate more labeled examples from the limited training data, and this has been shown to improve the performance of few-shot learning methods in various applications [17]. Metalearning is another popular few-shot learning approach that aims to learn how to learn from a few examples [18]. It involves training a model on multiple tasks and using the learned knowledge to quickly adapt to new tasks with only a few labeled examples, and it has been successfully applied to computer vision tasks such as object recognition and segmentation [19]. Transfer learning transfers knowledge from a pretrained model to a new task [20]. In few-shot learning, it can leverage knowledge learned from a large dataset to improve performance on a few-shot task with limited labeled data, and it has been widely used in computer vision applications, including few-shot learning [21].
Few-shot learning has gained significant attention in recent years as a promising approach to challenges in various computer vision tasks. In image classification, few-shot learning techniques have shown a remarkable ability to recognize objects or categories from limited labeled data, which is particularly valuable in domains where acquiring extensive labeled data is costly or impractical. Few-shot learning is not limited to image classification but also extends to detection and segmentation. In object detection, few-shot learning enables models to identify objects or regions of interest in images from only a few annotated instances, which is crucial for applications such as object tracking and surveillance; in particular, fine-tuning-based few-shot object detection methods such as TFA [22], FSCE [23], and DeFRCN [24] have received widespread attention in recent years. In image segmentation, where the goal is to segment objects or regions within an image, few-shot learning techniques have likewise demonstrated the ability to adapt and generalize from limited labeled data [25].
Despite the success of existing few-shot learning methods, several problems and challenges remain to be addressed for SAR images. First, the imaging mechanism of SAR differs from that of optical imaging, resulting in different feature representations and classification challenges [26]. Unlike optical images, SAR images contain multiple scattering effects and speckle noise, which can obscure the underlying features of the image [27]. These differences in imaging mechanism and image characteristics require specific adaptations and modifications of existing few-shot classification methods [28]. Second, SAR images exhibit significant variability due to changes in imaging parameters [29]. SAR images are affected by factors such as the imaging mode, incidence angle, polarization, and wavelength, which can cause significant variations in image features. This variability makes it difficult to learn a robust representation that generalizes to new classes with limited labeled data [30]. It can also result in high intraclass variance and low interclass distinction, making similar classes hard to distinguish [31] and introducing a high degree of ambiguity into few-shot classification that leads to poor performance. Third, the training data for few-shot classification in SAR images lack diversity [27]. SAR images are typically acquired over specific regions and at specific times, which limits the diversity of the labeled data. When a few-shot classifier is trained on a small number of classes, it may not generalize well to new classes that are dissimilar to the training classes, and this lack of diversity can lead to overfitting and poor generalization of few-shot classification models.
To address these challenges, this article proposes a new network for SAR images, scattering-point topology for few-shot ship classification (SPT-FSC). Unlike previous methods that directly apply deep network models to SAR images, SPT-FSC integrates scattering characteristics into the network learning process. We propose a scattering-point topology (SPT) based on scattering key points [32] and derive from it an SPT embedding that supplies the network with additional, scattering-based structural features. Because the SPT features and the image features extracted by the CNN belong to different modalities, the two must be integrated carefully; we therefore introduce reciprocal feature fusion attention (RFFA) to fuse them comprehensively. Furthermore, to alleviate the limited diversity of SAR training data, we apply a fine-tuning-based methodology that exploits knowledge gained from a larger dataset. To this end, we merge the OpenSARShip [33] and FUSAR-Ship [34] datasets into a ship classification dataset with more categories and more samples.
The main contributions of our work can be summarized as follows.
1) An SPT method is proposed based on scattering key points. It incorporates not only the positions of the scattering key points but also the distances between them and the inherent information of each individual scattering point.
2) SPT-FSC employs an effective topology encoding branch (TEB) to encode the topological information of the scattering points into an SPT embedding. Used for few-shot classification, this embedding enhances the network's adaptability to the SAR imaging mechanism and more effectively alleviates the imaging variability inherent in SAR images.
3) To effectively fuse the SPT embedding with the image features obtained from the CNN, a mechanism named RFFA is designed. RFFA amplifies the representational capacity of the network by enabling it to learn from both directions of the input sequence.
Extensive experiments on OpenSARShip and FUSAR-Ship validate the effectiveness of the proposed method. Compared with the baseline, the accuracy of four-way one-shot and four-way five-shot tasks improves by 11.90% and 25.15%, respectively, on the OpenSARShip dataset, and the accuracy of five-way one-shot and five-way five-shot tasks increases by 24.08% and 30.72%, respectively, on the FUSAR-Ship dataset.

A. Few-Shot Classification Methods
Few-shot classification is the task of learning to classify new categories from only a few labeled examples per category, and several families of methods have been proposed to tackle it. Metalearning approaches include MAML [18], Reptile [19], and prototypical networks [14]. MAML learns a set of initial weights that can be quickly adapted to new tasks with few examples. Reptile pursues the same goal with a simpler first-order optimization procedure that is cheaper to compute. Prototypical networks learn a prototype for each class and classify new examples based on their distances to the prototypes. Matching networks [35], relation networks [36], and Siamese networks [37] instead learn a metric for comparing query and support examples. DynaGAN [38], D2C [39], and ProtoGANs [40] are examples of techniques that use generative models; these methods learn to generate new examples for unseen categories from the labeled examples. DynaGAN presents a few-shot domain adaptation method for multiple target domains, which uses an adaptation module to dynamically adapt a pretrained GAN to these domains. D2C introduces a paradigm for training unconditional VAEs for few-shot conditional image generation, leveraging diffusion-based priors and contrastive self-supervised learning to adapt to novel tasks with minimal labeled examples. ProtoGANs combine the prototype learning of prototypical networks with a generative model to generate new examples for unseen categories. Transfer-learning-based methods have also shown promising results on this problem. Baseline++ [41] is a state-of-the-art transfer-learning-based approach that improves upon the baseline model by fine-tuning pretrained models on a small labeled support set; it also uses a cosine similarity metric to enhance distance metric learning. Metabaseline [42] is another transfer-learning-based method that uses metalearning to improve few-shot classification performance. It learns to adapt to new tasks by learning a set of parameters that can be fine-tuned quickly on new tasks, and it incorporates an auxiliary loss function to encourage the model to learn more discriminative features.
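To make the prototype rule and the cosine classifier described above concrete, the following is a minimal NumPy sketch, not the reference implementation of either method. It assumes precomputed feature vectors; the function names and the fixed cosine scale of 10 are hypothetical choices for illustration (Baseline++ implementations use their own scale setting).

```python
import numpy as np

def class_prototypes(support_feats, support_labels, n_way):
    """Prototype of each class = mean support embedding (prototypical networks)."""
    return np.stack([support_feats[support_labels == c].mean(axis=0) for c in range(n_way)])

def proto_predict(query_feats, prototypes):
    """Assign each query to the prototype with the smallest squared Euclidean distance."""
    d2 = ((query_feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def cosine_logits(feats, weights, scale=10.0):
    """Baseline++-style logits: scaled cosine similarity to per-class weight vectors.

    `scale` is a hypothetical fixed value used only for illustration.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return scale * f @ w.T
```

The baseline adopted in this article follows Baseline++ but omits the cosine classifier, so `cosine_logits` is shown only for comparison with a standard linear head.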
Given the strong performance of transfer-learning-based few-shot methods, we adopt a fine-tuning-based approach for our few-shot ship classification method. Our baseline follows the Baseline++ framework but does not use a cosine classifier.

B. Few-Shot Classification in SAR Images
Few-shot classification in SAR images has been the subject of considerable research in recent years, and various approaches have been proposed. One is the instance-aware transformer (IAT) model [43], which incorporates instance-level information into the classification process to enhance few-shot classification performance in SAR images. Another approach uses a weighted distance and feature fusion strategy [44]: the similarity between a query image and support images is evaluated with a distance metric, and accuracy is improved by combining the features extracted from both the query and support images. The spatial transformed prototypical network [45] incorporates spatial transformations into the prototypical network to enhance its ability to recognize new classes. A few-shot learning framework based on the prototypical network [46] learns, from a limited number of training samples, a classifier capable of recognizing new classes with only a few examples. A metalearning approach with amortized variational inference [47] trains a model capable of quickly adapting to new classes with only a few examples. Another method for few-shot classification in SAR images uses a metric learning approach in which a Siamese network learns a similarity metric [48]. Deep transfer learning has also been applied to few-shot SAR image classification [49]. For the identification of few-shot SAR targets, Lu et al. [15] presented a deep CNN based on transfer learning: after fine-tuning a pretrained network on a few samples from new classes, they used the refined model to categorize new cases. Tai et al. [50] used disentangled attention modules to selectively transfer characteristics from electro-optical samples to SAR samples, eliminating the need for additional SAR samples. In addition, a number of current deep-learning-based methods integrate object detection and classification, and several few-shot object detection methods for SAR images include classification modules. Chen et al. [51] presented a few-shot SAR object detection framework based on metalearning, which uses an attention mechanism to emphasize class-specific features and a dynamic relationship learning paradigm involving graph convolutional networks (GCNs) and orthogonality constraints to enhance feature similarity. Zhou et al. [52] proposed FSODM, a few-shot object detection method for optical remote sensing that includes a lightweight metafeature extractor (DarknetS) and an aggregation module (AggregationS) to improve SAR image feature representation and generalization for new classes.
Nevertheless, most prior work on model optimization has relied on general models designed for natural scenes, without accounting for the distinct imaging characteristics of SAR images compared with optical images. Our method instead designs an SPT, thereby integrating the scattering characteristics of SAR into the network learning process.

C. Scattering Key Points for Deep Neural Network
The concept of key points in deep neural networks has become a pivotal aspect of modern computer vision and machine learning research. Key points, also referred to as interest points or landmarks, are specific spatial locations in an image or feature map. They serve as salient reference points that capture essential information about the underlying structure and content of an image. Although key points have been used in computer vision for decades, deep learning has renewed their relevance and utility. One of their primary roles in deep neural networks is feature extraction: CNNs use learned filters to identify such points, enabling the networks to recognize patterns, objects, or structures within the data. This has contributed to substantial progress in image classification, object detection, and segmentation. Key points also support spatial understanding, enabling models to infer the relative positions and orientations of objects in an image.
Integrating scattering key points into deep neural networks is an important advance for SAR image analysis in remote sensing and computer vision. SAR images, characterized by complex and texture-rich content, have long posed unique challenges for traditional computer vision approaches. Scattering key points help deep neural networks exploit this content: they serve as salient landmarks that allow the networks to understand, classify, and interpret SAR images more effectively, and their use has led to notable improvements in object detection and recognition in SAR images.
Recent studies [32], [53], [54], [55], [56] have proposed using a set of key points to model local scattering regions and guide networks in SAR object detection and classification. Fu et al. [53] proposed an anchor-free network that uses scattering key points to guide network training and employs the predicted scattering key point positions as offsets for deformable convolution modules. It has also been found that scattering key points obtained with the Harris corner detector roughly capture the structural characteristics and distinguishing features of ships. SFR-Net [54] addresses the completeness of aircraft detection results in SAR images with a scattering-point relation module (SPRM) that analyzes and correlates discrete scattering points. The SPRM extracts the characteristics of scattering points and establishes connections among them, thereby overcoming the discreteness of individual aircraft in SAR imagery. Sun et al. [55] proposed SPAN, a unified framework for accurately locating and classifying ships in large-scale SAR images by capturing the distribution characteristics of strong scattering points in the ship area. SPAN uses a ship classification encoder module to extract the correlation and distribution characteristics between scattering points and combines the features and distribution information of strong scattering points to recognize the ship category. SCAN [56] is an integrated framework for few-shot SAR aircraft classification that includes a classification path and a scattering extraction branch; the branch uses a scattering extraction module (SEM) to guide the network to learn the number and distribution of strong scattering points for each target type. This improves the SAR target feature representation and benefits the main classification task, leading to better performance on few-shot recognition of novel categories. SPG-OSD [32] incorporates the scattering characteristics of SAR images into ship detection through an oriented two-stage detection module and a scattering-point-guided RPN. The RPN predicts the positions of key scattering points and uses this location information to extract features near them, which effectively reduces false alarms and missed ships.
Most of the aforementioned studies use scattering key points to help networks learn pertinent features and localize targets precisely. In practice, however, the distribution of and spatial relationships among scattering key points can also reveal key information about the overall shape, boundaries, and internal composition of a target. We therefore introduce a topological structure built on scattering key points to better represent the structural characteristics of ships, and we apply this enriched feature representation to few-shot ship classification.

A. Problem Setup
In standard few-shot classification, the objective is to learn new concepts from a set of novel classes $C_{\text{novel}}$ with only a limited number of samples, while having access to a labeled dataset of base classes $C_{\text{base}}$ containing a substantial number of images, where $C_{\text{base}} \cap C_{\text{novel}} = \emptyset$. The test set is divided into two distinct subsets, namely the support set and the query set. The support set comprises a limited number of annotated samples of the novel classes, while the query set contains unlabeled samples from the same label space. The main goal of few-shot classification is to classify the unlabeled query samples precisely based on the information provided in the support set. Specifically, when the support set contains N classes and each class is represented by K labeled samples, the problem is referred to as N-way K-shot few-shot classification. The query set contains Q samples per class drawn from the same N classes, and the objective is to classify the N × Q query images into their respective N classes.
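The N-way K-shot protocol can be illustrated with a short episode sampler. This is a minimal NumPy sketch under the assumption that novel-class samples are indexed by an integer label array; `sample_episode` is a hypothetical helper, not part of the authors' code.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, q_query, rng):
    """Draw one N-way K-shot episode with Q query samples per class.

    Returns support indices, support labels, query indices, and query labels;
    the returned labels are episode-local (0 .. n_way - 1).
    """
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_idx, support_y, query_idx, query_y = [], [], [], []
    for local_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))[: k_shot + q_query]
        if len(idx) < k_shot + q_query:
            raise ValueError(f"class {c} has fewer than {k_shot + q_query} samples")
        support_idx.extend(idx[:k_shot])          # K labeled samples per class
        support_y.extend([local_label] * k_shot)
        query_idx.extend(idx[k_shot:])            # Q disjoint samples to classify
        query_y.extend([local_label] * q_query)
    return (np.array(support_idx), np.array(support_y),
            np.array(query_idx), np.array(query_y))
```

For example, `sample_episode(labels, 4, 5, 15, np.random.default_rng(0))` would produce a four-way five-shot episode with 20 support and 60 query indices.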
In our setup, we use three separate datasets: a training set, a support set, and a testing set. The support and testing sets share the same label space, whereas the training set has its own label space that is disjoint from both. In the training phase, the model is trained exactly as in standard image classification, as shown in Fig. 1: a classifier $f_\theta$ is trained with the standard cross-entropy loss on all base classes. To recognize novel classes in the test stage, the classification head used for the base classes is replaced with a new head whose outputs correspond to the novel class labels. The model is then fine-tuned on the N×K support images by updating its parameters to minimize the classification loss on the novel classes. Notably, both the backbone and the TEB remain frozen during fine-tuning, so only the fully connected layer is updated with the support images, allowing it to adapt to the characteristics of the novel classes. Finally, the fine-tuned model is evaluated on the N×Q query images, which measures how well it recognizes and classifies the novel classes using the knowledge adapted during fine-tuning.
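Because the backbone and TEB are frozen, fine-tuning reduces to training a new fully connected softmax head on fixed support features. The sketch below shows that idea in NumPy with plain gradient descent; the learning rate, number of epochs, weight decay, and initialization are hypothetical values, and the function names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def fit_linear_head(feats, labels, n_way, lr=0.1, epochs=200, weight_decay=1e-4, seed=0):
    """Train a new softmax head on frozen support features by gradient descent.

    `feats` stands for precomputed (N*K, D) embeddings from the frozen network,
    so only the head parameters W and b change (hypothetical hyperparameters).
    """
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    W = rng.normal(scale=0.01, size=(d, n_way))   # fresh head replaces the base-class head
    b = np.zeros(n_way)
    onehot = np.eye(n_way)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                    # d(mean cross-entropy)/d(logits)
        W -= lr * (feats.T @ grad + weight_decay * W)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(feats, W, b):
    """Class with the highest logit for each (query) feature vector."""
    return (feats @ W + b).argmax(axis=1)
```

Keeping the feature extractor fixed means the support set only has to estimate D × N + N head parameters, which is one reason this style of fine-tuning is less prone to overfitting on a handful of samples.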

B. Overall Architecture
The proposed network consists of three main components, as shown in Fig. 2: the backbone, the topology encoding branch (TEB), and the RFFA module. The backbone is based on ResNet-12 [42] and is composed of four stages, each containing a residual block. Each residual block consists of three convolutional layers with 3 × 3 kernels, followed by batch normalization and Leaky ReLU activation with a slope of 0.1. The numbers of channels in the four residual blocks are 64, 128, 320, and 640, respectively. A 2 × 2 max-pooling layer is applied after each residual block, and a 5 × 5 global average pooling at the end produces a 640-dimensional (640-D) feature vector. This architecture is consistent with recent works and is kept simple, without additional modifications such as DropBlock or wider channels. The TEB includes two GCN layers [57] and an MLP-mixer [58] layer to extract the SPT embedding; its purpose is to capture the intricate spatial relationships among the scattering key points in the SPT. The RFFA module is designed to effectively fuse the SPT embedding with the features extracted by the CNN, so that the two are well integrated before being concatenated and fed into a fully connected layer for prediction.
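The 5 × 5 global average pooling follows from the four 2 × 2 pooling steps. The sketch below traces the feature-map shape stage by stage. The input resolution is an assumption, since it is not stated in this section: 84 × 84 is common in few-shot benchmarks, and 80 × 80 also ends at 5 × 5. The sketch further assumes padded 3 × 3 convolutions, so only pooling changes the spatial size; the function name is hypothetical.

```python
def trace_resnet12_shapes(height, width, channels=(64, 128, 320, 640)):
    """Return (C, H, W) after each ResNet-12 stage.

    Assumes 3x3 convolutions with padding 1 (spatial size unchanged) and a
    2x2 max-pool with stride 2 and no padding at the end of every stage.
    """
    shapes = []
    for c in channels:
        height, width = height // 2, width // 2   # 2x2 max-pooling halves H and W (floor)
        shapes.append((c, height, width))
    return shapes

# trace_resnet12_shapes(84, 84)
# [(64, 42, 42), (128, 21, 21), (320, 10, 10), (640, 5, 5)]
```

The last entry explains why a 5 × 5 average pooling collapses the final map into a single 640-D vector.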

C. Scattering-Point Topology
The SPT method is based on the principle that strong scattering points are highly significant and stable and carry abundant scattering information. It aims to exploit the stable scattering characteristics of targets such as ships to better understand and represent their structural and shape features. First, the Harris detector [59] is applied to identify important points that reflect the structural outline of the target, as illustrated in Fig. 3(c) and (d); the resulting point set captures the shape and structure of the object. Next, the K-means clustering algorithm groups the extracted points into N distinct clusters with high scattering intensities, as illustrated in Fig. 3(e) and (f). This step eliminates redundant points and yields a more regular structure, which is essential for further analysis. The N cluster centers are defined as the scattering key points, denoted as $P = \{p_1, p_2, \ldots, p_N\}$. These points represent the local distributions of scattering intensity and the structural characteristics of the target; by capturing these local features, the method enriches the discriminative features of ships and improves classification and recognition performance. After clustering, the coordinates $V = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ of the scattering key points on the original image are obtained for establishing the SPT.
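The two-step key-point extraction (Harris corners, then K-means) can be sketched in NumPy as follows. This is an illustrative approximation, not the authors' pipeline: the box window in place of a Gaussian window, the Harris constant k = 0.04, and the rule of keeping the top 1% of corner responses before clustering are all hypothetical choices.

```python
import numpy as np

def box_sum(a, r):
    """Sum of `a` over a (2r+1) x (2r+1) window at every pixel, via an integral image.

    A simple stand-in for the Gaussian window of the Harris detector.
    """
    k = 2 * r + 1
    p = np.pad(a, r, mode="edge")
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]

def harris_response(img, k=0.04, r=1):
    """Harris corner response R = det(M) - k * trace(M)^2 at every pixel."""
    gy, gx = np.gradient(img.astype(float))
    sxx = box_sum(gx * gx, r)
    syy = box_sum(gy * gy, r)
    sxy = box_sum(gx * gy, r)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def kmeans(points, n_clusters, iters=50, seed=0):
    """Plain Lloyd iterations; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), n_clusters, replace=False)].astype(float)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for c in range(n_clusters):
            members = points[assign == c]
            if len(members):                  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers

def scattering_key_points(img, n_points, top_frac=0.01, seed=0):
    """Harris corners followed by K-means; returns (n_points, 2) centers as (x, y)."""
    R = harris_response(img)
    ys, xs = np.nonzero(R >= np.quantile(R, 1.0 - top_frac))   # hypothetical selection rule
    pts = np.stack([xs, ys], axis=1).astype(float)
    return kmeans(pts, n_points, seed=seed)
```

The returned centers play the role of $P$, and their coordinates the role of $V$; in practice a library corner detector and clustering routine would typically replace these hand-written versions.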
The distribution of a target's key scattering points carries geometric structure information. Key scattering points of ships in the same category have comparable topological structures, as confirmed by the similar distributions of key scattering points of the two container ships in Fig. 3(e) and (f). We therefore construct a topology based on the scattering key points to make full use of their distribution information in characterizing the structure of the ship. First, for the set of scattering key points $P$, the Euclidean distance between each pair of scattering key points is calculated as

$$d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}. \tag{1}$$

For each scattering key point $p_i$, its KNNs are determined from its Euclidean distances to the other scattering key points:

$$N_i = \{\, j \neq i : d(p_i, p_j) \text{ is among the } K \text{ smallest values of } \{ d(p_i, p_l) \}_{l \neq i} \,\} \tag{2}$$

where $N_i$ is the set of indices of the $K$ scattering key points closest to $p_i$. Using this KNN information, an adjacency matrix $A \in \mathbb{R}^{N \times N}$ is constructed, where $A(i, j)$ indicates whether scattering key points $p_i$ and $p_j$ are connected: $A(i, j)$ is 1 if $p_j$ is one of the KNNs of $p_i$ and 0 otherwise, that is,

$$A(i, j) = \begin{cases} 1, & j \in N_i \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Finally, Dijkstra's algorithm is run on the KNN graph from each scattering key point $p_i$ to compute the shortest-path distance to every other scattering key point, yielding the distance matrix $E \in \mathbb{R}^{N \times N}$, where $e_{ij} \in E$ is the geodesic distance from $p_i$ to $p_j$. This forms an undirected graph $T = \{P, E\}$, which we call the SPT. Visualizations of the SPT are shown in Fig. 3(g) and (h). This topology provides valuable insight into the distribution patterns of the scattering key points and their positional relationships, and analyzing these relationships allows the structural and shape features of ship targets to be characterized more accurately.
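The distance, KNN, adjacency, and Dijkstra steps can be sketched directly with NumPy and the standard-library `heapq` module. Two assumptions are made explicit here: KNN edges are symmetrized so that the graph is undirected, as the text describes T, and each edge is weighted by its Euclidean length d(p_i, p_j) when computing geodesic distances. The function names are hypothetical.

```python
import heapq
import numpy as np

def knn_adjacency(points, k):
    """Pairwise Euclidean distances and a K-nearest-neighbor adjacency matrix.

    A[i, j] = 1 if p_j is one of the k points closest to p_i (p_i itself excluded).
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(points)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        neighbors = [j for j in np.argsort(dist[i]) if j != i][:k]
        A[i, neighbors] = 1
    return dist, A

def geodesic_distances(dist, A):
    """Dijkstra from every node over the KNN graph (edges symmetrized: an assumption)."""
    n = len(dist)
    adj = np.maximum(A, A.T)
    E = np.full((n, n), np.inf)     # unreachable pairs stay at infinity
    for s in range(n):
        E[s, s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > E[s, u]:
                continue            # stale heap entry
            for v in np.flatnonzero(adj[u]):
                nd = d + dist[u, v]
                if nd < E[s, v]:
                    E[s, v] = nd
                    heapq.heappush(heap, (nd, v))
    return E
```

For the three points (0, 0), (2, 0), and (2, 3) with K = 1, the geodesic distance between the first and last point is 2 + 3 = 5, whereas the straight-line distance is √13 ≈ 3.61; the geodesic therefore follows the shape traced by the key points rather than cutting across it.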

D. Topology Encoding Branch
After constructing the SPT, we take the feature map from Stage 1 of the backbone and, using the coordinates $V$, extract the 64-D feature vector corresponding to each scattering key point. These vectors serve as the node attributes of the topology, and the corresponding feature matrix is denoted as $F \in \mathbb{R}^{N \times 64}$. The complete SPT, denoted as $T = \{P, F, E\}$, is then input into the TEB, as shown in Fig. 4.
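Looking up node attributes requires mapping original-image coordinates onto the lower-resolution Stage-1 feature map. The section does not specify how this is done; the sketch below assumes a linear rescaling followed by a nearest-cell lookup, and the function name is hypothetical.

```python
import numpy as np

def gather_node_features(feature_map, coords, image_size):
    """Sample a (C, H, W) feature map at key-point locations (nearest cell).

    coords: (N, 2) array of (x, y) positions in original-image pixels.
    image_size: (height, width) of the original image.
    Returns F with shape (N, C), e.g. (N, 64) for Stage 1.
    """
    _, H, W = feature_map.shape
    img_h, img_w = image_size
    cols = np.clip((coords[:, 0] * W / img_w).astype(int), 0, W - 1)   # x -> column
    rows = np.clip((coords[:, 1] * H / img_h).astype(int), 0, H - 1)   # y -> row
    return feature_map[:, rows, cols].T
```

Bilinear interpolation would be a natural alternative to the nearest-cell lookup when key points fall between feature-map cells.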
The first layer of the branch is a graph convolutional layer with equal numbers of input and output channels. It is followed by a second graph convolutional layer that compresses the output channels to 32 dimensions. This process extracts and condenses the features, yielding N 32-D feature vectors, denoted as $F' \in \mathbb{R}^{N \times 32}$. Each graph convolution is expressed as

$$f'_i = \Theta^{\top} \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}} \, f_j \tag{4}$$

with $\hat{d}_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}$ and self-loop weight $e_{i,i} = 1$, where $f'_i$ is the $i$th row of the layer output, $f_j$ is the $j$th row of the layer input (for the first layer, $F$), $\Theta$ denotes the trainable weights, $\mathcal{N}(i)$ is the set of neighbor nodes of $i$, and $e_{j,i}$ is the edge weight from source node $j$ to target node $i$. However, directly concatenating these vectors would make the result depend on the order of the input feature points, so the SPT embedding could vary across targets of the same category. Scattering features representing the same region might then appear in different positions, which could compromise the effective use of ship scattering features, because the subsequent fully connected layer has only limited feature extraction capability.
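The graph convolution above, with normalization term $\hat d_i = 1 + \sum_{j \in \mathcal{N}(i)} e_{j,i}$, can be written in dense matrix form for a small graph. The following NumPy sketch implements one such layer and stacks two of them, going from 64 to 64 and then 32 channels. It assumes a dense edge-weight matrix with zero diagonal, adds self-loops of weight 1, omits bias terms, and inserts a ReLU between the layers; the activation choice and the function names are assumptions not stated in the text.

```python
import numpy as np

def gcn_layer(F, edge_weight, theta):
    """One edge-weighted graph convolution with symmetric degree normalization.

    F: (N, C_in) node features; edge_weight[j, i] = e_{j,i} (0 if no edge j -> i,
    zero diagonal); theta: (C_in, C_out) trainable weights. Biases are omitted.
    """
    n = F.shape[0]
    w = edge_weight.astype(float) + np.eye(n)         # add self-loops with weight 1
    d_hat = w.sum(axis=0)                             # d_i = 1 + sum_j e_{j,i}
    norm = w / np.sqrt(np.outer(d_hat, d_hat))        # e_{j,i} / sqrt(d_j * d_i)
    return norm.T @ F @ theta                         # row i: (sum_j norm[j, i] f_j) @ theta

def two_layer_gcn(F, edge_weight, theta1, theta2):
    """64 -> 64 -> 32 channels; the ReLU between the layers is an assumption."""
    h = np.maximum(gcn_layer(F, edge_weight, theta1), 0.0)
    return gcn_layer(h, edge_weight, theta2)
```

Which matrix supplies $e_{j,i}$ (for example, the geodesic distances in $E$ masked by the adjacency $A$) is left open here and would need to follow the authors' actual construction.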
To mitigate this problem and improve the robustness of the embedding, the final layer of the branch is an MLP-mixer layer, which further processes the feature vectors while keeping the number of output channels equal to the number of input channels. The mixer layer comprises two types of MLP sublayers: channel-mixing MLPs and token-mixing MLPs. The channel-mixing MLPs enable communication among different channels; they operate independently on each token, taking individual rows of the data table as inputs. In contrast, the token-mixing MLPs enable communication across different spatial locations (tokens); they operate independently on each channel, taking individual columns of the data table as inputs. The two sublayer types are interleaved within the mixer layer, so that both channel-level and spatial-level interactions are exploited and information is fused across both input dimensions.
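Unlike a conventional MLP, which extracts features from individual vectors, the MLP-mixer enables communication between different spatial locations, that is, between the features of scattering key points located at different positions. This allows the feature attributes of scattering key points in different regions to be fused. With the GCN output $X \in \mathbb{R}^{N \times 32}$ as input (one row per scattering key point and one column per channel), this fusion can be expressed as

$$\begin{aligned} U_{*,c} &= X_{*,c} + W_2\,\sigma\big(W_1\,\mathrm{LayerNorm}(X)_{*,c}\big), && c = 1, \dots, 32 \\ Y_{s,*} &= U_{s,*} + W_4\,\sigma\big(W_3\,\mathrm{LayerNorm}(U)_{s,*}\big), && s = 1, \dots, N \end{aligned} \tag{5}$$

where $W_1$ and $W_2$ are the weights of the token-mixing MLP, $W_3$ and $W_4$ are the weights of the channel-mixing MLP, and σ is an elementwise nonlinearity (GELU [60]). Finally, by concatenating the resulting N 32-D feature vectors (the rows of Y), we obtain an SPT embedding that is largely independent of the scattering-point placements.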
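For illustration, one mixer layer of the form in (5) can be sketched in plain Python as below, following the standard MLP-Mixer formulation: a token-mixing MLP applied to each column (across the N scattering key points) and then a channel-mixing MLP applied to each row (across the channels), each preceded by layer normalization and wrapped in a residual connection. The weight shapes, the parameter-free layer normalization, and the exact GELU via `math.erf` are assumptions of this sketch rather than details given in the text.

```python
import math


def gelu(x):
    # Exact GELU: x * Phi(x), written with the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(vec, eps=1e-5):
    # Parameter-free layer normalization over one vector.
    mu = sum(vec) / len(vec)
    var = sum((v - mu) ** 2 for v in vec) / len(vec)
    return [(v - mu) / math.sqrt(var + eps) for v in vec]


def mlp(vec, w_in, w_out):
    # w_out @ gelu(w_in @ vec); w_in: hidden x d, w_out: d x hidden.
    hidden = [gelu(sum(w * v for w, v in zip(row, vec))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def mixer_layer(X, w1, w2, w3, w4):
    """One mixer layer on X (N rows = key points/tokens, C columns = channels).

    w1, w2: token-mixing weights (hidden x N, N x hidden).
    w3, w4: channel-mixing weights (hidden x C, C x hidden).
    """
    n, c = len(X), len(X[0])
    # Token mixing: normalize each row, then mix along each column.
    Xn = [layer_norm(row) for row in X]
    U = [list(row) for row in X]
    for ch in range(c):
        mixed = mlp([Xn[t][ch] for t in range(n)], w1, w2)
        for t in range(n):
            U[t][ch] += mixed[t]  # residual connection
    # Channel mixing: normalize and mix each row, with a residual connection.
    return [[u + m for u, m in zip(U[t], mlp(layer_norm(U[t]), w3, w4))]
            for t in range(n)]
```

With all weights set to zero the layer reduces to the identity (both residual branches contribute nothing), which is a convenient sanity check when wiring up the shapes.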

E. Reciprocal Feature Fusion Attention
After completing the feature extraction process within the TEB, we transition from the SPT to the SPT embedding.Nevertheless, a pivotal challenge arises when it comes to effectively amalgamating the SPT embedding with the visual information, which has been extracted by the CNN.This challenge is critical to address as it determines the model's ability to synergize both visual and topological aspects.On the one hand, SPT embedding encapsulates the topological characteristics of the target, providing essential insights into its structure and organization.On the other hand, the visual information obtained through the CNN offers rich details and patterns that are crucial for accurate classification and recognition tasks.Bridging the gap between these two types of data and ensuring that they complement each other seamlessly is of paramount importance to improve the overall performance of the model.
To tackle this challenge, we introduce reciprocal feature fusion attention (RFFA), a mechanism that fuses the visual information and the SPT embedding into a unified representation. This integration enables the subsequent fully connected neural network to acquire a more comprehensive understanding of the target by learning from its visual and topological features simultaneously. By incorporating two interactive cross-attention [61] functions within the RFFA module, we enable cross-modality learning, as shown in Fig. 5. These cross-attention functions are designed to discern subtle alignments and correspondences between the topological and visual information. They identify and enhance the features that are corroborated by both modalities, thus enriching the final fused representation.
In practice, RFFA starts with layer normalization to ensure consistent scaling and improve training stability. The input features from both modalities are then passed through a shared fully connected layer that projects them onto a shared inner dimension. We denote the topology and visual modalities with subscripts T and V, giving $H_T^l$ and $H_V^l$, respectively. The central idea is to increase the importance of the visual features of a target when their relevance is supported by the semantic context provided by the topology features, and vice versa. To formalize this, we introduce a pair of reciprocal cross-attention functions:

$$\tilde{H}_V^l = \mathrm{softmax}\!\left(\frac{q(H_V^l)\, k(H_T^l)^{\top}}{\sqrt{d_k}}\right) v(H_T^l) \tag{6}$$

$$\tilde{H}_T^l = \mathrm{softmax}\!\left(\frac{q(H_T^l)\, k(H_V^l)^{\top}}{\sqrt{d_k}}\right) v(H_V^l) \tag{7}$$

In this formulation, q, k, and v represent the query, key, and value functions, respectively, and $d_k$ is the dimension of the key vectors. After applying the cross-attention functions, we use fully connected layers to project the topology features and visual features separately back to their original dimensions, obtaining the updated topology and visual features. To generate the final output, we concatenate these updated features, combining the relevant information from both modalities into a more discriminative and robust representation.
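The reciprocal cross-attention described above can be sketched as follows. In this plain-Python illustration, each modality is a list of token vectors already projected to the shared inner dimension, q, k, and v are linear maps given as matrices, and the visual tokens attend to the topology tokens while the topology tokens attend to the visual tokens. Layer normalization, the shared input projection, the output projections, and the final concatenation are omitted, and all function names are hypothetical.

```python
import math


def matmul(A, B):
    # Plain matrix product of two lists of lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(h_query, h_context, wq, wk, wv):
    """Scaled dot-product attention from one modality's tokens to the other's.

    wq, wk, wv: d_model x d_k projection matrices standing in for q, k, v.
    """
    q, k, v = matmul(h_query, wq), matmul(h_context, wk), matmul(h_context, wv)
    d_k = len(wk[0])
    scores = matmul(q, [list(col) for col in zip(*k)])  # q @ k^T
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(attn, v)


def reciprocal_fusion(h_t, h_v, w_t, w_v):
    """Hypothetical RFFA core: each modality attends to the other.

    w_t, w_v: (wq, wk, wv) triples used for topology and visual queries.
    """
    t_new = cross_attention(h_t, h_v, *w_t)  # topology queries, visual context
    v_new = cross_attention(h_v, h_t, *w_v)  # visual queries, topology context
    return t_new, v_new
```

With identity projections, a single topology token facing two identical visual tokens receives their average, while each visual token receives the single topology token, which illustrates how information flows in both directions.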

A. Datasets
1) OpenSARShip Dataset:
The OpenSARShip dataset is a comprehensive collection of Sentinel-1 SAR images designed specifically for ship interpretation tasks. The images were acquired across various sea areas, environmental conditions, and seasons, making the dataset a valuable resource for research in maritime surveillance and related applications. The dataset comprises over 60 000 SAR image chips, each measuring 256 × 256 pixels. The chips contain approximately 20 000 ships with detailed annotations, including ship position, length, and width, as well as relevant environmental information, such as wind speed and wave height. The dataset also includes a wide range of ship types, such as bulk carriers, container ships, fishing vessels, oil tankers, passenger ships, and tugs. Excluding a few categories with particularly small sample sizes, we used eight classes from the dataset. We randomly selected four classes for the training set and assigned the remaining four classes to the test set, as listed in Table I.
2) FUSAR-Ship Dataset: The FUSAR-Ship dataset is a high-resolution matchup dataset of SAR images and automatic identification system (AIS) data, built from Gaofen-3 SAR images and the associated AIS data. It is designed to facilitate research in ship detection, recognition, and maritime surveillance by providing high-resolution SAR images with accurate ship position and identification information from the corresponding AIS data. FUSAR-Ship contains over 10 000 SAR image chips, each measuring 500 × 500 pixels. These images are acquired by Gaofen-3, a high-resolution SAR satellite with a spatial resolution of up to 1 m. The dataset covers various sea areas, environmental conditions, and ship types, which makes it versatile for algorithm development and evaluation. The AIS data in the dataset provide ship identification, position, course, speed, dimensions, and other relevant attributes. Following the general settings in FSL, we used ten classes from the dataset. We randomly selected five classes for the training set and assigned the remaining five classes to the test set, as listed in Table I.
3) Open-FUSAR Dataset: Fine-grained classification in SAR images is challenging because the subtle differences between classes make it difficult for few-shot learning models to learn discriminative features. To address this challenge, we merged the two datasets above into a unified dataset called Open-FUSAR. During the integration, we combined instances from the same categories across both datasets to increase the diversity and the number of samples for each class, while distinct categories were kept separate to maintain a broad range of ship types for classification. After the integration, we randomly selected 13 classes for the training set, ensuring a sufficient number of examples per class to train the model effectively. The remaining six classes were designated as the test set, which allowed us to evaluate the trained model on unseen data and assess its generalization capability; the split is listed in Table I. By combining and processing the two datasets in this manner, we aimed to enhance the model's learning ability by exposing it to diverse examples across various ship types, environments, and conditions. Furthermore, this approach allows researchers to develop more robust and accurate fine-grained classification algorithms for SAR images, ultimately benefiting maritime surveillance and related applications.

B. Implementation Details
In our experiments, we evaluate four-way one-shot and four-way five-shot configurations on the OpenSARShip dataset, five-way one-shot and five-way five-shot configurations on the FUSAR-Ship dataset, and six-way one-shot and six-way five-shot configurations on the Open-FUSAR dataset, with five query images per class in all cases. The input images are resized to a fixed resolution of 84 × 84 pixels. During training, we set the number of epochs to 50. For both training and fine-tuning, we use stochastic gradient descent as the optimizer, configured with a learning rate of 0.01, momentum of 0.9, dampening of 0.9, and weight decay of 0.001. Fine-tuning runs for 150 steps in the one-shot setting and 600 steps in the five-shot setting. In the testing phase, we set the number of test episodes to 100 and use a batch size of 4. In the SPT, the feature dimension of each scattering point extracted by the TEB is 32, so the TEB output has dimensionality N × 32. In parallel, the output of the backbone network is flattened into a 640-D vector. For consistency, we set the default number of scattering key points (N) to 20, so that the TEB output dimensionality (20 × 32 = 640) matches the output dimension of the backbone network. As a result, the SPT embedding has the same dimensionality as the backbone output, and concatenating the two yields a 1280-D feature vector, which is then fed into the fully connected layer. In addition, we set the default number of nearest neighbors (K) for the SPT to 3.
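The episodic protocol above (N-way K-shot tasks with five query images per class) and the dimension bookkeeping for the fused feature can be sketched as follows. The dataset layout (a dictionary from class name to image identifiers) and the function name `sample_episode` are illustrative assumptions; the constants restate the values given in the text.

```python
import random

# Dimensions stated in the text: N = 20 key points x 32-D TEB features.
N_POINTS, TEB_DIM, BACKBONE_DIM = 20, 32, 640
SPT_DIM = N_POINTS * TEB_DIM  # 640, matches the flattened backbone output
FUSED_DIM = SPT_DIM + BACKBONE_DIM  # 1280 after concatenation


def sample_episode(images_by_class, n_way, k_shot, n_query=5, seed=None):
    """Sample one N-way K-shot episode with n_query query images per class.

    images_by_class: dict mapping class name -> list of image ids (hypothetical layout).
    Returns (support, query) as lists of (image_id, episode_label) pairs.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw support and query images without overlap within the class.
        picks = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in picks[:k_shot]]
        query += [(img, label) for img in picks[k_shot:]]
    return support, query
```

For the six-way five-shot Open-FUSAR setting, one call yields 30 support and 30 query images.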

C. Evaluation Metric
For each episode, the accuracy $Acc_j$ is computed as the ratio of the number of correctly classified query images to the total number of query images, as given in (8):

$$Acc_j = \frac{1}{N_q} \sum_{i=1}^{N_q} \mathbb{I}(p_i = t_i) \tag{8}$$

where $N_q$ denotes the total number of query images, and $p_i$ and $t_i$ represent the predicted class and ground-truth class of the ith query image, respectively. The indicator function $\mathbb{I}(\cdot)$ equals 1 if $p_i = t_i$ and 0 otherwise. To mitigate the impact of randomness, the experiment is repeated multiple times, and the final accuracy is the average of all episode accuracies, as shown in (9):

$$\overline{Acc} = \frac{1}{N} \sum_{j=1}^{N} Acc_j \tag{9}$$

where N refers to the number of repetitions of the experiment (not to be confused with the number of scattering key points). In the main experiment, N is set to 100 for all models, providing a more robust evaluation. This evaluation metric yields a more reliable and precise estimate of the models' performance.
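Equations (8) and (9) amount to the following computation, sketched in Python; the function names are hypothetical, and class labels may be any values that support equality comparison.

```python
def episode_accuracy(preds, targets):
    """Eq. (8): share of query images whose predicted class equals the ground truth."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally long, non-empty prediction and target lists")
    return sum(p == t for p, t in zip(preds, targets)) / len(preds)


def mean_accuracy(episode_accs):
    """Eq. (9): average of the per-episode accuracies over N repetitions."""
    return sum(episode_accs) / len(episode_accs)
```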

D. Ablation Studies
We conduct a series of ablation experiments on three datasets to analyze the effectiveness of each proposed component in SPT-FSC.All subsequent experiments maintain the same settings to make a fair comparison.Specifically, we explore the effects of different numbers of scattering key points, varying numbers of nearest neighbors, the utilization of TEB, including GCN and MLP-mixer, and the impact of RFFA.By systematically examining these aspects, we aim to gain deeper insights into the underlying mechanisms of SPT and provide valuable guidance for optimizing its performance.

1) Effects of Different Numbers of Scattering Key Points:
In the analysis of how the number of scattering key points affects classification accuracy (Table II), increasing the number of scattering key points generally improves classification accuracy in both the one-shot and five-shot scenarios. However, the improvement tends to plateau or even decline slightly once the number of key points exceeds a certain threshold, such as N = 20 for the OpenSARShip and FUSAR-Ship datasets. The improvement is also more pronounced in the five-shot scenario than in the one-shot scenario. Furthermore, increasing the number of scattering key points appears to have a larger effect on classification accuracy for the Open-FUSAR dataset than for the OpenSARShip and FUSAR-Ship datasets. This indicates the effectiveness of expanding the base classes of the dataset to address the lack of diversity in the training data for few-shot classification in SAR images.

TABLE II EFFECTS OF DIFFERENT NUMBERS OF SCATTERING KEY POINTS

TABLE III EFFECTS OF DIFFERENT NUMBERS OF NEAREST NEIGHBORS
Fig. 6 offers a visualization of the SPT with different numbers of scattering key points, showing that the topology becomes more intricate as the number of key points increases.This increased complexity allows the method to capture more discriminative features, resulting in better classification performance.Nonetheless, it is crucial to strike a balance between the complexity of the SPT and the risk of overfitting, as utilizing too many scattering key points may lead to a decline in generalization capabilities.
In conclusion, the increase in the number of scattering key points enhances the classification accuracy of the SPT method, particularly in the five-shot scenario.However, it is essential to find an optimal balance between the complexity of the SPT and the model's generalization capabilities, as excessive complexity may lead to overfitting and reduced performance on unseen data.
2) Effects of Different Numbers of Nearest Neighbors: In this section, we discuss how the number of nearest neighbors (K) affects the performance of the SPT method. Table III presents the experimental results for different values of K, and Fig. 7 visualizes the SPT with varying numbers of nearest neighbors. Table III shows that the performance of the SPT method varies with the number of nearest neighbors, indicating that the choice of K plays a significant role. These observations suggest that an optimal number of nearest neighbors exists for each dataset: when the number of nearest neighbors is too low or too high, the performance of the model deteriorates. The optimal number of nearest neighbors appears to be dataset-specific, as the best value varies among the three datasets. The visualization of the SPT with different numbers of nearest neighbors in Fig. 7 further supports these observations. The underlying reason for these patterns may be the tradeoff between local and global information: a smaller number of nearest neighbors emphasizes local features, while a larger number captures more global information. Finding the right balance between local and global information is crucial for recognizing and classifying ships in SAR images.
The visualizations demonstrate how the SPT changes as the number of nearest neighbors increases.We can see that with a smaller K value, the topology appears sparser, while with a larger K value, the topology becomes denser.This observation suggests that the selection of K influences the complexity of the SPT, which, in turn, affects the representation of the structural and shape features of the ship targets.
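The effect of K on topology density can be checked with a small sketch that counts the undirected edges of a KNN graph. The points, the function name, and the symmetrization (an edge is kept if either endpoint selects the other) are illustrative assumptions, not the authors' code.

```python
import math


def knn_edge_count(points, k):
    """Number of undirected edges in the KNN graph of 2-D points (sketch)."""
    n = len(points)
    edges = set()
    for i in range(n):
        # Sort the other points by distance; ties keep index order.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return len(edges)
```

For six evenly spaced collinear points, K = 1 gives 5 edges, K = 2 gives 7, and K = 5 connects every pair (15 edges), consistent with the observation that the topology becomes denser as K grows.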
In conclusion, the results indicate that selecting an appropriate number of nearest neighbors is crucial for the SPT method to effectively capture the structural and shape features of ship targets.By choosing an optimal K value, the method can achieve better classification and recognition performance.The potential pattern underlying these results may be that the optimal K value is dataset-dependent, meaning that it could be influenced by the specific characteristics of the dataset being used.
3) Effects of TEB: In this section, we delve into the analysis of the effects of employing GCN and the MLP-mixer in the TEB of our proposed model.Our primary focus is on understanding how these components enhance the performance of the model, especially in scenarios where the SPT plays a crucial role.
To provide some context, we began by introducing RFFA to the baseline model and conducted experiments.RFFA, in this context, operates by taking image features extracted by CNN as its inputs.It effectively acts as a self-attention mechanism, reinforcing the output features of the network.Our results, as shown in the second row of Table IV, indicated a noticeable improvement in model performance after adding RFFA.This improvement can be attributed to the ability of RFFA to enhance the discriminability of the model's output features.
Building on this, we further enhanced the model by adding the TEB, in which GCN is employed, to the baseline with RFFA. As reflected in the third row of Table IV, this addition resulted in a significant performance boost compared with the setting in which both inputs of RFFA were image features. The test accuracy increased by 4.95% (one shot) and 10.6% (five shots) on OpenSARShip, 6.28% (one shot) and 12.88% (five shots) on FUSAR-Ship, and 1.56% (one shot) and 7.48% (five shots) on Open-FUSAR. This demonstrates the effectiveness of introducing topological features into the model and highlights the importance of the SPT in this task.

TABLE IV ABLATION FOR KEY COMPONENTS PROPOSED IN SPT-FSC
Visualizations using t-SNE on the Open-FUSAR dataset, shown in Fig. 8, further corroborate our experimental findings. Baseline feature embeddings exhibit aliasing [see Fig. 8(a) and (b)], which is alleviated to some extent by adding RFFA [see Fig. 8(c) and (d)]. However, clear boundaries between categories emerge in the feature embeddings only when both RFFA and TEB (including GCN and MLP-mixer) are incorporated [see Fig. 8(e) and (f)]. This suggests that the MLP-mixer significantly enhances the discriminative capability of the feature embeddings, making it easier to distinguish between classes. It also supports the role of the MLP-mixer in addressing the dependence of the SPT embedding on the order of the input feature points. By enabling communication between different spatial locations, the MLP-mixer fuses the feature attributes of scattering key points located in different areas, resulting in a more robust SPT embedding that is less dependent on the specific positions of scattering points. Consequently, including the MLP-mixer in the TEB contributes substantially to the model's performance across all datasets and scenarios.

4) Effects of RFFA:
The main goal of RFFA is to enhance the discriminative power of the fused features by considering the reciprocal relationships and contributions of features from different sources, allowing the model to focus on the most relevant and informative parts of each feature set. The impact of RFFA on the network's output features is evident from the second row of Table IV, where RFFA is applied to the baseline model. In this setting, both inputs of RFFA are the image features extracted by the CNN. The test accuracy improves substantially, with gains of 3.7% (one shot) and 10.1% (five shots) on OpenSARShip, 7.72% (one shot) and 12.76% (five shots) on FUSAR-Ship, and 1.78% (one shot) and 7.48% (five shots) on Open-FUSAR. These results underscore the enhancing effect of RFFA on the model's output features.
Fig. 8(g) and (h) present t-SNE visualizations of the output features fused by simple concatenation on the Open-FUSAR dataset. In this setting, the feature embedding of the baseline with TEB (including GCN and MLP-mixer) exhibits substantial aliasing, similar to the baseline. This observation indicates that although the SPT embedding contains rich feature information, it mainly encodes topological features, which differ inherently from the visual features extracted by the CNN. Directly concatenating visual and topological features may therefore fail to exploit the topological information efficiently and could even hinder the extraction of visual features. In contrast, the t-SNE visualizations for both the one-shot and five-shot scenarios [see Fig. 8(i) and (j)] show that features fused by RFFA are more discriminative and better separated than those fused by simple concatenation, with clear boundaries between categories, consistent with the improved classification performance. Consequently, RFFA plays a pivotal role in fusing these distinct modalities and substantially improves the model's overall performance.

E. Comparison With State-of-the-Art Methods
To validate the effectiveness of SPT-FSC, we compared it with eight state-of-the-art few-shot classification methods on the three datasets, as shown in Table V. All compared methods use a ResNet-12 backbone, and their training and testing parameters match those used in this article. To ensure a fair comparison, all hyperparameters of our method and the other methods are set to their default values. In this comparison, the number of scattering key points (N) is fixed at 20 and the number of nearest neighbors (K) is fixed at 3. The temperature of the cosine classifier in Baseline++ is set to 10, and the margin in NegMargin is set to −0.01. The relation module of RelationNet consists of two convolutional blocks and two fully connected layers, where each convolutional block is a 3 × 3 convolution followed by batch normalization, ReLU nonlinearity, and 2 × 2 max-pooling. MAML uses Adam as the metaoptimizer with a fixed step size α = 0.01. These settings ensure consistency across our evaluations.
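Baseline++ and NegMargin both classify with cosine similarity between a feature and per-class weight vectors. The sketch below illustrates scaled cosine logits with an optional margin; it assumes that the "temperature" multiplies each cosine similarity and that the margin is subtracted from the ground-truth-class cosine during training, as in additive-margin softmax classifiers. It is not the reference code of either method, and the function name is hypothetical.

```python
import math


def cosine_logits(feature, class_weights, scale=10.0, margin=0.0, label=None):
    """Scaled cosine-similarity logits for one feature vector.

    scale: multiplier on each cosine (10 here, mirroring the Baseline++ setting).
    margin: subtracted from the ground-truth class cosine when `label` is given
            (e.g. -0.01, mirroring the NegMargin setting).
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        return [x / norm for x in v]

    f = normalize(feature)
    logits = []
    for c, w in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(f, normalize(w)))
        if label is not None and c == label:
            cos -= margin  # a negative margin slightly raises the true-class logit
        logits.append(scale * cos)
    return logits
```

Because both the feature and the class weights are normalized, the logits depend only on angles, and the scale controls how peaked the resulting softmax distribution is.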
Based on the experimental results in Table V, we analyze the performance of various few-shot classification methods, including metalearning and transfer-learning strategies, on the OpenSARShip, FUSAR-Ship, and Open-FUSAR datasets. On the OpenSARShip dataset, the proposed SPT-FSC method significantly outperforms all other methods in both the one-shot and five-shot scenarios, achieving 47.75% (±2.47) and 68.55% (±2.36) accuracy, respectively. This demonstrates the effectiveness of the SPT, TEB, and RFFA in capturing detailed information from SAR images. Similarly, on the FUSAR-Ship dataset, SPT-FSC achieves the best performance in both the one-shot and five-shot settings, with accuracies of 55.76% (±2.28) and 68.04% (±1.64), respectively. These results again show the strength of the proposed method in handling high-resolution SAR images with accurate ship position and identification information. On the Open-FUSAR dataset, SPT-FSC maintains its superior performance, achieving the highest accuracy in both the one-shot and five-shot scenarios, with 59.48% (±2.61) and 81.48% (±1.59), respectively. This highlights the benefit of combining the OpenSARShip and FUSAR-Ship datasets into a more diverse, fine-grained ship classification dataset. Overall, SPT-FSC demonstrates superior performance across all datasets and few-shot settings. This can be attributed to the effective integration of SPT information and the RFFA mechanism, which improves the network's adaptability to the imaging mechanism and alleviates the imaging variability inherent in SAR images.
The t-SNE visualization of the output features on the Open-FUSAR dataset, illustrated in Fig. 9, provides insight into the behavior of the various methods. When an image classification model has learned meaningful features from the data, the t-SNE visualization tends to reveal distinct separations between classes in the embedding space. The feature embeddings of ProtoNet for one-shot learning [Fig. 9(a)] and of Baseline++ for five-shot learning [Fig. 9(d)] display highly concentrated distributions with some outliers. Conversely, the feature embeddings of ProtoNet for five-shot learning [Fig. 9(b)] and of Baseline++ for one-shot learning [Fig. 9(c)] exhibit more dispersed distributions, but with a notable degree of overlap among classes. The feature embeddings of SPT-FSC for one-shot learning [Fig. 9(e)] and five-shot learning [Fig. 9(f)] show even more scattered distributions, with clear boundaries between class embeddings that substantially reduce the overlap. Notably, Fig. 9(f) (SPT-FSC, five-shot) exhibits less overlap than Fig. 9(e) (SPT-FSC, one-shot), indicating that the number of support samples affects model performance, in line with the accuracy results. In a direct comparison among ProtoNet, Baseline++, and SPT-FSC, the proposed SPT-FSC method consistently yields better class separation in both the one-shot and five-shot settings. These results underscore the critical role of discriminative features in enhancing classification accuracy.

TABLE V COMPARISON WITH STATE-OF-THE-ART METHODS
To illustrate the experimental results more comprehensively, Fig. 10 presents the confusion matrices of ProtoNet, Baseline++, and SPT-FSC on the OpenSARShip dataset. Fig. 10(a)-(d) shows that the classification accuracy is notably higher for the "Pilot" and "Passenger" categories and relatively lower for the "Dredging" and "Search" categories, with the accuracy for "Search" being particularly low. This disparity may be attributed to the small size of ships in the "Search" category, which makes them difficult to distinguish. In contrast, Fig. 10(e) and (f) show significant improvements in recognition accuracy across all categories. For "Dredging," "Pilot," "Passenger," and "Search," the accuracy increases by more than 14.8%, 9.4%, 15.4%, and 9.0%, respectively, in the one-shot scenario, and by more than 32.4%, 28.6%, 26.2%, and 33.4% in the five-shot scenario. In the five-shot scenario, SPT-FSC achieves classification accuracies above 70% in all four categories. Furthermore, in SPT-FSC the classification accuracy for the "Dredging" and "Search" categories surpasses that of the "Pilot" and "Passenger" categories. This observation highlights SPT-FSC's ability to capture and exploit fine-grained local features, enabling it to distinguish small and intricate targets. Overall, these findings underscore the effectiveness of SPT-FSC in improving classification accuracy, particularly in scenarios with challenging and intricate visual distinctions.
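Per-class accuracies such as those read from the confusion matrices in Fig. 10 are the diagonal entries divided by the row sums. A minimal sketch, assuming rows index the ground-truth class and columns the predicted class (the orientation used in Fig. 10 is not stated here); the function name is hypothetical.

```python
def per_class_accuracy(confusion):
    """Per-class accuracy from a confusion matrix.

    confusion[t][p]: number of query images of true class t predicted as p
    (rows = ground truth is an assumption about orientation).
    """
    accs = []
    for t, row in enumerate(confusion):
        total = sum(row)
        # Correct predictions sit on the diagonal; empty rows get 0.0.
        accs.append(row[t] / total if total else 0.0)
    return accs
```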
Comparing the metalearning approaches (MatchingNet, ProtoNet, RelationNet, and MAML) with the transfer-learning methods (Baseline, Baseline++, and MetaBaseline), we observe that the transfer-learning-based methods generally perform better. This indicates that transfer learning is better suited to handling the scarcity of labeled examples in few-shot scenarios, especially for SAR ship images. Among the metalearning methods, MAML shows relatively consistent performance across the three datasets, and MatchingNet, ProtoNet, RelationNet, and MAML perform similarly on all datasets, with only slight variations. This suggests that these metalearning approaches have comparable abilities to learn discriminative features from few-shot examples. However, they are still outperformed by the transfer-learning-based methods, particularly by the proposed SPT-FSC approach. Baseline and Baseline++ improve on the metalearning approaches, especially in five-shot classification tasks, further indicating that transfer-learning-based methods are more effective for few-shot classification. Baseline++ outperforms Baseline, showing the importance of fine-tuning pretrained models on the small support set and of using the cosine similarity metric to enhance distance metric learning. NegMargin performs similarly to Baseline++ in five-shot tasks but slightly worse in one-shot tasks, suggesting that, although effective, it may be less robust than Baseline++ in few-shot classification. MetaBaseline is competitive with Baseline and Baseline++ but still falls short of SPT-FSC. This indicates that incorporating metalearning techniques and auxiliary loss functions can improve few-shot classification performance but may not be sufficient to address the challenges posed by SAR image classification tasks. The proposed SPT-FSC method outperforms all other methods across all datasets in both one-shot and five-shot classification tasks. This indicates that incorporating the SPT, SPT embedding, and RFFA significantly enhances the model's ability to learn discriminative features from few-shot examples in SAR images. In addition, integrating the OpenSARShip and FUSAR-Ship datasets helps the model address the limited diversity of the training data and further improves classification performance.
To compare SPT-FSC comprehensively with methods designed specifically for few-shot SAR image classification, we evaluated the conv-BiLSTM prototypical network (CBLPN) [63], the hybrid inference network (HIN) [64], and the mixed loss graph attention network (MGA-Net) [65]. All methods were evaluated on the OpenSARShip dataset under the same three-way one-shot and three-way five-shot settings to ensure comparability. The base and novel categories used are listed in Table VI. Unlike the other few-shot SAR automatic target recognition (ATR) methods, SPT-FSC has two method-specific parameters, which we set to N = 30 scattering key points and K = 2 nearest neighbors; these values were chosen based on the ablation experiments on the effects of N and K reported earlier for the OpenSARShip dataset. As Table VII shows, SPT-FSC achieves the highest average accuracies, 70.80% in the three-way one-shot scenario and 88.80% in the three-way five-shot scenario. These results surpass the other few-shot SAR ATR methods by at least 0.51% and 9.11%, respectively: the gain is marginal in the one-shot setting but substantial in the five-shot setting, indicating robust performance when a few more labeled samples are available. CBLPN, HIN, and MGA-Net nevertheless remain competitive. CBLPN uses long short-term memory to integrate sequence information from the embedded SAR images, while HIN and MGA-Net use graph networks to model relationships among samples. However, these methods do not explore fine-grained features thoroughly, resulting in biased estimates of the feature distributions of the novel categories. This suggests that there is still room to improve these approaches for few-shot SAR image classification.
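The roles of N and K above can be illustrated with a minimal sketch of a topology graph over scattering key points. It assumes, for illustration only, that each of the N key points is connected to its K nearest neighbors by Euclidean distance in image coordinates and that the resulting edges are made undirected; the function name `knn_adjacency` and this edge rule are hypothetical and are not taken from the SPT-FSC method description.

```python
import numpy as np


def knn_adjacency(points, k=2):
    """Undirected k-nearest-neighbor adjacency over key-point coordinates.

    points: (N, 2) array of (row, col) positions (hypothetical input format).
    Returns an (N, N) 0/1 matrix with a zero diagonal.
    """
    n = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]     # indices of the k closest points
    adj = np.zeros((n, n), dtype=np.int8)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1
    return np.maximum(adj, adj.T)              # keep an edge if either endpoint chose it


rng = np.random.default_rng(0)
key_points = rng.integers(0, 128, size=(30, 2))   # N = 30 random stand-in positions
A = knn_adjacency(key_points, k=2)                # K = 2
print(A.shape, int(A.sum() // 2), "undirected edges")
```

With K = 2, every point keeps at least two neighbors after symmetrization, so the graph stays sparse while still linking nearby scatterers.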

V. DISCUSSION
In this article, we introduced the SPT-FSC method, which addresses the challenges of few-shot ship classification in SAR images by leveraging an SPT built from scattering key points. While our approach shows promise, it is important to acknowledge its limitations and suggest directions for further improvement.
One key limitation of our method is parameter selection. Parameters such as the number of scattering key points and the number of nearest neighbors strongly affect performance, but their optimal values may not transfer across datasets and tasks. To mitigate this limitation, future research should develop automated parameter-adaptation techniques, which would reduce reliance on manual tuning and improve the model's adaptability to different scenarios.
Introducing SPT, TEB, and RFFA benefits training: it enhances the model's ability to capture topological information and may reduce the number of training epochs required for convergence. However, these components also add computational cost, because computing the SPT and RFFA requires extra operations during training; this can lengthen training time, particularly for complex SAR data, large datasets, or large model architectures.
Extending the applicability of the SPT concept is another promising direction. Although SPT-FSC is tailored to ship classification, the SPT approach could be extended to a broader range of objects in SAR images, and investigating how well SPT adapts and generalizes to other targets would make the method more versatile. A primary motivation for introducing SPT into few-shot classification is the scarcity of training samples. With few samples, conventional deep-learning networks often struggle to learn the distinctive characteristics of each target class during training and may therefore perform poorly on unseen data. SPT addresses this limitation by supplying the network with additional structural information, strengthening its ability to learn and generalize. SPT could also be applied to general image classification, but its impact is likely to be greater in the few-shot setting because of the large difference in data availability. General image classification usually benefits from large datasets, from which deep networks can learn comprehensive feature representations. In that case, the network's own feature extractor is already well equipped for the task, so adding SPT may yield a smaller improvement.
Furthermore, integrating SPT-FSC with SAR object detection is a significant opportunity for future research. Current deep-learning networks often combine object detection and classification, and incorporating SPT-based classification into such detection networks could improve object recognition accuracy in SAR imagery. This integration could yield more precise and robust target identification, particularly for SAR images acquired under varying conditions and from different viewing angles.
Moreover, the topological information encoded by SPT could improve SAR object detection itself. For example, it could support more accurate target localization, which is crucial for applications such as tracking and situational awareness. SPT could also help distinguish genuine targets from false alarms, improving the overall accuracy of object detection in SAR images.

VI. CONCLUSION
In summary, this article presents SPT-FSC, a novel framework for the distinctive challenges of few-shot classification in SAR images. Our main contributions are the SPT, a topology encoding branch (TEB) that produces the SPT embedding, the reciprocal feature fusion attention (RFFA) mechanism, and a refined ship classification dataset. The experimental results demonstrate the efficacy of SPT-FSC for ship classification in SAR images. The SPT captures the essential structural and shape features of ship targets, while the TEB enables the network to exploit the information carried by each scattering point. The RFFA mechanism increases the discriminative power of the fused features by accounting for the reciprocal relationships and contributions of features from different sources. In addition, the refined ship classification dataset allows the network to tackle the few-shot ship classification problem in SAR images more effectively. Our analyses of the individual components confirm that each contributes to the overall performance and offer insight into the model's behavior. By addressing imaging variability, limited training-data diversity, and the need to adapt existing few-shot classification methods to SAR imagery, SPT-FSC achieves a considerable improvement in few-shot ship classification in SAR images.
are examples of metric learning approaches. These approaches learn a distance metric between examples and use this metric to classify new examples. Matching networks use a memory-augmented architecture to learn a similarity function between examples. Relation networks learn a relation module that can compare pairs of examples and classify new examples based on their relation to the labeled examples. Siamese networks learn a shared representation between pairs of examples and classify new examples based on their distance in this shared representation. DynaGAN
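The matching-network idea described above, classifying a query by attending over the labeled support examples, can be sketched in a few lines. This simplified version is an assumption for illustration: it uses cosine similarity with a softmax attention over the support set and omits the full-context embedding functions of the original matching networks; `matching_predict` is a hypothetical name.

```python
import numpy as np


def matching_predict(query, support, support_labels, n_classes):
    """Simplified matching-network classifier (no full-context embeddings).

    query: (q, d) embeddings; support: (s, d) embeddings;
    support_labels: (s,) integer class ids. Returns (q, n_classes) probabilities.
    """
    qn = query / np.linalg.norm(query, axis=1, keepdims=True)
    sn = support / np.linalg.norm(support, axis=1, keepdims=True)
    sim = qn @ sn.T                                 # cosine similarity, shape (q, s)
    sim = sim - sim.max(axis=1, keepdims=True)      # stabilize the softmax
    attn = np.exp(sim)
    attn = attn / attn.sum(axis=1, keepdims=True)   # attention weights over the support set
    onehot = np.eye(n_classes)[support_labels]      # (s, n_classes) label matrix
    return attn @ onehot                            # attention-weighted label vote


support = np.eye(3)                                 # one support embedding per class (toy data)
labels = np.array([0, 1, 2])
queries = np.array([[1.0, 0.1, 0.0], [0.0, 0.0, 2.0]])
probs = matching_predict(queries, support, labels, n_classes=3)
print(probs.argmax(axis=1))
```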

Fig. 1. Training and testing of few-shot classification in our setup.

Fig. 2. Overall architecture of SPT-FSC. First, we extract the scattering key points from the objects and establish the SPT. Next, we use the positions of these scattering key points to obtain feature vectors for the corresponding nodes from Stage 1 of the backbone and feed them into the GCN and MLP-mixer. Finally, RFFA fuses the SPT embedding with the features extracted by the CNN to generate the final prediction.
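The second step in this caption, reading node features from a Stage 1 feature map at the key-point positions and passing them through a GCN, can be sketched as follows. The nearest-cell lookup, the `stride` value, and the symmetric-normalized propagation rule ReLU(D^(-1/2) (A + I) D^(-1/2) X W) of Kipf and Welling, where D is the degree matrix of A + I, are assumptions made for this sketch; the article's exact sampling scheme and GCN variant are not given in the caption, and the MLP-mixer stage is not shown.

```python
import numpy as np


def gather_node_features(feature_map, points, stride):
    """Read one feature vector per key point from a (C, H, W) feature map.

    points: (N, 2) integer (row, col) positions in the input image.
    stride: downsampling factor of the feature map (assumed known).
    Uses nearest-cell lookup; returns an (N, C) node-feature matrix.
    """
    _, h, w = feature_map.shape
    rows = np.clip(points[:, 0] // stride, 0, h - 1)
    cols = np.clip(points[:, 1] // stride, 0, w - 1)
    return feature_map[:, rows, cols].T


def gcn_layer(adj, x, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)


fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)   # toy Stage 1 map with C = 2
pts = np.array([[0, 0], [7, 7]])                            # key points in an 8 x 8 image
nodes = gather_node_features(fmap, pts, stride=2)           # shape (2, 2)
adj = np.array([[0, 1], [1, 0]], dtype=float)               # the two points are linked
out = gcn_layer(adj, nodes, np.eye(2))
print(nodes, out, sep="\n")
```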

Fig. 3. Illustration of the SPT. (a) and (b) Original SAR images of containers. (c) and (d) Points extracted by the Harris corner detector. (e) and (f) Key scattering points. (g) and (h) SPT.
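Panels (c) to (f) describe a two-stage selection: Harris corner candidates first, then key scattering points. The sketch below computes the standard Harris response R = det(M) - k trace(M)^2 from image gradients summed over a 3 x 3 window and then greedily keeps the n strongest responses that are mutually separated. Treating the strongest responses as the key scattering points, the window size, k = 0.04, and the `min_dist` suppression rule are all assumptions for illustration; the article's actual criterion for selecting key scattering points from the Harris candidates is not described in this caption.

```python
import numpy as np


def box_sum3(a):
    """Sum over a 3 x 3 neighborhood, replicating edge pixels at the border."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))


def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel."""
    iy, ix = np.gradient(img.astype(float))   # derivatives along rows, then columns
    sxx = box_sum3(ix * ix)
    syy = box_sum3(iy * iy)
    sxy = box_sum3(ix * iy)
    return (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2


def select_key_points(response, n, min_dist=3):
    """Greedily keep the n strongest responses that are at least min_dist
    pixels apart (Chebyshev distance); a simple stand-in for non-max suppression."""
    picked = []
    for idx in np.argsort(response, axis=None)[::-1]:
        r, c = np.unravel_index(idx, response.shape)
        if all(max(abs(r - pr), abs(c - pc)) >= min_dist for pr, pc in picked):
            picked.append((int(r), int(c)))
            if len(picked) == n:
                break
    return np.array(picked)


img = np.zeros((24, 24))
img[8:16, 8:16] = 1.0                 # toy bright rectangle standing in for a ship
pts = select_key_points(harris_response(img), n=4)
print(pts)                            # one point near each corner of the rectangle
```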

Fig. 8. t-SNE visualization of the output features on the Open-FUSAR dataset. (a) Baseline for one shot. (b) Baseline for five shots. (c) Baseline with RFFA for one shot. (d) Baseline with RFFA for five shots. (e) Baseline with RFFA and TEB, including GCN, for one shot. (f) Baseline with RFFA and TEB, including GCN, for five shots. (g) Baseline with TEB, including GCN and MLP-mixer, for one shot. (h) Baseline with TEB, including GCN and MLP-mixer, for five shots. (i) SPT-FSC for one shot. (j) SPT-FSC for five shots. The numbers 0-5 in the legend represent ContainerShip, GeneralCargo, oil_ChemicalTanker, PilotVessel, Tanker-HazardA, and Unspecified, respectively.

Fig. 10. Visualization of the results by confusion matrices on the OpenSARShip dataset. (a) ProtoNet for one shot. (b) ProtoNet for five shots. (c) Baseline++ for one shot. (d) Baseline++ for five shots. (e) SPT-FSC for one shot. (f) SPT-FSC for five shots.

TABLE I
DETAILED DATASET SETTINGS IN OUR EXPERIMENT

TABLE VI
OPENSARSHIP DATASET SETTINGS FOR FEW-SHOT SAR ATR METHODS