SPANet: A Self-Balancing Position Attention Network for Anchor-Free SAR Ship Detection

Synthetic aperture radar (SAR) images of ships exhibit complex background interference, multi-scale targets, and irregular distribution characteristics. However, existing mainstream SAR ship detection algorithms rely on manually designed hyperparameters. This results in poor robustness and makes it difficult to effectively balance detection accuracy and speed. To solve these problems, a novel anchor-free SAR ship detection algorithm based on self-balancing position attention (SBPA) is proposed. First, a lightweight feature extraction backbone (GhostVS-Net) is designed from FOCUS, ghost, and separable convolution modules to extract feature information. This helps to extract the contour features of ships and suppress unrelated interference, making the backbone better suited to the scattering characteristics of SAR images. Second, an SBPA module is designed, which balances local image features under multiple receptive fields and aggregates global position information and spatial context information. The SBPA module accounts for the background, scale, and distribution characteristics of SAR images to greatly improve positioning accuracy. Finally, a feature pyramid network is applied to fuse scale context information and further improve detection accuracy. Experimental results on the SSDD and HRSID datasets show that the proposed network achieves accurate detection and strong robustness, with mean average precision reaching 99.72% and 95.30%, respectively, outperforming current state-of-the-art algorithms.


I. INTRODUCTION
SYNTHETIC aperture radar (SAR) is an active sensor operating in the microwave band. Compared with traditional optical and hyperspectral remote sensing, SAR imagery has good penetrability and high resolution, and can observe large sea areas all day and in all weather. With the rapid development of imaging technology [1], [2] and the launch of advanced SAR satellites (such as GF-3C, Hisea-1, and LT-1A), research on and applications of SAR image detection have been greatly promoted. However, the challenging marine environment, with its irregular distribution of multiscale ships, impedes ship detection. Therefore, it is of great research significance to realize multi-scale ship detection in complex scenes [3].
Among traditional SAR ship detection algorithms, the constant false alarm rate (CFAR) algorithm is widely applied [4]. CFAR is a signal processing algorithm [5] that reduces the influence of clutter and interference by setting a threshold corresponding to a constant false alarm probability in the detection system [6]. Its performance depends on accurate estimation of the clutter. Researchers have proposed various clutter statistical distributions, such as the K, gamma, and complex signal kurtosis (CSK) distributions [7], [8], [9]. However, as SAR image resolution improves and backgrounds become more complex, the CFAR algorithm tends to misestimate the distribution parameters of the target, leading to a decrease in detection performance.
Deep learning algorithms based on convolutional neural networks (CNNs) have demonstrated excellent performance in optical image processing and are gradually being applied to SAR ship detection. Current deep-learning-based target detection algorithms can be divided into the following two categories.

1) Two-Stage Detection Algorithms Based on Bounding Box:
In the first stage, bounding boxes that may contain the targets to be detected are generated. In the second stage, the target in each box is classified and localized by the network. The overall process is complicated and the prediction speed is quite slow. Typical algorithms of this type are the R-CNN series [10], [11], [12].

2) Single-Stage Detection Algorithms Based on Regression:
This type of algorithm transforms the detection task into a regression problem. The architecture can be divided into anchor-based and anchor-free architectures. The typical anchor-based algorithms are the YOLO series [13], [14], [15], the single shot detector (SSD) [16], etc. The key idea of YOLO is to divide the image into S × S cells and generate three anchor boxes in the center of each cell. Then, the target information, i.e., the center coordinates (x, y), the width and height (w, h), and the category confidence, is predicted for each box. The SSD algorithm generates multiple anchor boxes on each pixel of the feature map and quickly predicts the position and category information for each box. Anchor-free algorithms directly detect the center of the target and avoid the design of anchor hyperparameters. The mainstream frameworks include CenterNet [17], FCOS [18], and YOLOX [19], etc. In CenterNet, target detection is transformed into key point detection and attribute regression, where key point detection locates the target and regresses its position, size, and confidence information. FCOS uses feature map position points as training samples; the position points that map into a target bounding box in the original image are taken as positive samples, making training simple and efficient. YOLOX borrows the idea of FCOS to make per-point predictions on the feature map, obtaining the target category, position regression parameters, and confidence information. The network sets the central 5 × 5 region as positive samples and uses the SimOTA strategy to optimize the matching of positive samples.
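To make the anchor-based prediction concrete, the sketch below maps one grid cell's raw outputs (tx, ty, tw, th) to an absolute box, following the common YOLO-style parameterization; the function name and the numeric values are illustrative, not taken from any specific implementation.

```python
import math

def decode_yolo_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Map raw network outputs for one anchor box in one grid cell to an
    absolute (center_x, center_y, width, height) in pixels.
    YOLOv3-style parameterization; names here are illustrative."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    cx = (sigmoid(tx) + cell_x) * stride  # center offset inside the cell, scaled to pixels
    cy = (sigmoid(ty) + cell_y) * stride
    w = anchor_w * math.exp(tw)           # anchor size scaled by the exponential of the prediction
    h = anchor_h * math.exp(th)
    return cx, cy, w, h

# A zero prediction places the box at the cell center with the anchor's size.
print(decode_yolo_box(0, 0, 0, 0, cell_x=3, cell_y=4, anchor_w=32, anchor_h=32, stride=16))
```

The dependence on `anchor_w`/`anchor_h` is exactly the manually designed hyperparameter that anchor-free methods avoid.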
Based on advanced deep learning models, existing SAR ship detection algorithms can be roughly divided into two categories. One comprises detection algorithms that introduce a denoising algorithm as a preprocessing module [20], [21], [22]. The other comprises optimization algorithms that directly design dedicated processing modules for SAR target characteristics, such as the attention reception pyramid [23], cascaded multi-domain attention [24], the continuous attention module [25], AFSar [26], BANet [27], and H2Det [28].
Denoising algorithms aim to remove noise from images, but a group of small targets is easily mistaken for background noise, or merged and treated as a single large target, which can negatively impact the accuracy of subsequent detection. The optimization algorithms, by contrast, adapt well to the characteristics of SAR images, which is more conducive to actual deployment. Therefore, many researchers have designed optimized detection algorithms for SAR image characteristics. Wan et al. [26] redesigned a lightweight backbone based on the YOLOX network and proposed an attention enhancement module to improve multi-scale detection performance. Hu et al. [27] introduced a local and non-local attention module to balance the diversity of ship scales and verified its advanced detection performance on the HRSID dataset. Yang et al. [29] proposed a multi-layer feature attention mechanism on the FCOS network, refined the features of small ships, and optimized the target positioning algorithm, significantly improving multiscale detection performance compared to the baseline network. Miao et al. [30] applied the Ghost module as the shallow convolution layer in the RetinaNet backbone to reduce network complexity; with the help of the convolutional block attention module (CBAM), detection performance was further enhanced. Through research on advanced SAR ship detection models, several shortcomings were identified, including the following.
1) To deploy object detection models on edge devices, such as airborne and civil devices, researchers have focused on designing lightweight and efficient detection algorithms.The anchor-based architecture relies on artificially designed hyperparameters, which can lead to poor stability and generalization performance.As a result, researchers have prioritized using anchor-free algorithms as a baseline to reduce complexity and improve detection stability.However, the detection performance of anchor-free algorithms on multiscale and multiscenario datasets still needs to be improved.
2) From the detection results of BANet [27], it is found that although classical detection models have made some progress, their actual detection performance is still poor. Due to the scale and distribution characteristics of SAR ships, a model easily recognizes crowded targets as a single target, resulting in a large number of missed detections. Thus, the problem of detecting small-scale crowded targets urgently needs to be solved. Another reason for the poor performance is the difficulty of detecting targets in complex scenes, where the model is easily disturbed by the surrounding land and reefs. Classical models neglect the global position information and spatial context of the target, which can lead to many false detections. Therefore, the problem of mining image semantic features in complex scenes urgently needs to be solved.
3) To break through the bottleneck of SAR image detection, researchers have designed specialized processing modules for the individual characteristics of SAR images, i.e., complex background, multiscale targets, and irregular distribution, respectively [23], [24], [25], [26], [27], [28]. However, such distributed networks lack an integrated design and tend to contain repetitive operations, resulting in longer processing times for inference and training. Therefore, an efficient SAR feature attention module urgently needs to be designed.
Through comprehensive analysis, we propose an anchor-free lightweight algorithm for the position, scale, and space characteristics of SAR ships. The algorithm achieves high-precision detection in complex scenes and has strong practical application value. The contributions of this article are summarized as follows.
1) A novel anchor-free SAR ship detection algorithm is proposed, which has a simple model structure and strong generalization ability. This approach avoids the performance instability caused by fuzzy target contours in anchor-based algorithms and is more adaptable to the diversity of SAR images.

II. RELATED WORKS
With the continuous progress of deep learning algorithms, researchers have explored model design extensively. To make models more portable, fast, and efficient, research mainly focuses on the design of the backbone feature extraction network and the attention mechanism.

A. Backbone
The SSD applies the visual geometry group (VGG) backbone network, which improves network expressiveness by using multiple convolutional layers [16]. ResNet (residual network) [39] is a classic network that improves on the VGG network by introducing residual units, which alleviates the problems of gradient explosion and vanishing gradients. YOLOX [19] applies the Darknet backbone network, draws on the idea of ResNet to realize strong feature expression through multiple residual blocks, and introduces the FOCUS module to improve speed. AFSar [26] applies the lightweight feature extraction backbone MobileNetV2, which replaces traditional convolutional layers with depthwise separable convolutions to reduce the amount of computation required. This makes MobileNetV2 well-suited for resource-constrained environments, such as mobile devices. GhostNet [40] is a novel lightweight network that applies ghost modules (whose structure is shown in Fig. 1) to replace traditional convolutional layers. First, the original channel feature maps are generated through standard convolution. Then, new channel feature maps are obtained through depthwise separable convolution and spliced with the original channels. These new feature maps are derived from, and highly similar to, the original ones, hence they are referred to as "ghost" feature maps. The effectiveness of Ghost modules in multiple detection algorithms [30], [41], [42] has been well documented.
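The ghost-module idea described above can be sketched in PyTorch as follows. The channel split (half primary, half ghost) and the kernel sizes are illustrative assumptions, not the exact GhostNet configuration.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of a ghost module: a primary convolution generates half of the
    output channels, and a cheap depthwise convolution derives the "ghost"
    half from them; the two halves are concatenated."""
    def __init__(self, in_ch, out_ch, kernel=1, cheap_kernel=3):
        super().__init__()
        primary_ch = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True))
        # Depthwise conv (groups == channels): one cheap filter per primary map.
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, primary_ch, cheap_kernel,
                      padding=cheap_kernel // 2, groups=primary_ch, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 16, 32, 32)
print(GhostModule(16, 32)(x).shape)  # channels double, spatial size preserved
```

The saving comes from replacing half of the expensive dense convolutions with per-channel filters.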

B. Attention Mechanism
Existing attention mechanisms mainly focus on local features. Jie et al. [43] designed the channel attention module SENet (squeeze-and-excitation network), which assigns weights to the feature layer channels so that the network can focus on the most effective channels and achieve a balance of target features. This allows the network to better perceive and utilize the features most important for the task at hand. Woo et al. [44] designed the attention module CBAM to aggregate effective features and eliminate background interference by combining channel and spatial information. This allows the network to focus on the most important features and ignore distractions in the input data. Wang et al. [51] proposed a nonlocal attention mechanism, which focuses on capturing different types of spatial information. However, this approach has a high computational cost and is therefore typically used in larger models. Hou et al. [45] constructed coordinate attention (CA) to accurately identify the target area by capturing position information and the relationships between channels. This makes CA particularly effective at identifying the position of targets.

III. PROPOSED METHOD
We present the overall architecture and key techniques of the proposed SPANet, which is shown in Fig. 1. First, the feature extraction backbone (GhostVS-Net) is designed, which integrates FOCUS, ghost, and depthwise convolution (DwConv) modules to efficiently extract multiscale ship information under complex backgrounds. Second, the SBPA module is proposed to aggregate the target position information and balance local features across multiple receptive fields. Finally, after feature pyramid network (FPN) processing, the feature layers enter the decoupled detection head to precisely locate multiscale targets.

A. GhostVS-Net
The backbone network GhostVS-Net is designed to extract shallow features through a double-bottleneck structure, aiming to increase the feature expression of SAR ship images. It is divided into five stages from C1 to C5, whose architecture is given in Table I. The C1 feature layer is composed of FOCUS (which rearranges spatial image information into the channel dimension) and a standard 3 × 3 convolution layer. Exploiting FOCUS, every other pixel is sampled from the feature map, yielding four independent feature layers. Then, a standard convolution layer with stride 2 is applied to expand the channels. The structure of FOCUS is shown in Fig. 1.
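The FOCUS slicing described above, sampling every other pixel in each direction into four quarter-resolution maps stacked on the channel axis, can be sketched in NumPy as follows; the (C, H, W) array layout is an assumption.

```python
import numpy as np

def focus(x):
    """FOCUS slicing: take every other pixel in each direction to form four
    quarter-resolution maps, then stack them on the channel axis.
    Input (C, H, W) -> output (4C, H/2, W/2); no pixel is discarded."""
    return np.concatenate([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                           x[:, 0::2, 1::2], x[:, 1::2, 1::2]], axis=0)

x = np.arange(1 * 4 * 4).reshape(1, 4, 4)
y = focus(x)
print(y.shape)  # (4, 2, 2)
```

Because the operation only rearranges pixels, the subsequent stride-2 convolution sees the full image content at half the spatial cost.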
The feature layers of C2 to C5 are composed of two Ghost modules and DwConv, wherein the first Ghost module expands the number of channels and the second decreases it to match the number of input channels. The texture and contour information of the target in the shallow features is clearer and therefore better suited for ship positioning. A double-bottleneck structure is thus designed to extract shallow features and improve the recall rate. The input feature layer is divided into two parts: one part is processed, and the other is directly connected to the tail for output. The processing branch first expands the channels with a Ghost module, then downsamples with DwConv, whose structure is shown in Fig. 2. By performing independent convolution operations on each channel, the loss of information interaction between feature layers during downsampling is avoided, and the result is spliced with the input. Finally, the channels are compressed through a Ghost module and spliced with the input feature layer.
The core algorithm in the Ghost module applies depthwise separable convolution, which has higher computational efficiency than conventional convolution. The processing resembles a set of cheap linear operations: it efficiently generates low-frequency and high-frequency feature maps similar to the original feature maps (see Fig. 10).

B. SBPA Module
The background of SAR images is complex, and the scale and distribution of ships are diverse. To solve this problem, the SBPA module, whose structure is shown in Fig. 3, is proposed to balance the local features of images across multiple receptive fields and to emphasize the global position and spatial context information. First, through 1 × 1 point-by-point convolution, information interaction between channels is performed on the input feature layer. Then, dilated convolutions with dilation rates of 3, 5, 7, and 9 are applied to extend the receptive field and perceive the characteristic information of ships at different scales. CA is applied to obtain accurate position information and associated spatial context information, and to effectively deal with the complex distribution of ships. Then, hierarchical feature fusion is performed: the feature map under a low receptive field is directly superimposed on the feature map under the next receptive field to avoid the fence effect. Finally, the feature layer channels are combined, and channel attention is used to assign channel weights so that the relevant feature channels are adaptively perceived. The internal structure of the module is described in detail below.

1) Receptive Fields Enhancement Module:
The receptive field plays a crucial role in detecting targets. Large receptive fields may cause the network to ignore small or partially occluded objects, while small receptive fields may not capture enough contextual and spatial information for accurate detection. To address this issue, dilated convolution is exploited to efficiently capture the feature information of ship images at different scales.
Dilated convolution expands the receptive field by injecting gaps into the convolution kernel. Examples are shown in Fig. 4: as the dilation rate increases, the receptive field expands rapidly. In contrast with standard convolution, dilated convolution increases the receptive field without sacrificing image resolution, which makes it well suited for multi-scale feature extraction tasks.
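The growth of the receptive field with dilation rate can be checked numerically. The sketch below assumes stride-1 3 × 3 kernels and uses the dilation rates listed above; the sequential stacking of the four rates is an illustrative assumption, not necessarily the SBPA wiring.

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(kernels_dilations):
    """Receptive field of stride-1 convolutions applied in sequence:
    each layer adds (effective kernel - 1) to the field."""
    rf = 1
    for k, d in kernels_dilations:
        rf += effective_kernel(k, d) - 1
    return rf

# 3x3 kernels with the dilation rates used in the SBPA module (3, 5, 7, 9)
for d in (1, 3, 5, 7, 9):
    print(d, effective_kernel(3, d))
print(stacked_receptive_field([(3, 3), (3, 5), (3, 7), (3, 9)]))  # 49
```

A single 3 × 3 kernel with dilation 9 already covers a 19 × 19 window at the cost of nine multiplications per output pixel.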
2) Coordinate Attention Module: Conventional object detection methods compress global spatial information into the feature extraction channels, which can lead to a lack of attention to the key position information of the target object. To address this issue, we introduce the CA module, whose structure is shown in Fig. 5, after the receptive field enhancement module. This module is designed to extract accurate position information from multiscale feature layers and perceive the distribution characteristics of ships.
First, we perform one-dimensional average pooling on the input feature layer for encoding. Specifically, pooling kernels of size (W, 1) and (1, H) are applied to encode each channel along the two directions: the horizontal encoding output of channel c is Z^w_c(w), and the vertical encoding output is Z^h_c(h). After this processing, two tensors Z^h and Z^w are obtained, which determine the precise position of the targets by leveraging the position dependence. Then the CA module is constructed, which is defined as follows:

f = δ_h(BN(F_1×1([Z^h, Z^w])))

where BN denotes batch normalization, F_1×1 is a 1 × 1 convolution, σ is the sigmoid function, and δ_h represents the h_swish function, which reduces the amount of calculation while maintaining the processing effect.
After the aforementioned processing, the intermediate feature maps f^h ∈ R^(1×H×C) and f^w ∈ R^(W×1×C), encoded in the horizontal and vertical directions, are obtained. The attention weights are obtained through the sigmoid activation function and multiplied with the input feature layer.
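The directional pooling step of CA, collapsing the width to obtain Z^h and the height to obtain Z^w, can be sketched as follows, assuming a (C, H, W) array layout.

```python
import numpy as np

def directional_pool(x):
    """One-dimensional average pooling used by coordinate attention:
    pool over the width to get a per-row code Z_h, and over the height
    to get a per-column code Z_w. Input is (C, H, W)."""
    z_h = x.mean(axis=2)  # (C, H): horizontal positions collapsed
    z_w = x.mean(axis=1)  # (C, W): vertical positions collapsed
    return z_h, z_w

x = np.random.rand(8, 16, 24)
z_h, z_w = directional_pool(x)
print(z_h.shape, z_w.shape)  # (8, 16) (8, 24)
```

Unlike global average pooling, each code retains one spatial coordinate, which is what lets the attention weights localize targets along rows and columns.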
3) Channel Attention Module: After this series of information augmentation, enhanced feature layers with rich echelon information are obtained. To further improve task recognition efficiency, channel attention (Ca) is introduced at the end of the SBPA to perceive channel importance. For different tasks, weighted enhancement or suppression is performed on each channel to improve detection accuracy.
Ca is divided into two main steps, squeezing and excitation, whose structure is shown in Fig. 6. First, the W × H × C input feature layer is compressed into a 1 × 1 × C channel descriptor through global average pooling. Then, the activation module is applied to obtain the attention weight matrix. Finally, it is multiplied with the input feature layer X_c to obtain the calibrated attention feature layer X̂_c, which is defined as follows:

Z_c = (1 / (W × H)) Σ_i Σ_j X_c(i, j)
S_c = σ(W_2 δ(W_1 Z_c))
X̂_c = S_c · X_c

where X_c, Z_c, and S_c represent the feature layer of channel c, the channel descriptor, and the attention weight, respectively. X̂_c represents the calibrated attention feature map, σ represents the sigmoid function, and δ denotes the ReLU function. W_1 ∈ R^(C/r × C) and W_2 ∈ R^(C × C/r) denote intermediate variables. r is an artificially designed dimensionality-reduction coefficient, which is set to 32 to speed up network training and prediction.
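A minimal NumPy sketch of the squeeze-and-excitation computation described above; `w1` and `w2` are illustrative names for the two learned linear layers (random weights here, shown only for the shape logic, with reduction coefficient r = 32).

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation sketch: global average pooling squeezes each
    channel to one descriptor; two linear layers (reduction ratio r) and a
    sigmoid produce per-channel weights in (0, 1) that rescale the input.
    x: (C, H, W), w1: (C/r, C), w2: (C, C/r)."""
    z = x.mean(axis=(1, 2))                    # squeeze: (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: (C,) channel weights
    return x * s[:, None, None]                # recalibrated feature map

C, r = 64, 32
x = np.random.rand(C, 20, 20)
y = channel_attention(x, np.random.randn(C // r, C), np.random.randn(C, C // r))
print(y.shape)  # (64, 20, 20)
```

The bottleneck C → C/r → C keeps the excitation cheap while still mixing information across all channels.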

C. Loss Function
After FPN processing, the feature layers enter the decoupled detection head, which separates the tasks of localization, classification, and objectness judgment. For the different task branches, applying appropriate loss functions helps to train excellent models effectively. The total loss includes the positioning loss L_reg, the classification loss L_cls, and the target loss L_obj:

L = L_cls + L_obj + λ L_reg.
Among them, λ is the weight coefficient of the positioning loss, which is set to 5.
The L_cls and L_obj are both calculated using the binary cross-entropy function. This function measures the deviation between the true class distribution and the predicted probability distribution, helping the model learn to accurately predict the objects in an image. It is defined as

L_BCE = −[y log(P_x) + (1 − y) log(1 − P_x)]

where P_x denotes the predicted value of the target and y represents the real label: y is 1 for a positive sample and 0 otherwise.
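The binary cross-entropy computation can be verified with a few values (P_x written as p below):

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(round(bce(0.9, 1), 4))  # confident, correct positive: small loss (0.1054)
print(round(bce(0.1, 1), 4))  # confident, wrong: large loss (2.3026)
```

The loss grows without bound as the prediction approaches the wrong extreme, which is what pushes the classifier toward calibrated confidences.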
The L_reg applies the generalized intersection over union (GIoU) loss, a variant of the intersection over union (IoU) shown in Fig. 7. It is defined as follows:

L_GIoU = 1 − IoU + |C \ (A ∪ B)| / |C|

where A is the predicted region, B is the true region, and C is the smallest region enclosing both. Unlike the traditional IoU loss, which only considers the overlapping area between A and B, the GIoU loss also takes the nonoverlapping areas into account. This makes the GIoU loss more suitable for training models in complex scenes.
In addition, the GioU loss is scale-invariant, meaning that it does not depend on the size of the bounding boxes.This property allows the loss function to be applied to bounding boxes of different sizes without introducing any bias or distortion.Overall, the GioU loss provides a more comprehensive and accurate measure of the difference between two bounding boxes.
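A minimal pure-Python sketch of the GIoU loss for axis-aligned boxes, consistent with the description above; the (x1, y1, x2, y2) box format is an assumption.

```python
def giou_loss(a, b):
    """GIoU loss for axis-aligned boxes given as (x1, y1, x2, y2).
    C is the smallest box enclosing both; the extra term penalizes the
    part of C covered by neither box, so disjoint boxes still get a gradient."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou

# Overlapping boxes: IoU = 1/7, enclosing area 9, loss = 1 - (1/7 - 2/9)
print(round(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 1.0794
```

For identical boxes the loss is exactly 0; for non-overlapping boxes the IoU term vanishes but the enclosing-box term still varies with distance, which is the practical advantage over plain IoU.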

IV. EXPERIMENTS AND ANALYSIS
We present ablation and comparison experiments on different datasets, and generalization experiments on high-resolution datasets.

A. Datasets and Experimental Setup
To evaluate SPANet, we conduct experiments on several datasets, i.e., the SSDD dataset [32], the high-resolution dataset HRSID [33], and the large-scale SAR ship detection dataset LS-SSDD [46]. These datasets vary in size and resolution, allowing a comprehensive assessment of SPANet's ability to detect ships in different scenarios. LS-SSDD is collected from the Sentinel-1 sensor and includes 15 large-scene images with a pixel size of 24000 × 16000. The dataset is cut into 800 × 800 pixel sub-images, consisting mainly of small ships that are difficult to detect. Fig. 8(c) shows example images from LS-SSDD. The dataset link is https://pan.baidu.com/s/1qnW1r28pRi4AmcLOfmplLg, with extraction password w9c9.
In the experiments, the dataset is divided into a training set (80%), a validation set (10%), and a test set (10%). Ships are labeled using the PASCAL VOC format. The experiments are run on an Intel Core i7-10700KF processor and an NVIDIA GTX 3070 graphics card, using the PyTorch 1.9.0 deep learning framework and the CUDA 11.0 GPU computing platform. The training mode applied is frozen training: during the initial frozen epochs, the initial learning rate is 0.001 and the batch size is 8; then 900 epochs of unfrozen training are conducted with an initial learning rate of 0.0001 and a batch size of 16.

B. Evaluation Index
To analyze the model, mean average precision (mAP), precision, recall, F1, and FLOPS are applied as evaluation indicators. The main evaluation metric, mAP, is the area under the precision-recall curve. The auxiliary evaluation metrics, F1, precision, and recall, are calculated at a confidence threshold of 0.3. Furthermore, FPS is applied to measure the detection speed. The metrics are defined as follows:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
mAP = ∫₀¹ P(R) dR
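The auxiliary metrics can be computed directly from detection counts; the TP/FP/FN values below are illustrative, not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts at a fixed
    confidence threshold (0.3 in the experiments above)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
print(p, round(r, 4), round(f1, 4))  # 0.9 0.8182 0.8571
```

F1 is the harmonic mean of precision and recall, so it punishes a model that trades many false alarms for a slightly higher recall (or vice versa).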

C. Ablation Experiment
The effectiveness of the main components is verified and analyzed in this section.
1) Analysis of the GhostVS-Net: The proposed algorithm applies depthwise separable convolutions to construct a feature extraction backbone called GhostVS-Net, which efficiently extracts multi-scale ship information from complex backgrounds.

TABLE V COMPARISON
To verify the performance of GhostVS-Net, it is compared with GhostNet and CSPDarknet. Images from the SSDD and HRSID datasets are selected as input, and the feature maps from the shallow layers are extracted (i.e., the C3 feature layer in Fig. 2). The results are then compared by replacing the backbone. The results in Fig. 9 show that GhostNet's feature maps are blurred and suffer significant background interference. CSPDarknet's output contains a small number of excellent feature maps, but the overall quality of the feature layer is poor and the target features are not prominent enough.
It is difficult to distinguish small-scale targets from the background, which limits detection performance. In contrast, GhostVS-Net's feature maps have clear contour features, effectively distinguishing the target from the background. The results in Table II show that GhostNet and GhostVS-Net both improve over CSPDarknet in terms of precision, recall, and mAP: GhostNet gains 1.11% in mAP and GhostVS-Net gains 1.9%. Both have similar detection speeds, but GhostVS-Net yields better feature map quality and is lighter and faster.
To further validate the ability of GhostVS-Net, we select lowfrequency and high-frequency feature maps generated by the GhostVS-Net for observation.As shown in the low-frequency feature map in Fig. 10(a), ships can also become blurred when sidelobe ghosting and speckle noise are linearly eliminated, but due to the strong scattering of the target, larger target features are retained more and smaller target features are weakened.As shown in the high-frequency feature map in Fig. 10(a), as the target feature is linearly enhanced, the scattering and speckle are also enhanced.The intensity of the target is much higher than the speckle noise and sidelobe ghosting, making the target more visible, but some strongly scattered buildings are also enhanced and thus mistaken for the targets.
2) Analysis of the SBPA Module: To visually demonstrate the effect of the SBPA module, images with complex backgrounds and crowded multiscale ships from the SSDD dataset are selected for heatmap experiments on the GhostVS-Net backbone. The results in Fig. 11 show that SBPA aggregates the target features, accurately focuses on the targets, and effectively eliminates complex background interference. Compared with the baseline, it obtains effective target spatial context information, accurately distinguishes crowded multiscale targets, and achieves strongly robust detection.
From the results in Table III, it can be seen that the receptive field enhancement (RFE) module can significantly improve

D. Comparative Experiment
Two comparative experiments are conducted to verify the advanced nature of SPANet.The first experiment compares SPANet with the classic detection models, and the second experiment compares it with the advanced SAR ship detection models.
To further evaluate the robustness of SPANet, images from the SSDD dataset are randomly selected as input, and the detection performance of each algorithm is compared. In Fig. 12(a), the example image has complex land interference and contains crowded ships of different scales, making detection difficult. The image is divided into two areas, A and B. CenterNet and FCOS are unable to accurately distinguish the crowded targets in areas A and B. SSD and YOLOv7 correctly detect the targets in area B but cannot accurately distinguish the crowded targets in area A. SPANet, however, accurately identifies the near-shore and crowded targets, demonstrating its high-precision detection performance in complex scenes. Fig. 12(b) is a port image under heavy speckle noise, which contains many deceptive reefs and port buildings, making detection difficult. In area A, both Faster R-CNN and YOLOv4 mistake the reef for a ship. In area B, Faster R-CNN and YOLOv4 do not detect any ships, while CenterNet and YOLOX-Tiny only detect the more obvious port ships and miss the hidden ships in the port. SPANet demonstrates its powerful detection capability, maintaining high detection accuracy while discovering all targets.
The PR curve (see Fig. 13) demonstrates that SPANet covers the most area, proving its superior performance compared to the other algorithms.
To visually observe the capabilities of SPANet relative to advanced algorithms, we select LPEDet [52] and AFSar [26] for comparative experiments on the SSDD dataset; the results are shown in Fig. 14. Area A shows a dense small-target scene, where LPEDet produces false detections and AFSar misses small targets. Area B shows a small-scale target scene, where LPEDet mistakenly identifies a reef as the target. Area C shows a complex nearshore scene filled with deceptive reefs and nearshore buildings, where both LPEDet and AFSar mistake a port building for a ship. From these results, SPANet is more adaptable to complex scenes and dense small-target scenes, demonstrating its excellent detection ability.
The experimental results in Table V show that the proposed SPANet is significantly superior to the other algorithms. On the SSDD dataset, compared with the suboptimal algorithm, AFSar, SPANet has a 2% higher mAP and a 0.02 higher F1 score. On the HRSID dataset, compared with the suboptimal algorithm, BANet, SPANet has a 0.03 higher F1 score and a 4.8% higher precision. Additionally, SPANet achieves a balance between detection speed and accuracy, and has a much lower algorithmic complexity than the other advanced algorithms while maintaining superior performance.
Four images from the HRSID datasets are randomly selected, which are shown in Fig. 15, including complex land interference scenes at high resolution, crowded small target scenes with canals, multiscale target scenes in the open sea, and complex ship distribution scenes.Intuitively, the algorithm proposed has strong robustness and is able to effectively deal with complex background interference and crowded multiscale ships in SAR images.It can accurately locate and identify targets.

E. Generalization Experiment
In practical applications of SAR ship detection algorithms, the scenes are complex, the image resolution is high, and the ship scale is small. To evaluate the generalization performance of our algorithm, we use the LS-SSDD-v1.0 dataset, cut into 9000 sub-images with a resolution of 800 × 800 pixels each. These sub-images are used for detection and then stitched back together. A large image is randomly selected from the dataset (see Fig. 16), and areas A, B, and C are selected from the coastal and offshore regions for observation. In area A, the proposed algorithm effectively suppresses background interference and accurately identifies nearshore targets. The scene in area B is more complex, with crowded ships and difficult-to-detect targets, but the algorithm still maintains excellent detection performance. In area C, an open sea area, all targets are correctly located. These results demonstrate that SPANet effectively handles practical, complex scenarios.

V. CONCLUSION
To address the low detection efficiency caused by complex background interference, large scale differences, and crowded target distributions in SAR images, the anchor-free SAR ship detection algorithm SPANet is proposed. First, we design a feature extraction backbone (GhostVS-Net) that introduces efficient modules such as FOCUS and Ghost to extract the local feature information of ships in complex backgrounds, enhancing the network's ability to represent targets and making it better suited to multiscale target detection. Second, we design an SBPA module that fuses and enhances the global position information and spatial context information of targets while balancing local features under multiple receptive fields. This enables the network to effectively handle scenarios such as complex land interference and crowded target distributions, and strengthens its positioning capability.
We conducted extensive validation experiments on multiscene and multiscale datasets. Compared to the baseline model, our algorithm greatly improves detection accuracy and recall while reducing the model complexity to only 5.59G. It achieves an mAP of 99.7% on the SSDD dataset and 95.3% on the HRSID dataset, outperforming all current state-of-the-art SAR ship detection algorithms.
In the future, we plan to open-source our code and welcome collaboration with other researchers to further improve SAR ship detection. We hope to build a more accurate and stable algorithm for SAR target detection, and we will continue to explore the global-view potential of SAR images and investigate more efficient loss functions and training methods.

Fig. 4. Examples of dilated convolution. Asterisks denote the convolution kernels, and the shaded areas represent the receptive fields.
Pooling kernels of size (W, 1) and (1, H) are applied to encode each channel along the two spatial directions. The horizontally encoded output of channel c is z_c^w(w), and the vertically encoded output of channel c is z_c^h(h).
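The directional encoding described above can be illustrated with a minimal NumPy sketch. Average pooling is assumed here, matching the standard coordinate-attention formulation z_c^h(h) = (1/W) Σ_i x_c(h, i); the exact pooling used inside the SBPA module may differ.

```python
import numpy as np

def directional_pool(x):
    """Directional encoding of a feature map x with shape (C, H, W).
    Averaging over the width (a (W, 1)-style kernel) yields the vertical
    encoding z_h of shape (C, H); averaging over the height (a
    (1, H)-style kernel) yields the horizontal encoding z_w of shape
    (C, W)."""
    z_h = x.mean(axis=2)   # one value per row: position along H survives
    z_w = x.mean(axis=1)   # one value per column: position along W survives
    return z_h, z_w
```

Because each encoding collapses only one spatial axis, the position along the other axis is preserved, which is what lets the attention module recover global position information.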
where b_x, b_y, b_w, and b_h are the coordinates and dimensions of the true bounding box, and p_x, p_y, p_w, and p_h are the corresponding predicted values.
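The regression equation itself is not reproduced here, but the agreement between the true box (b_x, b_y, b_w, b_h) and the predicted box (p_x, p_y, p_w, p_h) is commonly measured by their IoU. The following is an illustrative sketch for boxes in center format, not the loss actually used by SPANet.

```python
def iou_center(b, p):
    """IoU between two boxes given in center format (x, y, w, h):
    b = (b_x, b_y, b_w, b_h) is the ground truth and
    p = (p_x, p_y, p_w, p_h) is the prediction."""
    # Convert center format to corner format (x1, y1, x2, y2).
    bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
    px1, py1, px2, py2 = p[0]-p[2]/2, p[1]-p[3]/2, p[0]+p[2]/2, p[1]+p[3]/2
    # Intersection width/height, clamped at zero for disjoint boxes.
    iw = max(0.0, min(bx2, px2) - max(bx1, px1))
    ih = max(0.0, min(by2, py2) - max(by1, py1))
    inter = iw * ih
    union = b[2] * b[3] + p[2] * p[3] - inter
    return inter / union if union > 0 else 0.0
```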

First, the SSDD dataset, with 1160 SAR images and 2456 ships collected by sensors such as RadarSat-2 and TerraSAR-X, is applied for the ablation and comparison experiments. Because SSDD has a small number of samples, random division may break the consistency of the distribution. Following the rules of SSDD, images whose names end with 1 or 9 are used as the test set, those ending with 8 as the validation set, and the rest as the training set. Both nearshore and offshore targets are included in the validation set, which ensures the distribution consistency of the training set and the test set. Each image is 640 × 640 pixels and is used for both training and testing. Fig. 8(a) shows examples from SSDD. The dataset link is https://pan.baidu.com/s/1rX2IiMUlpmZ2yuM_Kqwwwg, and the extraction password is 0pjr. Second, the high-resolution SAR dataset HRSID is applied for the comparison experiments. HRSID comes from Sentinel-1B and TerraSAR-X, and includes 5604 high-resolution SAR images and 16951 ships, covering different resolutions, polarizations, sea conditions, sea areas, and coastal ports. HRSID is a benchmark dataset for researchers to evaluate the effectiveness of their models. Each image is 800 × 800 pixels and is used for both training and testing. Fig. 8(b) shows examples from HRSID. The dataset link is https://pan.baidu.com/s/1y3mYCp6IV7gmfKcH2SsNnQ, and the extraction password is uazi. Finally, the large-scale SAR ship detection dataset LS-SSDD-v1.0 is applied for the generalization experiments.
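The fixed filename-based split of SSDD described above can be sketched as follows. This is an illustrative helper, not code from the original work; the handling of the file extension is an assumption.

```python
def split_ssdd(image_names):
    """Split SSDD image names by the fixed rule: names ending with
    1 or 9 -> test set, names ending with 8 -> validation set,
    all remaining names -> training set."""
    test, val, train = [], [], []
    for name in image_names:
        stem = name.rsplit('.', 1)[0]   # drop the file extension
        if stem.endswith(('1', '9')):
            test.append(name)
        elif stem.endswith('8'):
            val.append(name)
        else:
            train.append(name)
    return train, val, test
```

Because the rule is deterministic, every researcher reproduces exactly the same partition, which is what preserves the distribution consistency mentioned above.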

Fig. 12. Detection results of different detection algorithms on SSDD. The yellow boxes represent missed targets, and the blue boxes represent false detections. (a) Detection results of each algorithm in dense scenes. (b) Detection results of each algorithm in complex scenes.

Fig. 14. Detection results of different advanced detection algorithms on SSDD. The yellow circles mark false detections, while the blue circles mark missed targets. (a) Detection results of each algorithm in the offshore scene. (b) Detection results of each algorithm in the offshore scene. (c) Detection results of each algorithm in the nearshore scene.

Fig. 15. Detection results of SPANet on the HRSID dataset. Orange boxes mark the detected targets.

TABLE I ARCHITECTURE OF GHOSTVS-NET