SAR Ship Detection Based on End-to-End Morphological Feature Pyramid Network


Abstract:

Intelligent ship detection based on high-precision synthetic aperture radar (SAR) images plays a vital role in ocean monitoring and maritime management. Denoising is an effective preprocessing step for target detection. Morphological network-based denoising can effectively remove speckle noise, but its smoothing effect blurs image edges and reduces detection accuracy. Fusing edge extraction with the morphological network can improve detection accuracy by compensating for the edge information lost to smoothing. This article proposes an end-to-end lightweight network, called morphological feature-pyramid Yolo v4-tiny (Mor-FP Yolo v4-tiny), for SAR ship detection. First, a morphological network is introduced to preprocess the SAR images for speckle noise suppression and edge enhancement, providing spatial high-frequency information for target detection. Then, the original and preprocessed images are combined into a multichannel input for the convolution layers of the network. The feature pyramid fusion structure is used to extract both high-level semantic features and shallow detailed features from the image, improving multiscale target detection. Experiments on the public SAR ship detection dataset (SSDD) and AIR SARShip-1.0 show that the proposed method performs better than other convolutional neural network-based methods.
Page(s): 4599 - 4611
Date of Publication: 18 February 2022

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

Ship detection plays an important role in ocean inspection and maritime management [1]. Because SAR offers all-weather, day-and-night observation and high target resolution, SAR ship detection has been a research hotspot in recent years.

Traditional SAR ship detection mainly focuses on detecting ship targets and waves. Wave detection is not the mainstream approach because waves are not always present. Ship target detection methods are mainly based on statistical features [2]–[4], scattering characteristics [5], and transform domains [6]; among them, the constant false alarm rate method [7]–[11] is the most widely used classical approach. Although these methods perform well in some specific scenes, they have many shortcomings, such as a complex feature extraction process, sensitivity to speckle noise, and more false positives in complex backgrounds, which limit their accuracy. In addition, their complexity and heavy computation make it difficult for them to meet real-time requirements.

Recently, with the development of deep learning, methods based on convolutional neural networks (CNNs) have become the mainstream for target detection. Deep learning learns image features automatically and has stronger feature extraction capabilities. Deep learning-based target detection algorithms are mainly divided into one-stage [12]–[17] and two-stage [18]–[22] detection algorithms according to the method of anchor regression. The two-stage methods generally achieve higher accuracy. Lin et al. [23] combined the squeeze-and-excitation mechanism with Faster R-CNN to improve detection performance. Jiao et al. [24] introduced dense connections and a new training strategy into Faster R-CNN to reduce the weight of easy examples, which improved the detection of small-scale ships and reduced the interference of complex inshore backgrounds. The one-stage detectors use a variety of techniques to reduce the amount of computation, which has made them the mainstream target detection algorithms in practical applications. Qi et al. [25] proposed a one-stage detector based on the attention mechanism, which improved the detection accuracy for small objects against complicated backgrounds. Yang et al. [26] proposed an improved one-stage object detection framework based on RetinaNet and rotatable bounding boxes, which performed well in rotated target detection. Some lightweight algorithms with more practical value, including Yolo v4-tiny and Yolox-tiny, have been proposed for higher detection efficiency, but few have been used for ship detection.

A common problem in current methods is that SAR images are processed as if they were optical images, while their distinctive characteristics are ignored. SAR images are produced in the microwave/millimeter-wave band, and they differ from optical images, which consist of image data acquired by visible and partial infrared band sensors. On the one hand, SAR images contain the information of only a single band, while optical images usually consist of three color channels, such as RGB or HSV. This difference makes CNN-based algorithms designed for optical images not entirely applicable to SAR image detection. On the other hand, SAR images contain a large amount of speckle noise caused by the basic principle of coherent imaging, which causes shadows on the image and lowers the signal-to-noise ratio, degrading the SAR target detection results. Driven by these problems, detection algorithms that combine CNNs with feature extraction and fusion algorithms have recently become a crucial research direction. Qin et al. [27] introduced a wavelet speckle reduction network into the CNN framework for target recognition and achieved high test accuracy. Jiang et al. [28] adopted non-subsampling Laplacian pyramid decomposition (NSLP) as a preprocessing step to extract features and fed them into CNNs, achieving excellent performance. J. Ai et al. [29] combined the Haar wavelet transform with a CNN to classify ships in SAR images and achieved superior discrimination.

Although the above algorithms have achieved good results, their feature extraction methods are relatively fixed. Because they use fixed kernel functions and parameters, the feature extraction algorithms do not fully fit the CNNs. Treating the kernel functions and parameters as variables that are adjusted adaptively through backpropagation yields more suitable features, which are conducive to target detection and provide more accurate information for the network. Morphological image processing has achieved great success in many areas, including image segmentation [30]–[32], object shape detection [33]–[35], and filtering [36]–[39]. The morphological network proposed by R. Mondal [40] combines traditional morphological algorithms with the idea of convolution kernels so that morphological operators can be trained through backpropagation. However, the multilayer stacked structure of the deep morphological network is not suitable for SAR image detection algorithms. Therefore, this article proposes a structure that combines the morphological network with an edge extraction algorithm and integrates it into the SAR ship detection network. Considering the real-time requirements of SAR image detection, this article takes Yolo v4-tiny as the basic network because it offers a good balance between detection accuracy and speed.

Moreover, because ship scales vary widely in SAR images, detection methods have great difficulty with multiscale ships, especially small-scale ships, which have similar characteristics to clustered targets and may be lost in the deep feature maps [41], [42]. To improve the detection of multiscale ships, a feature pyramid fusion structure is incorporated into the network.

The main contributions of this article can be summarized as follows.

  1. A novel image preprocessing structure is proposed that combines the deep morphological network with edge extraction, so that features with edge information and less noise are provided to the detection network. This offers more effective prior information and makes the detection network more effective.

  2. The feature pyramid fusion structure is introduced into Yolo v4-tiny to obtain detailed information for detection. By extracting multilevel features, the network obtains feature maps of different sizes, which enhances its ability to detect multiscale targets, especially small ones.

The rest of this article is organized as follows. Section II presents an introduction of the proposed method. Section III describes experiments and results analysis. Finally, Section IV concludes this article.

SECTION II.

Methodology

In this section, we describe Mor-FP Yolo v4-tiny in detail. First, an overview of the proposed method is presented. Then, the structure of the morphological preprocessing module is explored. Finally, the feature pyramid fusion structure is described.

A. Processing Flow of the Mor-FP Yolo v4-tiny

The processing procedure of the Mor-FP Yolo v4-tiny is shown in Fig. 1.

Fig. 1. Processing flow of the proposed network.

First, the original images are sent into the morphological preprocessing module, which reduces the speckle noise and enhances the edge information with trainable morphological kernels that are adjusted by the subsequent network. This yields feature maps with the same size and relative position as the original images. Second, the feature maps are combined with the original images to recover the information lost in the denoising and edge extraction procedures. Third, a Yolo v4-tiny network combined with the feature pyramid fusion structure and tailored for SAR ship detection is designed. Finally, the Yolo head module gives the final results.

B. Feature Enhancement Based on Morphological Preprocessing Module

The morphology preprocessing module is composed of denoising and contour extraction. The generated feature map is used to expand the image channel to construct a multichannel SAR image.

1) Construction of Morphological Module

Classical morphological algorithms are defined by dilation and erosion, which have great effects on image processing such as boundary extraction and denoising. Considering the SAR images processed in this article are all grayscale images, the grayscale morphological operations are used during processing.

Let f(x,y) be the original grayscale image and b(x,y) be the structuring element. The equations defining dilation and erosion are
\begin{align*}
(f \oplus b)(s,t) &= \max \{ f(s - x, t - y) + b(x,y) \mid (s - x),(t - y) \in D_f,\ (x,y) \in D_b \} \tag{1}\\
(f \Theta b)(s,t) &= \min \{ f(s + x, t + y) - b(x,y) \mid (s + x),(t + y) \in D_f,\ (x,y) \in D_b \} \tag{2}
\end{align*}
where \oplus is dilation and \Theta is erosion. {D_f} and {D_b} represent the domains of definition of f and b, respectively.
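Equations (1) and (2) can be checked directly on small arrays. The following is a minimal pure-Python sketch, not the trainable layer used in the article: it assumes an odd-sized kernel whose origin sits at its centre, and it skips samples that fall outside the image domain, as the conditions on D_f require. The function names `dilate` and `erode` are illustrative.

```python
def dilate(f, b):
    """Grayscale dilation, Eq. (1): max of f(s - x, t - y) + b(x, y)."""
    H, W = len(f), len(f[0])
    ph, pw = len(b) // 2, len(b[0]) // 2  # assumed: kernel origin at its centre
    return [[max(f[s - x][t - y] + b[x + ph][y + pw]
                 for x in range(-ph, ph + 1)
                 for y in range(-pw, pw + 1)
                 if 0 <= s - x < H and 0 <= t - y < W)  # stay inside D_f
             for t in range(W)]
            for s in range(H)]


def erode(f, b):
    """Grayscale erosion, Eq. (2): min of f(s + x, t + y) - b(x, y)."""
    H, W = len(f), len(f[0])
    ph, pw = len(b) // 2, len(b[0]) // 2
    return [[min(f[s + x][t + y] - b[x + ph][y + pw]
                 for x in range(-ph, ph + 1)
                 for y in range(-pw, pw + 1)
                 if 0 <= s + x < H and 0 <= t + y < W)
             for t in range(W)]
            for s in range(H)]


f = [[0, 0, 0],
     [0, 5, 0],
     [0, 0, 0]]
flat = [[0] * 3 for _ in range(3)]  # flat (all-zero) 3x3 structuring element

print(dilate(f, flat))  # the bright pixel spreads over its 3x3 neighbourhood
print(erode(f, flat))   # the isolated bright pixel is removed
```

With a flat kernel these reduce to local maximum and minimum filters; nonzero kernel values bias which neighbours win, which is the degree of freedom the trainable version exploits.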

The result of image processing is strongly influenced by the shape and values of the kernel. Morphological kernels are usually chosen as fixed shapes with fixed values, such as ellipses, rectangles, and crosses, and the choice relies on professional experience. Therefore, we convert the morphological kernel b(x,y) into a set of trainable parameters that is initialized randomly and optimized through backpropagation, so that the morphological module adaptively matches the target detection task in both directivity and boundary thickness. The proposed morphological module is composed of such trainable morphological layers.

The morphological module is composed of a denoising part and an edge extraction part, as shown in the orange dashed box in Fig. 2. Considering that the speckle noise in the SAR image is bright noise, an opening operator is used to reduce it. The opening operator is an erosion followed by a dilation. The erosion operator sets the central pixel to the minimum of the differences between the adjacent pixels and the kernel values, while the dilation operator sets the central pixel to the maximum of the sums of the adjacent pixels and the kernel values. As a result, the opening operator blurs the image, and the larger the kernels, the blurrier the image becomes. In this model, the kernels are therefore set as small as possible, i.e., 3 × 3.
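The denoising effect of opening can be seen on a toy example. The sketch below is an illustration under stated assumptions rather than the trained layer: it uses a flat, untrained 3 × 3 kernel, a centred kernel origin, and a hypothetical 7 × 7 scene with one 3 × 3 "ship" and one isolated bright speckle pixel. The helper names `morph` and `opening` are illustrative.

```python
def morph(f, b, dilation):
    """Grayscale dilation (Eq. (1)) if `dilation` is true, else erosion (Eq. (2))."""
    H, W = len(f), len(f[0])
    ph, pw = len(b) // 2, len(b[0]) // 2  # assumed: odd kernel, origin at centre
    sgn = -1 if dilation else 1           # dilation reads f(s - x, t - y)
    pick = max if dilation else min
    out = []
    for s in range(H):
        row = []
        for t in range(W):
            vals = [f[s + sgn * x][t + sgn * y] - sgn * b[x + ph][y + pw]
                    for x in range(-ph, ph + 1)
                    for y in range(-pw, pw + 1)
                    if 0 <= s + sgn * x < H and 0 <= t + sgn * y < W]
            row.append(pick(vals))
        out.append(row)
    return out


def opening(f, b):
    """Erosion followed by dilation with the same kernel b."""
    return morph(morph(f, b, dilation=False), b, dilation=True)


# Toy 7x7 scene: a 3x3 "ship" of intensity 10 plus one isolated bright speckle.
image = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 10
image[0][6] = 8  # speckle pixel

flat = [[0] * 3 for _ in range(3)]  # untrained, flat 3x3 kernel
opened = opening(image, flat)
print(opened[0][6], opened[3][3])  # prints: 0 10 (speckle suppressed, ship kept)
```

Structures smaller than the kernel vanish during erosion and cannot be restored by the dilation, which is why a small 3 × 3 kernel limits the loss of real detail.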

Fig. 2. Design of morphological module.

Fig. 3 shows the results of the morphological operation. Fig. 3(a) shows the original images and Fig. 3(b) shows the images after the denoising operation, which are visibly more blurred. The edge extraction module combined with the morphological network is then applied to the denoised images to enhance their edge information.

Fig. 3. Results of morphological operation. (a) Original images. (b) Denoising images. (c) Images after edge enhancement.

The edge extraction part is composed of dilation, closing, and subtraction
\begin{equation*} g = (f \oplus b_1) - (f \bullet b_2) \tag{3} \end{equation*}
where \bullet designates the grayscale closing operator. {b_1} and {b_2} represent different morphological elements trained by the network.

The dilation operator coarsens the image, and the closing operator has a smoothing effect with a weaker coarsening effect than dilation. Because homogeneous regions are unaffected by both operators, the subtraction tends to eliminate them. What remains is the edge of each region, which gives a difference-like effect, as shown in Fig. 3(c). The preprocessing reduces the impact of speckle noise, enhances the boundary information, and provides clearer prior information about the target boundary to the subsequent detection network, so that the entire network performs better.
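The behaviour of Eq. (3) on homogeneous regions and boundaries can be illustrated as follows. This is a sketch with flat, untrained 3 × 3 kernels standing in for the trained elements b_1 and b_2, and it assumes that closing is a dilation followed by an erosion with the same kernel; the helper names are illustrative.

```python
def morph(f, b, dilation):
    """Grayscale dilation (Eq. (1)) if `dilation` is true, else erosion (Eq. (2))."""
    H, W = len(f), len(f[0])
    ph, pw = len(b) // 2, len(b[0]) // 2  # assumed: odd kernel, origin at centre
    sgn = -1 if dilation else 1
    pick = max if dilation else min
    return [[pick(f[s + sgn * x][t + sgn * y] - sgn * b[x + ph][y + pw]
                  for x in range(-ph, ph + 1)
                  for y in range(-pw, pw + 1)
                  if 0 <= s + sgn * x < H and 0 <= t + sgn * y < W)
             for t in range(W)]
            for s in range(H)]


def closing(f, b):
    """Dilation followed by erosion with the same kernel b."""
    return morph(morph(f, b, dilation=True), b, dilation=False)


def edge_map(f, b1, b2):
    """Eq. (3): g = (f dilated by b1) - (f closed by b2)."""
    dilated = morph(f, b1, dilation=True)
    closed = closing(f, b2)
    return [[d - c for d, c in zip(drow, crow)]
            for drow, crow in zip(dilated, closed)]


# Toy 7x7 scene with a homogeneous 3x3 target of intensity 10.
image = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 10

flat = [[0] * 3 for _ in range(3)]
g = edge_map(image, flat, flat)
for row in g:
    print(row)  # a one-pixel ring around the target; interior and background are 0
```

The target interior and the background cancel in the subtraction, so only the band that dilation adds around the target survives.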

Finally, we incorporate a morphological module into the network as the first few layers, which, together with the subsequent Yolo v4-tiny, form the entire target detection network.

2) Multichannel Image Construction

The enhanced images obtained after the edge extraction operation are shown in Fig. 3(c). The original images are then combined with the images processed by the morphological module. Specifically, the original images are taken as the first and second channels, and the morphology-enhanced images are taken as the third channel, as shown in Fig. 2. This method retains all the information of the original images while enhancing the edge information. Compared with using the original image for all three channels, it reduces the influence of speckle noise, enhances high-frequency contour information, and greatly enriches the information contained in the training data. In summary, the data from the preprocessing part guide the network in mining and selecting features from both the original image and the edge information. The mined and selected features are then mapped into the CNN to improve model training efficiency, reduce noise interference, and enhance the sensitivity and accuracy of detection.
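The channel layout described above can be sketched in a few lines. The function name `to_three_channels` and the nested-list image representation are assumptions made for illustration; in a real pipeline the same stacking would be done on tensors inside the network.

```python
def to_three_channels(original, enhanced):
    """Build an H x W x 3 image: channels 1 and 2 are the original SAR image,
    channel 3 is the morphology-enhanced image."""
    if len(original) != len(enhanced) or any(
            len(o) != len(e) for o, e in zip(original, enhanced)):
        raise ValueError("original and enhanced images must have the same size")
    return [[(o, o, e) for o, e in zip(orow, erow)]
            for orow, erow in zip(original, enhanced)]


original = [[1, 2],
            [3, 4]]
enhanced = [[0, 9],
            [9, 0]]
stacked = to_three_channels(original, enhanced)
print(stacked[0][1])  # (2, 2, 9): original, original, enhanced
```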

C. FP-Based Yolo v4-Tiny

SAR ship detection is mainly used in the military domain and marine safety, which place high demands on timeliness, so it is necessary to choose a lightweight and efficient detection algorithm as the backbone of the network. Therefore, we take the Yolo v4-tiny network as the basic detection algorithm.

Based on Yolo v3, Yolo v4 takes CSPDarknet53 as the backbone of the network and introduces SPP and PANet into the network to enhance the effect of feature extraction.

Yolo v4-tiny is a lightweight version of Yolo v4. To achieve a faster detection speed, Yolo v4-tiny takes the CSPdarknet-tiny network as the backbone, which consists, in order, of two convolution layers, three CSPBlock modules, and two convolution layers. The CSPBlock module follows the structure of CSPNet, adopting convolution layers and the ResNet structure in local transition layers. In other words, the CSPBlock module is composed of convolution layers, skip connections, and feature concatenation, as shown in Fig. 4. Compared with the 60 million parameters of Yolo v4, Yolo v4-tiny has only 6 million parameters, which gives it better training and detection speed.

Fig. 4. Structure of CSPBlock.

Yolo v4 uses three feature maps as the input of the detection part, while Yolo v4-tiny uses only two smaller feature maps, which lowers the detection rate for small targets. However, in SAR ship images, the targets occupy a small area and their sizes vary widely. High-level features carry abundant semantic information, while low-level features have better target resolution and more target details. To improve the detection rate of small targets, the network therefore needs to integrate more low-level detailed information. Doing so helps extract not only the semantic features of large-scale targets but also the detailed features of small-scale targets, ensuring the detection capability and accuracy for multiscale targets.

Therefore, we add a detection module for shallow feature maps with larger sizes into the network to detect small targets, which is called a feature pyramid fusion structure. We take the feature maps of 52 × 52 to fuse with other feature maps of 26 × 26 and 13 × 13 based on the actual size of the target. The constructed network is shown in Fig. 5, where the red part is the added part of the network.
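The three grid sizes follow from the 416 × 416 input used in the experiments if the detection heads are assumed to sit at the usual Yolo downsampling strides of 8, 16, and 32 (the strides are an assumption here; the article states only the grid sizes). The short sketch below also counts candidate boxes, using the three anchors per scale stated in the experimental settings.

```python
def grid_sizes(input_size=416, strides=(8, 16, 32)):
    """Side length of each detection grid for a square input."""
    return [input_size // s for s in strides]


def num_predictions(input_size=416, strides=(8, 16, 32), anchors_per_scale=3):
    """Total number of candidate boxes over all detection grids."""
    return sum(g * g * anchors_per_scale for g in grid_sizes(input_size, strides))


print(grid_sizes())                    # [52, 26, 13]
print(grid_sizes(strides=(16, 32)))    # [26, 13], the two heads of plain Yolo v4-tiny
print(num_predictions())               # 10647
```

The added 52 × 52 grid contributes most of the candidate boxes, each covering an 8 × 8 pixel cell, which is what makes it suitable for small ships.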

Fig. 5. Structure of FP-based Yolo v4-tiny.

SECTION III.

Experimental Results

In this section, experiments are conducted to evaluate the detection performance of Mor-FP Yolo v4-tiny. First, the datasets and the detailed experimental settings are described. Then, the best combination for the morphological preprocessing module is explored, and ablation experiments are conducted to evaluate each component of the network. Finally, the proposed algorithm is compared with other detection methods.

A. Datasets and Settings

In our experiments, the performance of the proposed method is evaluated and analyzed on two different SAR ship datasets: SSDD and AIR SARShip-1.0.

1) SSDD: The SSDD dataset was constructed by Li et al. [43]. It contains multiscale ships in different environments, covering different polarization modes, resolutions, and scenes. The data are mainly obtained from RadarSat-2, TerraSAR-X, and Sentinel-1 sensors with four polarization modes: HH, HV, VV, and VH. The resolution of the SAR images in the dataset ranges from 1 to 15 m. Ship targets appear in a wide variety of scenes, both in large areas of open sea and near the shore.

There are 1160 images in the dataset, containing 2456 ships. The images are cut to about 500 × 500 pixels and labeled manually. To make the dataset easier to process, we resize the images to 416 × 416, as shown in Fig. 6, and convert the annotation information into a standard XML format. The label of each target is represented as (x,y,w,h), where (x,y) denotes the top-left coordinate of the rectangular label, w is the width of the box, and h is its height.
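When an image is resized to 416 × 416, its (x,y,w,h) labels must be scaled by the same factors. The sketch below assumes a plain resize that stretches each axis independently (no letterbox padding), which the article does not specify; the function name `rescale_box` is illustrative.

```python
def rescale_box(box, src_size, dst_size=(416, 416)):
    """Scale an (x, y, w, h) label, with (x, y) the top-left corner, from an
    image of size src_size = (width, height) to dst_size = (width, height)."""
    x, y, w, h = box
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx, sy = dst_w / src_w, dst_h / src_h  # independent scale per axis
    return (x * sx, y * sy, w * sx, h * sy)


# A 200 x 100 box at (100, 50) in a 500 x 500 image.
print(rescale_box((100, 50, 200, 100), (500, 500)))
```

Each coordinate is multiplied by 416/500 = 0.832 in this example.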

Fig. 6. Sample images and labels on SSDD.

2) AIR SARShip-1.0: The AIR SARShip-1.0 dataset is a multiscenario, multimode SAR ship dataset published by the Aerospace Information Research Institute [44]. The dataset contains 31 large views with 1-m and 3-m spatial resolution under single polarization from Gaofen-3, a C-band multipolarization high-resolution SAR satellite.

The images in the dataset cover different sea conditions, scenes, and numbers of ships, and most of them are 3000 × 3000 pixels. To make the dataset easier to process, we cut the raw images into 416 × 416 patches, which yields 930 images with ship targets. The annotation format is the same as for SSDD.

3) Settings: In the experiments, the datasets are randomly divided into a training set, a validation set, and a test set at a ratio of 8:1:1. Before training, we obtain the sizes of the anchor boxes with the K-means algorithm. Since the Yolo head at each scale uses three anchors, we obtain nine anchors, whose sizes are listed in Tables I and II for SSDD and AIR SARShip-1.0, respectively.
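Anchor clustering of this kind can be sketched as follows. The article states only that K-means is used; the 1 − IoU distance on (w, h) pairs (implemented here as picking the centre with the highest IoU), the mean update, and the fixed random seed are assumptions borrowed from common Yolo practice, and the box list in the usage example is made up.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (w, h), assumed to share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors, returned sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # initialise from k distinct list entries
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Nearest centre = highest IoU (smallest 1 - IoU distance).
            best = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[best].append(box)
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]  # keep the old centre if a cluster is empty
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


# Hypothetical label sizes: two small ships and two large ones.
boxes = [(10, 10), (12, 12), (200, 200), (210, 210)]
print(kmeans_anchors(boxes, k=2))
```

With k = 9, the three smallest anchors would typically be assigned to the 52 × 52 head and the three largest to the 13 × 13 head.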

TABLE I Sizes of Nine Anchors on SSDD
TABLE II Sizes of Nine Anchors on AIR SARShip-1.0

Because the number of remote sensing images is limited, CSPdarknet-tiny is pretrained on the PASCAL VOC dataset and then fine-tuned to adapt to remote sensing images. The commonly used Adam algorithm [45] is taken as the gradient optimization algorithm. The training process is divided into two stages. In the first stage, we freeze the parameters in the backbone and train the parameters in the morphological operation part, feature pyramid fusion part, and detection part. In this stage, the initial learning rate is set to 0.001, the batch size is set to 32, the number of training epochs is set to 50, and cosine annealing is used to adjust the learning rate. In the second stage, the backbone is unfrozen and training continues for another 100 epochs with a lower initial learning rate of 0.0001. The other settings are the same as in the first stage.
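The two-stage learning-rate schedule can be sketched with the standard cosine annealing formula. The minimum learning rate of 0 and the choice to anneal over exactly the length of each stage are assumptions; the article gives only the initial rates and epoch counts, and the actual Keras callback configuration is not stated.

```python
import math


def cosine_annealing(epoch, total_epochs, lr_max, lr_min=0.0):
    """Cosine decay from lr_max at epoch 0 towards lr_min at total_epochs."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


def two_stage_lr(epoch):
    """Learning rate for a global epoch index in [0, 150).

    Epochs 0-49: backbone frozen, initial rate 1e-3.
    Epochs 50-149: backbone unfrozen, initial rate 1e-4.
    """
    if epoch < 50:
        return cosine_annealing(epoch, 50, 1e-3)
    return cosine_annealing(epoch - 50, 100, 1e-4)


for e in (0, 25, 49, 50, 100, 149):
    print(e, two_stage_lr(e))
```

The rate restarts at the lower value of 0.0001 when the backbone is unfrozen, so the pretrained weights are only gently adjusted.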

All experiments are implemented in the Keras framework and carried out on a computer with an Nvidia GeForce RTX 3090 card. The operating system is Linux, with cuDNN v8.

B. Evaluation Criteria

To evaluate the performance of the detection algorithm quantitatively, we adopt the following indexes to compare different algorithms: precision, recall, average precision (AP), and F1-score.

These evaluation criteria are calculated based on four components: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). In this article, TP and TN indicate the numbers of correctly detected ships and correctly identified backgrounds, respectively. FP represents the number of false alarms, and FN is the number of undetected ships. To judge whether a detected box is correct, Intersection over Union (IoU) is introduced. IoU is the ratio of the intersection to the union of the predicted bounding box and a single ground-truth box
\begin{equation*} \text{IoU} = \frac{S_\cap}{S_\cup} \tag{4} \end{equation*}
where {S_\cap} denotes the area of the intersection of the predicted box and the ground-truth box, while {S_\cup} is the area of their union.

A detected box is judged correct if its IoU is greater than a threshold, which is set to 0.5 here.
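Equation (4) and the 0.5 threshold can be written out for the (x,y,w,h) label format used in this article, with (x,y) the top-left corner. This is a minimal sketch; the function names `iou` and `is_correct` are illustrative.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h              # S_cap
    union = aw * ah + bw * bh - inter      # S_cup
    return inter / union if union > 0 else 0.0


def is_correct(pred, truth, threshold=0.5):
    """A detection counts as correct if its IoU exceeds the threshold."""
    return iou(pred, truth) > threshold


print(iou((0, 0, 10, 10), (5, 0, 10, 10)))         # overlap 50, union 150
print(is_correct((0, 0, 10, 10), (1, 1, 10, 10)))  # overlap 81, union 119
```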

The formulas of precision rate, recall rate, AP, and F1-score are as follows:
\begin{align*}
\text{precision} &= \frac{\text{TP}}{\text{TP} + \text{FP}} \tag{5}\\
\text{recall} &= \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{6}\\
\text{AP} &= \int_{0}^{1} P(R)\,dR \tag{7}\\
F1 &= \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}. \tag{8}
\end{align*}

Precision is the proportion of predicted targets that are correct. Recall is the proportion of all true targets that are correctly located and identified. We use AP and F1-score to evaluate the balance between precision and recall. AP is the average precision over the recall interval from 0 to 1, and F1-score is the harmonic mean of precision and recall. The higher these two values, the better the detection performance.
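Equations (5)–(8) can be computed from detection counts and a ranked list of detections. The sketch below approximates the integral in Eq. (7) with the all-point interpolated area under the precision-recall curve, which is one common convention and an assumption here, since the article does not state how the integral is evaluated. The detection list in the usage example is made up.

```python
def precision_recall_f1(tp, fp, fn):
    """Eqs. (5), (6), and (8) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(scored_hits, num_gt):
    """Eq. (7) as the area under the interpolated precision-recall curve.

    scored_hits: list of (confidence, is_true_positive) per detection.
    num_gt: total number of ground-truth ships.
    """
    hits = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in hits:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolation: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(precision_recall_f1(tp=8, fp=2, fn=2))
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2))
```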

C. Evaluation of Morphological Structures

In this section, some trainable morphological operators are selected for comparative experiments that are carried out on SSDD.

First, four denoising operators are selected for experiments and are shown as follows:
\begin{align*}
c_1 &= f \oplus a_1 \tag{9}\\
c_2 &= f \Theta a_2 \tag{10}\\
c_3 &= f \bullet a_3 \tag{11}\\
c_4 &= f \circ a_4 \tag{12}
\end{align*}
where \oplus is the dilation, \Theta is the erosion, and \circ and \bullet designate the grayscale opening and closing operators, respectively. {a_1}, {a_2}, {a_3}, and {a_4} represent different morphological elements trained by the network.

We take Yolo v4-tiny combined with the feature pyramid fusion module as the basic network. Except for the morphological denoising operator, the network structures and parameter settings are all the same. Table III provides the results of the denoising operations. The methods with different morphological denoising operators perform differently: as the boundaries blur, some information is lost and the detection performance becomes worse. Among these operators, opening and closing, which preserve more information, perform better than dilation and erosion. The method with {c_4} performs the best at a similar testing speed.

TABLE III Comparison of Different Denoising Operators

Next, we take the same settings as for the denoising operators and select four edge extraction operators, which are shown as follows:
\begin{align*}
g_1 &= (f \bullet b_1) - f \tag{13}\\
g_2 &= f - (f \circ b_2) \tag{14}\\
g_3 &= (f \oplus b_3) - (f \bullet b_4) \tag{15}\\
g_4 &= (f \circ b_5) - (f \Theta b_6) \tag{16}
\end{align*}
where {b_1}–{b_6} represent different morphological elements trained by the network.

The detection performance of these operators is given in Table IV. It can be observed that the methods combined with morphological edge extraction perform better than the basic network. Among them, the edge extraction method {g_3} has the best overall detection performance, with an AP of 95.44% and an F1-score of 0.91. The testing time of the method increases by approximately 0.02 s.

TABLE IV Comparison of Different Edge Extraction Operators

Then, we explore the methods that combine the above two parts. The morphological operators are shown as follows:
\begin{align*}
h_1 &= ((f \circ d_1) \oplus d_2) - ((f \circ d_1) \bullet d_3) \tag{17}\\
h_2 &= ((f \bullet d_4) \oplus d_5) - ((f \bullet d_4) \bullet d_6) \tag{18}\\
h_3 &= ((f \circ d_7 \bullet d_8) \oplus d_9) - ((f \circ d_7 \bullet d_8) \bullet d_{10}) \tag{19}\\
h_4 &= ((f \bullet d_{11} \circ d_{12}) \oplus d_{13}) - ((f \bullet d_{11} \circ d_{12}) \bullet d_{14}) \tag{20}
\end{align*}
where {d_1}–{d_{14}} represent different morphological elements trained by the network.

The results of the edge extraction methods combined with denoising are given in Table V. Combining the two operations clearly gives a better effect. Among the methods in Table V, the first two structures perform better than the last two, because more valid image information is removed as the denoising structure grows, while the testing time also increases. As mentioned above, the method with closing expands the boundaries of the target and makes detection more difficult. The method with {h_1} offers the best combination of accuracy and efficiency. Fig. 7 further supports this conclusion, showing that {h_1} performs better than the other methods compared. In summary, the use of trainable morphological denoising and edge extraction modules in the detection network can obtain more accurate boundary information and improve detection performance.

TABLE V Comparison of Different Preprocessing Methods

Fig. 7. Effect of edge extraction methods combined with denoising. (a) Ground truth. (b) Results of {h_1}. (c) Results of {h_2}. (d) Results of {h_3}. (e) Results of {h_4}. The green rectangles in (a) are correct ship targets. The red rectangles in (b)–(e) indicate detected targets.

D. Ablation Study

In this section, three ablation experiments are conducted on SSDD to evaluate the effects of the morphological preprocessing module and the feature pyramid fusion structure in detail.

The first experiment adopts the original Yolo v4-tiny. The second experiment adds the feature pyramid fusion structure to Yolo v4-tiny. The third experiment adds the morphological module as the preprocessing step of the network in the second experiment. Except for the feature pyramid fusion structure and the morphological module, the structures and parameter settings of the three networks are all the same.

Some test results can be seen in Fig. 8. Fig. 8(a) shows the ground truth, while Fig. 8(b)–(d) show the detection results of the above three methods, respectively. In Fig. 8(a), the green rectangles represent correct ship targets, while in Fig. 8(b)–(d), the red rectangles are the detection results. In both the ground truth and the detection results, a single rectangle represents a single target. Different scenes are shown in Fig. 8, including inshore and offshore locations, distant and close views, and clear and noise-degraded images.

Fig. 8. Effect of ablation studies. (a) Ground truth. (b) Results of Yolo v4-tiny. (c) Results of FP-based Yolo v4-tiny. (d) Results of Mor-FP Yolo v4-tiny. The green rectangles in (a) are correct ship targets. The red rectangles in (b)–(d) indicate detected targets.

The pyramid fusion of multiscale features captures both the high-level semantic information and the detailed information of the image, and brings more detail than Yolo v4-tiny. As shown in the first three rows of Fig. 8(c), the detection of small targets is improved. However, the detection effect is still unsatisfactory for inshore targets and for targets that are close to each other or overlapping, as shown in the fourth to sixth rows of Fig. 8(c). The morphological module is then added to the FP-based Yolo v4-tiny as the preprocessing stage of the detection network, which reduces speckle noise, brings more edge information, and enhances the detection effect. As shown in the fourth to sixth rows of Fig. 8(d), the algorithm detects more targets and locates them more accurately than the aforementioned methods. In the fifth row of Fig. 8, some ships are located in the lower part of the image, close to the coast and to each other, and are challenging to distinguish. Yolo v4-tiny and FP-based Yolo v4-tiny detect them only partially or not at all, while Mor-FP Yolo v4-tiny detects them more accurately with the help of the morphological module. In summary, the proposed method achieves better detection performance than the other methods.

The evaluation indicators of the three methods are listed in Table VI. The feature pyramid fusion model improves the recall, AP, and F1-score. In addition, the preprocessing module improves all evaluation criteria. In particular, the Mor-FP Yolo v4-tiny is 8% and 0.05 higher than Yolo v4-tiny in terms of AP and F1-score, respectively. Because the F1-score is the harmonic mean of precision and recall, its improvement indicates a better balance between the two. Meanwhile, the number of parameters increases, and the testing time increases by 0.03 s in total, as given in Table VI. These results confirm that the proposed method, which adopts the feature pyramid fusion model and the morphological model, achieves good detection performance for multiscale ships in SAR images.
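
The link between the F1-score and the balance of precision and recall can be made concrete with a short sketch. It uses the standard harmonic-mean definition of F1; the precision and recall values below are hypothetical and are not taken from Table VI.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical detectors with the same arithmetic mean (0.85) of the two rates.
unbalanced = f1_score(0.95, 0.75)  # high precision, low recall
balanced = f1_score(0.85, 0.85)    # equal precision and recall

# The harmonic mean penalizes the gap between the two rates.
print(unbalanced < balanced)
```

With equal precision and recall, the F1-score equals that common value; any imbalance at the same average pulls it lower.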

TABLE VI Ablation Studies of the Proposed Method

E. Comparison With CNNs

In this section, several CNN-based ship detection methods are compared with the Mor-FP Yolo v4-tiny, including SSD, CenterNet, Yolo v4, Yolo v4-tiny, and Yolox-tiny. They are tested on SSDD and AIR SARShip-1.0. Because the datasets contain images of different polarization modes and resolutions, the experiments also test the robustness of the method. All methods use the same training and test sets as the proposed method and are run with their default parameters. To evaluate overall detection performance quantitatively, the evaluation criteria described in Section III-B are used, and the results are given in Tables VII and VIII.
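
For readers who want to reproduce an AP figure of the kind reported here, the following is a minimal sketch of one common definition: the area under the precision-recall curve with all-point interpolation. It is an assumption that this matches the definition in Section III-B, and the function name and inputs are hypothetical. Each detection is assumed to have already been marked as a true or false positive (for example, by an overlap test against the ground truth).

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    # scores: confidence of each detection
    # is_true_positive: True if the detection matched a ground-truth ship
    # num_ground_truth: total number of ground-truth ships in the test set
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Walk through detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Replace each precision by the best precision at any higher recall,
    # which makes the curve non-increasing (all-point interpolation).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum the rectangles under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Missed targets lower the final recall and therefore cap the AP, which is why a detector with high precision but low recall can still score a modest AP.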

As can be seen from the results on SSDD in Table VII, the Mor-FP Yolo v4-tiny achieves an AP of 96.36%, which is 6.4%, 3.0%, 6.2%, 8.1%, and 1.4% higher than SSD, CenterNet, Yolo v4, Yolo v4-tiny, and Yolox-tiny, respectively. Although SSD, CenterNet, and Yolo v4 achieve high precision, their recall is unsatisfactory, which leads to an imbalance between precision and recall and a lower F1-score. The proposed method also has fewer parameters: about 1/3 of SSD, 1/5 of CenterNet, and 1/10 of Yolo v4. On the other hand, the lightweight networks Yolo v4-tiny and Yolox-tiny have higher detection efficiency but lower detection accuracy. As shown in Fig. 9, Mor-FP Yolo v4-tiny strikes a balance among accuracy, model size, and detection speed, with faster detection than the classic CNNs and better accuracy than the other lightweight networks.

To further verify the generalization ability of the proposed method, all the algorithms are also tested on AIR SARShip-1.0. The results are given in Table VIII and Fig. 10. The proposed method also performs well on AIR SARShip-1.0, which demonstrates its generalization ability and robustness. Considering the requirements of high precision, high efficiency, and a lightweight model for SAR ship detection, the proposed method performs best among the compared CNN-based ship detection methods.

TABLE VII Comparison With Other Detection Methods on SSDD

Fig. 9. Effect of AP and test time of different detection algorithms on SSDD.

TABLE VIII Comparison With Other Detection Methods on AIR SARShip-1.0

Fig. 10. Effect of AP and test time of different detection algorithms on AIR SARShip-1.0.

SECTION IV. Conclusion

This article proposes a SAR ship detection method based on edge extraction and a morphological network, which mitigates the influence of background clutter on network training and enhances the edge information of targets. In addition, the article shows that combining the feature pyramid fusion structure with Yolo v4-tiny improves detection accuracy. Experiments on data with different polarization modes and resolutions show that the proposed method has good robustness. The proposed network can be used not only for target detection but also for target recognition and segmentation. Future work can explore the best network structure for different applications and the relationship between the morphology parameters and data resolution.
