Multifeature Semantic Complementation Network for Marine Oil Spill Localization and Segmentation Based on SAR Images

Marine oil spills cause severe damage to the marine ecological environment. Synthetic aperture radar (SAR) is widely used in marine oil spill detection because it can operate day and night and in all weather. However, oil spill areas often have a long strip shape, which makes them difficult to extract effectively. To address this problem, a multifeature semantic complementation network (MFSCNet) is proposed that performs oil spill localization and segmentation of SAR images in one framework. The interference of the long strip shape is reduced by extracting intensity and damping ratio features (nonpolarimetric features) and entropy, anisotropy, and mean scattering angle (polarimetric features) to form a multifeature SAR image. Then, the backbone feature network and feature fusion module are used for feature extraction. The decoupled head and the proposed oil spill semantic segmentation head are used for the localization and semantic segmentation tasks, respectively. In addition, a semantic complementation module is used in the training phase. It combines the results of localization and semantic segmentation to obtain complementation boxes, which are used to update the model parameters interactively and iteratively and thereby improve the accuracy of the localization boxes. The effectiveness of the proposed model is demonstrated on a large amount of Sentinel-1 oil spill data in comparison with other state-of-the-art methods.

estimated that about two million tons of oil annually leak into the marine environment [2]. Such a large-scale oil spill not only causes enormous economic losses and environmental damage, but also damages the marine ecological environment of coastal countries. A severe oil spill even threatens human health. The occurrence of marine accidents, such as ship accidents and pipeline ruptures, has led to the emergence of oil spills, which makes the source and location of oil spill random [3]. The early remediation of oil spills is critical to the result. Therefore, timely and accurate monitoring of oil spills is required to facilitate oil spill clean-up operations and protect the marine ecological environment [4], [5].
Satellite remote sensing is widely used in marine oil spill detection due to its comprehensive coverage and high timeliness, and it has been studied extensively in recent years [6], [7]. Synthetic aperture radar (SAR) has become a powerful tool for oil spill detection because it can monitor around the clock and is unaffected by clouds and fog [8], [9], [10]. An oil film on the sea surface suppresses the short gravity-capillary waves and reduces the roughness of the oil spill area, so oil spill areas appear dark in SAR images [11]. However, oil spill areas usually occupy only a small part of the whole SAR image, and many natural phenomena, such as internal waves and low wind speed areas, also appear as dark spots [1], [12]. It is therefore essential to quickly locate the oil spill area in the entire scene image, perform semantic segmentation, and avoid unnecessary computation on the large background areas.
Oil spill detection can be divided into three steps: feature extraction, dark spot detection, and segmentation. Feature extraction plays a vital role in oil spill detection because it can enhance oil spill information and reduce the interference of the long strip shape of oil spills. Many researchers have extracted oil spill features from SAR images. Topouzelis et al. [13] used a decision forest approach to evaluate a total of 25 geometric, physical, and textural features and found that a combination of nine features was more effective for oil spill detection. Mdakane et al. [14] extracted a total of 29 geometric, physical, and textural features for oil spill detection and identified 15 significant features by using a multifeature selection method. These traditional feature extraction methods only consider the intensity information of SAR images and do not consider the phase information. Polarimetric features contain physical information about oil spills, overcoming the limitations of single polarimetric technology [15], [16]. Ma et al. [17] used both the amplitude and phase information to extract the polarimetric features of SAR images and experimentally demonstrated the capability of multilayer features for oil spill detection. Ren et al. [18] extracted polarimetric features from full-polarimetric SAR data for oil spill detection and experimentally verified the ability of the proposed features and methods to distinguish between crude oil, vegetable oil, and seawater. Song et al. [19] extracted polarimetric features and constructed nine-channel PolSAR data. The experimental results showed that the proposed method could effectively distinguish between oil spills and biogenic oil films.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, most of the above methods derive features only from the characteristics of the oil spill targets and of the satellite itself, which introduces redundancy and can make the features incompatible with subsequent algorithms. Selecting salient features that combine well with the subsequent localization and semantic segmentation is critical to ensure the accuracy and spatial consistency of the long strip shape oil spill area.
In recent years, deep learning has developed rapidly and has been widely applied in computer vision [20], [21], [22], [23]. Researchers have also applied deep learning to remote sensing tasks, such as ship detection and marine aquaculture extraction [24], [25], [26], [27], and to oil spill detection. Raeisi et al. [28] combined the cuckoo search algorithm and nonnegative matrix factorization of different Zernike moment features to distinguish oil spills and look-alikes in SAR images. Aghaei et al. [29] proposed the OSDES-Net method for oil spill detection based on efficient ShuffleNet blocks. Laurentiis et al. [30] used airborne uninhabited aerial vehicle SAR data to classify oil spills with a deep learning framework and successfully classified mineral oil film, biogenic oil film, and clean sea, which verified the potential of convolutional neural networks for oil film classification. Bianchi et al. [31] used a fully convolutional neural network for oil spill segmentation and then used a classification network to predict 12 different categories, such as shape and texture. Zeng et al. [32] constructed a relatively deep DCNN based on the VGG16 network for oil spill classification of SAR images. Aghaei et al. [33] extracted oil spill features after dark spot detection of SAR images and used an improved level set method for classification. The classification results of these methods depend on the dark spot detection results, and inaccurate dark spot detection increases the computation time and false alarms. Yang et al. [34] used the You Only Look Once version 4 (YOLOv4) method for oil spill detection in the region of interest, which was validated on an extended dataset after removing tiny oil spill areas in the study area. Huang et al. [35] used an object detection model called Faster R-CNN to detect oil spill areas in large-scale remote sensing images and analyzed the interference of additional factors on oil spill detection to determine the optimal wind speed and incidence angle for oil spill detection. Nieto-Hidalgo et al. [36] used three pairs of convolutional neural networks to form a two-stage oil spill detection model for side-looking airborne radar (SLAR) images, with the first network performing coarse detection and the second acquiring the exact pixels of the class, and showed experimentally that the proposed method outperforms previous methods used for this task. Guo et al. [37] used 4200 images extracted from five original SAR images of oil spills to validate the potential of the SegNet model for dark spot detection in the oil spill area. Wang et al. [38] performed oil spill detection based on the AlexNet model and achieved oil spill segmentation of SAR images through slice classification and morphological filtering. Ma et al. [17] used the amplitude and phase information of SAR data to extract polarimetric features and combined multilayer features with a deep learning network model to achieve oil spill segmentation of SAR images, achieving excellent results in terms of accuracy and inference time. Zhu et al. [39] proposed a contextual and boundary-supervised network (CBD-Net) for oil spill detection, which improved the extraction of oil spill regions in SAR images with intensity inhomogeneity, high noise, and blurred boundaries by fusing multiscale features, a spatial and channel squeeze excitation block, and joint loss functions. However, the localization approaches only capture the location of the oil spill and cannot obtain more information, such as the oil spill area. Semantic segmentation methods are susceptible to false detection due to the interference of coherent speckle noise and other marine targets. Therefore, combining fast localization with semantic segmentation is better suited to the actual oil spill detection process.
Simultaneous localization and segmentation has already been studied. He et al. [40] proposed a two-stage target detection algorithm that enables segmentation during localization. The first stage uses a region proposal network to obtain proposals. The second stage uses RoIAlign to resample the proposals to the same size and performs classification, localization, and segmentation, which achieves target instance segmentation. Bolya et al. [41] proposed a one-stage instance segmentation model that implements instance segmentation mainly through two parallel subnetworks. One network generates the class, localization, and mask coefficients of each anchor. The other network generates a set of prototype masks, which are then multiplied with the mask coefficients to obtain the segmentation result for each target. Yekeen et al. [42] used Mask R-CNN to detect marine oil spills and realized the detection and segmentation of oil spills, look-alikes, ships, and land areas after preprocessing. Although these methods allow for the localization and segmentation of oil spills, some issues still need to be solved. Because oil spills have a long strip shape with a non-Gaussian distribution, as shown in Fig. 1, the localization box cannot completely cover the whole oil spill area, and incomplete localization can seriously affect the accuracy of semantic segmentation. Conversely, because SAR images contain other nonoil spill dark spots, as shown in Fig. 1(e), wrong segmentation tends to occur when only semantic segmentation is used.
To solve the above problems, this article proposes a multifeature semantic complementation network (MFSCNet) for oil spill localization and segmentation of SAR images. MFSCNet uses a deep convolutional neural network to extract deep features from multifeature SAR images. The feature extraction results are used in the decoupled head and the oil spill semantic segmentation head to perform oil spill localization and segmentation simultaneously. In the training phase, the semantic complementation module obtains the complementation box by combining the localization and segmentation results. Then, the model is updated interactively and iteratively to improve the localization results. The main contributions of this study are as follows:
1) MFSCNet is proposed for performing marine oil spill localization and semantic segmentation simultaneously, which can avoid wrong segmentation. It includes multifeature extraction, PANet for feature fusion, a decoupled head, a segmentation head, and a semantic complementation module.
2) A multifeature extraction module is constructed to enhance the oil spill accuracy and spatial consistency and reduce the interference of the long strip shape oil spill area by combining intensity and damping ratio features in nonpolarimetric features and entropy (H), anisotropy (A), and mean scattering angle (Alpha) in polarimetric features.
3) A semantic complementation module is designed for the training phase. It integrates the localization and segmentation results to generate the complementation box, which is iteratively updated to cover the entire oil spill area effectively.
The remainder of this article is organized as follows. Section II introduces the work related to the research direction of this article. Section III introduces the proposed MFSCNet model for marine oil spills in SAR images in detail. Section IV shows the experimental results of the work. Section V summarizes the highlights of the article and future research.

II. RELATED WORK
Current object detection algorithms can be broadly classified into two categories: two-stage and one-stage object detection algorithms [43]. Two-stage object detection algorithms, such as the R-CNN series [22], [40], [44], have higher accuracy but are slower. They obtain higher detection accuracy by acquiring region proposals and then performing detection. One-stage object detection algorithms, such as the YOLO series and SSD [21], [41], [45], [46], treat detection as a regression problem and use a single framework to implement classification and localization directly. This type of algorithm is more advantageous in terms of speed. For oil spill detection, quickly identifying the oil spill area is more important, so the one-stage object detection algorithm is more suitable. Further, one-stage object detection algorithms can be classified into anchor-based and anchor-free methods. Anchor-based methods generate prediction boxes based on a predetermined number of anchors with fixed scales and aspect ratios. In contrast, anchor-free methods generate prediction boxes directly based on points [47]. For regular targets such as people or cars, the anchor-based approach is more advantageous in terms of accuracy because their sizes are relatively fixed. However, fixed anchor dimensions are not suitable for oil spills, and the anchor-based approach limits the ability of the model. So, the anchor-free method is more suitable for oil spill detection.
With the development of the YOLO series, YOLOX [45] was proposed as a one-stage anchor-free object detection algorithm, which is more suitable for oil spill detection. As shown in Fig. 2, the YOLOX model can be divided into four parts from input to output: Backbone, PANet, decoupled head, and loss. For the input SAR image, feature extraction is first performed using a backbone feature network named CSPDarknet53 [48]. Then, the PANet [49] module is used to fuse the feature extraction results of the last three layers. After fusion, the output is produced by the decoupled head for the three feature layers. Two parallel branches are used in the decoupled head to predict localization boxes, confidence, and classification. Finally, the SimOTA algorithm is used to perform label assignment to calculate the loss. The loss function is as follows:

$$L_{loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{l}_i \log l_i + \left(1-\hat{l}_i\right)\log\left(1-l_i\right)\right]$$

where $L_{loss}$ is the loss function of $L_{cls}$ and $L_{conf}$, $l_i$ is the predicted probability of pixel $i$, $\hat{l}_i$ is the ground-truth label, and $N$ is the number of pixels.

$$L_{reg} = 1 - \mathrm{IOU}^2\left(l_{box}, \hat{l}_{box}\right)$$

where $L_{reg}$ is the loss function of the localization boxes, $l_{box}$ is the predicted localization box, and $\hat{l}_{box}$ is the ground-truth box.
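The two loss terms can be illustrated with a minimal pure-Python sketch. It assumes that the classification and confidence terms are binary cross-entropy and that the box term is the squared-IoU loss used by default in the public YOLOX code; the function names `bce_loss`, `box_iou`, and `iou_loss` are hypothetical and boxes are assumed to be `(x1, y1, x2, y2)` corners.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(pred)


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, gt_box):
    """Assumed regression loss: 1 - IoU^2 (zero for a perfect box)."""
    return 1.0 - box_iou(pred_box, gt_box) ** 2
```

A long, thin ground-truth box makes the IoU drop quickly when the predicted box covers only part of the strip, which is the situation the semantic complementation module is meant to correct.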

III. PROPOSED METHOD
The proposed MFSCNet can be divided into five parts, and the framework is shown in Fig. 3. The first part is the multifeature input, which extracts the nonpolarimetric features and polarimetric features of SAR images to form the multifeature SAR image, reducing the interference of the long strip shape oil spill area and enhancing the oil spill information. The second part is a backbone feature network composed of CSPDarknet53 for preliminary feature extraction. The third part is the PANet module, which fuses the semantic and spatial information of the last three feature layers of the backbone by a top-down and bottom-up approach. The fourth part is a detection head consisting of a decoupled head and a segmentation head that uses the feature extraction results for the localization and segmentation tasks. The fifth part is the semantic complementation module, which uses the watershed algorithm to obtain complementation boxes by combining the localization and segmentation results. The complementation boxes are then used to update the model interactively and iteratively to improve the accuracy.

A. SAR-Based Feature Extraction
For better oil spill detection, a multifeature input module is proposed. A good feature enhances the extracted target information and enables the deep learning model to learn the valuable features better and faster. The long stripe shape oil spill area in SAR images brings some interference to oil spill detection. It is difficult to extract oil spill information effectively by filtering methods, or intensity features only. Therefore, the intensity and damping ratio features in the nonpolarimetric features and the H/A/Alpha feature in the polarimetric features are extracted for oil spill detection.
The intensity feature reflects the image brightness, which represents the backscattering coefficient of the radar and reflects the roughness of the object. The damping ratio is the ratio of the backscattering coefficient of the calm sea surface to the backscattering coefficient of the oil film, which reflects, to some extent, the emulsification degree of the oil film in SAR images. A larger damping ratio value indicates a thicker oil film, while a smaller damping ratio value indicates a thinner oil film. The polarimetric features reflect the polarimetric scattering characteristics of the object. H describes the randomness of the target scattering. A is a complementary parameter to H. Alpha indicates the scattering type of the target.
1) Nonpolarimetric Features: Due to the interference of imaging weather, geographical location, marine targets, and other factors, most of the target information is in the low gray value range, whereas maritime targets, especially ships, are concentrated in the high gray value range and interfere with oil spill detection. The intensity feature is therefore converted to decibels to enhance the information in the low gray value range and compress the information in the high gray value range:

$$x_{db} = 10\log_{10}\sigma^{0}$$

where $\sigma^{0}$ is the backscattering coefficient and $x_{db}$ is the intensity feature after conversion to decibels. The damping ratio is the ratio of the backscattering coefficient of the calm sea surface to that of the oil film, and it can be calculated as follows:

$$x_{dr} = \frac{\sigma^{0}_{water}}{\sigma^{0}_{oil}}$$

where $\sigma^{0}_{water}$ is the backscattering coefficient of seawater, $\sigma^{0}_{oil}$ is the backscattering coefficient of oil, and $x_{dr}$ is the extracted damping ratio feature.
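As an illustration of the two nonpolarimetric features, the following sketch converts a small backscatter image to decibels and computes a per-pixel damping ratio. It assumes that the seawater backscatter is a single reference value (for example, the mean of a clean-sea region); that choice, the `floor` guard against zeros, and the function names are assumptions and are not taken from the source.

```python
import math


def to_decibel(sigma0, floor=1e-10):
    """x_db = 10 * log10(sigma0), element-wise on a 2-D list of floats."""
    return [[10.0 * math.log10(max(v, floor)) for v in row] for row in sigma0]


def damping_ratio(sigma0, sigma0_water, floor=1e-10):
    """x_dr = sigma0_water / sigma0 per pixel; values above 1 mark damped areas."""
    return [[sigma0_water / max(v, floor) for v in row] for row in sigma0]


# Toy 2 x 2 scene: bright sea (0.1) and two darker, oil-like pixels.
scene = [[0.1, 0.01], [0.001, 0.1]]
x_db = to_decibel(scene)
x_dr = damping_ratio(scene, sigma0_water=0.1)
```

The logarithm spreads out the dark values (0.01 and 0.001 become -20 dB and -30 dB) while compressing bright targets, which matches the motivation given above.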
2) Polarimetric Features: SAR can obtain different polarimetric information by controlling the polarimetric mode of transmission and reception. Different polarimetric modes are sensitive to different ground objects. Multipolarimetric SAR provides richer target information for analyzing the scattering mechanism of targets and reduces the uncertainty of target information. For dual-polarimetric SAR data, the Cloude-Pottier decomposition can be performed to obtain the target's polarimetric information for better representation.
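To extract the polarimetric information from the SAR image, the $C_2$ polarimetric covariance matrix must first be obtained. After preprocessing operations such as multilooking and radiometric calibration, the $C_2$ polarimetric covariance matrix is extracted as follows:

$$C_2 = \begin{bmatrix} \langle |S_{VV}|^2 \rangle & \langle S_{VV} S_{VH}^{*} \rangle \\ \langle S_{VH} S_{VV}^{*} \rangle & \langle |S_{VH}|^2 \rangle \end{bmatrix}, \qquad S_{pq} = |S_{pq}|\, e^{j\phi_{pq}}$$

where $S_{VV}$ and $S_{VH}$ are the complex scattering coefficients of the VV and VH channels, $\phi$ is the phase of the polarimetric channel, $*$ is the conjugate operator, $\langle\cdot\rangle$ is the statistical mean, and $j$ is the imaginary unit. The eigenvalues are calculated from the polarimetric covariance matrix:

$$C_2 = \sum_{i=1}^{2} \lambda_i\, e_i e_i^{*T}$$

where $\lambda_i$ are the eigenvalues of the polarimetric covariance matrix and $e_i$ are the corresponding eigenvectors.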
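The covariance estimation and eigen decomposition can be sketched for a single averaging window as follows. The sketch assumes that the statistical mean is a plain average over the paired complex VV and VH samples in the window and uses the closed-form eigen solution of a 2 x 2 Hermitian matrix; `covariance_c2` and `eig_hermitian_2x2` are hypothetical names.

```python
import math


def covariance_c2(s_vv, s_vh):
    """Sample C2 matrix from paired complex VV/VH samples of one window."""
    n = len(s_vv)
    c11 = sum(abs(a) ** 2 for a in s_vv) / n
    c22 = sum(abs(b) ** 2 for b in s_vh) / n
    c12 = sum(a * b.conjugate() for a, b in zip(s_vv, s_vh)) / n
    return [[c11, c12], [c12.conjugate(), c22]]


def eig_hermitian_2x2(c):
    """Eigenvalues (descending) and unit eigenvectors of a 2x2 Hermitian matrix.

    The diagonal entries are expected to be real floats.
    """
    a, b, d = c[0][0], c[0][1], c[1][1]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    lams = [mean + radius, mean - radius]
    if abs(b) < 1e-15:
        # Already diagonal: the eigenvectors are the coordinate axes.
        vecs = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
        return lams, vecs
    vecs = []
    for lam in lams:
        v = (b, lam - a)  # satisfies (a - lam) * v0 + b * v1 = 0
        norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
        vecs.append((v[0] / norm, v[1] / norm))
    return lams, vecs


c2 = covariance_c2([1 + 0j, 1j], [0.5 + 0j, 0.5j])
lams, vecs = eig_hermitian_2x2(c2)
```

In this toy window the two channels are perfectly correlated, so the matrix has rank one and the second eigenvalue is zero.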
Then the pseudo probability $p_i$ corresponding to each eigenvalue $\lambda_i$ is calculated as follows:

$$p_i = \frac{\lambda_i}{\lambda_1 + \lambda_2}$$

where $\lambda_1$ and $\lambda_2$ are the two eigenvalues of the polarimetric covariance matrix and $p_i$ is the proportion (pseudo probability) of $\lambda_i$. Next, the H/A/Alpha feature is calculated from the pseudo probability and the eigenvalues obtained above:

$$H = -\sum_{i=1}^{2} p_i \log_2 p_i$$

$$A = \frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}$$

$$\alpha = \sum_{i=1}^{2} p_i \alpha_i$$

with

$$\alpha_i = \arccos\left(|e_{1i}|\right)$$

where $H$ is the entropy, $A$ is the anisotropy, $\alpha$ is the average polarimetric scattering angle, and $e_{1i}$ is the first element of the $i$th eigenvector. After feature extraction, the different features have different orders of magnitude. A 2% truncated linear stretch is used to bring all data into the same value range [50].
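A minimal sketch of the dual-polarimetric H/A/Alpha computation and of the 2% truncated linear stretch is given below. It takes the eigenvalues (in descending order) and unit eigenvectors of the covariance matrix as inputs, uses the standard Cloude-Pottier definitions with a base-2 logarithm, and approximates the 2nd and 98th percentiles by nearest rank; the function names and the output range [0, 1] are assumptions.

```python
import math


def h_a_alpha(lams, vecs):
    """Entropy H, anisotropy A, and mean alpha angle (radians).

    lams: two non-negative eigenvalues, largest first (sum must be > 0).
    vecs: matching unit eigenvectors as (first element, second element).
    """
    total = lams[0] + lams[1]
    probs = [max(lam, 0.0) / total for lam in lams]
    entropy = -sum(p * math.log(p, 2) for p in probs if p > 0)
    anisotropy = (lams[0] - lams[1]) / total
    alpha = sum(
        p * math.acos(min(1.0, abs(v[0]))) for p, v in zip(probs, vecs)
    )
    return entropy, anisotropy, alpha


def stretch_2pct(values, low=2.0, high=98.0):
    """Clip a flat list at the low/high percentiles and rescale to [0, 1]."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int((n - 1) * low / 100.0)]
    hi = ordered[int((n - 1) * high / 100.0)]
    if hi == lo:
        return [0.0] * n
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]
```

Two equal eigenvalues give maximum entropy (H = 1, A = 0), while a single dominant eigenvalue gives H = 0 and A = 1, so the pair separates random from deterministic scattering.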

B. Oil Spill Segmentation Head
An essential problem in oil spill detection of SAR images is the interference of coherent speckle noise. Generally, the shallow features in a deep learning model are more susceptible to the interference of coherent speckle noise. As shown in Fig. 4, two different segmentation heads are proposed to assess the influence of coherent speckle noise on segmentation.
The semantic segmentation module uses the output T O 3 from PANet and the outputs O 1 and O 2 from the Backbone. After T O 3 is resampled and concatenated with the Backbone output of the corresponding size, a convolutional block consisting of convolution, batch normalization (BN), and SiLU is used for feature extraction. Finally, the semantic segmentation result is output by a 1×1 convolution.
The input T O 3 is first resampled to the same size as O 2 using a transposed convolution, then stitched in the channel dimension, followed by feature extraction using two convolution blocks. The feature extraction result continues with a transposed convolution to resample it to the same size as O 1 , followed by feature extraction using the same two convolution blocks. To ensure that the output segmentation size is consistent with the input image, a transposed convolution is used to resample it to the same size as the input. After using one convolution block for feature extraction, a convolution of 1×1 is used to output a semantic segmentation result that is the same size as the original image.
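In the training phase, the semantic segmentation loss function is as follows:

$$L_{se} = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i \log y_i + \left(1-\hat{y}_i\right)\log\left(1-y_i\right)\right]$$

where $y_i$ is the predicted probability of pixel $i$, $\hat{y}_i$ is the ground-truth label, and $N$ is the number of pixels.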
In the prediction stage, the activation function is used to obtain the probability of the category to which each pixel on the output feature map belongs. Then the index of the maximum value for each pixel is selected as the category to which it belongs, and the semantic segmentation result is output:

$$y_{ss} = \arg\max\left(y_i\right)$$

where $y_i$ is the output of pixel $i$ after the activation function. The semantic segmentation result obtained after the above operation is denoted as $y_{ss}$. In addition, the localization box is combined with the semantic segmentation result, and the intersection region of the two is taken as the final segmentation result:

$$y'_{ss} = y_{ss} \cap y_{box} \tag{15}$$

where $y_{box}$ is the predicted localization box, $y_{ss}$ is the predicted semantic segmentation, and $y'_{ss}$ is the result of semantic segmentation after taking the intersection.
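The prediction step can be sketched as a per-pixel argmax followed by clipping to the predicted boxes. The sketch assumes that class 0 is background, that probabilities are stored as `[class][row][column]` nested lists, and that boxes are integer `(x1, y1, x2, y2)` pixel coordinates with exclusive lower-right corners; these conventions and the name `segment_and_clip` are illustrative only.

```python
def segment_and_clip(prob_maps, boxes):
    """Per-pixel argmax, kept only inside the localization boxes.

    prob_maps: [num_classes][H][W] probabilities after the activation.
    boxes: list of (x1, y1, x2, y2) pixel boxes, x2 and y2 exclusive.
    Returns an H x W label map; pixels outside every box become 0.
    """
    n_cls = len(prob_maps)
    height = len(prob_maps[0])
    width = len(prob_maps[0][0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            label = max(range(n_cls), key=lambda k: prob_maps[k][r][c])
            inside = any(
                x1 <= c < x2 and y1 <= r < y2 for x1, y1, x2, y2 in boxes
            )
            out[r][c] = label if inside else 0
    return out


oil = [[0.9, 0.2, 0.8], [0.7, 0.6, 0.1]]
background = [[1.0 - p for p in row] for row in oil]
result = segment_and_clip([background, oil], boxes=[(0, 0, 2, 2)])
```

Here the oil-like pixel in the last column is discarded because it lies outside the box, which is how the intersection suppresses dark look-alike regions that the detector did not confirm.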

C. Semantic Complementation
After the localization and semantic segmentation results are obtained, k samples are selected according to the label assignment strategy SimOTA. Semantic complementation is performed on these k samples using the watershed algorithm. The bounding rectangle of the segmentation result after complementation is used as the complementation box, and its loss against the ground truth is calculated to update the model parameters.
First, the current semantic segmentation result, defined as $y_{ss}$, is obtained. Next, the Intersection over Union (IOU) between the k localization boxes obtained by the label assignment strategy and the ground-truth boxes is calculated. When the result is less than 0.3, semantic complementation is performed. Otherwise, the localization box remains unchanged, and no processing is performed. The intersection is used as the label map for the subsequent complementation:

$$s_{label} = \begin{cases} y_{ss} \cap y_{box}, & \mathrm{IOU}\left(y_{box}, \hat{y}_{box}\right) < 0.3 \\ \text{pass}, & \text{otherwise} \end{cases} \tag{16}$$

where $y_{box}$ is the predicted localization box, $\hat{y}_{box}$ is the ground-truth box, $s_{label}$ is the intersection result, and pass means that no semantic complementation is performed. The intensity data $x_{db}$ is then fed into the watershed algorithm along with the label map $s_{label}$ to obtain the complementation result. This is only performed if there is an oil spill area in the label map. Otherwise, the complementation result is 0:

$$s_{mask} = \begin{cases} wa\left(x_{db}, s_{label}\right), & s_{label} \text{ contains an oil spill area} \\ 0, & \text{otherwise} \end{cases}$$

where $wa(\cdot)$ is the watershed algorithm and $s_{mask}$ is the complementation result.
After the complementation result is obtained, its bounding rectangle is taken as the final complementation box, which is 0 when the complementation result does not contain an oil spill area:

$$\tilde{y}_{box} = \begin{cases} \left(h_1, v_1, h_2, v_2\right), & s_{mask} \text{ contains an oil spill area} \\ 0, & \text{otherwise} \end{cases}$$

with $h_1 = \min h$, $v_1 = \min v$, $h_2 = \max h$, and $v_2 = \max v$ taken over the oil spill pixels $(h, v)$ of $s_{mask}$, where $\tilde{y}_{box}$ is the semantic complementation box, $h$ and $v$ represent the abscissa and ordinate, respectively, $h_1$ and $v_1$ represent the coordinates of the upper left corner of the complementation box, and $h_2$ and $v_2$ represent the coordinates of the lower right corner of the complementation box. The semantic complementation module is formally described in Algorithm 1. The loss function of the semantic complementation module is the same as $L_{reg}$.
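The complementation steps can be sketched for a single sample as follows. To keep the example self-contained, a simple region-growing flood fill over dark pixels stands in for the watershed algorithm; this is a swapped-in technique, not the watershed used in the text. The `margin_db` parameter, the integer box convention `(x1, y1, x2, y2)` with exclusive lower-right corner, and the function names are assumptions.

```python
from collections import deque


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def complement_box(x_db, seg, pred_box, gt_box, iou_thr=0.3, margin_db=1.0):
    """Return the complementation box, the unchanged box, or None.

    x_db: H x W intensity in dB; seg: H x W 0/1 oil mask from the seg head.
    """
    if box_iou(pred_box, gt_box) >= iou_thr:
        return pred_box  # "pass": the box is good enough, leave it alone
    height, width = len(seg), len(seg[0])
    x1, y1, x2, y2 = pred_box
    # Label map: oil pixels of the segmentation inside the predicted box.
    seeds = [
        (r, c)
        for r in range(max(0, y1), min(height, y2))
        for c in range(max(0, x1), min(width, x2))
        if seg[r][c] == 1
    ]
    if not seeds:
        return None  # no oil in the label map, complementation result is 0
    # Stand-in for watershed: grow over 4-connected pixels that are as dark
    # as the seeds (within margin_db).
    limit = max(x_db[r][c] for r, c in seeds) + margin_db
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < height and 0 <= nc < width
                    and (nr, nc) not in seen and x_db[nr][nc] <= limit):
                seen.add((nr, nc))
                queue.append((nr, nc))
    rows = [r for r, _ in seen]
    cols = [c for _, c in seen]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


# A dark horizontal strip that the predicted box covers only partly.
x_db = [[-10.0] * 6, [-25.0] * 6, [-10.0] * 6]
seg = [[0] * 6, [1] * 6, [0] * 6]
box = complement_box(x_db, seg, pred_box=(0, 0, 2, 3), gt_box=(0, 1, 6, 2))
```

In this toy case the predicted box has an IoU of 0.2 with the ground truth, so the strip is grown from the two seed pixels and the resulting box spans the whole strip.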
Finally, the total loss function is as follows:

$$Loss = w_1 L_{se} + w_2 L_{sc} + w_3 L_{reg} + L_{cls} + L_{conf} \tag{19}$$

where $Loss$ is the total loss, $L_{se}$ is the loss of semantic segmentation, $L_{sc}$ is the loss of semantic complementation, $L_{reg}$ is the loss of the localization box, $L_{cls}$ is the loss of classification, $L_{conf}$ is the loss of confidence, $w_1$ is the weighting factor of $L_{se}$, $w_2$ is the weighting factor of $L_{sc}$, and $w_3$ is the weighting factor of $L_{reg}$.

A. Implementation Details
All experiments are run under Windows 10 with Python 3.6, PyTorch 1.7.1, and CUDA 11.0 on a GeForce RTX 3080 GPU. Adam is used as the optimization algorithm. The initial learning rate of the network is 1e-4. StepLR is used to adjust the learning rate at each epoch, with a decay factor of 0.92. The values of w 1 and w 2 are 2 and 0.1, respectively. The value of w 3 is 5, following [45]. The number of training epochs is set to 100, and the total training time is 5.6 h. The proposed model is trained without the semantic complementation module in the first 80 epochs, and the module is used in the final 20 epochs.
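The training schedule described above can be sketched without any deep learning framework. The sketch assumes zero-based epochs, a StepLR step size of one epoch, and unit weights on the classification and confidence terms (only w 1, w 2, and w 3 are given in the text); `learning_rate` and `total_loss` are hypothetical helper names.

```python
def learning_rate(epoch, base_lr=1e-4, gamma=0.92):
    """Learning rate at a zero-based epoch for a decay of gamma per epoch."""
    return base_lr * gamma ** epoch


def total_loss(l_se, l_sc, l_reg, l_cls, l_conf, epoch,
               w1=2.0, w2=0.1, w3=5.0, sc_start=80):
    """Weighted sum of the loss terms.

    The semantic complementation term is switched on only from epoch
    sc_start onward (the final 20 of 100 epochs).
    """
    sc_term = w2 * l_sc if epoch >= sc_start else 0.0
    return w1 * l_se + sc_term + w3 * l_reg + l_cls + l_conf


schedule = [learning_rate(e) for e in range(100)]
```

With these settings the learning rate decays by roughly four orders of magnitude over 100 epochs, so the complementation term acts as a late fine-tuning signal on the localization boxes.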

Algorithm 1: Semantic Complementation.
Input:
$x_{db}$: intensity data after processing
$k$: k samples obtained by label assignment
$y_{box}$: localization boxes obtained by prediction
$y_{ss}$: segmentation obtained by prediction
$\hat{y}_{box}$: ground-truth boxes
$s_{label}$: intersection results
$s_{mask}$: complementation results
$\tilde{y}_{box}$: semantic complementation boxes
$\mathrm{IOU}(\cdot)$: intersection over union
$wa(\cdot)$: watershed algorithm
Output: Semantic complementation result.
for epochs do

Existing oil spill datasets are difficult to obtain, and there are few publicly available datasets for evaluating the dependability of oil spill detection technologies. Therefore, 82 Sentinel-1 data scenes for the period 2014-2021 are obtained from the Alaska Satellite Facility (ASF) data distribution website (https://search.asf.alaska.edu) based on oil spill information provided by other researchers and the findings of long-term monitoring of the Chinese Bohai Sea [17], [39], [51], [52], [53], [54]. Sentinel-1 data in the vertical transmit, vertical receive (VV) and vertical transmit, horizontal receive (VH) polarimetric modes, acquired in Interferometric Wide (IW) swath mode, are used in this work to extract different features.
Thermal noise removal, radiometric calibration, filtering, terrain correction, and multilooking are all applied to the obtained data. The filtering employs a 7 × 7 refined Lee filter, and the multilooking size is 1 × 5. The oil spill region is then clipped out, and resampling and data augmentation are carried out. Finally, 1050 SAR images of 1024 × 1024 pixels with a resolution of 20 m × 20 m are obtained. The training and test sets are split at a ratio of approximately 7:3, with 730 training images and 320 test images. Some sample images are shown in Fig. 5. In the experiment, the ground truth of the oil spill SAR images is collected through relevant news reports, literature publications, and our daily oil spill monitoring. The labelme software is used to annotate the ground-truth labels.

C. Evaluation Criteria
In order to objectively evaluate the performance of different models, different evaluation metrics are used to compare the results of localization and semantic segmentation. Average precision (AP ) is used for localization. Overall accuracy (OA), mean intersection over Union (MIoU), F 1-score (F 1), and Kappa are used for semantic segmentation.
1) Evaluation Criteria of Localization: Before the AP is calculated, $Precision_{box}$ and $Recall_{box}$ need to be obtained. These two values are determined according to the intersection over union (IOU):

$$\mathrm{IOU} = \frac{\mathrm{area}\left(PB \cap TB\right)}{\mathrm{area}\left(PB \cup TB\right)}$$

where $PB$ is the prediction box and $TB$ is the ground-truth box. Next, $Precision_{box}$ and $Recall_{box}$ are obtained at different confidence levels according to the confidence of the prediction boxes:

$$Precision_{box} = \frac{TP}{TP + FP}$$

$$Recall_{box} = \frac{TP}{TP + FN} \tag{22}$$

where $TP$, $TN$, $FP$, and $FN$ denote the number of true positive, true negative, false positive, and false negative samples, respectively, and $Precision_{box}$ and $Recall_{box}$ are the localization results. After $Precision_{box}$ and $Recall_{box}$ are obtained at different confidence levels, the P-R curve is plotted, and the area under the P-R curve is the AP:

$$AP = \int_{0}^{1} p(r)\, dr \tag{23}$$

where $p(r)$ is the value of $Precision_{box}$ at recall $r$ under different confidence levels.
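The AP computation can be sketched as the area under the precision-recall curve, accumulated while the confidence threshold sweeps from high to low. The sketch assumes that each prediction has already been marked as a true or false positive by IoU matching against the ground truth (the IoU threshold is not stated in the text) and uses a plain rectangle rule without the interpolation found in some benchmarks; `average_precision` is a hypothetical name.

```python
def average_precision(scores, is_tp, num_gt):
    """Area under the P-R curve for one class.

    scores: confidence of each predicted box.
    is_tp: True if the box matched a ground-truth box, else False.
    num_gt: number of ground-truth boxes (TP + FN), must be > 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    fp = 0
    ap = 0.0
    prev_recall = 0.0
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap


ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
```

In the example the second-ranked box is a false alarm, so the second true positive is counted at a precision of 2/3 and the AP is 0.5 + 0.5 × 2/3.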
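2) Evaluation Criteria of Semantic Segmentation: The metrics OA, F 1, MIoU, and Kappa are used to compare the semantic segmentation results of different methods. They are calculated as follows:

$$OA = \frac{TP + TN}{TP + TN + FP + FN}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

$$MIoU = \frac{1}{2}\left(\frac{TP}{TP + FP + FN} + \frac{TN}{TN + FN + FP}\right)$$

$$Kappa = \frac{OA - p_e}{1 - p_e}$$

where

$$Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}, \quad p_e = \frac{(TP + FP)(TP + FN) + (TN + FN)(TN + FP)}{(TP + TN + FP + FN)^2}$$

and $TP$, $TN$, $FP$, and $FN$ are counted per pixel.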
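The four segmentation metrics can be computed from the pixel-level confusion counts, as in the following sketch. It assumes a two-class problem (oil spill and background) and the standard definitions of overall accuracy, F1, mean IoU, and Cohen's kappa; the function name is hypothetical and degenerate inputs (for example, no predicted oil pixels) are not guarded.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """OA, F1, MIoU, and Kappa for binary segmentation from pixel counts."""
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_oil = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    miou = (iou_oil + iou_background) / 2
    # Expected agreement by chance, from the row and column totals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return {"OA": oa, "F1": f1, "MIoU": miou, "Kappa": kappa}


metrics = segmentation_metrics(tp=40, tn=50, fp=5, fn=5)
```

Because oil usually covers a small part of the scene, OA alone can look high even for poor masks, which is why MIoU and Kappa are reported alongside it.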

1) Comparison With Other Oil Spill Detection Methods:
To validate the performance of the proposed model adequately, it is compared with existing oil spill detection algorithms: AlexNet [38], SegNet [37], Faster R-CNN [35], YOLOv4 [34], YOLOX, Mask R-CNN [42], and YOLACT [41]. Because the comparison methods differ in the tasks they address, they are compared in terms of localization and semantic segmentation separately.
The experimental results are shown in Table I. The proposed MFSCNet achieves the highest accuracy in both localization and semantic segmentation. Unlike conventional object detection targets, oil spill areas are not uniform in shape and size and follow a non-Gaussian distribution. This makes it difficult for anchor-based deep learning object detection algorithms, such as Faster R-CNN, YOLOv4, Mask R-CNN, and YOLACT, to find suitable anchors, thus reducing the localization accuracy. This study also achieves the highest accuracy in semantic segmentation. The segmentation results are shown in Fig. 6. It can be seen that the MFSCNet proposed in this article can extract oil spill regions of different sizes and shapes completely and with high accuracy. The AlexNet method performs semantic segmentation by slice classification. Mask R-CNN resamples the feature maps to the same size by RoIAlign before performing semantic segmentation. As a result, both methods give poor predictions in the edge region and are prone to misclassification. SegNet performs semantic segmentation on the whole image, so coherent speckle noise and maritime targets affect its segmentation results. YOLACT combines localization and semantic segmentation, and each localization box has a semantic segmentation result. The localization boxes limit the accuracy of semantic segmentation, leading to the truncation phenomenon shown in Fig. 6(f). This splits a complete oil spill target into two separate targets, reducing the segmentation accuracy. Finally, the MFSCNet proposed in this article treats localization and semantic segmentation as two parallel tasks, restricts the semantic segmentation results with the localization box, and takes the intersection region of both as the final segmentation result to extract the oil spill area effectively. Table II shows the computational cost of the different oil spill detection methods.
Since the AlexNet method classifies small image patches instead of performing segmentation directly, it takes longer but has a lower giga floating-point operations (GFLOPs) [45] value. The proposed MFSCNet achieves a balance between time and GFLOPs. Fig. 7 shows the prediction results of the proposed method. It can be seen that oil spills of different sizes and shapes are extracted well from the SAR images, including images that also contain nonoil dark spots.
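The final segmentation rule described above for MFSCNet, in which the segmentation result is restricted by the localization boxes and the intersection of both is kept, can be illustrated with a minimal Python sketch. The box convention (x_min, y_min, x_max, y_max) with exclusive maximum edges, the nested-list mask, and the function name restrict_mask_to_boxes are assumptions made for illustration; they are not taken from the MFSCNet implementation.

```python
from typing import List, Tuple

# Hypothetical convention: (x_min, y_min, x_max, y_max), maximum edges exclusive.
Box = Tuple[int, int, int, int]
Mask = List[List[int]]


def restrict_mask_to_boxes(mask: Mask, boxes: List[Box]) -> Mask:
    """Keep a predicted oil pixel only if it lies inside at least one box."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Rasterize the union of all localization boxes, clipped to the image.
    inside = [[0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                inside[y][x] = 1
    # Intersection of the segmentation mask and the box region.
    return [
        [mask[y][x] & inside[y][x] for x in range(width)]
        for y in range(height)
    ]


# Toy segmentation output: a 2x2 oil region plus an isolated false alarm column.
seg = [
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
final = restrict_mask_to_boxes(seg, [(1, 0, 3, 2)])
```

In this toy case the dark column at x = 4 lies outside every localization box and is dropped, while the region covered by the box is kept unchanged.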
2) Comparison of Different Features: This study uses the intensity and damping ratio features from the nonpolarimetric features and the H/A/Alpha features from the polarimetric features, and oil spill detection is carried out by combining the polarimetric and nonpolarimetric features. The detection results are shown in Table III, and the visualization results under different features are shown in Fig. 8. Combinations of features achieve higher localization accuracy than single features, which indicates that the multifeature module is effective. The addition of the damping ratio feature enhances the information on the dark spot region and improves the detection accuracy. However, the damping ratio reflects the emulsification degree of the oil film, so it yields different feature extraction results within the same oil spill area, and consistency is difficult to maintain. As a result, overlapping localization boxes occur, as shown in Fig. 8(i). Polarimetric features can effectively maintain the spatial consistency of the oil spill area by providing richer target information. Both the sparse oil spill area in Fig. 8(a) and the long strip oil spill area in Fig. 8(f) are detected well. Finally, an AP of 82.85% is achieved, which indicates that the multifeature module proposed in this study is practical and feasible, effectively enhancing the spatial consistency of oil spills and improving the oil spill detection accuracy.
3) Comparison of Different Modules: Different modules have different impacts on the results. The effects of the modules proposed in this study on the oil spill detection accuracy are shown in Table IV. As modules are added, the accuracy of oil spill detection improves further.
After using the multifeature module, the interference of speckle noise and offshore targets on oil spill detection is reduced, and the localization evaluation metric AP increases from 82.85% to 85.26%, which shows that the proposed feature is feasible and can effectively improve the accuracy of oil spill detection. Adding a semantic segmentation module not only realizes the segmentation of the oil spill area, but also enhances the accuracy of the localization box. Finally, the semantic complementation module enhances both the localization and semantic segmentation branches by combining their results. The full MFSCNet model obtains AP, OA, F1, MIoU, and Kappa values of 86.24%, 99.41%, 79.10%, 83.20%, and 0.79, respectively.
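For readers who want to relate OA, F1, MIoU, and Kappa to one another, the following Python sketch computes them from a binary (oil versus background) confusion matrix. It uses the standard textbook definitions, which are assumed here to match the ones used in this study; the function names and the pixel counts are hypothetical and are not results from the paper.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary segmentation metrics from pixel counts (oil = positive)."""
    total = tp + fp + fn + tn
    overall_accuracy = _ratio(tp + tn, total)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    iou_oil = _ratio(tp, tp + fp + fn)
    iou_background = _ratio(tn, tn + fp + fn)
    mean_iou = (iou_oil + iou_background) / 2
    # Agreement expected by chance, from the row and column totals.
    expected = _ratio(
        (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), total * total
    )
    kappa = _ratio(overall_accuracy - expected, 1 - expected)
    return {
        "OA": overall_accuracy,
        "F1": f1,
        "MIoU": mean_iou,
        "Kappa": kappa,
    }


# Hypothetical pixel counts for a scene dominated by background.
metrics = segmentation_metrics(tp=40, fp=10, fn=10, tn=940)
```

With these hypothetical counts, OA is 0.98 while F1 is 0.80 and Kappa is about 0.79. This illustrates why OA alone can look very high when background pixels dominate the scene, and why the other metrics are reported alongside it.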
Considering the effect of coherent speckle noise in SAR images, the two semantic segmentation heads shown in Fig. 3 are proposed in this study. The evaluation results are shown in Table V, and the segmentation results are displayed in Fig. 9. The proposed fine segmentation head offers better detection performance. It can be seen from Fig. 9 that both heads are disturbed by coherent speckle noise, but in different ways. The rough segmentation head predicts the edge region less accurately: an expansion phenomenon occurs in the edge region, and background area is predicted as oil spill area. The misclassified regions in Fig. 9(h) and (i) appear because the shallow features are not considered. The fine segmentation head predicts the edge region better and distinguishes the boundary between the oil spill and the background more easily. However, it is affected by background noise, which produces the phenomenon indicated by the red boxes in Fig. 9(k). Still, this has little impact on the overall semantic segmentation results. Therefore, this study uses the fine segmentation head for the semantic segmentation task.

4) Comparison of Different Hyperparameters:
Different values of the weighting factors of the loss function will affect the results, and experiments are conducted to find the appropriate weighting factors.
The results obtained with various weighting factors for the semantic segmentation loss are compared in Table VI. Because the localization box limits the semantic segmentation results, the segmentation accuracy is affected by the localization accuracy. As the value of the weighting factor increases, both the localization and semantic segmentation results first increase and then decrease. When the value is 2, the greatest accuracy for both localization and segmentation is achieved. Therefore, the weighting factor of the semantic segmentation loss function is set to 2.
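The role of the weighting factor can be made concrete with a small Python sketch. It assumes that the total training loss is a plain weighted sum of the branch losses, which this section does not state explicitly; the function name total_loss and the numeric loss values are hypothetical. The optional third term anticipates the semantic complementation loss discussed later in this section.

```python
from typing import Optional


def total_loss(
    loc_loss: float,
    seg_loss: float,
    comp_loss: Optional[float] = None,
    seg_weight: float = 2.0,   # value selected in Table VI
    comp_weight: float = 1.0,  # value selected later in this section
) -> float:
    """Weighted sum of branch losses (assumed form, not confirmed by the paper)."""
    total = loc_loss + seg_weight * seg_loss
    # The complementation term is only present when that module is active.
    if comp_loss is not None:
        total += comp_weight * comp_loss
    return total


# Hypothetical per-batch loss values, for illustration only.
without_complementation = total_loss(loc_loss=1.0, seg_loss=0.5)
with_complementation = total_loss(loc_loss=1.0, seg_loss=0.5, comp_loss=0.25)
```

Under this assumed form, a larger segmentation weight shifts the shared backbone toward the segmentation task, which is one plausible reading of why accuracy first rises and then falls as the weight grows.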
The semantic complementation module uses the results of localization and semantic segmentation to produce a label map. Then, the watershed algorithm is used to obtain a complementation box for semantic complementation. After the loss between the complementation box and the ground-truth box is calculated, the model parameters are updated. However, not all localization boxes are subjected to semantic complementation: the intersection ratio between each localization box and the ground-truth boxes is checked against a threshold, and the threshold determines how many localization boxes undergo complementation, so the results under different thresholds are compared. The results are shown in Table VII, and the highest accuracy is achieved when the threshold value is 0.3. Therefore, the threshold value for the semantic complementation module is set to 0.3. The semantic complementation module is executed in the last period of the training phase. Even so, there is a certain error between the output of the training phase and the ground truth, and the watershed algorithm cannot completely complement the oil spill region, which leads to an error between the final complementation box and the ground-truth box as well. Therefore, it is crucial to choose a suitable loss function weighting factor. Considering these errors, the values 0.1, 0.5, 1, 2, and 3 are selected for comparison. The results are shown in Table VIII, and the highest accuracy in localization and segmentation is achieved when the value is 1.
As the value of the weighting factor increases, the accuracy gradually decreases. Still, all of them are higher than the accuracy when the module is not used. This indicates that the proposed semantic complementation module is practical and can improve localization and semantic segmentation accuracy. Therefore, the value of the loss function weighting factor of the semantic complementation module is determined to be 1.
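The gating and box-completion idea behind the semantic complementation module can be sketched in Python as follows. Several points are assumptions rather than details from this study: the intersection ratio is taken to be the usual intersection over union, boxes are assumed to qualify when that ratio is at least the threshold, and a plain 4-connected flood fill stands in for the watershed algorithm, which is not reproduced here. All function names are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

# Hypothetical convention: (x_min, y_min, x_max, y_max), maximum edges exclusive.
Box = Tuple[int, int, int, int]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def passes_gate(pred: Box, gt_boxes: List[Box], threshold: float = 0.3) -> bool:
    """Assumed rule: complement a box if it overlaps some ground-truth box enough."""
    return any(box_iou(pred, gt) >= threshold for gt in gt_boxes)


def complementation_box(mask: List[List[int]], box: Box) -> Optional[Box]:
    """Grow the oil pixels inside the box over connected mask pixels (flood-fill
    stand-in for watershed) and return the bounding box of the grown region."""
    height, width = len(mask), len(mask[0])
    queue = deque(
        (x, y)
        for y in range(max(0, box[1]), min(height, box[3]))
        for x in range(max(0, box[0]), min(width, box[2]))
        if mask[y][x]
    )
    seen = set(queue)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and mask[ny][nx] and (nx, ny) not in seen):
                seen.add((nx, ny))
                queue.append((nx, ny))
    if not seen:
        return None  # no oil pixels inside the box, nothing to complement
    xs = [x for x, _ in seen]
    ys = [y for _, y in seen]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# Toy example: a long strip that the predicted box only partly covers.
strip_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
gt_box = (1, 1, 7, 2)
pred_box = (1, 1, 5, 2)
completed = complementation_box(strip_mask, pred_box) if passes_gate(pred_box, [gt_box]) else None
```

In this toy case the predicted box covers four of the six strip pixels, passes the 0.3 gate, and is extended along the strip so that the completed box coincides with the ground-truth box. With real network outputs the segmentation mask is imperfect, which is consistent with the residual error between complementation boxes and ground-truth boxes noted above.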

V. CONCLUSION
This study proposes the MFSCNet model to implement oil spill localization and semantic segmentation in a single neural network model. The multifeature input module reduces the long stripe interference of oil spills with a non-Gaussian distribution. A semantic complementation module is proposed to improve the accuracy of the localization box by combining the localization and semantic segmentation results. The optimal parameters of the model are determined through hyperparameter experiments. MFSCNet outperforms the compared oil spill detection methods and is more suitable for daily monitoring of oil spill events. In addition, how to distinguish oil spills from the many kinds of look-alikes can be investigated in future work.