A Semisupervised Arbitrary-Oriented SAR Ship Detection Network Based on Interference Consistency Learning and Pseudolabel Calibration

The rapid development of deep learning cannot be achieved without the support of abundant labeled data. However, obtaining such a large amount of annotated data requires the support of professionals in the field of synthetic aperture radar (SAR) image understanding, which leads to the scarcity of SAR datasets with annotations. This scarcity of annotations is a bottleneck in the performance of deep learning based SAR ship detectors. Recently, semisupervised learning has become a popular paradigm, as it can mine effective information from unlabeled data to further improve the performance of SAR ship detectors. However, existing semisupervised SAR ship detection studies all adopt multistage semisupervised frameworks, which are complex and inefficient. In this article, we design the first end-to-end semisupervised framework for SAR ship detection. To overcome the strong interferences resulting from the imaging or quantization processes in SAR, we introduce an interference consistency learning mechanism to enhance the model's robustness. To handle the complex backgrounds of inshore scenarios, a pseudolabel calibration network is designed to calibrate the pseudolabels according to the context knowledge around the ships. Based on the high-resolution SAR images dataset (HRSID) and four other datasets, the superiority of the proposed approach over several state-of-the-art semisupervised frameworks is evaluated under various labeling ratios, i.e., 1%, 5%, 10%, and 100%.


I. INTRODUCTION
Synthetic aperture radar (SAR) [1] ship detection plays a pivotal role in interpreting SAR images and has been widely applied in maritime traffic safety and battlefield reconnaissance. Traditional SAR ship detectors can be divided into the following four categories: 1) threshold-based approaches [2], [3]; 2) saliency-based approaches [4], [5]; 3) hand-crafted feature-based approaches [6], [7]; and 4) statistical modeling-based approaches [8], [9]. With the emergence of deep learning, convolutional neural networks (CNNs) [10] and graph convolutional networks [11] have been applied in remote sensing and have shown great advantages over hand-crafted features. These deep learning detectors outperform the traditional methods in SAR ship detection tasks [12], [13], [14], [15]. However, they are horizontal bounding box (HBB) based detectors. Ships have arbitrary orientations and elongated shapes, so the HBB is not an effective representation of ships, particularly when ships are densely arranged near the wharf. To solve this problem, the oriented bounding box (OBB) is used to locate targets in remote sensing images (RSIs) [16], [17], [18], [19], [20], [21]. At the same time, more and more high-resolution SAR datasets with OBB annotations have emerged [22], [23], [24], [25], which has led to significant progress in deep learning based SAR ship detectors. Cui et al. [26] and Su et al. [27] adopted novel attention mechanisms and deformable convolution to enhance the model's ability to extract key information from complex backgrounds. He et al. [28] designed a polar encoding to solve the problem of angle discontinuity in oriented SAR ship detection. Zhou et al. [29] attempted to replace the CNN with a transformer to introduce a global attention mechanism, which can further improve the detection performance of SAR ships in complex scenarios.
The above-mentioned models, which must be trained with labeled data, belong to supervised learning (SL). The dependence on labeled data becomes the bottleneck of existing CNN-based SAR ship detectors in real applications. On the one hand, the data labeling of SAR is difficult: it can only be done with the support of professionals with the corresponding backgrounds. Researchers have attempted to use optical, infrared, and other sensor data to assist in improving the performance of models on SAR images [30], [31]. However, this requires paired annotated data, which is difficult to obtain in some practical applications. On the other hand, massive SAR images are produced every day, and these unlabeled data account for a much larger proportion than labeled data. Without labeling the targets in time, these massive SAR images cannot be used effectively. Considering the practical applications of SAR ship detection tasks, the useful information contained in unlabeled data cannot be neglected.
Recently, semisupervised learning (SSL) has become a popular paradigm, as it can mine effective information from unlabeled data to further improve performance. Fig. 1 shows the difference between SL and SSL. Compared with SL, the training process of SSL involves unlabeled data, which means that the SSL paradigm has potential application value in actual scenarios where massive unlabeled SAR image data can be obtained. Wang et al. [34] directly applied SSL-based image classification to a SAR ship detection task. Chen et al. [35] designed a cross-domain coattention feature correlation module that addresses the domain adaptation problem. Hou et al. [36] designed an adversarial network to make the local features of SAR ships in unlabeled images closer to those in labeled images. However, existing SSL applications [32], [33] in SAR ship detection all adopt the multistage framework, which limits training efficiency and performance. On one hand, the training process of the multistage framework is complex. As shown in Fig. 2, it includes three stages: the teacher model must be trained first and then the student model, so the training time is much longer than that of SL. On the other hand, the performance of the detector is limited by the initial pseudolabels. Once the pseudolabels of a multistage framework are generated, they do not change throughout the entire training process, so errors in the initial pseudolabels will mislead the student model's learning.
Unlike the multistage methods, the work in [37] introduced an end-to-end SSL training strategy with a single training stage. During each iteration, the teacher model's parameters are updated as the exponential moving average (EMA) of the student model's parameters. As the pseudolabels of the end-to-end framework are updated dynamically, the quality of the generated pseudolabels improves gradually. However, various interferences may be introduced during SAR imaging and quantization, e.g., speckle noise and scattering interference [38]. Once a ship in an unlabeled image has been contaminated, it is difficult to detect: the number of pseudolabels is significantly reduced, and the potential information in unlabeled data cannot be fully mined. The quality of the pseudolabels also decreases, which may mislead the model's learning. These interferences are unique to SAR images and have not been discussed in previous end-to-end semisupervised frameworks. Moreover, SAR ships in inshore scenes are subject to serious background interference, which can impact the accuracy of pseudolabels and lead to false alarms, such as wharf buildings with shapes similar to ships. To solve these issues, we propose an end-to-end semisupervised network for arbitrary-oriented SAR ship detection based on interference consistency learning (ICL) and a pseudolabel calibration network (PLC). ICL enhances the model's robustness to different interferences in SAR images, thereby improving the quality of pseudolabels in interference scenarios. PLC calibrates the pseudolabels according to the context knowledge surrounding ships, alleviating the errors caused by inaccurate pseudolabels in inshore scenarios and further improving detection accuracy. Previous semisupervised SAR ship work has not attempted to use contextual information to improve the quality of pseudolabels in inshore scenarios.
The main contributions of this work are summarized as follows.
1) We propose the first end-to-end semisupervised framework for SAR ship detection, which is more efficient than existing multistage semisupervised frameworks.
2) The ICL is introduced to enhance the model's robustness under strong interferences resulting from the imaging or quantization processes in SAR.
3) The PLC is introduced to calibrate the incorrect pseudolabels introduced by complex inshore backgrounds.
4) The experimental outcomes across the high-resolution SAR images dataset (HRSID) and four other datasets demonstrate that our proposed approach outperforms several prevalent frameworks.

The rest of this article is organized as follows. The related works are introduced in Section II. In Section III, the SAR-Teacher is introduced, whose experimental results compared with other state-of-the-art methods are given in Section IV. Finally, Section V concludes this article.
Notations: Throughout the article, matrices, vectors, and scalars are represented by bold uppercase letters X, bold lowercase letters x, and regular letters x, respectively. The primary notations used in this article are listed as follows.

A. Multistage Versus End-to-End Frameworks
This section compares the mainstream paradigms of multistage and end-to-end semisupervised frameworks for object detection, which are shown in Fig. 2. The multistage semisupervised object detection framework consists of the following three stages [37].

1) In stage I, the teacher detector utilizes the labeled data for supervised training, whose loss is defined as follows:

$$\mathcal{L}_s = \frac{1}{N_l}\sum_{i} \mathcal{L}_{cls}(t_i, \hat{t}_i) + \frac{1}{N_l}\sum_{i} t_i \sum_{j \in \{x,y,w,h,\theta\}} \mathcal{L}_{loc}(v_{ij}, \hat{v}_{ij}) \quad (1)$$

where j belongs to the set {x, y, w, h, θ}. The center coordinates, width, height, and angle of a rotated bounding box are represented by x, y, w, h, and θ, respectively. $N_l$ represents the number of proposals from labeled images, $t_i$ is a binary value ($t_i = 1$ for ship, $t_i = 0$ for background), and $\hat{t}_i$ denotes the predicted probability of the ship class. $v_{ij}$ and $\hat{v}_{ij}$ represent the ground truth and the predicted value of the ith rotated bounding box offsets. The classification and location losses are defined as follows:

$$\mathcal{L}_{cls}(t_i, \hat{t}_i) = -t_i \log \hat{t}_i - (1 - t_i)\log(1 - \hat{t}_i) \quad (2)$$

$$\mathcal{L}_{loc}(v_{ij}, \hat{v}_{ij}) = \begin{cases} 0.5\,(v_{ij} - \hat{v}_{ij})^2, & |v_{ij} - \hat{v}_{ij}| < 1 \\ |v_{ij} - \hat{v}_{ij}| - 0.5, & \text{otherwise.} \end{cases} \quad (3)$$

2) In stage II, there is no training process. The teacher detector uses the model weights obtained in stage I to infer unlabeled images and generate pseudolabels. Usually, a detection score threshold is set manually to ensure the pseudolabels' high quality. For multistage semisupervised frameworks, once the pseudolabels are generated, they no longer change.

3) In stage III, the unlabeled images with pseudolabels and the labeled images with ground truth work together in training the student model. The overall loss of stage III can be formulated as

$$\mathcal{L} = \mathcal{L}_s + \lambda \mathcal{L}_u \quad (4)$$

where $\mathcal{L}_s$ represents the supervised loss of labeled images, which is the same as in stage I given in (1), λ balances the supervised and unsupervised terms, and $\mathcal{L}_u$ represents the unsupervised loss of the unlabeled images, defined as

$$\mathcal{L}_u = \frac{1}{N_u}\sum_{i} \mathcal{L}_{cls}(t_i^{p}, \hat{t}_i) + \frac{1}{N_u}\sum_{i} t_i^{p} \sum_{j \in \{x,y,w,h,\theta\}} \mathcal{L}_{loc}(v_{ij}^{p}, \hat{v}_{ij}) \quad (5)$$

where $N_u$ represents the number of proposals from unlabeled images and the superscript p indicates that the variable is a pseudolabel. The aim of most semisupervised frameworks is to design the unsupervised loss $\mathcal{L}_u$.
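Using the common Faster R-CNN-style choices (binary cross-entropy for the classification term and smooth-L1 for the offset term; the detector's actual implementation lives in MMRotate and may differ in details), the stage-I supervised loss can be sketched in plain Python:

```python
import math

def smooth_l1(x):
    """Smooth-L1 penalty used for the rotated-box offset regression loss."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def bce(t, t_hat, eps=1e-7):
    """Binary cross-entropy between the ship label t and predicted probability t_hat."""
    t_hat = min(max(t_hat, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(t * math.log(t_hat) + (1 - t) * math.log(1 - t_hat))

def supervised_loss(proposals):
    """Average classification loss plus foreground-only localization loss.

    Each proposal is (t, t_hat, offsets, offset_preds), with offsets over
    (x, y, w, h, theta). Background proposals (t = 0) contribute no
    localization term, matching the t_i factor in the loss.
    """
    n = len(proposals)
    cls = sum(bce(t, th) for t, th, _, _ in proposals) / n
    loc = sum(t * sum(smooth_l1(v - vh) for v, vh in zip(vs, vhs))
              for t, _, vs, vhs in proposals) / n
    return cls + loc

# A perfectly predicted foreground proposal yields (near-)zero loss.
perfect = [(1, 1.0, (0.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0, 0.0))]
print(supervised_loss(perfect))
```

The same functions describe the stage-III unsupervised loss, with pseudolabels substituted for the ground truth.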
However, the multistage semisupervised framework includes multiple training stages, making the training process redundant and inefficient. End-to-end frameworks have the same overall loss as stage III of the multistage frameworks, but the main difference lies in the joint optimization of the student and teacher models [37]. Fig. 2 illustrates that the training process for end-to-end frameworks involves a single stage. In each iteration, the training data batch consists of labeled and unlabeled images in equal halves. The student model is trained using both the labeled and unlabeled images. Upon completion of each iteration, the teacher model is updated through the EMA of the student model. This joint optimization strategy has the potential to improve pseudolabel quality dynamically, which is significant for SSL.
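The per-iteration EMA update of the teacher can be sketched as follows; the decay value 0.999 is a typical choice for teacher-student frameworks, not a value specified in this article, and plain float dicts stand in for real model state dicts:

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Update teacher weights as an exponential moving average of the student.

    Both arguments map parameter names to values. The teacher drifts slowly
    toward the student, so its pseudolabels improve as training progresses.
    """
    return {
        name: decay * teacher_params[name] + (1.0 - decay) * student_params[name]
        for name in teacher_params
    }

# One iteration: the student is updated by gradient descent (not shown),
# then the teacher follows with a small EMA step.
teacher = {"w": 1.0}
student = {"w": 2.0}
teacher = ema_update(teacher, student, decay=0.9)
print(teacher["w"])  # 0.9 * 1.0 + 0.1 * 2.0 = 1.1
```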
To the best of our knowledge, no existing work has designed a SAR ship detector based on an end-to-end semisupervised framework. Due to the scattering interference and speckle noise in SAR images, the credibility of pseudolabels is reduced, and incorrect pseudolabels mislead the student model, damaging ship detection performance. Thus, directly applying the end-to-end semisupervised frameworks designed for optical images cannot provide satisfactory results. We present the proposed end-to-end semisupervised framework for arbitrary-oriented SAR ship detection, whose core is to improve pseudolabel quality for the SAR application through interference consistency learning and pseudolabel calibration, as shown in Fig. 3.

B. Interference Consistency Learning
Previous semisupervised SAR ship work did not discuss the impact of the various interferences in SAR images on pseudolabels. However, in practical applications, these interferences cannot be ignored: they increase the feature differences between targets, making the model prone to missing interfered targets. To suppress the various interferences introduced in the imaging or quantization process of SAR images, we introduce interference consistency learning. First, we construct the interference consistency constraint by conducting interference simulations on SAR images. Then, we gauge the model's confidence in the pseudolabels by the interference consistency coefficient. We introduce these two parts one by one.

1) Interference Consistency Constraint:
The end-to-end semisupervised framework of this article adopts the teacher-student paradigm. We first feed the original unlabeled SAR images into the teacher model to generate pseudolabels. Then, we feed the same SAR image, after simulated interference, into the student model. Since the simulated interference does not change the position of the target, the pseudolabels generated by the teacher model can be directly used to provide supervision information for the student model. In this way, the framework can use unlabeled SAR images to construct interference consistency constraints.
For convenience, we denote the teacher model as φ and the student model as ϕ. Given an image X, the teacher model outputs the detection results as follows:

$$\{\hat{\mathbf{v}}_d\}_{d=1}^{D} = \varphi(\mathbf{X}; \tau) \quad (6)$$

where τ represents the trainable parameters of the teacher model, D denotes the number of detection results, and $\hat{\mathbf{v}}_d$ represents the dth predicted rotated bounding box vector. After passing through the filter F (e.g., a fixed confidence threshold), $\hat{\mathbf{v}}_d$ becomes the pseudolabel $\mathbf{v}_d^{p}$. Then, we apply simulated interference to the same image to obtain S(X) and construct the following constraint:

$$\mathcal{L}_u\big(\phi(S(\mathbf{X}); \delta), \{\mathbf{v}_d^{p}\}\big) \quad (7)$$

where δ represents the trainable parameters of the student model, i.e., the student's predictions on the interfered image are supervised by the teacher's pseudolabels on the clean image. S(·) represents the operation that subjects unlabeled SAR images to simulated interference, which is composed of the following seven interference simulations, as shown in Fig. 4: 1) speckle noise; 2) solarization; 3) sharpness; 4) posterization; 5) equalization; 6) contrast; and 7) brightness.
We train the student model under the simulated interference S(X) to promote the student's learning of the interference consistency of SAR images. This consistency knowledge is transmitted to the teacher model as it is updated, thereby improving the quality of pseudolabels on unlabeled SAR images with speckle noise or scattering interference.
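A minimal NumPy sketch of three of the seven simulations (speckle noise, brightness, and contrast); the PIL-style transforms (solarization, sharpness, posterization, equalization) are omitted, and the parameter ranges are assumptions rather than the article's settings. The key property is that none of these operations moves the target, so the teacher's pseudolabel boxes remain valid for the interfered image:

```python
import numpy as np

rng = np.random.default_rng(0)

def speckle(img, var=1.0):
    """Multiplicative speckle noise; variance 1 matches the experimental setup."""
    noise = rng.normal(1.0, np.sqrt(var), img.shape)
    return np.clip(img * noise, 0.0, 255.0)

def brightness(img, factor):
    """Scale all pixel intensities by a factor."""
    return np.clip(img * factor, 0.0, 255.0)

def contrast(img, factor):
    """Stretch or compress intensities around the image mean."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 255.0)

def simulate_interference(img):
    """Apply one randomly chosen interference S(.) to an unlabeled SAR image."""
    op = rng.choice(["speckle", "brightness", "contrast"])
    if op == "speckle":
        return speckle(img)
    if op == "brightness":
        return brightness(img, rng.uniform(0.5, 1.5))
    return contrast(img, rng.uniform(0.5, 1.5))

img = rng.uniform(0, 255, (64, 64))
out = simulate_interference(img)
# The image geometry is unchanged, so pseudolabels transfer directly.
```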
2) Interference Consistency Coefficient: In previous methods, the unsupervised loss treats all pseudolabels equally. However, pseudolabels may contain errors, and we need to pay more attention to reliable ones. Therefore, we introduce an interference consistency coefficient σ, ranging from 0 to 1, to measure the credibility of pseudolabels under severe interference. Each pseudolabel has a corresponding interference consistency coefficient. The larger the coefficient, the more confident the teacher model is in the current pseudolabel; conversely, a smaller coefficient indicates a higher probability that the pseudolabel is incorrect.
Deep learning-based detectors allocate multiple proposals to each ground truth box. If the location of a target is uncertain, the model may generate multiple scattered proposals near the ground truth box. Conversely, if the model is certain about the location of a target, the generated proposals will be concentrated around the ground truth box. We gauge the model's confidence in the pseudolabels by the proposals as follows:

$$\sigma_k = \frac{1}{N_p}\sum_{j=1}^{N_p} \mathrm{IoU}_{kj} \quad (8)$$

where $\sigma_k \in [0, 1)$, $N_p$ denotes the number of positive samples assigned to the kth pseudolabel, and $\mathrm{IoU}_{kj}$ denotes the rotated IoU between the jth positive sample and the kth pseudolabel. IoU is defined as the overlap area divided by the union area of two rotated boxes. Next, σ is used to weight the unsupervised loss $\mathcal{L}_u$ of the unlabeled images: the larger the σ, the greater the contribution to $\mathcal{L}_u$, and vice versa. Thus, σ reduces the model's attention to uncertain pseudolabels and puts more attention on pseudolabels with higher reliability.
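A plain-Python sketch of how the coefficient can be computed from the rotated IoUs of the proposals assigned to a pseudolabel, and how it down-weights unreliable pseudolabels; the mean-IoU aggregation is an illustrative choice consistent with the description, and the rotated-IoU values themselves would come from the detector's IoU operator:

```python
def consistency_coefficient(ious):
    """Interference consistency coefficient of one pseudolabel.

    `ious` holds the rotated IoU between each positive sample assigned to
    the pseudolabel and the pseudolabel box. Concentrated proposals give a
    high mean IoU (confident), scattered proposals a low one.
    """
    if not ious:
        return 0.0
    return sum(ious) / len(ious)

def weighted_unsup_loss(per_label_losses, coefficients):
    """Weight each pseudolabel's unsupervised loss term by its sigma."""
    n = len(per_label_losses)
    return sum(s * l for s, l in zip(coefficients, per_label_losses)) / n

# Two pseudolabels: tightly clustered proposals vs. scattered ones.
sigma_good = consistency_coefficient([0.9, 0.85, 0.95])  # 0.9
sigma_bad = consistency_coefficient([0.3, 0.5, 0.1])     # 0.3
print(weighted_unsup_loss([1.0, 1.0], [sigma_good, sigma_bad]))  # 0.6
```

Equal raw losses are thus rebalanced so that the unreliable pseudolabel contributes three times less.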

C. Pseudolabel Calibration Network
Another challenge in SAR ship detection is the presence of numerous small ships overwhelmed by complex surrounding backgrounds in inshore scenes. The complex background can blur targets' boundaries, leading to inaccurate positioning, reducing pseudolabel quality, and hampering the student model's performance. It has been proven that exploiting the context knowledge around small objects is beneficial for locating their bounding boxes accurately [46]. However, previous semisupervised SAR ship works did not attempt to use contextual information to improve the quality of pseudolabels in inshore scenarios. Motivated by this idea, we design a pseudolabel calibration network, which calibrates the pseudolabels based on the context knowledge around the ship and improves pseudolabel quality.
The structure of the pseudolabel calibration network is shown in Fig. 5. First, we enlarge each proposal with factors {2, 4}, and get two additional proposals. Second, we feed these three proposals into the rotated region of interest (RoI) Align operator [45] to obtain multiple contextual features, which contain information from the target and surrounding background. Then, we concatenate the three features and feed them into a lightweight information fusion module. This module consists of a convolutional layer and three fully connected layers. By calculation, the computational complexity of the lightweight information fusion module is 20.875 GFlops. Finally, we can obtain the pseudolabels after calibration.
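The proposal-enlargement step can be sketched as follows. For a rotated proposal (x, y, w, h, θ), only the width and height are scaled, so the enlarged boxes share the ship's center and orientation while capturing progressively more surrounding context before rotated RoI Align is applied:

```python
def enlarge_rotated_proposal(box, factor):
    """Enlarge a rotated proposal (x, y, w, h, theta) about its center.

    Center and angle stay fixed; only width and height are scaled, so the
    enlarged proposal covers the ship plus surrounding background context.
    """
    x, y, w, h, theta = box
    return (x, y, w * factor, h * factor, theta)

def context_proposals(box, factors=(2, 4)):
    """Original proposal plus its context-enlarged copies, as fed to rotated RoI Align."""
    return [box] + [enlarge_rotated_proposal(box, f) for f in factors]

props = context_proposals((100.0, 120.0, 30.0, 10.0, 0.3))
print(props)
# [(100.0, 120.0, 30.0, 10.0, 0.3),
#  (100.0, 120.0, 60.0, 20.0, 0.3),
#  (100.0, 120.0, 120.0, 40.0, 0.3)]
```

The three RoI-aligned features would then be concatenated and passed through the information fusion module (one convolutional layer followed by three fully connected layers).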
We use the labeled SAR images to train the PLC network in a supervised manner. The PLC loss $\mathcal{L}_p$ can be formulated as

$$\mathcal{L}_p = \frac{1}{N_l}\sum_{i}\big(1 - \mathrm{IoU}_i\big) \quad (9)$$

where $\mathrm{IoU}_i$ denotes the rotated IoU between the ith positive sample and the ground truth assigned to it. It should be noted that the PLC loss applies only to the training of labeled images. Then, the PLC weights in the student model are updated to the teacher model for calibrating pseudolabels. PLC can appropriately mitigate the deviation between the pseudolabels and the ground truth, improving pseudolabel quality. It is worth mentioning that PLC is only used during the training phase and can be removed during the testing phase. Therefore, PLC increases neither the computational complexity nor the inference latency of the testing phase.

D. Loss Function
Finally, the loss of the proposed end-to-end framework can be formulated as

$$\mathcal{L} = \mathcal{L}_s + \mathcal{L}_p + \frac{1}{N_u}\sum_{k} \sigma_k \Big[\mathcal{L}_{cls}(t_k^{p}, \hat{t}_k) + t_k^{p}\sum_{j \in \{x,y,w,h,\theta\}} \mathcal{L}_{loc}\big(v_{kj}^{p}, S(\hat{v}_{kj})\big)\Big] \quad (10)$$

where $\sigma_k$ denotes the interference consistency coefficient of the kth pseudolabel, which reduces the possibility of the student model being misled by incorrect pseudolabels, and $S(\hat{v}_{kj})$ are the predicted offsets of the kth rotated bounding box on the simulated-interference SAR images. Since the proposed simulated interferences are invariant to the box coordinates, the operation S does not need to be applied to $v_{kj}^{p}$. The proposed end-to-end framework uses the EMA strategy to update the teacher model after each iteration. This joint optimization strategy can dynamically improve pseudolabel quality.
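Putting the pieces together, the total loss can be sketched as a plain-Python combination of the supervised loss, the PLC loss, and the σ-weighted unsupervised terms; the balancing factor `lam` is a hypothetical hyperparameter introduced here for illustration, not a value taken from the article:

```python
def overall_loss(l_s, l_p, per_label_unsup, sigmas, lam=1.0):
    """Total end-to-end loss: supervised + PLC + sigma-weighted unsupervised terms.

    `per_label_unsup` holds the per-pseudolabel unsupervised loss values,
    and `sigmas` the matching interference consistency coefficients.
    """
    n_u = max(len(per_label_unsup), 1)
    l_u = sum(s * l for s, l in zip(sigmas, per_label_unsup)) / n_u
    return l_s + l_p + lam * l_u

# Supervised loss 0.5, PLC loss 0.1, two pseudolabels weighted by sigma.
print(overall_loss(0.5, 0.1, [1.0, 2.0], [0.9, 0.3]))  # 0.5 + 0.1 + 0.75 = 1.35
```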

A. Dataset Description and Implementation Details
Experiments are performed on the HRSID [23], a ship detection dataset of high-resolution SAR images with massive ship samples. As can be seen from Fig. 6, it includes both offshore scenes with clean backgrounds and inshore scenes with complex backgrounds, as well as various scattering interferences and speckle noise. In addition, four other SAR ship datasets are used as unlabeled data in the extended experiments: three OBB datasets [22], [24], [25] and one HBB dataset [44]. It should be noted that we did not use the annotations of these four datasets during the whole experiment. The detailed information of these open-source SAR ship datasets is shown in Table I.
Our algorithm implementation and hyperparameter settings are based on a unified rotated object detection toolbox (MMRotate) [45]. For all datasets, the image shape for network input is 800 × 800 pixels. Our experiments were conducted on a CentOS 7.3 system with an RTX 3090Ti graphics processing unit (GPU). The variance of the speckle noise is set to 1. Following [33], we use the following two experimental settings in this article: 1) partially labeled data and 2) fully labeled data. We introduce their setting details, which differ significantly, below.
In the partially labeled data setup, we randomly sample 1%, 2%, 5%, and 10% of the images from the HRSID training set as labeled training data, and the images that are not selected are considered unlabeled data. The overall number of HRSID training set images is 3623, which means that 1% of the training set contains only 36 images; therefore, this experimental setup can verify the few-shot learning ability of the model. To guarantee the experiment's rigor, we randomly choose five different data folds for each data proportion, and the final performance is calculated as the mean of the five folds. Each model is trained for 6k iterations on one GPU with stochastic gradient descent (SGD) as the optimizer. The learning rate is 0.0025 and is decreased by a factor of 10 after 4k and 5.5k iterations. Each batch comprises four images: two labeled and two unlabeled samples. Moreover, weight decay and momentum are set to 0.0001 and 0.9, respectively, for all models.
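The step schedule described above can be expressed directly, using the values from the partially labeled setup (base learning rate 0.0025, decayed tenfold at 4k and 5.5k of the 6k iterations):

```python
def learning_rate(iteration, base_lr=0.0025, milestones=(4000, 5500), gamma=0.1):
    """Step learning-rate schedule: multiply by gamma at each milestone passed."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr

print(learning_rate(1000))   # 0.0025
print(learning_rate(4500))   # 0.00025
print(learning_rate(5800))   # 2.5e-05
```

The fully labeled setup follows the same pattern with milestones at 43k and 60k of the 65k iterations.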
In the fully labeled data setup, the full HRSID training set is utilized as labeled data and supplemented with unlabeled data from the other SAR ship datasets. The goal is to verify the performance of semisupervised models on large-scale unlabeled samples, which is more consistent with real-world application scenarios and more challenging. Each model is trained for 65k iterations on one GPU. The learning rate is 0.0025 and is decreased by a factor of 10 after 43k and 60k iterations.

B. Evaluation Metrics
We use average precision (AP) to quantitatively evaluate the detection performance and the latency per image to evaluate inference speed. Precision, recall, and AP are defined as

$$P = \frac{N_{tp}}{N_{pred}}, \quad R = \frac{N_{tp}}{N_{target}}, \quad \mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R$$

where $N_{pred}$ represents the total number of predicted boxes, $N_{tp}$ is the number of targets correctly detected, $N_{target}$ denotes the actual number of targets, and P(R) is the precision-recall curve.
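The AP computation can be illustrated with a toy implementation; this is a simple non-interpolated variant that accumulates the area under the precision-recall curve, whereas published benchmarks typically use interpolated variants:

```python
def average_precision(scores_and_matches, n_targets):
    """Non-interpolated AP from ranked detections.

    `scores_and_matches` is a list of (score, is_true_positive) pairs;
    precision and recall follow P = N_tp / N_pred and R = N_tp / N_target,
    evaluated at each true positive in descending score order.
    """
    dets = sorted(scores_and_matches, key=lambda d: -d[0])
    tp, ap, prev_recall = 0, 0.0, 0.0
    for n_pred, (_, is_tp) in enumerate(dets, start=1):
        if is_tp:
            tp += 1
            precision = tp / n_pred
            recall = tp / n_targets
            ap += precision * (recall - prev_recall)
            prev_recall = recall
    return ap

# Three detections against two targets, ranked by score: TP, FP, TP.
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], n_targets=2))
```

The example evaluates precision 1.0 at recall 0.5 and precision 2/3 at recall 1.0, giving AP = 5/6.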
To measure the efficiency performance of different models more comprehensively, we also adopt seven evaluation metrics in Table II.

C. Experiments and Analysis 1) Effect of End-to-End:
We compare the end-to-end method with the multistage framework on the 1% partially labeled HRSID, as shown in Table III. When using RetinaNet OBB as the detector, the performance of the multistage semisupervised framework is only 0.2 points higher than that of the supervised framework, while the performance of the end-to-end framework is 3.0 points higher. Next, we replace the detector with the oriented RCNN (ORCNN) and repeat the above experiment. The end-to-end method again outperforms the multistage and supervised frameworks; in fact, the performance of ORCNN's multistage framework is even worse than that of the supervised framework. Meanwhile, the training time of the end-to-end semisupervised framework is only half that of the multistage semisupervised framework, significantly improving training efficiency; compared with SL, the training time of the end-to-end semisupervised framework increases only slightly. This experiment verifies that the end-to-end scheme is not only superior to the multistage scheme in performance but also more versatile across different detectors. To make the results of the semisupervised algorithm more competitive, we use ORCNN as the baseline detector in the following experiments.
2) Ablation Experiment: We conducted ablation experiments on different interference combinations, with the results shown in Table IV. The results show that speckle noise can be combined with random color transformations for better performance. The quantitative comparison of ICL is shown in Table V. The AP50 of ICL is the highest in both offshore and inshore scenes: on the 1% train set, ICL outperforms the baseline by 3.4 points in the inshore scenario and increases AP50 by 1.6 points in offshore scenes. These results show that ICL exhibits a stronger ability in complex backgrounds. The ablation experiment of ICL and PLC is shown in Table VI. The results show that PLC can be combined with ICL to further enhance the performance of our semisupervised framework.
3) Comparison With Representative Methods: This part compares SAR-Teacher with several state-of-the-art end-to-end semisupervised frameworks for object detection. The quantitative comparison between the proposed framework and other frameworks on the partially labeled data is given in Table VII. The AP of the proposed framework is the highest at 1%, 2%, 5%, and 10%. We can also find that the two-stage detectors, Faster RCNN (FRCNN) and ORCNN, have better learning ability than the single-stage detector (RetinaNet) when training data are scarce. MeanTeacher performs poorly when the proportion of labeled data is low. SoftTeacher's performance on the SAR dataset is subpar, which highlights the significant gap between SAR and general optical images; it is therefore essential to devise a semisupervised framework specifically for SAR images. Moreover, we add the large selective kernel network (LSKNet) [42] detector to validate the effectiveness of the proposed method. Compared to ORCNN, LSKNet shows a significant improvement when the proportion of labeled data is low, while its improvement is smaller than ORCNN's when the proportion of labeled data is higher. Therefore, in the fully labeled data experiments, we chose ORCNN as our basic detector.
Besides AP under different proportions of labeled data, we also use six metrics to measure the efficiency of the different models more comprehensively. Among them, train memory and iteration time belong to the training phase, while the remaining metrics belong to the inference phase. According to Table VII, although the proposed semisupervised framework has higher spatial complexity, its time complexity is similar to that of other semisupervised methods.
Although a semisupervised object detection framework based on the teacher-student architecture contains two object detectors during the training phase, only the student detector needs to be retained during the testing phase. Therefore, we use two sets of indicators to separately evaluate the computational complexity and speed of the model during the training and testing phases. Although our proposed framework increases the model complexity of the training process, it does not affect the model's inference speed or memory usage during the testing phase.
Fig. 7 compares the qualitative results of the proposed method with those of the supervised baseline under the 1% HRSID train set. By comparing the yellow and green ellipses in the figure, it can be found that SAR-Teacher detects many targets that are missed by the supervised baseline.

4) Validation on Full Labeled Data:
In our previous experiments, a portion of the HRSID training set was employed as the labeled dataset, while the remaining images acted as unlabeled data. To verify the performance of the semisupervised framework on the full HRSID dataset, we conducted a comparative experiment with the four other unlabeled datasets, shown in Table VIII. When SSDD is used as the unlabeled dataset, the semisupervised framework is 1.4 points higher than the fully supervised model. The advantages of the semisupervised framework become more evident as the number of unlabeled samples increases: when three SAR datasets are used as unlabeled datasets, the semisupervised framework is 2.5 points higher than the fully supervised model. This shows that the semisupervised framework proposed in this article can mine information from massive unlabeled SAR images, reducing the time researchers spend labeling images.

5) Validate on Data Including Pure Background Images:
The images in the above datasets are obtained after being cut and filtered by researchers. Each image slice contains at least one target, which virtually reduces the learning difficulty. However, in actual application scenarios, the obtained images are usually large-scale SAR images, which need to be split into slices; because there are no annotations, slices without targets cannot be filtered out. To verify the performance of the semisupervised framework in real application scenarios, we conducted a comparative experiment on whether the unlabeled dataset contains pure background images. We selected the large-scale SAR ship detection dataset, LS-SSDD, as the unlabeled samples for the experiment and split the original images into 800 × 800 slices.

The characteristics of the unlabeled samples and their impact on the results are analyzed in Table X. The number of unlabeled samples does not play a decisive role in the results. Although SRSDD has the smallest number of images, it significantly improves the semisupervised model. On the contrary, the RSDD dataset, which has the largest number of images and ships, brings limited improvement to the semisupervised model. We believe that the resolution and image size play the decisive role: only when the images of the unlabeled dataset are resized to make the ship size close to that of the labeled dataset can the performance of the semisupervised framework be maximized. This shows that SAR-Teacher cannot accurately produce pseudolabels for unlabeled datasets with large differences in resolution. Therefore, we recommend that SAR images with the same resolution be used as unlabeled samples in actual applications of SAR-Teacher. However, if there are no unlabeled samples of the same resolution, we can also improve the model's performance by increasing the number of unlabeled samples.
As shown in Table VIII, more unlabeled samples can also improve the performance of semisupervised models.

A. Knowledge Forgetting and Error Accumulation Issues
As shown in Table XI, the experimental setup in this section is slightly different from the experimental section above. To explore the knowledge-forgetting problem of the model, we replaced the testing set with a 1% training set to see if the model had forgotten the initial supervised data. To examine the problem of error accumulation in the model, we replaced the testing set with 99% of the training set, which can show whether the performance of the model on unlabeled data will gradually decrease.
The problem of knowledge forgetting in deep learning models was first discussed in [47]: when a model trained on one task is trained on a new task, the previous knowledge may be severely forgotten. To verify whether the semisupervised object detection framework suffers from knowledge forgetting, we conducted the experiment shown in Fig. 8. From the experimental results, both semisupervised frameworks improve detection accuracy compared to supervised models. For the multistage semisupervised framework, the introduction of pseudolabels not only does not cause knowledge forgetting but also improves the detection accuracy on the labeled data. It indicates that the semisupervised detection framework does not have the problem of knowledge forgetting.
Error accumulation is another issue that pseudolabels may cause, which was also first discussed in [47]. The main manifestation is that, as training progresses, the detection performance of the model decreases due to being misled by false labels. For multistage detection frameworks, once pseudolabels are generated, they are invariant. According to Fig. 9, both semisupervised frameworks improve detection accuracy compared to supervised models. For the multistage semisupervised framework, the number of erroneous labels gradually decreases as training progresses. It indicates that the semisupervised detection framework does not suffer from the problem of error accumulation.

V. CONCLUSION
In this article, an end-to-end semisupervised framework, SAR-Teacher, is proposed for arbitrary-oriented SAR ship detection. It can significantly reduce the detector's demand for labeled SAR images, breaking through the bottleneck of SL SAR ship detection. Specifically, we designed ICL to construct interference consistency constraints, which promote the model's ability to detect ships in SAR images. Moreover, we propose a PLC network that utilizes contextual knowledge around the ship to calibrate incorrect pseudolabels and reduce the negative impact of complex inshore backgrounds. The superiority of the proposed semisupervised framework is verified through experiments conducted on as little as 1% labeled data as well as on fully labeled data.
Semisupervised technology is crucial for the intelligent interpretation of synthetic aperture radar ships. In the future, we will attempt to improve the efficiency of the semisupervised object detection framework and consider different application scenarios. On one hand, not all unlabeled images have a learning necessity, such as duplicate or similar images. Therefore, we can incorporate active learning techniques to select unlabeled images that are more worthy of learning and improve the training efficiency of the semisupervised target detection framework. On the other hand, this article only utilizes rotated box annotations. In fact, we can also utilize other SAR datasets with annotation difficulty lower than that of rotated box detection tasks, such as horizontal box detection tasks and scene classification tasks. The datasets for these tasks are easier to obtain online. In this way, the framework can fully utilize various labeled and unlabeled data, further expanding its application scenarios.