Learning Polar Encodings for Arbitrary-Oriented Ship Detection in SAR Images

Common horizontal bounding box-based methods are not capable of accurately locating slender ship targets with arbitrary orientations in synthetic aperture radar (SAR) images. Therefore, in recent years, methods based on oriented bounding boxes (OBBs) have gradually received attention from researchers. However, most of the recently proposed deep learning-based methods for OBB detection encounter the boundary discontinuity problem in angle or key point regression. To alleviate this problem, researchers have introduced manually set parameters or extra network branches for distinguishing the boundary cases, which make training more difficult and lead to performance degradation. In this article, to solve the boundary discontinuity problem in OBB regression, we propose to detect SAR ships by learning polar encodings. The encoding scheme represents an OBB by a group of vectors pointing from the center of the ship target to the boundary points. The boundary discontinuity problem is avoided by training and inference directly according to the polar encodings. In addition, we propose an intersection over union (IOU)-weighted regression loss, which further guides the training of polar encodings through the IOU metric and improves the detection performance. Comparative experiments on the benchmark Rotating SAR Ship Detection Dataset (RSSDD) demonstrate the effectiveness of our proposed method in terms of enhanced detection performance over state-of-the-art algorithms and other OBB encoding schemes.


I. Introduction
Ship detection in synthetic aperture radar (SAR) images is an important branch of SAR image interpretation. It can be widely applied in many fields, such as harbor monitoring, fishery monitoring, maritime traffic monitoring and intelligence acquisition [1]-[5]. Therefore, it has attracted much attention in recent years. Traditional SAR ship detection algorithms usually include the following steps: (1) sea-land segmentation; (2) image preprocessing; (3) candidate region extraction; and (4) false alarm rejection. Based on this pipeline, researchers have proposed a variety of methods, which can be mainly classified as threshold-based [6], [7], saliency-based [8], [9], hand-crafted feature-based [10], [11] and statistical modeling-based methods [12], [13]. With the increase in the amount and resolution of SAR data, these methods face difficulties in meeting practical demands in terms of accuracy, robustness and speed [14]-[18]. This is mainly due to their complex detection flows and high dependence on prior knowledge, such as specific statistical distribution modeling and manually designed features [19]-[23]. Hence, it is urgent to develop smarter and more automated SAR ship detection methods.
Recently, with the development of deep learning theories and the substantial improvement of hardware, deep convolutional neural network (DCNN)-based algorithms have achieved great success in computer vision tasks such as target detection, recognition, segmentation and tracking [24]-[31]. In the field of target detection, Ren et al. [32] extracted candidate target regions through the region proposal network (RPN) in Faster-RCNN and conducted end-to-end training to achieve high detection accuracy and speed. To achieve higher efficiency, researchers proposed using a DCNN to directly regress the target locations without extracting candidate regions, as in the Single Shot MultiBox Detector (SSD) [33], RetinaNet [34] and YOLO [35], [36]. In addition, methods based on key point detection have received much attention lately [37]-[40]. For instance, Law et al. [37] proposed locating targets by regressing the upper-left and lower-right corners of the bounding boxes; CenterNet by Xing et al. [38] detected targets by locating the center points and the length and width of the bounding boxes. In the task of ship detection in SAR images, DCNN-based algorithms also achieve great performance. For example, Deng et al. [21] trained a DCNN-based ship detector from scratch on a SAR dataset by introducing dense blocks and new training losses; Liu et al. [41] combined pyramid features extracted by a DCNN into the sea-land segmentation and ship detection process; Gao et al. [42] improved CenterNet with an attention mechanism and a feature reuse strategy to achieve good performance in SAR ship detection.
The above methods all adopt the horizontal bounding box (HBB) for target localization. However, HBB-based methods suffer from certain difficulties, because ship targets in SAR images are slender, arbitrarily oriented and sometimes densely distributed. Methods based on the oriented bounding box (OBB) have therefore been proposed, which include anchor-based, angle prediction-based and key point regression-based methods. Anchor-based methods locate targets by predicting the errors between the preset anchors and the actual bounding boxes of the targets. For example, Xia et al. [43] proposed the FRCNN-OBB algorithm, which utilized a Faster-RCNN-based framework to regress the errors between the OBBs and horizontal anchors. Ding et al. [44] trained a DCNN to transform the horizontal region of interest (H-ROI) into the rotated region of interest (R-ROI), so as to improve the feature extraction ability of the network for arbitrarily oriented targets. For angle prediction-based methods, the OBB of the target is identified by the location of the target and the length, width and rotation angle of the bounding box. For instance, Hu et al. [45] proposed using two-dimensional periodic vectors to represent the angle of the OBB, together with a length-independent intersection over union metric to guide network training. Yang et al. [46] predicted the rotation angle of the OBB by classification rather than regression and designed a periodic loss function, so as to alleviate the boundary discontinuity in OBB regression. Key point regression-based methods represent the OBB as key points. For example, Yi et al. [47] denoted target OBBs with four vectors distributed in different quadrants of the Cartesian coordinate system. Xu et al. [48] first predicted the horizontal enclosing rectangle of the target OBB, then regressed the distances between the vertices of the enclosing rectangle and the vertices of the OBB to position the OBB. Zhao et al. [49] regressed the polar coordinates of the four vertices of the target OBB to achieve OBB localization. Fu et al. [50] detected the OBB of the target by utilizing fully convolutional networks to locate a group of points distributed evenly inside the OBB.
In the field of ship detection in SAR images, OBB-based methods have also been widely studied. For example, Wang et al. [51] proposed an improved SSD network, where the angle information was utilized to further encode the orientation of the ship targets. Chen et al. [52] built a feature alignment module to extract the features of the ship targets more accurately based on oriented anchors. Chen et al. [53] designed multilayer anchors and rotation non-maximum suppression postprocessing to improve the detection performance for oriented ship targets. Pan et al. [54] used a rotating region proposal network (RRPN) to extract candidate target regions, and then employed a multi-layer cascade network to fine-tune the OBB detection results. An et al. [55] proposed multi-layer anchor settings and a new encoding scheme to compute the errors between anchors and the predicted OBBs, so as to alleviate the boundary discontinuity problem in OBB prediction.
The aforementioned OBB-based methods encounter the boundary discontinuity problem to varying degrees. The problem has two causes, the periodicity of angle (POA) and the exchangeability of edge (EOE) [45], [48], [56]. The boundary discontinuity problem leads to mismatching between annotations and predictions during the training stage, causing performance degradation. To deal with the problem, researchers have proposed different approaches. For example, Yi et al. [47] and Xu et al. [48] first set a fixed threshold manually for distinguishing HBB from OBB, and added a network branch to classify the target bounding box as HBB or OBB. Different regression rules were adopted for different kinds of target bounding boxes. However, this kind of solution leads to more complex network structures, difficulty in tuning the parameters and slower network convergence. Hu et al. [45] used a periodic loss function for the angle regression of the OBB to reduce the negative influence of the POA on network training. However, this approach still needs manually set parameters to define when the boundary discontinuity problem occurs. Yang et al. [46] transformed the task of angle regression into angle classification, so as to avoid the boundary discontinuity problem. However, angle quantization places a heavy computational burden on the classification task, leading to poor real-time performance.
To provide a direct and effective solution for the boundary discontinuity problem, in this paper, we propose to detect ship targets in SAR images based on polar encodings. Through the polar encoding and decoding process, the boundary discontinuity problem can be naturally addressed. To be specific, we encode the OBB of the ship target by sampling a group of ordered boundary points on the OBB. In this way, the ground truth is in one-to-one correspondence with the prediction at each fixed angle, which prevents ambiguity in training. Furthermore, the sampling distance is a periodic function of the sampling angle, which guarantees continuity in the boundary cases. In addition, to further guide the training of polar vectors and improve the detection performance, we propose to use the intersection over union (IOU) metric to weight the regression loss. Experimental results on the Rotating SAR Ship Detection Dataset (RSSDD) verify the effectiveness of the proposed polar encoding scheme and the IOU-weighted regression loss function. The comparison with other OBB-based detection methods demonstrates that our method achieves better detection results.
Fig. 2. The overall architecture of our method. The network structure can be divided into three parts: the feature extraction backbone, the feature refinement network and the OBB detection branches. The detection branches output the center heatmap P, the center offset map O and the encoding map E, respectively. In the training stage, the output of the detection branches is combined with the center information and the encoded OBB parameters for calculating the multi-part loss function. In the inference stage, the polar decoding process is used to obtain the detection results, and the NMS algorithm is used for removing the duplicate targets.
The rest of the paper is organized as follows: Section II describes the boundary discontinuity problem and our proposed solution in detail. The experimental results on the RSSDD dataset are given in Section III. Section IV presents discussions. Section V concludes the paper.

II. Proposed Method
The overall architecture of our method is illustrated in Fig. 2. Firstly, the input SAR image is fed into the ResNet-101 feature extraction backbone [57], through which features of five different scales $\{C_1, C_2, C_3, C_4, C_5\}$ are obtained. As the shallowest feature $C_1$ contains little semantic information, $\{C_2, C_3, C_4, C_5\}$ are chosen to be combined through the feature fusion module, and the resolution of the final output feature $F$ is 1/4 that of the input image. $F$ is then processed by three network branches, from which we obtain the center heatmap $P$, the center offset map $O$ and the encoding map $E$, respectively. In the training stage, the multi-part loss function is calculated according to the center information of the ship targets and the polar encodings. The losses are combined to train the branches jointly. In the inference stage, the output of the branches is decoded through the polar decoding process, and the non-maximum suppression (NMS) algorithm is adopted to remove the duplicate detections and obtain the final detection results. In the polar encoding process, for each ship target, we sequentially sample the distances between the center point of the ship target and the boundary of the OBB every $\pi/N$ in the range of $[0, \pi)$. The sampled $N$ values are combined to form an encoding vector. Due to the central symmetry of the OBB, the encoding vector can represent the shape of the whole OBB. In the polar decoding process, the center points of the ship targets are first extracted from $P$. Then the downsampling quantization errors are compensated according to the predictions from $O$. The OBBs of the ship targets are finally restored by extracting the polar vectors from $E$, converting the polar vectors into boundary point sets, and finding the minimum bounding boxes of the point sets.
In this section, we first introduce the boundary discontinuity problem in OBB detection; then we describe the network architecture and the specific process of polar encoding and decoding in detail; finally, we introduce the loss functions for network training.

A. The Boundary Discontinuity Problem
For OBB-based methods, problems occur in the boundary cases, where the predictions of the OBB parameters change discontinuously. In particular, the boundary discontinuity problem can be attributed to two causes: the periodicity of angle (POA) and the exchangeability of edge (EOE). Due to the POA, the angle parameter suffers discontinuity. For instance, the lower and upper bounds of the angle parameter denote basically the same orientation, but their values differ greatly. The EOE refers to the problem that the order of the lengths or key points of the OBB suddenly changes in the boundary cases, leading to discontinuity. The discontinuity caused by the POA and EOE leads to a high loss value even if the predicted and ground truth OBBs share high overlap, which is prone to cause convergence problems.
For angle prediction-based methods, there are mainly two kinds of representations for the OBB: the 90°-based representation and the 180°-based representation. In both cases, the OBB is determined by the center point, length, width and rotation angle $(\mathbf{c}, w, h, \theta)$. For the 90°-based representation, the rotation angle is defined as the angle from the x-axis counterclockwise to the first coincident edge of the OBB, the range of which is $[0^\circ, 90^\circ)$. The length of the first edge that coincides is denoted as $w$. A typical boundary case is shown in Fig. 3(a). The predicted edges and angles are mismatched with the ground truth due to the POA and EOE. For the 180°-based representation, the rotation angle $\theta$ is determined according to the angle from the x-axis to the long side of the OBB. The range of $\theta$ is $(-90^\circ, 90^\circ]$. As shown in Fig. 3(b), in the boundary case, the edges of the predicted OBB and the ground truth correctly correspond to each other, but the angle suffers discontinuity because of the POA. In addition, the performance of the angle prediction-based methods is sensitive to angle prediction errors [49], [56]. As shown in Fig. 4, for OBBs with large aspect ratios, small angle prediction errors cause a rapid drop in IOU.
For key point regression-based methods, similar problems exist. Yi et al. [47] expressed the target OBB as a group of midpoints of the four edges of the OBB. The four midpoints are distributed in the four quadrants of the Cartesian coordinate system, respectively. In this way, the one-to-one correspondence between the prediction and the ground truth is established. The boundary case of this method is shown in Fig. 3(c). Even when the prediction and the ground truth share high overlap, the distance errors between the actual point set $(p_1, p_2, p_3, p_4)$ and the predicted point set $(\hat{p}_1, \hat{p}_2, \hat{p}_3, \hat{p}_4)$ are large due to the EOE problem. Xu et al. [48] determined the shape and orientation of the OBB by regressing four distances between the four vertices of the OBB and those of the HBB. As shown in Fig. 3(d), in the boundary case, the errors between the four predicted distances and their ground truth values are too large to indicate the actual degree of overlap. In this paper, in order to avoid the aforementioned boundary discontinuity problems, we propose an encoding and decoding scheme for OBB-based ship detection in SAR images.
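The POA effect described above can be illustrated numerically. The following is a minimal sketch, assuming the 180°-based representation with angles in degrees; the function names are hypothetical and chosen only for this illustration. It compares the raw difference between two angle parameters, which is what a plain regression loss penalizes, with the actual difference in orientation.

```python
def angle_param_gap(theta_a, theta_b):
    """Raw difference between two angle parameters, in degrees.

    This is the quantity a plain regression loss on the angle would see.
    """
    return abs(theta_a - theta_b)


def orientation_gap(theta_a, theta_b, period=180.0):
    """Smallest difference between two orientations that repeat every `period` degrees."""
    d = abs(theta_a - theta_b) % period
    return min(d, period - d)


# A ground truth of 89 degrees and a prediction of -89 degrees both lie in
# (-90, 90] and describe nearly the same box orientation.
gt, pred = 89.0, -89.0
print(angle_param_gap(gt, pred))   # 178.0
print(orientation_gap(gt, pred))   # 2.0
```

The two boxes differ by only 2° in orientation, yet the parameter difference is 178°, which is the kind of mismatch between loss value and overlap that the boundary cases in Fig. 3 describe.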

B. The DCNN architecture of our method
The overall DCNN structure of our method is illustrated in Fig. 2. It can be divided into three parts: the feature extraction backbone, the feature refinement network, and three branch networks for OBB detection. The detailed structure is described as follows:
1) Feature Extraction Backbone: We adopt ResNet-101 [57] as the feature extraction backbone. It consists of five convolutional stages. As the stages go deeper, the resolution of the features gradually decreases, while the receptive field and the semantic information increase. Given the three-channel input SAR image $I$, the feature extraction backbone generates five scales of features $\{C_1, C_2, C_3, C_4, C_5\}$.
2) Feature Refinement Network: The deep features contain richer semantic information and larger receptive fields, which are suitable for detecting large ship targets. The shallow features are of high resolution, which is helpful for detecting small targets. Therefore, the features of different scales $\{C_2, C_3, C_4, C_5\}$ are fused in the upsampling process by the feature fusion module. As shown in Fig. 2, for two input features of different scales, the feature fusion module first upsamples the lower-resolution feature and performs a 3×3 convolution. Then the upsampled feature is concatenated channel-wise with the high-resolution feature. The output feature is obtained by employing a 1×1 convolution for channel dimension reduction. The process of the feature fusion module can be represented as follows:
$$F_l \otimes F_h = \mathrm{Conv}_{1\times 1}\big(\big[\mathrm{Conv}_{3\times 3}(\mathrm{Up}_{2\times}(F_l)),\ F_h\big]\big)$$
where $F_l$ denotes the low-resolution input feature, $F_h$ is the high-resolution input feature, $\otimes$ denotes the feature fusion operation by the feature fusion module, $\mathrm{Up}_{2\times}$ stands for the upsampling operation and $[\cdot, \cdot]$ represents the channel concatenation operation. $\{C_2, C_3, C_4, C_5\}$ are fused through the feature fusion module successively as follows:
$$F = ((C_5 \otimes C_4) \otimes C_3) \otimes C_2$$
where $F$ is the high-resolution feature output by the feature refinement network, whose size is 1/4 of the input SAR image.
3) OBB Detection Branches: The feature output by the feature refinement network is then fed into three network branches, namely the center prediction branch, the offset regression branch and the encoding regression branch. The center prediction branch and the offset regression branch are both composed of a 3×3 convolution and a 1×1 convolution, and output the center heatmap $P \in \mathbb{R}^{H \times W \times 1}$ and the offset map $O \in \mathbb{R}^{H \times W \times 2}$, respectively. The encoding regression branch consists of two cascaded 7×7 convolutions and outputs the encoding map $E \in \mathbb{R}^{H \times W \times N}$, where $N$ is the number of encoding points. The detection results are obtained from these three outputs through the polar decoding process, which is described in detail below.

C. Polar Encoding
To avoid the boundary discontinuity problem, we propose to encode the OBB of a ship target into a group of sequential values by using polar coordinates. The encoding diagram is shown in Fig. 5. The boundary points of the OBB are sampled at fixed angles. The distances between the boundary points and the ship center are collected as the OBB parameters, which can be predicted by the DCNN. The steps of the polar encoding are listed in Algorithm 1. Commonly, the OBB of the ship target is annotated by its four corners $P = \{\mathbf{p}_i \mid \mathbf{p}_i = (x_i, y_i),\ i = 1, 2, 3, 4\}$. The center point $\mathbf{c}$ can be calculated by $\mathbf{c} = (x_c, y_c) = \sum_{i=1}^{4} \mathbf{p}_i / 4$. Then the corner vectors pointing from the center to the corners can be obtained by $V = \{\mathbf{v}_i \mid \mathbf{v}_i = (x_i - x_c, y_i - y_c),\ i = 1, 2, 3, 4\}$. For each vector $\mathbf{v}_i$ in $V$, the vector angle $\theta_i$ is calculated with the help of the Heaviside function $u(\cdot)$.
Because of the central symmetry of the OBB, the encoded parameters $\mathbf{e} = (d_1, d_2, \cdots, d_N)$ actually represent $2N$ boundary points distributed around the OBB. In addition, due to the central symmetry, the process of obtaining the OBB parameters is equivalent to sampling from a periodic function with period $\pi$ in the interval $[0, \pi)$, and the rotation of the OBB is equivalent to the translation of the periodic function. The periodicity ensures natural continuity in the boundary cases, which is helpful for improving the performance of the network. A detailed discussion is given in Section IV-B of the paper.
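The encoding idea can be sketched in a few lines of Python. This is an illustrative sketch and not a reproduction of Algorithm 1: it assumes that the four corners are listed in order around a non-degenerate box, that sampling starts at angle 0 and proceeds counterclockwise in steps of $\pi/N$, and that the centre-to-boundary distance is obtained from a closed-form ray and rectangle intersection. The function name `polar_encode` is hypothetical.

```python
import math


def polar_encode(corners, n=8):
    """Encode an OBB as n centre-to-boundary distances sampled every pi/n in [0, pi).

    corners: four (x, y) tuples listed in order around the box (assumption).
    Returns the centre point and the list of n distances.
    """
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    (x1, y1), (x2, y2), (x3, y3) = corners[0], corners[1], corners[2]
    # Half side lengths and unit axes of the two box sides.
    a = math.hypot(x2 - x1, y2 - y1) / 2.0
    b = math.hypot(x3 - x2, y3 - y2) / 2.0
    e1 = ((x2 - x1) / (2.0 * a), (y2 - y1) / (2.0 * a))
    e2 = ((x3 - x2) / (2.0 * b), (y3 - y2) / (2.0 * b))
    dists = []
    for k in range(n):
        phi = k * math.pi / n
        ux, uy = math.cos(phi), math.sin(phi)
        # In the box frame, a ray from the centre leaves the box as soon as
        # it reaches either pair of parallel sides.
        along_e1 = abs(ux * e1[0] + uy * e1[1]) / a
        along_e2 = abs(ux * e2[0] + uy * e2[1]) / b
        dists.append(1.0 / max(along_e1, along_e2))
    return (cx, cy), dists


center, code = polar_encode([(0, 0), (4, 0), (4, 2), (0, 2)], n=4)
print(center)                        # (2.0, 1.0)
print([round(d, 3) for d in code])   # [2.0, 1.414, 1.0, 1.414]
```

Under these assumptions, rotating the box by one sampling step $\pi/N$ cyclically shifts the encoding by one position, which is the translation property of the periodic function mentioned above.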

D. Polar Decoding
The overall diagram of the polar decoding process is shown in Fig. 6, and the processing steps are given in Algorithm 2. The polar decoding process decodes the information from the center heatmap $P \in \mathbb{R}^{H \times W \times 1}$, the center offset map $O \in \mathbb{R}^{H \times W \times 2}$ and the encoding map $E \in \mathbb{R}^{H \times W \times N}$ into the detection results. To be specific, a 3×3 max pooling layer is first applied to the center heatmap $P$. The ship centers $C: \{(x_k, y_k) \mid k = 1, 2, \ldots, K\}$ are collected by finding the points at which $\mathrm{MaxPool}_{3\times 3}(P) = P$, where $K$ denotes the number of detected centers. For each center point $(x_k, y_k)$, the predicted downsampling quantization errors $(\Delta x_k, \Delta y_k)$ are obtained from the corresponding location of the offset map $O$. Hence, the coordinates of the $k$th refined center point can be represented as $(x_k + \Delta x_k, y_k + \Delta y_k)$.
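The centre extraction step can be sketched as follows. This is a minimal pure-Python illustration, assuming the heatmap and offset map are given as nested lists; the function name `extract_centers` and the confidence threshold `score_thresh` are hypothetical, since the text does not state a threshold value.

```python
def extract_centers(heatmap, offsets, score_thresh=0.3):
    """Find local maxima of the centre heatmap and refine them with the offsets.

    heatmap: H x W nested list of confidences in [0, 1].
    offsets: H x W nested list of (dx, dy) predicted quantization errors.
    score_thresh: hypothetical confidence threshold (assumption).
    Returns a list of (x + dx, y + dy, score) tuples.
    """
    h, w = len(heatmap), len(heatmap[0])
    centers = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < score_thresh:
                continue
            # 3x3 max pooling around (x, y), clipped at the map border.
            window = [heatmap[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            # A point is kept when pooling leaves its value unchanged.
            if v == max(window):
                dx, dy = offsets[y][x]
                centers.append((x + dx, y + dy, v))
    return centers


hm = [[0.1, 0.2, 0.1],
      [0.2, 0.9, 0.2],
      [0.1, 0.2, 0.1]]
off = [[(0.0, 0.0)] * 3 for _ in range(3)]
off[1][1] = (0.25, 0.5)
print(extract_centers(hm, off))   # [(1.25, 1.5, 0.9)]
```

The returned coordinates are in feature-map units; multiplying by the downsampling rate would map them back to the input image.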
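Next, for each detected ship center, the $N$ channels of values at the corresponding location in $E$ are extracted as the predicted OBB parameters. Let $\hat{\mathbf{e}}_k = (\hat{d}_1, \hat{d}_2, \cdots, \hat{d}_N)$ denote the predicted OBB parameters of the $k$th ship target. Since the parameters represent the distances between the boundary points and the center point of the OBB at fixed angles, the boundary point set $B_k$ of the OBB can be restored as:
$$B_k = \big\{ \big(x_k + \Delta x_k \pm \hat{d}_n \cos((n-1)\Delta\theta),\ y_k + \Delta y_k \pm \hat{d}_n \sin((n-1)\Delta\theta)\big) \mid n = 1, 2, \ldots, N \big\}$$
where $\Delta\theta = \pi/N$ denotes the sampling interval angle, and the two signs are taken together so that each distance yields a pair of centrally symmetric boundary points.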
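The next step is to calculate the minimum bounding box (MBB) of $B_k$. First, we calculate the convex hull $H_k$ of $B_k$, with vertices $\mathbf{h}_1, \ldots, \mathbf{h}_M$. For the $j$th edge of $H_k$, we obtain the unit vectors of its parallel and orthogonal directions by:
$$\vec{e}_j^{\,p} = \frac{\mathbf{h}_{j+1} - \mathbf{h}_j}{\lVert \mathbf{h}_{j+1} - \mathbf{h}_j \rVert}, \qquad \vec{e}_j^{\,o} \perp \vec{e}_j^{\,p},\ \lVert \vec{e}_j^{\,o} \rVert = 1$$
where $\vec{e}_j^{\,p}$ and $\vec{e}_j^{\,o}$ denote the unit vectors parallel and orthogonal to the edge, respectively. Then the maximum and minimum projections of the vertices of $H_k$ in the parallel and orthogonal directions are calculated by:
$$\max\nolimits_j^{p} = \max_m\, \mathbf{h}_m \cdot \vec{e}_j^{\,p}, \quad \min\nolimits_j^{p} = \min_m\, \mathbf{h}_m \cdot \vec{e}_j^{\,p}, \quad \max\nolimits_j^{o} = \max_m\, \mathbf{h}_m \cdot \vec{e}_j^{\,o}, \quad \min\nolimits_j^{o} = \min_m\, \mathbf{h}_m \cdot \vec{e}_j^{\,o}$$
where $\max_j^{p}$ and $\min_j^{p}$ stand for the maximum and minimum projections in the direction parallel to the edge, and $\max_j^{o}$ and $\min_j^{o}$ denote the maximum and minimum projections in the direction orthogonal to the edge. By calculating the difference between the maximum projection and the minimum projection in the two directions, we can estimate the side lengths of the bounding box in the $j$th direction:
$$w_j = \max\nolimits_j^{p} - \min\nolimits_j^{p}, \qquad h_j = \max\nolimits_j^{o} - \min\nolimits_j^{o} \qquad (10)$$
where $w_j$ and $h_j$ are the lengths of the two sides of the bounding box, respectively; $S_j = w_j \cdot h_j$ represents the area of the $j$th bounding box. For all the edges of $H_k$, the above calculations are carried out to find the smallest bounding box, which is taken as the estimated MBB of the $k$th ship target. The process is as follows:
$$\mathrm{MBB}_k = R_{j^*}, \qquad j^* = \arg\min_j S_j$$
where $R_j$ is the calculated bounding box in the $j$th direction.
We compute the MBBs of all ship targets as the detection results.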
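The decoding steps above can be sketched in pure Python. This is an illustrative sketch under stated assumptions, not the authors' Algorithm 2: the sampling angles are assumed to start at 0 with step $\pi/N$, the convex hull is computed with the standard monotone chain method (a choice made here, not stated in the text), and the function names are hypothetical.

```python
import math


def decode_points(center, dists):
    """Restore 2N boundary points from N centre-to-boundary distances."""
    cx, cy = center
    n = len(dists)
    pts = []
    for k, d in enumerate(dists):
        phi = k * math.pi / n  # assumed sampling angle
        dx, dy = d * math.cos(phi), d * math.sin(phi)
        pts.append((cx + dx, cy + dy))
        pts.append((cx - dx, cy - dy))  # centrally symmetric partner
    return pts


def convex_hull(points):
    """Monotone chain convex hull, vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_bounding_box(points):
    """Try the direction of every hull edge and keep the smallest-area box.

    Returns (area, side parallel to the edge, side orthogonal to it, edge angle).
    """
    hull = convex_hull(points)
    best = None
    for j in range(len(hull)):
        (x1, y1), (x2, y2) = hull[j], hull[(j + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0.0:
            continue
        ex, ey = (x2 - x1) / length, (y2 - y1) / length  # parallel unit vector
        ox, oy = -ey, ex                                  # orthogonal unit vector
        proj_p = [x * ex + y * ey for x, y in hull]
        proj_o = [x * ox + y * oy for x, y in hull]
        w = max(proj_p) - min(proj_p)
        h = max(proj_o) - min(proj_o)
        if best is None or w * h < best[0]:
            best = (w * h, w, h, math.atan2(ey, ex))
    return best


# Encoding of an axis-aligned 6 x 2 box centred at the origin with N = 4.
pts = decode_points((0.0, 0.0), [3.0, math.sqrt(2.0), 1.0, math.sqrt(2.0)])
area, w, h, _ = min_bounding_box(pts)
print(round(area, 6), round(w, 6), round(h, 6))
```

In this example the decoded points form a hexagon whose minimum bounding box recovers the 6 × 2 extent. In general, the recovered box depends on how many points are sampled, so a coarse $N$ may only approximate the original OBB.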

E. Loss Function
The training loss of our method is composed of three parts, corresponding to three OBB detection branches.
For the center prediction branch, we adopt the same training approach as [38]. Firstly, for each ship center in the SAR image, a two-dimensional Gaussian mask is generated at the corresponding position of the ground truth map. The standard deviation $\sigma$ of the Gaussian distribution is set to 1/3 of the ship width. When two Gaussian masks overlap, the larger value is taken at every overlapped position. The loss for the center prediction branch can be calculated by:
$$L_c = -\frac{1}{N_s} \sum_{i} \begin{cases} \bar{P}_i^{\,\alpha} \log(P_i), & Y_i = 1 \\ \bar{Y}_i^{\,\beta}\, P_i^{\,\alpha} \log(\bar{P}_i), & \text{otherwise} \end{cases}$$
where $P \in [0, 1]^{H \times W \times 1}$ denotes the output center heatmap; $Y \in [0, 1]^{H \times W \times 1}$ is the ground truth map; the sum runs over all positions $i$ of the map; $\bar{P}_i = 1 - P_i$ and $\bar{Y}_i = 1 - Y_i$; $\alpha$ and $\beta$ are hyperparameters that control the attention paid to difficult samples, which are set empirically to 2 and 4 respectively as in [38]; $N_s$ is the number of ship targets in the SAR image.
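As a hedged illustration of the centre loss, the sketch below implements the penalty-reduced focal loss commonly used for centre heatmaps, which is assumed here to be the form adopted from [38]. The function name, the nested-list inputs, the clamping constant `eps` and the use of the count of positions with ground truth value 1 as the normalizer are assumptions made for this example.

```python
import math


def center_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over a centre heatmap (assumed form).

    pred, gt: equally sized nested lists with values in [0, 1].
    Positions with gt == 1 are positives; all others are negatives whose
    penalty is reduced by (1 - gt) ** beta near the Gaussian peaks.
    """
    loss, num_pos = 0.0, 0
    for p_row, y_row in zip(pred, gt):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if y == 1.0:
                loss -= (1.0 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                loss -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    # Normalize by the number of centres, guarding against empty images.
    return loss / max(num_pos, 1)


print(round(center_focal_loss([[0.5]], [[1.0]]), 5))   # 0.17329
```

A negative position that sits close to a centre (ground truth value near 1) contributes little to the loss, so near-miss predictions around the true centre are penalized less than confident false alarms far from any ship.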
Because the resolution of the output is 1/4 of the input SAR image, discrete quantization errors are produced in the downsampling process. The center offset regression branch is used for predicting these errors. It is supervised with a regression loss between the predicted and the actual quantization errors, where $s$ represents the downsampling rate, which is 4 in this paper; $\tilde{\mathbf{c}}_k = (\tilde{x}_k, \tilde{y}_k) \in \mathbb{R}^2$ denotes the coordinates of the $k$th ship center after downsampling quantization; $\mathbf{c}_k / s$ is the downsampled coordinates of the $k$th ship center, so that the actual quantization error is $\mathbf{c}_k / s - \tilde{\mathbf{c}}_k$; and $\hat{\mathbf{o}}_k$ is the predicted discrete quantization error read from the offset map $O$ at the $k$th center location.
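The regression target of the offset branch can be computed as in the following sketch. It assumes that quantization is done by flooring the downsampled coordinates, which is a common convention but is not stated explicitly in the text; the function name `offset_target` is hypothetical.

```python
import math


def offset_target(center, stride=4):
    """Quantized centre on the feature map and the offset lost by quantization.

    center: (x, y) of the ship centre in input-image pixels.
    stride: downsampling rate s (4 in this paper).
    Flooring is an assumption made for this illustration.
    """
    fx, fy = center[0] / stride, center[1] / stride
    ix, iy = int(math.floor(fx)), int(math.floor(fy))
    return (ix, iy), (fx - ix, fy - iy)


print(offset_target((101, 58)))   # ((25, 14), (0.25, 0.5))
```

Adding the offset back to the quantized location and multiplying by the stride recovers the original centre, which is what the decoding step relies on.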
For the encoding regression branch, we use the smooth-L1 loss for supervised training. However, the encoded boundary points contribute to the detection IOU differently. In order to further guide the training of the encoded parameters directly by the IOU metric, we propose to use the IOU metric to weight the smooth-L1 loss. In the final loss for the encoding regression branch, $\hat{E}_k$ and $E_k$ stand for the predicted and the actual encoded parameters of the $k$th ship target; $\mathrm{IOU}$ represents the IOU calculation operation for the predicted and the actual OBB of the $k$th ship target; $L_s(\cdot)$ is the smooth-L1 regression loss; $\lambda$ is the weight parameter, which is set to 1 in our experiment. By dividing by the magnitude of the smooth-L1 loss, the IOU metric provides the gradient magnitude, while the smooth-L1 loss determines the direction of the gradient.
Finally, the three parts of the loss are combined to form the overall loss function.

III. Experimental Results
In this section, we report the experiments carried out on the RSSDD dataset in detail to evaluate the effectiveness of our proposed method. Firstly, the dataset and the experimental settings used in this paper are described. Then, the evaluation metrics are introduced. Next, qualitative and quantitative comparisons between the proposed method, other OBB encoding schemes and other detection methods are given to verify the effectiveness of the proposed method.
RSSDD is a publicly available OBB-based SAR ship detection dataset [58], composed of SAR images of multiple resolutions, polarizations and scenes. The specific information of RSSDD is given in Table I. RSSDD contains 1160 SAR images and 2456 ship targets, all of which are labeled with the four corners of the OBB. The sizes of the SAR images are [...]. In the experiments, the dataset is divided into the training set and the test set with a ratio of 8:2. To be specific, the training set contains 928 SAR images and the test set contains 232 SAR images. The dataset contains a variety of scenes. Fig. 7(a)-(c) give examples of images with inshore scenes, and Fig. 7(d)-(f) show several offshore scene images. As can be seen, Fig. 7(a) and (d) contain large ship targets, while the other images exhibit small-scale ship targets. Compared with the offshore scenes, the inshore scenes contain more land clutter, which makes the detection more difficult. Besides, there are fewer inshore scene images. In order to better evaluate the performance of the detector in different scenes, we further split the test set into two kinds of scenes: the inshore scenes and the offshore scenes, which contain 39 and 193 SAR images, respectively.
The input SAR images are resized to 608 × 608 in both the training and the inference stages, and the output feature $F$ has a resolution of 152 × 152. In the training stage, we use ImageNet pre-trained weights to initialize the parameters of the feature extraction backbone. The hyperparameter $N$ in the polar encoding process is set to 8. The adaptive moment estimation (Adam) optimizer [59] is adopted for training, with a weight decay of 0.0005. The initial learning rate is set to $1.25 \times 10^{-4}$ and is then adjusted according to the exponential decay rule. The mini-batch size is 8. The model is trained for a total of 150 epochs. The algorithm is implemented with the deep learning framework PyTorch [60]. The comparison experiments are conducted based on the framework proposed in [61]. All the experiments are carried out on a platform with the Ubuntu 18.04 system, 32 GB of memory and a Tesla P100 GPU.

A. Evaluation Metrics
Three widely adopted metrics, the precision-recall (PR) curve, the average precision (AP) and F1, are used to evaluate the performance of the models. For the PR curve, the recall rate ($R$) is taken as the x-axis and the precision rate ($P$) is taken as the y-axis, which can be calculated as follows:
$$P = \frac{N_{TP}}{N_{D}}, \qquad R = \frac{N_{TP}}{N_{GT}}$$
where $N_{TP}$ is the number of correctly detected targets, $N_{D}$ denotes the total number of detected targets, and $N_{GT}$ represents the actual number of targets. The AP metric quantitatively evaluates the comprehensive detection performance of the detector by calculating the area under the PR curve as follows:
$$AP = \int_0^1 P(R)\, dR$$
AP measures the overall detection performance of the detector under different thresholds, while the F1 metric indicates the comprehensive performance of the detector under a single-point threshold. As F1 varies with the threshold, we take the maximum F1 over all thresholds for comparison. The F1 metric is defined as:
$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$
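The metrics can be computed as in the sketch below. It assumes that each detection has already been marked as a true or false positive by some matching rule, which is outside the scope of this example, and it approximates the area under the PR curve with a plain sum over recall increments; evaluation toolkits often interpolate the precision instead, so the values may differ slightly. The function names are hypothetical.

```python
def precision_recall_f1(n_tp, n_det, n_gt):
    """Precision, recall and F1 from counts at a single confidence threshold."""
    p = n_tp / n_det if n_det else 0.0
    r = n_tp / n_gt if n_gt else 0.0
    f1 = 2.0 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def average_precision(scored_hits, n_gt):
    """Area under the PR curve via a non-interpolated sum (assumption).

    scored_hits: list of (confidence, is_true_positive) for every detection.
    n_gt: actual number of targets.
    """
    ranked = sorted(scored_hits, key=lambda t: -t[0])
    tp, ap, prev_recall = 0, 0.0, 0.0
    for k, (_, hit) in enumerate(ranked, start=1):
        if hit:
            tp += 1
            recall = tp / n_gt
            ap += (tp / k) * (recall - prev_recall)  # precision x recall step
            prev_recall = recall
    return ap


p, r, f1 = precision_recall_f1(n_tp=8, n_det=10, n_gt=16)
print(p, r, round(f1, 4))   # 0.8 0.5 0.6154
print(round(average_precision([(0.9, True), (0.8, False), (0.7, True)], n_gt=2), 4))   # 0.8333
```

The maximum F1 reported in the comparisons would be obtained by evaluating `precision_recall_f1` at every confidence threshold and keeping the largest value.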

B. Comparison with different OBB encoding schemes
In order to evaluate the effectiveness of the polar encoding scheme, we implement three different OBB encoding schemes based on the same center-point-based detection framework [38]: the angle-based, the point-based and the proposed polar-based encoding scheme. These encoding schemes all adopt the center prediction branch and the offset regression branch to locate the center point of the target, but they represent the OBB of the ship targets in different ways. Among them, the angle-based scheme is the 90°-based representation introduced in Section II-A. It represents the position and shape of the OBB by the center point, width, height and rotation angle of the OBB. A detailed introduction to the point-based encoding scheme can be found in [47]. This method represents the OBB by the center point and four vectors pointing from the center point to the midpoints of the four edges of the OBB. In order to reduce the boundary discontinuity problem, the method also distinguishes HBB from OBB by training a classification branch. Different regression rules are applied in the HBB and OBB prediction process. In order to quantitatively measure the detection performance of the three encoding schemes, their detection metrics are listed in Table II. It can be seen that the F1 and AP metrics of our method are higher than those of the other two methods in both scenes. In addition, the PR curves of the three encoding schemes are shown in Fig. 8. We can see that the PR curve of our method encloses those of the other two methods, indicating that our method has better detection performance. Because of the boundary discontinuity problem and the IOU sensitivity problem in the angle regression, the detection performance of the angle-based method is worse than that of the other two methods in both scenes. To overcome the boundary discontinuity problem, the point-based method requires an extra classification branch to distinguish HBB from OBB. This leads to extra training objectives and a decline in detection performance. For our method, the boundary discontinuity problem is addressed by the specially designed polar encoding and decoding process. The training objectives are more direct and more concentrated, and thus the overall detection performance is improved.
Fig. 9 gives the detection results of different encoding schemes in the boundary cases. Fig. 9(a) gives the ground truth, and Fig. 9(b)-(d) show the detection results of our method, the point-based method and the angle-based method. It can be seen from Fig. 9(b) that the proposed method can accurately locate the ship targets in the horizontal direction, which benefits from the boundary continuity of our polar encoding scheme. As can be seen from Fig. 9(c), the point-based encoding scheme fails to detect the ship target in the first SAR image, while in the second image the orientation of the ship target is mispredicted. In the third image, the ship target and the land clutter are both located by HBBs. This is because the point-based encoding scheme has to introduce an extra network branch and loss to distinguish HBB from OBB, which increases the difficulty of network training. The results of the angle-based method shown in Fig. 9(d) indicate missed detections in the inshore scenes, and the result in the offshore scene is inaccurate. This is due to the boundary discontinuity and the IOU sensitivity in the angle regression.

C. Comparison with other OBB-based ship detection methods
In this section, we compare our method with several state-of-the-art OBB-based ship detectors, including Box Boundary Aware Vectors (BBAVectors) [47], Region of Interest Transformer (ROITransformer) [44], OBB-based Faster-RCNN (FRCNN-OBB) [43] and OBB-based RetinaNet (RetinaNet-OBB). BBAVectors is an anchor-free detection method that combines CenterNet [38] with the point-based encoding scheme. ROITransformer is an anchor-based method that learns to transform horizontal ROIs into rotated ROIs, so as to improve the feature extraction ability for arbitrary-oriented targets. FRCNN-OBB is a two-stage detection method based on Faster-RCNN. It first extracts features of the candidate target regions, which are then used to detect targets by predicting the errors between the anchors and the OBBs. RetinaNet-OBB is a one-stage method based on RetinaNet, which directly regresses the errors between the anchors and the OBBs without extracting candidate regions.
Table III shows the quantitative comparison between our proposed method and the other methods on the RSSDD dataset. Without the IOU-weighted regression loss, the F1 and AP metrics of our method are higher than those of the other methods in the inshore scenes, but slightly lower than those of ROITransformer in the offshore scenes. With the IOU-weighted regression loss, the F1 and AP of our method are better than those of the other methods in both inshore and offshore scenes. Among the other methods, BBAVectors performs relatively well in the inshore scenes, with an F1 metric of 0.7909. However, there is still a gap of more than 5% from our method, and the AP of BBAVectors is 2.5% lower than that of our method. ROITransformer performs slightly worse than BBAVectors in the inshore scenes, and FRCNN-OBB and RetinaNet-OBB perform poorly compared with the other methods in the inshore scenes. The above results suggest that our method can effectively avoid the boundary discontinuity problem and improve the detection performance.
To discuss the detection efficiency of the different methods, the average detection time per image on the test set is also given in Table III. The detection times of the different methods are relatively close. Our method is slightly slower than RetinaNet-OBB and faster than the other four methods. This is because (1) our method adopts an anchor-free framework, which avoids the complex calculation of IOU between anchors and the target bounding boxes; (2) our method does not need extra network branches to deal with the boundary discontinuity problem; (3) although the minimum bounding box is calculated on the CPU, which cannot take advantage of the parallel computation capacity of the GPU, the low-confidence targets can be filtered out before the minimum bounding box is calculated, so this step does not become the computational bottleneck.
Fig. 11 shows the comparison of the PR curves of the different methods in different scenes. It can be observed from Fig. 11(a) that the PR curve of our method lies outside those of the other methods regardless of whether the IOU-weighted loss function is used, which demonstrates the effectiveness of our method for detecting inshore ships. The PR curve improves after the IOU-weighted regression loss is used, which indicates that the IOU-weighted loss can further guide training and improve the detection performance. For the offshore scenes, the PR curves of FRCNN-OBB and RetinaNet-OBB in Fig. 11(b) lie clearly below the others, showing their poor detection performance. Among the remaining methods, the PR curve of our method lies outside the others, demonstrating the effectiveness of our method.
To visually compare the proposed method with the other methods, detection results of the different methods in different scenes are given in Fig. 10. Fig. 10(a) shows the ground truth, and Fig. 10(b)-(f) give the detection results of our method, BBAVectors, ROITransformer, FRCNN-OBB and RetinaNet-OBB, respectively. From Fig. 10(b), we can see that the proposed method detects the ships most accurately in the inshore scenes, with fewer false alarms and missed detections than the other methods. Specifically, there is a false alarm in the fourth row and a missed detection in the fifth row. In the last SAR image, there are relatively fewer false alarms and missed detections, showing more accurate detections. It can be seen from Fig. 10(c) that BBAVectors has more false alarms than our method from the second row to the fourth row. In the fifth image, there is an inaccurate ship prediction, and in the offshore scene there are more missed detections. In Fig. 10(d), for ROITransformer, there are false alarms in the first and second images, two missed detections in the fifth image, and some missed detections in the offshore scene. The detection results of FRCNN-OBB in Fig. 10(e) show many false alarms in all inshore scenes. In the offshore scene, the land clutter is mistakenly detected as a ship target, and some missed detections occur. For the detection results of RetinaNet-OBB in Fig. 10(f), many false alarms and inaccurate detections occur in the inshore scenes. In the offshore scene, there are also many false alarms, showing unsatisfactory detection performance. To summarize, our method achieves more accurate detection results than the other methods in both inshore and offshore scenes, which verifies the effectiveness of our method.
Fig. 12 shows several detection results of our proposed method in the inshore and offshore scenes, where the red points denote the ship centers, the yellow points are the decoded boundary points, and the green boxes represent the detection results obtained by finding the minimum bounding boxes of the boundary points. It can be observed from Fig. 12 that: (1) the proposed polar encoding method is capable of accurately locating ship targets of different scales in different scenes, indicating the effectiveness of our method; (2) for ship targets with large aspect ratios, it is important to accurately regress the several key points that determine the length of the long side of the OBB. We propose to use the IOU-weighted regression loss to guide the training of the polar encodings. In this way, the contribution of these key points to the loss function is increased and the detection accuracy is improved; (3) the number of boundary points N, which is a hyper parameter, is set to 8 in our experiments. As shown, eight boundary points can represent the OBBs of the ship targets in the dataset well. However, if our method is applied in the future to targets with larger aspect ratios from other datasets, a larger N will be needed for accurate OBB representation.
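The decoding step described above (boundary points from the center and the ray lengths, followed by a minimum bounding box) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the rays are assumed to be uniformly spaced in angle, and the names `decode_boundary_points` and `min_area_rect` are hypothetical helpers rather than the routines used in the experiments. The brute-force rectangle search is chosen only for readability.

```python
import math
from itertools import combinations

def decode_boundary_points(cx, cy, lengths):
    """Turn N ray lengths (uniform angles assumed) into boundary points."""
    n = len(lengths)
    return [(cx + r * math.cos(2.0 * math.pi * k / n),
             cy + r * math.sin(2.0 * math.pi * k / n))
            for k, r in enumerate(lengths)]

def min_area_rect(points):
    """Brute-force minimum-area bounding rectangle.

    One side of the optimal rectangle is parallel to an edge of the convex
    hull, and every hull edge joins two input points, so trying the
    direction of every point pair is sufficient for small N.
    Returns (area, side_u, side_v, angle_of_u_axis).
    """
    best = None
    for (x0, y0), (x1, y1) in combinations(points, 2):
        norm = math.hypot(x1 - x0, y1 - y0)
        if norm < 1e-12:                               # skip coincident points
            continue
        ux, uy = (x1 - x0) / norm, (y1 - y0) / norm
        us = [x * ux + y * uy for x, y in points]      # coordinates along the candidate side
        vs = [-x * uy + y * ux for x, y in points]     # coordinates across it
        side_u, side_v = max(us) - min(us), max(vs) - min(vs)
        area = side_u * side_v
        if best is None or area < best[0]:
            best = (area, side_u, side_v, math.atan2(uy, ux))
    return best

# Ray lengths of a horizontal 6 x 2 box centered at (10, 5), for N = 8.
d = math.sqrt(2.0)
lengths = [3.0, d, 1.0, d, 3.0, d, 1.0, d]
points = decode_boundary_points(10.0, 5.0, lengths)
area, side_u, side_v, angle = min_area_rect(points)
```

For this horizontal example the recovered rectangle matches the 6 x 2 box. For other orientations, how closely the decoded points span the true OBB depends on N and on the sampling convention, which is why this sketch should not be read as a reproduction of the reported results. In practice, an optimized routine such as OpenCV's `cv2.minAreaRect` would be a natural replacement for the brute-force search.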

IV. Discussion
A. The influence of the hyper parameter N
The hyper parameter N in the polar encoding process determines the angle sampling rate, which is important for OBB representation. If the sampling rate is too low, the information contained in the sampling points may not adequately represent the shape of the OBB. If the sampling rate is too high, the calculation efficiency is reduced and training becomes more difficult. Therefore, we test the detection performance of our method under different N values, and the results are shown in Fig. 13. When N = 4, the detection performance drops sharply compared with the other N values. This is because the sampling rate is too low to fully represent the shape information of the OBB and guide network training. For the other N values, the detection performance in different scenes is generally robust. For the inshore scenes, the best detection performance is achieved when N = 8. For the offshore scenes, the AP metric is highest when N = 8 and the F1 is best when N = 12. According to the above results, N = 8 is chosen in our experiments for better detection performance and computational efficiency.

B. The continuity of our method in the boundary cases
To exhibit the continuity of our method in the boundary cases, consider the case shown in Fig. 14(a). Given an OBB of the ship target with height h and width w, the OBB is rotated clockwise about its center by a given angle, which yields a rotated rectangle. Taking the angle error as the independent variable and the sum of the absolute differences between the sampling points as the dependent variable, the resulting loss curve is drawn in Fig. 15. Since this sum indicates the value of the regression loss, we can observe how the loss changes with the angle error. Fig. 15(a) shows the loss curves when the aspect ratio of the OBB is 2 and the hyper parameter N is 8 and 32. The loss values are normalized into [0, 1] for clarity. We can find from Fig. 15(a) that: (1) Within the range [0, π), the loss value first increases from 0 and then decreases to 0. This is ideal because, in practice, the overlap between the OBBs is smallest when the angle error is π/2 and becomes largest again as the angle error approaches π. (2) When N is small, the curve is relatively rough, and it becomes smoother as N increases, but in general the curves show the same tendency. Fig. 15(b) shows the loss curves for an OBB with an aspect ratio of 1. The loss value reaches 0 when the angle error is 0, π/2 and π, and the period of the loss curve is π/2, which agrees with the fact that a square bounding box coincides with itself after every rotation of π/2. To sum up, our method produces correspondingly periodic loss functions for ship OBBs with different aspect ratios, and thus overcomes the boundary discontinuity problem caused by the periodicity of the angle.
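The periodic behaviour of the loss described above can be checked numerically with a short Python sketch. It assumes uniformly spaced rays in the image frame and an unnormalized L1 difference between the two encodings. The names `polar_encode` and `angle_error_loss` are hypothetical, and the normalization into [0, 1] used for Fig. 15 is omitted.

```python
import math

def polar_encode(w, h, theta, n):
    """Hypothetical polar encoding: boundary distances along n uniform rays."""
    enc = []
    for k in range(n):
        a = 2.0 * math.pi * k / n - theta          # ray angle in the box frame
        c, s = abs(math.cos(a)), abs(math.sin(a))
        hits = []
        if c > 1e-12:                              # ray can reach the sides at x = +/- w/2
            hits.append(0.5 * w / c)
        if s > 1e-12:                              # ray can reach the sides at y = +/- h/2
            hits.append(0.5 * h / s)
        enc.append(min(hits))
    return enc

def angle_error_loss(w, h, err, n=8):
    """Sum of absolute differences between the encodings of a box and of
    the same box rotated by the angle error err (radians)."""
    ref = polar_encode(w, h, 0.0, n)
    rot = polar_encode(w, h, err, n)
    return sum(abs(p - q) for p, q in zip(ref, rot))

half_pi = math.pi / 2

# Aspect ratio 2: evaluated at angle errors 0, pi/2 and pi.
rect = [angle_error_loss(4.0, 2.0, e) for e in (0.0, half_pi, math.pi)]

# Aspect ratio 1: evaluated at angle errors 0, pi/4, pi/2 and pi.
square = [angle_error_loss(2.0, 2.0, e) for e in (0.0, math.pi / 4, half_pi, math.pi)]
```

Under these assumptions, the loss for the aspect-ratio-2 box vanishes at angle errors 0 and π and is non-zero at π/2, while the loss for the square also vanishes at π/2, which matches the periods discussed for Fig. 15.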

V. Conclusion
In this paper, we propose a DCNN-based detector using polar encoding and center point detection for arbitrary-oriented ship detection in SAR images. To overcome the boundary discontinuity problem caused by the periodicity of the angle and the exchangeability of the edges, we design a specific encoding and decoding process. In the polar encoding process, the OBB of the ship target is encoded with the help of the polar coordinates of the boundary points. The encoded parameters are trained and regressed end to end, and the polar decoding process restores the detection results from the encoded parameters. To further improve the training of the encoded parameters, we propose an IOU-weighted regression loss, which uses the IOU metric to guide network training. Experiments on the RSSDD dataset demonstrate that the proposed method deals with the boundary discontinuity problem better than the other encoding schemes, and that the IOU-weighted loss further improves the detection performance. The experimental results also show that our method outperforms other state-of-the-art OBB-based detectors, verifying the effectiveness of our method.

Fig. 3. The boundary cases for different OBB-based methods. The blue rectangle denotes the ground truth and the red rectangle represents the prediction. (a) The boundary case for the 90°-based representation; (b) the boundary case for the 180°-based representation; (c) the boundary case for [47]; (d) the boundary case for [48].

Fig. 4. The IOU sensitivity of the angle regression. (a) The relationship between the angle error and the detection IOU for OBBs with different aspect ratios; (b) a local view of (a).

Fig. 5. The diagram of the polar encoding process. The polar vectors point from the center to the boundary of the OBB. The lengths of the vectors are encoded as the OBB parameters.

Fig. 6. The diagram of the polar decoding process, where the outputs of the detection branches are combined to produce the detection results. The center heatmap P and the center offset map O are used to obtain the center locations of the ship targets. The detection results are obtained through boundary point extraction and by finding the minimum bounding boxes (MBB) from the predicted encoding map E.

Fig. 8. PR curves of different OBB encoding schemes. (a) PR curves for the inshore scenes; (b) PR curves for the offshore scenes.

Fig. 9. The detection results of different encoding schemes in the boundary cases. The green rectangles denote the detected ship targets. (a) Ground truth; (b) our method; (c) the point-based method; (d) the angle-based method.

Fig. 12. The detection results of our method using polar decoding. The red points represent the detected center points of the ship targets; the yellow points denote the boundary points decoded from the polar encodings; and the green rectangles are the minimum bounding boxes of the boundary points, i.e., the final detection results.

TABLE I. The detailed information of the RSSDD dataset.

... different from one another, from the smallest 217 × 214 to the largest 526 × 646. The average size of the images is 481 × 331.

TABLE III. The detection results of different methods. The bold items denote the optimal values in the columns, and the underlined items represent the suboptimal values in the columns.

... OBB and RetinaNet-OBB. However, the gap between the F1 and AP of our method and those of BBAVectors and ROITransformer is modest. The gap is not obvious because the offshore scenes are generally simpler and contain less clutter. As a result, it is easier for the detectors to obtain comparable performance in the offshore scenes, which narrows the gaps between different methods. In summary, the results show that the proposed method achieves better detection results than the other methods because it avoids the boundary discontinuity problem and uses the IOU-weighted loss to further guide the network training.