Remote Sensing Image Object Detection Based on Angle Classification

Arbitrarily oriented object detection is a challenging task. Because objects in remote sensing images can have arbitrary orientations, detecting them with horizontal bounding boxes leads to low detection accuracy, and existing regression-based rotation detectors suffer from the boundary discontinuity problem. In this paper, we propose a remote sensing image object detection method based on angle classification, which detects objects with rotated bounding boxes that carry angle information. Specifically, we incorporate the neural architecture search-based feature pyramid network (NAS-FPN) module into a dense detector (RetinaNet) and use a binary encoding method for angle classification. Rotated boxes reduce the influence of the background, and there is almost no overlap between the detection boxes. From the angles of the detection boxes, we can infer the motion direction of a target and further determine its motion trajectory. We conducted ablation experiments on a large publicly available dataset for object detection in aerial images (DOTA) to verify the effectiveness of each module of the method, and compared the method with several other detection methods. The experimental results demonstrate the effectiveness of our method.


I. INTRODUCTION
Object detection is a fundamental task in computer vision, and many researchers have used horizontal bounding boxes to locate objects in images. Horizontal bounding boxes make the representation of candidate regions concise and intuitive. Many deep learning-based methods [1]-[5] require a large number of labeled samples to train the detector, and axis-parallel labeling frames greatly improve labeling efficiency, so that many labeled samples can be obtained quickly. In addition, horizontal bounding boxes involve fewer parameters, which simplifies the training of the detection model. Therefore, most object detection methods use a horizontal bounding box to represent the approximate extent of a target in remote sensing images, as shown in Fig. 1.
The associate editor coordinating the review of this manuscript and approving it for publication was Gulistan Raja.
However, objects in aerial images are often arbitrarily oriented, so detecting them with horizontal bounding boxes [6]-[9] gives rise to several problems. First, such a detection frame often contains large background areas: in Fig. 1(a), approximately 60% of the area belongs to the background. Too much background in the detection frame not only makes the classification task harder but also represents the extent of the target inaccurately. Second, horizontal bounding boxes overlap strongly with one another, as shown in Fig. 1(b), which reduces detection accuracy. Finally, objects such as aircraft, ships, and vehicles have a direction of motion, which cannot be obtained from horizontal bounding boxes.
These three problems can be effectively addressed by rotated detection frames with angle information, as shown in Fig. 2. First, rotated boxes locate objects precisely and contain almost no background, which greatly reduces the influence of the background on object classification. Second, rotated detection frames barely overlap, so the objects they contain can be identified more clearly. Finally, a rotated frame gives a rough estimate of the object's direction of motion, from which its motion trajectory can be inferred. In summary, rotated detection frames with angle information are better suited to object detection in remote sensing images than horizontal ones.

A. RELATED WORK
Most classical object detection methods use horizontal bounding boxes. Deep learning-based object detection methods can be broadly divided into two categories: two-stage detectors and single-stage detectors. Two-stage detectors first extract candidate regions from images and then classify the objects and refine their boxes within those regions. R-CNN [1] applied deep convolutional neural networks to object detection and was followed by Fast R-CNN [2] and Faster R-CNN [3], which are considerably faster. Single-stage detectors predict the bounding boxes and class probabilities of objects in a single evaluation of the image; representative methods include SSD [10] and YOLO [4]. Single-stage methods are generally faster than two-stage methods. However, objects in remote sensing images are small, vary greatly in scale, and have diverse orientations, so horizontal detection frames cannot localize them accurately.
Rotation detectors are widely used for aerial images and scene text. For example, [11] and [12] use rotated detection frames to detect ships in aerial images. In recent years, deep learning techniques have developed rapidly, and many researchers have applied them to object detection in remote sensing images. [13] adds an angle regression function to the detector to detect objects at arbitrary angles, and [14] improves a two-stage detection algorithm to generate rotated bounding boxes by regression, which improves detection accuracy. Scenes in remote sensing images are generally complex, with many objects at uncertain angles. To address these problems, robust algorithms have been developed, including state-of-the-art methods such as SCRDet [15] and RoI Transformer [16]. However, most of the above algorithms suffer from boundary problems caused by angle regression [17], [18]. In this paper, we avoid the boundary problem by using classification instead of regression for the angle.

B. CONTRIBUTION
In this paper, we seek a method that avoids the boundary problem while improving object detection accuracy. Specifically, we propose an object detection algorithm for remote sensing images based on angle classification. The method uses a deep residual network to extract features from remote sensing images and a neural architecture search-based feature pyramid network (NAS-FPN) [19] to fuse feature maps at different scales. The rotated detection box is represented with the long-edge definition method, and the binary coded label (BCL) technique, one of the dense coded labeling techniques, is used for the angle in the detection frame regression task. This technique transforms the angle regression problem into an angle classification problem, which avoids the boundary discontinuity problem of the long-edge definition method. The main contributions of this paper are as follows.
(1) For the bounding box regression loss of the angle classification-based method, we use the IoU-smooth L1 loss function, which takes into account the intersection over union (IoU) between the predicted and ground truth boxes. We validate the effectiveness of the network on DOTA, a large publicly available dataset for object detection in aerial images, where its detection accuracy is better than that of several current remote sensing object detection methods.
(2) We use a rotation detector based on angle classification to avoid the boundary discontinuity problem of parametric regression methods, and we encode the angle with binary coded labels, which have a shorter code length than other encoding methods and thus improve model efficiency.

II. OBJECT DETECTION METHOD BASED ON ANGLE CLASSIFICATION
A. NETWORK ARCHITECTURE
The framework of the proposed rotation detector is presented in Fig. 3. Our network is based on the RetinaNet framework. The feature maps labeled C2, C3, and C4 in the figure are extracted by the deep convolutional neural network. The overall procedure is as follows. First, the feature extraction network extracts features from the remote sensing images, and the NAS-FPN fuses the extracted features into feature maps at different scales. Then, the rotated detection frame is represented with the long-edge definition method, and the binary coded labeling technique transforms the angle regression problem in the frame regression task into an angle classification problem. The important components of the method are described in detail below.
RetinaNet, the base detector used in this paper, is an end-to-end object detection algorithm. Building on it, we replace several of its original components with newer techniques that preserve end-to-end learning, and we apply the resulting method to rotation detection. In our experiments, the method performs well on all 15 categories of the DOTA dataset and achieves the best overall performance. In addition, we verify the effectiveness of each module of the network through ablation experiments.

B. NAS-FPN MODULE
In recent years, deep learning has been widely used in a variety of automated tasks. Its success relies heavily on its powerful learning capability, the huge amount of available data, and ever-improving hardware. A critical task in deep learning is the design of the network architecture, for example choosing the number of layers. Architectures have traditionally been designed by hand, which depends on expert analysis and does not guarantee a good structure. To address this problem, researchers have turned to neural architecture search (NAS) [20], which searches for network architectures automatically.
A feature pyramid network fuses feature maps at different scales, but it focuses too much on low-level features and neglects the optimization of high-level features, which lowers the detection accuracy for large objects. Moreover, it is designed manually, and because the number of possible combinations of feature maps at different scales grows exponentially with the number of layers, the design space is huge, so a manually designed feature pyramid network is not necessarily optimal. To obtain a feature pyramid network with better performance and more flexibility, Ghiasi et al. [19] combined the idea of cross-scale connections with architecture search to find the best-performing feature pyramid structure within a fixed search space. The resulting structure is called NAS-FPN.
The most important structure in NAS-FPN is the merging cell, whose search involves a set of feature map nodes, a pool of operations, and a search termination condition. The search process, illustrated in Fig. 4, is briefly described below.
1) Randomly select a feature map from the set of feature map nodes as the first input. The initial set contains feature maps at five scales, denoted {C1, C2, C3, C4, C5}.
2) Randomly select another feature map from the set as the second input.
3) Select the resolution of the output feature map.
4) Select an operation from the operation pool and apply it to the two feature maps selected in steps 1) and 2), producing a feature map with the selected output resolution; add this feature map to the set of feature map nodes so that it can be selected later.
5) Repeat the above steps. The search terminates once five output feature maps with the same resolutions as the initial feature maps, denoted {P1, P2, P3, P4, P5}, have been generated.
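The five steps above can be mimicked by a small bookkeeping sketch that only records which nodes are combined, without computing any features. It follows the random-selection description given here; the original NAS-FPN [19] instead trains a controller to make these choices. The number of intermediate cells (`num_intermediate`), the node names `N0`, `N1`, and so on, and the function name are hypothetical choices for illustration.

```python
import random

OPS = ("sum", "global_pool")  # the two binary operations used in step 4)

def sample_merging_cells(num_intermediate=2, levels=(1, 2, 3, 4, 5), seed=0):
    """Randomly sample a NAS-FPN-style sequence of merging cells.

    Returns a list of (output_name, input_a, input_b, op, output_level).
    Level k stands for one of the five initial resolutions (C1..C5).
    """
    rng = random.Random(seed)
    pool = [f"C{k}" for k in levels]  # initial set of feature map nodes
    # Step 3): intermediate cells pick a random output resolution; the last
    # five cells are fixed to P1..P5 so that the termination condition holds.
    out_levels = [rng.choice(levels) for _ in range(num_intermediate)] + list(levels)
    cells = []
    for i, out_level in enumerate(out_levels):
        in_a, in_b = rng.sample(pool, 2)  # steps 1) and 2): two distinct inputs
        op = rng.choice(OPS)              # step 4): choose an operation
        name = f"P{out_level}" if i >= num_intermediate else f"N{i}"
        cells.append((name, in_a, in_b, op, out_level))
        pool.append(name)                 # the new node becomes selectable
    return cells

for cell in sample_merging_cells():
    print(cell)
```

Every generated node, including the outputs P1 to P5, is added back to the pool, so later cells can reuse earlier results; this is what gives the searched pyramid its cross-scale connections.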
Step 4) uses two operations: sum and global pooling. The sum operation resizes the smaller of the two input feature maps to the size of the larger one and then fuses the two maps by pixel-wise summation. The global pooling operation applies global pooling followed by a sigmoid to the smaller feature map, multiplies the larger feature map pixel-wise by the result, and adds the product to the smaller feature map. The feature pyramid network obtained by the NAS-FPN module improves the detection accuracy of the object detection method (see the ablation results in Table 6).
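A NumPy sketch of the two binary operations may make the fusion rules concrete. It assumes square (C, H, W) feature maps, nearest-neighbour resizing, global average pooling, and output at the larger map's resolution; resampling to the output resolution selected in step 3) is omitted. These are simplifying assumptions, not the reference NAS-FPN implementation.

```python
import numpy as np

def resize_nearest(x, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size)."""
    _, h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[:, rows][:, :, cols]

def split_by_size(a, b):
    """Return (larger, smaller) feature maps by spatial size."""
    return (a, b) if a.shape[1] >= b.shape[1] else (b, a)

def sum_op(a, b):
    large, small = split_by_size(a, b)
    # Rescale the smaller map to the larger map's size, then add pixel-wise.
    return large + resize_nearest(small, large.shape[1])

def global_pool_op(a, b):
    large, small = split_by_size(a, b)
    # Global average pooling + sigmoid on the smaller map -> per-channel weights.
    weights = 1.0 / (1.0 + np.exp(-small.mean(axis=(1, 2), keepdims=True)))
    # Weight the larger map, then add the rescaled smaller map.
    return large * weights + resize_nearest(small, large.shape[1])

c3 = np.ones((2, 4, 4))  # hypothetical finer (larger) map
c4 = np.ones((2, 2, 2))  # hypothetical coarser (smaller) map
print(sum_op(c3, c4).shape, global_pool_op(c3, c4)[0, 0, 0])
```

The global pooling operation acts as a lightweight channel attention: the coarse map decides how strongly each channel of the fine map contributes.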

C. ROTATION DETECTION FRAME
There are three typical rotated-box representations: two five-parameter methods with different angle ranges [21] and an eight-parameter method [22]-[24]. The details are as follows.
Five-parameter method with a 90° angular range (OpenCV definition method): its schematic diagram is shown in Fig. 5. This method uses five parameters [x, y, w, h, θ], where x and y are the center coordinates of the rotated frame and θ is the acute angle between the rotated frame and the x-axis. The counterclockwise direction is taken as the negative angle, so the angle range is [−90°, 0). The width w is the side of the rotated frame that forms the angle θ with the x-axis, and the height h is the other side.
Five-parameter method with a 180° angular range (long-side definition method): its schematic diagram is shown in Fig. 6. This method also uses five parameters [x, y, w, h, θ], where x and y are the center coordinates of the rotated frame. Unlike the OpenCV definition, it fixes the long side of the rotated frame as the width w and the short side as the height h. The counterclockwise direction is taken as the negative angle and the clockwise direction as the positive angle, and θ is the angle between the long side w and the x-axis, with range [−90°, 90°).
Eight-parameter method: its schematic diagram is presented in Fig. 7. This method uses eight parameters [a1, a2, b1, b2, c1, c2, d1, d2] describing the four corner points; the upper-left point is the starting point, and the remaining points are ordered counterclockwise.
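The relation between the two five-parameter definitions can be sketched as a pair of conversion functions. This is a minimal sketch assuming, as in the Fig. 9 example, that w denotes the long side in the long-side definition, and that angles are in degrees with the sign conventions described above; the function names are hypothetical.

```python
def opencv_to_long_side(x, y, w, h, theta):
    """OpenCV definition (theta in [-90, 0), angle of side w) -> long-side definition."""
    if not -90.0 <= theta < 0.0:
        raise ValueError("OpenCV-style angle must lie in [-90, 0)")
    if w >= h:
        return x, y, w, h, theta       # side w is already the long side
    # Swap the sides so that w is the long side; its angle is 90 degrees further.
    return x, y, h, w, theta + 90.0

def long_side_to_opencv(x, y, w, h, theta):
    """Long-side definition (theta in [-90, 90), angle of long side w) -> OpenCV definition."""
    if theta < 0.0:
        return x, y, w, h, theta
    # The short side lies at theta - 90, which is inside [-90, 0).
    return x, y, h, w, theta - 90.0

print(opencv_to_long_side(0, 0, 25, 100, -22.5))
```

The swap in the second branch of `opencv_to_long_side` is exactly the edge exchange that the long-side definition eliminates.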
The representation of a rotated frame is not limited to these three methods, but other representations can be obtained by transforming one of them.
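As an example of such a transformation, the sketch below converts a five-parameter box into eight corner coordinates. It assumes that side w lies along (cos θ, sin θ) and side h along (−sin θ, cos θ); in image coordinates, where y points down, the visual rotation direction flips, so the sign convention is an assumption. The function name is hypothetical, and re-sorting the corners to start from the upper-left point is omitted.

```python
import math

def five_to_eight(x, y, w, h, theta_deg):
    """Convert a five-parameter box to [x1, y1, x2, y2, x3, y3, x4, y4].

    Assumed convention: side w lies along (cos t, sin t) and side h along
    (-sin t, cos t). Corners are returned in cyclic order.
    """
    t = math.radians(theta_deg)
    ux, uy = math.cos(t), math.sin(t)    # unit vector along side w
    vx, vy = -math.sin(t), math.cos(t)   # unit vector along side h
    a, b = w / 2.0, h / 2.0
    coords = []
    for sa, sb in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        coords += [x + sa * a * ux + sb * b * vx, y + sa * a * uy + sb * b * vy]
    return coords

print([round(c, 3) for c in five_to_eight(0, 0, 4, 2, 0)])
```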

D. ANGLE CODING METHOD
Parametric regression is currently a popular approach to rotation object detection. However, regression-based rotation detection has some fundamental drawbacks. These methods often suffer from the boundary discontinuity problem, which makes the regression form of the model inconsistent at the boundary. The problem is mainly caused by the periodicity of the angle and the exchangeability of edges, which are explained in detail below for each of the three rotated-frame representations above.
VOLUME 9, 2021
(1) Five-parameter definition method with the 90° range: the boundary discontinuity problem of this representation is sketched in Fig. 8. In this example, the total loss, i.e., the smooth L1 loss of the difference between the predicted offset and the target offset, is much larger than 0.
From the above analysis, the loss of this representation is clearly not continuous: it increases suddenly at the boundary. There are two main reasons. The first is angular periodicity: although rotation is physically a continuous process, crossing the end of the angle range makes the regression target jump, which strongly affects the loss. The second is the exchangeability of edges: in the five-parameter definition with the 90° range, the width w and height h may swap, so the widths and heights of the proposal box, the ground truth box, and the prediction box no longer correspond, which further increases the loss. To reduce the loss, the network must adopt a more complex regression, for example rotating the blue proposal box 67.5° clockwise and also rescaling its width w and height h. However, this greatly increases the difficulty of regression.
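Edge exchangeability can be illustrated numerically: two boxes near opposite ends of the [−90°, 0) range, with w and h swapped, are almost the same rectangle even though their parameter vectors are far apart. The boxes below are hypothetical values chosen for illustration, and the corner convention (side w along (cos θ, sin θ)) is an assumption.

```python
import math

def corners(x, y, w, h, theta_deg):
    """Four corners of a five-parameter box (assumed: side w along (cos t, sin t))."""
    t = math.radians(theta_deg)
    u = (math.cos(t), math.sin(t))
    v = (-math.sin(t), math.cos(t))
    return [(x + sa * w / 2 * u[0] + sb * h / 2 * v[0],
             y + sa * w / 2 * u[1] + sb * h / 2 * v[1])
            for sa, sb in ((-1, -1), (1, -1), (1, 1), (-1, 1))]

def max_corner_gap(p, q):
    """Largest distance from a corner of q to its nearest corner of p."""
    return max(min(math.dist(a, b) for a in p) for b in q)

box_a = (0, 0, 100, 25, -89.9)  # near the -90 degree end of the range
box_b = (0, 0, 25, 100, -0.1)   # near the 0 degree end, with w and h swapped

param_gap = sum(abs(s - t) for s, t in zip(box_a, box_b))
corner_gap = max_corner_gap(corners(*box_a), corners(*box_b))
print(f"parameter L1 gap = {param_gap:.1f}, max corner gap = {corner_gap:.3f}")
```

A regression loss on the raw parameters sees a gap of about 240 units here, although the two rectangles differ by only a 0.2° rotation.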
(2) Five-parameter definition method with the 180° range: the boundary discontinuity problem of this representation is illustrated in Fig. 9. The green box is the ground truth box, represented with the 180° five-parameter definition as [0, 0, 100, 25, 67.5°], i.e., width w = 100, height h = 25, and angle 67.5°. The blue box is the proposal box, [0, 0, 100, 25, −90°], with width 100, height 25, and angle −90°. The red box is the prediction box, [0, 0, 100, 25, −112.5°], with width 100, height 25, and angle −112.5°, measured between the rotated box and the x-axis. The figure shows that the optimal angle regression rotates the blue proposal box counterclockwise by 22.5° to obtain the red prediction box; the target offset is [0, 0, 0, 0, 157.5°], whereas the predicted offset is [0, 0, 0, 0, −22.5°]. Thus the loss of this representation also increases abruptly at the boundary, but here the only cause is angular periodicity, not the exchangeability of edges, because this method fixes the long and short sides of the rectangle as w and h. Nevertheless, the loss is still much larger than 0, even though the prediction box coincides with the ground truth box (−112.5° and 67.5° differ by exactly 180°). To reduce the loss, the network must use a more complex regression, for example rotating the blue proposal box clockwise by 157.5°, which also greatly increases the difficulty of angle regression.
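The Fig. 9 numbers can be checked directly. The sketch computes the smooth L1 loss of the angle offset naively and, for comparison, after wrapping the difference into one 180° period. Using radians with β = 1 is an assumption, and the wrapping step is shown only to expose that the two boxes coincide; it is the kind of special-case trick that the angle classification approach avoids.

```python
import math

def smooth_l1(x, beta=1.0):
    """Standard smooth L1 penalty on a scalar difference."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def wrap_180(deg):
    """Map an angle difference into [-90, 90), respecting the 180-degree period."""
    return (deg + 90.0) % 180.0 - 90.0

gt, proposal, pred = 67.5, -90.0, -112.5  # angles from the Fig. 9 example
target_offset = gt - proposal             # 157.5
pred_offset = pred - proposal             # -22.5

naive = smooth_l1(math.radians(target_offset - pred_offset))
wrapped = smooth_l1(math.radians(wrap_180(target_offset - pred_offset)))
print(f"naive loss = {naive:.4f}, period-aware loss = {wrapped:.4f}")
```

The naive loss is π − 0.5 ≈ 2.64 for a prediction that is geometrically perfect, while the period-aware difference is exactly zero.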
(3) Eight-parameter quadrilateral definition method: the boundary discontinuity problem of this representation is shown in Fig. 10, where the blue box is the proposal box. If the red box is the ground truth, then after the distance is defined and the corner points are sorted, the actual regression agrees with the ideal one: [(a1→a2), (b1→b2), (c1→c2), (d1→d2)]. When the green box is the ground truth, however, the actual regression after sorting differs from the ideal one: the ideal regression is [(a1→b3), (b1→c3), (c1→d3), (d1→a3)], but the actual regression is [(a1→a3), (b1→b3), (c1→c3), (d1→d3)]. This problem is also caused by angular periodicity.
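The corner-ordering mismatch can be reproduced with a hypothetical rectangle whose ground-truth corner list starts one corner later than the prediction's. A fixed pairing gives a large L1 cost even though the boxes are identical; taking the minimum over cyclic shifts removes it. The cyclic-shift matching is shown only as an illustration of the kind of workaround discussed below, not as part of the proposed method.

```python
import numpy as np

def corner_cost(pred, gt):
    """L1 distance between corners paired in their listed order."""
    return float(np.abs(np.asarray(pred, float) - np.asarray(gt, float)).sum())

def best_cyclic_cost(pred, gt):
    """Smallest L1 cost over the four cyclic re-orderings of the ground-truth corners."""
    gt = np.asarray(gt, float)
    return min(corner_cost(pred, np.roll(gt, -k, axis=0)) for k in range(4))

pred = [(-2, -1), (2, -1), (2, 1), (-2, 1)]
gt = [(2, -1), (2, 1), (-2, 1), (-2, -1)]  # same rectangle, different starting corner

print(corner_cost(pred, gt), best_cyclic_cost(pred, gt))
```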
Rotation object detection methods based on angle regression have achieved good performance in various advanced vision tasks and have inspired many detection methods. However, they inevitably suffer from the boundary discontinuity problem, which is usually caused by angular periodicity and edge exchangeability in the five-parameter definitions and by the corner-point ordering in the eight-parameter definition. Boundary discontinuity causes sudden increases in the loss at the boundary and makes the regression form at the boundary inconsistent with that elsewhere. Many angle regression-based methods incorporate special tricks to alleviate the problem, but these tricks increase the computational cost of the model and the difficulty of boundary prediction, making such models unsuitable for high-precision rotation detection and reducing the detection accuracy for objects with large aspect ratios. Because the boundary discontinuity of angle regression arises from angular periodicity or corner ordering, its root cause is not tied to any specific bounding box representation. Therefore, to avoid the boundary discontinuity problem, we adopt a detection method based on angle classification.
Angle classification encodes each angle: every angle is treated as a category, so the angle prediction problem becomes a classification problem. The commonly used angle encoding methods, shown in Fig. 11, fall into two types: sparse coded labels [18] and dense coded labels [25]. Sparse coded labels include one-hot labels and circular smooth labels (CSL), while dense coded labels include binary coded labels (BCL) and Gray coded labels (GCL). Experiments in [25] have shown that binary coded labels give better object detection performance than the other angle encoding methods. Moreover, sparse encodings need much longer codes, whereas binary coded labels have a short code length, which improves model efficiency. The following therefore focuses on binary coded labels among the dense coded labels. Table 1 shows the encoding process of binary coded labels, and Table 2 shows the decoding process.
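The difference in code length can be made concrete for the 256 angle categories used in this paper. The sketch compares a one-hot code with binary and Gray codes of the same category index; CSL is omitted because it additionally needs a window function. The helper names are hypothetical, and the Gray code uses the standard k XOR (k >> 1) construction.

```python
def one_hot(k, num_classes):
    """Sparse code: one bit per angle category."""
    return "".join("1" if i == k else "0" for i in range(num_classes))

def binary_code(k, n_bits):
    """Dense code: plain binary representation of the category index."""
    return format(k, f"0{n_bits}b")

def gray_code(k, n_bits):
    """Dense code: adjacent indices differ in exactly one bit."""
    return format(k ^ (k >> 1), f"0{n_bits}b")

num_classes = 256                        # AR / omega = 180 / (180 / 256)
n_bits = (num_classes - 1).bit_length()  # 8 bits suffice for indices 0..255

k = 3
print(len(one_hot(k, num_classes)), binary_code(k, n_bits), gray_code(k, n_bits))
```

With 256 categories, the one-hot label needs 256 outputs per box, whereas the dense codes need only 8.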
Below, we illustrate the encoding and decoding of binary coded labels with a specific example, shown in Fig. 12.
In the encoding process, suppose that the angle of the ground truth box is GT = 88°. Because the rotated box is represented with the 180° five-parameter definition, the angle range is AR = 180°, and the angular interval of each category is taken as ω = 180°/256 = 0.703125°. The encoding length is then n = log2(AR/ω) = log2 256 = 8, i.e., an 8-bit binary number is used. The value −round((GT − 90°)/ω) = −round(−2.844) = 3 is converted to binary, giving the final code 00000011.
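The worked example can be reproduced with a short sketch that implements the encoding formula above and its inverse. The decoding shown here simply inverts that formula and maps the result back into [−90°, 90°); it may differ in detail from the procedure in Table 2. Reducing the index modulo 256, so that −90° maps to code 0, and the 0.5 bit threshold are assumptions.

```python
OMEGA = 180.0 / 256  # angular interval per category (0.703125 degrees)
N_BITS = 8           # log2(180 / OMEGA)

def bcl_encode(gt_deg):
    """Binary coded label of an angle in [-90, 90) using -round((GT - 90) / omega)."""
    k = (-round((gt_deg - 90.0) / OMEGA)) % (1 << N_BITS)
    return format(k, f"0{N_BITS}b")

def bcl_decode(code):
    """Invert the encoding and map the angle back into [-90, 90)."""
    theta = 90.0 - int(code, 2) * OMEGA
    return theta - 180.0 if theta >= 90.0 else theta

def bits_from_probs(probs, threshold=0.5):
    """Turn per-bit predicted probabilities into a code string."""
    return "".join("1" if p > threshold else "0" for p in probs)

code = bcl_encode(88.0)
print(code, bcl_decode(code))
```

Decoding 00000011 gives 87.890625°, so the quantization error for this example is about 0.11°; in general it is at most ω/2 ≈ 0.35°.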

E. LOSS FUNCTION
We use a multitask loss function to measure the difference between the ground truth and the prediction. It contains three components: the bounding box regression loss, the angle classification loss, and the category classification loss, as described in Eq. (1).
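$$L=\frac{\lambda_1}{N}\sum_{n=1}^{N} obj_n \sum_{j\in\{x,y,w,h\}} \frac{L_{reg}(v'_{nj},v_{nj})}{\left|L_{reg}(v'_{nj},v_{nj})\right|}\left|-\log(\mathrm{IoU})\right| + \frac{\lambda_2}{N}\sum_{n=1}^{N} obj_n\, L_{bcl}(\theta_{gt},\mathrm{logits}) + \frac{\lambda_3}{N}\sum_{n=1}^{N} L_{cls}(p_n,t_n) \tag{1}$$
where λ1 is a weight coefficient, N is the number of proposal boxes, and obj_n is a binary value: obj_n = 1 for foreground and obj_n = 0 for background (no regression is performed for background). x, y, w, and h are the center coordinates, width, and height of the proposal box; v'_nj is the predicted offset vector of x, y, w, h and v_nj is the ground truth offset vector, given by Eqs. (2) and (3). L_reg(v'_nj, v_nj) is computed with the smooth L1 function, and IoU is the intersection over union between the prediction box and the ground truth box.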
In the bounding box regression term, the IoU-smooth L1 function is used to further eliminate the discontinuity at the boundaries. In the angle classification term, λ2 is a weight coefficient, N is the number of proposal boxes, obj_n is again a binary value, and L_bcl(θ_gt, logits) is the binary coded label loss given by Eq. (4).
In the category classification term, λ3 is a weight coefficient, N is the number of proposal boxes, p_n is the predicted probability distribution over the categories, t_n is the ground truth label, and L_cls(p_n, t_n) is computed with the focal loss.
In this paper, the weights λ1, λ2, and λ3 of the three loss components are set to 4, 1, and 2, respectively. These values were determined experimentally; our focus is on the detection performance of the network after the new modules are added. There is no dedicated index for angle prediction accuracy, but angle prediction accuracy and object detection accuracy are consistent: a more accurate angle prediction yields a more accurate detection.
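A NumPy sketch of how the three terms of Eq. (1) combine with λ1 = 4, λ2 = 1, and λ3 = 2. It takes the per-box IoU, angle loss, and classification loss as given inputs (computing a rotated IoU is omitted), and all input values are hypothetical. The sketch computes only the loss value: to our understanding of the IoU-smooth L1 formulation in SCRDet [15], an autodiff framework treats the denominator |L_reg| as a constant, so the smooth L1 term supplies the gradient direction and |−log(IoU)| its magnitude.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def total_loss(reg_pred, reg_gt, iou, obj, bcl, cls, lambdas=(4.0, 1.0, 2.0)):
    """Value of Eq. (1) for N proposals.

    reg_pred, reg_gt: (N, 4) offsets for x, y, w, h; iou, obj, bcl, cls: (N,).
    """
    l1, l2, l3 = lambdas
    n = len(obj)
    reg = smooth_l1(reg_pred - reg_gt)
    # L_reg / |L_reg|: keeps only the direction (1 where the loss is nonzero).
    direction = np.divide(reg, np.abs(reg), out=np.zeros_like(reg), where=reg != 0)
    magnitude = np.abs(-np.log(iou))[:, None]  # |-log(IoU)| per proposal
    reg_term = l1 / n * np.sum(obj[:, None] * direction * magnitude)
    ang_term = l2 / n * np.sum(obj * bcl)      # background boxes are masked out
    cls_term = l3 / n * np.sum(cls)            # classification uses every proposal
    return float(reg_term + ang_term + cls_term)

loss = total_loss(np.array([[0.5, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]), np.zeros((2, 4)),
                  iou=np.array([0.5, 0.9]), obj=np.array([1.0, 0.0]),
                  bcl=np.array([0.2, 0.7]), cls=np.array([0.1, 0.3]))
print(loss)
```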
$$t_x=\frac{x-x_a}{w_a},\quad t_y=\frac{y-y_a}{h_a},\quad t_w=\log\frac{w}{w_a},\quad t_h=\log\frac{h}{h_a} \tag{2}$$
$$t'_x=\frac{x'-x_a}{w_a},\quad t'_y=\frac{y'-y_a}{h_a},\quad t'_w=\log\frac{w'}{w_a},\quad t'_h=\log\frac{h'}{h_a} \tag{3}$$
$$L_{bcl}(\theta_{gt},\mathrm{logits})=\mathrm{FL}\big(\mathrm{Encode}_{bcl}(\theta_{gt}),\,\mathrm{logits}\big) \tag{4}$$
In Eqs. (2) and (3) [15], x, y, w, and h denote the center horizontal coordinate, center vertical coordinate, width, and height of a box, respectively. x, x_a, and x' are the center horizontal coordinates of the ground truth box, the proposal box, and the predicted box; y, y_a, and y' are the corresponding center vertical coordinates; w, w_a, and w' are the corresponding widths; and h, h_a, and h' are the corresponding heights.
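The offset parameterization of Eqs. (2) and (3) can be written as an encode/decode pair; the box and anchor values below are hypothetical. Because the angle is handled by classification, only x, y, w, and h are encoded here.

```python
import math

def encode_offsets(box, anchor):
    """Offsets (t_x, t_y, t_w, t_h) of a box (x, y, w, h) relative to a proposal."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode_offsets(t, anchor):
    """Invert encode_offsets: recover (x, y, w, h) from offsets and the proposal."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

anchor = (100.0, 50.0, 20.0, 20.0)
t = encode_offsets((110.0, 60.0, 40.0, 20.0), anchor)
print(t, decode_offsets(t, anchor))
```

Normalizing the center shift by the proposal size and using log ratios for the size makes the regression targets roughly scale-invariant.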
In Eq. (4), θ_gt is the angle of the ground truth box, and logits is the list of angle prediction probabilities of the prediction box, given in Eq. 5, where p denotes each prediction probability in the list. FL is the focal loss function, and Encode_bcl is the binary encoding function shown in Table 1.
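Eq. (4) can be sketched by treating each bit of the binary coded label as an independent binary target and applying a sigmoid focal loss to the per-bit probabilities. The values α = 0.25 and γ = 2 are the common RetinaNet defaults, assumed here because the text does not state them; summing over the bits is also an assumption.

```python
import numpy as np

OMEGA = 180.0 / 256  # angular interval per category, as in the worked example
N_BITS = 8

def encode_bcl(theta_gt):
    """Binary coded label of a ground truth angle in [-90, 90), as 0/1 floats."""
    k = (-round((theta_gt - 90.0) / OMEGA)) % (1 << N_BITS)
    return np.array([float(bit) for bit in format(k, f"0{N_BITS}b")])

def sigmoid_focal_loss(targets, probs, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss summed over bits; alpha and gamma are assumed values."""
    p = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(targets == 1.0, p, 1.0 - p)
    alpha_t = np.where(targets == 1.0, alpha, 1.0 - alpha)
    return float(np.sum(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def bcl_loss(theta_gt, probs):
    """Eq. (4): focal loss between the encoded angle and per-bit probabilities."""
    return sigmoid_focal_loss(encode_bcl(theta_gt), np.asarray(probs, dtype=float))

target = encode_bcl(88.0)
good = np.where(target == 1.0, 0.9, 0.1)  # confident and correct on every bit
bad = 1.0 - good                          # confident and wrong on every bit
print(bcl_loss(88.0, good), bcl_loss(88.0, bad))
```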

III. EXPERIMENTAL PARAMETERS AND EVALUATION INDEXES
A. EXPERIMENTAL DETAILS
Our experimental environment is shown in Table 3. We use the DOTA dataset in this paper. DOTA is one of the largest aerial image detection benchmarks with quadrilateral annotations. It contains 2806 aerial images from different sensors and platforms, with image sizes ranging from approximately 800 × 800 to 4000 × 4000 pixels. Instead of horizontal or five-parameter labels, DOTA uses quadrilateral labels that mark the four vertices of each object; Fig. 13 illustrates this labeling method. Specifically, the starting point is marked first. Usually, the head of an object, such as a baseball field, an airplane, or a vehicle, is used as the starting point; for objects without an obvious head, such as a basketball court or a soccer field, the top-left point is usually used instead. The remaining three vertices are then labeled clockwise.
Training on DOTA is difficult for several reasons. First, the images have very high resolution. Second, object sizes vary greatly across categories, and most objects are small: as Table 4 shows, a car can be as small as 30 pixels, whereas a bridge can be as large as 1200 pixels, 40 times the size of a car. DOTA therefore requires a model flexible enough to handle both small and large objects. In addition, the aspect ratios of the objects vary widely, which further increases the difficulty of detection on this dataset. Table 5 lists the important experimental parameters of our method. The batch size is 1, i.e., one image per training iteration; we found experimentally that this setting gives the best training results. The total number of training epochs is 20, the momentum is 0.9, and the initial learning rate is 0.0001. The learning rate decay rate is 10 and the decay step is 5, meaning that after every 5 epochs the learning rate is divided by 10.
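The step decay schedule described above can be written as a one-line function. Counting epochs from 0 is an assumption; the parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=1e-4, decay_rate=10.0, decay_step=5):
    """Divide the learning rate by decay_rate after every decay_step epochs."""
    return base_lr / decay_rate ** (epoch // decay_step)

for epoch in range(20):
    print(epoch, learning_rate(epoch))
```

Over the 20 training epochs this yields four plateaus: 1e-4, 1e-5, 1e-6, and 1e-7.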

B. EXPERIMENTAL PARAMETERS AND EVALUATION INDEXES
The experiments use common object detection metrics to evaluate our method: the average precision of each class (AP), the mean average precision over all classes (mAP), precision, recall, and the F1-score.
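A sketch of these metrics for one class follows. It assumes that detections have already been matched to ground truth boxes (for example, by a rotated-IoU threshold), so each detection is marked as a true or false positive, and it uses all-point interpolated AP in the style of PASCAL VOC. The official DOTA evaluation server may differ in these details; mAP is then the mean of the per-class AP values.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class from scored, pre-matched detections."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_tp, dtype=float)[order]
    tp, fp = np.cumsum(hits), np.cumsum(1.0 - hits)
    recall = tp / num_gt
    precision = tp / (tp + fp)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):  # make precision non-increasing
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]  # points where recall changes
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

print(precision_recall_f1(8, 2, 2))
print(average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2))
```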

C. EXPERIMENTAL PROCEDURE AND ANALYSIS OF RESULTS
Because the DOTA images are large, they were cropped into 600 × 600 pixel patches before training, and new label information was generated for the cropped images. Cropping yields approximately 27,000 patches. In our experiments, once the training loss fell below 0.06, we considered the model to have converged. To verify the effectiveness of each module of our method, we first performed ablation experiments on the NAS-FPN module, the binary coded label (BCL) module, and the IoU-smooth L1 loss function. We then compared the method with eight existing high-performance rotated-box object detection methods to demonstrate its detection performance.
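The cropping step can be sketched as a sliding-window tiler plus a label-shifting rule. The text gives only the 600 × 600 patch size; the `overlap` parameter, shifting the last window flush with the image border, and keeping only objects whose four vertices fall inside a window are hypothetical simplifications.

```python
def tile_origins(length, tile=600, overlap=0):
    """Start offsets of windows of size `tile` covering [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:  # add a final window flush with the edge
        origins.append(length - tile)
    return origins

def crop_windows(height, width, tile=600, overlap=0):
    """(x0, y0, x1, y1) crop windows for an image of the given size."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]

def shift_quad(quad, window):
    """Re-express a DOTA-style quadrilateral in crop coordinates if fully inside."""
    x0, y0, x1, y1 = window
    if all(x0 <= x < x1 and y0 <= y < y1 for x, y in quad):
        return [(x - x0, y - y0) for x, y in quad]
    return None  # object crosses the window border or lies outside it

windows = crop_windows(800, 1800)
print(len(windows), windows[0])
```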
As Table 6 shows, when the basic method uses ResNet50 as the feature extraction network, its mAP is only 63.89%. With ResNet152, the mAP reaches 66.85%, an improvement of 2.96 percentage points, so ResNet152 is used as the feature extraction network in our method. Adding the NAS-FPN module improves the mAP by another 3.31 percentage points, adding the BCL module improves it by a further 1.99 percentage points, and finally, adding the IoU-smooth L1 loss function raises the mAP to its highest value, 75.53%. Table 6 also shows that the detection accuracy for most object categories increases as the modules are added one by one. These ablation results show that each module added to the basic method helps improve the detection accuracy for remote sensing image objects, which demonstrates the effectiveness of each module.
Table 7 shows that, compared with the basic method, our method improves precision and F1-score significantly while recall decreases slightly, indicating that our method detects objects more accurately and that its overall detection performance is better than that of the basic method. Fig. 14 shows the detection results of the basic method and our method on some DOTA images; only crops of the original images are shown so that the detections are easier to see. The basic method is prone to false detections, such as the roundabout and the port in the figure, and its angle predictions are not accurate enough, for example for the soccer field and the tennis court. By contrast, our method detects most objects with higher accuracy and, more importantly, marks their locations with rotated detection frames whose angles are more accurate.
We compared our method with eight existing high-performance methods: R2CNN (Rotational Region CNN) [26], RRPN (Rotation Region Proposal Networks) [27], RetinaNet [28], ICN [29], RoI Transformer [16], CADNet [30], MFIAR-Net [31], and DRN [32]. The results are shown in Table 8, which lists the AP for each object category in the DOTA dataset and the mAP. Because the ground truth labels of the DOTA test set are not publicly released, the AP and mAP values reported here were obtained by submitting prediction files to the official DOTA evaluation server.
As Table 8 shows, our method achieves a higher mAP than all eight compared methods and also improves the AP for most object categories. This reflects the ability of our method to indicate the location and class of objects in remote sensing images with more accurate rotated detection frames.
Fig. 15 shows detection results on test images; because these images are large, only crops containing typical scenes are shown. The figure shows that our method accurately locates objects with angled rotated frames and also gives the approximate angle of each frame.
Our method improves on RetinaNet. Because these techniques had not previously been combined, it was unclear whether their combination would produce better detection results. We therefore performed ablation experiments and compared the accuracy with that of other methods. These experimental results show that our proposed method (ResNet152+NAS-FPN+BCL+IoU-smooth L1) outperforms the compared methods.

IV. CONCLUSION
In this paper, we proposed a rotation detector based on angle classification, built on RetinaNet. First, a residual network extracts image features, and a feature pyramid network based on neural architecture search fuses the extracted feature maps into feature maps at different scales. Then, angle classification is used to avoid the problem of angular periodicity, while the five-parameter definition with a 180° range is adopted to avoid the exchangeability of edges. Finally, the IoU-smooth L1 function is added to the loss function to further eliminate the boundary discontinuity problem. The effectiveness of our method is verified by ablation and comparison experiments on the DOTA dataset. The ablation results show that each proposed module contributes to improving object detection accuracy. The comparison results further demonstrate that the proposed method achieves higher accuracy in remote sensing image object detection than the compared methods and locates objects with more accurate rotated detection frames.
In machine learning, learning tasks can be broadly divided into supervised learning and unsupervised learning. Both usually learn predictive models from training datasets containing a large number of samples. Although supervised learning has achieved great success, for many tasks it is difficult to obtain strong supervision, such as complete ground truth labels, because data labeling is costly, while unsupervised learning has progressed slowly and remains difficult. Therefore, weakly supervised learning has been attracting increasing attention. From the point of view of training, classification models and regression models differ in their loss functions. Some work on detection combining weakly supervised learning with angle regression algorithms has already been carried out [33], and we believe that detection methods based on angle classification can also be implemented with weakly supervised deep learning. We plan to pursue research in this direction in future work.