An Automatic Detection Method of Bird’s Nest on Transmission Line Tower Based on Faster_RCNN

Bird’s nests on transmission line towers adversely affect transmission equipment and can even threaten the safe and stable operation of the power grid. In recent years, the number of bird-related faults on transmission lines has increased year by year, resulting in growing economic losses. The traditional method of identifying bird’s nests on transmission lines is time-consuming and labor-intensive, and its safety level is low. Therefore, this paper proposes an automatic detection method for bird’s nests on transmission line towers based on the Faster_RCNN convolutional neural network. The method automatically locates bird’s nests on transmission line towers in images collected by an unmanned aerial vehicle (UAV). The problems of insufficient training samples and overfitting of the neural network classifier are addressed by augmenting the bird’s nest images. The experimental results show that the method can effectively detect bird’s nest targets in complex environments: the highest recall rate reaches 95.38%, the highest F1 score reaches 96.87%, and the detection time per image can reach 0.154 s. Compared with traditional nest detection methods, this method has stronger applicability and generalization ability. It provides technical support for analyzing bird activities and taking effective preventive measures.


I. INTRODUCTION
Bird damage, lightning and external force damage are the three main hazards to overhead transmission lines. According to the relevant statistics, bird damage accounts for 32% of the total, and the number of line trips caused by bird activities is second only to that caused by lightning and external force damage [1]-[3]. In recent years, with the continuous improvement of the natural environment, birds have been breeding faster and their activities have become increasingly frequent. Bird damage to overhead transmission lines increases year by year, which seriously threatens the safe and stable operation of transmission line equipment. In addition, composite insulators are widely used in electrical power systems because of their excellent pollution resistance, high mechanical strength and environmental friendliness [4]. Abnormal temperature rise (ATR) of composite insulators [5] and insulation breakdown are the main causes of electrical damage, and the activities of birds indirectly affect the insulation performance of transmission lines [6].
At present, research on automatic bird's nest detection is just beginning. The rapid development of UAV and image processing technology has brought convenience to online target detection. Basso et al. [7] proposed a complete UAV guidance system based on image processing techniques, comprising a filter for image preprocessing that removes false-positive lines and an algorithm that generates the guidance parameters from the detected crop rows. Zhou et al. [8] proposed a novel edge detection method, which selected optimal parameters for changing backgrounds, and built the first fully automated UAV that successfully tracked power lines in the real world. Baker et al. [9] used the Hough transform and line tracing techniques to implement power line detection. Xu et al. [10] proposed using the special texture, color and shape of the bird's nest region to judge whether there is a bird's nest target in the image. However, grass, branches and similar objects also have strong texture characteristics and colors similar to those of the bird's nest, so the interference is large. Duan et al. [11] extracted the HOG features of the bird's nest in the image and classified them with an SVM, but this scheme was not suitable for bird's nest detection in complex environments. Wang [12] used Gabor and PCA to extract the features of the bird's nest image and the ELM algorithm as a classifier to identify the bird's nest image, but this method was complex and its generalization ability was low. Wu et al. [13] extracted the main part and branch part of the image through binarization and morphological preprocessing operations, extracted the branch part of the bird's nest by detecting suspension points, and classified the branch information using an SVM classifier. However, this method was applied to towers on both sides of railway tracks, where the background was simple and there were no trees or grass to cause disturbance, so its robustness was low.
The associate editor coordinating the review of this manuscript and approving it for publication was Hao Luo.
In the field of target recognition, Hinton et al. [14] proposed an algorithm based on deep learning in 2006, which used a deep convolutional neural network to learn high-level features from a large amount of data. The deep convolutional neural network integrates feature extraction, feature selection and feature classification in the same model. Through end-to-end training, it optimizes the overall objective and further strengthens the classification of features. In 2012, Krizhevsky et al. [15] used a convolutional neural network in the ILSVRC competition and won first place with an error rate of 15.3%, far ahead of the second place. Girshick et al. proposed the regional convolutional neural network (RCNN) [16], [17], which used the selective search method to select several candidate regions in the image to be detected, scaled them to the same size, and used a deep convolutional neural network for high-level feature extraction. Then, multiple SVMs were used to classify the features to complete the target detection. To improve the detection accuracy and speed of RCNN, the Fast_RCNN method [18]-[20] and the Faster_RCNN method [21]-[24] were proposed successively. Shi studied bird's nest detection algorithms for transmission lines and trained AdaBoost classifiers with Haar features and LBP (local binary pattern) features; the recognition accuracy of bird's nests was 66.87% and 80.63%, respectively. In addition, the accuracy of the Fast_RCNN neural network model in detecting bird's nests was 92.46%. Zou et al. [25] used the R-FCN algorithm to identify defects in transmission lines, and the recognition accuracy of bird's nests reached 90%. However, the practicality and accuracy of these methods for bird's nest detection need further investigation.
To speed up feature extraction and overcome the poor robustness of hand-crafted features, this paper uses the Faster_RCNN model from the field of target detection [21]-[24], adds the K-means algorithm to extract initial candidate regions, uses the regional proposal network (RPN) [26]-[28] to generate candidate region proposals, and uses the detection network to classify and locate bird's nest targets. The K-means algorithm and the RPN are used together to reduce the computation and save time in generating candidate regions. Because there is no public bird's nest data set at present, this paper increases the number of samples by augmenting UAV images collected in the Jiangxi area, which mitigates the overfitting problem in model training. The recognition accuracy and recognition time of ZFNet [29], Vgg16 [30], [31] and ResNet-50 [32] are studied. Experiments show that Faster_RCNN can effectively detect bird's nest targets in complex outdoor environments. The highest recall rate reaches 95.38%, the highest F1 score reaches 96.87%, and the detection time per image can reach 0.154 s. Compared with existing work, the experiments verify that Faster_RCNN applied to bird's nest detection takes less time and produces fewer missed and false detections. Compared with traditional nest detection methods, this method has stronger applicability and generalization ability. It provides technical support for analyzing bird activities and taking effective preventive measures.

II. BIRD'S NEST DETECTION MODEL BASED ON FASTER_RCNN
To detect bird's nests on transmission line towers automatically, it is necessary to eliminate the influence of the complex background of the transmission line. The RCNN series of algorithms based on region selection are classical algorithms for target detection [16], [17]. The RCNN algorithm can be divided into four steps:
1) For each sample image, a large number of candidate regions of different sizes are generated by selective search.
2) Each candidate region is input to a CNN for feature extraction.
3) The feature vector is sent to an SVM classifier to judge whether the region belongs to the target or the background.
4) Bounding box regression (BBR) is applied to the identified candidate regions to correct the box location and size.
Although RCNN successfully combines deep learning with target detection tasks, it has many defects, such as too many candidate boxes, time-consuming training, large disk space consumption, and repeated calculation caused by overlapping candidate boxes. Therefore, scholars have proposed improved methods to address these defects. Fast_RCNN [18]-[20] adopted adaptive-scale pooling, which allows the whole network to be optimized and improves the accuracy of deep network identification. Faster_RCNN [21]-[24] used the regional proposal network (RPN) [26]-[28] instead of the time-consuming selective search method. The region proposals predicted by the RPN were fewer in number and higher in quality. In addition, most of the RPN prediction was completed on the GPU, and the RPN shared the convolutional network with Fast_RCNN, so the speed of target detection was greatly increased. Therefore, in this paper, we use the Faster_RCNN algorithm to detect bird's nests on transmission line towers. The detection process based on Faster_RCNN is shown in Figure 1. The whole bird's nest detection process is divided into five stages. The image features are extracted by Vgg16, ZFNet and ResNet-50, respectively. The network structure of Vgg16 consists of 13 convolutional layers, 13 ReLU layers and 4 pooling layers. The network structure of ZFNet consists of 5 convolutional layers, 7 ReLU layers, 3 pooling layers and 3 fully connected layers. ResNet-50 is deeper, with 50 layers. Accurate candidate regions are generated by the RPN, and the classification and regression of the bird's nest detection boxes are computed by the detection network.

A. EXTRACT THE INITIAL CANDIDATE REGIONS
The input image is divided into m × n cells. Each cell is assigned B initial candidate regions of different specifications. The initial candidate regions are extracted by convolution. The number of initial candidate regions in each image is m × n × B. In the training stage, the initial specifications and number of candidate regions need to be set. As the number of iterations increases, the parameters of the candidate regions are adjusted continuously until they are close to the real bird's nest regions. To speed up convergence, the K-means method is used to cluster and obtain candidate regions close to the bird's nests in the image. In general, K-means clustering uses the Euclidean distance to measure the distance between two points; here, the ratios of the height and width of the candidate region to the side length of a unit cell are clustered. IOU (intersection over union) is an important index of the agreement between a candidate region and the real nest region. The larger the IOU value is, the smaller the difference is, and the clustering objective function is:
$$\min \sum_{n=1}^{N} \sum_{m=1}^{M} \left[ 1 - \mathrm{IOU}\left(\mathrm{Box}[n], \mathrm{Truth}[m]\right) \right]$$
where N is the number of clustering categories, M is the number of samples in the clustering sample set, Box[n] is the width and height of the candidate region obtained by clustering, and Truth[m] is the width and height of the real bird's nest region.
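The clustering step can be illustrated with a minimal sketch. It assumes the distance d = 1 − IOU between two (width, height) pairs aligned at a common corner, which is a common choice for this kind of prior-box clustering and is not confirmed by the text as the exact procedure used here. The function names and the sample sizes are hypothetical.

```python
import random

def iou_wh(box, centroid):
    """IOU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_iou(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k groups using d = 1 - IOU as the distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assignment step: each box joins the centroid with the smallest 1 - IOU.
        groups = [[] for _ in range(k)]
        for b in boxes:
            j = min(range(k), key=lambda c: 1.0 - iou_wh(b, centroids[c]))
            groups[j].append(b)
        # Update step: mean width and height per group (keep the old centroid if empty).
        new = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids

def mean_best_iou(boxes, centroids):
    """Average IOU between each box and its best-matching centroid."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)

# Hypothetical nest sizes, expressed in units of the grid-cell side length.
boxes = [(1.0, 1.1), (1.2, 1.0), (0.9, 1.0), (3.0, 2.0), (3.2, 2.1), (2.9, 1.9)]
priors = kmeans_iou(boxes, k=2)
quality = mean_best_iou(boxes, priors)
```

A higher `quality` value means the clustered sizes start closer to the annotated nests, which is the motivation given above for clustering before training.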

B. REGIONAL PROPOSAL NETWORK
The RPN takes as input the convolutional feature map extracted from the original image by the Vgg16 model pre-trained on ImageNet, and outputs a series of rectangular candidate boxes together with a score indicating whether each box contains a target. The structure of the RPN is shown in Figure 2.
It takes an image of any size as input and outputs a set of target proposals. Each proposal corresponds to the probability and location information of a target. The RPN adopts a sliding window mechanism. A sliding window is added to the feature map of the last shared convolutional layer of the convolutional neural network. The sliding window takes an n × n region of the feature map as input and is fully connected to it. Each sliding window is mapped to a low-dimensional vector, which is input to two parallel fully connected layers. One layer outputs whether the feature in the sliding window region belongs to the image background or a target, and the other outputs the regression coordinates of the region's location. When the n × n sliding window slides over the feature map, each sliding position corresponds to k anchor boxes in the original image. Therefore, one fully connected layer outputs a 2 × k dimensional vector, corresponding to the target and background scores of the k anchor boxes, and the other outputs a 4 × k dimensional vector, giving the transformation parameters from the k anchor boxes to the real target boxes. Each anchor box is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio. As shown in Figure 2, the RPN slides a 3 × 3 spatial window over the feature map of the shared convolutional layer. To adapt to objects of different sizes, three kinds of sliding windows with different sizes are used in this paper, with length-to-width ratios of 1:1, 2:1 and 1:2, respectively. These windows are convolved to form a 256-dimensional vector. Finally, non-maximum suppression (NMS) [33], [34] is used to select the 300 highest-scoring candidate boxes as the final proposals.
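The anchor layout and the final NMS selection can be sketched as follows. This is an illustration under stated assumptions, not the configuration used in the experiments: the scales (128, 256, 512) and the IOU threshold of 0.7 are taken from the original Faster R-CNN design, the ratios are read as width divided by height, and all function names are hypothetical.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(1.0, 2.0, 0.5)):
    """Return k = len(scales) * len(ratios) anchors (x1, y1, x2, y2) centred at (cx, cy).

    Each anchor has area scale**2 and width / height equal to the ratio (assumed reading).
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w, h = s * math.sqrt(r), s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.7, top_n=300):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) == top_n:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# k = 9 anchors at one sliding-window position, so the two output layers
# would produce 2 * 9 scores and 4 * 9 box offsets for this position.
anchors = make_anchors(300, 300)

# Hypothetical proposals: the first two overlap heavily, the third is separate.
boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

With `top_n=300`, the loop stops once 300 proposals are kept, matching the number of final proposals stated above.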

C. DETECTION NETWORK AND TRAINING
In Faster_RCNN, the image is input into the convolutional neural network for feature extraction, and the RPN generates proposals, which are then mapped onto the feature map of the CNN's last convolutional layer. Through the ROI pooling layer, each ROI (region of interest) is converted into a fixed-size feature map. The detection network has two parallel output layers. The output of the classification layer is the probability distribution $p = (p_0, p_1)$ of each box over the two categories of bird's nest and background. The output of the border regression layer is the border position parameter $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, where $k$ represents the category. The border regression network and border classification network are trained with a joint loss function:
$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda\, p^{*} L_{reg}(t^u, v)$$
where $L_{cls}(p, u) = -\log(p_u)$ is the logarithmic loss for the real category $u$, and $\lambda$ is a balancing weight. $L_{reg}$ is activated only when the region to be detected is a bird's nest, that is, $p^{*} = 1$. To obtain an accurate rectangular box, two sets of parameters are defined: the real frame of the category, $v = (v_x, v_y, v_w, v_h)$, and the predicted frame of the category, $t^u = (t_x, t_y, t_w, t_h)$. The detailed process is as follows:
$$v_x = \frac{x - x_a}{w_a}, \quad v_y = \frac{y - y_a}{h_a}, \quad v_w = \log\frac{w}{w_a}, \quad v_h = \log\frac{h}{h_a}$$
where $(x, y, w, h)$ are the center coordinates, width and height of the real bird's nest target, and $(x_a, y_a, w_a, h_a)$ are the center coordinates, width and height of the candidate region. Finally, the Softmax loss and the Smooth L1 loss are used for the joint training of classification probability and bounding box regression. During testing, the original candidate window is corrected with the regression values of the bounding box to generate the coordinates of the prediction window. In the whole model framework, the multitask loss function of Faster_RCNN is calculated as follows:
$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
where $i$ is the index of an anchor in a mini-batch, $p_i$ is the predicted probability of anchor $i$ being an object, $N_{cls}$ and $N_{reg}$ are normalization terms, and $\lambda$ is a balancing weight.
The ground-truth label $p_i^*$ is 1 if the anchor is positive and 0 if the anchor is negative. $t_i$ is a vector representing the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is that of the ground-truth box associated with a positive anchor. The classification loss $L_{cls}$ is the log loss over two classes (object vs. not object). For the regression loss, we use
$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$$
where $R$ is the robust loss function (smooth L1). The term $p_i^* L_{reg}$ means that the regression loss is activated only for positive anchors ($p_i^* = 1$) and is disabled otherwise ($p_i^* = 0$). The outputs of the cls and reg layers consist of $\{p_i\}$ and $\{t_i\}$, respectively. $R(x)$ is the smooth L1 loss function, calculated as follows:
$$R(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
VOLUME 8, 2020
The model training and test flow chart of bird's nest detection on transmission line towers based on Faster_RCNN is shown in Figure 3. The overall training process is as follows:
1) Preprocess the initial nest data set, label it with LabelImg, generate the sample XML files, and convert them to the VOC2007 data set format.
2) Augment the bird's nest images and generate the corresponding XML files.
3) Use the convolutional neural network to extract features.
4) Train the model.
5) Obtain the optimal detection model by fine-tuning the parameters of the network model.
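The box parameterization and the loss described in this subsection can be sketched in a few lines. The sketch assumes the standard Faster R-CNN formulation: offsets are relative to the candidate region, the regression loss is smooth L1 applied per coordinate, and the two loss terms are combined with a weight `lam`. The normalizers `n_cls` and `n_reg` are simplified here (number of anchors and number of positive anchors), which is an assumption rather than the setting used in the experiments. All names are hypothetical.

```python
import math

def encode(box, anchor):
    """Regression targets of a box relative to an anchor; both are (x_c, y_c, w, h)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Inverse of encode: apply predicted offsets to an anchor to get a box."""
    xa, ya, wa, ha = anchor
    return (xa + t[0] * wa, ya + t[1] * ha, wa * math.exp(t[2]), ha * math.exp(t[3]))

def smooth_l1(x):
    """Quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multitask_loss(p, p_star, t, t_star, lam=1.0):
    """p: predicted object probabilities; p_star: 0/1 labels; t, t_star: 4-tuples per anchor."""
    eps = 1e-12                      # guards log(0)
    n_cls = len(p)                   # assumption: normalise by the number of anchors
    n_reg = max(1, sum(p_star))      # assumption: normalise by the number of positives
    cls = -sum(math.log(max(eps, pi if ps == 1 else 1.0 - pi)) for pi, ps in zip(p, p_star))
    # The factor ps switches the regression term off for negative anchors.
    reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, tsi))
              for ps, ti, tsi in zip(p_star, t, t_star))
    return cls / n_cls + lam * reg / n_reg

# Hypothetical candidate region and annotated nest, as (x_center, y_center, w, h).
anchor = (100.0, 100.0, 50.0, 50.0)
nest = (120.0, 80.0, 60.0, 40.0)
v = encode(nest, anchor)
restored = decode(v, anchor)
```

At test time only `decode` is needed: the predicted offsets are applied to each candidate window to obtain the corrected prediction window.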

III. EXPERIMENTAL RESULTS AND ANALYSIS
A. EXPERIMENTAL ENVIRONMENT
The software and hardware platform and parameters used in this paper are shown in Table 1, and all experiments are carried out on this platform. The operating system is Windows 10 and the algorithm framework is TensorFlow. TensorFlow, developed and maintained by Google Brain, is an open-source software library that uses data flow graphs for numerical computation. It has a multi-level structure, can be deployed on servers, PC terminals and web pages, and supports high-performance numerical computation on GPUs and TPUs. It is widely used to implement machine learning algorithms. The processor is an Intel Xeon Gold 5120T CPU @ 2.20 GHz × 16, and the graphics card is a GeForce RTX 2080 Ti.

B. DATA SETS
Since there is no public bird's nest data set, this paper uses 130 bird's nest images of transmission lines in Jiangxi Province, taken by UAV. Because an insufficient number of bird's nest samples leads to overfitting and other problems in model training, this paper uses data augmentation methods, such as rotation, mirroring, brightness change, Gaussian noise addition and scaling, to expand the initial bird's nest data into 2700 augmented images. The specific data augmentation methods are shown in Table 2.
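A few of the augmentation operations can be sketched on a toy grayscale image. The point of the sketch is that geometric operations must transform the annotated boxes together with the pixels, while photometric operations leave the boxes unchanged. The rotation angle, brightness factor and noise level are assumptions (Table 2 defines the actual settings), boxes are taken as inclusive pixel indices (xmin, ymin, xmax, ymax), and all names are hypothetical.

```python
import random

def mirror(image, boxes):
    """Horizontal flip of an image (list of rows) and of its boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    new_boxes = [(width - 1 - xmax, ymin, width - 1 - xmin, ymax)
                 for xmin, ymin, xmax, ymax in boxes]
    return flipped, new_boxes

def rotate90(image, boxes):
    """Rotate 90 degrees clockwise; pixel (x, y) moves to (height - 1 - y, x)."""
    height = len(image)
    rotated = [list(col) for col in zip(*image[::-1])]
    new_boxes = [(height - 1 - ymax, xmin, height - 1 - ymin, xmax)
                 for xmin, ymin, xmax, ymax in boxes]
    return rotated, new_boxes

def adjust_brightness(image, factor):
    """Scale pixel values and clamp to [0, 255]; boxes are unaffected."""
    return [[min(255, max(0, round(px * factor))) for px in row] for row in image]

def add_gaussian_noise(image, sigma=10.0, seed=0):
    """Add zero-mean Gaussian noise and clamp to [0, 255]; boxes are unaffected."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(px + rng.gauss(0.0, sigma)))) for px in row]
            for row in image]

# Toy 3 x 4 image with one bright "nest" pixel at x = 3, y = 0.
image = [[0, 0, 0, 255],
         [0, 100, 0, 0],
         [0, 0, 0, 0]]
boxes = [(3, 0, 3, 0)]

flipped, flipped_boxes = mirror(image, boxes)
rotated, rotated_boxes = rotate90(image, boxes)
brighter = adjust_brightness(image, 1.5)
noisy = add_gaussian_noise(image)
```

In a real pipeline the transformed boxes would be written back to the XML annotation of each augmented image, as step 2 of the training process requires.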

C. EXPERIMENTAL RESULTS AND ANALYSIS
In this paper, Faster_RCNN is used to train the network model. The ZFNet, Vgg16 and ResNet-50 networks pre-trained on ImageNet are used for initialization, respectively. For each model, the maximum number of iterations is 30000, the batch size is 256, the learning rate is 0.001 and the weight decay is 0.0005. In this experiment, 50 images of bird's nests on transmission line towers in Jiangxi Province are selected for testing. A detection is considered successful if the bird's nest target is distinguished from the background. The test results are shown in Figure 4. In Fig. 4 (a), a single bird's nest target is detected at the top right of the image. In Fig. 4 (b), a single bird's nest target obscured by the tower is detected in the middle of the image. In Fig. 4 (c), two bird's nest targets are detected on the left and right sides of the image. In Fig. 4 (d), a single bird's nest target obscured by the tower is detected in the middle of the image. In Fig. 4 (e), a single bird's nest target obscured by a tower is detected in the upper part of the image. In Fig. 4 (f) and Fig. 4 (g), two bird's nest targets obscured by the tower at different distances are detected. In Fig. 4 (h), three bird's nest targets obscured by the tower are detected in the middle and on the left and right sides of the image. In Fig. 4 (i), two bird's nest targets are detected on the left and right sides of the image at low contrast. Finally, the test results are statistically analyzed, and the statistics are shown in Table 3. The results show that, while maintaining high detection accuracy, the method needs only 0.154 s on average to detect each image. According to Table 3, ResNet-50 has the highest accuracy, recall rate and F1 score, followed by Vgg16 and ZFNet. ResNet-50 also has the longest detection time for a single image, followed by Vgg16 and ZFNet. Therefore, as the network depth increases, the accuracy and recall rate of bird's nest detection become higher, and the detection time becomes longer.
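For reference, the three detection metrics can be computed from detection counts as in the sketch below. It assumes the usual definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean); the text does not spell out its formulas, and the counts used here are hypothetical, not the experimental data.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts for illustration only.
precision, recall, f1 = detection_metrics(tp=90, fp=5, fn=10)
```

False detections lower the precision and missed nests lower the recall, so the F1 score summarizes both kinds of error in one number.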
The detection results of this paper are compared with existing work in Table 4. The accuracy of the Faster_RCNN algorithm in this paper is 31.54 and 17.78 percentage points higher than that of Haar + AdaBoost and LBP + AdaBoost, respectively, 5.95 percentage points higher than that of the Fast_RCNN algorithm proposed by Shi [2], and 8.41 percentage points higher than that of the R-FCN method proposed by Zou et al. [25]. Therefore, the method proposed in this paper is more suitable for the intelligent identification of bird's nests in practical applications.
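The reported figures can be cross-checked with a short calculation. The sketch assumes that the "accuracy" compared here is the precision that enters the F1 score, and that the highest recall (95.38%) and highest F1 score (96.87%) belong to the same model; both are assumptions, since Tables 3 and 4 are not reproduced in the text. Under these assumptions, the precision implied by recall and F1 should match the value obtained by adding each stated improvement to the corresponding baseline accuracy.

```python
def precision_from(recall, f1):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)

# Precision (in percent) implied by the reported recall and F1 score.
implied = 100 * precision_from(0.9538, 0.9687)

# Baseline accuracies and stated improvements, both taken from the text.
baselines = {"Haar + AdaBoost": 66.87, "LBP + AdaBoost": 80.63,
             "Fast_RCNN": 92.46, "R-FCN": 90.00}
gains = {"Haar + AdaBoost": 31.54, "LBP + AdaBoost": 17.78,
         "Fast_RCNN": 5.95, "R-FCN": 8.41}

# Each baseline plus its gain should recover the same accuracy for this method.
recovered = {name: baselines[name] + gains[name] for name in baselines}
```

All four sums agree with one another, and the precision implied by recall and F1 rounds to the same figure, so the differences behave as percentage points rather than relative percentages.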

IV. CONCLUSION
This paper presents an automatic detection method for bird's nests on transmission line towers, which addresses the shortcomings of traditional bird's nest detection methods, has better applicability and generalization ability, and contributes to the safe and stable operation of transmission lines. The problems of insufficient training samples and overfitting of the neural network classifier are addressed by augmenting the bird's nest image data. The experimental results show that the method can effectively detect bird's nest targets in complex environments: the highest recall rate reaches 95.38%, the highest F1 score reaches 96.87%, and the detection time per image can reach 0.154 s. The method can greatly improve the efficiency and quality of inspection and lays a good foundation for intelligent detection on overhead transmission lines. Building on the method proposed in this paper, the automatic detection of other typical equipment on transmission lines will be considered next.