Detection and Evaluation Method of Transmission Line Defects Based on Deep Learning

Existing research on transmission line detection has three shortcomings: it detects only a few categories, there is no open dataset of transmission line components, and there is no unified, comprehensive evaluation index. In this paper, we propose a deep learning based method for detecting and evaluating defects in transmission line inspection. A transmission line contains various pivotal components, while previous research has mostly focused on a few categories. In the proposed approach, we first establish a transmission line dataset named Wire_10, which treats each defect as a category. Wire_10 contains 8 types of defects in transmission line components, such as insulator defect, triple-plate defect, damper defect, and grading ring defect, as well as nests and foreign bodies attached to the transmission line. Object detection in aerial images taken during actual inspection is susceptible to background and lighting. These two factors are used as variables to define the background-dataset and the lighting-dataset. Faster R-CNN, an end-to-end deep learning algorithm with high recognition accuracy, is used to build detection models with transfer learning and fine-tuning. The results show that the detection method can accurately identify the defect categories in the Wire_10 dataset and is robust to aerial images with complex backgrounds and different lighting. The proposed method can effectively and accurately identify defects in the automatic inspection of transmission lines.


I. INTRODUCTION
With the development of the power grid and the popularity of unmanned aerial vehicle (UAV) inspection technology, UAV inspection of transmission lines has become the primary means of defect detection in the power supply system. Compared with manual inspection, UAV inspection reduces the cost and risk of transmission line inspection. However, the traditional manual screening and checking of aerial images collected by UAV suffers from low efficiency and a high miss rate. It is imperative to carry out efficient and reliable intelligent detection on aerial images. Object detection in aerial images faces several problems: the objects occupy a small proportion of the image, the background is complex, there are multiple interferences, and the contrast between object and background varies widely under different illumination conditions. Meanwhile, transmission line components have polysemy. For example, insulators include needle insulators, rod insulators, and suspension insulators. These problems are the difficulties in defect detection of transmission line components in aerial images.
The associate editor coordinating the review of this manuscript and approving it for publication was Canbing Li.
In traditional methods, various hand-crafted features are commonly adopted. Zhu et al. [1] and Ban et al. [2] proposed a semantic model of local contour features built from lines and curves to detect the spacing rod, the damper, and the grading ring. Zhang et al. [3] adopted the Hough transform [4] and statistical texture features to form feature sequence curves for defect detection of insulators. Wang et al. [5] identified glass insulators and composite insulators by combining shape, color, and texture features, and diagnosed the glass insulator's off-chip defect. Algorithms that detect objects with hand-crafted features are highly customizable but, correspondingly, have poor versatility. Moreover, aerial images are susceptible to cluttered environments and changes in lighting, so the accuracy and reliability of such detection methods in practical applications are limited. Machine learning algorithms have also been applied to transmission line component detection. Wu [6] proposed a method to detect insulators based on the region of interest and an AdaBoost [7] cascade classifier. Zhai et al. [8] used the object recommendation algorithm and the structure search method, and trained the classifier with the extracted local features of insulators. The above methods have more robust feature representation capabilities and improve detection accuracy. However, with their shallow architectures, they cannot handle massive high-resolution images.
Deep learning, a branch of machine learning that addresses the problem of feature extraction, can automatically learn the association between features and tasks and can process large and complex data. The Convolutional Neural Network (CNN) [9] in deep learning can learn more robust and expressive features; it has been applied to transmission line detection and has achieved remarkable performance. Zhao et al. [10] utilized a CNN to represent the features of infrared images in order to classify insulators. Tang et al. [11] used Faster R-CNN [12] to detect five types of components and defects, among which the recognition accuracy of the grading ring reaches 96.8%. However, that work covered few categories and did not consider the detection effect of the algorithm under extreme conditions. Miao et al. [13] used the single shot multibox detector (SSD) [14] to detect insulators, achieving an excellent real-time effect. Zhang [15] used YOLOv3 [16] to detect abnormal targets on the transmission line. Although SSD and YOLO [17] improve the speed of object detection, Faster R-CNN has a definite advantage in detection accuracy.
In this paper, to address the above challenges, we attempt to establish an evaluation criterion for object detection on transmission lines and propose a deep learning based method to detect transmission line defects. We build the detection model based on Faster R-CNN to detect the 10 categories, including 8 component defect types, nest, and foreign body. The transfer learning method is used to train the detection model. By comparing the recognition accuracy under different base networks and training iterations, we obtain a detection model with the best recognition effect. Then the aerial images with complex background and diverse light intensity are studied to verify the robustness of our detection method.
The rest of this paper is organized as follows. Section II describes the establishment of datasets and the proposed framework of the detection approach for transmission line defects. The experimental results on the different tasks are presented in Section III. In Section IV, we draw conclusions from the results and discuss our future work plans.
image preprocessing, training of deep learning network Faster R-CNN, and detection of the aerial images. First, we use a UAV to inspect the transmission lines, obtain aerial images, and establish a transmission line defect dataset named Wire_10 containing 10 defect types. We then define two sub-datasets, the background-test dataset and the light-test dataset, based on the background and lighting of the image, to verify the robustness of the detection model to background and light. For the detection model, we train the Faster R-CNN algorithm by feeding it the images and initialize it with an ImageNet [18] pre-trained model. Once the detection model of transmission line defects is well trained, it can be used to detect defects of the transmission line in aerial images taken by UAV.
Compared with the detection of transmission line components, the detection of component defects better meets the needs of practical applications and contributes to the inspection work. Therefore, the defects of the transmission line are treated as categories when building the dataset. In this paper, the self-built transmission line defect dataset is named Wire_10, and the related research is carried out on this basis.
This dataset contains 10 categories of transmission line defects. There are 8 types of component defects (tower foundation defect, insulator defect, grading ring defect, contact terminal defect, triple-plate defect, damper defect, earth wire defect, and bird thorn defect) and 2 types of other defects (nest and foreign body). Table 1 gives a detailed description of the defects contained in the Wire_10 dataset. Samples of the Wire_10 dataset are shown in Fig. 2. The descriptions in Table 1 corresponding to the component defects in Fig. 2 are as follows: the tower foundation defect is overgrown weeds, the insulator defect is breakage, the grading ring defect is deformation, the contact terminal defect is deformation, the triple-plate defect is breakage, the damper defect is breakage, the earth wire defect is breakage, and the bird thorn defect is a lack of thorns. It should be noted that each component defect type is a general term that does not distinguish specific subtypes, which facilitates detection.

2) IMAGE UNDER SPECIAL CONDITIONS
During UAV inspection, the background of the aerial image is very complicated because of the diverse environments. Considering the impact of different image backgrounds on recognition accuracy, this paper divides the aerial images into separate datasets according to whether their main background is river, forest, building, or farmland. Samples of the background-dataset are shown in Fig. 3.
Similarly, the aerial image is subjected to different illumination intensities due to time and weather, and the contrast between the object and the background changes accordingly, thus affecting the detection accuracy. We simulate the change in light intensity by adjusting the contrast and brightness of the image. An image with moderate illumination intensity is selected as the original image, and a zero-pixel image is merged with the original image to adjust the contrast and brightness. The method is shown in formulas (1) and (2). Fig. 4 shows images obtained by simulated illumination.
Here, $f_n$ is the original image, $g_n$ is the adjusted image, $f_0$ is the zero-pixel image with the same size as $f_n$, α is the multiple of the original image, β is the multiple of the zero-pixel image, and γ is the added pixel value. The larger α is, the higher the contrast. The larger γ is, the brighter the image.
The image is dark and blurred when α is 0.4. Reducing α further would seriously affect the detection effect. In actual operation, it is prohibited to conduct UAV inspections in extremely low visibility. Therefore, for image contrast, α takes six levels: 0.4, 0.6, 0.8, 1.0, 1.2, and 1.4. Similarly, the image is generally dark when γ is -20, so for brightness, γ takes five levels: -20, -10, 0, 20, and 40. Taking the original image with γ = 0 and α = 1 as the baseline, images with γ less than 0 or α less than 1 simulate cloudy or afternoon light, and images with γ greater than 0 or α greater than 1 simulate bright or early morning light.
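The illumination simulation can be sketched in a few lines of NumPy. This is a minimal sketch, assuming that the adjustment amounts to the weighted sum g_n = α·f_n + β·f_0 + γ suggested by the symbol definitions; the function name `simulate_illumination`, the default β = 0, and the rounding and clipping to the 8-bit range are assumptions for illustration. Because f_0 is all zeros, the value of β does not change the result.

```python
import numpy as np

# Contrast and brightness levels listed in the text.
ALPHAS = (0.4, 0.6, 0.8, 1.0, 1.2, 1.4)
GAMMAS = (-20, -10, 0, 20, 40)

def simulate_illumination(f_n, alpha, gamma, beta=0.0):
    """Assumed form: g_n = alpha * f_n + beta * f_0 + gamma, clipped to [0, 255]."""
    f_n = np.asarray(f_n, dtype=np.float64)
    f_0 = np.zeros_like(f_n)                 # zero-pixel image, same size as f_n
    g_n = alpha * f_n + beta * f_0 + gamma
    return np.clip(np.rint(g_n), 0, 255).astype(np.uint8)

# Illustration on a synthetic uniform gray image (not real inspection data).
original = np.full((2, 2, 3), 100, dtype=np.uint8)
variants = {(a, g): simulate_illumination(original, a, g)
            for a in ALPHAS for g in GAMMAS}
```

With α = 1 and γ = 0 the image is returned unchanged, which matches the baseline described above.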

C. TRANSMISSION LINE DETECTION NETWORK
Faster R-CNN is a classical method of object detection that has high recognition accuracy. It is suitable for the detection task of the aerial image. The schematic diagram of the Faster R-CNN object detection network for transmission lines is illustrated in Fig. 5. The network is mainly composed of the base network, the region proposal network (RPN) [12], and the refinement network.

1) BASE NETWORK
The base network is an indispensable part of computer vision tasks; it extracts image features for subsequent tasks. A classification network with excellent performance is usually used as the base network for complex tasks such as detection and segmentation. The features extracted by these networks are more comprehensive than traditional hand-crafted features. ZF [19] and VGG16 [20] are chosen as the base networks in [12]. In this paper, the first base network of Faster R-CNN is VGG16, chosen for its better feature extraction ability. VGG16 consists of 13 convolutional layers, 5 pooling layers, and 3 fully connected layers. The features of the image are extracted using the 13 convolutional layers of VGG16. ResNet-101 [21], proposed after VGG, adopts residual learning and solves the feature loss problem to gain accuracy. Therefore, the other base network of Faster R-CNN is ResNet-101 (excluding the fully connected layer).
The VGG16 and ResNet-101 networks differ in the strategy, depth, and complexity of feature extraction, so they produce different feature maps from the same image. VGG16 and ResNet-101 are each selected as the base network of the detection model for transmission line defects. The recognition accuracy for the different defects is studied in Section III.

2) REGION PROPOSAL NETWORKS
The region proposal network is a fully convolutional network that takes an image of any size as input and outputs a set of rectangular object proposals. In detail, the RPN takes the feature maps output by the last convolutional layer as input, slides a 3 × 3 window over the feature maps, and generates a feature vector at each position that is passed to the class decision layer and the position regression layer. The class decision layer determines whether each proposed region belongs to the foreground or the background from its foreground and background probability values. The position regression layer locates the object coarsely. The schematic diagram of the RPN is shown in Fig. 6.
We generate 9 proposed regions (named anchors) for each sliding window position in the feature map. Anchors are a set of fixed-size proposed regions with three scales $\{128^2, 256^2, 512^2\}$ and three aspect ratios {1:1, 1:2, 2:1}. Each sliding window is then mapped to a low-dimensional feature (512-d for VGG16 and ResNet-101), and this low-dimensional vector is passed to the class decision layer and the position regression layer.
When training the RPN, each anchor needs to be classified as a positive or negative sample, i.e., foreground or background. The position information of each anchor is described by the coordinates x, y and the width and height values w, h. Therefore, with 9 anchors per position, the classification layer has 9 × 2 = 18 convolution kernel channels, while the regression layer has 9 × 4 = 36.
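The anchor set and the resulting channel counts can be illustrated with a short sketch. It assumes that each scale denotes an anchor area of scale² pixels and that the aspect ratio is height divided by width; the names `SCALES`, `RATIOS`, and `make_anchors` are illustrative and not taken from the authors' implementation.

```python
import math

SCALES = (128, 256, 512)   # anchor area is scale**2 pixels
RATIOS = (1.0, 0.5, 2.0)   # 1:1, 1:2, 2:1, assumed to mean height / width

def make_anchors(scales=SCALES, ratios=RATIOS):
    """Return the (width, height) of every scale/ratio combination."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)   # chosen so that w * h == s**2
            h = s * math.sqrt(r)
            anchors.append((w, h))
    return anchors

anchors = make_anchors()
num_anchors = len(anchors)        # 3 scales x 3 ratios
cls_channels = 2 * num_anchors    # foreground / background score per anchor
reg_channels = 4 * num_anchors    # x, y, w, h per anchor
```

Each sliding window position therefore contributes `num_anchors` candidate boxes, and the two output layers scale linearly with that number.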
The loss function includes two parts: the position regression loss function and the classification loss function. The loss function is shown in (3).
Here, $i$ is the index of the anchor and $p_i$ is the predicted probability that anchor $i$ is an object. The value of $p_i^*$ is 1 if the anchor is positive and 0 if the anchor is negative. $t_i$ contains the location information of the predicted proposed region, and $t_i^*$ contains the location information of the positive anchor. $L_{cls}$ is the classification loss function and $L_{reg}$ is the position regression loss function. The equations of $L_{cls}$ and $L_{reg}$ are shown in (4) and (5). $R$ in (5) is the function defined in (6).
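For reference, the RPN loss in the original Faster R-CNN formulation [12] is commonly written as follows. It is an assumption that (3)-(6) take this standard form; the normalization terms $N_{cls}$ and $N_{reg}$ and the balancing weight $\lambda$ come from [12] and are not defined in the text above.

```latex
\begin{align}
L(\{p_i\},\{t_i\}) &= \frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*)
  + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i,t_i^*) \tag{3}\\
L_{cls}(p_i,p_i^*) &= -\log\bigl[p_i^*\,p_i + (1-p_i^*)(1-p_i)\bigr] \tag{4}\\
L_{reg}(t_i,t_i^*) &= R(t_i - t_i^*) \tag{5}\\
R(x) &= \begin{cases} 0.5\,x^2, & |x| < 1\\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{6}
\end{align}
```

The factor $p_i^*$ in the second term of (3) means that the regression loss is counted only for positive anchors.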

3) REFINEMENT NETWORK
The refinement network for classification and position consists of the RoI pooling layer, the fully connected layer, the class decision layer (softmax), and the position regression layer (bounding box regression). The inputs of the network are the outputs of the base network and the region proposal network. The RoI pooling layer uses max-pooling to normalize each RoI to a fixed-size feature map. After the fully connected layer, the category probability and the position information of the object are output through the class decision layer and the position regression layer. The refinement network structure is shown in Fig. 7.
The class decision layer determines the category of the RoI using the softmax classifier. When there are k categories, softmax outputs an array of k + 1 dimensions (k categories and the background), which gives the probability that the object belongs to each category. In this paper, we choose the category with the highest probability as the detection result. The probability score of each RoI is $P = (p_0, p_1, \ldots, p_{10})$. For a class u, its loss function is shown in (7).
The position regression layer uses bounding box regression to adjust the location information. The output of this layer is an array of 4 × k dimensions indicating the values by which the box should be translated and scaled if the object belongs to a particular category. The location of the output box is then determined. For a class u, $t^u = (t^u_x, t^u_y, t^u_w, t^u_h)$ represents the predicted value, and $v$ represents the four parameters by which the box should be translated and scaled to match the ground truth. The loss function of the position regression is shown in (8).
When training the refinement network, its loss function consists of the class decision loss function and the position regression loss function, as shown in (9). $L_{cls}(p, u)$ is the class decision loss function and $L_{loc}(t^u, v)$ is the position regression loss function. λ is a hyper-parameter, set to λ = 1.
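A minimal numerical sketch of the refinement loss follows. It assumes that (7)-(9) take the standard Fast R-CNN form [23]: $L_{cls}(p, u) = -\log p_u$, a smooth-L1 position loss summed over x, y, w, h, and a position term that is skipped for the background class u = 0. The function names are illustrative and are not taken from the authors' Caffe implementation.

```python
import numpy as np

def smooth_l1(x):
    """Smooth-L1 function: 0.5 * x**2 if |x| < 1, else |x| - 0.5 (elementwise)."""
    x = np.abs(np.asarray(x, dtype=np.float64))
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def refinement_loss(p, u, t_u, v, lam=1.0):
    """Assumed multi-task loss: L_cls(p, u) + lam * [u >= 1] * L_loc(t_u, v)."""
    p = np.asarray(p, dtype=np.float64)
    l_cls = -np.log(p[u])                       # class decision loss for true class u
    if u >= 1:                                  # position loss only for non-background
        diff = np.asarray(t_u, dtype=np.float64) - np.asarray(v, dtype=np.float64)
        l_loc = float(smooth_l1(diff).sum())
    else:
        l_loc = 0.0
    return float(l_cls + lam * l_loc)

# Example with k = 10 categories plus background and a uniform softmax output.
p_uniform = np.full(11, 1.0 / 11.0)
loss_background = refinement_loss(p_uniform, 0, [0, 0, 0, 0], [0, 0, 0, 0])
loss_class_3 = refinement_loss(p_uniform, 3, [0.5, 0.0, 0.0, 2.0], [0, 0, 0, 0])
```

The detected category for an RoI is then simply the index of the largest entry of `p`, for example `int(np.argmax(p))`.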

D. MODEL TRAINING
The proposed detection network adopts the multi-task learning method by sharing convolutional layers. To ensure that the RPN and the refinement network realize the convolutional layer sharing of the base network during the training process, the alternating training and transfer learning method is used for network training. The training process is shown in Fig. 8.
The alternating training steps are as follows:
1) The region proposal network is initialized with the pre-trained ImageNet model and fine-tuned to obtain the region proposal boxes.
2) The ImageNet model is used again to initialize the detection network, and the region proposal boxes are used as the input of the refinement network to generate an independent detection network.
3) The independent detection network is used to initialize the region proposal network, which is then fine-tuned and trained.
4) Finally, the shared convolutional layers are fixed, and only the fully connected layers in the refinement network are fine-tuned so that the two networks share the convolutional layers.
We train the detection model of transmission line defects on the Wire_10 dataset. The recognition accuracy and generalization ability of the model are tested in Section III.
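The four alternating steps can be summarized as a small schedule that records, for each step, which network is trained, where its weights come from, and which parameter groups are updated. This is only an organizational sketch: the group names are hypothetical, and freezing the shared convolutional layers in step 3 is an assumption taken from the original Faster R-CNN procedure [12], since the text states it explicitly only for step 4.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    step: int
    network: str       # network trained in this step
    init_from: str     # source of the initial weights
    trainable: tuple   # parameter groups updated in this step

SCHEDULE = (
    Stage(1, "rpn", "imagenet", ("shared_conv", "rpn_layers")),
    Stage(2, "refinement", "imagenet", ("shared_conv", "fc_layers")),
    # Assumption from [12]: shared convolutional layers are kept fixed here.
    Stage(3, "rpn", "step_2_detection_network", ("rpn_layers",)),
    Stage(4, "refinement", "step_3_shared_conv", ("fc_layers",)),
)

def is_frozen(stage, group):
    """True if the parameter group is not updated in the given stage."""
    return group not in stage.trainable

frozen_in_last_step = is_frozen(SCHEDULE[3], "shared_conv")
```

After step 4 both networks use the same convolutional weights, which is what allows a single forward pass through the base network at detection time.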

III. EXPERIMENTAL RESULTS
In this section, we introduce the details of the evaluation, including the selection of the base network, settings of the experiments, and obtained results.

B. IMPLEMENTATION
The proposed approach is implemented in Caffe [22]. It runs on a computer equipped with an Intel(R) Core(TM) i7-7700K CPU, an NVIDIA GeForce GTX 1080 Ti GPU, and 16 GB of RAM under Ubuntu 16.04.
The parameters of the proposed method are set as follows. The initial learning rate is 0.001, the decay factor is 0.1, and the momentum is 0.9. L2 regularization is used to avoid overfitting. VGG16 and ResNet-101 are each used as the base network. There are two options for the number of training iterations, 240,000 and 400,000. Since aerial images are high-definition, the CNN would run out of memory because of the excessive computation. Therefore, the shortest edge of the image is scaled to 300, the number of regions of interest is 128, and the maximum batch size is 128. During actual inspection, false detections and missed detections have a strong influence on the inspection work. We therefore apply recall, precision, false alarm, average precision (AP), and mean average precision (mAP) to evaluate the detection performance.
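The evaluation indexes can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: precision and recall use their usual definitions from true positive, false positive, and false negative counts, AP is computed as the area under the precision-recall curve with PASCAL VOC style all-point interpolation (an assumed protocol), and mAP is the mean of the per-class APs. The false alarm index is omitted because its exact definition is not given here. All inputs in the example are synthetic.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve (all-point interpolation, assumed)."""
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    recall = tp / num_ground_truth
    precision = tp / (tp + fp)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]   # monotone precision envelope
    idx = np.nonzero(mrec[1:] != mrec[:-1])[0]       # points where recall changes
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))

# Synthetic example: three detections of one class, two ground-truth objects.
ap_example = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
pr_example = precision_recall(tp=8, fp=2, fn=2)
map_example = mean_average_precision({"class_a": 0.9, "class_b": 0.7})
```

In practice, `is_true_positive` would come from matching each detection to a ground-truth box with an overlap threshold, which is outside the scope of this sketch.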

C. DETECTION RESULTS
Our framework uses VGG16 and ResNet-101 as the base networks and trains each for 240,000 and 400,000 iterations, yielding four detection models. The experimental results for the 10 defect types are shown in Table 3.
According to the detection results for the transmission line defects, the mean average precision (mAP) with the ResNet-101 base network is higher than that with VGG16 under the same number of iterations. The mAP also differs with the number of iterations: under the same base network, the more iterations, the higher the accuracy of the model. The detection model Faster R-CNN D has the highest accuracy for each defect: the AP of the grading ring defect reaches 99.72%, the APs of the triple-plate defect, tower foundation defect, and foreign body are above 98%, and the mAP reaches 91.1%. Table 4 shows the other results of the four models. Faster R-CNN D has the highest recall and precision, and its false alarm rate is 0.68%. Therefore, Faster R-CNN D is selected as the detection model for the follow-up research. Part of the recognition results is shown in Fig. 9, from which it can be seen that the model has a good recognition effect on transmission line defects.

D. ROBUST EXPERIMENT
In practical applications, the aerial images collected by UAV inspections often have complex and diverse backgrounds. The backgrounds of images in different seasons and geographic environments vary greatly. At the same time, due to the impact of shooting time, the light intensity of the images changes acutely. Therefore, this section will verify the robustness and generalization of the trained detection model for transmission line defects.

1) EFFECT OF COMPLEX BACKGROUND
This part uses the background-dataset to analyze the detection performance of the proposed detection model for defect types in different backgrounds. There are four main backgrounds: building, forest, river, and farmland. The experimental results are shown in Table 5. The name of each category is represented by the first letter.
Among the defect types, triple-plate defects have the highest recognition accuracy in the above 4 backgrounds, reaching 99.0%, and the recognition accuracy is more than 90% for all types except bird thorn defects and earth wire defects. For the bird thorn, the lower accuracy may be attributed to the absence of a standard theoretical basis for a non-defective bird thorn: the defects were determined manually, i.e., it is not clear what number of thorns counts as a bird thorn defect, which has a significant influence on the final recognition result for this defect. As for the earth wire, its color is similar to the background of the image. After the multi-layer convolution operation, the features of the earth wire are lost, resulting in a lower recognition result compared to other defects. Among the background types, the mAP reaches 97.5% in images with farmland background, and the mAP is above 91.5% for all background types. Farmland is generally empty and has a single color, so it has less impact on object detection. In contrast, building, forest, and river backgrounds often contain irrelevant objects and various colors, which makes detection difficult, so their accuracy is slightly worse than that of farmland.
In general, for the aerial images of 4 different backgrounds, the average mAP of 10 types of transmission line defect reached 93.6%. It can be considered that the detection model of transmission line defects in this paper has strong robustness under complex backgrounds.

2) EFFECT OF LIGHT INTENSITY
For the light-dataset, the contrast and brightness of the image are adjusted to simulate the lighting changes during the inspection, and the robustness of the detection model to lighting changes is verified. For different α and γ , the test results are shown in Table 6.
The highest mAP, 95.0%, was obtained when γ and α were 40 and 1.0. For α in the range of 0.8 to 1.4 and γ in the range of -10 to 40, the lighting conditions of the image are in line with most practical situations. The mAP of the detection model was almost unchanged, except when α was 0.8 and γ was -10, and in both cases it was above 92.0%. When γ and α are -20 and 0.4, respectively, the mAP is 61.5%. However, this result is obtained under extreme conditions.
Then, we conduct separate experiments under different α and γ conditions to discuss the effects of contrast and brightness. The experimental results are shown in Fig. 10. Fig. 10 (a) shows the results with different α. As α increases, the overall mAP of each defect shows an upward trend, but when α is greater than 1, the mAP of the earth wire defect decreases slightly. The main reason is that there are many high voltage wires in the background of the image, and the color of the tower poles is similar to that of the earth wire. Fig. 10 (b) shows the experimental results obtained under different γ. As γ increases, the overall mAP also slowly increases. When γ is greater than 0, all mAPs are above 80% except for the earth wire defect. The main reason is that when the image is very bright, the color of the earth wire is similar to that of the tower poles in the background, which causes the recognition accuracy to decrease.
When α and γ gradually increase, the contrast and brightness of the image increase, then the image becomes sharper. Then the features extracted by CNN will be more effective, resulting in an increase in recognition accuracy. However, excessive brightness and contrast will reduce the recognition accuracy, as some objects (e.g., earth wire) will become blurred.
On the whole, appropriately increasing the contrast and brightness of the image can improve the recognition accuracy. By testing on images that simulate changes in illumination, we conclude that the proposed detection model for transmission line defects can effectively detect defects in images taken under different light intensities such as cloudy, sunny, morning, and afternoon, i.e., the detection model is robust to different light intensities.

E. COMPARISON WITH OTHER METHODS
The proposed detection approach is compared with three other deep learning based object detection architectures, Fast R-CNN [23], YOLO, and SSD, since traditional methods were applied only to a single component category. Fast R-CNN, the predecessor of Faster R-CNN, is a region proposal based method, while YOLO and SSD are regression-based methods. They were trained in the standard way on the Wire_10 dataset. The comparison results are shown in Table 7. They show that the proposed method has excellent recall and precision compared with the previous work. Faster R-CNN combines region proposal and CNN classification by introducing the RPN to form an end-to-end detection network, which improves the accuracy and speed of detection. YOLO and SSD convert the object detection task into a regression problem, which shortens the computation time for a single image but also reduces the accuracy. The proposed method has excellent recognition accuracy for the 10 defect categories in the Wire_10 dataset.
After the model is well trained, it can be applied to the transmission line inspection system. Defect detection is an important technique and a necessary prerequisite for intelligent inspection of transmission lines. During actual inspection, the defects of the transmission line can be accurately detected in the aerial images obtained by the UAV through the detection model, which guides the subsequent maintenance work.

IV. CONCLUSION
In this paper, object detection technology from computer vision is used to detect transmission line defects in aerial images. Compared with traditional manual detection, it makes image recognition intelligent and reduces labor intensity. To address the lack of an open and standard dataset in the field of transmission line components, a transmission line defect dataset named Wire_10 was established from UAV aerial images, and the number of detected categories was increased to 10, including 8 types of component defects, nest, and foreign body. Based on the region proposal network, a detection network for transmission line defects was constructed. The best network model was selected by comparing the recognition accuracy under different numbers of iterations and different base networks. The experimental results show that the detection model achieves an mAP of 91.1% and a false alarm rate of 0.68% on the Wire_10 dataset. It is also verified that the model is robust to aerial images under complex backgrounds and different light intensities. This method has practical engineering application value.
The method proposed has been put into use, and the detection effect of transmission line defects can meet the practical application requirements. However, due to the limited data obtained through UAV, there are some false detections and missed detection problems in winter and non-rural areas. We will gradually expand and improve the Wire_10 dataset in future work. At the same time, the current dataset is coarse in the classification of 10 categories, and the specific classification of transmission line components is very intricate in practice. We will refine the types of various component defects and standardize the dataset further to enable transmission line inspection to obtain more accurate and useful information.