An Improved AlexNet for Power Edge Transmission Line Anomaly Detection

Because most outdoor transmission line equipment is exposed to harsh weather and natural disasters, it is prone to wire breakage, tower collapse, and insulator flashover. When an anomaly occurs, the State Grid Corporation needs too much time to fix it manually. To reduce the inspection burden, many methods have been presented to diagnose and locate anomalies. In this paper, we propose an improved AlexNet model for anomaly detection. For feature extraction, the proposed model extracts the characteristics of transmission line equipment through a deep convolutional neural network (DCNN). For recognition, we draw on the advantages of traditional machine learning and of the support vector machine (SVM) and propose an SVM classification method that incorporates deep learning. Finally, the improved AlexNet model and the SVM classification method are used to classify images of various types of power equipment. The results show that the proposed methods can be effectively applied to image recognition of various types of power equipment and greatly improve the recognition rate of power equipment images, which shows great potential for the design of future real-time transmission line monitoring platforms.


I. INTRODUCTION
With the rapid development of power electronics systems, anomaly detection and localization have become a main concern of research efforts in the ubiquitous power internet of things (UPIoT) [1]-[6]. Because anomalies in electrical power systems cannot be avoided, sufficient information from anomaly analysis is needed to recognize the cause and interpret the failed system. It is also necessary to restore power transfer as soon as possible and to learn more about the system so that future anomalies can be reduced where possible. Circuit breakers and other control components are designed to help protect the relay and to take appropriate action and thus minimize power loss and the length of power disruption.
As the power load increases and drives power systems to an ever larger scale, the number of power transmission lines has also increased. These transmission lines are responsible for power transmission between power plants and load centers and for power exchange between different locations. Therefore, power transmission lines are important for the stable operation of the power system. Since transmission lines are set up over long distances and exposed outdoors for long periods, they inevitably suffer from harsh environments such as high temperatures, rain, snow, lightning, and hail, so failures are inevitable. The rapid evolution of artificial intelligence and edge computing provides cloud computing and caching capabilities that reduce the maintenance cost of power equipment. The result is a novel architecture that integrates humans, smart devices, and deep learning and combines the power of mobile edge computing with large-scale sensing ability.
The associate editor coordinating the review of this manuscript and approving it for publication was Lu Liu.
Transmission line component inspection plays a significant role in industrial processes, especially power system monitoring and protection. In traditional power systems, transmission line component inspection usually depends on human resources. Although manual approaches have high inspection accuracy, they result in high costs and even unexpected safety issues. Smart devices, such as sensors, robots, and unmanned aerial vehicles (UAVs), have made great contributions to the development of efficient unmanned inspection systems for transmission line and tower monitoring applications [7]-[12]. In terminal-level power vision edge intelligence, the images from the sensing terminal can be processed by an AI chip or a small-scale edge computing device to complete the automatic allocation of inspection tasks, as shown in Figure 1. In particular, UAVs have become the most popular device for automated transmission line inspection because they are easier to operate and have acceptable costs. This technology can obtain real-time videos and digital images of the monitored overhead transmission lines. With recorded inspection images and videos, system operators can apply digital image processing techniques, machine learning algorithms, and deep learning algorithms to detect the electrical components and therefore monitor the field more clearly. Most current transmission line inspections use aerial photography from drones: maintenance personnel on the ground watch the aerial video in real time and inspect the transmission lines by eye. With the development of computer vision and pattern recognition, automatic inspection technology for aerial images has been developed [13]-[16].
Combining advanced technologies and methods in the inspection of high-voltage transmission lines, including image acquisition, digital image preprocessing, and final recognition, can greatly reduce the number of maintenance personnel required during inspections. It can also eliminate errors caused by subjective human factors and improve efficiency, making the assessment of the state of the power grid more accurate and effective.
In recent years, UAVs with onboard high-definition cameras have developed rapidly, and their applications are very extensive [17]-[20]. Because of their compact size, simple operation, good human-machine interaction, and high-resolution onboard cameras, power researchers are gradually adopting drones for transmission line inspection. Taking advantage of the simplicity and portability of drones, workers have begun to use onboard cameras to inspect transmission lines and capture video images of them. The large number of transmission line images taken by drones provides abundant research data for staff who study transmission lines with image processing methods. However, the number of video images captured by aerial drones is extremely large and they occupy a great deal of storage, so staff process them very slowly afterwards. How to quickly analyze and process aerial drone images of transmission lines has become an important research direction.
Aerial photography is one of the important sources for detecting transmission line status. Image processing methods are used to diagnose anomaly information such as icing status and bird nests. An image detection model usually uses image processing to analyze, judge, and classify the collected images according to manually set standards. An online monitoring system can monitor power transmission lines in real time. However, the cameras of online monitoring systems are often severely affected by the environment: they are easily covered with ice, blurred, or occluded, which lowers the resolution of the captured images and seriously affects judgment accuracy. As a result, images collected by online monitoring systems cannot be used well when image processing methods are applied to transmission line images. How to acquire images suitable for image processing research is therefore a very valuable research question.
To achieve fast and accurate overhead transmission line component detection, this work mainly focuses on creating a more useful inspection image library and improving detection algorithms. The ImageNet database, an image database organized according to the WordNet hierarchy, is often used as the input for training object detection models. However, in this database, only 1,290 images can be found under the "cable, line, transmission line" synset. This database cannot provide satisfactory results because the number of training images is too small and some electrical elements, such as spacer rods, electric power fittings, and some insulators, are not included. As a result, the detection accuracy will not be promising. This paper aims to create the first self-learning-based overhead transmission line inspection image library. This database contains the ground truth information of the various electrical components in each image. It serves as a benchmark for the training process and can be expanded as more inspection images are included. Another gap is that existing deep learning algorithms still cannot meet the requirements of real-time detection in terms of calculation speed and detection accuracy. This paper improves the structure of the existing AlexNet model in both the convolutional layers and the pooling layers to improve detection speed and accuracy. Therefore, this paper proposes a transmission line component detection method built on a deep learning framework that uses the improved AlexNet model [21], [22] to achieve higher efficiency, effectiveness, and reliability in the line inspection process. In terms of feature extraction, an improved model that extracts the characteristics of power equipment through denoising and normalization is proposed. In terms of the recognition algorithm, after considering the advantages of traditional machine learning methods and of the support vector machine (SVM), we adopt an SVM classification method combined with deep learning. The improved AlexNet and SVM classification method is used to classify images of various types of power equipment, and the validity of the proposed method is verified.
The rest of the paper is organized as follows. Section II briefly reviews related work, and Section III elaborates on the details of our improved AlexNet algorithm. Section IV presents the comparative experimental results and analysis. Finally, Section V gives the concluding remarks of the paper.
VOLUME 8, 2020

II. RELATED WORK
Anomaly detection in UPIoT is considered a system-level anomaly diagnosis performed in the dispatch center, where the status of various types and levels of protection devices and circuit breakers is monitored together with voltage and current measurements [23]-[25]. Based on qualitative and quantitative analysis, which are the two main directions for constructing stroke prewarning systems, possible anomalous components and fragile locations are inferred, and the relevant smart decisions are provided to the State Grid. When the grid fails, precise, rapid, and automatic emergency maintenance is important for the rapid recovery of the power grid. Power systems are prone to faults. When an anomaly occurs, consumer supply is disrupted, which also leads to economic loss. The objective of any electrical power system is to maintain continuity of supply, so anomalies are undesirable. Although many anomaly identification methods exist, the use of SVM is a relatively novel approach in this context. A common question is why another identification system is needed when various signaling systems are already installed. It is simply another system for the same purpose, one that uses learning techniques to learn anomaly characteristics and applies mathematics to determine whether a new input is faulted or healthy.
The transmission line is the element in the power system most likely to be the location and cause of an anomaly, especially when its physical dimensions are considered. Intelligent techniques based on artificial intelligence are under investigation to increase the consistency, speed, and accuracy of present digital relays. Traditional digital image processing methods, including the Haar wavelet transform, the scale-invariant feature transform (SIFT) algorithm, saliency detection, and image segmentation, are usually based on image binarization and edge detection [26]-[29]. All the features are predetermined and manually extracted to detect the electrical elements. In reality, it is often hard to obtain an inspection image of satisfactory quality because of the environment, light, shooting angle, and other uncontrollable conditions. Distortion and noise in the images reduce processing speed and detection accuracy. As a further development, histogram of oriented gradients (HOG) features are used for object recognition by providing them as inputs to machine learning algorithms such as Bayesian networks and the AdaBoost algorithm. These machine learning algorithms have been widely applied in the electrical component detection process. They improve detection accuracy and reduce processing time [30], [31]. However, the determination of detection window parameters and possible feature sets relies on prior knowledge, which makes these methods hard to apply to an arbitrary inspection image set. With the rise of deep learning, the new trend is to replace the HOG-based classifiers in machine learning algorithms with more accurate convolutional neural network (CNN)-based classifiers. Several deep learning algorithms have been widely applied for object detection. You only look once (YOLO) and the single-shot multibox detector (SSD) are the most common single-stage detectors [32]-[36].
For YOLO, detection is a simple regression problem that takes an input image and learns the class probabilities and bounding box coordinates. SSD runs a CNN on the input image only once, calculates a feature map, and predicts the bounding boxes and classification probabilities [37]-[41]. Although these detectors have a leading advantage in calculation speed, their recognition performance is not satisfactory, especially when the objects are small or close to each other. The multistage detectors include the regional convolutional neural network (R-CNN) and its improved variants, spatial pyramid pooling (SPP-net) and Fast R-CNN. The methodological difference from single-stage detectors is that region proposals are first collected through a selective search (SS) algorithm. Feature extraction and object identification using the CNN model are then performed on the obtained region proposals. Furthermore, Faster R-CNN and Mask R-CNN have been designed to improve detection accuracy and reduce image processing time. Table 1 compares the advantages and disadvantages of existing CNN-based detection algorithms.

III. AN IMPROVED ALEXNET FOR LINE COMPONENT DETECTION
For an image of a transmission line, staff can easily determine its content and can clearly analyze whether the conductor is broken, whether the insulator is falling off, and whether there are bird nests. However, actual aerial inspection work produces many aerial images at one time. Manually processing these images is very time-consuming and may delay the repair of more serious anomalies. To address this problem, this paper uses deep learning to process transmission line images and thereby reduce the manual workload. The staff's work efficiency is improved to a certain extent, which gives the method practical value. Deep learning is an end-to-end machine learning approach that uses raw data as its input. It can automatically learn the deep features of the input data layer by layer. Compared with manual features, deep features are more abstract and expressive. Because it uses complex models, deep learning can effectively reduce model bias. It can also use a scalable gradient descent algorithm to solve large-scale optimization problems and calculate model parameters from large-scale training data, thereby improving recognition accuracy. This paper addresses the shortcomings of traditional methods and the practical problem of reducing the workload of workers. It introduces deep learning algorithms so that computers automatically identify the features of transmission line images, which reduces the work burden of personnel. The network used in this article is AlexNet, designed and proposed by Krizhevsky et al. [21]. The network structure is shown in Figure 2.

A. ALEXNET MODEL
As seen from the AlexNet network structure in Figure 2, the network model is divided into eight layers: the first five are convolution layers, and the last three are fully connected layers. Each convolution layer contains the ReLU activation function and local response normalization (LRN) processing and undergoes downsampling (pooling). First, a color image is input to the first layer, and features are extracted from it by 96 convolution kernels of size 11 × 11 × 3. Each convolution kernel generates a new pixel. The convolution produces two sets of 55 × 55 × 48 pixel layer data, which are processed by the relu1 unit to obtain two sets of activated pixel layers; the ReLU activation function ensures that the values of the feature map are within a reasonable range. The activated pixel layers are then processed by a max pooling operation to obtain 27 × 27 × 96 pixels. Finally, normalization processing with a scale of 5 × 5 is applied. The pixel layer formed by the first convolution layer therefore has size 27 × 27 × 96, which corresponds to the 96 convolution kernels and is divided into two groups, each of which is processed on an independent GPU. In the second convolutional layer, 256 filters of size 5 × 5 perform the second feature extraction on the feature map output by the first layer. After a process similar to the above, two groups of 13 × 13 × 128 pixel layers are obtained, corresponding to two groups of 128 convolution kernels. The feature map of the second layer is then passed to the third layer, then to the fourth layer, and so on until the 4,096-dimensional vector output by the seventh layer is fully connected with the 1,000 neurons in the eighth layer.
After training, the output is the training value. The operations performed by each convolution layer are different, and the specific calculation process is shown in Figure 3.
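The layer sizes quoted above can be checked with the usual output-size rule for convolution and pooling, floor((input + 2 × padding - kernel) / stride) + 1. The following is a minimal sketch rather than part of the original method. The strides and padding used (stride 4 for the first convolution, 3 × 3 pooling with stride 2, padding 2 for the second convolution) are those of the standard AlexNet configuration and are assumptions here, since the text does not state them; the function name `output_size` is hypothetical.

```python
def output_size(input_size, kernel, stride, padding=0):
    """Spatial size after a convolution or pooling step (floor division)."""
    return (input_size + 2 * padding - kernel) // stride + 1


# Assumed hyperparameters (standard AlexNet, not stated in the text).
conv1 = output_size(227, kernel=11, stride=4)               # first convolution
pool1 = output_size(conv1, kernel=3, stride=2)              # first max pooling
conv2 = output_size(pool1, kernel=5, stride=1, padding=2)   # second convolution
pool2 = output_size(conv2, kernel=3, stride=2)              # second max pooling

print(conv1, pool1, conv2, pool2)  # 55 27 27 13
```

Under these assumptions the sizes agree with the 55 × 55, 27 × 27, and 13 × 13 feature maps described in the text; the channel counts (two groups of 48, 96, and two groups of 128) follow directly from the number of kernels per layer.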
The process of the AlexNet model can be described as follows: take a sample $(X, X_p)$ from the sample set, where $X$ is the input image and also the input value of the whole network and $X_p$ is the category of $X$; the output is calculated by formula (1), where $W^{(n)}$ ($n = 1, 2, \ldots$) ... Due to the particularity of the transmission line image application scenario, published images can hardly be used directly as the training set of deep convolutional neural networks (DCNNs). Similarly, because of the lack of quantity, the output of the DCNN is not representative or is even wrong. AlexNet's advantages in this respect are greater than those of other networks. By using the AlexNet network, the reliability of the output results can be increased to a certain extent, and its data augmentation feature also makes the training of a model on data-poor transmission line images more effective.
In this paper, considering the particularity of transmission line feature extraction, the output of the original AlexNet network is improved. The last output layer (sigmoid) of the original network is removed, and dropout, normalization, and other operations are added to improve the overall performance of the transmission line image feature recognition system. Figure 2 shows the effect obtained by convolving the image using the AlexNet network. The left panel shows the input image convolved by 96 convolution kernels. The image responds differently to each kernel, which appears as lighter and darker regions in the grayscale feature maps. The right panel shows the same input image convolved by 256 convolution kernels; again, the image responds differently to each of the 256 kernels. A comparison shows that the response is better with 96 convolution kernels, so this paper uses the output of the 96 convolution kernels.
The DCNN-based transmission line recognition model proposed in this paper is divided into two phases: the training phase and the testing phase, as shown in Figure 4. The training phase randomly selects 1,000 images from the dataset established in this paper as the training set. The training set contains several types of power transmission line images that have been labeled. This training set is then preprocessed by cropping and segmentation, cropped to a size of 227 × 227, and then the cropped image is input to the DCNN to extract depth features. After a series of convolution calculations through deep learning, a 4,096-dimensional feature vector is output. These vectors are actually the probability values of the features in each category of the training image. Then, the feature vectors obtained above are input to the SVM multiclassifier to obtain the final image recognition model of the transmission line. In the test phase, the transmission line feature recognition model obtained above can process images of any size. The smaller the image is, the faster the processing speed, and, in contrast, the larger the image is, the slower the processing speed. Considering the completeness of the image information of the transmission line used for testing, this article chooses to reduce the speed slightly to improve the reliability of the system judgment.
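The hand-off from DCNN features to the SVM multiclassifier can be illustrated with a toy example. This is a sketch under several assumptions and not the authors' implementation: 2-D points stand in for the 4,096-dimensional feature vectors, the class names are illustrative, and a simple unregularized hinge-loss linear classifier trained one-vs-rest is swapped in for a full SVM solver so that the example runs with the standard library only. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_hinge_classifier(features, targets, max_epochs=1000):
    """Binary linear classifier (targets are +1 / -1).

    Takes a subgradient step on the hinge loss whenever a sample has
    margin < 1 and stops once every sample has margin >= 1.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for x, t in zip(features, targets):
            if t * (dot(w, x) + b) < 1.0:
                w = [wj + t * xj for wj, xj in zip(w, x)]
                b += t
                updated = True
        if not updated:
            break
    return w, b


def train_one_vs_rest(features, labels):
    """One binary classifier per class: that class against all others."""
    models = {}
    for cls in sorted(set(labels)):
        targets = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_hinge_classifier(features, targets)
    return models


def predict(models, x):
    """Pick the class whose classifier gives the highest score."""
    return max(models, key=lambda cls: dot(models[cls][0], x) + models[cls][1])


# Toy "feature vectors" (assumption: real ones would come from the DCNN).
features = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
            (5.0, 0.0), (4.5, 0.0), (5.0, 0.5),
            (0.0, 5.0), (0.5, 5.0), (0.0, 4.5)]
labels = ["insulator"] * 3 + ["wire"] * 3 + ["bird_nest"] * 3

models = train_one_vs_rest(features, labels)
print([predict(models, x) for x in features])
```

In practice a library SVM with a regularization term and, if needed, a nonlinear kernel would replace `train_hinge_classifier`; the point of the sketch is only the structure of the pipeline, in which fixed-length feature vectors go in and one category per image comes out.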

B. FEATURE RECOGNITION FUNCTION
A qualified model should be able to identify the features of the test image based on the features of the training set. In this paper, the features extracted by DCNN are used as the features of the training data set, and then the image retrieval method obtained in the previous work of this article is used to obtain the user preference model.
Assume $U = \{I_{u1}, I_{u2}, \ldots, I_{um}\}$ is a manually selected and labeled image set, where $I_u$ represents a labeled image. The features of the entire training set are then extracted to obtain a model. For a given image $I_u$, its visual features are extracted and similar images are retrieved. We use formula (2) to retrieve images with similar features in the joint visual space, where $\psi$ is the adjacent visual space of the marked feature category.
The same strategy is adopted for each transmission line image used for training: $m$ images with similar features are dynamically retrieved, and these images are concatenated into a retrieval result $S$. Once the entire retrieval result is obtained, it can be used to learn a function for transmission line feature recognition. Finally, when a transmission line image is input to this function, the category to which the image belongs can be determined automatically, which saves staff effort.
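The retrieval step described above can be sketched as a nearest-neighbor search in feature space. Because formula (2) is not reproduced here, the sketch substitutes plain Euclidean distance as the similarity measure, which is an assumption and may differ from the measure actually used; the feature values, labels, and function names are likewise hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve_similar(query, library, m):
    """Return the m library entries whose features are closest to the query.

    library: list of (feature_vector, label) pairs.
    """
    return sorted(library, key=lambda item: euclidean(query, item[0]))[:m]


def build_retrieval_result(training_features, library, m):
    """Concatenate the m nearest entries for every training image into S."""
    S = []
    for query in training_features:
        S.extend(retrieve_similar(query, library, m))
    return S


# Toy feature library (assumption: real features would be DCNN outputs).
library = [((1.0, 0.0), "wire"), ((0.8, 0.2), "wire"), ((0.7, 0.1), "wire"),
           ((0.0, 1.0), "insulator"), ((0.1, 0.9), "insulator")]

neighbors = retrieve_similar((0.9, 0.1), library, m=3)
print([label for _, label in neighbors])

S = build_retrieval_result([(0.9, 0.1), (0.05, 0.95)], library, m=2)
print(len(S))
```

The retrieval result `S` gathered this way is the kind of set from which a recognition function could then be learned, as the text describes.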
During the process of similarity feature retrieval, formula (3) is used to reduce the error in the training phase.
The ceiling function indicates that the retrieved value is raised to the higher priority. For example, in the labeling process described, a transmission line image may contain multiple components, as in the bird's nest image in Figure 5, which contains wires, insulators, and a bird's nest. Because the wires and insulators are normal and the bird's nest is a kind of anomaly, the bird's nest has higher priority than the wires and insulators in this image, so features with lower priority yield to features with higher priority.
Thirty thousand images were used as training samples, and the remaining 2,000 were used as testing samples. Figure 5 displays several inspection image labeling examples in the library. The anchor box information and component labels present the ground truth information of all the images. To avoid overfitting in the deep neural network, we artificially enlarged the dataset using label-preserving transformations. Transformed images are generated from the original images with little computational cost, so they do not need to take up storage space, as shown in Figures 6 and 7. In our experiment, the transformed images were produced in Python on a CPU while the GPU was training on the previous batch of images, so the data augmentation is in effect computationally free.
Since the drone onboard cameras currently used are very advanced and the images taken are very clear, the drone aerial images are often very large. However, an oversized image does not meet the input requirements of the DCNN and cannot be fed into it directly, so the initial photo needs to be cropped and segmented to meet the DCNN input image size requirement.
The AlexNet network used in this article requires an input image size of 227 × 227, so a compression and cropping step is added before the DCNN. For convenience of calculation and to use the entire image, the input photos were uniformly compressed to 1,135 × 681 and then cropped into 227 × 227 blocks (a 5 × 3 grid of 15 blocks per photo). This meets the DCNN input requirements and also expands the dataset; blocks that do not contain any labeled features after cropping were automatically skipped by the system. It is important to note that the cropped blocks still retained the original information of the source images, such as the coordinate position information of the aerial images. This information matters because, if a detected image is abnormal, the maintenance staff need to be notified of the original coordinate position recorded by the imaging equipment. After cropping, the blocks were stored uniformly as part of the dataset, and the dataset was then labeled with a labeling tool.
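The cropping step can be sketched as follows. This is an illustration under assumptions, not the authors' code: annotations are represented as hypothetical `(x, y, name)` points in full-image coordinates, and each kept block carries its own bounding box so that the position in the original photo is not lost. The function names are hypothetical.

```python
TILE = 227  # DCNN input size stated in the text


def tile_boxes(width, height, tile=TILE):
    """Return (left, top, right, bottom) boxes that cover the image exactly."""
    if width % tile or height % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return [(x, y, x + tile, y + tile)
            for y in range(0, height, tile)
            for x in range(0, width, tile)]


def tiles_with_labels(boxes, annotations):
    """Keep only blocks that contain at least one annotated point.

    Returns {box: [(local_x, local_y, name), ...]}; the box key records
    where the block sits in the original photo.
    """
    kept = {}
    for x, y, name in annotations:
        for box in boxes:
            left, top, right, bottom = box
            if left <= x < right and top <= y < bottom:
                kept.setdefault(box, []).append((x - left, y - top, name))
                break
    return kept


boxes = tile_boxes(1135, 681)
annotations = [(300, 100, "insulator"), (1000, 600, "bird_nest")]  # made-up points
kept = tiles_with_labels(boxes, annotations)

print(len(boxes))  # 15
print(kept)
```

Because 1,135 = 5 × 227 and 681 = 3 × 227, the grid covers the compressed photo with no remainder, which is presumably why this target size was chosen; blocks absent from `kept` correspond to the unlabeled blocks that the system skips.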

B. PERFORMANCE EVALUATION
After several experiments, we trained these datasets for classification. The average precision of classification is shown in Table 1. Table 1 shows the results of several types of images in the datasets. According to the table, the precision of the insulator and the bird's nest were relatively high. The main reason is that the characteristics of the insulator and the bird's nest were relatively obvious and easy to detect compared to the transmission line.
Furthermore, the overall average precision of the transmission line recognition method proposed in this paper is 0.8355, which is a strong result for a multiclass experiment. Because there is little research on image classification of transmission lines, we illustrate the reliability of the proposed feature recognition algorithm by comparing it with other methods that specialize in image classification, as shown in Table 2. Compared with the best-performing work, the proposed model has slight limitations; compared with similar methods, however, it still has certain advantages. The classification method in this paper targets images of transmission lines specifically, while the better-performing earlier methods were aimed at landscape images. There are many studies on landscape-type datasets and improved algorithms, and building on this large body of work, those classification results outperformed our proposed method. To illustrate the reliability of the transmission line feature recognition algorithm in this paper, we attempted to reproduce the algorithm in [7]. When we repeated the experiments using the transmission line dataset instead of the landscape images, the average precision was only 75.56%, 7.99 percentage points lower than that of our algorithm (83.55%), which demonstrates that the proposed method for transmission line feature recognition is effective and reliable.
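The size of the gap between the two reported precisions can be read in two ways, and the short calculation below makes the distinction explicit. It uses only the two figures reported above; the variable names are hypothetical.

```python
ours = 0.8355        # overall average precision reported for the proposed method
reproduced = 0.7556  # average precision of the reproduced method from [7]

# Absolute gap, in percentage points.
gap_points = (ours - reproduced) * 100

# Relative gap, as a percentage of the proposed method's precision.
relative_drop = (ours - reproduced) / ours * 100

print(round(gap_points, 2))     # 7.99
print(round(relative_drop, 2))  # 9.56
```

The 7.99 figure is therefore an absolute difference in percentage points; expressed relative to the proposed method's precision, the reproduced method is roughly 9.6% lower.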

V. CONCLUSION
For transmission line feature extraction and anomaly diagnosis, this paper proposes a detection method based on AlexNet as a solution for diagnosing anomalies in power transmission components. The experimental results demonstrate that our algorithm has great advantages in classification performance and wider generalization ability, even with a small group of samples in UPIoT. The proposed framework includes inspection image library collection, data augmentation, improved AlexNet model generation, and performance analysis. The highlights are the collection of a transmission line component inspection image dataset and the improvement of the convolutional and pooling layer structure of the model. The proposed improved AlexNet-based framework achieved almost 84% mAP across insulator, wire, fitting, and tower detection. Through image recognition of the various types of power equipment, the conclusions are as follows:
• The proposed method can be effectively applied to the image recognition of various types of power equipment, and the obtained accuracy is high.
• The image features extracted by AlexNet have a high abstraction degree and strong expression ability. Compared with the single CNN, our model can obtain richer image features.
• Compared with the other methods, the accuracy of the image recognition by the model classifier is higher.

DATA AVAILABILITY
The data used to support the findings in this study are available from the corresponding author upon request.