Detection and Monitoring of Power Line Corridor From Satellite Imagery Using RetinaNet and K-Mean Clustering

Monitoring of electrical transmission towers (TTs) is required to maintain the integrity of power lines. One major challenge is monitoring vegetation encroachment (VE) that can cause power interruption. Most of the current monitoring techniques use unmanned aerial vehicles (UAVs) and airborne photography as an observation medium. However, these methods are expensive and not practical for monitoring wide areas. In this paper, we introduce a new method for monitoring the power line corridor from satellite imagery. The proposed method consists of two stages. In the first stage, we used the existing state-of-the-art RetinaNet deep learning (DL) model to detect the locations of the TTs from satellite imagery. A routing algorithm has been developed to create a path between every pair of adjacent detected TTs. In addition to the routing algorithm, a corridor identification algorithm has been established for extracting the power line corridor area. In the second stage, the k-mean clustering algorithm has been used to highlight the VE regions within the power line corridor area after converting the target satellite image into hue, saturation, and value (HSV) color space. The proposed monitoring system was able to detect TTs from satellite imagery with a mean average precision (mAP) of 72.45% for an Intersection over Union (IoU) threshold of 0.5 and 85.21% for an IoU threshold of 0.3. Also, the monitoring system was able to successfully discriminate high- and low-density vegetation regions within the power line corridor area.


I. INTRODUCTION
Electricity is of major importance to modern life, and a power outage can pose a risk to livelihoods and the economy. Electrical transmission lines, which carry electrical energy to substations, form the basic infrastructure of the electrical power transmission system. The process of transmitting electrical energy from generating stations needs to be uninterruptible. However, several natural challenges such as forest fires and vegetation encroachment may cause interruptions that can lead to power outages [1]. Many methods are used for monitoring transmission lines, such as periodic manual inspection, light detection and ranging (LiDAR) data, synthetic aperture radar (SAR) data, aerial photography data from unmanned aerial vehicles (UAVs), and airborne photography. Most of these methods are time-consuming, expensive in relation to the coverage area, and unsuitable for rough terrain [2], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Shiqi Wang.
Using high-resolution satellite imagery is a promising solution for monitoring power transmission lines. Satellite imagery provides data for a wide coverage area at a relatively low cost compared with other optical remote sensing (ORS) methods. Although satellite images can cover wide areas, they may lack data accuracy [4], [5]. Most of the previous works on power transmission line detection and monitoring depend on high-resolution UAV and aerial images, which cannot cover wide areas as rapidly as satellite images [6]-[12]. The previous studies on transmission line monitoring and detection from ORS images can be grouped into four main categories:
• Vegetation index (VI)-based methods, which depend on the natural behavior of plants, where the near-infrared (NIR) spectrum is reflected and the red spectrum is absorbed. Although these methods can be effective for detecting vegetation encroachment, they cannot detect transmission towers (TTs) on their own. However, VI-based methods can be integrated with digital elevation modelling technology to detect the locations of the TTs [13]-[15].
• Stereo matching-based methods, where a 3D image is constructed from two different ORS images of the same ground location. The main advantage of these methods is the ability to estimate the heights of objects surrounding the corridor area. Stereo matching-based methods depend on the availability of at least two satellite images from two different observation perspectives to form the disparity map [16]-[20].
• Object detection-based methods, where a set of image pre-processing steps needs to be applied to detect the TT locations and the corridor area. These methods involve manually setting a set of filters for every different type of satellite image, which makes them ineffective [21], [22].
• Machine learning (ML)-based methods, where most of the studies used ML algorithms to inspect the integrity of the TT components and identify possible failures from low-altitude UAV images [23]-[31].
The automatic detection of vegetation encroachment from satellite images has not been adequately studied. In this paper, we introduce a new automatic technique to monitor the power line corridor right-of-way from satellite imagery using the RetinaNet deep learning (DL) model and k-mean clustering.

II. BACKGROUND
This section provides an overview of the RetinaNet model and K-mean clustering algorithm.

A. RETINANET
RetinaNet, which was introduced by Lin et al. [32], has been mainly used to solve image detection and classification problems. This DL model consists of two stages, as shown in Figure 1:
• The first stage is the backbone network, which consists of two components. The first component is the base convolutional neural network (CNN), which reduces the size of the input image through several convolutional layers from bottom to top to generate the feature map. Many CNNs can be used as a base model for RetinaNet, such as ResNet [33], VGG [34], and MobileNets [35]. The second component of the backbone network is the feature pyramid network (FPN) [36], where the last layer of the CNN is up-sampled into higher scales from top to bottom to enable the detection of objects of different sizes.
• The second stage is the prediction network, which consists of two components. The first component is the classification subnet, which takes the combined output from every level of the FPN and predicts the output class k corresponding to every anchor A through a fully convolutional network (FCN) that has a width W and a height H. The second component is the regression subnet, which also consists of an FCN that takes the combined output of each level of the FPN and predicts the 4A anchor coordinates around the TT.
According to the original work on RetinaNet [32], the performance of the RetinaNet model was compared with other single-stage and two-stage state-of-the-art DL models such as Faster R-CNN+++ [33], Faster R-CNN with FPN [36], Faster R-CNN by G-RMI [37], Faster R-CNN with TDM [38], YOLOv2 [39], SSD513 [40], [41], and DSSD513 [41]; RetinaNet achieved the best mean average precision (mAP) of 59.1% at an Intersection over Union (IoU) threshold of 0.5.

B. K-MEAN CLUSTERING
K-mean clustering is an unsupervised ML algorithm. The main idea behind k-mean clustering is to group the input data points according to their distance from the nearest centroid, where the number of centroids is given by the value of k. The automatic clustering process begins by setting arbitrary centroid values; then, the centroid locations are adjusted iteratively so that each group of input data is associated with its nearest centroid [42]. The same approach can also be used to group similar neighboring image pixel values into a unified pixel value, as shown in Figure 2.
VOLUME 9, 2021
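As an illustration of the clustering procedure described in this subsection, the following is a minimal pure-Python sketch of the standard iterative k-mean procedure (often called Lloyd's algorithm). It is not the implementation used in this work; the function name `kmeans`, the seeded random initialization, and the fixed iteration count are assumptions made for the example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of floats) into k groups.

    Returns (centroids, labels). Initialization picks k input points
    at random, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # arbitrary starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

For image quantization, each pixel would be treated as one point (for example its three color channels), and every pixel would then be replaced by the centroid of its cluster.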

III. RELATED WORKS
ML algorithms have been implemented in a wide range of satellite imagery applications, such as land mapping and object detection [43]-[47]. However, the classification performance of ML algorithms depends on the type of the extracted features, and selecting the most relevant features can be a challenge. DL algorithms can avoid this downside by taking the whole image as an input to the deep neural network, which automatically creates a feature map through a set of deep CNN layers. The objects in satellite images are small with respect to the whole image, which makes object detection and localization in satellite images challenging for ML algorithms. For instance, Malof et al. [48] used a random forest ML classifier to detect small-scale solar photovoltaic panels from high-resolution satellite images. The authors applied a set of image pre-processing operations after the classification process to enhance the object detection and localization processes. The current state-of-the-art DL algorithms can solve both the classification and the localization problems, which encourages the use of DL algorithms for object detection in satellite imagery. Many works on using DL models for detecting small objects from satellite imagery have shown promising results. For instance, Ren et al. [49] modified the R-CNN DL model to detect small objects in satellite imagery by reducing the size of the anchors in the region proposal network (RPN) and adding a single high-level feature through a top-down and skip connection design in the R-CNN architecture. The modified R-CNN method achieved an average precision (AP) of 72.9% in ship detection.
Wang et al. [50] used RetinaNet to detect ships from SAR imagery. The study compared the performance of different DL models where the RetinaNet achieved the best mAP of 96.9% in comparison with SSD, Faster R-CNN and FPN.
One of the most widely used methods to detect vegetation activity from satellite images is the normalized difference vegetation index (NDVI), where the vegetation observation process depends on the availability of the NIR band and the red, green, and blue (RGB) bands [51]-[53]. However, this method requires multispectral satellite images.
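For reference, the normalized difference vegetation index is conventionally computed per pixel as (NIR - Red) / (NIR + Red), which is why the NIR band is required. The short sketch below shows this standard formula; the function name `ndvi` and the handling of a zero denominator are assumptions of the example, not part of the cited works.

```python
def ndvi(nir, red):
    """Return the NDVI of one pixel from its NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # convention chosen for this sketch (assumption)
    return (nir - red) / total
```

Values close to 1 indicate strong NIR reflection relative to red, which is typical of dense vegetation, while values near or below 0 indicate little or no vegetation.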
Several studies used the hue, saturation, and value (HSV) color space to detect the vegetation activity directly from the image pixels without using the multispectral satellite bands. For instance, Hassanein et al. [54] used the hue value as the main feature to discriminate between vegetation regions by utilizing only the RGB values. The proposed segmentation method achieved a mean accuracy of 87.29%. A similar study by Xiao et al. [55] proposed a vegetation segmentation algorithm based on the distribution of the hue channel and the roughness of the image, which depends on the distribution of the saturation channel; the proposed algorithm was able to identify vegetation areas from satellite images.

IV. METHODOLOGY
This section provides details about the proposed vegetation monitoring system. The monitoring system has been divided into two parts. The first part includes the TT detection, TT routing process, and corridor extraction process while the second part includes the K-mean clustering process.

A. DATASET CREATION AND LABELING
Training a DL model depends heavily on the size of the prepared dataset. Ample data is required to ensure that the model will be able to handle new cases.
To train, evaluate, and test the RetinaNet model, several satellite images containing transmission lines were collected from different sources, such as Google Maps, ESRI Imagery, Google Earth, the Malaysian Space Agency (MYSA), and the electric transmission and distribution infrastructure imagery dataset [56]. The RGB bands of the dataset were extracted and cropped to 1300 × 1300 pixels. Figure 3 shows the histogram of the pixel intensity distribution of the dataset. The histogram shows very high values for the green pixels and lower values for the red pixels, which reflects the behavior of vegetated areas. The output of the labeling process was 2498 labels, of which 2014 labels were used for training and 484 labels were used for testing. Since the TTs are difficult to observe in satellite images, the shadows of the TTs were combined with their body structure and labeled as one unit to provide more information for the CNN.

B. TRAINING
After the labeling process, the RetinaNet model was trained with ResNet-152 as the CNN backbone. The ResNet-152 was pretrained on the COCO dataset [57] to reduce the initial loss value and the training time. The training process was performed using the Fizyr Keras Python framework [58], [59]. The total training time was about 18 hours and 41 minutes. The number of epochs was 200, and the minimum loss was obtained at epoch 148. A graphics processing unit (GPU) was used to accelerate the training process. Table 1 shows the main specifications and configurations used in the training process, and Figure 4 shows the classification loss, regression loss, and total loss per epoch. The derivation of the RetinaNet loss can be described by the following equations [32], [60]:

$$\mathrm{CE}(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases} \tag{1}$$

where CE is the cross-entropy loss, y is the class output, and p represents the probability of the class, where p ∈ [0, 1]. The output class y is 1 for every positive sample. Equation (1) can be rewritten using the notation in (2):

$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases} \tag{2}$$

where $\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t)$.
The final form of the RetinaNet classification loss is the focal loss (FL), as described in equation (3):

$$\mathrm{FL}(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t) \tag{3}$$

where α is a weight factor used to balance the output classes, and γ is a focusing parameter with γ ≥ 0. On the other hand, the regression loss can be calculated as follows:

$$L_{reg}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(t_i^u - v_i)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

where the $\mathrm{smooth}_{L1}(x)$ function is used to reduce the effect of outliers, $L_{reg}$ is the regression loss, $t^u$ represents the predicted anchor, $v$ is the required anchor value, and {x, y, w, h} represents the anchor coordinates. The final loss L can be written as follows:

$$L = L_{cls} + \lambda L_{reg}$$

where $L_{cls}$ is the classification loss given by the focal loss in (3) and λ is the control parameter of the loss balance.
The overall power line monitoring method goes through several steps, as shown in Figure 5. First, the input image is reduced to an appropriate size; in this study, all the training and testing samples were cropped to at most 1300 × 1300 pixels. In the second step, the trained RetinaNet model is used to predict the locations of the TTs. After the detection process, a path creation algorithm is used to create a path between every pair of adjacent detected TTs. Before the monitoring step, a corridor extraction algorithm is used to extract the power line corridor area from the background image.
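To make the loss terms of this section concrete, the following is a minimal per-anchor sketch of the focal loss and the smooth L1 regression loss in plain Python. It is illustrative only and is not the Fizyr Keras implementation. The default values α = 0.25 and γ = 2 are those reported by Lin et al. [32] and are an assumption here, as is the class-dependent weighting (α for positive samples, 1 - α for negative samples) and the clamping constant `eps`.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one anchor; p is the predicted probability of y = 1."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight (assumption)
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(x):
    """Quadratic near zero, linear elsewhere, which limits outlier influence."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def regression_loss(pred, target):
    """Sum of smooth L1 terms over the (x, y, w, h) box parameters."""
    return sum(smooth_l1(t - v) for t, v in zip(pred, target))


def total_loss(cls_loss, reg_loss, lam=1.0):
    """Classification loss plus the weighted regression loss."""
    return cls_loss + lam * reg_loss
```

With γ = 0 the focal term reduces to a weighted cross-entropy, and increasing γ lowers the contribution of well-classified anchors (p_t close to 1).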

C. TRANSMISSION TOWER ROUTING AND CORRIDOR EXTRACTION
The path of the power line can be predicted from the locations of the TTs, especially when the TTs form a straight line. The purpose of identifying the power line path is to create a virtual path that describes the trajectory of the power line, as illustrated in Figure 6. This process aims to create a visual inspection mechanism that helps to observe whether the vegetation encroachment poses a risk to the power lines. Algorithm 1 provides the pseudocode of the TT routing process, which automatically draws a line through all the TTs in the image in order to establish the path of the transmission line. The algorithm starts with the detection process in the while loop (lines 1-4) to produce the bounding box (BB) coordinates, where box[b0, b1, b2, b3] is the coordinate list. The center of each BB is computed by finding its middle point (line 2). Since RetinaNet returns the detected TTs in no particular spatial order due to the FPN effect, the centers of all BBs are then sorted in ascending order according to their locations in the image (line 5), starting from location (0, 0) to location (1300, 1300). In the last part, the algorithm draws a line between every pair of adjacent centers, draw_line((centers[i]), (centers[i + 1])) (line 10). Therefore, the final path between all TTs in the image starts from the lowest-ranked location and ends at the highest-ranked location in the coordinate list. However, if the TTs are close to each other or a TT is wrongly classified, then the created path will be affected, and a wrong connection will be generated.
LISTING 2. Corridor detection algorithm, which detects the power line corridor area by establishing two lines parallel to the routing lines and then drawing a closed surface that isolates the corridor area from the background image. The distance between the center of the TT and the parallel lines is m, where m can be determined by the user based on the surrounding environment.
The purpose of the corridor extraction process is to confine the monitoring process within the corridor area. The power line corridor detection process is vital to power line safety. For example, power line corridor detection can be used to inspect the integrity of the power lines [8].
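The TT routing step of Algorithm 1 can be sketched in a few lines of Python. This is a hedged illustration rather than the original listing: the box layout `[x_min, y_min, x_max, y_max]`, the helper names `box_center` and `route_towers`, and the sort key (ascending x, then y) are assumptions, since the text only states that the centers are sorted in ascending order of their image locations.

```python
def box_center(box):
    """Middle point of a box given as [x_min, y_min, x_max, y_max] (assumed layout)."""
    b0, b1, b2, b3 = box
    return ((b0 + b2) / 2.0, (b1 + b3) / 2.0)


def route_towers(boxes):
    """Return the sorted tower centers and the segments joining neighbors."""
    # Detections arrive in arbitrary order, so sort the centers first
    # (tuples sort by x, then by y).
    centers = sorted(box_center(b) for b in boxes)
    # One segment per pair of adjacent centers in the sorted list.
    segments = [(centers[i], centers[i + 1]) for i in range(len(centers) - 1)]
    return centers, segments
```

Each returned segment could then be passed to a drawing routine to overlay the path on the image. As noted in the text, a false detection or two very close towers would insert a wrong segment into this list.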
Algorithm 2 is the pseudocode of the developed corridor extraction algorithm. The algorithm starts by getting all the center points of the TTs (lines 1-3). Then, for every pair of adjacent detected TTs in the image, the algorithm extracts the center coordinates of the pair, (x1, y1) and (x2, y2), and draws two lines parallel to the routing path (lines 4-11), as shown in Figure 7. This makes the corridor detection process dynamic, so that the corridor extraction algorithm can follow any sudden turns in the corridor path. The parallel lines represent the corridor border, where m is the distance between the center of the routing path and the surrounding environment. The value of m is governed by the type of the surrounding environment and can be predefined by the user.
In order to find the closed corridor area, the algorithm draws two parallel lines (lines 12-17) to create a closed surface, as shown in Figure 7. After determining the corridor area, the algorithm stores all the closed surface points (lines 18-21), sorts the points in a clockwise manner, and draws a contoured surface around the corridor area (lines 23-26).
This method is valid for all positions except when the center points are aligned at 180°. In that case, the image should be rotated by any positive or negative angle to create an offset along the x axis.
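The idea of bounding the corridor with two lines parallel to the routing path can be illustrated with the sketch below. It is not the authors' Algorithm 2: it offsets every segment along its unit normal by the half-width m, which is a swapped-in construction chosen because it is short and handles any orientation. The function name `corridor_polygon` and the simple handling of turns (offset segments are joined without trimming) are assumptions.

```python
import math


def corridor_polygon(centers, m):
    """Closed corridor outline around a routed path, offset by m on each side."""
    left, right = [], []
    for (x1, y1), (x2, y2) in zip(centers, centers[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue  # skip duplicate detections of the same tower
        # Normal to the segment, scaled to the half-width m.
        nx = -(y2 - y1) / length * m
        ny = (x2 - x1) / length * m
        left += [(x1 + nx, y1 + ny), (x2 + nx, y2 + ny)]
        right += [(x1 - nx, y1 - ny), (x2 - nx, y2 - ny)]
    # Walk one border forward and the other backward to close the surface.
    return left + right[::-1]
```

For a straight two-tower path the result is a rectangle of width 2m around the routing line; the returned point list could be used as a mask to confine the vegetation analysis to the corridor.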

D. VEGETATION MONITORING
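To detect vegetation along the power line corridor, a color clustering-based technique has been used to distinguish between high- and low-density vegetation regions. The monitoring process was applied directly after the detection process. The proposed monitoring technique depends on the properties of the HSV color space, where the HSV channels can differentiate the VE regions. The direct conversion from RGB to HSV color space produces very sharp colors in some regions, as shown in Figure 8. Therefore, the k-mean clustering algorithm was used to reduce the sharp colors by quantizing the colors into only five groups.
Experimentally, by visual inspection, we found that the discrimination of vegetation density is more precise when the number of color clusters is k = 4 or k = 5. The conversion from RGB to HSV can be described by the following equations [63]:

$$R' = \frac{R}{255}, \quad G' = \frac{G}{255}, \quad B' = \frac{B}{255}$$

$$C_{max} = \max(R', G', B'), \quad C_{min} = \min(R', G', B'), \quad \Delta = C_{max} - C_{min}$$

$$H = \begin{cases} 0^{\circ} & \text{if } \Delta = 0 \\ 60^{\circ} \times \left( \frac{G' - B'}{\Delta} \bmod 6 \right) & \text{if } C_{max} = R' \\ 60^{\circ} \times \left( \frac{B' - R'}{\Delta} + 2 \right) & \text{if } C_{max} = G' \\ 60^{\circ} \times \left( \frac{R' - G'}{\Delta} + 4 \right) & \text{if } C_{max} = B' \end{cases}$$

$$S = \begin{cases} 0 & \text{if } C_{max} = 0 \\ \frac{\Delta}{C_{max}} & \text{otherwise} \end{cases}$$

$$V = C_{max}$$

where R', G', and B' represent the rescaled RGB values, which convert the color range from [0, 255] to [0, 1].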
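The RGB-to-HSV conversion can also be expressed as a short function. The sketch below follows the commonly used max/min formulation of the conversion, which is assumed here to correspond to the one cited in [63]; the function name `rgb_to_hsv` and the output convention (hue in degrees, saturation and value in [0, 1]) are choices made for the example.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S in [0, 1], V in [0, 1])."""
    # Rescale the channels from [0, 255] to [0, 1].
    rp, gp, bp = r / 255.0, g / 255.0, b / 255.0
    c_max, c_min = max(rp, gp, bp), min(rp, gp, bp)
    delta = c_max - c_min
    if delta == 0:
        h = 0.0  # gray pixel: hue is undefined, set to 0 by convention
    elif c_max == rp:
        h = 60.0 * (((gp - bp) / delta) % 6)
    elif c_max == gp:
        h = 60.0 * ((bp - rp) / delta + 2)
    else:
        h = 60.0 * ((rp - gp) / delta + 4)
    s = 0.0 if c_max == 0 else delta / c_max
    return h, s, c_max
```

Applying this function to every corridor pixel yields the HSV image whose colors are subsequently grouped by k-mean clustering with k = 4 or k = 5.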

V. RESULTS
This section discusses the evaluation results and the computation used to find the mAP. The evaluation set consists of 171 images that contain 484 TTs. The hardware specifications used in the evaluation process were the same as those used in the training process.

A. mAP
The mean average precision (mAP) is used to evaluate the performance of the detection model, as in equation (16). The mAP can be calculated using equations (12)-(16) [64]-[66]:

$$p = \frac{\text{detected items}}{\text{detected items} + \text{false detected items}} \tag{12}$$

$$r = \frac{\text{detected items}}{\text{detected items} + \text{undetected items}} \tag{13}$$

The relation between the precision p and the recall r can be drawn as a curve, where the recall axis is sampled at 11 equally spaced levels and the area under the curve represents the AP, as in equation (14):

$$AP = \frac{1}{11} \sum_{i \in \{0, 0.1, \ldots, 1\}} P_{interp}(i) \tag{14}$$

$P_{interp}(i)$ is the interpolated precision at recall level i, where the interpolation can be calculated as in equation (15):

$$P_{interp}(i) = \max_{\tilde{i} \ge i} p(\tilde{i}) \tag{15}$$

where $p(\tilde{i})$ is the precision at a given recall level $\tilde{i}$.

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{16}$$

where N is the total number of given samples and i is the sample index.
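The 11-point interpolated AP described by equations (14) and (15) can be sketched as follows. The function names and the input format (parallel lists of recall and precision values measured along the ranked detections) are assumptions of this example; it is not the evaluation code used to produce the reported results.

```python
def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP from matching recall/precision lists."""
    total = 0.0
    for step in range(11):
        level = step / 10.0  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


def mean_average_precision(ap_values):
    """Average of the individual AP values, as in equation (16)."""
    return sum(ap_values) / len(ap_values)
```

For example, a detector that reaches recall 0.5 at precision 1.0 and recall 1.0 at precision 0.5 scores 1.0 at the six recall levels up to 0.5 and 0.5 at the remaining five, giving an AP of 8.5/11.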

B. INTERSECTION OVER UNION
The IoU is the area shared between the ground truth BB and the predicted anchor divided by their total union area, as shown in equation (17), where A is the ground truth area and B is the predicted area [67].

$$IoU = \frac{|A \cap B|}{|A \cup B|} \tag{17}$$

The accuracy of the detection process depends on the IoU threshold applied to the BB.
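Equation (17) can be computed for two axis-aligned boxes as in the sketch below. The box layout `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for the example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clipped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A predicted box is counted as a correct detection when its IoU with the ground truth box reaches the chosen threshold, for example 0.5 or 0.3 as used in the evaluation.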

C. EVALUATION RESULTS
The evaluation of the TT detection model was performed on 171 images that contain 484 TTs, where a mAP of 0.7245 was obtained for IoU ≥ 0.5 and 0.8521 for IoU ≥ 0.3. Table 2 shows the resulting mAP for each IoU threshold. The average inference time was 0.14718 seconds, where the inference time is the time taken by the hardware to derive the result for a new sample.

D. DETECTION RESULTS
After evaluating the performance of the TT detection model, the detection system was tested on several new samples, where the system was able to achieve the following, as shown in Figure 9:
• Identifying the TTs in the image.
• Identifying the transmission line right-of-way.
• Extracting the spatial coordinates of every TT in the image.
The performance of RetinaNet has been compared with other state-of-the-art deep learning models such as YOLO, SSD, and Faster R-CNN, as shown in Table 3. All the models were pretrained on the COCO standard dataset and fine-tuned using the custom TT dataset.
However, DL accelerators such as GPUs can generate hardware errors that propagate to the software calculations and affect the accuracy of the model [68].

E. MONITORING RESULTS
The proposed monitoring system has been tested on several images that contain a power line corridor crossing area. The k-mean clustering of the HSV conversion showed a good ability to discriminate between high- and low-density vegetation regions, as shown in Figure 10.

VI. CONCLUSION AND FUTURE WORK
In this paper, we introduced a new method of monitoring the power line corridor through satellite images using RetinaNet and k-mean clustering algorithms.
The proposed monitoring method can automatically detect TTs with a mAP of 0.7245 for IoU ≥0.5.
In addition, the monitoring system is able to track the power line, as a routing algorithm has been developed to route all the TTs in the image. Besides the routing algorithm, an automatic corridor extraction algorithm has been developed to extract the region of interest. We have also successfully shown that the monitoring of the VE near the TTs can be carried out by discriminating between high- and low-density vegetation areas using the k-mean clustering algorithm on the HSV conversion of the satellite image.
This study has some limitations that can be addressed in future work. The suggested improvements are listed below.
• Improve the mAP by selecting the most relevant samples for training, which can also reduce the false routing caused by false and missed detections.
• Improve the routing algorithm to enable parallel line detection.
• Estimate the vegetation height, which can provide a better monitoring solution.
• Improve the monitoring method to enable monitoring in panchromatic satellite images.