Panel Segmentation: A Python Package for Automated Solar Array Metadata Extraction Using Satellite Imagery

The National Renewable Energy Laboratory (NREL) Python panel-segmentation package is a toolkit that automates the process of extracting accurate and valuable metadata related to solar array installations, using publicly available Google Maps satellite imagery. Previously published work includes automated azimuth estimation for individual solar installations in satellite images [1]. Our continued research focuses on automated detection and classification of solar installation mounting configuration (tracking or fixed-tilt; rooftop, ground, or carport). Specifically, a faster-region-based convolutional neural network Resnet-50 feature pyramid network model was trained and validated on 862 manually labeled satellite images. This model was used to perform object detection on satellite imagery, locating and classifying individual solar installations' mounting configuration and type. Model results showed a mean average precision score of 77.79%, with the model strongest at detecting fixed-tilt ground mount and fixed-tilt carport installations. The object detection model and its outputs have been incorporated into the panel-segmentation package's automated metadata extraction pipeline, which returns the mounting configuration and azimuth for individual solar arrays in satellite imagery [2]. The complete image dataset with labels has been released on the U.S. Department of Energy (DOE) DuraMAT DataHub, to encourage further research in this area [3].


I. INTRODUCTION
THE United States solar industry has expanded rapidly over the past several years, with 23.6 GW of solar installed in the United States in 2021 alone, a 19% increase from 2020 [4]. This rise in investment has consequently led to a rise in solar industry acquisitions. According to a report released by Mercom Capital Group, over 34 corporate solar acquisitions were recorded in Q2 of 2021, an increase from 13 acquisitions in Q2 of 2020 [5]. During acquisitions, keeping accurate, timely solar metadata, which includes installation tilt, azimuth, and mounting configuration, is a particular challenge, as data for solar installations may be lost during transfer between owners. Furthermore, solar fleet owners generally record site information manually, which can lead to transcription errors. Operators may rely on costly site inspections to keep their data accurate and up to date. Accurate metadata is essential for predicting solar yield and degradation. The National Renewable Energy Laboratory (NREL) panel-segmentation package was created to rectify missing or incorrect solar site information by automating metadata extraction using preexisting, readily available Google Maps imagery [1]. Current package functionality includes automated detection and pixel-by-pixel segmentation of solar installations in satellite imagery via a region-based convolutional neural network (R-CNN) architecture. Additional package functionality includes calculating individual solar array azimuth via an unsupervised computer vision (CV) pipeline, which includes connected components clustering, Canny edge detection, and a Hough line transform. This work is further described in [1]. Our previous efforts build upon a body of published research, which focuses on solar array detection in satellite imagery [6], [7], [8], as well as array characterization following detection [9], [10], [11].
In spite of this growing body of research, no known attempts have been made to characterize the mounting configuration of solar arrays via satellite imagery analysis. Correct classification of mounting configuration is important, as previous research has shown that degradation rates may be affected by mounting configuration type [12]. Specifically, Jordan et al. [12] demonstrated that for a location with multiple installations of the same module type but varying mounting configurations, installations on large metal roof sections have higher degradation rates than installations with carport and rack mounting configurations. This occurs because metal roof types are correlated with higher operating temperatures, which can lead to faster degradation rates. Furthermore, knowing if an installation is fixed-tilt or single-axis tracking is paramount for modeling system performance accurately, as single-axis tracking systems have a higher expected energy yield, higher capacity factor, and different daily performance profile than fixed-tilt systems [13].
This research focuses specifically on the detection and classification of mounting type in satellite imagery via object detection, as well as integration of the resulting model into the metadata extraction pipeline. Not only do we present our object detection model results, we also outline the newly updated metadata extraction pipeline, which includes mounting configuration detection. These updates are publicly available via the NREL panel-segmentation package [2].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. Datasets
Satellite images at varying latitude-longitude coordinates were collected using the Google Maps application programming interface (API). The NREL photovoltaic (PV) fleet performance data initiative database, which contains data for over 1200 different solar sites across the United States, was used to build the dataset [14]. This database, which supports the DOE-funded PV fleet initiative, houses site metadata and time series data, which are then aggregated to perform fleet-scale degradation analysis. Available metadata includes site latitude-longitude coordinates, azimuth, tilt, and mounting configuration information (rooftop, carport, or ground mount; tracking or fixed tilt). Although most of these data are NDA-protected, a subset of these systems is publicly available via the DOE-funded open energy data initiative (OEDI).
Satellite images of site locations were generated using the same parameters outlined in [1] (zoom level of 18, 0.596-m/pixel resolution, and 640 × 640 pixel dimensions). A total of 862 individual images from differing locations were collected. Of the 862 total images, 90 did not contain any solar installations. It is important to note that many of the satellite images with arrays contained multiple installations with differing mounting configurations.
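As a rough illustration of the image parameters above, the sketch below assembles a request URL for the Google Maps Static API. The endpoint and the `center`, `zoom`, `size`, `maptype`, and `key` query parameters belong to that public API; the helper name `build_static_map_url` and the sample coordinates are hypothetical, and this is not necessarily how the panel-segmentation package issues its requests.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def build_static_map_url(latitude, longitude, api_key, zoom=18, size_px=640):
    """Return a Static API URL for a square satellite tile (hypothetical helper)."""
    params = {
        "center": f"{latitude},{longitude}",
        "zoom": zoom,
        "size": f"{size_px}x{size_px}",
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


# Sample coordinates and key are placeholders, not values from the dataset.
url = build_static_map_url(39.7407, -105.1686, api_key="YOUR_API_KEY")
```

The URL is only constructed here, not fetched, so the sketch runs without network access or a valid key.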
The satellite image dataset was then labeled using the Python LabelImg object detection package [15]. Using LabelImg, boxes were drawn around solar installations in satellite imagery, and assigned specific mounting classifications. Solar installations were classified into one of four categories as follows.
1) Rooftop fixed-tilt mounts.
2) Carport fixed-tilt mounts.
3) Ground single-axis tracker mounts.
4) Ground fixed-tilt mounts.
Example labeled images for each of these categories are shown in Figs. 1-3.
After labeling, images were split into training, test, and validation sets, with 540 images in the training set, 216 images in the validation set, and 106 images in the test set. To balance the training dataset, which had an unbalanced class distribution, data augmentation was explored on images containing minority classes. The AutoAugment function in PyTorch was used for data augmentation, using the ImageNet policy [16]. The AutoAugment process is further described in [17]. However, no significant model improvements occurred when augmentation was performed on the dataset, so for simplicity only the original satellite imagery was used. Final training class representation was 16%, 22%, 44%, and 18% for the rooftop-fixed, ground-fixed, carport-fixed, and ground-single-axis tracker classes, respectively.
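The split sizes above can be reproduced with a short sketch such as the following. The text does not say how images were assigned to each set, so the seeded random shuffle, the helper name `split_dataset`, and the placeholder image identifiers are all assumptions made for illustration.

```python
import random


def split_dataset(image_ids, n_train=540, n_val=216, n_test=106, seed=0):
    """Shuffle image IDs and cut them into train/validation/test lists."""
    if n_train + n_val + n_test != len(image_ids):
        raise ValueError("split sizes must sum to the number of images")
    shuffled = list(image_ids)
    # A fixed seed keeps the (assumed) random assignment reproducible.
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


image_ids = [f"image_{i:03d}" for i in range(862)]  # placeholder file stems
train_ids, val_ids, test_ids = split_dataset(image_ids)
```

With 862 images this corresponds to roughly 63% training, 25% validation, and 12% test data.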

B. Object Detection Model
For the object detection task, a faster-RCNN Resnet-50 feature pyramid network (FPN) was used as the model backbone. This model, introduced in [18], was selected for a few key reasons. First, it is readily available via the PyTorch package, allowing for easy adoption and training via transfer learning [16]. Second, Resnet-50 FPN has been previously demonstrated to perform well at semantic segmentation tasks on high-resolution satellite imagery when detecting solar installations [19]. Although the goal in [19] is different from ours, the dataset used is similar (it is also satellite imagery with labeled solar arrays). Because the Resnet-50 FPN is an FPN, it is robust at identifying PV arrays at various scales [19]. In addition, the Resnet architecture ensures that the training accuracy is not diminished as convolutional layer size and subsequent model complexity are increased [20].
Transfer learning was used to train the model, with the original model trained on the Microsoft Common Objects in Context (MS-COCO) dataset [21]. This dataset contains approximately 160 K images spanning 80 categories, collected via the web [21]. The transfer learning model was trained and validated on the labeled satellite imagery datasets described in the datasets section above.
Images were preprocessed before model training. Specifically, each image was converted to a PyTorch tensor object, and its associated pixels were normalized.
Model hyperparameters were modified to optimize model performance on the test set. Final model hyperparameters were as follows.
1) Learning rate of 0.005
2) 10 epochs
3) Decay factor of the learning rate, gamma, of 0.1
4) Batch size of 2
5) Momentum of 0.9
6) Weight decay of 0.0005
The Python Detecto package was used as a wrapper for training the object detection model [22]. The Detecto package uses a PyTorch backbone and contains basic functionality for training object detection models using transfer learning, as well as generating model predictions.
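To make the role of the decay factor gamma concrete, the sketch below computes a step-decay learning rate schedule over the 10 training epochs. The text gives the base learning rate and gamma but not how often the decay is applied, so the `step_size` of 3 epochs is purely an assumed value, and `stepped_learning_rate` is a hypothetical helper rather than part of any training library.

```python
def stepped_learning_rate(epoch, base_lr=0.005, gamma=0.1, step_size=3):
    """Learning rate in effect during a zero-indexed epoch under step decay."""
    # The rate is multiplied by gamma once every step_size epochs.
    return base_lr * gamma ** (epoch // step_size)


# step_size=3 is an assumed value; the text gives gamma but not the interval.
schedule = [stepped_learning_rate(epoch) for epoch in range(10)]
```

Under this assumption the learning rate falls by three orders of magnitude between the first and the last epoch.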
Detecto functionality was used to generate predictions on the test dataset, with modifications. Specifically, the acceptance threshold for each class was set to 0.65, so any predictions with a probability score below 0.65 were discarded. The value 0.65 was empirically derived.
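The acceptance threshold amounts to a simple filter on prediction scores, sketched below. The dictionary layout, the helper name `filter_by_score`, and the example predictions are made up for illustration and do not reflect the actual output format of Detecto or the package.

```python
def filter_by_score(predictions, min_score=0.65):
    """Keep predictions whose confidence score is at least min_score."""
    return [p for p in predictions if p["score"] >= min_score]


# Illustrative predictions; labels, boxes, and scores are invented.
predictions = [
    {"label": "ground-fixed", "box": (10, 20, 110, 90), "score": 0.91},
    {"label": "rooftop-fixed", "box": (300, 310, 360, 350), "score": 0.48},
    {"label": "carport-fixed", "box": (400, 100, 520, 160), "score": 0.65},
]
kept = filter_by_score(predictions)
```

Only predictions scoring below the threshold are dropped, so a score of exactly 0.65 is retained here.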
To aid in image visualization, overlapping object detection boxes were combined using nonmaximum suppression (NMS). NMS is a technique for selecting the optimal box in a series of overlapping boxes. Using NMS, lower scoring object detection boxes are iteratively removed when their intersection-over-union (IoU) score with other, higher scoring boxes is above a selected threshold. IoU is the area of overlap divided by the area of union between two boxes. In this case, the IoU is calculated between two prediction boxes. For NMS, an IoU threshold of 0.4 was empirically derived. A preexisting NMS function in the PyTorch TorchVision package was used in the data pipeline, after generating the associated predictions [16]. An example satellite image without NMS and with NMS is shown in Fig. 4. In the non-NMS image, one of the boxes overlaps with the other box; it is removed when NMS is applied.
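The NMS procedure described above can be sketched in plain Python as follows. This is a simplified stand-in for a library routine such as `torchvision.ops.nms`, not the package's own code; it assumes boxes in (xmin, ymin, xmax, ymax) pixel coordinates, ignores class labels, and uses invented example boxes.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.4):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        # Drop a box if it overlaps any already kept, higher scoring box.
        if all(iou(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(idx)
    return keep


# Invented boxes: the first two overlap heavily, the third stands alone.
boxes = [(100, 100, 200, 200), (110, 110, 210, 210), (400, 400, 480, 460)]
scores = [0.92, 0.81, 0.77]
kept_indices = non_max_suppression(boxes, scores)
```

In this example the first two boxes have an IoU of about 0.68, which exceeds the 0.4 threshold, so the lower scoring of the two is suppressed.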

A. Mounting Configuration Model Performance
Performance of object detection models can be measured in a variety of ways, most notably mean average precision (mAP) and average precision (AP) scores.
A few key metrics are used to generate AP scores and overall mAP score. First, the IoU between the predicted bounding box and the ground-truth bounding box must be calculated.
Precision and recall are calculated based on the IoU value, at a specific IoU threshold. For the results presented in this article, a standard IoU threshold of 0.5 is used. Precision is a measurement of the positive predictive value, or the fraction of predictions made by the model that are correct. Recall is a measure of model sensitivity, or the ability of the model to detect true positives. Average precision is defined as the area under the precision-recall curve. In this research, an 11-point interpolation for AP is used, where the area under the precision-recall curve is approximated by the average of the maximum precision values at 11 equally spaced recall levels [23]. 11-point interpolated AP is defined in the following equation [23]:

$$\mathrm{AP}_{11} = \frac{1}{11} \sum_{R \in \{0, 0.1, \ldots, 1\}} P(R)$$

where P(R) is the maximum precision value at each value of R.
After calculating an AP score for all model classes, overall model performance can be evaluated via the mAP score. The mAP is the average of the AP across all classes, and is described via the following equation [23]:

$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i$$

where N is the total number of classes, and AP_i is the average precision of class i.
The calculated mAP scoring method established in [23] was used to benchmark overall model performance, and the AP score method was used to benchmark individual class performance. The associated Python implementation of [23] was used. These results are displayed in Table I, with each class's precision-recall curve shown in Fig. 5. These results indicate the model's strengths and weaknesses on a class-by-class basis. The model is best at detecting carport and fixed-tilt ground mount installations, and weakest at detecting rooftop installations. Poorer model performance when detecting rooftop installations is likely because of the significant variation in what these installations look like, particularly their shape.
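The 11-point interpolated AP and the mAP described in this section can be sketched as follows. This is not the implementation from [23]; it assumes the usual convention that the interpolated precision at a recall level is the maximum precision over all points with recall at or above that level, and the precision-recall points and class names are toy values.

```python
def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP from matched recall/precision points."""
    recall_levels = [i / 10 for i in range(11)]
    total = 0.0
    for level in recall_levels:
        # Interpolated precision: best precision at any recall >= this level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_average_precision(ap_per_class):
    """Average the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy precision-recall points for two hypothetical classes.
ap_scores = {
    "class_a": eleven_point_ap([0.2, 0.4, 0.6, 0.8, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]),
    "class_b": eleven_point_ap([0.5, 1.0], [1.0, 0.5]),
}
map_score = mean_average_precision(ap_scores)
```

For `class_b`, six recall levels (0 through 0.5) see a maximum precision of 1.0 and the remaining five see 0.5, giving an AP of 8.5/11.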
The overall model mAP score is 77.79%. As a point of comparison, published models evaluated on the MS-COCO dataset report mAP scores ranging between 50% and 60%, with the current state-of-the-art measuring at 63.3% [24]. Because solar satellite imagery research has not previously focused on mounting configuration object detection, but rather on general solar array detection, a direct comparison is not possible. Still, the overall mAP score can be compared with previous literature that focuses on general solar array object detection, as a point of reference. He and Zhang [19] achieve maximum mAP scores of 95.66% and 80.31% using Resnet-50 FPN and Segnet architectures, respectively, for general solar array object detection. Although these are higher mAP scores than our model, it is important to note that [19] only has one object detection class, whereas our model has four. Consequently, our model has a more complex object detection task, where it needs to not only identify the solar array, but also classify what type of mounting configuration the array has. This added complexity makes our model subject to higher error.

B. Model Drawbacks
In spite of high overall model performance, results on the test set showed repeated instances of the object detection model

C. Model Pipeline Integration and Dataset Release
Fig. 8 shows the pipeline for extracting solar metadata, available in the panel-segmentation package. The only required inputs for running the pipeline are site latitude-longitude coordinates and a Google Maps API key. Using these inputs, a satellite image at the respective location is taken and saved locally. This satellite image is then used as an input for both the mounting configuration object detection model and the semantic segmentation model, which is described in [1]. The generated object detection model boxes are then used to cluster individual arrays in the semantic segmentation mask output. Each individual array is then run through the azimuth estimation routine, described in [1]. Final pipeline output is a dictionary, containing the mounting configuration and azimuth of each solar array detected in the satellite image.
This newly updated pipeline requires both the object detection model and the semantic segmentation model to identify a solar array in a specific region of a satellite image, ensuring fewer false positives in the metadata extraction pipeline.
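One way to picture the agreement requirement between the two models is the sketch below, which keeps a detection box only when the segmentation mask marks enough pixels inside it. The helper names, the `min_fraction` cutoff of 0.1, and the toy mask are hypothetical; the package's actual clustering logic may differ.

```python
def mask_fraction_in_box(mask, box):
    """Fraction of pixels inside box that the segmentation mask marks as array."""
    xmin, ymin, xmax, ymax = box
    rows = mask[ymin:ymax]
    pixels = [value for row in rows for value in row[xmin:xmax]]
    return sum(pixels) / len(pixels) if pixels else 0.0


def confirmed_detections(mask, detections, min_fraction=0.1):
    """Keep detections whose box overlaps enough segmented pixels."""
    return [
        d for d in detections
        if mask_fraction_in_box(mask, d["box"]) >= min_fraction
    ]


# 6x6 toy mask with a 2x3 block of array pixels in the upper-left region.
mask = [[0] * 6 for _ in range(6)]
for y in range(1, 3):
    for x in range(1, 4):
        mask[y][x] = 1

detections = [
    {"label": "carport-fixed", "box": (0, 0, 4, 4)},  # overlaps the mask
    {"label": "ground-fixed", "box": (4, 4, 6, 6)},   # no mask support
]
confirmed = confirmed_detections(mask, detections)
```

Here the second detection is discarded because the segmentation mask contains no array pixels inside its box, which is the kind of false positive the combined check is meant to suppress.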
To provide additional data transparency, as well as to promote further research in this space, the entire dataset used to train the model is available via the DOE DuraMAT Datahub [3]. All data have been anonymized, with no location metadata provided (including latitude-longitude coordinates). Each satellite image with its associated object detection labels is provided.

IV. CONCLUSION
To supplement current panel-segmentation package functionality, we plan to add functions for calculating array size and expected energy output, ground-coverage ratio (GCR), and terrain classification. We are also investigating tilt estimation via 3-D data sources, such as Google OpenStreetMap and light detection and ranging technologies, using methods adapted from [9]. This research has been leveraged extensively by the NREL PV Fleets project, as many of the data sources provided by our industry fleet partners have metadata gaps. Quantifying the financial impact of these gaps has been difficult, as there is no known financial cost for having incorrect solar metadata. In addition to our current efforts to fill in these data gaps, our group plans to investigate the financial and operational costs of incorrect or incomplete solar fleet metadata, with the intent of drawing more attention to the issue.