Visual Vocabulary Based Photovoltaic Health Monitoring System Using Infrared Thermography

Photovoltaic (PV) systems have gained global acceptance as green, replenishable energy resources that meet energy demand with no emissions. However, PV systems are susceptible to operational and environmental stresses. Moreover, because they lack supervisory control, PV panels must be monitored to keep their performance and efficiency intact. Therefore, this study uses infrared thermography to classify PV panels by health into three sub-classes: healthy, hotspot, and faulty. First, thermograph key points are selected using an $8\times 8$ uniform pixel grid, and speeded-up robust features (SURF) are extracted at the grid intersection points. Afterward, the k-means clustering algorithm, chosen for its simplicity, transforms the features into visual words: it creates single-level clusters from the actual observations such that observations within a cluster are as similar as possible and as dissimilar as possible from observations in other clusters. Finally, shallow classifiers are utilized because of their low training time and high prediction speed. After extensive testing and comprehensive analysis, the proposed approach was found to be economical and fast, and it achieved a high testing accuracy of 97% with a multi-class shallow classifier (support vector machine) at low computational complexity and small storage size. Thus, this approach can monitor megawatt-scale PV systems with high accuracy, keeping performance and emissions mitigation potential high while lowering payback time.


I. INTRODUCTION
Energy is an essential entity in the modern era. However, its major production technologies emit gases such as CO2, NOx, and water vapor, known as greenhouse gases (GHG), which are responsible for global warming and climate change. Therefore, green renewable energy resources such as solar photovoltaic (PV) systems and wind energy systems have gained wider acceptance across the globe as the next energy security resources, owing to energy needs and climatic concerns. Moreover, fossil fuels are limited and depleting fast, whereas green renewable energy resources such as PV systems have a global potential of 1500-50,000 EJ per year [1]-[4]. The energy sector has become the leading contributor to global climate change, with 60% of global GHG emissions [5]. Hence, research has focused on green renewable energy resources in multiple aspects.
The associate editor coordinating the review of this manuscript and approving it for publication was Diego Oliva.
However, despite the tremendous green energy potential offered by the PV system, its output varies with multiple parameters such as solar radiation, orientation, geographical location, and temperature [4], [6]. In addition, because the PV system operates without supervision (unlike fossil fuel plants) [7], its performance is susceptible to operational and environmental stresses such as arcs, line-to-line faults, short circuits, open circuits, glass breakage, interconnection faults, bird droppings, dust accumulation, and shadowing, which introduce defects such as hotspots, faults, and mismatches [1], [3], [4], [6], [8]-[10]. More detail on PV system faults is provided in [10]. The literature categorizes faults at three levels, including the cell level [2]. PV cells under stress, for example from bird droppings, operate in the reverse region and offer more resistance to the current of healthy cells. As a result, these cells heat up (Joule effect [11]), and a localized heated point appears, known as a hotspot [2], [4].
Some PV panel issues/defects are identifiable with the naked eye, such as protection glass breakage, cracks, and corrosion [4]. However, for a large-scale PV system, automation is necessary. The literature has utilized multiple approaches such as thermography, electrical signals, electroluminescence, and photoluminescence, combined with different machine learning and deep neural network-based classification approaches, to classify and differentiate PV systems based on health, defects, stresses, etc. [2], [4], [9], [12]-[14]. Electroluminescence images are high resolution, but the current injection can damage PV cells. The electrical signal-based approach necessitates sensors, increasing cost and complexity; moreover, although the V-I characteristic curve indicates an anomaly, locating the source of the fault in the string is difficult.
In contrast, thermography is a non-invasive approach, and thermographs reveal the abnormal behavior of PV panels suffering from defects [15], [16]. In addition, the thermographic approach is reliable, fast, economical, and has vast outdoor applications [4], [16]. Thermographs of PV panels are shown in Fig 2. More detail on the PV system's fault diagnosis approaches is available in [10].

A. PV SYSTEM FAULT DIAGNOSIS-LITERATURE REVIEW
The literature has extensively analyzed the impact of PV system defects/faults on energy production. For instance, a 1 megawatt PV system with 20% defective panels loses approx. 306,081.5 kWh of energy potential per annum if the defective panels are not identified and treated in time. According to a theoretical study [3], producing this lost energy using the fuel mix in Pakistan releases approx. 151.12 tons of CO2 equivalent into the atmosphere. Therefore, the literature has proposed multiple approaches (including machine learning and deep learning) for PV system health monitoring and fault diagnosis.
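The two figures quoted above imply an average emission factor for the replacement fuel mix, which can be back-calculated as a quick sanity check. The short sketch below uses only those two numbers; the emission factor it prints is derived here rather than reported in the source, and the variable names are illustrative.

```python
# Figures quoted in the paragraph above.
lost_energy_kwh = 306_081.5       # annual energy lost: 1 MW plant, 20% defective panels
emissions_tonnes_co2e = 151.12    # CO2 equivalent released to replace that energy

# Implied emission factor of the replacement fuel mix (derived value, an
# illustration only; it is not stated in the source).
implied_factor_kg_per_kwh = emissions_tonnes_co2e * 1000.0 / lost_energy_kwh

print(f"Implied emission factor: {implied_factor_kg_per_kwh:.3f} kg CO2e/kWh")
```

A factor of roughly half a kilogram of CO2 equivalent per kWh is what ties the energy loss to the emissions figure.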
The authors of [1] identified PV module mismatch faults through thermographs, which can be incorporated with a maximum power point tracker to adjust to a new operating point. Li et al. [17] proposed a diagnostic model for six PV module defects using convolutional neural network-based deep learning and unmanned aerial vehicles on a large-scale PV farm, with high accuracy. Ali et al. [18] used PV panel thermographs and color image descriptors as features to train a machine learning algorithm (k-nearest neighbor) to classify panels into three health-based classes.
Ali et al. [2] extracted thermograph texture features, the mean of the histogram of oriented gradients (HOG), local binary pattern (LBP), and red-green-blue (RGB) color features to classify PV panels based on health using shallow classifiers. Niazi et al. [12] also extracted thermograph texture features, applied principal component analysis (PCA) to the HOG features to reduce dimensionality, and classified PV panels based on health using shallow classifiers (machine learning approach). Finally, Ahmed et al. [4] used deep convolutional neural networks (isolated, transfer-learned trained isolated, and transfer learned pre-trained) and classified PV panels based on health and defects.
Deng et al. [15] proposed a subtraction network with a voting strategy to differentiate abnormal panels from normal panels having abnormal panels' features. Turhal et al. [13] proposed a common vector approach for fault detection. The authors of [6] proposed a computer vision approach to detect soil and dust on PV panel surfaces based on gray-level co-occurrence matrix feature extraction. Zyout et al. [8] proposed a deep convolutional neural network-based approach to characterize the surfaces of PV panels and identify defects.
Ebner et al. [14] used thermographs, electroluminescence, and photoluminescence (PL) approaches to find production failures/defects in PV cells and modules, while the authors of [7] proposed simple linear iterative clustering, a super-pixel technique, for PV system hotspot detection using thermal imaging. Jaffery et al. [16] proposed early PV panel fault diagnosis through thermograph-based detection, with fault classification through fuzzy logic. Artificial intelligence-based PV system health monitoring and fault diagnosis approaches using thermographs are summarized in Table 1. More details on image features are provided in [19].
Considering the limitations of the literature, such as manual feature extraction (texture, HOG, RGB, LBP, etc.), the introduction of new datasets with abnormalities (rotation, visuals, memory, etc.), and the GPU, storage, architectural complexity, and, above all, time requirements of deep learning approaches, a fast automatic PV monitoring approach with fewer limitations is needed to classify the thermographic images of PV panels with high accuracy.
In short, given growing environmental concerns about energy production through conventional technologies with limited and fast-depleting resources, the world has focused on alternative energy resources such as PV systems due to their cleanliness, tremendous potential, and availability for millions of years to come. However, due to operational and environmental stresses, the PV system suffers from low performance, which reduces its GHG mitigation potential and lengthens its payback time. Moreover, considering the limitations of the machine and deep learning approaches, this study employs infrared thermography to categorize PV panels into three sub-classes based on health: healthy (working fine), hotspot (arising temporarily due to bird droppings, etc.), and inoperable (faulty). Timely identification allows a panel suffering from a hotspot to be cleaned and returned to healthy status, and allows faulty panels to be replaced on time. Infrared thermograph features are extracted via speeded-up robust features (SURF), which, unlike HOG and LBP, can also capture rotation information. Furthermore, the feature vector size is reduced by rejecting redundant features, which decreases the memory requirement and increases the training and testing speed of the classifier.
Moreover, all thermograph features are grouped using the k-means clustering algorithm, chosen for its simplicity and its dependence on actual observations, to achieve high accuracy and fast response.
Moreover, this approach can be extended to other domains, such as medical images/radiographs (for Alzheimer's, tuberculosis, etc.), to perform feature-based classification into conditions such as none, mild, and strong (demented) with high accuracy and fast response. Feature-based classification is already used in the medical field to detect diseases with higher accuracy and to reduce individual analysis, saving time and effort. More detail on feature-based disease diagnosis is provided in [24].
The rest of the study is organized as follows: Section II focuses on the approach, Section III on the feature extraction mechanism, Section IV on the results, Section V on the discussion, and finally Section VI on the conclusion and future prospects.

II. MATERIAL AND METHODOLOGY
Infrared thermographs were obtained from a 42.24 kW PV system (240 W panels, 22 panels per string, eight strings: 240 × 22 × 8) located in Lahore, Pakistan, using a FLIR VUE-Pro 640 with an 8-bit thermal image depth and a 640 × 512 pixel spatial resolution. Thermographs were captured at a temperature of 32-40 °C, a wind speed of 6.9 m/s, and an irradiance level of 700 W/m2. Detail of the experimental setup is provided in [12]. A total of 315 infrared thermographs are used in this study, segregated into three classes based on health: healthy, hotspot, and faulty PV panel thermographs. Table 2 presents the in-depth detail of PV panels based on health, while a simplified view of the proposed approach is presented in Fig 3. The dataset was randomly split into training and testing datasets with an 80:20 ratio, keeping an equal proportion of each class for proper training and testing. Moreover, shallow classifiers were used due to their fast training and validation response [2], [4]. In addition, a 5-fold cross-validation approach was utilized to avoid over-fitting.
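The split described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the per-class counts are hypothetical placeholders (the real ones are in Table 2), and the helper names `stratified_split` and `k_folds` are introduced here for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_folds(indices, k=5, seed=0):
    """Shuffle the training indices and deal them into k validation folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Array rating stated in the text: 240 W x 22 panels x 8 strings.
array_power_w = 240 * 22 * 8

# Hypothetical class sizes summing to 315; the real counts are in Table 2.
labels = ["healthy"] * 105 + ["hotspot"] * 105 + ["faulty"] * 105
train_idx, test_idx = stratified_split(labels)
folds = k_folds(train_idx, k=5)
print(array_power_w, len(train_idx), len(test_idx), [len(f) for f in folds])
```

Splitting per class rather than over the pooled dataset is what keeps the class proportions equal in the training and testing sets.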

III. FEATURES EXTRACTION
In the literature, thermograph features are extracted to train and test (shallow and deep) classifiers using either a manual approach or deep neural networks [2], [4], [12], each of which has its own advantages and disadvantages [19]. In this study, a visual vocabulary is created from the thermographs by extracting feature descriptors from each health-based class of thermographs; finally, a bag of features (feature histograms) is formed. More detail is provided in [26].
A uniform 8×8 pixel grid is defined for point selection, where the grid junctions define the locations for feature extraction. Varying the grid size changes the size of the feature set. Moreover, a four-element vector [32, 64, 96, 128] is used, in which each element is the size of a square block from which the upright speeded-up robust features (SURF) descriptors are extracted. SURF has the advantage that it can also capture rotation information if required (a limitation of LBP, HOG, and similar features). More detail on SURF is provided in [27]. Using this information, the features from the training dataset are extracted.
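To make the grid-based point selection concrete, the sketch below enumerates grid junctions for a full 640 × 512 frame with an 8-pixel step and counts the descriptors obtained when four block sizes are used at every junction. It rests on assumptions: the grid is taken to start at pixel (0, 0), whereas the authors' tool may place points differently or drop points near the border, and `grid_points` is a hypothetical helper name.

```python
def grid_points(width, height, step=8):
    """Return (x, y) pixel coordinates at the junctions of a uniform grid."""
    return [(x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)]


n_block_sizes = 4  # one descriptor per block size at every junction
points = grid_points(640, 512, step=8)
descriptors_per_image = len(points) * n_block_sizes

print(len(points), descriptors_per_image)
```

Halving the step roughly quadruples the number of junctions, which is why the grid step drives both descriptor count and clustering time.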
Afterward, the most redundant 20% of features are removed from each class of the dataset, reducing the feature vector to the 80% strongest points, which lessens the memory requirement and reduces the execution time. (The features of all classes are balanced to the class with the fewest strong features to improve clustering.) Finally, the k-means clustering algorithm is used for feature segmentation because of its simplicity and because it operates on the actual observations rather than on the dissimilarity between every pair of observations in the data, creating a single level of clusters. In addition, it has the advantage that observations within each cluster are as close to each other as possible and as far from observations in other clusters as possible. Using k-means clustering, a 500-visual-word feature histogram is created.
More detail on k-means clustering is provided in [28], [29]. The transformation is illustrated in Fig 4; further explanation of the transformation can be found in [25].
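The selection and balancing step can be illustrated with a toy example. This is a sketch under stated assumptions: each feature is modelled as a `(strength, descriptor)` pair, where "strength" is a hypothetical score standing in for whatever metric the feature extractor reports, and `select_balanced_strongest` is a name introduced here.

```python
def select_balanced_strongest(features_by_class, keep_fraction=0.8):
    """Keep the strongest fraction per class, then trim every class to the
    size of the smallest retained set so all classes contribute equally."""
    kept = {}
    for label, feats in features_by_class.items():
        # Each feature is a (strength, descriptor) pair; strongest first.
        ranked = sorted(feats, key=lambda f: f[0], reverse=True)
        kept[label] = ranked[: int(len(ranked) * keep_fraction)]
    n_min = min(len(v) for v in kept.values())
    return {label: v[:n_min] for label, v in kept.items()}


# Toy input: strengths are made-up scores, descriptors are placeholders.
toy = {
    "healthy": [(float(s), [s]) for s in range(10)],
    "hotspot": [(float(s), [s]) for s in range(20)],
    "faulty": [(float(s), [s]) for s in range(15)],
}
balanced = select_balanced_strongest(toy)
print({label: len(v) for label, v in balanced.items()})
```

In the toy data the smallest class keeps 8 of its 10 features, so the two larger classes are trimmed to 8 as well.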
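The transformation from descriptors to visual words can be sketched with a plain k-means (Lloyd's algorithm) followed by nearest-centre counting. This is a minimal illustration, not the authors' implementation: it uses 2-D toy descriptors and a 2-word vocabulary instead of SURF descriptors and 500 words, and all function names are introduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the cluster centre (visual word) closest to a descriptor."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centres."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster goes empty
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers


def encode(descriptors, centers):
    """Histogram of visual-word occurrences for one image."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1
    return hist


# Toy 2-D descriptors drawn from two well-separated blobs.
rng = random.Random(1)
blob_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
blob_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(50)]

vocabulary = kmeans(blob_a + blob_b, k=2)
histogram = encode(blob_a[:10] + blob_b[:30], vocabulary)
print(histogram)
```

The histogram, rather than the raw descriptors, is what a shallow classifier would then be trained on, so every image is represented by a vector whose length equals the vocabulary size.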

IV. RESULTS
For the visual vocabulary, 3,167,829 features were extracted from the training dataset. The strongest 80% of features were retained, and each class was balanced to the count of the health-based class with the fewest strong features (782,262), giving feature vectors with a total of 2,346,786 features.
Afterward, k-means clustering was used to create a 500-word visual vocabulary; clustering took approx. 2 minutes. A reduced representation of the thermographs, in the form of histograms of visual word occurrences for the healthy, hotspot, and faulty classes, is provided in Table 3. The visual word occurrence frequencies differ between the healthy, hotspot, and faulty classes, and this difference is what the proposed approach uses for classification.
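The totals in this paragraph are consistent with every class being trimmed to the smallest class's count of strong features. Assuming the 782,262 figure is the per-class count after selection, the arithmetic works out as:

$$N_{\text{total}} = 3 \times 782{,}262 = 2{,}346{,}786, \qquad \frac{2{,}346{,}786}{3{,}167{,}829} \approx 0.741$$

so roughly 74% of the extracted features enter the clustering step, somewhat fewer than 80% because the larger classes are trimmed further to match the smallest one.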
Results were processed in MATLAB 2021a on a 7th-generation Core i7 with 16 GB RAM, an NVIDIA GeForce GTX 1060, a 1.5 TB SSD, and a 64-bit operating system. The performance of various shallow classifiers (tree, linear discriminant, Naïve Bayes, SVM, and KNN) trained using the visual words approach is provided in Table 4. True positive rate (TPR), positive predictive value (PPV), false discovery rate (FDR), and false-negative rate (FNR) are used as performance metrics.
After extensive training and testing, the multi-class SVM shows the highest training accuracy, as illustrated in Table 4. To further validate the testing accuracy of the proposed approach, a new image dataset is fed to the trained SVM model to check its performance in online testing; the online testing results are provided in Table 5.
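For reference, the four per-class metrics named above follow directly from a confusion matrix. The sketch below uses a hypothetical 3 × 3 matrix chosen only for illustration; its counts are not the results reported in Table 4 or Table 5, and `per_class_metrics` is a name introduced here.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = number of class-i samples predicted as class j."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                 # class-i samples missed
        fp = sum(row[i] for row in confusion) - tp  # other samples called i
        tpr = tp / (tp + fn)
        ppv = tp / (tp + fp)
        metrics[label] = {"TPR": tpr, "FNR": 1 - tpr,
                          "PPV": ppv, "FDR": 1 - ppv}
    return metrics


labels = ["healthy", "hotspot", "faulty"]
# Hypothetical counts for illustration only.
confusion = [[20, 1, 0],
             [1, 19, 1],
             [0, 0, 21]]

metrics = per_class_metrics(confusion, labels)
accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))
for label in labels:
    print(label, {name: round(v, 3) for name, v in metrics[label].items()})
print("accuracy", round(accuracy, 3))
```

TPR and FNR sum to one along each row (true class), while PPV and FDR sum to one along each column (predicted class).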

V. DISCUSSION
The importance of PV panels as a green, replenishable energy resource and an effective tool for mitigating GHG emissions in the context of global climate change has increased the global penetration of PV systems in the (fossil fuel-dominant) main grid. However, stresses such as environmental (bird droppings, shading, snow, dust, etc.), operational (DC side, MPPT, etc.), aging, and those from other phases (installation, transportation, manufacturing, etc.) introduce defects and failures, causing premature aging and early retirement of PV panels. This directly impacts the PV system performance, energy potential, GHG mitigation potential, and payback time. Therefore, monitoring PV panels is essential. Moreover, many stresses on PV panels are not identifiable with the naked eye. Therefore, an economical, fast, and accurate approach is preferable, and the literature has focused on different approaches.
In the literature, health-based classification of PV panels includes different machine and deep learning-based approaches, but these approaches have associated limitations. Studies classifying PV panels based on health into three categories (healthy, hotspot, and faulty) are listed with their accuracy in Table 6. Ahmed et al. [4] used an isolated deep convolutional neural network for health-based classification and attained 96% accuracy but required several minutes of model training and validation.
In contrast, Niazi et al. [12] and Ali et al. [2] used machine learning approaches based on texture, HOG, and LBP features and achieved 94.1% and 92% accuracy, respectively, but these features fail for low-visibility images, rotated images, inappropriate cell sizes, etc. Another machine learning approach, presented by Ali et al. [18], used the rgSIFT descriptor and achieved 98.66% accuracy. The authors noted the limitation that it requires an optimal non-overlapping image division size of 71 × 71 pixels; a single deflection to the left or right reduces the system accuracy.
The proposed approach transforms thermographs into visual words/features. Image features are extracted through SURF on a uniform grid with an 8 × 8 step and transformed using the k-means clustering algorithm; the details of the feature extraction are provided in Section III. Changing the grid step is a trade-off between the number of feature descriptors and the clustering time. At the same time, keeping the 80% strongest features increases classification accuracy by reducing unwanted dimensionality and lowers transformation time; further reduction in features impacts the accuracy of the classifier. The proposed approach resulted in 97.6% training accuracy and 97% testing accuracy on the new dataset used for online testing of the trained multi-class SVM model (see Tables 4 and 5). Furthermore, the proposed approach is fast, requires only modest memory storage and no high-end system, and is highly accurate and efficient in classifying PV panels based on health into the sub-classes healthy, hotspot, and faulty. Therefore, the proposed approach can be easily employed to monitor high-power PV plants.

VI. CONCLUSION
The PV system offers tremendous potential and is green in nature, but it operates outdoors without supervisory control (unlike fossil fuel-based systems) and is therefore susceptible to multiple environmental and operational stresses. Therefore, timely monitoring of PV system health avoids capital energy output loss while maximizing GHG emissions mitigation potential. This study transformed PV panel infrared thermographs into visual words through SURF features and the k-means clustering algorithm. The proposed scheme classified PV panels into three health-based sub-classes: healthy, hotspot, and faulty. After extensive training and validation, the SVM model trained on the extracted features was found to have the highest training accuracy, 97.6%, in classifying PV panels into their respective classes based on their health condition. Furthermore, the proposed model showed similar accuracy (97%) on a new dataset that was not used for the model's training. Therefore, the proposed approach can be utilized for the field monitoring of large PV plants.

(Waqas Ahmed and Muhammad Umair Ali are co-first authors.)
SHAIK JAVEED HUSSAIN (Senior Member, IEEE) received the B.Tech., M.Tech., and Ph.D. degrees in electronics and communication engineering from Jawaharlal Nehru Technological University Hyderabad. He has worked on a brain-computer interfacing project funded by CSRI with the MITS Research Laboratory, Department of Science and Technology. He is working on a project on robotics in health care systems funded by TRC, Oman. He is currently working as the Head of the Electrical and Electronics Department, Global College of Engineering and Technology, Muscat, Oman. He is a reviewer and a technical committee member of several IEEE journals, conferences, and other reputed journals. His research interests include machine vision, robotics, human-computer interface, medical imaging, applied machine learning, and renewable energy.
AMAD ZAFAR received the Ph.D. degree in intelligent control and automation from Pusan National University, Busan, South Korea, in 2019. He is currently working as an Associate Professor with the Department of Electrical Engineering, Ibadat International University, Islamabad, Pakistan. He has more than 12 years of experience in research and academia in the field of electrical and brain engineering. He has authored more than 45 peer-reviewed journal articles, conference papers, and book chapters. He has taught numerous courses in the field of electrical and brain engineering. His research interests include modeling, estimation, prediction, machine learning, optimization, and brain-computer interface.
SULAIMAN AL HASANI received the B.Eng. degree (Hons.) in electrical and computer systems engineering and the Ph.D. degree in electrical and computer systems engineering with the main focus on signal processing and magnetic resonance imaging (MRI) application from Monash University, Melbourne, Australia. He worked on projects with the Monash Centre for Synchrotron Science, Monash Biomedical Imaging, and the University of Glasgow. He worked with the Department of Electrical and Computer Systems Engineering, Monash University, and then with the Global College of Engineering and Technology, Muscat, Oman, as a Lecturer. He is currently working with the Global College of Engineering and Technology as the Deputy Dean (A&R). His research interests include signal processing, image processing, compressive sensing, deep learning, medical imaging, and biomedical signal acquisition and reconstruction.