Dropout and Pruned Neural Networks for Fault Classification in Photovoltaic Arrays

Automatic detection of solar array faults reduces maintenance costs and increases efficiency. In this paper, we address the problem of fault detection, localization, and classification in utility-scale photovoltaic (PV) arrays using machine learning methods. More specifically, we develop a series of customized neural networks for detection and classification of solar array faults. We evaluate fault detection and classification using metrics such as accuracy, confusion matrices, and the Risk Priority Number (RPN). We examine and assess the use of customized neural networks with dropout regularizers. We develop and evaluate neural network pruning strategies and illustrate the trade-off between fault classification model accuracy and algorithm complexity. Our approach promises to elevate the performance and robustness of PV arrays and compares favorably against existing methods.


I. INTRODUCTION
Faults in utility-scale solar arrays [1]-[4] often lead to increased maintenance costs and reduced efficiency. Since photovoltaic (PV) arrays (Figure 1(a)) are generally installed in remote locations, maintenance and annual repairs due to faults incur large costs and delays. To automatically detect faults, PV arrays can be equipped with smart electronics that provide data for analytics. Smart monitoring devices (SMDs) [5] (Figure 1(b)) with remote monitoring and control capability have been proposed [6] to provide data from each panel and enable detection and localization of faults and shading. The presence of such SMDs renders the solar array a cyber-physical system [7] that can be monitored and controlled in real time with algorithms and software. Figure 1 shows a cyber-physical 18 kW PV testbed described in [8].
Even with the presence of SMDs, fault detection and classification is challenging and requires statistical analysis of PV data. Traditional methods such as Support Vector Machines (SVMs) [4], a decision-tree-based approach [9], and the Minimum Covariance Determinant (MCD) distance metric [6] have been proposed to identify fault conditions in PV arrays. Real-time fault detection in PV systems was studied in [10], wherein a threshold-based approach was developed to identify faulty panels. Another statistical method [11] proposed a 3-sigma rule for detecting faults in PV modules. Methods to detect partial shading in PV systems have been addressed in [12]. An unsupervised monitoring procedure for detecting anomalies in photovoltaic systems using a one-class SVM was shown in [13], and a semi-supervised graph approach for fault detection and classification was proposed in [14]. Although the above methods provide encouraging results, they are based on aggregated data and generally cannot localize faults or distinguish between electrical faults and shading in PV systems. The ability to classify faults accurately and automatically across various PV array connection topologies is still an open problem [15]. (The associate editor coordinating the review of this manuscript and approving it for publication was Giambattista Gruosso.)
While neural networks (NNs) have been used in the past for fault detection and classification tasks [4], [16], choosing the set of hyper-parameters and the type of architecture remains a challenge. Our vision for monitoring and optimizing a large-scale PV array is summarized in Figure 2. As shown in Figure 2, the array can be used to collect data in real time, and the collected data can be used for fault detection and classification studies. Switches with remote access also allow for dynamic topology reconfiguration. In this paper, we use an autoencoder machine learning framework [17] to perform fault detection. An autoencoder learns efficient representations (also called encodings) of the data through unsupervised dimensionality reduction; a decoder can then reconstruct the original input from the learned encoding. This unsupervised machine learning approach can be used to identify faults. We then implement fully connected NNs and dropout NNs [18] trained specifically for fault classification in PV arrays. In our results section, we discuss performance in terms of accuracy, weighted accuracy, and computational complexity for various architectures. To reduce computation and redundancy and to customize the NN, we also perform network pruning using the lottery ticket hypothesis optimization process [19] to design sparse NN architectures, achieving a 2× reduction in the size of the NN. Along with custom hardware that enables monitoring voltage, current, temperature, and irradiance at the module level [20], a custom NN with reduced parameters and high accuracy will be beneficial for the development of compact and specialized hardware for fault classification in PV arrays. Furthermore, we study the faults and their diagnosis from an operations and management perspective, as described in Section II-B, and provide efficient representations of the neural network for deployment on hardware.

A. STATEMENT OF CONTRIBUTIONS
We consider the problem of detection and classification of faults occurring in utility-scale PV array systems. First, we train an autoencoder for fault detection. More specifically, we use our custom features to train a 3-layer autoencoder to detect faults. We use the reconstruction error from the autoencoder to create an error histogram, which is used to identify faults. Next, we train a NN for PV fault classification using dropout and concrete dropout regularizers. We compare NNs against the standard machine learning (ML) classification algorithms described in reference [17], such as the SVM, K-nearest neighbors (KNN), and the random forest classifier (RFC). Additionally, we relate the performance of the classification algorithms to the difficulty of separating the PV data. We perform dimensionality reduction using the state-of-the-art t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm [21] and visualize the fault clusters that are not separable. Our results show that the 2× pruned networks perform better than standard ML classifiers, and that concrete dropout has the best performance among all methods examined.
The rest of the paper is organized as follows. In Section II, we describe the types of faults considered in this paper and the dataset used for this study. Furthermore, we explore the practical perspective by studying faults based on their safety category and the Risk Priority Number (RPN). In Section III, we study fault detection using autoencoders and illustrate the use of an unsupervised machine learning approach for fault detection. In Section IV, we use neural networks for fault classification; more specifically, we explore the use of dropout neural networks to address overfitting and the use of pruned neural networks to optimize fault classification. We present the results from a series of fault detection and classification experiments using these algorithms in Section V and provide our conclusions in Section VI.

II. FAULT CLASSES IN PV ARRAYS
In this section, we review the standard test conditions and the commonly occurring faults, namely shading, degraded modules, soiling, and short circuits. We consider the approach of fault detection and classification by monitoring electrical signals such as the maximum power point tracking (MPPT) parameters, which are discussed in Section II-A1. Standard Test Conditions (STC) correspond to an irradiance of 1000 W/m² and a module temperature of 25 °C, under which the module yields its rated maximum power. A module is shaded if the measured irradiance is considerably lower than STC, usually caused by overcast conditions, cloud cover, or building obstruction. As a result, the power produced by the PV array is significantly reduced. Degraded modules are a result of aging or regular wear and tear of the PV modules. A degraded module affects the entire string of the array, which includes both good and degraded modules, owing to lower values of the open-circuit voltage Voc or the short-circuit current Isc. Since PV modules are exposed to the environment, they get soiled by dust, snow, bird droppings, and other particulate matter accumulating on the module surface. While the measured irradiance remains at STC levels, the power produced drops significantly. The final fault type considered in this paper is the short circuit, which not only causes significant power loss but can also create potential fire hazards and cause severe damage to the modules. We addressed some of these faults using clustering techniques for fault detection in our previous work [20]. However, to improve the efficiency of PV arrays and prevent safety hazards, we need to identify and localize these faults automatically.

A. THE PVWatts DATASET
In this section, we briefly discuss the data used in our experiments. We use the National Renewable Energy Laboratory's (NREL) PVWatts Calculator [22], which estimates the cost and amount of energy produced by grid-connected photovoltaic energy systems worldwide. The dataset available from PVWatts includes 4 commonly occurring faults, as well as the standard test conditions of PV arrays. Faults are classified in terms of the following categories: shaded modules, soiled modules, short-circuited modules, and degraded modules. The data was obtained for a period of one year (January to December 2006) at a sampling interval of one hour. Data points include irradiance, temperature, and maximum power (Pmp) measurements along with a time stamp, amounting to 4000 hours of data per class. In total, we have 20000 data points for all 5 classes.

1) INPUT FEATURES
We consider a set of 9 custom input features: the maximum-power-point voltage (Vmp), maximum-power-point current (Imp), measured irradiance, temperature, fill factor (FF), Voc, Isc, Pmp, and Gamma (γ), the ratio of power to irradiance. These features are derived from the IV-curves of the NREL PVWatts Calculator dataset [22]. To understand the data, we perform t-SNE to visually show that the data has overlapping faults, as shown in Figure 4. This method projects the 9-dimensional input feature matrix into two dimensions by minimizing the Kullback-Leibler divergence between the data distributions in the higher-dimensional and the mapped lower-dimensional spaces [21].
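The t-SNE projection described above can be sketched with scikit-learn; the feature matrix here is a hypothetical random stand-in for the 9-dimensional PVWatts features, and the perplexity value is our choice rather than one reported in the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 9-dimensional feature matrix
# (Vmp, Imp, irradiance, temperature, FF, Voc, Isc, Pmp, gamma).
X = rng.normal(size=(300, 9))

# t-SNE projects to 2-D by minimizing the KL divergence between the
# neighbor distributions in the original and mapped spaces.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_2d.shape)  # (300, 2)
```

The resulting 2-D coordinates can then be scatter-plotted per class to reveal overlapping fault clusters, as in Figure 4.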

2) DATA LABELING
The data points were labelled as belonging to one of the five classes (i.e., standard test conditions (STC), shaded, soiled, short circuit, and degraded) based on the input feature vector, as follows:
1) STC: the measured irradiance was 1000 W/m² or the ambient temperature was approximately 25 °C.
2) Shaded: the irradiance was lower than STC by 25% or more (i.e., lower than 750 W/m²).
3) Soiled: the measured irradiance was as per STC (i.e., 1000 W/m² or 25 °C) but the power output was less than 25% of the power output under STC.
4) Short circuit fault: the irradiance and the temperature were as per STC, but the measured maximum current Imp was less than 25% of the measured maximum current Imp at STC.
5) Degraded module: the measured open-circuit voltage Voc or short-circuit current Isc was lower than the rating of the PV module by 25% or more.
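The labeling rules above can be sketched as a simple threshold function. The reference values (STC power, STC current, rated Voc and Isc) are hypothetical stand-ins, and the order in which the rules are checked is our assumption; the paper does not specify how overlapping conditions are resolved.

```python
# Hypothetical threshold-based labeling of one measurement; the rule
# ordering and reference values are assumptions for illustration.
STC_IRR, STC_TEMP = 1000.0, 25.0  # W/m^2, deg C

def label_point(irr, temp, pmp, imp, voc, isc,
                stc_pmp, stc_imp, rated_voc, rated_isc):
    at_stc = irr >= 0.99 * STC_IRR or abs(temp - STC_TEMP) < 1.0
    if voc < 0.75 * rated_voc or isc < 0.75 * rated_isc:
        return "degraded"       # Voc or Isc 25% below rating
    if irr <= 0.75 * STC_IRR:
        return "shaded"         # irradiance 25% or more below STC
    if at_stc and imp < 0.25 * stc_imp:
        return "short_circuit"  # Imp below 25% of Imp at STC
    if at_stc and pmp < 0.25 * stc_pmp:
        return "soiled"         # power below 25% of STC power
    return "stc"

print(label_point(700, 25, 50, 6, 40, 9, 100, 8, 40, 9))  # shaded
```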

B. OPERATION AND MANAGEMENT OF PV ARRAYS
To provide a practical perspective, we studied the nature of these faults from an operations and management perspective.
Faults in PV arrays can be classified into three safety categories, as shown in Table 1. The faults considered in this paper, including shading, degradation, and soiling, can be considered type A faults, while short circuits are considered a type C (f,e,m) fault. The corresponding risk priority numbers are shown in Table 2. Since faults with a high RPN pose a greater safety threat, as indicated in Table 1, the detection and classification of these faults is critical.
In the next section, we demonstrate the use of an unsupervised machine learning algorithm to detect faults. An autoencoder ML algorithm is used to detect faults based on the histogram reconstruction error.

III. PV FAULT DETECTION USING AUTOENCODERS
We propose the use of an autoencoder for fault detection. An autoencoder is an unsupervised learning algorithm that, in our setting, identifies faults based on reconstruction errors. It consists of an encoder and a decoder; a simple schematic of an encoder can be seen in Figure 5. The encoder maps the input to a lower-dimensional embedded space, also called the latent space, and the decoder maps the latent space back to the original input space. The difference between the original input and the reconstructed output can be used to identify anomalies in the data [28] and hence detect the presence of faults. Our autoencoder consists of an input layer, three hidden layers, and an output layer. The input and output layers have 9 neurons each, the first and third hidden layers have 8 neurons each, and the second hidden layer (the latent space) has 2 neurons. All hidden layers use a sigmoid activation function. The autoencoder was trained for 50 epochs to minimize the mean squared error between the inputs and the reconstructed inputs at the output layer. The nine-dimensional input feature matrix is given as input to the autoencoder. The autoencoder is trained on STC irradiance data, while the fault data is treated as anomalous and is used to test the algorithm. As seen in the error histogram in Figure 6, STC irradiance data have low reconstruction errors while fault data have higher reconstruction errors. Using this method, we can identify anomalous data from observed measurements and hence detect the presence of faults.
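A minimal sketch of this reconstruction-error scheme, using scikit-learn's `MLPRegressor` fit to reproduce its own input as a stand-in for the 9-8-2-8-9 autoencoder (the paper's exact training setup may differ); the "STC" and "fault" data here are synthetic, with faults drawn far from the training distribution.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: "normal" (STC) data near the origin, fault data
# shifted far out of distribution.
X_train = rng.normal(0.0, 0.1, size=(400, 9))
X_fault = rng.normal(3.0, 0.1, size=(100, 9))

# 9-8-2-8-9 architecture from the text; 'logistic' gives the sigmoid
# hidden activations (the output layer in MLPRegressor is linear).
ae = MLPRegressor(hidden_layer_sizes=(8, 2, 8), activation="logistic",
                  max_iter=2000, random_state=0)
ae.fit(X_train, X_train)  # unsupervised: reconstruct the input

def recon_error(X):
    return np.mean((X - ae.predict(X)) ** 2, axis=1)

# Fault points reconstruct poorly, so a threshold on the error
# histogram separates them from STC data.
threshold = recon_error(X_train).max()
print(np.mean(recon_error(X_fault) > threshold))
```

A plotted histogram of `recon_error` over both sets reproduces the two-mode picture of Figure 6.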
In the next section we address the problem of fault classification. More specifically, we use a neural network with dropout for fault classification. Dropout neural networks have been shown to prevent overfitting [18]. We also discuss the use of Lottery Ticket Hypothesis [19] for pruning the neural network for fault classification.

IV. PV FAULT CLASSIFICATION USING CUSTOM NEURAL NETWORKS
In our previous work [16], [29], we demonstrated the use of a feed-forward fully connected neural network for fault classification on simulated solar fault data generated using Simulink models. In this paper, we propose the use of a concrete dropout and compare the results with uniform dropout [18] neural network architecture for fault classification using PVWatts.

A. DROPOUT NEURAL NETWORK
Dropout randomly sets a certain percentage of each layer's weights to 0 during the forward pass, and the corresponding gradients to 0 during back-propagation [18]. This mechanism acts as a regularizer and reduces over-fitting. In a dropout neural network, for the l-th layer, we select a dropout ratio p ∈ (0, 1) and sample a vector of Bernoulli random variables β(l) with probability p of being 1 and 1 − p of being 0. In both the forward pass and the back-propagation update, we mask the outputs of the neurons by computing the element-wise product of z(l) and β(l). Masking these weights during the update regularizes the network, resulting in smoother decision boundaries.
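The masking step can be sketched in a few lines of numpy, following the text's convention that β is 1 with probability p (the "keep" case). The 1/p rescaling ("inverted dropout," which keeps the expected activation unchanged) is a standard implementation detail not stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.8                        # probability that beta is 1 (unit kept)
z = rng.normal(size=(32, 50))  # activations z^(l) of a 50-neuron layer

beta = rng.binomial(1, p, size=z.shape)  # Bernoulli(p) mask
z_dropped = z * beta / p       # element-wise mask; 1/p rescaling keeps
                               # the expected activation unchanged

print(z_dropped.shape)  # (32, 50)
```

At test time no mask is applied; the rescaling during training makes the train- and test-time activations consistent in expectation.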

B. CONCRETE DROPOUT NEURAL NETWORK
Since p is a hyper-parameter, selecting p for a given dataset is crucial, and a brute-force search over the continuous variable p is computationally expensive. To address this issue, concrete dropout was introduced in [30], in which the dropout ratio p is optimally selected for each layer by auto-tuning p. Since gradients cannot be computed for the Bernoulli distribution, concrete dropout replaces the Bernoulli distribution during training with a Gumbel-Softmax distribution, so that a reparameterization trick [31] can be used to compute gradients with respect to the dropout probabilities.
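A numpy sketch of the concrete (Gumbel-Softmax) relaxation underlying concrete dropout [30]: the hard 0/1 Bernoulli draw is replaced by a sigmoid of Gumbel-style noise, so the mask is differentiable with respect to p. The temperature and the uniform-noise clipping here are our choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def concrete_mask(p, shape, temperature=0.5):
    """Differentiable relaxation of a drop mask with drop probability p:
    soft values in (0, 1) replace hard Bernoulli samples, so gradients
    with respect to p can flow through the mask."""
    u = rng.uniform(1e-7, 1 - 1e-7, size=shape)  # clip away 0 and 1
    logits = (np.log(p) - np.log(1 - p)
              + np.log(u) - np.log(1 - u)) / temperature
    drop = 1.0 / (1.0 + np.exp(-logits))  # sigmoid of Gumbel-style noise
    return 1.0 - drop                     # keep mask

m = concrete_mask(p=0.2, shape=(4, 5))
print(m.shape)  # (4, 5)
```

As the temperature approaches 0 the soft mask approaches hard 0/1 samples, recovering ordinary Bernoulli dropout.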

C. PRUNED NEURAL NETWORKS FOR PV FAULT CLASSIFICATION
Unlike the masking mechanism in dropout training, pruning a neural network removes a certain percentage of the weights that contribute least to the output, which reduces complexity and improves speed. Pruned neural networks on embedded hardware have been shown to greatly improve computational performance and reduce memory requirements with only a slight reduction in the model's accuracy [19]. Consider a fully connected NN with N neurons in each layer, initialized with weight matrices W0. After training this network for t epochs, the resulting weights of the network are Wt. Next, compute a mask M [19] by pruning the p% of weights whose absolute values are closest to zero. Reinitialize the network with W0 masked by M. The training and pruning process is iterated until a 2.5× compression is achieved, beyond which the network's performance degrades due to underfitting of the data. Figure 7 gives a general overview of the process by which NNs are trained and pruned using the Lottery Ticket Hypothesis.
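One round of the prune-and-rewind loop can be sketched as follows; the trained weights here are random stand-ins, since no actual training is performed in this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_and_rewind(W0, Wt, prune_frac=0.1):
    """Prune the prune_frac fraction of weights with the smallest
    magnitude in Wt, and rewind the survivors to their initial values
    W0, as in the Lottery Ticket Hypothesis iteration."""
    k = int(prune_frac * Wt.size)
    cutoff = np.sort(np.abs(Wt), axis=None)[k]   # magnitude threshold
    mask = (np.abs(Wt) >= cutoff).astype(float)  # M: 1 = keep, 0 = prune
    return W0 * mask, mask                       # reinitialize W0, masked by M

W0 = rng.normal(size=(50, 50))  # initial weights
Wt = rng.normal(size=(50, 50))  # weights after t epochs (stand-in)
W_next, mask = prune_and_rewind(W0, Wt, prune_frac=0.1)

print(1 - mask.mean())  # fraction pruned: 0.1
```

Iterating this step (train, prune 10%, rewind) compounds the sparsity, so roughly 2.5× compression is reached after about nine rounds.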
In the next section, we report our results using the algorithms described above. We report the weighted accuracy results using the RPN and compare with standard machine learning algorithms.

V. SOLAR ARRAY EXPERIMENTS AND RESULTS
We considered the set of 9 unique custom input features for the neural networks. These nine input features are known to provide high accuracy for fault classification on simulated data [16]. The dataset contains a total of 22000 samples. We feed the 22000 × 9 feature matrix to the NN. We use a 3-layer neural network with 50 neurons in each layer, as in [4], with tanh as the activation function for each layer. This architecture was fixed for all the NN simulations, to avoid any bias which may occur during training and testing. We consider multiple uniform dropout architectures with dropout probabilities p ∈ {0.1, 0.2, 0.3, 0.4, 0.5}. All the networks were trained for 100 epochs to minimize the categorical cross-entropy loss using an Adam gradient descent optimizer.
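The baseline fully connected configuration can be sketched with scikit-learn's `MLPClassifier` (three 50-neuron hidden layers, tanh, Adam, cross-entropy); the data below is a random stand-in for the 22000 × 9 feature matrix, and note that the paper's dropout variants are not available in scikit-learn.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for the feature matrix and the 5 class labels.
X = rng.normal(size=(1000, 9))
y = rng.integers(0, 5, size=1000)

clf = MLPClassifier(hidden_layer_sizes=(50, 50, 50), activation="tanh",
                    solver="adam", max_iter=100, random_state=0)
clf.fit(X, y)  # minimizes cross-entropy loss with Adam
print(clf.predict(X[:3]).shape)  # (3,)
```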
For comparison with the dropout neural networks, we also performed fault classification using traditional machine learning classifiers, as reported in Table 3. Table 3 shows the accuracy and run time of the various algorithms. We also compare against fully connected neural networks (baseline) [4], [16]. We ran Monte Carlo simulations on all the architectures mentioned to obtain training and testing estimates. The training (70%) and testing (30%) sets were sampled randomly in each run of the Monte Carlo simulation. We observe that the dropout architectures perform quite well in terms of accuracy and run time; in fact, concrete dropout provided the best results. Among all the dropout architectures, we see an improvement of 0.5% when using concrete dropout in comparison to the fully connected neural network.
We also compared the NNs' performance with standard machine learning algorithms such as the RFC, SVM, and KNN, and the results are reported in Table 3. For the ML algorithms, we performed a grid search over a range of parameters and chose the best configuration by 3-fold cross validation on the training data. Grid search determines the hyperparameters of a model that result in the most accurate predictions. For the RFC, we considered maximum depth in {10, 25, 50, 100} and number of estimators in {5, 10, 25, 50} and found that the best parameters were a maximum depth of 25 and 50 estimators, with the best accuracy of 87.35% on the validation set. For the KNN classifier, we considered the number of neighbors in {5, 10, 25, 50, 100, 200} with a Euclidean distance measure and found that the best accuracy of 86.18% on the validation set was obtained with 25 neighbors. For the SVM classifier, we considered the soft margin parameter C in {1, 10, 100, 1000} and kernel in {linear, radial basis function} and found that the best parameters were C = 100 with a linear kernel, with the best accuracy of 84.23% on the validation set. We observe that techniques such as the RFC overfit the training data, while other classifiers such as the SVM and the KNN perform poorly compared to NNs.
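The 3-fold grid search described above can be sketched with scikit-learn's `GridSearchCV`; the data is synthetic and the grid is reduced here to keep the sketch fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))     # stand-in training features
y = rng.integers(0, 5, size=300)  # stand-in class labels

# 3-fold cross-validated search over a subset of the RFC grid named in
# the text (max depth, number of estimators).
grid = {"max_depth": [10, 25], "n_estimators": [5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3)
search.fit(X, y)
print(sorted(search.best_params_))  # ['max_depth', 'n_estimators']
```

The same pattern applies to the KNN (number of neighbors) and SVM (C, kernel) searches.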
In order to evaluate each model's ability to classify the data points belonging to the groups with higher risk factors, we compared the performance of the different models based on the RPN-weighted accuracy. The RPN-weighted accuracy (RWA) is calculated by summing the products of the normalized RPN scores with the corresponding class-wise accuracies,

RWA = w1 A1 + w2 A2 + w3 A3 + w4 A4 + w5 A5,

where A1, A2, A3, A4, A5 are the class-wise accuracies of standard test conditions, soiling, shading, degraded, and short-circuit faults, respectively, and the weights wi are the normalized RPN scores obtained from Table 2. We observed that concrete dropout has superior RWA performance over the other models, as well as the best overall test accuracy; it is thus consistent in correctly classifying all fault classes considered in PV array monitoring systems.
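The RWA computation is a weighted average, sketched below; the RPN scores and class-wise accuracies are hypothetical values standing in for Table 2 and the measured results.

```python
import numpy as np

# Hypothetical RPN scores for the five classes (stand-ins for Table 2)
# and hypothetical class-wise accuracies A1..A5.
rpn = np.array([10.0, 40.0, 40.0, 60.0, 120.0])
acc = np.array([0.95, 0.88, 0.90, 0.85, 0.92])

w = rpn / rpn.sum()           # normalize the RPN scores to sum to one
rwa = float(np.dot(w, acc))   # RWA = sum_i w_i * A_i
print(round(rwa, 4))  # 0.8967
```

Because the weights are normalized, the RWA stays between the smallest and largest class-wise accuracy, while emphasizing the high-risk classes.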
For the network pruning experiments, we consider NNs with 3 hidden layers, each with N ∈ {50, 100, 200, 500, 1000} neurons. All NNs were trained for 150 epochs, and at every pruning iteration 10% of the remaining weights were pruned. We used the ReLU activation function for all neurons, and the network was trained to minimize the categorical cross-entropy loss using a mini-batch gradient descent optimizer. We find that smaller networks achieve a compression of about 62% for a drop in accuracy of 4%, as shown in Figure 9, whereas the performance of larger networks degrades by up to 40% after pruning. We observe that our pruned neural network algorithms converge faster, because there are fewer parameters in the pruned network and hence less misadjustment error. This can be useful for the development of custom hardware for fault classification. As shown in Figure 11, the pruned neural network algorithms have an accuracy within 2% of the fully connected neural network for a 40% reduction in the weights of the network. Interestingly, we also find that the overlapping points shown in Figure 4 correspond to the incorrectly classified points in the confusion matrix, shown in Figure 8, which comprise approximately 10% of the data.

VI. CONCLUSION
In this paper, we propose and characterize efficient neural network architectures for fault detection and classification in utility-scale solar arrays. We study the faults and their diagnosis from an operations and management perspective. We first use an autoencoder to detect faults based on the reconstruction-error histogram. We then customize and optimize neural network architectures with concrete dropout mechanisms for fault classification in PV arrays and examine the fault classification accuracy for each class. We characterize the algorithms in terms of performance and complexity; more specifically, we compare the proposed concrete dropout method with fixed dropout and fully connected NNs, as well as against standard machine learning algorithms. We observe that concrete dropout outperforms the other methods with a classification accuracy of 89.87%, as shown in Table 3, and has the fastest run time on the test dataset. To reduce complexity, we also explore the use of pruned neural networks. Using Monte Carlo simulations, we demonstrate that the test accuracy of a network pruned by 50% (a significant reduction of weights) drops by only 3%. The pruned network, represented by half the number of parameters, will be useful for the development of customized and efficient fault detection hardware and software for PV arrays. In addition, we evaluated faults using their RPN and their corresponding safety category. Some of the faults considered in this paper have a high RPN, as shown in Table 2. We also perform a weighted class average and examine the class-wise accuracy of these faults. Since the RPN associated with these faults is high and poses a greater safety threat, the detection and classification of such faults is critical.
wireless communications, estimation and equalization algorithms for wireless systems, multi-antenna communications, filter banks and multirate systems, orthogonal frequency division multiplexing, ultra-wideband systems, and distributed detection and estimation. He was a recipient of the 2001 National Science Foundation (early) Career Grant. He has served as an Associate Editor for several IEEE TRANSACTIONS, including the IEEE  TRANSACTIONS ON COMMUNICATIONS and the IEEE SIGNAL PROCESSING LETTERS. DEVARAJAN SRINIVASAN (Member, IEEE) was born in Baroda, India, in 1970. He received the B.Tech. degree from the Regional Engineering College, Calicut, India, in 1992, and the M.S. and Ph.D. degrees from Arizona State University, Tempe, in 1997 and 2002, respectively. He currently works as a CTO at Poundra, LLC. He oversees all technology engagements of the company encompassing execution to product and services strategy, roadmap definition, system architecture, design, and production. He also drives the research and development efforts at POUNDRA, LLC, besides managing all customer technical engagements. His research interests include dry-band arcing in fiber-optic cables, power systems, HVDC systems and converters, computer-aided geometric design (CAGD), computer graphics, and VLSI design.
(MANI) GOVINDASAMY TAMIZHMANI is currently the Founder and the Director of the Photovoltaic Reliability Laboratory, Arizona State University (ASU-PRL), Mesa, AZ, USA (PVreliability.asu.edu). He was the Former Director of ASU Photovoltaic Testing Laboratory (ASU-PTL), Mesa, and the Former President of TUV Rheinland PTL (TUV-PTL), Tempe, AZ, USA. He is also the President of SolarPTL, Tempe (SolarPTL.com). He has published more than 175 papers in peer-reviewed journals and conferences, and delivered more than 150 presentations around the world.
ANDREAS SPANIAS (Fellow, IEEE) is currently a Professor with the School of Electrical, Computer, and Energy Engineering, Arizona State University. He is also the Director of the Sensor Signal and Information Processing (SenSIP) Center and the Founder of SenSIP Industry Consortium (now, an NSF I/UCRC site). He is the author of two textbooks, such as Audio Signal Processing and Coding (Wiley and DSP) and An Interactive Approach 2nd Edition. He has contributed to more than 350 articles, 11 monographs, 11 full patents, ten provisional patents, and 12 patent pre-disclosures. His research interests include adaptive signal processing, speech processing, and sensor systems. He was recently elected as a Senior Member of the National Academy of Inventors (NAI). He was a co-recipient of the 2002 IEEE Donald G. Fink Paper Prize Award. He received the 2018 IEEE Phoenix Chapter Award from the IEEE Region 6 Director for significant innovations and patents in signal processing for sensor systems, and the 2018 IEEE Region 6 Outstanding Educator Award (across 12 states) with citation: For outstanding research and education contributions in signal processing. He served as the General Cochair of IEEE ICASSP-99 and the IEEE Signal Processing Vice-President for conferences. He served as an Associate Editor for IEEE TRANSACTIONS ON SIGNAL PROCESSING. He is also a Series Editor of the Morgan and Claypool lecture series on algorithms and software. He served as a Distinguished Lecturer for the IEEE Signal Processing Society, in 2004.