Neural Network-Based Classification of String-Level IV Curves From Physically-Induced Failures of Photovoltaic Modules

Accurate diagnosis of failures is critical for meeting photovoltaic (PV) performance objectives and avoiding safety concerns. This analysis focuses on the classification of field-collected string-level current-voltage (IV) curves representing baseline, partial soiling, and cracked failure modes. Specifically, multiple neural network-based architectures (including convolutional and long short-term memory) are evaluated using domain-informed parameters across different portions of the IV curve and a range of irradiance thresholds. The analysis identified two models that were able to accurately classify the relatively small dataset (~400 samples) at a high accuracy (99%+). Findings also indicate optimal irradiance thresholds and opportunities for improvements in classification activities by focusing on portions of the IV curve. Such advancements are critical for expanding accurate classification of PV faults, especially for those with low power loss (e.g., cracked cells) or visibly similar IV curve profiles.


I. INTRODUCTION
Failures impact photovoltaic (PV) performance objectives and can cause serious safety concerns [1]. Monitoring activities can reduce the impact of failures by improving understanding of the current system state and defining failure characteristics to inform proactive maintenance activities [2]. Multiple monitoring techniques exist for diagnosing failures in PV systems, including electroluminescence (EL) images [3]-[5], infrared (IR) images [6], maximum power point tracking data [7]-[11], and tracing of current-voltage (IV) curves. This study focuses on improving classification of failure modes using IV curves.
The associate editor coordinating the review of this manuscript and approving it for publication was B. Chitti Babu.

Compared to energy yield data, IV curves provide more information about the electrical condition of the PV system [12]-[14]. Instead of reporting only the maximum power point (MPP), an IV curve provides current values across the full voltage domain. Thus, IV curves are a particularly common technique for PV failure diagnosis and performance monitoring [12], [15]-[21].
To date, most IV curve-based diagnosis methods rely on feature extraction, which summarizes IV curves into a set of parameters such as short-circuit current (I_sc), open-circuit voltage (V_oc), MPP voltage (V_mpp), MPP current (I_mpp), fill factor (FF), equivalent thermal voltage, inflection factor (for mismatch trends), and equivalent series resistance (R_s), among others [12], [15], [16]. These parameters are then evaluated using thresholds (with either a statistical approach [15] or fuzzy logic [12]) or a support vector machine [16] to classify failures. However, as in time series classification, distilling the detail within an IV curve into a small set of parameters loses information and could degrade the algorithm's performance in production [17].
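To make the feature-extraction approach concrete, the following is a minimal Python sketch (using NumPy) that summarizes a single IV curve into a few of the classic parameters listed above. It assumes the curve is sorted by increasing voltage with current decreasing monotonically; the function name `extract_features` and the example curve are hypothetical, and the coarse five-point curve only illustrates the arithmetic.

```python
import numpy as np

def extract_features(voltage, current):
    """Classic summary features of one IV curve (sorted by increasing voltage)."""
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    i_sc = float(np.interp(0.0, v, i))              # current at V = 0
    v_oc = float(np.interp(0.0, i[::-1], v[::-1]))  # voltage at I = 0 (needs monotone I)
    k = int(np.argmax(v * i))                       # index of the maximum power point
    v_mpp, i_mpp = float(v[k]), float(i[k])
    ff = v_mpp * i_mpp / (v_oc * i_sc)              # fill factor
    return {"I_sc": i_sc, "V_oc": v_oc, "V_mpp": v_mpp, "I_mpp": i_mpp, "FF": ff}

# Hypothetical coarse string curve (V, A)
v = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
i = np.array([9.0, 8.9, 8.7, 7.0, 0.0])
features = extract_features(v, i)
```

On this toy curve, P = V·I peaks at 300 V × 7 A = 2100 W, so FF = 2100 / (400 × 9) ≈ 0.583. Details such as the shape of a mismatch step are not retained in this summary, which is the information loss described above.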
To date, classification techniques that can discriminate between failures with similar IV curve profiles have been understudied. As mentioned previously, many implementations calculate a set of features representing an IV curve; while these methods can pick up on obvious profile changes, determining minor differences with these approaches would be difficult. Techniques that address this issue utilize the entire IV trace in analytical frameworks, ranging from automated feature extraction using principal component analysis [18], [19] to full representation of the IV curves in the analysis using either 2-dimensional (2D) convolutional neural networks (CNNs) [20] or long short-term memory (LSTM) networks [21].
Although neural network (NN) implementations have demonstrated success on simulated data [20], [21], their application to physically-induced failures has not yet been explored. This work addresses this knowledge gap by investigating the performance of CNNs and LSTMs on physically-induced failure data (i.e., partial soiling and cracked cells within PV modules) collected from a PV system at the Florida Solar Energy Center (FSEC). Specifically, the procedures discussed in this paper use neural networks, which excel at pattern recognition, to detect and classify failures that have only minor impacts on the IV curve (cell cracking; Figure 2) while retaining high accuracy (99%+). This methodology contributes to research on improving failure classification through smart technologies (e.g., combiner boxes and inverters) that can deploy these algorithms in the field.

II. METHODOLOGY
A. DATA ACQUISITION
The FSEC PV system consists of two strings of 12 multicrystalline aluminum back-surface field modules facing south and tilted at 30°, each string connected to a separate MPPT channel on an SMA TriPower inverter; additional module details are provided in Table 1. The inverter monitors direct current (DC) inputs for both the control string (CS) (i.e., modules with no induced faults) and the faulted string (FS) (i.e., modules with induced faults). A capacitive load is used to capture both string-level and module-level IV curves every 30 minutes, with an entire scan (of both strings and all modules) taking an average of 8 minutes to complete. During each scan, 26 traces were conducted (two at the string level and 24 at the module level). Module temperature was recorded every 5 seconds, while plane-of-array (POA) irradiance was measured every second and averaged to 1-minute intervals. The temperature and irradiance data were interpolated to align them with the times of the IV trace measurements. Two major failure modes were induced in the FSEC system: partial soiling and cracked cells. Partial soiling was applied from late March through late April 2019, and the cracked-cell failure mode from early May through late August 2019. A total of 3,000 IV scans (each defined as the collection of IV curves taken in one 30-minute cycle) were collected across the baseline, cracked-cell, and partial soiling failure modes in 2019.
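The paper does not state which interpolation method was used to align the sensor data with the IV traces. The sketch below assumes simple linear interpolation with NumPy's `np.interp`; the helper name `align_to_trace_times` and all data values are hypothetical.

```python
import numpy as np

def align_to_trace_times(sensor_t, sensor_values, trace_t):
    """Linearly interpolate a sensor series onto IV-trace timestamps (seconds).

    Trace times outside the sensor record return NaN rather than being
    extrapolated (np.interp would otherwise clamp to the end values).
    """
    sensor_t = np.asarray(sensor_t, dtype=float)
    sensor_values = np.asarray(sensor_values, dtype=float)
    trace_t = np.atleast_1d(np.asarray(trace_t, dtype=float))
    aligned = np.interp(trace_t, sensor_t, sensor_values)
    outside = (trace_t < sensor_t[0]) | (trace_t > sensor_t[-1])
    aligned[outside] = np.nan
    return aligned

# Hypothetical data: 1-minute POA averages and 5-second module temperatures
poa_t = np.arange(0.0, 600.0, 60.0)              # 10 samples, t = 0..540 s
poa = np.linspace(800.0, 890.0, poa_t.size)      # W/m^2
temp_t = np.arange(0.0, 600.0, 5.0)              # t = 0..595 s
temp = 45.0 + 0.01 * temp_t                      # degC

trace_t = np.array([90.0, 330.0])                # start times of two IV traces
g_at_trace = align_to_trace_times(poa_t, poa, trace_t)
t_at_trace = align_to_trace_times(temp_t, temp, trace_t)
```

Linear interpolation is a reasonable default when the sensor sampling interval is short relative to changes in sky conditions, which the stable-sky filter is intended to ensure.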

1) PARTIAL SOILING (PS)
The partial soiling failure mode was induced by placing a semi-transparent polymer film over six of the modules within the FS. Power losses from partial soiling ranged between 4% and 9% (Figure 1) and caused a visible mismatch in the IV curve, as indicated by a step in the current (Figure 2).
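Power loss for a failure mode can be expressed relative to the control string measured at the same time. The following sketch assumes loss is defined as the percentage drop in maximum power of the FS relative to the CS; the function names and curves are hypothetical and chosen so the arithmetic is easy to check.

```python
import numpy as np

def max_power(voltage, current):
    """Maximum of P = V * I over the measured points of an IV curve."""
    return float(np.max(np.asarray(voltage, float) * np.asarray(current, float)))

def power_loss_percent(v_fs, i_fs, v_cs, i_cs):
    """Percent drop in FS maximum power relative to the CS (assumed definition)."""
    return 100.0 * (1.0 - max_power(v_fs, i_fs) / max_power(v_cs, i_cs))

# Hypothetical coarse curves on a shared voltage axis (V, A)
v = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
i_cs = np.array([9.0, 8.9, 8.7, 7.0, 0.0])
i_fs = np.array([9.0, 8.9, 8.0, 6.3, 0.0])
loss = power_loss_percent(v, i_fs, v, i_cs)
```

Here P_mpp is 2100 W for the CS and 1890 W for the FS, giving a 10% loss.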

2) CRACKED (Cr)
To create and open cracks, a sequence of increasingly damaging thermomechanical loads was applied. After light soaking and baseline measurements of IV parameters, microcracks were initiated in four modules by a single cold exposure at −40 °C [22]-[24]. The microcracks were subsequently propagated into full cracks using a uniform front-side static load ranging from 2400 to 5400 Pa. Performance loss for the cracked failure mode is significantly lower than soiling-related losses, with slightly higher power observed in some of the strings after cracking (Figure 1). Compared to baseline, lower current values are observed for cracked-cell failures in the proximal section of the IV curve, defined as the section to the left of the MPP; the small mismatch in the IV curve is due to the presence of partial soiling on the unfailed modules within the faulted string (Figure 2). The impact of cell cracking appears as a small increase in the series resistance, R_s.

VOLUME 8, 2020
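A common rough indicator of series resistance is the negative inverse slope of the IV curve near V_oc. The sketch below fits a line to the last few points of a curve; it is a swapped-in illustration rather than the method used in the paper, and because the slope near V_oc also includes the diode's dynamic resistance, it overstates R_s and is mainly useful for relative comparisons (e.g., before and after cracking). The function name is hypothetical.

```python
import numpy as np

def series_resistance_proxy(voltage, current, n_points=3):
    """Negative inverse slope (-dV/dI) of the last points before V_oc, in ohms."""
    v = np.asarray(voltage, float)[-n_points:]
    i = np.asarray(current, float)[-n_points:]
    di_dv, _ = np.polyfit(v, i, 1)      # least-squares line I = di_dv * V + c
    return -1.0 / di_dv

v = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
i = np.array([9.0, 9.0, 9.0, 4.5, 0.0])    # hypothetical, linear near V_oc = 40 V
r_proxy = series_resistance_proxy(v, i)   # slope -0.45 A/V -> about 2.22 ohm
```

A small increase in this proxy between baseline and cracked measurements would be consistent with the R_s increase described above.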

B. DATA FILTERING & PROCESSING
Data quality filters were implemented to exclude noisy data and ensure that the analyzed data met quality control standards. IV curves with data missing for more than 5 consecutive minutes, flatline data (i.e., zero standard deviation over 15 consecutive minutes), or out-of-range data (i.e., parameters more than three standard deviations from the mean) were excluded from analysis. Stable-sky filters removed curves whose irradiance deviated by more than 15 W/m² from the previous minute. Finally, an irradiance threshold of 700 W/m² was applied to remove low-irradiance IV profiles; this threshold is discussed further in Section III. After data filtering and processing, 446 string-level IV curves (from 223 scans) remained for analysis.
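The filters described above can be sketched as simple Boolean checks. The following minimal NumPy sketch assumes 1-minute sensor series and one aligned POA value per IV curve; all function names are hypothetical, and the exact windowing used in the study (e.g., trailing vs. centered windows) is not specified in the paper.

```python
import numpy as np

def has_long_gap(x_1min, max_gap=5):
    """True if more than `max_gap` consecutive 1-minute values are missing (NaN)."""
    run = 0
    for missing in np.isnan(np.asarray(x_1min, dtype=float)):
        run = run + 1 if missing else 0
        if run > max_gap:
            return True
    return False

def is_flatline(x_1min, window=15):
    """True if any `window`-minute stretch has zero standard deviation."""
    x = np.asarray(x_1min, dtype=float)
    return any(np.std(x[k:k + window]) == 0.0 for k in range(x.size - window + 1))

def within_three_sigma(param):
    """Per-curve mask: parameter within three standard deviations of its mean."""
    p = np.asarray(param, dtype=float)
    return np.abs(p - p.mean()) <= 3.0 * p.std()

def stable_sky(poa_1min, max_step=15.0):
    """Per-minute mask: POA changed by at most `max_step` W/m^2 from the previous minute."""
    poa = np.asarray(poa_1min, dtype=float)
    return np.abs(np.diff(poa, prepend=poa[0])) <= max_step

def above_threshold(poa_at_curve, threshold=700.0):
    """Per-curve mask for the irradiance threshold (W/m^2)."""
    return np.asarray(poa_at_curve, dtype=float) >= threshold
```

A curve would be kept only if every check passes, e.g., by combining the per-curve masks with `&`.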
The remaining IV curves were processed to: 1) correct for irradiance and temperature to enable comparisons across different test conditions [25]; 2) remove voltage values below 10 V to avoid inductive ringing; 3) normalize each curve by its maximum I and V values to remove local variations between the times the curves were collected; and 4) resample at evenly spaced 5 V intervals, ranging from max(V_0), the largest first voltage point above 10 V, to min(V_oc) across all curves, resulting in 82 points per IV curve. The irradiance and temperature corrections were derived from [25], where β is the V_oc temperature coefficient, α is the I_sc temperature coefficient (Table 1), and G_act is the measured POA irradiance, derived from a reference cell. After processing, additional parameters were calculated to explicitly capture the mismatch between the CS and FS IV curve profiles, together with associated first-order differences (i.e., between consecutive observations) (Table 2). For example, the FS parameter, which evaluates consecutive pairwise differences in I_FS, was inspired by the calculation of the shunt resistance R_sh, an important diagnostic feature [8].
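The following sketch illustrates processing steps 2 and 4 and the current part of step 3, plus a set of difference-based predictors. Assumptions: curves are sorted by voltage and end near V_oc, only current is normalized, and the predictor definitions (`delta_i` as I_CS - I_FS and `d_i_fs` as consecutive differences of I_FS, padded to keep the curve length) are hypothetical stand-ins for the Table 2 parameters. The irradiance and temperature correction (step 1) is not shown.

```python
import numpy as np

def common_grid(curves, v_min=10.0, step=5.0):
    """Shared 5 V grid from max(first point >= v_min) to min(V_oc) over all curves.

    ASSUMPTION: each curve is sorted by voltage and its last point is ~V_oc.
    """
    starts = [v[v >= v_min][0] for v, _ in curves]
    ends = [v[-1] for v, _ in curves]
    return np.arange(max(starts), min(ends) + 1e-9, step)

def resample_curve(v, i, grid, v_min=10.0):
    """Drop points below v_min, interpolate onto the grid, normalize by max current."""
    keep = v >= v_min
    i_grid = np.interp(grid, v[keep], i[keep])
    return i_grid / i_grid.max()

def predictors(i_cs, i_fs):
    """Stack four hypothetical predictors into a (points, 4) array."""
    delta_i = i_cs - i_fs                         # CS-FS mismatch
    d_i_fs = np.diff(i_fs, prepend=i_fs[0])       # consecutive differences of I_FS
    return np.stack([i_cs, i_fs, delta_i, d_i_fs], axis=-1)

# Hypothetical control- and faulted-string curves (already corrected to reference conditions)
v_cs = np.linspace(0.0, 420.0, 85)
i_cs = 9.0 * (1.0 - (v_cs / 420.0) ** 8)
v_fs = np.linspace(0.0, 415.0, 84)
i_fs = 8.5 * (1.0 - (v_fs / 415.0) ** 8)

grid = common_grid([(v_cs, i_cs), (v_fs, i_fs)])
x = predictors(resample_curve(v_cs, i_cs, grid), resample_curve(v_fs, i_fs, grid))
```

With these synthetic curves the shared grid runs from 10 V to 415 V, which yields 82 points per curve.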

C. NEURAL NETWORK ARCHITECTURE
Three NN architectures were examined in this analysis: 1) a 1D CNN, 2) a single-headed LSTM, and 3) a multi-headed LSTM. The NNs were applied to the entire curve as well as to the proximal section (i.e., the region preceding the MPP; Figure 2) to evaluate the impact of different regions of the IV curves on algorithm accuracy. For all three NNs, the last dense layer uses a softmax (multinomial logistic regression) classifier to map the output into a probability vector containing the model-estimated probability of each class for a given curve. Finally, backpropagation with the adaptive moment estimation (Adam) optimizer is used to update the weights in all three NN architectures [26].
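The probability mapping and the weight update can be illustrated in a few lines of NumPy. This is a sketch, not the study's implementation: the class ordering, the logits, and the Adam hyperparameters (the commonly used defaults lr = 1e-3, b1 = 0.9, b2 = 0.999) are assumptions, since the paper does not report them.

```python
import numpy as np

CLASSES = ("BL", "PS", "Cr")   # baseline, partial soiling, cracked (assumed order)

def softmax(logits):
    """Row-wise softmax: maps dense-layer outputs to class probabilities."""
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns new weights and updated moment estimates."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias corrections for step t
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

logits = np.array([[2.0, 0.5, 1.0]])        # hypothetical dense-layer output, one curve
probs = softmax(logits)
predicted = CLASSES[int(probs.argmax(axis=-1)[0])]

w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
w, m, v = adam_step(w, np.array([0.5, -2.0]), m, v, t=1)
```

On the first step, Adam's bias correction makes each weight move by roughly the learning rate, regardless of the gradient's magnitude.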

1) 1D CNN
While 2D CNNs are commonly used in computer vision tasks, 1D CNNs have become a popular method for pattern analysis in a multitude of applications [27]. The input to the CNN is a tensor comprising the four predictors (Figure 3). A convolution layer transforms the input data into feature vectors via learned 1D kernels. Each convolution kernel (or "convolution window") slides along the length axis of the data to generate filtered feature maps. Over training iterations, the kernels converge to detect specific features of the input data. The implemented algorithm used 256 filters and a kernel size of 7. An identical convolution layer follows the first to capture more complex trends. After the convolutional layers, a dropout layer randomly drops 50% of the activations during training to help prevent overfitting. A 1D max pooling layer then improves the generalizability of the model by halving the feature dimension, keeping only the maximum of every two values. The result is flattened and fed into a set of dense layers that map onto an output vector representing the different classes.
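The layer sequence can be traced with a NumPy forward pass to show how the tensor shapes evolve for one 82-point, four-predictor curve. The sketch assumes "valid" (no) padding, ReLU activations, and random weights; none of these details are stated in the paper, and a real implementation would use a deep learning framework rather than hand-written layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_valid(x, kernels, bias):
    """x: (length, in_ch); kernels: (k, in_ch, out_ch) -> (length-k+1, out_ch), ReLU."""
    k = kernels.shape[0]
    windows = np.stack([x[s:s + k] for s in range(x.shape[0] - k + 1)])  # (L', k, in_ch)
    out = np.einsum("lki,kio->lo", windows, kernels) + bias
    return np.maximum(out, 0.0)

def max_pool1d(x, size=2):
    """Keep the maximum of every `size` consecutive positions."""
    n = x.shape[0] // size
    return x[: n * size].reshape(n, size, -1).max(axis=1)

x = rng.normal(size=(82, 4))                  # one curve: 82 points x 4 predictors
w1 = rng.normal(scale=0.1, size=(7, 4, 256))  # 256 filters, kernel size 7
w2 = rng.normal(scale=0.1, size=(7, 256, 256))
h = conv1d_valid(x, w1, np.zeros(256))        # (76, 256)
h = conv1d_valid(h, w2, np.zeros(256))        # (70, 256)
keep = rng.random(h.shape) >= 0.5             # dropout mask (training only)
h = h * keep / 0.5                            # inverted dropout rescales kept values
h = max_pool1d(h, 2)                          # (35, 256)
flat = h.reshape(-1)                          # (8960,) -> dense layers
```

Under these assumptions, each kernel-7 convolution shortens the sequence by 6 points (82 → 76 → 70), and pooling halves it to 35, so the dense layers receive 35 × 256 = 8960 values.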

2) SINGLE-HEADED LSTM
In contrast to CNNs, LSTMs retain information over a sequence [28]. The LSTM layer processes the input iteratively, passing each vector through an LSTM block that computes the cell and hidden states and propagates them to the next LSTM block. The input data for the single-headed LSTM (i.e., a single LSTM capturing all four parameters) has the shape (curves, points, parameters). As with the 1D CNN, the last hidden state (output by the last LSTM block) is passed through a dropout layer and fed into dense layers that map onto a vector of the different classes (Figure 4).
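A single LSTM layer's recurrence can be written out to show how the cell and hidden states are carried from step to step. This is a textbook LSTM cell in NumPy with random weights; the number of units (32) and the gate ordering are assumptions for illustration, not details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(x_seq, W, U, b):
    """Run one LSTM layer over x_seq (steps, features); return the last hidden state.

    W: (features, 4*units), U: (units, 4*units), b: (4*units,).
    Gate order used here: input, forget, candidate, output.
    """
    units = U.shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    for x_t in x_seq:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                 # input gate
        f = sigmoid(z[units:2 * units])        # forget gate
        g = np.tanh(z[2 * units:3 * units])    # candidate cell values
        o = sigmoid(z[3 * units:])             # output gate
        c = f * c + i * g                      # cell state carries long-range information
        h = o * np.tanh(c)                     # hidden state passed to the next step
    return h

rng = np.random.default_rng(1)
units, n_feat = 32, 4
W = rng.normal(scale=0.1, size=(n_feat, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
curve = rng.normal(size=(82, n_feat))          # (points, parameters) for one curve
h_last = lstm_last_hidden(curve, W, U, b)      # shape (32,), fed to dropout + dense layers
```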

3) MULTI-HEADED LSTM
In contrast to a single-headed LSTM, a multi-headed LSTM contains a separate LSTM layer for each predictor [21]. Before the data are fed into the model, each parameter is divided into 10 arrays of equal size, which are then processed as a sequence by that parameter's own LSTM (Figure 5). The input to each LSTM layer has the shape (curves, 10, 8). The last hidden states from all four LSTM layers are passed through a dropout layer, concatenated, and fed into dense layers that map onto the output vector of the different classes.
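The split into equal-size chunks can be expressed as a reshape. Note that (10, 8) covers 80 points, while the processed curves have 82; the paper does not say how the extra points were handled, so this sketch simply keeps the first 80 (an assumption). The function name `split_heads` is hypothetical.

```python
import numpy as np

def split_heads(batch, n_chunks=10, chunk_len=8):
    """(curves, points, params) -> list of per-parameter arrays (curves, n_chunks, chunk_len)."""
    n_used = n_chunks * chunk_len
    trimmed = batch[:, :n_used, :]   # ASSUMPTION: keep the first 80 points
    return [trimmed[:, :, p].reshape(batch.shape[0], n_chunks, chunk_len)
            for p in range(batch.shape[2])]

batch = np.arange(3 * 82 * 4, dtype=float).reshape(3, 82, 4)  # (curves, points, params)
heads = split_heads(batch)   # four arrays, one per predictor
```

Each of the four arrays would then feed its own LSTM, which reads 10 steps of 8 features; the final hidden states of the four heads are concatenated before the dense layers.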

D. MODEL TRAINING & EVALUATION
The dataset was balanced to ensure an equal number of samples in each failure mode before being split into 80%/20% training and test sets. A stratified (i.e., across failure modes) 5-fold cross validation was used during model training to limit overfitting. To reduce memory requirements, the training set was further divided into batches of 8 samples.

TABLE 3. Neural network accuracies with varying sampling regions and predictors. Gray-colored cells indicate the most accurate configuration within a given architecture.

FIGURE 6. Model results during one iteration of the 1D CNN architecture; the colored bars indicate the probability of a sample belonging to each failure class. The bars show that partial soiling (PS) is classified with very high confidence, while the differentiation between baseline (BL) and cracked (Cr) has lower confidence. In one instance (sample #8), the model misclassifies a cracked failure mode as baseline.
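The balancing, stratified split, and batching can be sketched with NumPy alone. Assumptions: balancing is done by random downsampling to the smallest class, the 80%/20% split is taken as one fold of a stratified 5-way partition, and the class counts below are invented for illustration (they total 446 but are not the study's actual per-class counts). Function names are hypothetical.

```python
import numpy as np

def balance(labels, rng):
    """Downsample every class to the size of the smallest; return sorted indices."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()
    keep = [rng.choice(np.flatnonzero(labels == c), n, replace=False) for c in classes]
    return np.sort(np.concatenate(keep))

def stratified_folds(labels, k, rng):
    """Assign each sample a fold id 0..k-1 so that every class is spread evenly."""
    labels = np.asarray(labels)
    fold = np.empty(labels.size, dtype=int)
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        fold[idx] = np.arange(idx.size) % k
    return fold

rng = np.random.default_rng(42)
labels = np.array(["BL"] * 160 + ["PS"] * 140 + ["Cr"] * 146)   # hypothetical counts
idx = balance(labels, rng)                        # 140 curves per class
is_test = stratified_folds(labels[idx], 5, rng) == 0   # ~20% test, stratified
train_idx, test_idx = idx[~is_test], idx[is_test]
cv_fold = stratified_folds(labels[train_idx], 5, rng)  # 5-fold CV on training data
batches = [train_idx[s:s + 8] for s in range(0, train_idx.size, 8)]  # batches of 8
```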
The models were evaluated using accuracy scores, which compare the model-predicted classes for the test data with the true labels. Predicted classes were obtained by assigning each curve the failure class with the highest value in its probability vector. A curve scores 1 if the predicted class matches its true label and 0 otherwise. The overall accuracy of a model iteration was calculated by summing these scores and dividing by the total number of curves in the test dataset. Because NN training is stochastic, each architecture was run for 20 iterations to ensure the findings were robust, and summary statistics (mean and standard deviation) were tabulated for comparison.
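The accuracy score and its summary over repeated runs reduce to a few lines. This sketch assumes class indices 0, 1, 2 and uses the sample standard deviation (`ddof=1`); the paper does not specify which standard deviation was reported, and the values are hypothetical.

```python
import numpy as np

def accuracy(prob_vectors, true_labels):
    """Fraction of curves whose highest-probability class matches the label."""
    predicted = np.argmax(prob_vectors, axis=1)
    return float(np.mean(predicted == np.asarray(true_labels)))

def summarize(accuracies):
    """Mean and sample standard deviation of accuracies across iterations."""
    a = np.asarray(accuracies, dtype=float)
    return float(a.mean()), float(a.std(ddof=1))

probs = np.array([[0.90, 0.05, 0.05],
                  [0.20, 0.70, 0.10],
                  [0.60, 0.10, 0.30],     # class-2 curve predicted as class 0
                  [0.10, 0.20, 0.70]])
acc = accuracy(probs, [0, 1, 2, 2])       # 3 of 4 correct
mean_acc, std_acc = summarize([0.99, 1.00, 0.98])
```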

III. RESULTS AND DISCUSSION
The multi-headed LSTM and 1D CNN architectures produce accuracies greater than 99% on average (Table 3). Although the 1D CNNs marginally outperform the multi-headed LSTMs, the latter trained in approximately half the time. Accuracies also varied across classes, with the partial soiling failure mode correctly predicted in 100% of the test cases (Table 3). A visualization of the predicted probability vectors shows near-100% confidence for partial soiling, while the probabilities for the baseline and cracked classes show less certainty, including one instance in which a cracked failure mode was misclassified as baseline (Figure 6). These errors reflect the challenge of accurately classifying failures with similar IV curve profiles and low power loss (Figures 1 and 2). Classification of the cracked failure mode may also have been aided by the partial soiling present in these samples, which introduced a slight mismatch in the IV curves.
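Per-class accuracies such as those reported for partial soiling can be read from a confusion matrix. The sketch below uses hypothetical labels (0 = BL, 1 = PS, 2 = Cr) with a single cracked sample misclassified as baseline, mirroring the error pattern described above.

```python
import numpy as np

def confusion_matrix(true, pred, n_classes):
    """Rows: true class; columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(true), np.asarray(pred)), 1)
    return cm

def per_class_accuracy(cm):
    """Share of each true class that was predicted correctly."""
    return cm.diagonal() / cm.sum(axis=1)

true = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 1, 0, 2])    # one cracked curve predicted as baseline
cm = confusion_matrix(true, pred, 3)
class_acc = per_class_accuracy(cm)     # BL and PS perfect, Cr at 50%
```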
Because a unique trend was apparent in the proximal region of the IV curve for all failure modes, classifications using only the points before the knee were also evaluated. The single-headed LSTM model performed significantly better, with an accuracy improvement of around 27% (Table 3). This improvement is likely due to excluding the noisier region after the knee, where the curves overlap (Figure 2). In contrast, the multi-headed LSTM and 1D CNN models regressed slightly in accuracy when restricted to the proximal region, which may be because these models can exploit information from the rest of the curve.
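Restricting the input to the proximal region amounts to truncating each curve before its maximum power point. Because NN inputs must share a length, this sketch assumes a single cut at the earliest MPP index across all curves; the paper does not describe how the cut was chosen, and the helper names are hypothetical.

```python
import numpy as np

def mpp_index(voltage, current):
    """Index of the maximum power point on a sampled IV curve."""
    return int(np.argmax(np.asarray(voltage, float) * np.asarray(current, float)))

def proximal_points(grid, currents):
    """Keep grid points before the earliest MPP across all curves (shared cut).

    ASSUMPTION: a single cut index keeps every curve the same length for the NN.
    """
    cut = min(mpp_index(grid, i) for i in currents)
    return grid[:cut], [np.asarray(i, float)[:cut] for i in currents]

grid = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
i_a = np.array([9.0, 8.9, 8.7, 7.0, 0.0])    # MPP at index 3
i_b = np.array([9.0, 8.9, 8.0, 3.0, 0.0])    # MPP at index 2
grid_prox, currents_prox = proximal_points(grid, [i_a, i_b])
```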
The parameters used in the analysis contributed significantly to NN accuracy. Generating the parameters relied on control data (i.e., I_CS). Specifically, including δI and FS improved the accuracy of the 1D CNNs by 30.9% and reduced the associated standard deviation by 26.6% compared to the 1D CNN using only the I_CS and I_FS parameters (Table 3). FS was constructed to capture changes in slope near I_sc (an important diagnostic tool [8]), while δI was constructed based on knowledge of mismatch profiles in IV curves. These results show that calculating parameters based on domain knowledge yields more stable and accurate results. If string-level control field measurements are not available to calculate δI, control IV curves can be generated with either a diode model or Gaussian process regression [29], [30].
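As an example of generating a control curve from a diode model, the following sketch solves the implicit single-diode equation I = I_L - I_0[exp((V + I·R_s)/a) - 1] - (V + I·R_s)/R_sh with Newton's method. All parameter values are hypothetical for a 60-cell module, and scaling to a 12-module string assumes identical, unmismatched modules; in practice the parameters would be fitted to the site's modules and conditions.

```python
import numpy as np

def single_diode_current(v, i_l, i_0, r_s, r_sh, a, n_iter=50):
    """Solve the single-diode equation for I at each voltage in v.

    a = n * N_s * V_t is the modified ideality factor (volts). Newton's method
    started at I = I_L converges monotonically because the residual is
    decreasing and concave in I.
    """
    v = np.asarray(v, dtype=float)
    i = np.full_like(v, i_l)
    for _ in range(n_iter):
        e = np.exp((v + i * r_s) / a)
        f = i_l - i_0 * (e - 1.0) - (v + i * r_s) / r_sh - i   # residual
        df = -i_0 * e * r_s / a - r_s / r_sh - 1.0              # d(residual)/dI
        i = i - f / df
    return i

# Hypothetical 60-cell module parameters (not from the FSEC system)
a = 1.2 * 60 * 0.025693
v_module = np.linspace(0.0, 46.0, 93)
i_module = single_diode_current(v_module, i_l=9.0, i_0=1e-10, r_s=0.3, r_sh=300.0, a=a)
v_string = 12 * v_module      # 12 identical, unmismatched modules in series
```

A modeled control curve like this could stand in for I_CS when computing δI, provided it is corrected to the same irradiance and temperature as the measured FS curve.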
One of the data processing steps used an irradiance threshold to filter the curves. An evaluation of NN accuracy as a function of irradiance threshold indicates that the 700 W/m² threshold used in the analysis coincides with the optimal threshold value (Figure 7). An inverted U-shaped relationship between accuracy and irradiance threshold was identified, with accuracy dropping above 700 W/m². The lower accuracy and higher variance at higher irradiance thresholds could be due to fewer representative samples and associated overfitting.
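The threshold evaluation can be organized as a sweep over candidate thresholds. In this sketch, `evaluate` is a hypothetical, caller-supplied function that trains and tests the model on the retained curves and returns accuracies from repeated runs; the dummy function below only demonstrates how the number of retained samples shrinks as the threshold rises.

```python
import numpy as np

def sweep_thresholds(poa_at_curve, thresholds, evaluate):
    """For each threshold, keep curves with POA >= threshold and score them.

    Returns {threshold: (n_kept, mean_accuracy, std_accuracy)}.
    """
    poa = np.asarray(poa_at_curve, dtype=float)
    results = {}
    for g in thresholds:
        kept = np.flatnonzero(poa >= g)
        accs = np.asarray(evaluate(kept), dtype=float)
        results[g] = (kept.size, accs.mean(), accs.std())
    return results

poa = [600.0, 650.0, 700.0, 750.0, 800.0, 900.0]          # hypothetical POA per curve
dummy_evaluate = lambda kept: [kept.size / 10.0] * 2        # stand-in for real training
res = sweep_thresholds(poa, [600.0, 700.0, 800.0], dummy_evaluate)
```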
These findings highlight several considerations for extending the NN implementations to additional failure modes in future work. Namely, this analysis shows that differentiating between failure modes with similar IV profiles and low power loss can be difficult. Improving these classifications is nonetheless important because such failures can be precursors to more serious ones; cracked cells, for example, may allow oxygen and water vapor penetration, causing future reliability and power loss issues [31]. In certain cases, misclassifications can be very dangerous. For example, both soiling and encapsulant delamination cause shifts in I_sc, but the latter is also known to cause severe safety hazards [32], [33]. As noted above, detection of such failure modes could be improved by concentrating on specific regions of the IV curve. Adding an attention layer to the LSTM could also direct the network's focus to characteristic regions of the IV curve for specific failures [34]. In recent work, this team evaluated the effectiveness of principal component analysis (PCA) and a random forest classifier on the same IV curves evaluated in this paper [19], obtaining similar accuracies (~99%). Both frameworks demonstrate the value of processing entire IV curves instead of a condensed set of features describing a curve (e.g., FF, I_sc, V_oc). However, the NN implementations will likely scale better to additional failure modes because of their ability to summarize dense information and the option of adding depth to the networks. This hypothesis will be evaluated in a future simulation study.
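An attention layer typically scores each time step and forms a weighted summary of the hidden states. The sketch below is a generic dot-product attention pooling in NumPy, offered as one possible form of the idea rather than the design in [34]; the scoring vector would be learned in practice and is random here.

```python
import numpy as np

def attention_pool(hidden_states, w):
    """Dot-product attention over time steps.

    hidden_states: (steps, units); w: (units,) scoring vector.
    Returns the weighted summary (units,) and the per-step weights (steps,).
    """
    scores = hidden_states @ w
    scores = scores - scores.max()                     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()    # softmax over steps
    return weights @ hidden_states, weights

rng = np.random.default_rng(2)
states = rng.normal(size=(10, 32))        # e.g., per-step LSTM hidden states
summary, weights = attention_pool(states, rng.normal(size=32))
```

The returned weights indicate which points on the IV curve the summary relies on, which could also help interpret the classifications.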
The NN analytics pipeline for IV curves could also be coupled with other datasets to improve fault classification. For example, the NN outputs could assign a fault to a single failure or to a group of possible failures with similar signatures. The latter can serve as a basis for triggering further characterization (e.g., through EL images or IR scans) of the faulty module to identify the specific failure and inform repair and replacement activities. Meteorological data could also be used as an NN input to differentiate between soiling and encapsulant delamination based on recent rainfall patterns.
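One way to turn a probability vector into either a single failure or a group of candidates is to keep the smallest set of classes whose probabilities reach a coverage level. The sketch assumes a hypothetical 95% coverage and the three classes studied here; a group of more than one class would be the trigger for follow-up EL or IR inspection.

```python
import numpy as np

CLASSES = ("BL", "PS", "Cr")

def candidate_failures(probs, coverage=0.95):
    """Smallest set of classes whose probabilities sum to at least `coverage`."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                  # most likely class first
    cum = np.cumsum(probs[order])
    n = int(np.searchsorted(cum, coverage) + 1)      # first position reaching coverage
    return [CLASSES[k] for k in order[:n]]

confident = candidate_failures([0.02, 0.97, 0.01])   # single class
ambiguous = candidate_failures([0.55, 0.02, 0.43])   # group of classes
needs_follow_up = len(ambiguous) > 1                 # e.g., schedule EL/IR imaging
```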
Finally, because a PV system is a network of many components, it can experience multiple failures simultaneously or as a cascade from an initiating fault. For example, regular partial shading can lead to bypass diode failure and hot spots, which can cause material wear-out and other problems. However, physically inducing every combination of known failure modes is both difficult and impractical. Therefore, physics-based techniques will need to be leveraged to simulate combinations of failures and generate a large training corpus for future work. Simulation-based approaches should continue to be validated against physically-induced failure modes (as demonstrated in this analysis) using available field data.

IV. CONCLUSION
This analysis demonstrates the capabilities of neural networks for classifying IV curves collected in the field. Specifically, multi-headed LSTMs and 1D CNNs classified the curves with high accuracy (99%+) despite a relatively small dataset (~400 samples). Although both have comparable accuracies, the multi-headed LSTMs trained in about half the time of the 1D CNNs. The high performance of both NNs is attributed to the domain-informed parameters used in the classification, which contributed to both model stability and accuracy; the irradiance threshold also influenced overall model accuracy, with 700 W/m² serving as the optimal filter. The analysis also revealed that, for certain architectures (e.g., single-headed LSTMs), focusing on specific portions of the IV curve can markedly improve classification. Future areas for exploration include improved data processing and algorithm design to better classify combinations of failures, including those with visibly similar IV curve profiles.

ACKNOWLEDGMENT
The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors.