A Novel Fault Identification Method for Photovoltaic Array via Convolutional Neural Network and Residual Gated Recurrent Unit

With photovoltaic (PV) power stations being built on a large scale, detecting and resolving module failures in time is crucial for extending the service life of modules and maintaining their normal operating efficiency. Based on an analysis of the differences among the I-V curves of PV arrays under different fault states, the I-V curves, temperatures, and irradiances are taken as input data, and a fusion model of a convolutional neural network (CNN) and a residual-gated recurrent unit (Res-GRU) is proposed to identify PV array faults. The model consists of a 1-dimensional CNN module with a 4-layer structure and a Res-GRU module. Its advantages are end-to-end fault diagnosis, no manual feature extraction, strong anti-interference ability, and usability in the absence of irradiances and temperatures. Moreover, it can identify not only single faults (e.g., short circuit, partial shading, and abnormal aging) but also hybrid faults. Experimental results show that the classification accuracy of the proposed method is 98.61%, which is better than that of the artificial neural network (ANN), the extreme learning machine with kernel function (KELM), the fuzzy C-means (FCM) clustering, the residual neural network (ResNet), and the stage-wise additive modeling using a multi-class exponential loss function based on the classification and regression tree (SAMME-CART). In addition, in the absence of temperatures and irradiances, the classification accuracy still reaches 95.23%, which indicates a broad application prospect in the online fault diagnosis of PV arrays.


I. INTRODUCTION
Photovoltaic (PV) power generation is a technology that converts solar energy into electric energy and is the most direct way of utilizing solar energy. It is not restricted by geographical conditions, is flexible in scale, safe, reliable, clean, and environmentally friendly, and occupies a prominent position among renewable energy sources. A report [1] published by the 21st Century Renewable Energy Policy Network shows that since the growth rate of PV surpassed that of other renewable energies for the first time in 2016, new installed capacity has been increasing continuously. In 2018, the global new installed capacity of PV power generation was 100 GW, with a total capacity of 505 GW, and the total installed capacity of global PV power generation is expected to reach 1 TW in 2021 [2]. In many countries, PV has become an important and growing source of power generation.
The associate editor coordinating the review of this manuscript and approving it for publication was Lefei Zhang.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The solar cell is the core component of a PV power generation system. It is a P-N junction semiconductor device, and its essential characteristics are similar to those of a diode. Its equivalent circuit is composed of a photo-generated current source, diodes, and resistors (shunt and series resistances) [3]. Since the power of a single cell is very small, a module is usually composed of multiple cells in series and parallel connections, and an array is in turn composed of modules in series and parallel connections. Therefore, the failure of cells or modules affects the power generation performance of the entire system. Although most PV power generation systems have achieved real-time monitoring of the operating status, they can only display statistical operating data and power outage events. In general, fault identification can only be realized through field equipment testing or through data interpretation and analysis by experienced engineers, and limited human resources cannot meet the demands of a massive PV market. Therefore, with the development of technology and the upgrading of the industry, PV system fault detection and diagnosis technology will become a driving force for the sustainable and healthy development of the PV industry, which is of great significance for promoting the expansion of PV power generation worldwide.
Current PV fault diagnosis methods can be divided into two categories: the visual & thermal method and the electrical method. For a healthy PV module, solar radiation makes its surface temperature evenly distributed. When the module fails or accumulates dust, the affected cells are forced to consume part of the energy, causing the cells to overheat and the temperature of the module to be irregularly distributed.
Therefore, a thermal imaging camera is a useful PV fault diagnosis tool [4]- [6]. Heraiz et al. [6] proposed a PV state assessment method by combining the thermal image and the convolutional neural network (CNN). The researchers investigated the unmanned aerial vehicles (UAV) to obtain PV thermal imaging data, and adopted the CNN algorithm to automatically detect and identify the relatively high-temperature region on the PV panel. Thus, it finally determined the hot spot location with high precision. In short, the visual & thermal method only can identify the abnormal heating caused by the fault, but cannot reliably identify the reason of the fault. Under normal circumstances, due to the different installation angles and manufacturing processes, each cell will have a temperature difference, which will affect the identification accuracy of the visual & thermal algorithm.
The electrical method can be further divided into the performance comparison method and the signal processing method. The performance comparison method distinguishes the normal state from the failure state by comparing the parameter characteristics of a PV array. The PV parameters include external parameters (the open-circuit voltage $V_{oc}$, the short-circuit current $I_{sc}$, and the voltage $V_m$ and current $I_m$ at the maximum power point), internal parameters (the photo-generated current $I_{ph}$, the ideality factor $A$, the parasitic diode saturation current $I_o$, the series resistance $R_s$, and the shunt resistance $R_{sh}$), the I-V curve, and the P-V curve. When a PV module encounters different faults, its internal and external parameters or the slope and shape of the I-V/P-V curves change [7], [8]. Therefore, through analyses and derivations, characteristic expressions and rules that describe various faults can be found. Pei and Hao [8] investigated a method to diagnose PV faults by observing and evaluating voltage and current changes. Voltage and current indexes were defined from $V_m$, $I_m$, $V_{oc}$, and $I_{sc}$. By comparing the index values collected in real time with threshold values, the fault state of the system can be judged. Because the thresholds depend on the power station scale and the module parameters, judging the fault state by threshold values generalizes poorly. When the module performance degrades with aging, the corresponding threshold values need to be reselected.
Signal processing is a method to identify and locate faults by decomposing waveform signals, and it is often used to address line-to-line faults, dynamic shading, and arc faults in multi-series systems [9]-[13]. The line-to-line fault is also called the mismatch fault; its fault characteristics are very similar to those of partial shading, so it cannot be easily identified by performance comparison. Because fault current flows back in a line-to-line fault, instantaneous waveform decomposition and frequency transformation are often used to extract fault features and identify fault types. Pillai and Rajasekar [12] used the uniqueness of the rightmost peak power (RPP) in PV array output characteristics to detect line-to-line and line-to-ground faults. To this end, a global perturbation & observation method was used to track the RPP. Experimental results in [12] showed that it could effectively detect the faults regardless of the mismatch level, the system type, or the system rating. It was also effective under low irradiance and partial shading. Kurukuru et al. [13] performed wavelet decomposition of voltage and current signals to extract the energy, the entropy, the peak power spectral density, and the kurtosis as features, and then trained a radial-basis-function (RBF) neural network to identify 14 kinds of faults. Most of these online signal processing methods are characterized by the amount of signal mutation at the moment of the fault: the corresponding signals are decomposed, and the patterns before and after the change are identified. In this way, they can identify faults without shutting down the system and avoid human-made power loss. However, if the fault occurs at night or during a system shutdown period, no waveform data can be extracted, so the methods in [9]-[13] cannot work.
In general, the application of artificial intelligence (AI) technologies based on data-driven mechanisms [14] helps to construct automatic fault classifiers and improves the efficiency and accuracy of faulty diagnoses. Recently, the methods of artificial neural network [15], probabilistic neural network [16], random forest [17], parity relation [18], observer [19], fuzzy inference system [20], extreme learning machine [21], and support vector machine [22] have been widely used in fault diagnoses for industrial physical systems and PV arrays. Jiang et al. [18] proposed a parity relationship algorithm based on residual generators to deal with the diagnostic problem of a Hammerstein nonlinear system, and verified it in the hot rolling mill fault identification. Wu et al. [19] designed an initial failure evaluation method based on the descriptor estimation to evaluate the status of high-speed railway traction devices. Then, a robust observer was also designed to early detect faults in the induction motor system of high-speed trains [23]. In order to identify five common fault types, Belaout et al. [20] dimensionalized the fault characteristics through neuro-fuzzy classifiers to extract the optimal combination of feature expressions for each fault type, and trained five classifiers to identify them one by one on the basis of considering all the feature parameters that can be extracted. Unfortunately, it requires using a normal value in the same environment for the feature normalization, and the normal value used in [20] should be obtained via numerical simulations, which leads to great difficulties in practical applications.
The basic framework for the application of traditional AI technologies is as follows. Firstly, it extracts the corresponding characteristics from the voltage, the current, and the power of a PV system, and then conducts the standardized processing. Finally, the AI algorithm is adopted to mine the difference of characteristics under different fault conditions for realizing the fault diagnoses. At present, more advanced AI algorithms have been developed to mine the differences between curves or images actively and recognize them, such as CNN [24], residual network (ResNet) [25], adversarial generative network [26], transfer learning [27] etc. Compared with traditional methods, these methods in [24]- [27] do not require the multi-step processing of the signal and realize the fast diagnosis. To address the problem of the lack of measured arc signal samples, Lu et al. [26] used the domain adaptive of deep convolutional generative adversarial network (DA-DCGAN) to realize the enhancement of arc samples from laboratory simulation to actual measurement environment. Then, the CNN was performed to identify serial arc fault. The deep learning model in [26] requires complex networks and deep structures, which brings difficulties to the selection and adjustment of hyper-parameters.
On the basis of reviewing and summarizing previous research results, this study integrates a variety of machine learning models, and takes the waveform of the I-V curve as input directly without extracting the internal and external parameters of a PV system. Moreover, it constructs an end-to-end shallow fault diagnosis model, and realizes the identification of single and hybrid faults of PV arrays. The framework of the proposed method is illustrated in Fig. 1. The first part is two-dimensional input data composed of I-V curve, irradiance, and temperature. Then, the second part is a multi-layer one-dimensional CNN network, whose role is to mine the shape characteristics of the I-V curve. Moreover, the third part is the residual-gated recurrent unit (ResGRU) network, which is responsible for the memory of time and space information of the input features. In addition, the fourth part is the output layer, which implements the fault classification. The main contributions of this research are summarized as follows.
(1) The convolutional neural network and the gated recurrent unit (GRU) are fused and modeled, and the input information is mined from the space and time dimensions, which improves the diagnostic accuracy.
(2) The ResGRU is used to replace the ordinary GRU for solving the problem of the network degradation, and the time correlation of input information is well preserved.
(3) A modeling method aimed at mining the difference of I-V curve shape under different faults is proposed. Further analyses show that the operation status of the PV array still can be accurately determined in the absence of meteorological information.
This study is organized into seven sections. Section II introduces the I-V curve characteristics of a PV system under different faults. Section III briefly describes the algorithms used in this study. Section IV elaborates on the framework and diagnosis process of the constructed model. Section V verifies and analyzes the proposed method through numerical simulations and measured data. Section VI emphasizes the advantages of the proposed method by comparing with other methods. The summaries are given in Section VII.

II. I-V CURVE CHARACTERISTICS OF PV FAILURES
A photovoltaic (PV) array is usually composed of multiple modules combined in series and parallel connections. The output dc voltage and current levels of the PV system are determined by the numbers of series and parallel connections. A household rooftop PV power generation system generally adopts a power supply structure of single-series PV modules. Each string is fed into an independent control unit for maximum power point tracking (MPPT) to achieve the maximum energy output. In commercial power stations, from the perspective of operation scale and cost control, energy is collected through MPPT control after many series strings are connected in parallel. This study mainly focuses on fault diagnosis in the single-series PV system, and takes 13 modules as an example to construct a simulation model based on the I-V curve test circuit in [21]. Under standard test conditions (STC, $G = 1000\,\mathrm{W/m^2}$, $T = 25\,^{\circ}\mathrm{C}$), the I-V curve performance of the PV array operated in various states, e.g., the normal state, the short-circuit (SC) fault, the partial-shading (PS) fault, the abnormal aging (Aa) fault, and hybrid faults, will be analyzed.
An SC fault refers to an accidental connection between two points of different potentials in the PV array. In a single-series PV system, the short-circuited modules have no energy output, and the array loses the energy of these modules. Compared with the normal state, both the open-circuit voltage $V_{oc}$ and the maximum power point voltage $V_m$ are reduced, and the voltage loss is proportional to the number of short-circuited strings. The curve remains smooth, and the short-circuit current $I_{sc}$ and the maximum power point current $I_m$ remain basically unchanged, as shown in curve 2 of Fig. 2(a). A PS fault means that the performance parameters of PV modules connected in series differ because of occlusion by objects or shadows, and the shaded modules overheat or even burn due to performance degradation. According to the conduction state of the bypass diode of the faulty string, the PS fault can be divided into two types: partial shading with bypass diode on (PSBO) and partial shading with bypass diode reversed (PSBR). When a PSBO fault occurs, the covered cells become a load, so the cell string where they are located outputs no current. The I-V curve of the array then presents a double-peak shape, as shown in curve 3 of Fig. 2(a), and the maximum power point is at the left peak. When a PSBR fault occurs, the output current of the array is restricted and equals that of the faulty modules. The I-V curve also shows a double peak, as shown in curve 4 of Fig. 2(a), and the maximum power point is at the right peak. In terms of external parameters, $V_{oc}$ and $I_{sc}$ are the same for the PSBO and PSBR faults. However, $V_m$ of the PSBR fault is greater than that of the PSBO fault, and $I_m$ of the PSBR fault is smaller than that of the PSBO fault. In addition, although both faults produce double peaks, the heights of the peaks differ significantly.
An Aa fault can be divided into abnormal series resistance and abnormal parallel resistance. Of these, abnormal aging of the series resistance is more probable and is easy to simulate, so it is the focus of this research. Abnormal aging of the series resistance means that the series resistance inside the cell suddenly becomes very large, making the voltage drop across the series resistance greater than the output voltage of the entire string of cells. This causes the bypass diode to conduct, and the output curve appears abnormal or even multi-peaked. From the perspective of external parameters, the Aa fault is the same as the PSBO fault, so the two are prone to being confused (as shown in curve 5 of Fig. 2(a)). Fortunately, near the open-circuit voltage point, the slope of the curve changes abnormally. Thus, the slope at the open-circuit point ($R_{oc}$) was used in [28] to characterize the aging degree of the module.
The hybrid-fault waveform of SC and PSBO (SC&PSBO) is depicted in curve 6 of Fig. 2(b), and its external parameters are the same as those of the SC fault. However, the curve of the SC&PSBO fault has double peaks. The hybrid-fault waveform of SC and PSBR (SC&PSBR) is depicted in curve 7 of Fig. 2(b). The values of $V_{oc}$, $I_m$, and $V_m$ are all reduced, and the short-circuit current $I_{sc}$ remains unchanged. Moreover, the curve presents a double-peak shape, but the second peak is higher than that of the SC&PSBO fault. In this way, the fault can be identified from either the external parameters or the shape of the I-V curve. The hybrid-fault waveform of SC and Aa (SC&Aa) is depicted in curve 8 of Fig. 2(b). Its external parameters are also the same as those of the SC fault, but the slope at the end of the curve is abnormal. The hybrid-fault waveform of Aa and PSBO (Aa&PSBO) is depicted in curve 9 of Fig. 2(b). Its external parameters are the same as those of the PSBO fault, and the slope at the end of the curve is also abnormal. The hybrid-fault waveform of Aa and PSBR (Aa&PSBR) is depicted in curve 10 of Fig. 2(b). Its external parameters are the same as those of the PSBR fault, but the slope at the end of the curve is abnormal. The hybrid-fault waveform of PSBO and PSBR (PSBO&PSBR) is depicted in curve 11 of Fig. 2(b). The voltage $V_m$ and current $I_m$ at the maximum power point are decreased, which is inconsistent with the other fault types. The open-circuit voltage $V_{oc}$ and the short-circuit current $I_{sc}$ remain unchanged, but there are more than three peaks on the curve, which makes it clearly different from the other fault types.
For a single fault, the external parameters of different fault types under the STC are obviously different, which helps to identify the fault type. For hybrid faults, the external parameters are no longer sufficient, and other indicators are required. The foregoing analyses show that the shape of the I-V curve differs obviously among fault types, and that the external parameters are themselves contained in the I-V curve. Therefore, diagnosing with the I-V curve as the input feature is the simplest and most direct way. Moreover, the I-V characteristic equation in [14] shows that temperatures and irradiances have a great influence on output voltages and currents. These effects are mainly manifested in the amplitude rather than the shape of the curve. For the sake of prudence, temperatures and irradiances are also used as input features, and the impact of their absence on the accuracy of the algorithm is discussed further. In short, this research abandons the traditional approach of mining or calculating key node indicators from the I-V curve. The I-V curve, temperature, and irradiance are directly used as the input data, and an end-to-end shallow machine learning model is established to realize simple and fast fault diagnosis.
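For reference, the external parameters discussed in this section can be read off a sampled I-V curve as sketched below. This is a minimal illustration rather than part of the proposed method; the function names, and the assumption that the samples are sorted by ascending voltage from the short-circuit point to the open-circuit point, are hypothetical.

```python
def external_parameters(voltage, current):
    """Return (V_oc, I_sc, V_m, I_m) from a sampled I-V curve.

    Assumes the samples run from the short-circuit point (lowest voltage)
    to the open-circuit point (highest voltage, current near zero).
    """
    if len(voltage) != len(current) or len(voltage) < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    i_sc = current[0]    # current at the lowest-voltage sample
    v_oc = voltage[-1]   # voltage at the last sample
    power = [v * i for v, i in zip(voltage, current)]
    k = max(range(len(power)), key=power.__getitem__)  # maximum power point
    return v_oc, i_sc, voltage[k], current[k]


def count_power_peaks(voltage, current):
    """Count strict local maxima of P = V * I over the interior samples."""
    power = [v * i for v, i in zip(voltage, current)]
    return sum(
        1
        for k in range(1, len(power) - 1)
        if power[k - 1] < power[k] > power[k + 1]
    )
```

A curve with more than one power peak is the kind of shape information that the four external parameters alone do not capture, which is the motivation for feeding the whole curve to the model.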

III. METHODOLOGY
A. CONVOLUTIONAL NEURAL NETWORK
Convolutional neural network (CNN) is generally used for the machine vision and the natural language processing. It has characterization learning capabilities and can extract high-level features from the input data. One-dimensional CNN (1-D CNN) can be used for the time-series data processing, and two-dimensional CNN (2-D CNN) can be used for the visual processing such as the image recognition [29]. The structure of the 1D-CNN is depicted in Fig. 3, which is composed of the input layer, the convolution layer, the pooling layer, the fully connected layer, and the output layer. The convolution layer extracts the features through convolution kernels of different sizes, and the pooling layer reduces the dimensionality of information by compressing data. The convolution layer and the pooling layer alternately appear for effectively extracting and retaining data features. The fully connected layer flattens the distributed features extracted from the different spaces to achieve regression or classification. The CNN focuses on the local feature extraction and reduces the number of weights through the parameter sharing, which greatly reduces the calculation parameters of the network.

B. GATED RECURRENT UNIT
The gated recurrent unit (GRU) and the long short-term memory (LSTM) can be regarded as variants of the recurrent neural network (RNN), which is usually used to deal with sequence problems. They can address the long-term memory problem of the traditional RNN and the gradient explosion problem in the back-propagation algorithm [30]. The GRU and the LSTM use a gate structure to replace the hidden unit in the standard RNN structure, which can selectively memorize important information and forget unimportant information. Whereas the LSTM has an input gate, a forget gate, and an output gate, the GRU uses only an update gate $z_t$ and a reset gate $r_t$ [31]. Provided that the prediction accuracy of the GRU is not lower than that of the LSTM, the number of training parameters is reduced and a faster convergence speed is achieved. The structures of the traditional RNN and the GRU are depicted in Fig. 4.
Intuitively, the reset gate $r_t$ determines how the new input is combined with the previous memory, and the update gate $z_t$ defines how much of the previous memory is carried over to the current time step. The larger the value of $z_t$, the more state information from the previous moment is retained at the current moment. The smaller the value of $r_t$, the more state information from the previous moment is forgotten [32]. Therefore, the working principle of the GRU can be summarized as follows. The first step is to calculate $z_t$ and $r_t$ according to the input state information $x_t$ at the current moment and the hidden layer information $h_{t-1}$ memorized at the previous moment. The second step is to use the reset gate to determine how much new information is stored in the node $\hat{h}_t$. The third step is to calculate the hidden layer output at the current moment through the update gate. The following formulas describe the calculation process of the GRU.
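A standard GRU formulation that uses the symbols defined next can be written as follows. The exact form is an assumption; in particular, the last line adopts the convention in which a larger $z_t$ retains more of $h_{t-1}$, in agreement with the description above.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \otimes h_{t-1}) + b_h\bigr) \\
h_t &= z_t \otimes h_{t-1} + (1 - z_t) \otimes \hat{h}_t
\end{aligned}
```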
where $\sigma$ is the sigmoid function; $W_z$, $W_r$, $W_h$, $U_z$, $U_r$, and $U_h$ are weight matrices; $b_z$, $b_r$, and $b_h$ are the bias values; $\hat{h}_t$ is the candidate state formed from the input state $x_t$ and the hidden layer output $h_{t-1}$ at the previous moment; $h_t$ is the output of the hidden layer at the current moment; and $\otimes$ is the Hadamard product.
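The three steps of the GRU can also be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses the standard GRU gate equations with a tanh candidate state and the convention that a larger $z_t$ keeps more of $h_{t-1}$, and the function names and the dictionary layout of the parameters are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gru_step(x_t, h_prev, p):
    """One GRU time step; p maps names such as "W_z", "U_z", "b_z" to values."""

    def affine(w, u, b, h):
        # W x_t + U h + b for the named parameter triple
        return [
            a + c + d
            for a, c, d in zip(matvec(p[w], x_t), matvec(p[u], h), p[b])
        ]

    # Step 1: update gate z_t and reset gate r_t from x_t and h_{t-1}
    z = [sigmoid(a) for a in affine("W_z", "U_z", "b_z", h_prev)]
    r = [sigmoid(a) for a in affine("W_r", "U_r", "b_r", h_prev)]
    # Step 2: candidate state from x_t and the reset-gated previous state
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_hat = [math.tanh(a) for a in affine("W_h", "U_h", "b_h", gated)]
    # Step 3: blend previous and candidate state (assumed convention:
    # a larger z_t keeps more of h_{t-1})
    return [zi * hp + (1.0 - zi) * hh for zi, hp, hh in zip(z, h_prev, h_hat)]
```

With all weights and biases set to zero, both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state; a large positive `b_z` drives $z_t$ toward 1 and leaves the previous state almost unchanged.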

C. RESIDUAL-GATED RECURRENT UNIT
The residual neural network (ResNet) can solve the problems of model performance degradation and non-convergence caused by network depth [33]. In the structure of the residual accumulation layer (as shown in Fig. 5(a)), it is assumed that the input is $x$ and the feature learned by the network is $H(x)$. It is hoped that the network can learn the residual $F(x) = H(x) - x$, so that the original learning feature of the network becomes $F(x) + x$. When the residual is zero, the accumulation layer only plays the role of identity mapping, which avoids the redundancy introduced by superfluous network layers. As for the case of gradient descent, the network can effectively deal with performance degradation. However, the residual is often not equal to zero in practice. The accumulation layer then learns new features on top of the input features and thereby achieves a better performance.
An ordinary GRU can solve the problem of gradient explosion to a great extent. However, once the amount of input data increases, the GRU also suffers from network degradation to some extent, resulting in the loss of some characteristics of the input information. To address this problem, a ResGRU network is proposed in this study. The GRU module is adopted in the residual block to extract the features of the time series. The structure of the ResGRU is depicted in Fig. 5(b), where the dashed line represents the dimension matching of the input and the output. Structurally, the output of the residual block is equal to the sum of the output of the last GRU layer and the input $x$. Assuming that the output of the last GRU layer is $y$, the output $y_R$ of the residual block can be expressed as
where relu(·) represents the ReLU activation function; BN(·) is the batch normalization function; $\gamma$ and $\beta$ represent two learnable variables in the function; and $g(\cdot)$ is the adjustment function that gives $x_t$ and $h_t$ the same dimension.
Through the residual connection, the ResGRU network can better remember the correlation between the information before and after the time-series data, and improve the classification performance of the network while retaining the characteristic information of the original data.
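The building blocks named in this subsection (batch normalization with the learnable $\gamma$ and $\beta$, the ReLU activation, and the residual sum with a dimension-matching function $g$) can be sketched in plain Python as below. This is an illustration under stated assumptions: the function names are hypothetical, the batch is a flat list of scalars, the default identity `g` is only a placeholder, and the order in which the ResGRU composes these operations is left to the caller rather than asserted here.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def residual_output(y, x, g=lambda x: x):
    """Add the shortcut g(x) to the branch output y, element by element."""
    shortcut = g(x)
    if len(shortcut) != len(y):
        raise ValueError("g(x) must match the dimension of y")
    return [a + b for a, b in zip(y, shortcut)]
```

When the branch output `y` is all zeros, `residual_output` returns `g(x)` unchanged, which is the identity-mapping behavior described for a zero residual.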

IV. NOVEL CNN-RESGRU MODEL
The analyses in Section II show that the I-V curve of the photovoltaic (PV) array differs significantly under different faults. By learning the shape of the I-V curve, single and hybrid faults of PV arrays can be identified. Traditional identification methods usually extract only key indexes from the I-V curve, such as the voltage $V_m$ and current $I_m$ at the maximum power point, the open-circuit voltage $V_{oc}$, and the short-circuit current $I_{sc}$, as fault characteristics. This destroys the sequential order of the I-V curve, so the patterns contained in the curve cannot be retained and reflected. The convolutional neural network (CNN) can fully mine the relationships among local features, and the ResGRU can memorize the mined temporal dynamic features, making it easier for the model to capture the correlation and dependence within I-V curves. Therefore, a novel PV fault identification method combining the CNN and the ResGRU is proposed. The fault characteristics and patterns contained in the I-V curve are mined and extracted effectively by the end-to-end machine learning model, which simplifies the diagnosis process and improves the identification accuracy.

A. MODEL ARCHITECTURE
The proposed diagnostic model is depicted in Fig. 6 and includes the 1-D CNN module, the ResGRU module, and the fully connected module. Specifically, it is mainly composed of the input layer, the convolution layer, the pooling layer, the ResGRU layer, and the output layer. The voltage, current, irradiance, and temperature information is processed by the convolution layer to extract features, which are then reduced by the pooling layer. After that, the ResGRU module memorizes and digests the patterns contained in these features, and the fully connected layer finally performs the classification. Let $S_0$ be the input sequence matrix and $S_i$ be the $i$th output sequence matrix. The layer-wise description of the model is as follows.
(1) Input layer. The voltage, current, irradiance, and temperature information is used as the input data of the input layer. To enable the convolution layer to acquire the voltage, current, irradiance, and temperature information simultaneously when scanning the data, the single-point irradiance and temperature are each expanded to a vector with the same length as the voltage and current vectors, and all four are integrated into an $n \times 4$ matrix $S_0 = [V, I, T, I_{rr}]$, where $V$ is the voltage vector, $I$ is the current vector, $T$ is the temperature vector, $I_{rr}$ is the irradiance vector, and the length of each vector is $n$.
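The expansion of the single-point temperature and irradiance into columns of the input matrix can be sketched as follows. The function name is hypothetical, and the matrix is represented as a plain list of rows purely for illustration.

```python
def build_input_matrix(voltage, current, temperature, irradiance):
    """Return S_0 as n rows of [V, I, T, Irr].

    temperature and irradiance are single measurements that are repeated
    on every row so that each row carries all four quantities.
    """
    if len(voltage) != len(current):
        raise ValueError("voltage and current must have the same length")
    return [[v, i, temperature, irradiance] for v, i in zip(voltage, current)]
```

Repeating the two scalars on every row means that a convolution kernel spanning all four columns sees the meteorological context at every position along the curve.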
(2) Convolution layer. The function of the convolution layer is to extract local features from the sequence data. The filter of a one-dimensional convolution slides along the time axis, so the convolution kernel is generally designed as a rectangle. In the CNN, the convolution layer and the pooling layer usually appear alternately. It is assumed that $S_i$ ($i$ is odd) is the output matrix of the convolutional layer, which can be described as
$$S_i = f(w_i * S_{i-1} + b_i)$$
where $w_i$ is the weight of the $i$th layer, $b_i$ is the bias of the $i$th layer, $*$ denotes the convolution operation, and $f(\cdot)$ is the activation function.
(3) Pooling layer. The pooling layer is mainly used to compress the features extracted by the convolution layer, which reduces the information dimension and decreases the probability of network overfitting. Maximum pooling and average pooling are the most common methods. In practice, the former usually performs better than the latter. In this study, a rectangular maximum pooling function ($z \times 1$) is used, and the maximum value in the pooling core is retained as the output feature to reduce the computational complexity of the upper hidden layer. The output matrix $S_m$ ($m$ is an even number starting from 2) of the pooling layer can be expressed as
$$S_m = Y(S_{m-1})$$
where $Y$ refers to the maximum pooling function. The size of $S_m$ is $p/z \times q$, where $p \times q$ is the feature size of layer $S_{m-1}$ and $z$ is the scale of the current pooling layer.
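A convolution layer followed by a maximum pooling layer can be sketched in plain Python as below. This is a simplified illustration with a single filter, not the configuration of the proposed model; the function names are hypothetical, the ReLU default for $f(\cdot)$ is an assumption, and, as in most deep learning libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
def conv1d(seq, kernel, bias=0.0, activation=lambda a: max(0.0, a)):
    """Valid 1-D convolution of an n x c sequence with one k x c kernel.

    The kernel slides along the first (time) axis only and covers all c
    columns at once, so the output has n - k + 1 entries.
    """
    k = len(kernel)
    out = []
    for start in range(len(seq) - k + 1):
        total = bias
        for row, w_row in zip(seq[start:start + k], kernel):
            total += sum(x * w for x, w in zip(row, w_row))
        out.append(activation(total))
    return out


def max_pool1d(values, z):
    """Non-overlapping max pooling with window z; a trailing partial window is dropped."""
    return [max(values[s:s + z]) for s in range(0, len(values) - z + 1, z)]
```

For a feature of length $p$ that is divisible by $z$, `max_pool1d` returns $p/z$ values, matching the output size stated for the pooling layer.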
(4) ResGRU layer. The ResGRU layer learns the feature vectors extracted by the CNN and memorizes the internal patterns of different features. The residual block is composed of two GRU modules. Batch normalization is performed after each GRU module, so that each GRU module has an independent parameter adjustment ability, which speeds up model convergence. Moreover, the batch normalization of the first module is followed by an activation layer with ReLU as the activation function.
The 1-D CNN module outputs a two-dimensional feature with a time dimension and a spatial (channel) dimension. The function of the ResGRU is as follows. The first GRU memorizes and mines the spatial dimension information and returns data of the complete time dimension. The output is transposed and fed into the second GRU, which memorizes and mines the time dimension information. As a result of this processing, however, the input and output dimensions of the residual block do not match, so dimension matching needs to be completed in the identity mapping part. The ResNet handles this with a 1 × 1 convolution kernel that reduces the channel dimension. However, a simple convolution cannot be used in this study because the time scale of the output data changes. For this reason, a GRU module is used in the identity mapping to achieve dimension matching. Compared with ordinary GRU units, the ResGRU not only has an independent residual learning function, but also extracts features of different time and spatial scales from the original information through multiple connected pipelines.
(5) Output layer. It is essentially a fully connected layer whose role is to classify, so the softmax is selected as the activation function. At this layer, the model calculates the probability that the input sample belongs to each category, and then obtains a new expression ($y_{predict}$),
where f(·) represents the softmax activation function; $l_i$ represents the probability that the input sample belongs to the i-th category; W represents the weight, and b is the bias value. As for the model training, random numbers are generally used to initialize the weight matrices and the bias values to ensure that the parameters are not repeated and do not differ greatly. In this study, the he_normal initializer in [34] is used for this purpose, so that the data has a good constant variance when input to the first convolution layer. Moreover, the L2 regularization in [35] is applied at the output layer to accelerate the convergence of the network and prevent overfitting. In addition, the cross-entropy is adopted to process the output probability model.
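Two of the layer operations described above, the (z × 1) maximum pooling and the softmax output with a cross-entropy loss, can be illustrated with a minimal pure-Python sketch. It is not the Keras implementation used in the study. The function names, the non-overlapping stride equal to z, the dropping of an incomplete trailing window, and the toy numbers are all assumptions made for illustration.

```python
import math


def max_pool_z_by_1(features, z):
    """Apply (z x 1) max pooling to a p x q matrix given as a list of rows.

    Assumption: the stride equals z (non-overlapping windows) and an
    incomplete trailing window is dropped, so the output is (p // z) x q.
    """
    p = len(features)
    q = len(features[0]) if p else 0
    pooled = []
    for start in range(0, p - z + 1, z):
        window = features[start:start + z]
        # Keep the maximum of each channel (column) inside the window.
        pooled.append([max(row[c] for row in window) for c in range(q)])
    return pooled


def softmax(logits):
    """Convert raw output-layer scores into class probabilities l_i."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index, eps=1e-12):
    """Cross-entropy loss for one sample whose true label is one-hot."""
    return -math.log(max(probs[true_index], eps))


# Toy feature map: p = 4 time steps, q = 2 channels, pooled with z = 2.
s_prev = [[1.0, 8.0],
          [3.0, 2.0],
          [5.0, 4.0],
          [0.0, 6.0]]
s_m = max_pool_z_by_1(s_prev, z=2)

# Toy output-layer scores for three hypothetical categories.
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)
predicted = probs.index(max(probs))
loss = cross_entropy(probs, true_index=0)
```

In this sketch the pooled matrix has p/z = 2 rows and q = 2 columns, and the predicted category is simply the index of the largest softmax probability.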

B. LOSS FUNCTION
According to the aforementioned description, the training and diagnostic process of the PV fault diagnosis model in this study can be summarized as follows.
S1: Collect the I-V curve when the PV array is off-grid, and record the temperature and irradiance at the time of measurement.
S2: The temperature T and the irradiance $I_{rr}$ are expanded and combined with the I-V curve into an n × 4 matrix as the input data for the diagnostic model.
S3: The database is divided into the training set, the validation set, and the testing set.
S4: Feed the data in the training set into the CNN-ResGRU model and train the model parameters according to the random search algorithm. Moreover, the data in the validation set is used to make a preliminary assessment of the model to determine whether it should be re-calibrated or restructured.
S5: Test the trained diagnostic model with the data in the testing set to evaluate the diagnostic accuracy of the model.
VOLUME 8, 2020

V. VALIDATION AND ANALYSIS
A. NUMERICAL SIMULATIONS
1) MODEL CONSTRUCTION
The MATLAB/SIMULINK software is used to build a simulation platform for generating numerical simulation data. Its structure is depicted in Fig. 7. The photovoltaic (PV) array is made up of 13 modules connected in series; each module is made up of 60 cells connected in series, and every 20 cells are connected in parallel with a reverse bypass diode. The parameters of the module are summarized in Table 1. The output voltage is controlled to change linearly from 0 to $V_{oc}$ through a controlled voltage source, and the voltage and current oscilloscopes record the output of the PV array to draw the I-V curve. The voltage and current sequences obtained from this simulation system each have a length of 1000.
The numerical simulations provide data for four single faults, namely the short circuit (SC), the partial shading with bypass diode on (PSBO), the partial shading with bypass diode reversed (PSBR), and the abnormal aging (Aa), as well as six hybrid faults composed of arbitrary combinations of two single faults. As shown in Fig. 7, the number of short-circuited modules is controlled by short-circuiting wires. The PSBO and PSBR faults are generated by adjusting the temperature and irradiance of a single cell in the PV array, and the Aa fault is simulated by adjusting the amplitude of the aging resistance. Finally, a total of 1320 samples are extracted as the data set. The fault types and the corresponding sample numbers are summarized in Table 2, where the SC fault covers one to three short-circuited modules, and each type has 40 samples. The ratio of the training set, the validation set, and the testing set is 6:2:2 in this study.
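Steps S2 and S3 of the diagnostic process, applied to a data set such as the one above, can be sketched as follows. The column order (V, I, T, I_rr), the per-category shuffling, the rounding of the split sizes, and all function names are assumptions for illustration; the source only states that T and I_rr are expanded to form an n × 4 matrix and that the split ratio is 6:2:2.

```python
import random


def build_input(voltage, current, temperature, irradiance):
    """Stack an I-V curve with the scalar T and I_rr into an n x 4 matrix.

    The scalars are repeated n times so that every row is [V, I, T, I_rr].
    """
    if len(voltage) != len(current):
        raise ValueError("voltage and current must have the same length")
    return [[v, i, temperature, irradiance] for v, i in zip(voltage, current)]


def split_622(samples_by_class, seed=0):
    """Split each category 6:2:2 into training, validation, and testing sets."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_train = round(0.6 * len(shuffled))
        n_val = round(0.2 * len(shuffled))
        train += [(s, label) for s in shuffled[:n_train]]
        val += [(s, label) for s in shuffled[n_train:n_train + n_val]]
        test += [(s, label) for s in shuffled[n_train + n_val:]]
    return train, val, test


# Toy curve with n = 5 points (the simulated curves have n = 1000).
x = build_input([0, 10, 20, 30, 40], [8.0, 7.9, 7.5, 5.0, 0.0], 25.0, 800.0)

# 40 placeholder samples for each of two hypothetical categories.
data = {"normal": list(range(40)), "SC": list(range(40))}
train, val, test = split_622(data)
```

Splitting inside each category keeps the class proportions equal across the three subsets, which is one plausible way to realize a fixed ratio per category.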

2) SELECTION OF HYPER-PARAMETERS
The PV fault diagnosis model is built according to Section IV, and all the work is done on the Keras platform. The server is equipped with a Xeon W-2123 CPU, two GTX 1080 Ti GPUs, and 32 GB of RAM. A random search is performed to tune the parameters of the constructed network, and the finally selected network hyper-parameters are summarized in Table 3. During the training process, the maximum number of epochs is 1000, the initial learning rate of the Adam optimizer is 1e-4, and the batch size is 32.

3) FEATURE VISUALIZATION
In order to verify the effectiveness of the proposed method in feature extraction and the rationality of the model parameter design, the t-distributed stochastic neighbour embedding (t-SNE) [36] is used to visualize the distribution of the features that the model extracts from the samples in the training set. In the visualization process, the principal component analysis (PCA) is used for dimensionality reduction. Figure 8 shows the two-dimensional visualization results given by the t-SNE scheme. Figure 8(b) is the visualization of the data after the 1-D CNN module. At this stage, there is still no obvious distinction between the data, but clusters are slowly forming, which shows that the features extracted by the CNN are not yet clear enough and need further mining. The data shown in Fig. 8(c) have undergone residual block processing, and different types of data gradually become clearly distinguishable, although data of the same type are not yet closely clustered. Figure 8(d) is the visualization of the final output data. The distinction between different types of data is very high, and similar data form clusters. Except for individual discrete points, the rest of the data are successfully and accurately classified. The above change process shows that the proposed method gradually enhances the feature recognition through layer-wise feature extraction.
With just a few network layers, highly recognizable features can be extracted, which reflects the high-quality feature mining ability of the proposed method.

4) ANALYSES OF TRAINING AND TESTING RESULTS
Accuracy and recall are commonly used as evaluation indicators in the field of machine learning. The accuracy reflects the proportion of samples that are correctly identified, and the recall reflects the proportion of positive samples that are correctly predicted. The calculation formulas [37] of the accuracy and the recall can be expressed as
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
$\mathrm{Recall} = \frac{TP}{TP + FN}$
where TP is the true positive category, FN is the false negative category, FP is the false positive category, and TN is the true negative category.
Figure 9(a) and (b) are the accuracy curve and the loss curve after 1000 iterations. The final training accuracy of the model reaches 100%. When the number of iterations is less than 100, the accuracy and the loss change rapidly and converge quickly. When the number of iterations exceeds 100, the accuracy and the loss gradually stabilize. According to the results in Fig. 9, the epoch can be set as 500.
The trained model is used to predict the categories of the testing set. In order to evaluate the classification results more intuitively, a confusion matrix is used to express the relationship between the prediction result and the true label, and the generated confusion matrix is normalized by row to obtain a standardized one. The corresponding result is depicted in Fig. 10. The values in Fig. 10 represent the probability that a sample of the actual category is predicted as a given label, and the diagonal is the recall of each category. The model has a recall of 1 for each category, and no sample is missed. The testing result reflects the high recognition accuracy of the proposed method.
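A row-normalized confusion matrix of the kind described above can be computed as in the following sketch. The label names and the toy predictions are hypothetical and are not taken from the paper's data. The diagonal of the normalized matrix gives the per-category recall TP/(TP + FN).

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count how often each true label (row) is predicted as each label (column)."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide every row by its sum so that each non-empty row sums to 1."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


labels = ["normal", "SC", "Aa"]
y_true = ["normal", "normal", "SC", "SC", "Aa", "Aa", "Aa", "Aa"]
y_pred = ["normal", "normal", "SC", "SC", "Aa", "Aa", "Aa", "normal"]

cm = confusion_matrix(y_true, y_pred, labels)
norm = normalize_rows(cm)

# Diagonal of the row-normalized matrix = recall of each category.
recall = {label: norm[k][k] for k, label in enumerate(labels)}
# Overall accuracy = correctly classified samples / all samples.
accuracy = sum(cm[k][k] for k in range(len(labels))) / len(y_true)
```

In this toy case one of the four Aa samples is predicted as normal, so the Aa recall is 0.75 while the other two categories have a recall of 1.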

B. EXPERIMENTAL VERIFICATION
1) EXPERIMENTAL PLATFORM
In order to further verify the performance of the proposed method in practical applications, a PV power generation system is built to simulate a series of failures that the PV array may encounter. The photograph of the experimental platform is depicted in Fig. 11. The capacity of the PV array is 3.38 kWp; it is composed of 13 modules connected in series, and each module is rated at 260 W. The parameters of the module are summarized in Table 4. A PROVA-1011 solar system analyzer is used to collect the I-V curve of the PV array, and the matching sensor is used to measure the solar irradiance and the back temperature of the module. Two Y-type taps are short-connected to produce the short-circuit (SC) fault, i.e., the modules between the two taps are short-circuited. Broken debris such as small bricks is placed on a module to generate the partial shading with bypass diode reversed (PSBR) fault. For this PSBR fault, the shading area is small and is not enough to make the bypass diode of the faulty string conduct. Some modules are covered by film and paper to cause the partial shading with bypass diode on (PSBO) fault. For this PSBO fault, the shading is more serious, and the bypass diodes of the faulty strings are all in the conducting state. A sliding rheostat is connected in series with the module as an aging resistor to simulate the abnormal aging (Aa) fault of the module. During the experiment, the irradiance ranges from about 150 W/m² to 1000 W/m².
The fault types simulated in the experiment are listed in Table 5, where ten faulty states and one normal state are included. A total of 1892 samples are collected to form a data set, in which the voltage and current sequences of each sample have a length of 149. Each category is divided at a ratio of 6:2:2, i.e., 1136 samples are randomly selected as the training set, 379 samples as the validation set, and 377 samples as the testing set. Preliminary tuning found that the diagnostic model established with the simulation data can be trained directly on the measured data to obtain good classification results. Because the two input data dimensions are inconsistent, the number of neurons in the input layer of the measured data diagnosis model is changed to 11, and the other hyper-parameters are the same as in Table 3. Finally, the training set and the validation set are used to train the weight parameters of the measured data model.

2) FEATURE VISUALIZATION
The t-SNE is also used to analyze the feature extraction effect of the model on the measured training data set, and the result is depicted in Fig. 12. Figure 12(a) is the visualization result of the input data. The data of the SC&PSBO, the PSBO, and the PSBO&Aa are already recognizable to a certain degree from the beginning, while the data of other categories are still scattered. The reason may be that the simulation data generation conditions are relatively ideal, and the range of weather conditions that can be covered is relatively wide, so the data is relatively scattered. The measured data is restricted by the collection conditions, and the meteorological conditions are relatively concentrated, which makes certain data recognizable to a certain degree. Figure 12(b) is the visualization result of the data processed by the 1-D CNN. Although the discrimination between different categories of data is not significantly improved compared with Fig. 12(a), the distance between different clusters starts to become larger, and the data of the same cluster begin to gather. The data shown in Fig. 12(c) has been processed by the ResGRU. Except for the normal and Aa samples, the data of the other categories are clearly distinguishable. Figure 12(d) shows the distribution of the final output data, where the best discrimination between different categories is achieved. However, there is a small amount of overlap between the normal category samples and the Aa category samples, and a small number of PSBR category samples appear in the space of the normal category samples.
Independent observation of these confused samples found that a small number of problem samples appeared in the PSBR and Aa samples, and their I-V curves are depicted in Fig. 13(b) and (d). Under normal circumstances, the PSBR category curve has obvious double peaks (as shown in the curve 4 of Fig. 2(a)), but the double peaks of the PSBR problem sample can hardly be observed (as shown in Fig. 13(b)). Its short-circuit current is 6.2 A, which means that the irradiance has reached 620 W/m², so this is not a condition of insufficient sunlight. This curve is easily confused with the normal category sample (as shown in Fig. 13(a)). For the problem sample of the Aa fault shown in Fig. 13(d), the slope at the open-circuit voltage changes only slightly. Careful observation reveals that there is only a slight mutation in the last two points. In contrast, the slope at the open-circuit voltage of a typical Aa sample changes significantly (as shown in Fig. 13(c)). As a result, the probability of identifying the aforementioned problem samples as the PSBR fault or the Aa fault based on the shape and external parameters may be very low. During the experiment, the failure mode is fixed, and then the solar system analyzer is used to collect data at intervals as the irradiance changes. Therefore, the failure categories of these problem samples should be correct.
Since it takes about 20 s for the solar system analyzer to collect one complete set of data, there may be two reasons for the appearance of problem samples: (1) during the experiment, the waveform shape changed due to the irradiance variation; (2) the analyzer adopts a non-equal-interval collection mechanism (e.g., dense at the front and sparse at the back), and the data points collected near the open circuit are too few, so the slope change is not obvious. In view of this, this research still keeps the problem samples in the data set to evaluate the recognition accuracy of multiple identification methods. It is worth noting that the I-V curve of the measured sample has a gradient drop process at the starting position of the short-circuit current. After testing, it is found that this process is caused by the hardware acquisition device. If other fast I-V acquisition instruments are used, the gradient would not be obvious. Fortunately, the proposed method can eliminate the adverse effects of this defect and make an accurate classification.

3) ANALYSES OF TRAINING AND TESTING RESULTS
The accuracy and loss curves of the model over 1000 iterations on the training set are depicted in Fig. 14. The final training accuracy is 99.65%. The accuracy and the loss of the training set converge quickly after 100 iterations, and the stability is higher than that of the numerical simulation model. The final epoch is also set to 500, which is the same as for the numerical simulation model.
The data in the testing set is used to examine the trained diagnostic model, and the standardized confusion matrix shown in Fig. 15 is again used to display the results. The diagonal elements of the matrix represent the recall of each category. The overall accuracy of the measured data is 98.41%. The recall of the normal category is 0.94, and the recall of the Aa category is 0.75. In other words, some normal samples are misjudged as the PSBR and Aa categories. Similarly, some Aa samples are misjudged as the normal and PSBR categories. The reason for the normal sample to be misjudged as the PSBR category is that the double peaks of the above-mentioned part of the PSBR samples are not obvious and are close to the normal sample. The slope change at the open-circuit point of the Aa sample is not obvious either, which is also close to the normal sample. Therefore, in the training process, the features mined from these three categories tend to be homogeneous, resulting in misjudgment in the test. It can be seen from Fig. 15 that the proposed method achieves 100% recognition of most fault types.
The measured data used for modeling was collected in May 2018. Generally speaking, the performance of PV modules is affected by seasons and service life, resulting in differences in data distribution at different times. For example, at noon, the solar irradiance in summer is generally greater than that in winter, and the panel temperature is significantly higher. As the operating time increases, the PV module suffers a power attenuation of 1%-2% per year. For conventional machine learning algorithms, the growing data disparity will cause the model to gradually fail. In order to verify the reliability of the proposed algorithm, 270 samples collected from April 2019 to August 2020 by the same experimental equipment are further used for testing. Experimental results in Table 6 show that the season and power attenuation have little effect on the identification accuracy. The reason is that the proposed method relies on the shape of the I-V curve rather than the value of the electrical quantity. The season and power attenuation may affect the value, but they do not change the shape of the curve.

4) IMPACT OF DATA MISSING
With the development of PV inverter technologies, more and more inverters have been equipped with I-V curve scanning and data recording functions. Therefore, the proposed method is expected to be applied to online fault diagnoses. At present, most PV power stations are not equipped with environmental sensors. Even if a small number of power stations are equipped with independent environmental sensors, they cannot achieve the synchronous collection of I-V curves and environmental information. In other words, under existing conditions, it is still difficult to obtain the I-V curve, irradiance, and panel temperature information online simultaneously. Whether the proposed method retains the same recognition ability when only I-V curve data is available needs further verification. Because the short-circuit current and the open-circuit voltage of the array are respectively related to the irradiance and the temperature, whether they can substitute for the latter is also worth exploring.
The cross-correlation analyses of the short-circuit current, the open-circuit voltage, the irradiance, and the temperature are carried out on the measured data set. The Pearson correlation coefficients of the above four variables are summarized in Table 7, which reflects how closely the different variables are related. The short-circuit current and the irradiance have a strong correlation, with a correlation coefficient of up to 0.99. The correlation between the temperature and the open-circuit voltage is weak at -0.39. In other words, the short-circuit current can be used instead when the irradiance is absent.
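The Pearson correlation coefficient used in this analysis can be computed as in the sketch below. The readings are invented toy values chosen only to show a strongly positive and a negative correlation; they are not the measured data behind Table 7, and the function name is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings: I_sc rises almost linearly with the irradiance,
# while V_oc tends to fall as the panel temperature increases.
irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]
i_sc = [1.9, 4.1, 6.0, 8.1, 9.9]
temperature = [20.0, 30.0, 40.0, 50.0, 60.0]
v_oc = [480.0, 470.0, 472.0, 455.0, 450.0]

r_irr_isc = pearson(irradiance, i_sc)
r_temp_voc = pearson(temperature, v_oc)
```

A coefficient close to 1, as for the irradiance and I_sc pair here, is what justifies using one variable as a stand-in for the other.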
Therefore, this study sets four cases of missing and filled input data, and analyzes the diagnostic effect via modelling verifications. By using the same data set and division ratio, the results can be compared with the above model built on complete data. The four cases are: case 1, irradiance missing; case 2, temperature missing; case 3, both irradiance and temperature missing; and case 4, both irradiance and temperature missing, with the short-circuit current used instead of the irradiance. Table 8 gives the test results of the various modelling methods. The lack of the irradiance and/or the temperature has a limited impact on the proposed method. If only a single piece of information is missing, the accuracy is reduced by up to 2.12%. If both are missing, the accuracy is reduced by 3.18%. In case 4, the accuracy is improved by 1.32% compared with case 3, which shows that using the short-circuit current to replace the irradiance has a certain effect. In fact, the short-circuit current can be easily read from the I-V curve. In addition, if only the temperature is missing, the recognition accuracy declines very little, indicating that the temperature plays a small role in the entire recognition system. In short, if the irradiance and temperature cannot be obtained, using only the I-V curve as input can still achieve a high diagnostic accuracy. This conclusion further expands the application of the proposed method.
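The four input cases can be illustrated with the following sketch. It assumes that a missing quantity is simply dropped from the input columns and that the short-circuit current is approximated by the current at the lowest-voltage point of the curve. Neither detail is specified in the source, and the function names are hypothetical.

```python
def estimate_isc(voltage, current):
    """Take the current at the lowest-voltage point as the short-circuit current."""
    k = min(range(len(voltage)), key=lambda j: voltage[j])
    return current[k]


def build_case_input(voltage, current, temperature, irradiance, case):
    """Build model input rows for the complete-data case (0) and cases 1 to 4.

    0: [V, I, T, I_rr]   1: irradiance missing   2: temperature missing
    3: both missing      4: both missing, I_sc used in place of the irradiance
    """
    extras = {
        0: [temperature, irradiance],
        1: [temperature],
        2: [irradiance],
        3: [],
        4: [estimate_isc(voltage, current)],
    }[case]
    return [[v, i] + extras for v, i in zip(voltage, current)]


# Toy I-V curve; the values are illustrative only.
v = [0.0, 100.0, 200.0, 300.0, 400.0]
i = [6.2, 6.1, 6.0, 4.0, 0.0]
full = build_case_input(v, i, 35.0, 620.0, case=0)
case3 = build_case_input(v, i, 35.0, 620.0, case=3)
case4 = build_case_input(v, i, 35.0, 620.0, case=4)
```

Case 4 needs no extra sensor because the substitute column is derived from the I-V curve itself.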

A. PERFORMANCE COMPARISON OF DIFFERENT FUNCTIONAL MODULES
In order to further analyze and compare the function of each module in the proposed method, three deep learning models are constructed, namely the CNN model, the ResGRU model, and the CNN-GRU model. The CNN model uses the LeNet-5 structure based on one-dimensional data, and the ResGRU model uses two residual block structures. The CNN-GRU model consists of a four-layer CNN network and a one-layer GRU [29]. The same measured data set and division ratio as before are used for the performance comparisons. Table 9 shows the execution results of the four models. The proposed fusion model performs best on both the training set and the testing set. Although the CNN model has the shortest execution time, its accuracy on the testing set is nearly 4% behind the proposed method. The accuracy of the ResGRU model on the testing set is close to that of the proposed method, but the training time of each epoch is very long, which increases the difficulty of tuning parameters. The accuracy of the CNN-GRU on the testing set is 13% lower than that on the training set, which clearly indicates an overfitting problem.
In short, the CNN is able to fully explore the relationship between various local data features, but it requires a greater network depth. The ResGRU can memorize and model the temporal dynamic characteristics, making it easier for the model to capture the interrelation and dependence between I-V curves. However, its multiple residual modules make the training and execution time much longer. The proposed method combines the advantages of the CNN and the ResGRU: the CNN mines the features, and the ResGRU memorizes the relationships between them, so that a good classification effect can be achieved with less network depth and shorter training time.

B. COMPARED WITH OTHER METHODS
In order to evaluate the performance of the proposed method, it is compared with the other five methods used in [15], [21], [25], [28] and [38] from both qualitative and quantitative aspects. The qualitative results are summarized in Table 10, and the quantitative results are listed in Table 11. The motive of this comparison is not only to show that the proposed method has advantages in diagnostic accuracy, but also to show its optimal comprehensive performance by analyzing the similarities and differences of the various methods. Table 10 compares whether each method diagnoses multiple faults, uses the weather data, and considers the low-irradiance factor. Moreover, the method type and the input data amount are also listed in Table 10. Note that the methods in [15], [21], [25], [28] and [38] are applied to the same dataset used in this study, and the accuracies are not taken from the references. In other words, the training set and the testing set are the same in Table 11 for fair comparisons.
As shown in Table 10, Chine et al. [15] adopted a two-stage strategy to identify the PV faults. The method in [15] calculates the difference between the actual power and the theoretical power to determine whether a fault has occurred, and then uses the threshold method and the ANN network respectively to determine the specific fault type. However, only a few fault types are diagnosed in [15]. When the fault types increase, additional thresholds and rules must be redesigned. Chen et al. [21] combined the random artificial bee pollination and the Nelder-Mead simplex optimization method to calculate the ideal factor ($n_n$), the series resistance ($R_s$) and the parameter estimation error (RMSE), and adopted the voltage ($V_m$) and current ($I_m$) at the maximum power point, the open-circuit voltage ($V_{oc}$), and the short-circuit current ($I_{sc}$) to form the input data set. Moreover, the extreme learning machine with kernel functions was used to identify the normal state, the abnormal aging fault, the short-circuit fault, the partial shading fault, and the open-circuit fault. Chen et al. [25] also proposed the use of the deep residual network (ResNet) combined with I-V curves for PV fault diagnoses. The data of I-V curves, temperatures, and irradiances were used as inputs to the model in [25]. Because an overly long input lengthens the training time, the data was first down-sampled non-uniformly to make the data points sparse and uniform, and finally a 4 × 40-dimensional input data was obtained. On this basis, a 34-layer ResNet model was established to identify single faults such as partial shading, abnormal aging, short circuit, and open circuit. By using the trust-region affine method, Huang et al. [28] optimally solved the nonlinear least square method and realized the standardization of external parameters of a PV system. With the normalized external parameters as the characteristics, the stage-wise additive modeling using multi-class exponential loss function based on the classification and regression tree (SAMME-CART) algorithm was used to realize the PV fault classification. Zhao et al. [38] proposed a PV array fault diagnostic method based on the fuzzy C-means (FCM) clustering and the fuzzy membership algorithm. The power ($P_m$), voltage ($V_m$) and current ($I_m$) at the maximum power point, the open-circuit voltage ($V_{oc}$), and the short-circuit current ($I_{sc}$) were selected as the input data. The FCM was used to cluster fault samples, and then the fuzzy membership was investigated to determine the clustering center distribution of all fault samples. Finally, the threshold method was applied to realize the short-circuit and partial-shading fault identification. Although Chen et al. [25] use the same input data as the proposed method, their method has the same problems as [21] and [38], that is, it does not consider the identification of hybrid faults and does not discuss the impact of missing environmental data. In addition, [15], [21], [28] and [38] are typical methods that manually extract indicators from the I-V curve as the input data.
When modelling, the structure and hyper-parameters of the ResNet model are consistent with [25]. Because the types of faults judged by Chine et al. [15] are relatively few, the features used there cannot cope with the single and hybrid fault identification considered in this study. Therefore, on the basis of the original features in [15], the power ($P_m$) at the maximum power point, the fill factor (FF), and the ratio of the voltage ($V_m$) at the maximum power point to the open-circuit voltage ($V_{oc}$) are added to the method in [15] for fair comparisons. The test results of the six methods are shown in Table 11 and Fig. 16. The results show that the proposed method is significantly better than the other five methods in both the overall accuracy of the testing set and the recall of a single category, and its fault identification effect is the best. Specifically, Chine et al. [15] and Chen et al. [21] have similar recognition accuracy for test samples, only about 85%. Chine et al. [15] can accurately identify the categories of the PSBR, the SC&PSBO, and the PSBO&PSBR. On the other hand, Chen et al. [21] can accurately identify the categories of the SC&PSBO and the Aa&PSBO. For other types of samples, both methods have misjudgements. The overall identification accuracy of the model proposed by Chen et al. [25] is 95.76%. For the problem samples mentioned in Section V, the ResNet model does not work well. Moreover, it also produces misjudgements on the SC failure samples. Chen et al. [25] and the proposed method both use the I-V curve as the input data, so the fault features mined are more comprehensive and the diagnostic accuracy is relatively high. As for the method in [28], with the exception of the PSBR, the PSBR&Aa, and the PSBO&PSBR faults, all other types of faults have misjudgements; its overall accuracy is 91.53%, which ranks third.
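The indicator-type features discussed above, including the added $P_m$, FF, and $V_m/V_{oc}$, can be extracted from a sampled I-V curve as in the following sketch. It assumes that the curve runs from the short-circuit point (V = 0) to the open-circuit point (I = 0) and uses the standard fill-factor definition FF = P_m / (V_oc · I_sc). The function name and the toy curve are hypothetical and do not reproduce the feature extraction code of the cited works.

```python
def iv_indicators(voltage, current):
    """Extract common scalar indicators from a sampled I-V curve.

    Assumption: the curve is sampled from short circuit (V = 0) to open
    circuit (I = 0), so I_sc is the first current and V_oc the last voltage.
    """
    i_sc = current[0]
    v_oc = voltage[-1]
    powers = [v * i for v, i in zip(voltage, current)]
    # Index of the maximum power point.
    k = max(range(len(powers)), key=lambda j: powers[j])
    v_m, i_m, p_m = voltage[k], current[k], powers[k]
    return {
        "V_oc": v_oc,
        "I_sc": i_sc,
        "V_m": v_m,
        "I_m": i_m,
        "P_m": p_m,
        "FF": p_m / (v_oc * i_sc),  # fill factor
        "V_m/V_oc": v_m / v_oc,
    }


# Toy curve; the values are illustrative, not measured data.
v = [0.0, 100.0, 200.0, 300.0, 400.0, 480.0]
i = [8.0, 7.9, 7.8, 7.5, 5.0, 0.0]
features = iv_indicators(v, i)
```

Such a feature vector compresses the whole curve into a handful of numbers, which is the main difference from methods that feed the full I-V curve to the model.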
Among all the methods, Zhao et al. [38] has the lowest recognition accuracy, only 67.64%. In the recognition of a single category, only the PSBO fault can be accurately recognized. Because the characteristics studied in [38] were obtained under a high irradiance and did not take hybrid faults into account, it is reasonable that the evaluation results are not ideal. In [38], the choice of input features has a great impact on the results. When the features extracted in [28] are used as the input information for [38], its recognition accuracy can reach 96.55%.
In addition, the test times of the various methods for a single sample are also compared in Table 11. As can be seen from Table 11, the execution times of these methods fall into two categories. Chen et al. [21] and Zhao et al. [38] belong to the first category: although their test times are less than 10 ms, their classification accuracies are poor. The remaining methods belong to the other category, with test times from 28.1 ms to 38.2 ms. The proposed method has the highest accuracy and the lowest execution time in this category. Given the improvement of computing speed in recent years, such execution times are acceptable in practical applications.
In short, the proposed method uses I-V curves as the input data, and the model mines the fault information from both global and detailed perspectives. Compared with the algorithms that use indicators as the input data, such as [15], [21], [28] and [38], its feature expression is good, and its diagnostic accuracy is significantly higher. In addition, compared with the method with a 34-layer structure in [25], the proposed method not only requires no pre-processing of the input data, but also has fewer layers, i.e., a four-layer CNN and one ResGRU block, with far fewer hyper-parameters to adjust than in [25].

C. COMPARISON OF ANTI-INTERFERENCE ABILITY
Due to the limitation of instrument precision and performance, the data obtained are often noisy, and noise exceeding a certain intensity will increase the probability of misjudgement by the network. To examine the adaptive feature extraction ability and the generalization ability of the proposed diagnostic model as the noise intensity increases, Gaussian white noise with different signal-to-noise ratios (SNR) is added to the data, and the anti-interference ability of the proposed algorithm is analyzed. The intensity of the added noise is controlled by adjusting the SNR [39], which can be expressed as
$\mathrm{SNR} = 20 \log_{10}(1/\varepsilon)$ (11)
where ε represents the percentage of noise. In order to examine the anti-interference ability with different degrees of white noise added to the measured data set, the proposed method, [25], and [28] are implemented for comparisons, and the results are summarized in Table 12. The results in Table 12 show that as the SNR gradually decreases, the accuracy of the proposed method only shows a slight decrease. Only when the SNR falls to 10 dB does the accuracy drop significantly. At this point, the noise percentage exceeds 30%, a level that actual noise interference usually does not reach. That is to say, even in the case of a low SNR, the fault classification performance of the proposed method is still at a good level, which reflects its good anti-interference ability.
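The relation between the SNR and the noise percentage in (11), and one way to inject Gaussian white noise at a given SNR, can be sketched as follows. Scaling the noise standard deviation as ε times the RMS of the signal is an assumption, since the source does not state the reference amplitude. The function names and the toy signal are hypothetical.

```python
import math
import random


def noise_fraction(snr_db):
    """Invert SNR = 20 log10(1 / eps) to obtain the noise fraction eps."""
    return 10 ** (-snr_db / 20)


def add_white_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise scaled to the requested SNR.

    Assumption: the noise standard deviation is eps times the signal RMS.
    """
    rng = random.Random(seed)
    rms = math.sqrt(sum(s * s for s in signal) / len(signal))
    sigma = noise_fraction(snr_db) * rms
    return [s + rng.gauss(0.0, sigma) for s in signal]


eps_20db = noise_fraction(20)  # noise fraction at 20 dB
eps_10db = noise_fraction(10)  # noise fraction at 10 dB
noisy = add_white_noise([8.0, 7.9, 7.5, 5.0, 0.0], snr_db=20)
```

Under (11), an SNR of 20 dB corresponds to ε = 0.1, while 10 dB corresponds to ε ≈ 0.316, which agrees with the statement that the noise percentage exceeds 30% at 10 dB.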
The method of Chen et al. [25] also has a strong anti-interference ability: when the SNR exceeds 20 dB, the diagnostic accuracy is basically unchanged. When the SNR falls to 10 dB, the accuracy drops by 4.51%, which is larger than the drop of the proposed method, but the accuracy can still be maintained above 91%. Compared with the first two methods, Huang et al. [28] extract indicators from I-V curves as the input data, and the anti-interference ability is weak. The results show that when the SNR is 20 dB, the diagnostic accuracy drops by 3.4%, and when it reaches 10 dB, the diagnostic accuracy drops by 14%. In short, the diagnostic algorithm with I-V curves as the input has a stronger anti-interference performance than the diagnostic algorithm with indicators as the input.

VII. CONCLUSION
In this study, a photovoltaic (PV) array fault diagnostic method based on a fusion model of a convolutional neural network (CNN) and a residual-gated recurrent unit (Res-GRU) is proposed, motivated by the differences among I-V curves under different fault conditions. First, the CNN is used to mine global and detailed features in the sequence, and then the Res-GRU is used to memorize the mined time-series dynamic features to achieve the classification objective. Numerical simulations and experimental results verify that the proposed method has a good fault classification performance, with identification accuracies of 100% and 98.61%, respectively. An analysis of the performance with missing input data shows that the accuracy on the measured data still reaches 95.23% even when meteorological data are not included. Moreover, when the signal-to-noise ratio is 20 dB or higher, the accuracy remains basically unchanged, indicating that the proposed method has a strong anti-interference ability.
Although the proposed method has a high identification ability, the following three tasks are worth further investigation in future research.
(1) Application and implementation of the proposed method on the inverter: Fog computing is the current development direction of diagnostic technology. The diagnostic algorithm is performed directly at the bottom layer, and only the diagnostic results are uploaded to the monitoring system, which effectively solves the problem of massive waveform storage. The proposed method is a lightweight diagnostic model with a short execution time. If it is combined with the I-V curve scanning function of an inverter, its application prospects in online fault diagnosis of PV arrays will be further expanded.
(2) Identification of unknown types of faults: Most diagnostic methods, including the one in this study, identify faults based on the fault types that already exist in the learning library. As a PV system ages, the diagnostic model will encounter more fault samples of unknown types, and the model may then fail. The fuzzy degree measurement and unsupervised clustering methods mentioned in [40]-[42] can be considered to further screen out the samples whose fault type cannot be determined.
(3) Improvement of the generalization ability of diagnostic models: This study focuses on single-string PV arrays, so the model must be retrained when it is extended to larger-scale or multi-string systems. However, it cannot be guaranteed that sufficient fault samples can be collected in every system where the method is deployed. The transfer learning approach in [27] can help to extend the model to a new target domain.