Deep Learning-Based Fault Diagnosis of Photovoltaic Systems: A Comprehensive Review and Enhancement Prospects

Photovoltaic (PV) systems are subject to failures during their operation due to aging effects and external/environmental conditions. These faults may affect different system components such as PV modules, connection lines, and converters/inverters, which can lead to a decrease in efficiency and performance, and eventually to system collapse. Thus, fault detection and diagnosis (FDD) is a key factor to be taken into consideration in high-efficiency grid-connected PV systems. The performance of an FDD method depends mainly on the quality of the extracted features, including real-time changes, phase changes, trend changes, and faulty modes. Thus, data representation learning is the core stage of intelligent FDD techniques. Recently, due to the enhancement of computing capabilities, the increasing use of big data, and the development of effective algorithms, deep learning (DL) has witnessed great success in data science. Therefore, this paper presents an extensive review of deep learning-based FDD methods for PV systems. After a brief description of the DL-based strategies, techniques for diagnosing PV systems proposed in recent literature are overviewed and analyzed to point out their differences, advantages, and limits. Future research directions towards improving the performance of DL-based FDD techniques are also discussed. This review paper aims to systematically present the development of DL-based FDD for PV systems and provide guidelines for future research in the field.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Yu Wang.
Photovoltaic (PV)-based electrical power generation has been a growing research area in both academia and industry [1], [2], where grid-connected PV systems have witnessed the highest growth rate. Therefore, the high-performance, reliable operation of PV systems has become a top priority. PV system faults can be divided, according to their time characteristics, into three major categories: intermittent, abrupt, and incipient faults. Temporary or intermittent faults refer to faults that clear or change over time, such as partial shading or environmental stress (e.g., dust or contamination). Permanent or abrupt faults refer to faults that occur instantaneously, often as a result of damage to the PV array, such as line-to-line or line-to-ground short circuits, junction box faults, connector disconnection, open-circuit faults, and hot spots. Incipient faults are considered the most challenging failures due to their small amplitudes and slow dynamics.
If not detected at an early stage, they can result in gradual damage to the PV cells, leading to serious faults. Incipient faults can occur on both the DC and AC sides. Examples of DC-side (PV modules and DC/DC converter) incipient faults are PV module defects such as yellowing and browning of the solar cells, delamination, bubbles, cracks, gaps, and defects in the anti-reflective coating. The AC-side (inverter and grid side) faults include Insulated Gate Bipolar Transistor (IGBT) faults, wiring degradation, aging, islanding, and overheating. Therefore, it is essential to develop enhanced FDD algorithms aiming at increasing the reliability and efficiency of PV systems [3]. With the rapid development of information and automation technologies, the demands and requirements placed on FDD algorithms are increasing, and data-driven process control methods are also being continuously developed and improved. Thanks to the powerful representation learning ability of DL algorithms, intelligent FDD has become more automated and effective in the context of big data. However, PV systems are complex [4], generally exhibiting nonlinear [5], [6], uncertain [7], time-correlated [8], multimodal [9], multi-period [10], large-scale [11], or intermittent characteristics [12], resulting in the following problems with the collected data: (1) The characteristic dimension of samples under multi-sensor measurement is relatively high and has high relevance (the data relevance in both decision-making and decision-taking has exponentially increased); (2) Different sampling rates or random loss of data lead to missing observations of some sensors at a specific point in time; (3) The data types are unbalanced, for example with a limited number of faulty samples under extreme conditions; (4) The data distribution is not uniform: the information from different data sources may be inconsistent.
In addition, PV systems are uncertain under the influence of external disturbances, and the collected measurements cannot be represented by single values. These problems result in uncertain characteristics of the data and diagnosis spaces and severely limit the representation learning ability of DL algorithms. To improve the safe and stable operation of PV systems, the use of a DL framework for FDD still needs improvement. Generally, the main steps of DL-based FDD are the following: data preprocessing, deep network design, and decision-making. Data preprocessing problems include: (a) The small sample problem: in practical applications, due to difficulties in data collection and the high cost of sample labeling, either the training set is small or the amount of data is large but the effective information is insufficient, which results in the small sample problem in learning. Methods such as transfer learning and generative adversarial networks can be used to solve design problems of deep network-based diagnosis approaches in the case of small samples. (b) Big data storage and analysis: the basis of big data analysis is to extract useful values, suggest conclusions and/or support decision making, and focus on solving problems that cannot be handled within a limited time with existing methods. The priority can be established by several factors such as data preprocessing, fast response speeds, effective reduction of data size, data regularization, relative principal component analysis, etc. In addition, the performance of any diagnosis method depends on the quality of the available process data [13]. Practical measurements usually contain high levels of noise and autocorrelation and are contaminated with errors that mask the important features in the data and limit the effectiveness of any process monitoring technique [14].
Regarding the deep network design, although DL tools have greatly promoted developments in the field of FDD, integrating domain knowledge helps the DL model. The representation learning of discriminative features can help reduce the size of the DL model structure, and task-specific data regularization can help improve the FDD performance. Applying sound domain knowledge and prior information helps reduce the complexity of the monitoring model and improve the diagnosis performance. Besides, the performance of DL-based FDD relies mainly on historical data. Analyzing online data rapidly and accurately and simplifying it effectively, so as to achieve incremental learning of complex dynamic system models and parameter adaptation, remains a challenging point. In addition, several issues may have an impact on the diagnosis performance of the DL-based approaches. In general, they are built using default parameters, and the way parameter variations affect these approaches has yet to be investigated. Consequently, selecting optimal parameters for DL-based FDD should enhance the diagnosis performance.
Faults can be divided into two types depending on their evolution: significant faults and minor faults. The design of a multi-level diagnosis framework in the deep network model will help achieve the real-time monitoring of significant faults and effectively improve the diagnosis of random faults. In addition, significant faults affect the system performance differently, and small faults are also very likely to cause considerable damage. Thus, it is important to develop DL-based algorithms that consider the fusion of faults from different manufacturing environments with different characteristics. Moreover, classical DL algorithms are generally utilized to model the dynamic nature of multivariable PV systems in both the offline training and online updating phases using the updated measurements. Instead, using online extensions of DL models for diagnosis in the first place may reduce the training and update time. In addition, complex PV systems often have problems such as uncertainty, multiple fault occurrence, and fault levels changing with time. If only a single FDD technology is used, the accuracy and generalization will be low. Thus, combining multivariate statistical analysis (such as PCA [15]-[17] and kernel PCA [18]-[20]), signal processing (such as the Fourier transform, multiscale representation [13], [21], and interval-valued data representation [22], [23]), and other tools with DL models could improve the performance of the FDD and, more specifically, the decision-making accuracy. It could also reduce the impact of noise, outliers [24], and uncertainties and estimate the severity of the fault location.
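As an illustration of combining multivariate statistical analysis with a DL model, the following minimal sketch uses PCA to reduce correlated multi-sensor measurements to a few scores that could then be fed to a deep network. It is not taken from the cited works: the synthetic data, the dimensions, and the function names `pca_fit` and `pca_transform` are assumptions made for illustration only.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on X (samples x features) after standardizing each feature."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant features
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized data.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)    # variance ratio per component
    return mean, std, Vt[:n_components].T, explained[:n_components]

def pca_transform(X, mean, std, P):
    """Project new measurements onto the retained principal directions."""
    return ((X - mean) / std) @ P

# Hypothetical data: 200 samples of 6 correlated sensor channels.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(200, 6))

mean, std, P, explained = pca_fit(X, n_components=2)
scores = pca_transform(X, mean, std, P)      # reduced input for a DL model
```

The same fitted `mean`, `std`, and `P` would be reused on online data so that training and monitoring share one projection.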
The DL-based FDD is mainly divided into three kinds of techniques [25]-[28]: (i) Data preprocessing (DP) → feature extraction and selection (FES) → fault classification (FC)-DL: This technique uses conventional (non-DL) methods for data preprocessing and FES, and then applies the DL tool for FC. This reduces the model complexity and improves the diagnosis rate. (ii) DP → FES-DL → FC-DL: In this technique, the DL tool is used for FES, and the FDD is performed based on the extracted high-level features [29], [30], where multi-hidden-layer networks for unsupervised extraction of high-level abstract features are used. It does not require manual intervention or rely on prior knowledge. Combined with multivariate statistical analysis techniques, it is able to provide an efficient diagnosis performance [31]. (iii) DP → (FES and FC)-DL: In this method, the DL tool makes direct use of the raw data to perform the FDD. This method belongs to the ''End-to-End'' family, which directly computes the output from the input [32]. The parameters for FES and FC in multi-hidden-layer networks can be optimized collaboratively, and the feature self-learning strategy is adopted to automatically extract the effective features from the large data set to perform the FDD. This paper mainly discusses the different DL-based FDD techniques for PV systems from the perspective of methodology and five basic architectures: stacked auto-encoder network, deep belief network, convolutional neural network, recurrent neural network, and deep transfer learning (see Figure 1). The network structure not only determines the effectiveness of feature extraction and selection, but also relates to the complexity of the solution. This paper explores the research status of these five types of methods in turn, and studies their development direction based on the existing problems in DL-based FDD technology.
The following section presents the most frequently occurring failures in PV systems.

II. COMMON FAILURES IN PV SYSTEMS
In PV systems, the produced PV power depends on various factors such as the nominal characteristics of the components, the power electronics interface, the weather conditions, and failures that may occur in the different stages during the operation (Figure 2).

A. PV MODULE FAILURES
The PV array is the main component of PV systems, where any deficiency associated to the module has a significant

2) BYPASS DIODE FAULTS
Usually, a diode connected in anti-parallel to a group of cells (bypass diode) is used in PV modules to protect the cells against the effects of shading. Generally, a bypass diode fault is represented by an impedance, a short/open circuit, or an inverted diode, which causes a mismatch in the I-V characteristics of the cell.

3) CONNECTIVITY FAULTS
Usually, corrosion or a decrease in contact adhesion between two modules leads to a lack of connectivity in PV strings.

4) GROUND FAULTS
A ground fault (GF) in PV modules can be considered as an accidental electrical short-circuit involving ground and one or more current-carrying conductors [40]. The GF may generate DC arcs (and even fire hazards) at the fault point, which raises serious safety concerns.

5) PARTIAL SHADING FAULTS
The operation of PV modules is highly susceptible to partial shading failures, where multiple peaks appear on the P-V characteristics due to the use of a bypass diode.

B. POWER ELECTRONICS INTERFACE FAILURES
Usually, the operation of PV systems is accompanied by failures at different stages. The power electronics interface is one of the most important components in PV systems, and it has been shown that most of its failures are due to power semiconductor failures.
Many factors may lead to the fatigue of the power electronics components (transistors, diodes). The component fatigue mainly affects the time response and may therefore lead to additional switching losses. Besides, the switching of power semiconductors might cause different types of faults.
The three most common power semiconductor failures are the wear-out, open-circuit, and short-circuit faults.

C. GRID SIDE FAILURES
Islanding is one of the most important failures to address/detect in grid-connected PV systems. Islanding occurs when a portion of the utility system remains energized while isolated from the grid. This phenomenon can cause safety problems to utility service personnel or related equipment [34].

III. RESEARCH STATUS OF FAULT DIAGNOSIS TECHNOLOGY BASED ON DEEP LEARNING
The DL-based FDD performance is based on the mathematical tool and process models of the plant [35]. The development of deep networks helps to extract high-level and abstract features from the data. When an effective feature representation is extracted from the data, better results can be obtained whether it is used for fault classification or regression.

A. CONVOLUTIONAL NEURAL NETWORK BASED FAULT DIAGNOSIS
Convolutional Neural Networks (CNN) are built using three types of layers: convolutional layer (CL), pooling layer (PL), and fully connected layer (FCL) [36], [37] (see Figure 3). The CL combines multiple convolution kernels to extract features from the input data or from the features of the preceding layer: each kernel performs an element-wise multiplication and summation over the input features within its receptive field and adds a bias [38]. The size of the convolution kernel in the CL controls the extraction of local spatial correlation features in the input information, which can enhance certain features of the original signal while reducing the impact of noise [36]. The PL is responsible for reducing the spatial size of the convolved feature. It aims to decrease the computational power required to process the data using dimensionality reduction schemes [39]. Furthermore, it is useful for extracting relevant features that are rotationally and positionally invariant, thus maintaining the process of effectively training the model [39]. Adding an FCL is an effective way of learning nonlinear combinations of the high-level features, as represented by the output of the CL. The FCL learns a possibly nonlinear function in that space [40]. CNN-based FDD has the following advantages: (i) Industrial system data has multi-source heterogeneity [41]-[44]. The input of a CNN can be time series [45]-[47], spectrograms [48], [49], and images [50]-[52], which is suitable for multi-source information processing [41], [53]; (ii) Complex PV systems are often accompanied by random strong magnetic interference and high temperatures. The features extracted by CNN have translation invariance [54], [55], which increases the robustness of the diagnosis algorithm and improves the generalization ability of CNN; (iii) The data that can characterize the faults in PV systems is often submerged in massive real-time data.
Generative adversarial networks can generate samples based on learning the probability distribution of real data [56], which is suitable for small sample sizes. The authors in [57]-[60] presented a vector matrix containing statistical characteristics such as the root mean square of the frequency domain signal, and the standard deviation, skewness, and kurtosis of the time domain signal, as the input to the CNN for classification purposes. In addition, in [61], [62], the authors used a Morlet wavelet decomposition tool to obtain the wavelet scale map of the signal, which was used as the input of the CNN for the classification phase. In the developed method, the Rectified Linear Unit (ReLU) was applied as an activation function to introduce non-linearity into the network. The work in [63] introduced an adaptive learning rate to construct a hierarchical framework consisting of two CNNs. The size of the model and the adjustment of the adaptive learning rate could thus accelerate the convergence of the algorithm and prevent the gradient from vanishing. In addition, given that the traditional linear model cannot capture the complex relationship between sensor data and remaining useful life, the authors in [40], [64] used the time series of multi-channel sensor data. Evidently, one-dimensional (1D) time series can also be used directly as the input of the CNN. For this purpose, the authors in [65] developed a 1D kernel filter to convolve the signal. The proposed method aims to extract high-resolution features for fault detection. The work in [66] used a 1D CNN to detect faults by merging feature extraction and post-processing of the raw signal. The authors in [67] presented a comparative study of CNN-based feature learning. The features include raw data, spectrum, and time-frequency data. In [68], a novel full closed-loop-based CNN method for power quality disturbance detection and classification was proposed.
The developed approach was able to capture multiscale features and reduce overfitting. To address the problem of small samples, the authors in [69] used prior knowledge to convert normal data into coarse fault data combined with an improved generative adversarial network (GAN). The proposed technique aims to refine the coarse fault data into data more similar to real faults. The authors in [70] applied a GAN to generate samples with a distribution similar to that of the original signal and utilized a stacked denoising auto-encoder (SDAE) method to pre-train the network, extract fault features, and identify the authenticity of the samples. The developed approach was robust to noise and showed a good anti-noise capability in the case of small samples. The work in [71] used a GAN for oversampling the fault operation data to obtain missing fault data. The developed method was proposed to achieve high-precision classification of induction motor faults under different conditions. To repair the ''fuzzy'' range of the intermediate probability value and enhance the credibility of the reasoning for the fault overlapping area, the authors in [72] applied a GAN to the seismic image in which the feature extraction network was used to extract local and global features from the high-quality image. The reconstruction network then built a high-sensitivity image with a denser sampling rate while retaining the original data and frequency domain information.
VOLUME 9, 2021
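The CL, PL, and FCL stages described in this subsection can be sketched as a single forward pass over a 1D signal. This is a minimal NumPy illustration with untrained random weights, not the architecture of any cited work; the signal length, kernel count and size, pooling width, and number of classes are assumptions chosen for the example.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Convolutional layer: slide each kernel over x (valid mode), multiply
    element-wise, sum, and add the bias. Returns (n_kernels, len(x) - k + 1)."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (len(x)-k+1, k)
    return kernels @ windows.T + bias[:, None]

def max_pool1d(z, size):
    """Pooling layer: keep the maximum of each non-overlapping window."""
    n = (z.shape[1] // size) * size          # drop a ragged tail, if any
    return z[:, :n].reshape(z.shape[0], -1, size).max(axis=2)

def softmax(v):
    e = np.exp(v - v.max())                  # shift for numerical stability
    return e / e.sum()

def cnn_forward(x, p):
    h = np.maximum(conv1d(x, p["kernels"], p["bias"]), 0.0)   # CL + ReLU
    h = max_pool1d(h, p["pool"])                               # PL
    return softmax(p["W"] @ h.ravel() + p["b"])                # FCL + softmax

rng = np.random.default_rng(0)
x = rng.normal(size=64)                      # hypothetical measurement window
params = {
    "kernels": rng.normal(scale=0.1, size=(4, 5)),   # 4 kernels of width 5
    "bias": np.zeros(4),
    "pool": 4,
    "W": rng.normal(scale=0.1, size=(3, 4 * 15)),    # 3 hypothetical classes
    "b": np.zeros(3),
}
probs = cnn_forward(x, params)               # class probabilities
```

With a width-5 kernel the 64-sample input yields 60 positions per kernel, which pooling by 4 reduces to 15, so the FCL sees 4 × 15 features. In practice the weights would be learned by backpropagation rather than drawn at random.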

B. RECURRENT NEURAL NETWORK BASED FAULT DIAGNOSIS
Recurrent Neural Networks (RNN) are network structures for which the inputs are time-series data and all the nodes are connected in a chain [73]. Unlike multi-layer perceptrons, RNNs have a sense of time and a memory of earlier network states, allowing them to learn sequences that vary over time [74] (see Figure 4). At present, the most commonly used RNNs are Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks. By introducing gates, each recurrent unit can adaptively capture dependencies at different time scales, which alleviates the long-term dependence problem [75]. However, due to exploding and vanishing gradients, there is a limit on the sequence length that a plain RNN can handle [76]. RNN variants such as LSTM [76] and GRU [76] neural networks have therefore been developed to deal with long sequence prediction problems. The RNN-based FDD has the following advantages: (1) The inputs of the RNN are time-series data and the depth depends on the length of the input sequence, which is suitable for dynamic PV system monitoring and prediction; (2) RNNs are Turing complete, and the chain connection mode is conducive to the extraction and representation of the dynamic nonlinear characteristics of PV systems; (3) The RNN is stable when the lengths of the training and testing sequences differ (PV system control is often of variable length and the sampling is irregular). For the time-series signals of PV systems, the authors in [77]-[79] used monotonicity and correlation values to select features as the RNN inputs, and verified experimentally the performance of the proposed RNN-based method. The work in [80] proposed an LSTM-based encoder-decoder architecture. The encoder structure converts the input sequence to a fixed-length vector, and the decoder structure uses the vector to generate the target sequence and calculate the reconstruction error, which is later used for decision making.
In the case of multiple faults and large noise, the authors in [81] developed three RNN-based models (vanilla RNN, LSTM, and GRU) for FDD and showed that the LSTM and GRU models outperformed the vanilla RNN. The authors in [82] used GRU in the RNN model as it reduces the number of parameters through its gate mechanism and alleviates the problem of exploding or vanishing gradients. The authors in [83] proposed sequential FDD based on an LSTM neural network. The developed method can directly classify the raw process data without specific feature extraction and classifier design. It can also adaptively learn the dynamic information in raw data. In [84], the proposed method applies LSTM networks for feature extraction, and the selected features are fed into a softmax regression classifier for fault diagnosis. PV system data are characterized by time relevance in addition to spatial dependence in the measurement space. For this purpose, the authors in [85] developed a multiscale RNN model for learning both hierarchical and temporal representations. The authors in [86] combined the benefits of both CNN and LSTM to establish a convolutional bidirectional LSTM. In the developed approach, the CNN was used to extract the local features from the original data. Then, a bidirectional LSTM [87] was applied to extract the temporal correlation, and finally a fully connected layer and a linear regression layer were stacked to predict the remaining useful life.
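The gate mechanism and the handling of variable-length sequences discussed in this subsection can be illustrated with a single GRU cell applied step by step. This is a minimal sketch with untrained random weights; the dimensions and parameter names are assumptions, and the update rule follows one common convention (some references swap the roles of z and 1 - z).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step: the gates decide how much past state to keep."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])   # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])   # reset gate
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_tilde                   # blended state

def gru_encode(seq, p, n_hid):
    """Run the cell over a sequence of any length; return the final state."""
    h = np.zeros(n_hid)
    for x_t in seq:
        h = gru_step(x_t, h, p)
    return h

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5                           # hypothetical sizes
p = {k: rng.normal(scale=0.3, size=(n_hid, n_in)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.normal(scale=0.3, size=(n_hid, n_hid)) for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(n_hid) for k in ("bz", "br", "bh")})

h_short = gru_encode(rng.normal(size=(10, n_in)), p, n_hid)   # 10 time steps
h_long = gru_encode(rng.normal(size=(25, n_in)), p, n_hid)    # 25 time steps
```

Both sequences map to a state of the same size, which is what lets a downstream classifier or decoder operate on inputs of different lengths.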

C. STACKED AUTO ENCODER BASED FAULT DIAGNOSIS
Stacked Auto Encoder (SAE) networks are multi-hidden-layer neural networks formed by stacking multiple auto-encoder networks [88]. The output of one layer of the auto-encoder network is used as the input of the next layer [89]. Each auto-encoder network consists of two parts: an encoder and a decoder (refer to Figure 5). The encoder converts the network input into the hidden layer representation, while the decoder maps the hidden layer representation back to a reconstruction of the original input [90].
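The encoder/decoder structure and the stacking of layers can be sketched as follows. This is a minimal NumPy illustration, assuming a sigmoid encoder, a linear decoder, a squared-error reconstruction loss, and plain batch gradient descent; the data, layer sizes, and the names `train_autoencoder` and `encode` are hypothetical and not drawn from the cited works.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train one auto-encoder; return the encoder parameters and loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)             # encoder: input -> hidden code
        err = (H @ W2 + b2) - X              # decoder output minus the input
        losses.append(float(np.mean(err ** 2)))
        # Gradients of (1 / 2n) * sum of squared errors; the tracked loss
        # above is the mean squared error, which differs by a constant factor.
        dH = (err @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ err) / n; b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n;  b1 -= lr * dH.mean(axis=0)
    return (W1, b1), losses

def encode(X, enc):
    W1, b1 = enc
    return sigmoid(X @ W1 + b1)

# Hypothetical standardized data: 300 samples, 8 correlated channels.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(300, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

enc1, losses1 = train_autoencoder(X, n_hidden=4)    # first layer on raw input
H1 = encode(X, enc1)
enc2, losses2 = train_autoencoder(H1, n_hidden=2)   # second layer on the codes
H2 = encode(H1, enc2)                                # stacked representation
```

Only the encoders are kept after each stage, so the output of one layer becomes the input of the next; a classifier and supervised fine-tuning would typically follow this greedy pre-training.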
In general, faults tend to occur in high-frequency information corresponding to the higher-order moments of random processes. From the perspective of the Taylor expansion, the value of a function in the neighborhood of a point can be represented by an infinite series composed of the value of the function at that point and its derivatives of each order. Although the coefficient of the higher-order term is small, it is different from the background details. In these cases, the features are difficult to characterize using traditional methods. The characterization of this information directly affects the performance of the FDD algorithms, especially for small faults that are difficult to detect. As a multi-level network structure model, an SAE network computes higher-order feature representations through multiple nonlinear mappings and expresses a larger set of functions more effectively than shallow networks [91]. FDD based on SAE networks has the following benefits: (i) Most of the data collected from PV systems are 1D signals, and the SAE network structure is simple and suitable for this kind of signal; (ii) In PV applications, data is often unlabeled, and SAE networks are self-learning mechanisms suitable for unsupervised training; (iii) As PV system data contains complex information, the layer-by-layer training method of SAE networks helps to extract the high-order nonlinear features from the data samples and mitigates gradient diffusion (vanishing gradients) in the deep network.
Traditional FDD methods are mainly based on the time-frequency analysis of the collected signals [92]. The work in [93] summarizes the traditional feature extraction methods based on frequency domain features. In the auto-encoder network, the deep features obtained using frequency-domain features as low-level inputs are more suitable for diagnosis systems using Support Vector Machines (SVM) as classifiers. The authors in [94] showed that the features extracted by the stacked denoising auto-encoder network are robust. The performance of the developed technique was assessed by evaluating the impact of the size of the input, the depth of the structure, and the constraint parameters such as sparsity and denoising. Considering that the frequency spectrum reflects the frequency distribution of the data, the authors in [95] introduced the time series frequency spectrum as the input to the SAE network. In [96], a novel SAE-based multiple FDD approach was developed. The proposed technique uses signal analysis to construct hybrid features and obtain more distinguishing information to overcome the non-stationarity caused by multiple cracks. The final features are then fed as inputs into the SAE network for multiple fault classification. Considering that fault information is mainly reflected in high-frequency modes, the parameters extracted from the first four modes are used as the network input, which effectively improves the diagnosis performance while simplifying the computation [97]. Given the inadequacy of traditional auto-encoder networks for processing the original input signal, local features, and shifting features, the authors in [98] proposed a local connection network based on a regular sparse auto-encoder. The paper [99] proposed a weight regularization technique to learn weight-invariant facial representations using sparse-stacked denoising auto-encoders and deep Boltzmann machines.
Due to the limited training data and to prevent overfitting, the authors in [100] introduced the dropout technique in the hidden layer of the auto-encoder network. The authors in [101] developed a classification tool using an auto-encoder network. The developed approach applies a prior distribution to the latent space and then uses the mutual information between the sample and predicted distributions for unsupervised clustering. The proposed solution is characterized by its robustness and cross-cutting of the extracted features in a noisy environment. For cross-machine FDD, the work in [102] proposed an approach that holds the potential to largely reduce the expensive labor in data collection for model establishment. Given the process faults under PV multi-modal operation, the authors in [103], [104] introduced the maximum mean discrepancy (MMD) to estimate the non-parametric distance between two distributions and proposed a transfer learning FDD framework based on sparse auto-encoder networks. The developed approach provided good results, more specifically when the distribution of the testing data was different from that of the training data. The authors in [105] proposed using a time series as the network input to improve the use of time-related information in dynamic PV systems. The work in [106] used a sparse SAE network to limit hidden layer information redundancy, which significantly improves the detection performance for minor faults. The authors in [106] conducted a statistical analysis on the hidden layer features extracted from the SAE network and achieved multi-level FDD based on high-order correlation. The direct use of normal data effectively avoids the imbalance between data categories. To address this issue, the authors in [107] considered the importance of online data diagnosis in the dynamic process and proposed a threshold based on an SAE network.
Threshold-adaptive process monitoring technology performs well in diagnosis and reduces the cost and complexity of process modeling.
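The non-parametric distance between two distributions used in the transfer learning framework of [103], [104] can be illustrated with a small numerical example. The sketch below assumes that the intended measure is the maximum mean discrepancy (MMD) and uses a biased estimate with a Gaussian kernel; the kernel width `sigma` and the synthetic source and target samples are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared MMD between samples X and Y."""
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
source = rng.normal(size=(200, 3))                   # training-condition features
target_same = rng.normal(size=(200, 3))              # same operating condition
target_shifted = rng.normal(loc=2.0, size=(200, 3))  # shifted operating condition

d_same = mmd2(source, target_same)
d_shift = mmd2(source, target_shifted)
```

A larger value indicates a larger mismatch between the training and testing distributions; in a transfer learning setting such a term would typically be added to the training loss so that the learned features of both domains are drawn closer together.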

D. DEEP BELIEF NETWORK BASED FAULT DIAGNOSIS
A Restricted Boltzmann Machine (RBM) is a two-layer neural network composed of a visible layer and a hidden layer. It describes the high-order interaction between variables based on an energy function. The term 'Restricted' means that each edge in the bipartite graph must connect one visible unit and one hidden unit [108]. The RBM assumes that when the input data is given, the activations of the hidden units are conditionally independent. Conversely, when the hidden unit states are given, the activations of the visible layer units are conditionally independent [109]. The RBM can be a building block of a Deep Belief Network (DBN) and a Deep Boltzmann Machine (DBM) (refer to Figure 6). The DBN is a multi-hidden-layer probabilistic generative model composed of multiple RBMs and an output layer (usually a classification layer). The joint distribution between observation data and labels is established through layer-by-layer training [110]. In contrast to the directed/unidirectional connection of the hidden layers in the DBN, the DBM is a Boltzmann network with multiple hidden layers. The hidden layer transmits the information and conducts feedback adjustment from top to bottom [110]. Le Roux and Bengio proved theoretically that as long as the number of hidden units is large enough, the RBM can fit any discrete distribution [109]. FDD based on DBN/DBM has the following advantages: (i) The sample distribution does not necessarily obey restrictive assumptions. RBM uses generative learning to predict the probability distribution of input samples without restrictive assumptions (the PV system has random uncertainty).
(ii) RBM expresses data as a probability model through unsupervised learning, which is suitable for sample generative expansion in the case of small samples in PV systems; (iii) The DBN creates activation value sets through feature grouping sequences, which is suitable for simulating and controlling multivariable nonlinear systems (as the PV system control is mostly an unstructured system). The authors in [111], [112] proposed a DBM-based approach considering different characteristics of multi-modal features. The developed approach aims to use representation learning on time, frequency, and time-frequency domain features, and to perform fusion diagnosis in the decision-making phase. In [113], the authors developed an improved convolutional DBN for FDD. First, they used an auto-encoder to compress the data and reduce the dimension. Then, the deep model was built with Gaussian visible units to learn the representative features. The presented results showed that the developed strategy provided better accuracy compared to classical DL models. To improve the modeling, [114] added hidden units to the activation function of the sparse SAE network to build a DBN model and to extract the most representative features of the data. In addition, in [115], the authors presented a dual-tree complex wavelet packet method to design the original feature set and constructed an adaptive DBN to improve the network convergence speed and enhance the accuracy of the diagnosis. Paper [116] proposed an improved RBM with a new regularization term to automatically generate features that are suitable for predicting remaining useful life. In [117], the authors proposed a Teager-Kaiser energy operator to estimate the envelope of the instantaneous signal and extract the statistical features of the data, and then proposed a Gaussian-Bernoulli RBM (GRBM) to construct a DBN for real-valued classification [118].
The work in [119] proposed an improved FDD method based on DBN. The so-called DBN developed with a global back-reconstruction (GBR) approach was applied for early crack diagnosis of turbine blades using three-dimensional blade-tip clearance. The authors in [120] proposed a real-time online FDD method that can improve the accuracy of detection, classification, and prediction, while being effective for incipient faults that cannot be detected using statistical tools. A stacked sparse auto-encoder was applied to learn the deep models of fault data and minimize the loss of information. Conversely, [121] merged the benefits of fuzzy Petri nets (FPN) and DBNs to present an adaptive arc generation scheme that presents the label-weight based on confidence-weight to mark the occurrence of a fault. In [122], the authors developed an effective DBN-based approach for detection and diagnosis. An effective DBN model was implemented with an effective distribution of features at each layer of the network to improve the accuracy of the diagnosis at each instant. The authors in [123] proposed an enhanced DBN-based FDD approach that combines the information from multiple sources and enhances the robustness of fault diagnosis.
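The conditional independence property of the RBM described above can be illustrated with a minimal sketch. It assumes a Bernoulli-Bernoulli RBM with the commonly used energy E(v, h) = -b·v - c·h - vᵀWh; the function names, the weight matrix `W`, and the biases `b` and `c` are hypothetical choices for illustration and are not taken from the reviewed works.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for the Bernoulli unit activations."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """p(h_j = 1 | v): the hidden units are independent given the visible layer."""
    return [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """p(v_i = 1 | h): the visible units are independent given the hidden layer."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    """Draw one Bernoulli sample per unit."""
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_step(v, W, b, c, rng):
    """One alternating Gibbs step v -> h -> v', as used in contrastive divergence."""
    h = sample(hidden_probs(v, W, c), rng)
    v_new = sample(visible_probs(h, W, b), rng)
    return h, v_new


def energy(v, h, W, b, c):
    """Assumed energy function E(v, h) = -b.v - c.h - v^T W h."""
    bias_v = sum(bi * vi for bi, vi in zip(b, v))
    bias_h = sum(cj * hj for cj, hj in zip(c, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -bias_v - bias_h - interaction
```

Because each conditional factorizes over units, a whole layer can be sampled in a single pass, which is what makes layer-by-layer pre-training of a DBN tractable.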

E. DEEP TRANSFER LEARNING BASED FAULT DIAGNOSIS
The performance of DL-based FDD is closely related to the amount of collected data. To achieve high performance, a large number of samples from the same domain is required to train the models [124]-[127]. When a large amount of data is available, DL-based FDD models with complex structures outperform other diagnosis models. Conversely, when only a small training dataset is available, the accuracy and reliability of these approaches inevitably decrease [128]. In addition, deep models with a large number of hidden layers can affect the performance of the FDD [129]. Moreover, conventional deep models assume that the training and testing datasets have the same feature space and the same distribution [105]. When the distribution changes, most statistical models must be rebuilt from scratch using newly collected training data. In PV systems, the cost required to collect data again and rebuild the models is very high [130]. Deep transfer learning (DTL) is a promising technique to address these problems [131]. DTL is a new DL technique that applies existing knowledge to tackle problems in different but related fields, which eases the requirements for data features [132]. DTL tools can reduce the training time and enhance the classification accuracy by using data from different operating conditions when only a small amount of target data is available [133]-[135]. Recently, DTL has been widely used for FDD as it can provide accurate results in complex situations where the transfer strategy can help to design a universal diagnostic model [132]. Its main goal is to apply the knowledge and skills learned from a data-rich source domain to a related target domain with only a small amount of data [133], [136], [137]. The difference between DL and DTL-based FDD is shown in Figure 7. DL-based FDD splits the normal and faulty data into training and testing datasets.
Training datasets are applied to train the model for FDD purposes, and then testing datasets are used to measure the performance of the model. The faulty dataset is usually smaller than the normal dataset, which can lead to poor classification performance [128]. In DTL-based FDD, there are two groups of data from different domains: the source and target domains. The source domain is used to extract knowledge and the target domain uses the extracted knowledge for FDD purposes. In this case, the faulty data in the source domain are relatively more abundant than those in the target domain and are exploited for FDD in the target domain. DTL can extract relevant features and perform knowledge transfer between the source and target domains. According to the way knowledge is transferred, transfer learning techniques can be classified into four categories [133], [138]-[141]: Instance-based DTL [142]-[144], Feature-based DTL [46], [145], [146], Network-based DTL [147]-[149] and Adversarial-based DTL [150]-[153]. Instance-based DTL consists in re-weighting samples from the source domain for target domain tasks. Feature-based DTL aims to find the common feature space between the target and source domains. Network-based DTL is based on the assumption that some model parameters can be shared by the source and target domains. Adversarial-based DTL consists in determining the relationship between the samples in the target domain and the source domain. It has been proven to provide good results in finding a common latent space between the target and source domains and has therefore attracted more attention in the field of transfer learning. DTL has attracted increasing attention in recent years [154]-[156] and has been applied to several applications such as image recognition [157], text recognition [158], and software defect recognition [159], as well as FDD.
For instance, the authors in [160] developed an adaptive FDD method under different operating conditions to improve the classification accuracy. In [161], the authors developed an automatic DTL-based FDD technique for PV systems. The experiments confirmed the very good diagnostic accuracy of the proposed method under different simulation conditions. A transfer component analysis approach was also proposed for FDD in [162]. Although DTL has been successfully applied for FDD, it still suffers from some limitations. For example, the data used for DTL are all from the same source domain, so the knowledge is transferred from just one operating condition to another. When multiple related source domains are available, it is difficult to effectively explore general knowledge from those domains and use the learned information in a new related domain. Moreover, these methods have not been verified with practical data, and their generalization abilities have not been confirmed. Some previous studies extracted features before applying DTL, which places greater demands on the engineering experience and professional knowledge of the researchers.
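To make the instance-based category above more concrete, the following is a minimal sketch of source-sample re-weighting. The weighting rule (a Gaussian kernel on the distance to the target-domain mean) is an illustrative assumption and is not the scheme of any particular cited work; the function names and the `bandwidth` parameter are hypothetical.

```python
import math


def mean_vector(samples):
    """Component-wise mean of a list of feature vectors."""
    n = len(samples)
    return [sum(s[d] for s in samples) / n for d in range(len(samples[0]))]


def instance_weights(source, target, bandwidth=1.0):
    """Weight each source sample by its closeness to the target-domain mean.

    Assumed heuristic: w_i is proportional to exp(-||x_i - mu_T||^2 / (2 * bandwidth^2)),
    normalized so that the weights sum to one.
    """
    mu_t = mean_vector(target)
    raw = []
    for x in source:
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu_t))
        raw.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(raw)
    return [r / total for r in raw]
```

The resulting weights could then scale the per-sample loss when training the diagnosis model on source data, so that source samples resembling the target operating condition contribute more.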

As presented in Table 1, different enhancements were developed in recent years. The table shows improved DL-based FDD techniques and discusses their performance. The proposed techniques focus on enhancing basic performance, including reducing the computational complexity, strengthening the robustness to uncertainty, and increasing precision. However, in cases of multivariate, uncertain, large and noisy data, it is important to further enhance the effectiveness, such as the diagnosis capability, the intelligence level of fault diagnosis, the speed and cost of large data processing, and the integration of statistical and multivariate fault features. Therefore, fault detection and diagnosis that is more adaptive to the characteristics of the data has become a very important research topic.

IV. EXISTING PROBLEMS AND FUTURE RESEARCH DIRECTIONS
After reviewing the recent literature related to the field of fault detection and diagnosis in PV systems, the following research directions are proposed.

1) Enhance Data Preprocessing in Deep Learning
In PV systems, the data distribution across the different categories is extremely imbalanced. To address the problem of FDD in the case of imbalanced samples, different solutions can be developed, including the focal loss function, under-sampling and over-sampling methods at the data level, and cost-sensitive learning, imbalanced learning, and other preprocessing models at the model level, whose outputs are then applied as inputs to existing DL-based approaches. Moreover, the larger the training dataset, the longer the computation time required for diagnosis. This issue limits the implementation of DL methods in practical applications with massive data. To overcome this limitation, improved techniques based on data size reduction frameworks (such as the K-means metric, hierarchical K-means clustering, and the Euclidean distance) will be proposed to select the more effective features that can be used as inputs to the DL models for fault classification. In addition, the performance of any diagnosis method depends on the quality of the available process data. The PV measurements usually contain high levels of noise and autocorrelation and are contaminated with errors that mask the important features in the data and limit the effectiveness of any process monitoring technique. Multiscale data representation is a powerful data analysis tool that decomposes the original process samples into multiscale components to provide an effective separation of the deterministic and stochastic features of the data. The data is decomposed at multiple scales using low-pass and high-pass filters, and the noise is separated from the important characteristics. Thus, the implementation of diagnosis frameworks that unify DL methods and multiscale representation schemes may improve the performance of classical FDD using DL approaches.
Therefore, multiscale DL methods combining the advantages of the former with those of multiscale representation should improve the diagnosis results. The developed techniques will provide the grid operators and power system designers with significant information to design an optimal solar PV plant, as well as to manage power supply and demand. This task aims to develop various data preprocessing methods in terms of characteristics and performance. The developed techniques will be used due to their ability to solve non-linear, dynamic, and multivariate data structures of PV systems.
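As an illustration of the focal loss mentioned above for imbalanced fault classes, a minimal sketch of the binary form FL(p_t) = -α_t (1 - p_t)^γ log(p_t) is given below. The default values `gamma=2.0` and `alpha=0.25` are commonly used settings, assumed here for illustration only; they are not taken from the reviewed PV studies.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one sample; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, 1e-12), 1.0))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    The factor (1 - p_t)^gamma down-weights well-classified samples, so that
    the rare (e.g. faulty) and hard samples dominate the training signal.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # numerical safety for the logarithm
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor vanishes and the loss reduces to an α-weighted cross-entropy, which makes the effect of `gamma` easy to isolate in experiments.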
• Enhance Data Preprocessing using Multiscale Representation
Data preprocessing approaches are completely based on the process data. Therefore, the quality of the data plays an important role in the accuracy of the derived features. It is known that the measured process data are contaminated with noise that degrades their usefulness in the diagnosis. Therefore, the measurement noise needs to be filtered to enhance the quality of the extracted features. Multiscale data representation is a powerful data analysis tool that has been effectively used to enhance the quality of various process data preprocessing methods [14], [163], [164]. In addition, multiscale filtering can also be used to further enhance the data preprocessing accuracy by developing multiscale data preprocessing techniques.
The developed techniques will be utilized to enhance the monitoring and diagnosis taking into account the measurement noise and the dynamics of the PV systems. Multiscale filtering is an effective data filtering method. However, pre-filtering the data before constructing the models may not provide the desired advantages of multiscale filtering. This is due to the fact that the data pre-filtering may eliminate some features in the data that are important to the model [165]. Therefore, data preprocessing and multiscale filtering need to be integrated to achieve the desired model accuracy.
One way to do that is to filter the data using multiscale filtering at different decomposition depths, construct data preprocessing methods using the filtered data from each decomposition depth, and then select the model that provides the optimum prediction accuracy. The proposed multiscale data preprocessing technique will combine the advantages of both multiscale estimation and data preprocessing.
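The depth-selection idea described above can be sketched as follows. The sketch assumes a Haar wavelet as the low-pass/high-pass filter pair and hard thresholding of the detail coefficients; both choices, as well as the function names and the user-supplied `score` callable, are illustrative assumptions rather than the specific scheme of the cited works.

```python
import math


def haar_step(signal):
    """One decomposition level: low-pass (scaled sums) and high-pass (scaled differences)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one decomposition level."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out


def multiscale_filter(signal, depth, threshold):
    """Decompose to `depth` levels, zero small detail coefficients, and reconstruct."""
    if len(signal) % (2 ** depth) != 0:
        raise ValueError("signal length must be divisible by 2**depth")
    approx = list(signal)
    details = []
    for _ in range(depth):
        approx, d = haar_step(approx)
        details.append([c if abs(c) > threshold else 0.0 for c in d])
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx


def select_depth(signal, depths, threshold, score):
    """Return the depth whose filtered signal gives the lowest score (e.g. a model error)."""
    return min(depths, key=lambda d: score(multiscale_filter(signal, d, threshold)))
```

In practice, `score` would wrap the training and validation of the downstream model on the filtered data, so that filtering and modeling are selected jointly instead of pre-filtering once.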
• Develop Data Preprocessing Methods for Uncertain PV Systems using Interval-Valued Data Representation and Dimensionality Reduction
New interval data preprocessing approaches can be developed to deal with uncertainties in PV systems. In fact, real PV systems are often affected by different types of uncertainties, mainly due to the measurement errors and noise, as well as current and voltage variability. The uncertainty in the model may be addressed by considering the interval-valued data. The developed techniques will enhance the above-proposed data preprocessing approaches by taking into account the irradiance, current, voltage, and temperature uncertainties.
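Neither measurements nor estimations are 100% accurate, so in reality the actual value $x_j^*(k)$ of a variable can deviate from the measured one $x_j^c(k)$. The measurement error is defined as $\delta x_j(k) = x_j^c(k) - x_j^*(k)$. Usually, the sensor manufacturers provide an upper bound $\overline{\delta x}_j(k)$ on the measurement error. Hence, once a measurement $x_j^c(k)$ is available, the actual (unknown) value $x_j^*(k)$ of the measured variable belongs to the interval

$$\left[x_j^c(k) - \overline{\delta x}_j(k),\; x_j^c(k) + \overline{\delta x}_j(k)\right].$$

An interval-valued datum $[x(k)]$ refers to a set of numbers enclosed in an interval on the real line, usually expressed as

$$[x(k)] = \left[x^-(k),\, x^+(k)\right].$$

We will start by describing the properties of the interval-valued variables [166]. An interval-valued variable $[X_j] \subset \mathbb{R}$ is represented by a series of sets of values delimited by ordered couples of bounds, referred to as minimum and maximum:

$$[X_j] = \left\{ \left[x_j^-(1),\, x_j^+(1)\right], \ldots, \left[x_j^-(n),\, x_j^+(n)\right] \right\}.$$

The generic interval $[x_j(k)]$ can also be expressed by a couple $\{x_j^c(k), x_j^r(k)\}$, and this is a bi-univocal relationship, where

$$x_j^c(k) = \frac{1}{2}\left(x_j^-(k) + x_j^+(k)\right) \tag{1}$$

and

$$x_j^r(k) = \frac{1}{2}\left(x_j^+(k) - x_j^-(k)\right). \tag{2}$$

The interval-valued data matrix $[X]$ is an $n \times m$ data matrix, given by

$$[X] = \begin{pmatrix} \left[x_1^-(1),\, x_1^+(1)\right] & \cdots & \left[x_m^-(1),\, x_m^+(1)\right] \\ \vdots & \ddots & \vdots \\ \left[x_1^-(n),\, x_1^+(n)\right] & \cdots & \left[x_m^-(n),\, x_m^+(n)\right] \end{pmatrix},$$

where $x_j^-(k) \le x_j^+(k)$ for all $k = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.

Two Euclidean distance (ED)-based interval-valued data approaches can be developed to remove the irrelevant and redundant samples during the data preprocessing task. The use of interval-valued data is motivated by the need for size reduction of massive datasets in some applications. An interval-valued variable $[x_{j,k}]$ can be determined using a lower and an upper bound [167], such that $[x_j(k)] = [\underline{x}_{j,k},\, \overline{x}_{j,k}]$, where $k \in \{1, \ldots, N\}$, $\underline{x}_{j,k} \le \overline{x}_{j,k}$, and $N$ is the number of samples. Given an $N \times m$ classical training data matrix $X$, where $m$ is the number of variables and $N$ is the number of samples, the interval data matrix $[X]$ can be constructed as

$$[X] = \begin{pmatrix} \left[\underline{x}_{1,1},\, \overline{x}_{1,1}\right] & \cdots & \left[\underline{x}_{m,1},\, \overline{x}_{m,1}\right] \\ \vdots & \ddots & \vdots \\ \left[\underline{x}_{1,N},\, \overline{x}_{1,N}\right] & \cdots & \left[\underline{x}_{m,N},\, \overline{x}_{m,N}\right] \end{pmatrix},$$

where the lower $X^L$ and upper $X^U$ bound matrices are respectively defined by

$$X^L = \begin{pmatrix} \underline{x}_{1,1} & \cdots & \underline{x}_{m,1} \\ \vdots & \ddots & \vdots \\ \underline{x}_{1,N} & \cdots & \underline{x}_{m,N} \end{pmatrix}, \qquad X^U = \begin{pmatrix} \overline{x}_{1,1} & \cdots & \overline{x}_{m,1} \\ \vdots & \ddots & \vdots \\ \overline{x}_{1,N} & \cdots & \overline{x}_{m,N} \end{pmatrix}.$$

The interval-valued variable $[x_{j,k}]$ can also be expressed by a couple $\{x_{j,k}^c, x_{j,k}^r\}$. The center $x_{j,k}^c$ of the interval is given by

$$x_{j,k}^c = \frac{1}{2}\left(\underline{x}_{j,k} + \overline{x}_{j,k}\right)$$

and the range $x_{j,k}^r$ of the interval is defined by

$$x_{j,k}^r = \frac{1}{2}\left(\overline{x}_{j,k} - \underline{x}_{j,k}\right).$$

In this case, the center and range matrices are respectively defined by

$$X^C = \frac{1}{2}\left(X^L + X^U\right), \qquad X^R = \frac{1}{2}\left(X^U - X^L\right).$$

By the concatenation of the center and range matrices, the new data matrix $X_{CR}$ can be expressed by

$$X_{CR} = \begin{pmatrix} X^C & X^R \end{pmatrix} \in \mathbb{R}^{N \times 2m}.$$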
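The construction of the interval-valued representation can be sketched numerically as follows. The sketch assumes that the bounds are obtained from the measurement and a per-variable sensor error bound, that the center is the interval midpoint and the range is the interval half-width, and that the center and range matrices are concatenated column-wise; the function names and numerical values are hypothetical.

```python
def interval_bounds(X, error_bounds):
    """Build lower/upper bound matrices from an N x m data matrix and per-variable error bounds."""
    lower = [[x - e for x, e in zip(row, error_bounds)] for row in X]
    upper = [[x + e for x, e in zip(row, error_bounds)] for row in X]
    return lower, upper


def center_range(lower, upper):
    """Center (midpoint) and range (half-width) matrices of the interval data."""
    center = [[(lo + up) / 2.0 for lo, up in zip(rl, ru)] for rl, ru in zip(lower, upper)]
    half_width = [[(up - lo) / 2.0 for lo, up in zip(rl, ru)] for rl, ru in zip(lower, upper)]
    return center, half_width


def concat_center_range(center, half_width):
    """Column-wise concatenation giving an N x 2m matrix (assumed layout of X_CR)."""
    return [c_row + r_row for c_row, r_row in zip(center, half_width)]


# Usage with two hypothetical samples of (voltage, current) and sensor error bounds.
X = [[10.0, 2.0], [12.0, 3.0]]
X_L, X_U = interval_bounds(X, [0.5, 0.25])
X_C, X_R = center_range(X_L, X_U)
X_CR = concat_center_range(X_C, X_R)
```

The concatenated matrix can then be passed to any classical (non-interval) feature extraction or DL model, which is one simple way of carrying the uncertainty information into the learner.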

2) Enhance Deep Network Design in Deep Learning
The representation learning of features can help in reducing the structure of the DL model, and the data regularization processing for specific tasks can improve the performance of FDD. The application of reasonable professional knowledge and prior information helps to reduce the complexity of the process monitoring model and improve the diagnosis performance. Moreover, the performance of DL-based FDD mainly relies on the historical data. Although historical data contains the operating mechanism of complex systems, PV systems are dynamic production processes and the latest changes in the current operating state also include the cumulative relevance of the production process. The rapid and accurate analysis and effective simplification of the online data, so as to achieve incremental learning of complex dynamic system models and parameter adaptation, is a challenging task. In addition, several challenges may have an impact on the FDD results obtained with DL-based approaches. In general, DL-based FDD approaches are built using default parameters, and how parameter variations affect these approaches remains to be investigated. Consequently, DL based on optimal parameter selection for diagnosing faults can be developed. The parameters to be optimized include the number of hidden layer nodes and the activation function for extracting features and reconstructing inputs. This task reduces the requirements for research experience during parameter tuning and avoids the need for tedious manual tuning. Moreover, the optimized DL model can achieve improved diagnosis performance. Optimization tools including the Orca Optimization Algorithm (OOA), Particle Swarm Optimization (PSO), Genetic Algorithms (GA), and Multi-Objective Optimization (MOO) will be employed to optimize the DL parameters.
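As a minimal sketch of how PSO could drive such a parameter search, the code below minimizes a surrogate validation error over two hypothetical hyperparameters (number of hidden nodes and log10 of the learning rate). The surrogate `val_error`, the swarm settings, and the search bounds are assumptions for illustration; in a real study the objective would train and validate the DL model.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=60, seed=0,
                 inertia=0.7, c1=1.5, c2=1.5):
    """Basic global-best PSO with position clamping to the search bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]         # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def val_error(params):
    """Hypothetical surrogate of a validation error, minimal at 64 nodes and lr = 1e-3."""
    n_hidden, log_lr = params
    return (n_hidden - 64.0) ** 2 / 1000.0 + (log_lr + 3.0) ** 2
```

A call such as `pso_minimize(val_error, [(8.0, 256.0), (-5.0, -1.0)])` returns the best position found and its objective value; the continuous number of hidden nodes would be rounded to an integer before building the network.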

3) Enhance Decision Making in Deep Learning
In PV systems, significant faults considerably affect the system performance. However, small faults are also very likely to cause considerable damage. Thus, DL-based algorithms that merge faults with different characteristics can be developed. Moreover, classical DL algorithms are generally utilized to model the dynamic nature of multivariable PV systems in both the offline training and online updating phases using the newly arrived measurements. Instead, using the online extensions of DL models for diagnosis may reduce the training and update time. In addition, complex PV systems often have problems such as uncertainty and multiple fault concurrency. If only a single FDD technique is used, the accuracy and generalization will be low. Thus, combining multivariate statistical analyses (such as PCA and kernel PCA), signal processing (such as Fourier transform, multiscale representation, and interval-valued data representation), and other tools with DL models could improve the performance of FDD and, more specifically, the decision-making accuracy. It could also reduce the impact of noise, outliers, and uncertainties, and estimate the severity of the fault location. It has been shown in [168] that feeding features selected using PCA into DL classifiers (i.e., NN and RNN) enhances the classification accuracy compared to conventional raw data-based classifiers. However, in the analysis, we assumed that the features were extracted and selected using a linear PCA, and the PV faults were classified using classical DL classifiers.
There are different ways to enhance the performance of the techniques developed in [27]. To overcome the limitation imposed by the linear nature of PCA in high-dimensional spaces, the kernel PCA (KPCA) will be applied to extract high-order statistical information in the DL parameter space. Although KPCA can extract nonlinear features in high-dimensional spaces, it increases the space and time complexity compared to PCA. To improve the use of KPCA, reduced extensions will be proposed. The reduced KPCA (RKPCA) uses different dimension reduction metrics (such as the K-means metric, hierarchical K-means clustering, and the Euclidean distance) so that only the effective samples are selected and applied to build the KPCA model. To address the problem of uncertainties in PV systems, interval RKPCA models will also be proposed.
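The sample-reduction step of the RKPCA idea can be sketched as follows. The greedy Euclidean-distance rule, the RBF kernel, and the parameters `min_dist` and `sigma` are illustrative assumptions; the subsequent eigendecomposition of the centered kernel matrix, which completes KPCA, is omitted for brevity.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_samples(X, min_dist):
    """Greedy reduction: keep a sample only if it is at least min_dist from all kept samples."""
    kept = []
    for x in X:
        if all(euclidean(x, k) >= min_dist for k in kept):
            kept.append(x)
    return kept


def rbf_kernel_matrix(X, sigma):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) on the retained samples."""
    n = len(X)
    return [[math.exp(-euclidean(X[i], X[j]) ** 2 / (2.0 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def center_kernel(K):
    """Double-center the Gram matrix, as required before the KPCA eigendecomposition."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total for j in range(n)]
            for i in range(n)]
```

Since the Gram matrix grows quadratically with the number of samples, discarding near-duplicate samples before building it directly reduces both the memory and the eigendecomposition cost.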

4) Develop Enhanced Multiple Deep Learners
The last task in FDD is to build a learner model. For this purpose, an enhanced DL technique can be developed. The developed technique that merges different learners should improve the diagnosis performance.
In this task, a novel design optimization technique based on multiple deep learners, which includes the above improvements (Tasks 1-3), can be developed. In this technique, a hybrid method incorporating multiple deep learners that fits high-level information will be deployed. However, the diagnosis accuracy depends on the weighting factors linking the deep learners. Thus, an optimal selection procedure for the weighting factors is required to further improve the FDD performance of the deep learner. The optimization problem is formulated so that the misclassification rate and the execution time are jointly minimized. Therefore, an enhanced multiple deep learner method will be proposed to obtain better diagnosis ability compared to classical standalone deep learners. The developed technique will contribute to the reduction of the overall diagnosis error and will have the ability to combine various models. To do that, multivariate and dynamic features will be considered in designing multiple learning models. Classical multiple models ignore the time-dependence of PV measurements. However, the PV system data are frequently time-correlated. Accordingly, the dynamic and multivariate nature of the measurements will be considered when designing the prediction models by using multivariate and dynamic techniques (such as PCA, kernel PCA, and dynamic kernel PCA).
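The weighting-factor selection discussed above can be sketched with a simple grid search over convex combinations of the learners' class probabilities. The cost (misclassification rate plus an optional penalty `lam` on the execution time of the learners that receive a non-zero weight) is an assumed formulation for illustration, and all function names are hypothetical.

```python
from itertools import product


def fuse(prob_vectors, weights):
    """Weighted average of the class-probability vectors given by several learners."""
    n_classes = len(prob_vectors[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors)) for c in range(n_classes)]


def predict(prob_vectors, weights):
    """Class index with the highest fused probability."""
    fused = fuse(prob_vectors, weights)
    return max(range(len(fused)), key=lambda c: fused[c])


def error_rate(learner_outputs, labels, weights):
    """learner_outputs[k][n] is the probability vector of learner k for sample n."""
    wrong = 0
    for n, y in enumerate(labels):
        if predict([out[n] for out in learner_outputs], weights) != y:
            wrong += 1
    return wrong / len(labels)


def grid_search_weights(learner_outputs, labels, steps=10, times=None, lam=0.0):
    """Search weights on a simplex grid, minimizing error + lam * time of the active learners."""
    k = len(learner_outputs)
    times = times or [0.0] * k
    best_w, best_cost = None, float("inf")
    for combo in product(range(steps + 1), repeat=k):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to one
        w = [c / steps for c in combo]
        active_time = sum(t for wk, t in zip(w, times) if wk > 0)
        cost = error_rate(learner_outputs, labels, w) + lam * active_time
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost
```

The grid search is exhaustive and therefore only practical for a handful of learners; for larger ensembles, a metaheuristic or gradient-based search over the weights would be a natural replacement.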

V. CONCLUSION
Data-based fault detection and diagnosis (FDD) is an effective solution towards high-performance and reliable PV systems. Among the most well-known data-driven methods are deep learning (DL) approaches. Therefore, this paper discussed DL-based FDD in PV systems. In the present review paper, the DL-based FDD methods have been classified into five categories: FDD based on convolutional neural network (CNN), FDD based on recurrent neural network (RNN), FDD based on stacked auto encoder network (SAEN), FDD based on deep belief network (DBN) and FDD based on deep transfer learning (DTL), and their main advantages and drawbacks were indicated. Finally, the topic has been studied at three levels: data preprocessing, deep network design, and the decision-making module. Furthermore, other FDD solutions have been proposed by considering the uncertainties, complexity, and multivariate and dynamic natures of industrial systems. The biggest advantage of the DL-based FDD algorithms is their capability to learn high-level features from data in a high-order, non-linear, and adaptive manner. Because of this powerful feature representation learning, intelligent FDD techniques become more effective. Although DL-based FDD has greatly promoted the development of the diagnosis research field, it is still a relatively new concept and further in-depth investigations are required.
HAZEM NOUNOU (Senior Member, IEEE) is currently a Professor of electrical and computer engineering, Texas A&M University at Qatar. He has more than 19 years of academic and industrial experience. He has significant experience in research on control systems, data-based control, system identification and estimation, fault detection, and system biology. He has been awarded several NPRP research projects in these areas. He has successfully served as the lead PI and a PI on five QNRF projects, some of which were in collaboration with other PIs in this proposal. He has published more than 200 refereed journals and conference papers and book chapters. He has served as an Associate Editor and on the technical committees for several international journals and conferences.

MOHAMED NOUNOU (Senior Member, IEEE) is currently a Professor of chemical engineering at TAMU-Texas A&M University at Qatar. He has more than 19 years of combined academic and industrial experience. He has successfully served as the lead PI and a PI on several QNRF projects (six NPRP projects and three UREP projects). He has published more than 200 refereed journals and conference publications and book chapters. His research interests include systems engineering and control, with emphasis on process modeling, monitoring, and estimation. He is a Senior Member of the American Institute of Chemical Engineers (AIChE).