Data-Driven Predictive Maintenance of Wind Turbine Based on SCADA Data



I. INTRODUCTION
The UK government is committed to transitioning away from fossil fuels and decarbonising the power sector to eliminate its contribution to climate change by 2050. In 2020, the UK generated 43.1% of its electricity from renewable sources, with wind making up 24.2% [1]. Rystad Energy's analysis shows that installed offshore wind capacity is set to rise from 10.5 GW in 2020 to 27.5 GW in 2026 [2]. As shown by Rystad Energy's analysis in Fig. 1, the future trend is for more wind turbines to be installed offshore and fewer onshore [3]. This implies increasingly complex maintenance needs that must be met for such equipment. An IRENA report shows that operation and maintenance (O&M) costs typically constitute 16-25% of the cost of electricity for offshore wind farms deployed in the G20 countries. To drive down these costs, O&M practices must be optimised to reduce unscheduled maintenance, and this can be unlocked by improvements in data collection and analytics, allowing for predictive maintenance and production output optimisation [4].

The associate editor coordinating the review of this manuscript and approving it for publication was Kostas Kolomvatsos.
A wind turbine (WT) is a device that captures wind energy through its rotating blades and converts it into electrical energy using its drivetrain. Wind turbine drivetrains are classed as direct drive (DD) or geared, the latter having a gearbox; in both classes, the hub is the input, the main shaft transmits the torque, and the generator is the output [5]. Other wind turbine components include the main shaft bearings, mechanical brake, yaw system, power electronic systems, and hydraulic and cooling systems. Of these components, the gearbox and generator play critical roles in the energy conversion process. Since WT gearboxes operate in a nacelle at high altitude, planetary transmission is widely adopted to reduce weight and increase the transmission ratio [6]. WT gearboxes are therefore typically designed as a planetary/spur system in which the spur gears form the fixed-axis stage. The fixed-axis stage further increases the rotational speed delivered by the planetary stage, which induces vibration that manifests as strong noise in the WT gearbox. Because wind is stochastic, the rotational speed varies over time, making it difficult to diagnose faults in the WT gearbox system [7].

When the wind hits the blades, its kinetic energy is transformed into rotational energy by the shaft connected to the turbine's blades. The rotating shaft is connected to a generator, which produces electrical power through electromagnetic induction [3]. The doubly fed induction generator (DFIG) is extensively employed in geared WTs; its operating mode depends on the rotor speed, with both the rotor and stator windings connected to the grid transformer. The rotor windings are connected to the power grid through an inverter that regulates the slip power according to the rotor speed. At super-synchronous speed the rotor delivers power to the grid, whereas the stator transfers all of its active power to the grid at the synchronous speed of the
generator [8]. When the rotational speed is lower than the synchronous speed of the generator, the rotor absorbs energy from the grid. The rotating generator shaft is supported mainly by bearings, which makes the generator one of the most critical components in a WT: as the shaft rotates continuously, bearing damage may emerge, so effective fault detection is necessary. This raises concerns because maintenance can be costly and dangerous to perform. WTs are often deployed in harsh environments and remote locations, such as offshore sites, to maximise their exposure to wind. WT nacelles can sit hundreds of feet above the ground, so maintenance crews must be lifted by crane or lowered from a helicopter. Hence, monitoring the condition of the equipment is necessary to avoid such dangerous and costly activities and to perform maintenance only when needed.

Typically, organisations adopt maintenance programmes to increase operational reliability and decrease costs; these programmes can be reactive, preventive, or predictive. In reactive maintenance, the equipment is run to its limits, and repairs are performed once components have become defective. Preventive maintenance, also known as scheduled maintenance, is carried out at regular intervals to avoid failures. The challenge here is deciding when to perform maintenance, since we do not know when failure will occur; hence organisations take a conservative approach to planning maintenance for safety-critical equipment. The problem is that if maintenance is scheduled too early, useful machine life is wasted, which adds to costs. If we can predict when failure will occur, we can schedule maintenance just before it [9]. Predictive maintenance is based on condition monitoring (CM), a technique that identifies equipment and components that are likely to fail so that they can be replaced at the right time [10]. Predictive maintenance therefore helps asset managers to
bridge the gap between reactive and scheduled maintenance by carrying out maintenance neither too late nor too early, but just in time. Predictive maintenance can help us estimate the time to failure (remaining useful life), detect problems in our equipment (anomaly detection), and identify which parts need to be fixed (fault type diagnosis). Predictive maintenance can be approached through first-principles modelling, that is, a physics-based approach. This does not require any data from the wind turbines but does require a large amount of domain expert knowledge. It involves deriving equations that describe how the system behaves and using them to determine how the equipment will degrade and eventually fail over time. Data-driven modelling, on the other hand, does not require expert knowledge of the system but does require a good amount of data from the real-world system. Statistical and machine learning techniques are then used to develop models from the data that help us understand how the system behaves and how it fails [11]. There are also hybrid approaches, in which data-driven strategies are used to fill the gaps in first-principles knowledge of the system.
Over the past decade, there has been rapid growth in autonomous condition monitoring systems for equipment, including wind turbines. One condition monitoring strategy relies on dedicated sensor systems, in which vibration sensors, strain gauges, or oil particle counters are retrofitted to turbine sub-components for localised monitoring [12]. The drawback of this strategy is the cost of retrofitting the sensors and of the data collection and analysis required to provide insight into system performance [13]. Wind turbines are also equipped with sensors that record the state of the equipment; this sensor network forms part of a Supervisory Control and Data Acquisition (SCADA) system. The SCADA system was originally installed to monitor and operate the turbine. Recently, however, these engineering data have been harnessed to identify anomalies and assess the health status of the wind turbine, paving the way for data-driven predictive maintenance [14]. The sensors forming the SCADA system are located in the main components of the wind turbine, and the data are usually recorded at 10-min intervals. This interval eases data transfer and storage in a database for later retrieval [15]. SCADA systems on a WT typically record wind parameters such as wind speed and wind deviation; performance parameters such as power output, rotor speed, and blade pitch angle; vibration parameters such as tower acceleration and drivetrain acceleration; and temperature parameters such as bearing temperature and gearbox temperature. All these recorded parameters can be used for fault detection and prognosis [16]. Capturing all these data helps in developing a robust algorithm that can better detect faults. This has given rise to SCADA-based condition monitoring, in which the captured data can be evaluated at different levels of granularity. At the most fine-grained level, we can monitor the
condition of wind turbine sub-components such as the drivetrain. At the most coarse-grained level, we can monitor the whole wind turbine by combining signals from different components to provide a high-level warning [10]. When choosing sub-components to monitor, decisions should be based on failure rates and downtime per failure, and priority is given to components that are more prone to failure and have long replacement lead times [17]. A survey of wind turbine subsystem failures at two wind farms in China showed that 68% of the total downtime was caused by the generator, converter, and pitch systems [18].
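The prioritisation rule above, which combines failure rate with downtime per failure, can be made concrete as an expected-downtime ranking. The sketch below is illustrative only: the `Component` class, `rank_components`, and every number in it are hypothetical placeholders, not values taken from [17] or [18].

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A turbine sub-system with hypothetical reliability figures."""
    name: str
    failures_per_turbine_year: float   # hypothetical failure rate
    downtime_hours_per_failure: float  # hypothetical; includes replacement lead time

    @property
    def expected_downtime(self) -> float:
        """Expected downtime in hours per turbine per year."""
        return self.failures_per_turbine_year * self.downtime_hours_per_failure


def rank_components(components):
    """Sort components so that the largest expected downtime comes first."""
    return sorted(components, key=lambda c: c.expected_downtime, reverse=True)


# Placeholder figures for illustration only
components = [
    Component("pitch system", 0.5, 20.0),
    Component("generator", 0.1, 200.0),
    Component("gearbox", 0.05, 600.0),
    Component("converter", 0.4, 30.0),
]

for c in rank_components(components):
    print(f"{c.name:<13} {c.expected_downtime:6.1f} h/turbine/year")
```

With these placeholder figures, a rarely failing component with a long lead time (the gearbox here) outranks a frequently failing one with short repairs, which is the trade-off the text describes.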
Usually, SCADA systems provide data representing both normal operation and faulty conditions. In some cases, however, there may not be enough data representing healthy and faulty operation, perhaps due to broken sensors. In such cases, we can build a mathematical model of the equipment and estimate its parameters from sensor data. This model can then be simulated with different fault states under different operating conditions to generate failure data, which supplements the sensor data, and both are used to develop the algorithm. After data acquisition, the next step is to clean the data by removing outliers and filtering out noise [9]. In this research, only sensor data representing normal operating conditions are available; we have no data on faulty operation. Building a predictive maintenance algorithm in the conventional way would therefore require a mathematical model of the wind turbine to generate failure data, which in turn requires extensive domain knowledge of wind turbine system performance. The following section presents a review of approaches that have used WT SCADA data for fault detection and prediction. In this paper, our contribution to knowledge is twofold: 1) applying a purely data-driven approach to predictive maintenance using SCADA data without failure data; and 2) validating the process with data from a different wind farm that has failure data. This study examines data from the La Haute Borne wind farm in France, operated by ENGIE, where four 2 MW wind turbines are installed. Section II describes related works on the ENGIE dataset and research proposing solutions similar to the one in this paper. Section III develops the methodology to identify failures through data preprocessing, model development, and data post-processing. Section IV tests the developed method on a real case study of a wind farm currently operating in Meuse, France, and also evaluates our proposed solution against data from a wind farm with
failure data. Section V discusses the effectiveness and applicability of the fault detection algorithm. Finally, Section VI considers future steps.

II. LITERATURE REVIEW
In the last decade, predictive maintenance has increasingly relied on machine learning techniques that build inductive models, learning the underlying structure of wind turbine SCADA data to predict incipient faults and anomalies [10]. For the most part, existing works use supervised methods, either regression or classification; these methods have the advantage of providing a clear relationship between inputs and outputs [19]. This section examines existing works on regression-based anomaly detection and research on the ENGIE dataset.

A. REGRESSION-BASED ANOMALY DETECTION
This approach is used for condition monitoring in wind farms by building a model of the normal behaviour of the wind turbine and its components. A set of independent input variables, such as wind speed, is used to build a regression model that predicts a numeric dependent output variable, such as power, assuming that the component is healthy. For example, power curve modelling is a critical task because the power curves supplied by manufacturers were not obtained at the location where the turbines are installed: the test turbines were subject to particular weather conditions that are most likely different from those at the installation site [20]. To address this challenge, study [21] compared four data-mining approaches, namely cluster centre fuzzy logic, neural network, k-nearest neighbour, and adaptive neuro-fuzzy inference system (ANFIS), to monitor wind turbine power output and detect deviations. Initially, wind speed was the only input and power the output, but adding wind direction and ambient temperature as inputs gave models that fitted the data better. In this research, ANFIS, a machine learning algorithm that combines neural networks with fuzzy theory, achieved the best performance. Modelling turbine components such as the generator with machine learning was investigated in study [22], where extreme gradient boosting (XGBoost) and long short-term memory (LSTM) were compared based on their mean absolute error (MAE). In this study, XGBoost outperformed LSTM in terms of MAE and was more computationally efficient, running 150 times faster than LSTM. The predicted results were then compared with field measurements to detect whether an anomaly was present. Study [23] developed a framework for anomaly detection and parameter identification in which an LSTM network was incorporated into the neuronal structure of an autoencoder neural network. An adaptive threshold based on support vector regression (SVR) was used
to reduce the false alarm rate of anomaly detection. The effectiveness of the proposed method was verified in a case study using SCADA data from a wind farm near the south coast of Ireland. Study [13] used the generator temperature and gearbox oil temperature in SCADA data to establish normal temperature models of these wind turbine components. The residual between the predicted and actual values was calculated, and its trend was monitored using an exponentially weighted moving average (EWMA) control chart. The study also proposed a fixed threshold and a dynamic threshold based on an adaptive algorithm and compared their fault detection efficiency. Study [24] performed feature selection using an adaptive elastic network and combined a convolutional neural network (CNN) with LSTM to establish the relationship between the observed variables. The method efficiently detected over-temperature in the high-speed-side bearing of the gearbox. Study [25] proposed a model that detects abnormal spikes in wind turbine component temperatures by adjusting the temperature data for the effects of ambient temperature and power output. Regression models with input variables (power output and ambient temperature) and an output variable (component temperature) were built, and the best model, which in this case was linear regression, was selected. The residual between the model's output temperature and the raw temperature data was used to detect abnormal behaviour of the component. Study [26] carried out predictive analytics of a wind turbine gearbox based on SVR models for accurate prediction of gearbox oil and bearing temperatures. Diebold-Mariano and Durbin-Watson statistical tests were used to analyse the residuals and establish the robustness of the tested SVR model. Study [27] applied the Mahalanobis distance method for feature selection, which helped reduce the number of input variables fed into the LSTM prediction model. Fault detection was carried out using the error
between the predicted component temperature and the actual measurement. This method yielded more efficient and accurate results, lowering the root mean square error by 4% compared with traditional backpropagation neural networks.
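Several of the studies above, notably [13], monitor the residual between predicted and measured temperature with an EWMA control chart. The sketch below shows a standard EWMA chart on a residual series. It assumes a smoothing constant `lam = 0.2` and a limit width `L = 3` (common textbook choices, not parameters reported in [13]), and it assumes the in-control mean `mu0` and standard deviation `sigma` are estimated from residuals of a healthy period; `ewma_chart` is a hypothetical helper name.

```python
import numpy as np


def ewma_chart(residuals, mu0, sigma, lam=0.2, L=3.0):
    """EWMA statistic and time-varying control limits for a residual series.

    z_t = lam * r_t + (1 - lam) * z_{t-1}, starting from z_0 = mu0.
    Limits: mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    r = np.asarray(residuals, dtype=float)
    z = np.empty_like(r)
    prev = mu0
    for i, value in enumerate(r):
        prev = lam * value + (1.0 - lam) * prev
        z[i] = prev
    t = np.arange(1, len(r) + 1)
    width = L * sigma * np.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
    lower, upper = mu0 - width, mu0 + width
    alarm = (z < lower) | (z > upper)      # True where the chart signals an anomaly
    return z, lower, upper, alarm
```

Because the EWMA averages past residuals, it reacts to a small persistent drift in component temperature faster than a chart on individual residuals, at the cost of a short detection delay after an abrupt change.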
Study [28] investigated the use of the electrical parameters in SCADA measurements to build data-driven normal behaviour models using SVR with a Gaussian kernel, which captures the non-linear relationship between the electrical parameters and operational variables. Principal component analysis (PCA) was used to orthogonalise the features and reduce their dimensionality. The normal behaviour models of a healthy wind turbine and the target faulty wind turbine were analysed in parallel, showing that the fault could be detected two weeks before it occurred. Study [29] proposed a comprehensive methodology for designing and applying artificial neural networks and statistical process control for effective fault detection in wind turbines. The method was tested on an operating wind turbine in Italy to verify its effectiveness and applicability.
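The normal behaviour model of [28], PCA followed by SVR with a Gaussian (RBF) kernel, can be sketched as a scikit-learn pipeline. This is a minimal illustration on synthetic data, not the configuration of [28]: the number of components, `C`, `epsilon`, and the use of PCA whitening are assumptions of this sketch, and `build_svr_nbm` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_svr_nbm(n_components=3, C=10.0, epsilon=0.05):
    """Standardise -> PCA (orthogonalise and reduce) -> SVR with an RBF (Gaussian) kernel."""
    return make_pipeline(
        StandardScaler(),
        # whiten=True (an assumption here) gives each component unit variance,
        # so a single RBF length scale suits all components
        PCA(n_components=n_components, whiten=True),
        SVR(kernel="rbf", C=C, epsilon=epsilon),
    )


# Synthetic stand-in for SCADA signals: six observed channels driven by
# three latent factors (hypothetical data, not from any wind farm)
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
X = latent @ rng.normal(size=(3, 6)) + 0.01 * rng.normal(size=(1000, 6))
y = np.sin(latent[:, 0]) + 0.5 * latent[:, 1] ** 2 + 0.05 * rng.normal(size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

svr_nbm = build_svr_nbm().fit(X_train, y_train)
r2_svr = r2_score(y_test, svr_nbm.predict(X_test))

# Linear baseline on the same split, for comparison
r2_lin = r2_score(y_test, LinearRegression().fit(X_train, y_train).predict(X_test))
```

On data with a non-linear input-output relationship, the kernel model should clearly outperform the linear baseline, which is the motivation [28] gives for the Gaussian kernel.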

B. RELATED WORKS ON ENGIE WIND FARM DATASET
Study [19] proposed combining LSTM and XGBoost to predict anomalies in wind turbines. The model was trained on a labelled dataset (the LDT dataset) as the source domain, and the learning was transferred to an unlabelled dataset (the ENGIE dataset) as the target domain. The objective of the transfer learning was to enable wind farm operators with no access to historical failure data to detect anomalies. Study [30] developed a system for reconstructing a lost signal from weakly correlated parameters when one of the SCADA sensors fails to send data; the reconstruction model aimed to predict wind power from the other SCADA parameters. Linear and non-linear algorithms were analysed to find a generalised model: multiple linear regression, random forest, and a Cartesian genetic programming evolved artificial neural network (CGPANN) were used to inform it. Study [23] proposed a solution to the high-dimensionality problem of condition monitoring (CM) data from mechanical equipment. Since such equipment has multiple operating conditions, it is difficult to isolate anomalies without confusing them with normal changes in operating condition. A Gaussian mixture model was therefore employed to cluster the operating conditions, and the isolation forest method was used to detect anomalous instances and identify the critical attributes responsible for equipment degradation. The model was demonstrated on the ENGIE dataset to evaluate its effectiveness. Study [31] applied a novel improved dragonfly algorithm (IDA) to choose the optimal parameters of a support vector machine (SVM) for short-term wind power forecasting. This hybrid model (IDA-SVM) outperformed the traditional grid search approach (Grid-SVM), which simply compares different parameter combinations and selects the best-performing one. In IDA-SVM, adaptive learning factors and a differential evolution strategy were adopted to boost the
optimisation ability of the dragonfly algorithm (DA); the model was applied to the ENGIE dataset in different seasons. Study [32] used the ENGIE dataset as a validation set to show that a novel k-means-based smoothing spline hybrid model achieves the most accurate power curve, with better goodness-of-fit statistics than k-medoids++-based Gaussian hybrid models.
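The two-stage idea attributed to study [23], clustering operating conditions with a Gaussian mixture model and then detecting anomalies within each condition with an isolation forest, can be sketched with scikit-learn as follows. The number of conditions, the `contamination` value, and the helper name `detect_anomalies` are assumptions of this sketch, not settings from [23].

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture


def detect_anomalies(X, n_conditions=2, contamination=0.02, seed=0):
    """Cluster operating conditions with a GMM, then run one Isolation Forest per cluster.

    Returns the cluster label and an anomaly flag for every row of X.
    """
    gmm = GaussianMixture(n_components=n_conditions, random_state=seed).fit(X)
    labels = gmm.predict(X)
    is_anomaly = np.zeros(len(X), dtype=bool)
    for k in range(n_conditions):
        idx = np.flatnonzero(labels == k)
        if len(idx) < 2:          # too few points in this condition to fit a forest
            continue
        forest = IsolationForest(contamination=contamination, random_state=seed)
        # fit_predict returns -1 for anomalies and +1 for inliers
        is_anomaly[idx] = forest.fit_predict(X[idx]) == -1
    return labels, is_anomaly
```

Fitting a separate forest per operating condition means that a sample is judged against other samples from the same regime, so a normal high-load point is not flagged merely because it differs from low-load operation.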

III. METHODOLOGY
This study investigates a robust and precise workflow for fault detection in wind turbines based on XGBoost, LSTM, and statistical process control (SPC). The methodology outlines the steps needed to build a predictive maintenance system based on fault detection when we do not know what failure looks like, that is, in the absence of failure data. There has been much research on predictive maintenance based on SCADA data using machine learning and SPC, as elaborated in Section II. One thing these works have in common is that they validated their solutions using failure data, maintenance logs, alarm logs, or status logs recorded at the wind farm under study. This study instead validates our model's predictive ability with data from a different wind farm that has failure data, by way of transfer learning. We examine how effectively our method predicts failure when there is no historical maintenance data for the wind farm.
The critical steps of our method are as follows: 1) Data acquisition and preprocessing: data are collected from open-source platforms, cleaned, stripped of outliers, and filtered to retain normal operational data points for subsequent model processing. 2) Model processing: models representing normal behaviour are built for the turbines in the wind farm. 3) Post-processing: the deviations of model predictions from the actual measured data are evaluated using an SPC control chart. We build a model representative of the normal behaviour of the wind turbines, assuming that the model always describes the healthy state of the turbine. In the testing phase, we then predict the wind turbine's health status; this healthy representative state serves as a reference for asset managers. When new SCADA data are acquired, the deviations between the healthy-model predictions and the latest measurements are computed and monitored through the SPC control chart; data points outside the allowable fault threshold are considered anomalies. To validate this method, we train our model on new data from a different wind turbine that has failure data; only after this is the model deemed ready for real-time monitoring. Fig. 2 shows the fault detection algorithm based on temperature prediction of wind turbine components.
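The three steps can be sketched end to end on synthetic data. This minimal example makes assumptions that are not taken from the paper: multiple linear regression stands in for the normal behaviour model, the SPC chart is a Shewhart individuals chart with limits at the training-residual mean plus or minus three standard deviations, and the data and the +5 degC drift are invented. `fit_normal_behaviour` and `flag_anomalies` are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_normal_behaviour(X_healthy, y_healthy, k=3.0):
    """Fit a normal behaviour model and derive control limits from its training residuals."""
    model = LinearRegression().fit(X_healthy, y_healthy)
    resid = y_healthy - model.predict(X_healthy)
    centre, sigma = resid.mean(), resid.std(ddof=1)
    return model, (centre - k * sigma, centre + k * sigma)


def flag_anomalies(model, limits, X_new, y_new):
    """Return residuals of new data and a mask of points outside the control limits."""
    resid = y_new - model.predict(X_new)
    lower, upper = limits
    return resid, (resid < lower) | (resid > upper)


rng = np.random.default_rng(0)


def simulate(n):
    """Hypothetical data: component temperature driven by power and ambient temperature."""
    X = np.column_stack([rng.uniform(0, 2050, n), rng.uniform(-5, 30, n)])  # kW, degC
    y = 20.0 + 0.015 * X[:, 0] + 0.9 * X[:, 1] + rng.normal(0, 0.5, n)     # degC
    return X, y


X_train, y_train = simulate(2000)               # step 1: cleaned, healthy training data
model, limits = fit_normal_behaviour(X_train, y_train)   # step 2: normal behaviour model

X_new, y_new = simulate(200)
y_new[100:] += 5.0                              # hypothetical fault drift in the second half
resid, alarm = flag_anomalies(model, limits, X_new, y_new)   # step 3: SPC post-processing
```

In the paper's setting, the training data would come from the cleaned and power-curve-filtered SCADA records, and XGBoost or LSTM would replace the linear model; the post-processing step is unchanged.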

A. DATA ACQUISITION AND DATA PREPROCESSING
To build a healthy representative model, historical monitoring data spanning a considerable period were obtained from a wind farm. Since SCADA data provide useful real-time monitoring and control information, the data used in this study are SCADA data acquired from the La Haute Borne wind farm located in Meuse, France [33]. This wind farm is operated by ENGIE Green and has four Senvion MM82 wind turbines. The SCADA system in this wind farm acquires 34 measured parameters together with their statistics, namely the average, maximum, minimum, and standard deviation of each parameter. We retain only the average value of each parameter, since it captures most of the information. Data points are recorded at 10-min intervals. Each turbine at La Haute Borne has a rated power of 2050 kW, a rotor diameter of 82 m, and a hub height of 80 m. The cut-in wind speed is 3.5 m/s, the rated wind speed 14.5 m/s, and the cut-out wind speed 25 m/s. The key parameters considered in this study are active power, wind speed, outdoor (ambient) temperature, generator bearing temperature, gearbox bearing temperature, generator speed, gearbox oil sump temperature, rotor speed, and nacelle temperature.
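Keeping only the average statistic and putting each turbine on a regular 10-min grid can be sketched with pandas. The column names are assumptions: the sketch supposes that each statistic is exported with a suffix such as `_avg`, `_min`, `_max`, or `_std`, and that the identifier columns are called `Wind_turbine_name` and `Date_time`. These should be checked against the dataset's own variable description before use.

```python
import pandas as pd


def keep_average_columns(df, id_columns=("Wind_turbine_name", "Date_time")):
    """Keep identifier columns plus every column whose name ends in '_avg'."""
    avg_cols = [c for c in df.columns if c.endswith("_avg")]
    keep = [c for c in id_columns if c in df.columns] + avg_cols
    return df[keep]


def to_10min_grid(df, time_col="Date_time"):
    """Reindex one turbine's records onto a regular 10-min grid; gaps become NaN rows.

    Assumes timestamps are unique for the turbine (drop duplicates first otherwise).
    """
    ts = df.assign(**{time_col: pd.to_datetime(df[time_col], utc=True)})
    ts = ts.set_index(time_col).sort_index()
    grid = pd.date_range(ts.index.min(), ts.index.max(), freq="10min")
    return ts.reindex(grid)
```

Making gaps explicit as NaN rows lets the later cleaning step treat missing 10-min intervals in the same way as samples with missing values.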

1) DATA CLEANING
The algorithms used to train our models learn a relationship between the input and output variables. The data quality must therefore be examined to ensure that the model represents the system condition correctly; any anomalous data points must be removed so that the model does not learn a wrong picture of system performance. To build a model representing the healthy state of the WT, data cleaning must be carried out. After identifying the variables needed for model processing, we must understand system performance and the variables that describe it in the data. This enables us to identify anomalies in the data and remove them, since they have a significant impact on model accuracy. Because the SCADA data are gathered by sensors, there may be data spikes or missing data due to sensor errors. These errors can arise from uncalibrated sensors or sensor degradation over time, creating outliers in SCADA data [34]. In addition to sensor malfunction, wind farms are subject to artificially imposed power reductions, either for maintenance or by the national grid to resolve dispatching issues [29]. Therefore, the following elimination criteria were used for preliminary data cleaning:
• Instances where turbine power is zero or less but wind speed is above cut-in speed
• Samples where at least one input or output is missing
• Samples with one or more values outside the normal range
• Samples where the wind turbine was halted or data were lost because of sensor transmission errors
The summary of the data cleaning and resampling operation is shown in (1) [39].
The SCADA data also record power limit values; operation under a power limit does not represent the turbine's ideal behaviour, so data points collected during such power restrictions must be removed from the dataset. After removing abnormal data points, a second pass of cleaning must be performed to catch outliers due to unknown causes.
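The preliminary elimination criteria can be sketched as a boolean mask in pandas. The column names `P_avg` and `Ws_avg` and the helper `clean_scada` are assumptions of this sketch; the cut-in speed is the 3.5 m/s given for the La Haute Borne turbines. Removing curtailed periods would additionally need a power-limit signal from the SCADA export, which is not modelled here.

```python
import pandas as pd

CUT_IN_SPEED = 3.5  # m/s, La Haute Borne turbines (from the text)


def clean_scada(df, power="P_avg", wind="Ws_avg", limits=None):
    """Apply preliminary elimination criteria to 10-min SCADA averages.

    df should contain only the model's input/output columns.
    limits: optional {column: (low, high)} describing each variable's normal range.
    """
    keep = df.notna().all(axis=1)                              # drop samples with missing values
    keep &= ~((df[power] <= 0) & (df[wind] > CUT_IN_SPEED))    # no power although above cut-in
    for col, (low, high) in (limits or {}).items():
        keep &= df[col].between(low, high)                     # drop out-of-range values
    return df[keep]
```

Samples with zero or negative power below cut-in are kept, since an idling turbine in low wind is normal behaviour; the power-curve filter handles the remaining underperformance.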

2) POWER CURVE FILTERING
The power curve is used as a reference for the expected behaviour of the wind turbine, as seen in Fig. 3; hence, data representing healthy operation are required to follow the power curve signature. The wind turbine power curve shows the relationship between power output and wind speed and essentially captures the turbine's performance, so it plays a vital role in condition monitoring and control of wind turbines. Power curves are made available by manufacturers to help estimate the wind energy potential at a candidate site. Because of the intermittent and stochastic nature of wind speed, the characteristic curve of a wind turbine behaves differently in different regions. Traditional outlier detection methods therefore often fail to catch these outliers, or remove healthy data points along with them. We are interested in fitting a power curve to data representing 'normal' turbine operation; in other words, we want to flag all anomalous data or data representative of underperformance. Study [29] recommends dividing (binning) the data into intervals where the turbine changes behaviour. After binning the samples, outliers are detected by calculating the quantiles of the data within each bin and eliminating the outliers of the corresponding boxplot. The flagging criterion is based on a measure (a scalar or a number of standard deviations) from the mean of the bin centre. Here, a scalar measure was applied, with the outlier threshold set at 25% of the whisker length from the mean of the bin centre.
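Bin-wise boxplot filtering of the power curve can be sketched in pandas. Note that this sketch uses the conventional boxplot whiskers (1.5 times the interquartile range beyond the first and third quartiles), not the 25% scalar threshold described above; the bin width of 0.5 m/s, the column names, and the helper `filter_power_curve` are also assumptions.

```python
import numpy as np
import pandas as pd


def filter_power_curve(df, power="P_avg", wind="Ws_avg", bin_width=0.5, k=1.5):
    """Drop samples outside the boxplot whiskers of their wind-speed bin."""
    bins = np.floor(df[wind] / bin_width)              # wind-speed bin index for every sample
    grouped = df.groupby(bins)[power]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    inside = (df[power] >= q1 - k * iqr) & (df[power] <= q3 + k * iqr)
    return df[inside]
```

Computing the quartiles per bin, rather than over the whole dataset, is what lets the filter adapt to the different regions of the power curve.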

B. MODEL PROCESSING

1) FEATURE SELECTION
To describe the healthy behaviour of the wind turbine, the variables that form the inputs and output must be known. However, it is difficult to know these variables beforehand, since the sensors that make up the SCADA system measure many parameters. In this phase of the study, we relied on the literature to identify the best variable combinations for monitoring the behaviour of critical components such as the gearbox and generator. The bibliographic search covered the various methods researchers have used to arrive at lists of the most influential variables. Table 1 displays the input and output variables that define the behaviour of the components of interest, based on this review of the scientific literature.
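As a quick data-side sanity check on a literature-derived variable list, candidate inputs can be ranked by their correlation with the target temperature. This is a swapped-in screening technique, not the selection method used in this study, and Pearson correlation only captures linear association. The column names and data below are hypothetical, and `rank_inputs_by_correlation` is an illustrative helper.

```python
import numpy as np
import pandas as pd


def rank_inputs_by_correlation(df, target, candidates):
    """Rank candidate input columns by absolute Pearson correlation with the target."""
    return df[candidates].corrwith(df[target]).abs().sort_values(ascending=False)


# Hypothetical SCADA-like columns for illustration only
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "power": rng.uniform(0, 2000, n),        # active power [kW]
    "ambient_temp": rng.uniform(-5, 30, n),  # ambient temperature [degC]
    "noise": rng.normal(size=n),             # an irrelevant channel
})
df["gen_bearing_temp"] = 0.02 * df["power"] + 0.3 * df["ambient_temp"] + rng.normal(size=n)

ranking = rank_inputs_by_correlation(df, "gen_bearing_temp", ["noise", "ambient_temp", "power"])
```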

2) REGRESSION-BASED MODELS
Because of the stochastic nature of wind, the algorithm used to model wind turbines should capture the complex relationships between the variables defining system performance adequately and accurately. We examine the effectiveness of regression models in building the characteristic healthy behaviour of WT components from the input and output variables. The dataset instances are divided into training and testing sets in a 70:30 ratio for each component model. The model accuracy on the training set was compared with that on the test set to check for overfitting. We also used five-fold cross-validation to assess whether the model is robust and accurate and to detect overfitting or underfitting. Because the input variables have different units and ranges, they must be brought onto a common scale. In this study, the input variables were standardised using scikit-learn's StandardScaler, which computes the z-score of each variable from its mean and standard deviation, giving variables with zero mean and unit variance (the values are not bounded to [0, 1]). The transformation of the selected input data to z-scores is shown in (2).

$$z_i = \frac{x_i - \mu}{\sigma} \qquad (2)$$
where x_i is the set of input variables, µ is the mean, and σ is the standard deviation. The model uses the input variables of the training set to predict the output variables of the same set, learning their underlying relationships; how well it predicts these outputs defines the training accuracy. The input variables of the test set, unseen by the model, are then used to predict the output variable, and how well the model predicts it defines the test accuracy. We then compare the predicted output variables, representing the healthy condition of the WT, with the measured values. We start with a naïve baseline model using multiple linear regression (MLR), then compare it with two non-linear algorithms: extreme gradient boosting (XGBoost) and long short-term memory (LSTM). The choice of algorithms is based on the technical and scientific literature [22], [23].
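The 70:30 split, z-score standardisation, and five-fold cross-validation described above can be combined in one scikit-learn workflow. The data below are synthetic and the variable names hypothetical. Placing `StandardScaler` inside the pipeline means that µ and σ in (2) are estimated on the training folds only, so no test-fold statistics leak into the scaling.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical inputs: active power [kW], wind speed [m/s], ambient temperature [degC]
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) * [400.0, 5.0, 8.0] + [1000.0, 10.0, 15.0]
# Hypothetical output: a component temperature driven by power and ambient temperature
y = 0.01 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(scale=0.5, size=1000)

# 70:30 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaler inside the pipeline: each CV fold estimates mu and sigma from its training folds
model = make_pipeline(StandardScaler(), LinearRegression())
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")

model.fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# z-scores have zero mean and unit variance on the training data; they are not confined to [0, 1]
Z = model.named_steps["standardscaler"].transform(X_train)
```

For 10-min SCADA series, shuffled K-fold splits place neighbouring, highly correlated samples in both training and validation folds; scikit-learn's `TimeSeriesSplit` is a stricter alternative if temporal leakage is a concern.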

a: MULTIPLE LINEAR REGRESSION (MLR) MODEL
Multiple linear regression models the relationship between two or more input variables and an output variable by fitting a linear equation to observed data. Every value of the input variables x is associated with a value of the output variable Y.
It is a statistical technique used to predict the output variable Y from a set of input variables x_i, where i is the index of the predictor variables, as shown in (3).

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon \qquad (3)$$
Here β_0 is the intercept of the fitted regression line, the regression coefficients β_1, ..., β_n are learned from the training data, and ε is the model's error term, that is, the deviation in Y not explained by the inputs. The transformed dataset is fed into the LinearRegression estimator of Python's scikit-learn library.
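Fitting (3) is ordinary least squares. The sketch below, on synthetic data with hypothetical coefficients, solves the least-squares problem directly on the design matrix [1, x_1, ..., x_n] with NumPy and checks that scikit-learn's `LinearRegression` returns the same intercept β_0 and coefficients β_1, ..., β_n.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))                                 # standardised inputs x_1..x_3 (synthetic)
beta_true = np.array([2.0, -1.0, 0.5])                      # hypothetical beta_1..beta_3
y = 40.0 + X @ beta_true + rng.normal(scale=0.1, size=n)    # beta_0 = 40, plus noise epsilon

# Ordinary least squares on the design matrix [1, x_1, x_2, x_3]
A = np.column_stack([np.ones(n), X])
coef_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# scikit-learn estimates the same parameters; the intercept is stored separately
mlr = LinearRegression().fit(X, y)
```

Both routes recover the generating coefficients up to the noise level, which makes MLR a transparent baseline against which the non-linear models can be judged.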

b: XGBoost REGRESSION MODEL AND HYPERPARAMETER OPTIMIZATION
XGBoost is a machine learning algorithm proposed by Chen and Guestrin in 2016 [35]. It is an ensemble model based on decision trees that combines multiple weak learners, such as regression trees, into a single strong learner through an iterative boosting process [36]. The model learns sequentially: at each iteration, the current regression tree is fitted to the residuals of the model built so far. In other words, the errors of the base learners (weaker regression trees) are learned and used to correct the new regression tree. The new tree is added to the fitted model to update the residuals, while an objective function tracks changes in model performance. The objective function includes a regularisation term that penalises model complexity to prevent overfitting and improve generalisation. XGBoost minimises the loss of the overall model by adding base models that reduce its residual. To do this efficiently, it uses the first- and second-order derivatives of the loss to obtain information about the gradient direction [22]. XGBoost also explores models quickly by using all CPU cores in a parallel and distributed manner during training, which reduces training time and ensures faster learning [37].
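The "fit each new tree to the residuals" principle described above can be shown with a plain gradient-boosting loop for squared-error loss, built from scikit-learn's `DecisionTreeRegressor`. This is not the xgboost library: it omits XGBoost's second-order split criterion, regularisation term, and parallel split finding. `fit_boosted_trees` and `predict_boosted` are hypothetical helpers, and the learning rate and tree depth are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting with squared-error loss: each tree fits the current residuals."""
    base = float(np.mean(y))                 # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                  # negative gradient of 0.5 * (y - pred)**2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)   # add the shrunken tree to the ensemble
        trees.append(tree)
    return {"base": base, "learning_rate": learning_rate, "trees": trees}


def predict_boosted(model, X):
    """Sum the constant start value and the contributions of all fitted trees."""
    pred = np.full(X.shape[0], model["base"])
    for tree in model["trees"]:
        pred += model["learning_rate"] * tree.predict(X)
    return pred
```

Each added tree only corrects what the previous ensemble still gets wrong, so training error falls as trees are added; the regularisation term in XGBoost's objective is what keeps this process from overfitting.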
XGBoost Regression Model: Since XGBoost is an ensemble of CARTs (classification and regression trees) with the decision tree as the base model, the output of the model $\hat{y}_i$ is the sum of the outputs of a collection of M trees from the function space $\mathcal{F}$, as shown in (4).
$$\hat{y}_i = \sum_{m=1}^{M} f_m(x_i), \quad f_m \in \mathcal{F} \tag{4}$$
where $\hat{y}_i$ denotes the predicted value of the i-th sample, M denotes the number of CARTs in the model, $f_m(x_i)$ represents the predicted value of the i-th sample in the m-th tree, and $\mathcal{F}$ is the function space of CART. The objective function of XGBoost includes the MSE loss function and the regularisation term, as represented by (5) [36].
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{m=1}^{M} \Omega(f_m), \qquad \Omega(f_m) = \gamma T + \frac{1}{2}\beta \sum_{j=1}^{T} w_j^2 \tag{5}$$
where n denotes the number of samples, l denotes a twice-differentiable loss function that measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i$, and $\Omega(f_m)$ represents the regularisation term. T is the number of leaf nodes in the tree, $w_j$ is the score of leaf node j, and γ and β are the parameters that control the complexity of the tree. The purpose of optimising the objective function is to determine the structure of the CART, that is, the best split feature, the best split point and the leaf node scores $w_j$. Using a second-order Taylor expansion, the objective function can be simplified to a univariate quadratic function of $w_j$, as represented in (6). More details on the simplification can be found in [14].
$$Obj \simeq \sum_{j=1}^{T}\left[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \beta\Big) w_j^2\right] + \gamma T \tag{6}$$
where $I_j$ represents all the data samples in leaf node j, and $g_i$ and $h_i$ denote the first and second derivatives of the MSE loss function. From (6), $G_j$ and $H_j$ can be defined as in (7) [35]:
$$G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i \tag{7}$$
Assuming that the structure of the CART is known, solving the quadratic (6) gives the optimal leaf node score $w_j^*$ in (8) and the corresponding optimal value of the objective function in (9).
$$w_j^* = -\frac{G_j}{H_j + \beta} \tag{8}$$
$$Obj^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^2}{H_j + \beta} + \gamma T \tag{9}$$
A smaller value of the objective function indicates a better CART structure. XGBoost applies a greedy algorithm that evaluates all candidate split points and selects the one that yields the smallest objective value after splitting; equivalently, the optimal split point maximises the gain in (10).
$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L + \beta} + \frac{G_R^2}{H_R + \beta} - \frac{(G_L + G_R)^2}{H_L + H_R + \beta}\right] - \gamma \tag{10}$$
where $I_L$ and $I_R$ are the sample sets of the left and right nodes after splitting, $I = I_L \cup I_R$, and $G_L, H_L$ and $G_R, H_R$ are the sums in (7) taken over $I_L$ and $I_R$, respectively.
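To make (7), (8) and (10) concrete, the sketch below computes the gradient statistics, the optimal leaf weights and the best split gain for a single feature with an exact greedy search. It assumes the squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$. It is an illustrative re-implementation of the equations with hypothetical function names and toy data, not the XGBoost library's code, which additionally supports approximate and histogram-based split finding, missing-value handling and shrinkage by eta.

```python
def squared_loss_grad_hess(y_true, y_pred):
    """First and second derivatives of l = 1/2 (y - y_hat)^2 with respect to y_hat."""
    g = [p - t for t, p in zip(y_true, y_pred)]
    h = [1.0 for _ in y_true]
    return g, h


def structure_score(G, H, beta):
    """The term G^2 / (H + beta) that appears in (9) and (10)."""
    return G * G / (H + beta)


def optimal_leaf_weight(G, H, beta):
    """Eq. (8): w* = -G / (H + beta). beta is the L2 penalty of (5), often written lambda."""
    return -G / (H + beta)


def best_split(x, g, h, beta=1.0, gamma=0.0):
    """Exact greedy search over one feature; returns (gain, threshold) maximising (10)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    GL = HL = 0.0
    best_gain, best_threshold = float("-inf"), None
    for k in range(len(order) - 1):
        i, nxt = order[k], order[k + 1]
        GL += g[i]  # samples with x <= x[i] go to the left node
        HL += h[i]
        if x[i] == x[nxt]:
            continue  # no threshold can separate equal feature values
        GR, HR = G - GL, H - HL
        gain = 0.5 * (structure_score(GL, HL, beta) + structure_score(GR, HR, beta)
                      - structure_score(G, H, beta)) - gamma
        if gain > best_gain:
            best_gain, best_threshold = gain, (x[i] + x[nxt]) / 2
    return best_gain, best_threshold


# Toy node: four samples whose current ensemble prediction is 2.0.
y = [1.0, 1.2, 3.0, 3.2]
x = [0.1, 0.2, 0.8, 0.9]
g, h = squared_loss_grad_hess(y, [2.0] * 4)
gain, threshold = best_split(x, g, h, beta=1.0, gamma=0.0)
w_left = optimal_leaf_weight(sum(g[:2]), sum(h[:2]), beta=1.0)
w_right = optimal_leaf_weight(sum(g[2:]), sum(h[2:]), beta=1.0)
```

Here the best threshold falls between 0.2 and 0.8, and the leaf weights (−0.6 and about 0.73) move the prediction of 2.0 towards each group's targets, shrunk by β; in XGBoost these weights would additionally be multiplied by eta.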
XGBoost Hyper-Parameters Optimization: Machine learning models typically perform better when their hyper-parameters are tuned. XGBoost has more than ten hyper-parameters whose values must be set to build a regression model. The hyper-parameters fall into three categories: general parameters, task parameters, and booster parameters. Both by design and in experiments, the booster parameters have the most significant impact on the model's performance. Consider, for example, the booster parameter eta (the learning rate), which is used to update the weights of the leaf nodes: to keep each update in check and prevent it from being too large, the leaf scores of the new tree are multiplied by eta in each iteration. If eta is too small, the model may underfit within a fixed number of boosting rounds, whereas a large eta makes the model more likely to overfit. This illustrates how strongly the choice of XGBoost hyper-parameters affects its performance [36]. Determining the best hyper-parameters manually can be a painful task, so three common techniques are used to select the best combination algorithmically: random search optimisation [39], grid search optimisation, and Bayesian optimisation [40]. This study employs random search optimisation to find the hyper-parameter combination that optimises the model; because this technique scales better than grid search to large datasets, it suits our study. The random search algorithm works by setting up a grid of hyper-parameter values and training and evaluating the model on randomly selected combinations from that grid [21]. The tuning space of the hyper-parameters is shown in Table 2, and the optimal parameters are shown in Table 3.
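The random search procedure can be sketched as follows. The search-space values and the toy `toy_validation_error` function are hypothetical placeholders, not the tuning space of Table 2; in practice the evaluation function would train an XGBoost model with the sampled hyper-parameters (`eta`, `max_depth` and `subsample` are real XGBoost parameter names) and return a validation error such as RMSE. scikit-learn's `RandomizedSearchCV` is a ready-made alternative.

```python
import random


def random_search(space, evaluate, n_iter=20, seed=0):
    """Sample n_iter random combinations from `space` and keep the one with the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyper-parameter from its candidate list.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)  # e.g. validation RMSE of a model trained with `params`
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space (illustrative only, not Table 2).
space = {
    "eta": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9],
    "subsample": [0.6, 0.8, 1.0],
}


def toy_validation_error(params):
    # Stand-in for "train XGBoost with params and return the validation RMSE".
    return (abs(params["eta"] - 0.1) + abs(params["max_depth"] - 5) / 10
            + abs(params["subsample"] - 0.8))


best_params, best_score = random_search(space, toy_validation_error, n_iter=30, seed=42)
```

Unlike grid search, the cost is fixed by `n_iter` rather than by the product of the candidate-list lengths, which is why random search scales better as the space grows.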

c: LONG SHORT-TERM MEMORY (LSTM)
The long short-term memory (LSTM) network mimics the human ability to interpret the meaning of a word from the context of the entire sentence. Similarly, LSTMs produce predictions from the ordered sequence of temporal data they receive as input. SCADA logs, recorded at successive time intervals, are a typical example of such data. Hochreiter and Schmidhuber first proposed the LSTM in 1997 [41] as a special type of recurrent neural network (RNN) that overcomes the vanishing and exploding gradient problems of conventional RNNs. The LSTM's ability to learn the long-term and short-term dependencies inherent in sequential data has made it successful in predicting long input sequences such as those found in SCADA data [18]. The architecture of the LSTM is shown in Fig. 4 and Fig. 5. The LSTM has feedback connections and can model non-linear dynamic systems by mapping input sequences to output sequences. Its basic structure consists of a cell and three gates (the input gate, the output gate, and the forget gate): the cell acts as the memory of each LSTM unit, and the gates control the flow of information within each unit [26].
The solution to the RNN's long-term dependency problem lies in the cell state of the LSTM structure, whose main purpose is to store long-term information in the LSTM's hidden layer. In Fig. 5, $X_t$ is the present input vector (the input data to the LSTM model at time t), $h_{t-1}$ is the past output vector, and $c_{t-1}$ is the past cell state. In Fig. 4, $f_t$ and $i_t$ are the forget gate and the input gate, respectively, which control the cell state of the model; in other words, the forget and input gates regulate the flow of information. σ is the sigmoid function, which decides which values are updated by outputting a number between 0 and 1 for each number in the cell state $c_{t-1}$, where 1 represents ''completely keep this'' and 0 represents ''completely get rid of this'' [42]. The forget gate controls how much of the past cell state $c_{t-1}$ is transmitted to the present cell state, as expressed by (11):
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \tag{11}$$
where σ(·) is the sigmoid activation function, $W_f$ is the weight matrix of the forget gate, $b_f$ is its bias vector, and $[h_{t-1}, X_t]$ is the concatenation of the past output vector $h_{t-1}$ and the present input vector $X_t$. The input gate $i_t$ controls how much of the present input $X_t$ is transmitted to the current cell state $c_t$, as shown in (12):
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \tag{12}$$
where $W_i$ is the weight matrix of the input gate and $b_i$ is its bias vector. To capture the state of the current input, the candidate cell state $\tilde{c}_t$ is calculated as in (13):
$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right) \tag{13}$$
where $W_c$ and $b_c$ are the weight matrix and the bias vector, respectively, and tanh is the hyperbolic tangent function, which maps values to the range −1 to 1. The present cell state $c_t$ is then obtained by combining the forget gate and the input gate, as described by (14):
$$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \tag{14}$$
where * denotes element-wise multiplication between vectors. The output gate $O_t$ controls the flow of information from the present cell state $c_t$ to the current output, as described by (15):
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right) \tag{15}$$
where $W_o$ is the weight matrix and $b_o$ is the bias vector. Lastly, the output gate $O_t$ and the current cell state $c_t$ determine the output of the LSTM model, represented in (16):
$$h_t = O_t * \tanh(c_t) \tag{16}$$
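A minimal NumPy sketch of one forward pass through (11) to (16) is given below, assuming randomly initialised weights and a short sequence of six input vectors; the function names and sizes are illustrative. It shows only how the gates combine at each time step, not how the network is trained: practical models would normally use a deep-learning framework's LSTM layer, which learns the weights by backpropagation through time.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following (11)-(16).

    W[k] has shape (hidden, hidden + n_inputs) and b[k] has shape (hidden,),
    for k in "f" (forget), "i" (input), "c" (candidate) and "o" (output).
    """
    z = np.concatenate([h_prev, x_t])         # [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate, (11)
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate, (12)
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate cell state, (13)
    c_t = f_t * c_prev + i_t * c_tilde        # new cell state, (14)
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate, (15)
    h_t = o_t * np.tanh(c_t)                  # output, (16)
    return h_t, c_t


def lstm_forward(X, W, b, hidden):
    """Run the cell over a sequence X of shape (time_steps, n_inputs)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, b)
    return h, c


rng = np.random.default_rng(0)
n_in, hidden = 3, 4
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
X = rng.normal(size=(6, n_in))  # e.g. six consecutive 10-min records of three variables
h, c = lstm_forward(X, W, b, hidden)
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell simply keeps half of its previous state, which isolates the role of the forget gate in (14).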
The complex, non-linear, multivariate time-series nature of SCADA data makes the LSTM a strong candidate for capturing the long-term dependencies inherent in it. The LSTM can also eliminate the need for manual feature engineering by identifying relevant features automatically [19]. The model hyper-parameters and configuration are shown in Table 4 and Table 5.

3) EVALUATION METRICS OF THE MODELS' PREDICTIVE ABILITY
In this study, four metrics were used to evaluate the temperature-prediction regression models discussed above: the coefficient of determination (R-squared), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). Equations (17) to (20) give their formulas:
$$R^2 = 1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2} \tag{17}$$
$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2} \tag{18}$$
$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \tag{19}$$
$$MAPE = \frac{100\%}{m}\sum_{i=1}^{m}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{20}$$
where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, m is the number of instances in the test set, and $\bar{y}$ is the mean of the measured values. R² is also known as the goodness of fit: it describes how well the regression model fits the observed values, and the closer it is to 1, the better the fit. For RMSE, MAE and MAPE, smaller values indicate a more accurate prediction model. RMSE is very sensitive to large errors, whereas MAE is more robust to outliers; MAPE cannot handle observed values that are zero or close to zero, whereas RMSE and MAE can. These complementary strengths and weaknesses informed our choice of using RMSE, MAE and MAPE together with R².
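Equations (17) to (20) can be computed directly, as in the following plain-Python sketch; the function name and the toy temperatures in the example are illustrative and are not results from this study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (in %) as defined in (17)-(20)."""
    m = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = sum(y_true) / m
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / m),
        "MAE": sum(abs(e) for e in errors) / m,
        # Raises ZeroDivisionError if any measured value is 0, the MAPE limitation noted above.
        "MAPE": 100.0 / m * sum(abs(e / t) for e, t in zip(errors, y_true)),
    }


# Toy bearing temperatures in degrees C (illustrative only).
scores = regression_metrics([40.0, 42.0, 44.0, 46.0], [41.0, 42.0, 43.0, 46.0])
```

In this example two predictions are off by 1 °C, which gives R² = 0.9, MAE = 0.5 °C and RMSE ≈ 0.71 °C; RMSE exceeds MAE because squaring weights the larger errors more heavily.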

C. POST-PROCESSING
After training, testing and evaluating the model's accuracy, we assess and compare the deviations calculated as in (21).
$$\text{deviation} = \text{measured value} - \text{predicted value} \tag{21}$$
To evaluate the deviation for each instance of sensor reading (between the measured sensor value and the value predicted by the model), we used statistical process control to identify anomalies in the WT. In this study, the Shewhart control chart is used to evaluate the deviations as they evolve. The fault threshold is defined by two control limits used to identify abnormal behaviour: the Upper Control Limit (UCL) and the Lower Control Limit (LCL). The two control limits set the sensitivity of the control chart and are expressed as multiples of σ of the deviations' distribution, where σ is the standard deviation estimated from the moving range MR, i.e., the absolute difference between the i-th deviation and the preceding one. Equations (22) to (25) show how the control chart sensitivity is computed. The predicted values depict the healthy state of the WT over time. For a WT whose measured values conform to the healthy state, the deviations on the chart follow a normal distribution with a mean of zero and a standard deviation of 1 and vary randomly, whereas non-conformity appears as a departure from this random behaviour. Non-conformity is identified by data points beyond the fault threshold (control limits) and by shifts of the average, and such signals are considered abnormal behaviour. The model is first used to reveal incipient faults in WTs without failure data and is then validated on a different WT whose maintenance logs record real faults; if it detects those faults, the model is considered fit for purpose.
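The post-processing step can be sketched as an individuals (Shewhart) chart on the deviations of (21). Because (22) to (25) are not reproduced here, this sketch assumes the textbook form: σ estimated as the mean moving range divided by the constant $d_2 = 1.128$, control limits at the centre line ± 3σ, and limits computed from a healthy reference period. The run rule (eight consecutive points on one side of the centre line) is a common supplementary check for a shift in the average, added here as an assumption. All function names and values are hypothetical.

```python
D2 = 1.128  # standard bias-correction constant for moving ranges of two consecutive points


def deviations(measured, predicted):
    """Eq. (21): deviation = measured value - predicted value."""
    return [m - p for m, p in zip(measured, predicted)]


def control_limits(reference, k=3.0):
    """Return (LCL, centre, UCL) of an individuals chart from healthy-period deviations."""
    moving_ranges = [abs(b - a) for a, b in zip(reference, reference[1:])]  # MR_i
    sigma = (sum(moving_ranges) / len(moving_ranges)) / D2
    centre = sum(reference) / len(reference)
    return centre - k * sigma, centre, centre + k * sigma


def beyond_limits(devs, lcl, ucl):
    """Indices of points outside the fault threshold."""
    return [i for i, d in enumerate(devs) if d < lcl or d > ucl]


def same_side_runs(devs, centre, run_length=8):
    """Start indices of runs of `run_length` consecutive points on one side of the centre line."""
    starts, run, side = [], 0, 0
    for i, d in enumerate(devs):
        s = (d > centre) - (d < centre)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            run += 1
        else:
            run, side = (1, s) if s != 0 else (0, 0)
        if run == run_length:
            starts.append(i - run_length + 1)
    return starts


# Hypothetical generator bearing temperatures in degrees C.
reference = deviations([45.1, 44.9, 45.1, 44.9, 45.0], [45.0] * 5)   # healthy period
lcl, centre, ucl = control_limits(reference)
monitored = deviations([45.05, 44.8, 45.9, 45.3, 44.4], [45.0] * 5)  # later period
alarms = beyond_limits(monitored, lcl, ucl)
```

With these numbers the limits are roughly ±0.47 °C, so the deviations of +0.9 °C and −0.6 °C (indices 2 and 4) are flagged; a widening of the multiplier k would make the chart less sensitive.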

IV. RESULTS
In this section, we present the application of the proposed methodology to two wind farms: the La Haute Borne wind farm operated by ENGIE in Meuse, France, and a wind farm operated by EDP (Energias de Portugal) in the West African Gulf of Guinea [43], [44]. The La Haute Borne dataset has no failure data or maintenance logs, whereas the EDP dataset contains failure data in addition to operational data. We illustrate the usefulness of our prediction model when no failure data are available for a wind farm, especially when a WT is newly installed and has been running only for a short period: that short period of operational data can be harnessed to predict when the WT will fail using the algorithm in this paper. We first demonstrate this on the ENGIE dataset and then validate the proposed methodology on the EDP data, whose maintenance records allow us to show the algorithm's effectiveness in detecting faults.

A. ENGIE DATASET
1) DATA PREPROCESSING
The available data are SCADA data recorded every 10 min from January 1, 2013, to December 31, 2016, for a total of 136 variables for four turbines: thirty-four unique parameters, each recorded together with its basic statistics (minimum, maximum, mean and standard deviation).
We deleted the min, max and std variables and kept the average values, since they captured the most information. Not-a-Number (NaN) values were handled using a set threshold: variables with more than 10,000 NaN values were removed, because filling these values with any strategy could give a false impression of the WT's condition. The stepwise method in the data cleaning subsection of Section III was then followed to exclude instances in the following categories:
• Instances where the active power is zero
• Instances where at least one variable of interest (input or output) is missing
• Instances where the WT is operating under the restricted power regime
Power restrictions are typically constraints imposed on wind farm operators by the national grid to prevent dispatch problems. Because these behaviours do not characterise normal WT operation, we removed the instances affected by such limitations. Fig. 6(a) shows the power curve of one of the WTs, in which many outliers are present. About 23% of instances were eliminated in the primary cleaning phase, mainly because of the power restriction regime. After applying the primary cleaning procedure outlined in the data cleaning subsection of Section III, we obtain the cleaner power curve in Fig. 6(b). The clean data in Fig. 6(c) are obtained by applying the process laid out in the power curve subsection of Section III. Once the clean data were ready for model training, the models' input parameters were extracted from the cleaned SCADA data to construct the input data set. The input variables making up the model of each WT component were selected based on Table 1. Fig. 7 and Fig. 8 display the chosen output variables that define the models of the WT components (i.e., the gearbox and the generator) for the four turbines in the wind farm during the training phase.
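The column selection and exclusion rules above can be sketched as follows on rows stored as dictionaries. The column names, the `_min`/`_max`/`_std` suffix convention and the boolean `power_restricted` flag are hypothetical; how the restricted power regime is actually identified is described in the data cleaning subsection of Section III and is not reproduced here.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_average_columns(columns):
    # Assumed naming: each statistic is stored in its own column with a _min/_max/_std/_avg suffix.
    return [c for c in columns if not c.endswith(("_min", "_max", "_std"))]


def drop_sparse_columns(rows, columns, max_nan=10_000):
    # Variables with more than max_nan missing values are removed rather than imputed.
    return [c for c in columns if sum(is_missing(r.get(c)) for r in rows) <= max_nan]


def primary_cleaning(rows, variables, power_col, restricted_col):
    """Keep instances with non-zero power, no missing variable of interest and no power restriction."""
    cleaned = []
    for r in rows:
        if any(is_missing(r.get(v)) for v in variables):
            continue  # at least one input or output variable is missing
        if r[power_col] == 0:
            continue  # active power is zero
        if r[restricted_col]:
            continue  # WT operating under the restricted power regime
        cleaned.append(r)
    return cleaned


# Hypothetical 10-min records.
rows = [
    {"power_avg": 850.0, "gen_bearing_temp_avg": 45.0, "gen_bearing_temp_std": 0.4, "power_restricted": False},
    {"power_avg": 0.0, "gen_bearing_temp_avg": 30.0, "gen_bearing_temp_std": 0.1, "power_restricted": False},
    {"power_avg": 600.0, "gen_bearing_temp_avg": float("nan"), "gen_bearing_temp_std": 0.2, "power_restricted": False},
    {"power_avg": 400.0, "gen_bearing_temp_avg": 41.0, "gen_bearing_temp_std": 0.3, "power_restricted": True},
]
columns = drop_sparse_columns(rows, keep_average_columns(rows[0].keys()))
variables = ["power_avg", "gen_bearing_temp_avg"]
cleaned = primary_cleaning(rows, variables, "power_avg", "power_restricted")
```

Each of the last three records violates exactly one rule, so only the first record survives.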
Since we have no maintenance records in this case study, we assume that the turbines were in normal condition throughout their operation; therefore, we use all the data for each of the eight models to analyse its behaviour in the training phase. The set of variables required to model each WT component was chosen to form an input dataset, together with the corresponding output variable. The input dataset was standardised as discussed in Section III, after which the input dataset and the output variable were split: the first 70% of the data were used for training and the last 30% for testing. The data were not shuffled because the dataset is a time series, and any random data selection could result in data leakage.
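The chronological 70/30 split can be expressed as below; this is a minimal sketch with a hypothetical function name. Every training sample precedes every test sample, so no future information reaches the training set.

```python
def chronological_split(X, y, train_fraction=0.7):
    """Split time-ordered data: first part for training, remainder for testing, no shuffling."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Example with 100 time-ordered samples (indices stand in for 10-min timestamps).
X = list(range(100))
y = [2 * v for v in X]
X_train, X_test, y_train, y_test = chronological_split(X, y)
```

Slicing keeps the original order, so the first 70 records train the model and the last 30 test it.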

2) MODEL PROCESSING
To build a robust fault detection model, all the algorithms discussed in Section III were used: MLR, XGBoost and LSTM were trained for each of the eight models, and the best model was selected based on a combination of the performance metrics discussed in Section III. The XGBoost and LSTM hyper-parameters described in Section III were used for all eight models. The performance of the algorithms varied, with LSTM outperforming XGBoost in some models and vice versa. Multiple linear regression (MLR) performed worst in all eight models, which further indicates that the relationship between the variables is non-linear. All the models were developed in Python 3.8.
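The text does not spell out how the four metrics are combined, so the following is only one plausible selection rule, stated as an assumption: count on how many metrics each algorithm is best (highest R², lowest RMSE, MAE and MAPE) and break ties with the lower RMSE, reflecting the emphasis placed on large errors. The function name and the scores in the example are hypothetical, not values from Table 6 or Table 7.

```python
def select_best_model(results):
    """results: {model_name: {"R2": .., "RMSE": .., "MAE": .., "MAPE": ..}}."""
    higher_is_better = {"R2": True, "RMSE": False, "MAE": False, "MAPE": False}
    wins = {name: 0 for name in results}
    for metric, higher in higher_is_better.items():
        values = {name: scores[metric] for name, scores in results.items()}
        best_value = max(values.values()) if higher else min(values.values())
        for name, value in values.items():
            if value == best_value:
                wins[name] += 1
    # Most metric wins first; among equal win counts prefer the lower RMSE.
    return max(results, key=lambda name: (wins[name], -results[name]["RMSE"]))


# Hypothetical scores for illustration only.
results = {
    "MLR": {"R2": 0.80, "RMSE": 2.5, "MAE": 2.0, "MAPE": 5.0},
    "XGBoost": {"R2": 0.95, "RMSE": 1.2, "MAE": 0.9, "MAPE": 2.1},
    "LSTM": {"R2": 0.96, "RMSE": 1.1, "MAE": 1.0, "MAPE": 2.3},
}
chosen = select_best_model(results)
```

Here XGBoost and LSTM each win two metrics, and the RMSE tie-break selects LSTM.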

a: GENERATOR MODEL
Wind Turbine R80736: After the three models were tried on WT R80736, the LSTM model, with RMSE = 1.44 °C, MAE = 1.03 °C and MAPE = 2.65%, was chosen because it had the highest R-squared value and the lowest RMSE, as well as a lower MAE and MAPE than XGBoost, as seen in Table 6. Its higher R-squared value indicates a better goodness of fit, i.e., the data fit the LSTM model better than the other models.
Additionally, because the RMSE penalises large errors by assigning them higher weights, and large errors are particularly undesirable in our application, the lower RMSE makes the LSTM model the more accurate choice. The LSTM model used the historical SCADA data from sensors recording the generator bearing temperature in WT R80736 during normal operation (healthy state) to predict the temperature, as shown in Fig. 9(a). The control chart for this application is shown in Fig. 9(b), with a fault threshold of ±0.93 °C. The first point that wandered out of control was noted on November 10, 2013, but it was not significant, as there was only a slight shift in the average. However, on May 18, 2014, there was a significant out-of-control point, and on November 30, 2014, another point was out of control, showing a substantial shift in the average. These events culminated in the spike of deviations, all on the same side of the control chart, from May 10, 2015, corresponding to the period of the massive spike in the actual generator bearing temperature in Fig. 9(a). Therefore, based on the evidence presented in Fig. 9, this model predicted the imminent fault in the WT, which we assume occurred around May 10, 2015, about four months ahead.
Wind Turbine R80721: For WT R80721, the XGBoost model, with RMSE = 1.06 °C, MAE = 0.8 °C and MAPE = 2%, was chosen because it had the highest R-squared value and the lowest RMSE, MAE and MAPE compared to LSTM and MLR, as seen in Table 6. The XGBoost model used the historical SCADA data obtained by sensors recording the generator bearing temperature in WT R80721 during normal operation (healthy state) to predict the temperature, as shown in Fig. 10 […] RMSE, MAE and MAPE compared to XGBoost and MLR, as seen in Table 6. The LSTM model used the historical SCADA data obtained by sensors recording the generator bearing temperature in WT R80711 during normal operation (healthy state) to predict the temperature, as shown in Fig. 11(a). The control chart for this application is shown in Fig. 11(b), with a fault threshold of ±0.53 °C. Although there are numerous out-of-control points, there is still not enough evidence to identify possible faults in the system, as seen in Fig. 11(b). However, Fig. 11(a) shows a general upward trend in both the actual and the predicted generator bearing temperature.
Wind Turbine R80790: For WT R80790, the LSTM model, with RMSE = 1.7 °C, MAE = 1.05 °C and MAPE = 2.6%, was chosen because it had the highest R-squared value and the lowest RMSE, MAE and MAPE compared to XGBoost and MLR, as seen in Table 6. The LSTM model used the historical SCADA data from sensors recording the generator bearing temperature in WT R80790 during normal operation (healthy state) to predict the temperature, as shown in Fig. 12(a). The control chart for this application is shown in Fig. 12 […] The XGBoost model used the historical SCADA data obtained by sensors recording the gearbox bearing temperature in WT R80736 during normal operation (healthy state) to predict the temperature, as shown in Fig. 13 […] was fault-driven. However, towards the end of the control chart, numerous points lie outside the fault threshold from April 3, 2016, through December 25, 2016. Before this extended event, we observed an out-of-control point on May 10, 2015, and we infer that this point warned of the possible fault events towards the end of the control chart.
Wind Turbine R80721: For WT R80721, the XGBoost model, with RMSE = 1.12 °C, MAE = 0.77 °C and MAPE = 1.3%, was chosen because it had the highest R-squared value and the lowest RMSE, MAE and MAPE compared to LSTM and MLR, as seen in Table 7. The XGBoost model used the historical SCADA data obtained by sensors recording the gearbox bearing temperature in WT R80721 during normal operation (healthy state) to predict the temperature, as shown in Fig. 14(a). The control chart for this application is shown in Fig. 14 […] November 8, 2015, to November 20, 2016, there are no earlier significant deviations that predicted such massive disruptions.

B. EDP DATASET
1) DATA PREPROCESSING
The available data are:
• Historical SCADA operational data recorded every 10 min from January 1, 2017, to December 31, 2017, for a total of 83 variables for four turbines
• The historical failure logbook for the year 2017
Some parameters in the SCADA data were recorded along with their basic statistics, such as minimum, maximum, mean and standard deviation. Since maintenance records are available in this case study, we analysed the failure data before selecting an appropriate data set for the training phase. For each of the two models, a fault-free part of the dataset was manually selected so that faults would not affect the monitored variables. Although there are no general rules on the ideal size of the selected data, the chosen dataset must contain all the variables (input and output) required to define the normal operation of the WT. We therefore selected a one-month interval of operation for wind turbine T06 and a quarterly interval for wind turbine T07. The stepwise method in the data cleaning subsection of Section III was then followed to clean the data. Once the clean data were ready for model training, the models' input parameters were extracted from the cleaned SCADA data to construct the input data set. The input variables making up the model of each WT component were selected based on Table 1. Fig. 17 displays the chosen output variables that define the models of the WT components (i.e., the gearbox and the generator) for the two turbines in the wind farm during the training phase.
The set of variables required to model each WT component was selected to form an input dataset, together with the corresponding output variable. The input dataset was standardised as discussed in Section III; only then were the input dataset and the output variable split, with the first 70% of the data used for training and the last 30% for testing. This chronological split avoids shuffling: because the dataset is a time series, any random data selection could result in data leakage.

2) MODEL PROCESSING
The same process discussed in Section III was followed to build the two models.

a: GEARBOX MODEL FOR T06
For WT T06, the XGBoost model, with RMSE = 0.7 °C, MAE = 0.5 […] predict the temperature, as shown in Fig. 18(a). According to the failure logs, the gearbox bearings were damaged at timestamp 2017-10-17 08:38. The control chart for this application is shown in Fig. 18 […] The LSTM model used the historical SCADA data from sensors recording the generator bearing temperature in WT T07 during normal operation (healthy state) to predict the temperature, as shown in Fig. 19(a). The control chart for this application is shown in Fig. 19(b), with a fault threshold of ±4.07 °C. According to the failure logs, the generator bearings were damaged at timestamp 2017-08-20 06:08, and the generator was damaged at timestamp 2017-08-21 14:47. Fig. 19(a) shows that the WT was out of service from August 20, 2017, to August 29, 2017. Our algorithm first predicted the catastrophic damage at timestamp 2017-08-06 16:00:00, and multiple points lay beyond the fault threshold, reaching up to three times its value at timestamp 2017-08-18 01:00:00, as seen in Fig. 19(b). The fault detection algorithm was therefore able to predict the generator damage two weeks ahead and gave multiple alarms up to three days before it occurred.
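The two-week lead time quoted for T07 is simple timestamp arithmetic. The sketch below reproduces it from the first control-chart alarm and the generator-bearing damage timestamp in the failure log; the helper name is hypothetical.

```python
from datetime import datetime


def lead_time(first_alarm, failure):
    """Elapsed time between the first control-chart alarm and the logged failure."""
    return datetime.fromisoformat(failure) - datetime.fromisoformat(first_alarm)


# T07: first alarm (Fig. 19(b)) and generator-bearing damage (failure log).
t07 = lead_time("2017-08-06 16:00:00", "2017-08-20 06:08")
print(t07)  # 13 days, 14:08:00
```

An elapsed time of about 13.6 days corresponds to the "two weeks ahead" stated above.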

V. DISCUSSION
This study followed three main steps to develop the fault detection algorithm: data acquisition and preprocessing, model processing, and post-processing. Data preprocessing required a rigorous process because of the complex working conditions of the WT, including power restrictions, and the presence of outliers in the historical SCADA data. A stepwise approach was followed to eliminate outliers and data points affected by power restrictions while avoiding the removal of valuable data points. The cleaned data were fed into three machine learning algorithms: MLR, XGBoost and LSTM. The best model was selected using strict performance criteria based on a combination of R-squared, RMSE, MAE and MAPE. The selected model was used to predict the output variable that defines the normal behaviour (healthy state) of the WT component. In post-processing, the deviation of the predicted output variable from the actual historical record was determined, and these deviations were evaluated using the Shewhart control chart, with a fault threshold established for each model. Data points outside the fault threshold, coupled with a shift in the average, were treated as indicators of a fault in the WT. We presented two case studies using SCADA data from operational wind farms. In the first case study, we gained valuable insights into when the wind turbine would fail even without knowing what failure looks like. We validated our approach with the second case study, in which our algorithm predicted the faults in the WT before they occurred, as recorded in the failure logs of the wind farm.

VI. CONCLUSION
In this paper, a system for monitoring and detecting anomalies in the wind turbine gearbox and generator is developed using SCADA data, extreme gradient boosting (XGBoost) and long short-term memory (LSTM). Statistical process control (SPC) is used to evaluate the deviations between the predicted signals, which represent the healthy state of the system, and the recorded signals, resulting in fault detection. The proposed method was tested on two real case studies involving six different WTs to determine its effectiveness and applicability. It was observed that the LSTM algorithm outperformed XGBoost in building the generator model for five out of the six WTs, whereas XGBoost better modelled the gearbox. We demonstrated the usefulness of our algorithm for detecting faults in WTs that have no failure logs. The fault detection algorithm can help asset managers of newly installed wind farms to predict when a fault will occur and to plan early intervention to prevent catastrophic damage. This system can give WT maintenance crews and wind farm asset managers a more dynamic, data-driven maintenance strategy, which can save the considerable cost of catastrophic failures associated with the current static, time-based maintenance strategy. Future work will explore other SPC techniques to investigate the sensitivity level of the deviations. We also plan to use streaming data to detect faults and to use the deviation signatures from the control chart for fault diagnosis, inferring which specific parts (subcomponents) of the main components are about to fail. This will require working with domain experts to establish data requirements and define the normal behaviour of these subcomponents, since we aim to build a robust system that helps the growing wind energy sector operate optimally and cost-effectively.

FIGURE 1. Rystad Energy analysis showing the future trend of more WTs being installed in the offshore environment.

FIGURE 2. Block diagram of WT predictive maintenance fault detection algorithm.

FIGURE 3. Typical power curve for a wind turbine.

FIGURE 4. Single layer of an LSTM cell.

FIGURE 5. Temporal-logic framework of a single LSTM layer.

FIGURE 6. Wind turbine power curve: (a) Before data cleaning (b) after cleaning process (c) after the power curve filtering process.

FIGURE 7. Graphical display of the selected output variables of the WT generator during the training phase for the four wind turbines in the wind farm. (a) Generator bearing temperature for Turbine R80736 (b) Generator bearing temperature for Turbine R80721 (c) Generator bearing temperature for Turbine R80711 (d) Generator bearing temperature for Turbine R80790.

FIGURE 8. Graphical display of the selected output variables of the WT gearbox during the training phase for the four wind turbines in the wind farm. (a) Gearbox bearing temperature for Turbine R80736 (b) Gearbox bearing temperature for Turbine R80721 (c) Gearbox bearing temperature for Turbine R80711 (d) Gearbox bearing temperature for Turbine R80790.

FIGURE 9. WT R80736 Generator: (a) Temperature prediction result for generator bearing (b) Control chart for generator bearing temperature deviations.
(a). The control chart for this application is shown in Fig. 10(b), with a fault threshold of ±0.49 °C. The first set of

FIGURE 10. WT R80721 Generator: (a) Temperature prediction result for generator bearing (b) Control chart for generator bearing temperature deviations.

FIGURE 11. WT R80711 Generator: (a) Temperature prediction result for generator bearing (b) Control chart for generator bearing temperature deviations.
(b), with a fault threshold of ±1.46 °C. The WT was out of service for a considerable time, from March 12, 2013, to December 28, 2013. An erratic temperature deviation, almost twice the control limit, was seen on December 12, 2016, and there was an apparent out-of-control point on May 1, 2016.
(a). The control chart for this application is shown in Fig. 13(b), with a fault threshold of ±0.25 °C. Although some points are out of control at the beginning of the control chart, from March 10, 2013, through January 12, 2014, we do not have enough evidence to confirm whether this event

FIGURE 13. WT R80736 Gearbox: (a) Temperature prediction result for gearbox bearing (b) Control chart for gearbox bearing temperature deviations.
(b), with a fault threshold of ±0.39 °C. The first set of out-of-control points, from May 4, 2014, to June 8, 2014, culminated in a massive spike of about six times the fault threshold from November 30, 2014, to December 28, 2014, as seen in Fig. 14(b). This corresponds to the spike in the actual temperature in Fig. 14(a). Our algorithm therefore predicted the anomaly that occurred from November 30, 2014, to December 28, 2014, about five months ahead. Wind Turbine R80711: For WT R80711, the XGBoost model, with RMSE = 1.09 °C, MAE = 0.81 °C and MAPE = 1.3%, was chosen because it had the highest R-squared value;

FIGURE 14. WT R80721 Gearbox: (a) Temperature prediction result for gearbox bearing (b) Control chart for gearbox bearing temperature deviations.
[…] Table 7. The XGBoost model used the historical SCADA data obtained by sensors recording the gearbox bearing temperature in WT R80711 during normal operation (healthy state) to predict the temperature, as shown in Fig. 15(a). The control chart for this application is shown in Fig. 15(b), with a fault threshold of ±0.34 °C. Fig. 15(b) shows that, from March 10, 2013, the average of the deviations shifted, with an upward trend in the deviations that led to the massive spike seen on November 24, 2013. Assuming that planned maintenance is carried out annually in the wind farm, sometime between January and December of each year, our algorithm was able to predict the fault before the annual intervention. Wind Turbine R80790: For WT R80790, the XGBoost model, with RMSE = 1.09 °C, MAE = 0.85 °C and MAPE = 1.4%, was chosen because it had the highest R-squared value and the lowest RMSE, MAE and MAPE compared to LSTM and MLR, as seen in Table 7.
The XGBoost model used the historical SCADA data obtained by sensors recording the gearbox bearing temperature in WT R80790 during normal operation (healthy state) to predict the temperature, as shown in Fig. 16(a). The control chart for this application is shown in Fig. 16(b), with a fault threshold of ±0.33 °C. The WT was out of service for a considerable time, from March 12, 2013, to December 28, 2013. Although there has been a massive spike in the deviations from

FIGURE 15. WT R80711 Gearbox: (a) Temperature prediction result for gearbox bearing (b) Control chart for gearbox bearing temperature deviations.

FIGURE 16. WT R80790 Gearbox: (a) Temperature prediction result for gearbox bearing (b) Control chart for gearbox bearing temperature deviations.

FIGURE 17. Graphical display of selected output variables during training phase (a) Gearbox bearing temperature for WT T06 (b) Generator bearing temperature for WT T07.
(b), with a fault threshold of ±0.74 °C. We predicted this fault at timestamp 2017-10-11 23:30:00, with a second alarm at timestamp 2017-10-12 04:30:00. The fault detection algorithm showed good predictive ability by alerting of the failure six days ahead.
b: GENERATOR MODEL FOR T07
For WT T07, the LSTM model, with RMSE = 4.8 °C, MAE = 3.71 °C and MAPE = 6.2%, was chosen because it had the highest R-squared value and the lowest RMSE, MAE and MAPE compared to XGBoost and MLR, as seen in Table 9.

FIGURE 19. WT T07 Generator: (a) Temperature prediction result for bearing (b) Control chart for bearing temperature deviations.

TABLE 1. Input and output variables used for modelling different components.

TABLE 6. Model accuracy for generator bearing temperature prediction.
°C and MAPE = 0.8%, was chosen because it had the highest R-squared value and the lowest RMSE, MAE and MAPE compared to LSTM and MLR, as seen in Table 8. The XGBoost model used the historical SCADA data obtained by sensors recording the gearbox bearing temperature in WT T06 during normal operation (healthy state) to

TABLE 8. Model accuracy for gearbox bearing temperature prediction.

TABLE 9. Model accuracy for generator bearing temperature prediction.