Long-Term Wind Power Forecasting Using Tree-Based Learning Algorithms

The intermittent and uncertain nature of wind places a premium on accurate wind power forecasting for the reliable and efficient operation of power grids with large-scale wind power penetration. Herein, six-month-ahead wind power forecasting models were developed using tree-based learning algorithms. Three models were developed to investigate the impact of input data on forecasting accuracy. The first model was trained with the average and standard deviation of wind speed values measured at a height of 40 m with a 10-min sampling time. To evaluate the impact of sampling time on model performance, a second model was trained with wind speed values measured at a height of 40 m with 1-h, 12-h, and 24-h sampling times. To assess the effect of measuring height on model accuracy, the third model was trained with wind speed values measured at 40 m extrapolated from values measured at heights of 30 m and 10 m. Experiments revealed that using longer time intervals and height extrapolation leads to considerable accuracy degradation in forecasted models. Finally, to study the generalization ability of the forecasted models, they were tested against wind data measured at heights and locations different from what the models had been trained with. Simulation results substantiated that tree-based learning algorithms can be successfully adopted not only for long-term wind power forecasting, but for potential wind power forecasting at different heights and geographical locations.


I. INTRODUCTION
The dramatic growth of the human population accompanied by the advent and advance of modern harvesting technologies has led to more intense exploitation of fossil fuel resources. This has resulted in gradual fossil fuel depletion and increased pollution density. These issues, besides the poor energy efficiency of conventional power systems, have motivated a new trend of generating power using renewable energies [1]. Wind energy is one of the most promising renewable energies experiencing an unprecedented proliferation in modern power grids worldwide [2]. For all its advantages, wind energy is contingent upon the highly variable nature of wind, both geographically and temporally [3]. Hence, it carries a great deal of uncertainty, which makes wind power integration into the power grid a complicated task [4], [5]. Employing reserve power plants is a practical but expensive solution to compensate for the fluctuations of wind power [6]. Wind power forecasting (WPF) is a viable alternative solution for greater reliability and, therefore, greater penetration of wind power into the power grid [7]-[11].
The associate editor coordinating the review of this manuscript and approving it for publication was B. Chitti Babu.
The WPF methods can be classified into either deterministic (also called point forecasting), which provide one output for a specific time horizon, or probabilistic (also called interval forecasting), which provide a range of possible values at a given time [12]. Comprehensive reviews of state-of-the-art probabilistic WPF methods are presented in [13], [14]. The deterministic WPF can be categorized based on input data, forecast output, time-scale, and the forecasting method [15]. Generally, input data are historical weather records such as wind speed and direction, pressure, temperature, humidity, and radiation measured at various time intervals and at different heights. In cases where the required data at a wind turbine height are not available, due to the lower height of anemometer towers of meteorological stations, wind data measured at lower heights can be extrapolated to the turbine height [16]. The WPF can be realized either directly or indirectly [17]. The former is to directly forecast wind power based on the data collected from SCADA systems, while the latter is to forecast wind speed first and then convert it into a wind power forecast using power curves.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Although there is no standard definition for time-scale among scholars, a generally accepted definition can be very-short-term forecasting (few seconds to 30 min), used for wind turbine regulation and control strategies, electricity market clearing, and real-time grid operation; short-term forecasting (30 min to 6 hours), used for economic load dispatch planning, operational security in the electric market, and load decisions for increments; medium-term forecasting (6 hours to 1 day), used for decision-making of unit commitment, reserved requirement, and generator operation; and long-term forecasting (more than 1 day ahead), used to schedule maintenance and determine the long-term feasibility of the wind farm [18].
Deterministic WPF methods can be classified into five categories of persistence, physical, statistical, machine learning, and hybrid methods [19]. Persistence methods are simple yet accurate methods utilized as a benchmark among wind forecasting methods [20]. Physical methods are generally based on numerical weather prediction models and ordinarily perform well for long-term time horizons [21]. Statistical methods employ historical data to predict wind behavior and usually perform well for short-term time horizons [22]. Machine learning is a subdivision of statistical methods that can learn patterns from data and forecast accordingly [23]. Finally, hybrid methods attempt to provide better predictions by combining various forecasting methods using weighting, preprocessing or decomposition, feature selection or optimization, and postprocessing techniques [24], [25]. A review of hybrid wind forecasting is performed by Xiao et al. [26].
Over the past years, several studies have been conducted on the WPF using different forecasting methods and on various horizons. A comprehensive review of existing research and current developments in this area can be found in [18]. In recent years, machine learning methods have been extensively adopted (approximately 38% of the literature) to generate more accurate forecasts, of which 48% is dedicated to very-short-term, 36% to short-term, 3% to medium-term, and 13% to long-term forecasting [18]. Rodriguez et al. [27] addressed very-short-term WPF based on artificial neural networks and the records of wind power in the last 24 h. They investigated the correlation between weather variables and wind power in order to appropriately choose input variables. Jiajun et al. [28] proposed ultra-short-term wind prediction with wavelet transform, deep belief network and ensemble learning. They designed several case studies in order to explore the promotion of high dimensional feature extraction. Li et al. [29] presented short-term WPF based on the support vector machine with an improved dragonfly algorithm. The adaptive learning factor and differential evolution strategy were presented to enhance the performance of the traditional dragonfly algorithm and, thus, choose the optimal parameters of the support vector machine. Sun et al. [30] introduced short-term WPF by a synthetical similar time series data mining method. The clustering similar measure function was supposed to identify the similar wind speed days that are close in space distance and have a similar variance trend synthetically. Han et al. [31] proposed short-term WPF using an improved long short-term memory network, where the variational mode decomposition technique was utilized to decompose the wind power signal. Abhinav et al. [32] presented short-term WPF based on a wavelet-based neural network, which was applicable to all seasons of the year.
Chen and Liu [33] investigated medium-term WPF based on a multi-resolution multilearner ensemble and adaptive model selection. The method employed heterogeneous-based learners and datasets with different resolutions to guarantee diversity.
Results of another review revealed that most of the literature is dedicated to very-short-term and short-term WPF (≈87.22%) and there are limited works on long-term WPF (≈8.36%) [34]. Kanna and Singh [35] studied long-term WPF using an adaptive wavelet neural network. Lledo et al. [36] explicated seasonal forecasting of wind power generation using manufacturer-provided power curves. Alencar et al. [37] explored different models for WPF using artificial neural network models, autoregressive integrated moving average, and hybrid models, including forecasting using wavelets. Demolli et al. [34] investigated the application of machine learning algorithms for indirectly forecasting a year-ahead wind power based on the daily mean wind speed and standard deviation measured at the height of 10 m extrapolated to the height of 50 m. Barbounis et al. [38] proposed 72-h ahead indirect WPF using meteorological data of four nearby locations and three types of local recurrent neural network models. Kusiak et al. [39] discussed models for 84-h ahead direct prediction of wind farm power using five different data mining algorithms and 10-min average data measured at the height of 60 m. Khan et al. [40] presented a new hybrid approach for a year-ahead direct WPF of hourly spaced wind turbine data using deep learning with the TensorFlow framework and principal component analysis. Dumitru and Gligor [41] introduced long-term indirect WPF using feedforward artificial neural networks. Yan and Ouyang [8] developed a two-phase hybrid approach for three-month-ahead indirect WPF based on physical and data mining methods and data with a 15-min sampling time. Khosravi et al. [42] explored a multilayer feed-forward neural network, support vector regression, and adaptive neuro-fuzzy inference system for 24-h ahead indirect WPF using data measured at 5-min, 10-min, 30-min, and 1-h intervals. Han et al. [43] proposed multi-step direct WPF based on a variational mode decomposition-long short-term memory.
Finally, Hong and Rioflorido [44] introduced a hybrid deep learning neural network for 24-h direct WPF based on an hourly spaced wind power dataset.
It can be concluded from the literature that there exist limited works investigating long-term WPF using machine learning algorithms. Besides, the existing works neglect to consider the impact of input data on the performance of WPF. Overfitting, underfitting, and generalization are significant issues in machine learning that should be considered in the algorithm selection procedure. However, most of the published works split the dataset only based on time series, leading to totally dissimilar training and test sets with different characteristics, thus exposing the developed models to overfitting. To deal with these problems, this article investigated machine learning-based six-month-ahead WPF based on wind records. Tree-based algorithms, including decision tree, bagging, random forest, boosting (AdaBoost), gradient boosting, and XGBoost, were selected to train accurate WPF models with generalization abilities. This selection was motivated by the fact that they benefit from various regularization (such as pruning) and ensembling (such as bagging and boosting) techniques, which empower them to better deal with overfitting and underfitting. The K-fold cross-validator is also adopted to split the initial dataset into several train-test subsets.
Clearly, the quantity and quality of the dataset have a profound impact on the performance of the WPF model. Indeed, the dataset should be a representative sample of the region's wind speed characteristics. Hence, the developed models were trained with the mean and standard deviation of wind speed and power values. The results corroborated that tree-based learning algorithms can be effectively used for long-term WPF. Experiments revealed that the height and time interval of wind speed records have profound impacts on WPF accuracy. Therefore, the forecasting accuracy of the proposed models was investigated using observations measured at various heights and time intervals. Besides, the possibility and effectiveness of applying the proposed models to a location different from the model-trained location were explored here. Experimental results demonstrated the generalization ability of the developed WPF models to predict the potential wind power at different heights and geographical locations even before wind turbines are installed.
The rest of this article is organized as follows: Section II reviews the background of the adopted algorithms and the performance indices. Data analysis, illustrative case studies, and simulation results are presented in Section III. Finally, conclusions are drawn in Section IV.

II. PRELIMINARIES

A. MACHINE LEARNING
Machine learning is a branch of artificial intelligence that can learn from datasets, identify patterns and forecast outcomes or behavior. Generally, learning models can be classified into four categories: supervised, unsupervised, semi-supervised and reinforcement learning. Supervised learning is the most common category of machine learning that can be further classified into classification and regression.
Tree-based methods are deemed to be among the most powerful supervised learning subsets used for classification and regression [45]. These algorithms employ predictive models with rapid performance, high accuracy, and easy interpretation. In contrast with linear models, these methods capture nonlinear relationships well and are capable of adapting to various kinds of problems in the machine learning area. Tree-based models include decision tree, bagging, random forest, boosting, gradient boosting and XGBoost, which are elaborated upon as follows.

1) DECISION TREE
A decision tree is a tree-based algorithm that can handle multi-output problems with little data preparation. It resembles a map of the probable consequences of a series of related choices. The goal is to build a model that predicts a target value by learning simple decision rules inferred from the data features. The method usually starts with a single node, which branches into probable outcomes. Each of those outcome nodes then leads to additional nodes, which are connected to other possibilities. The process thus takes the shape of a tree whose leaves give the final result.
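As an illustration of the rule-learning idea described above, the following is a minimal sketch of a one-feature regression tree in plain Python. It is not the implementation used in this work: the function names (`best_split`, `build_tree`, `predict_tree`), the squared-error split criterion, and the depth limit are assumptions chosen for brevity.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return the threshold on x that minimizes the total squared error."""
    best_threshold, best_error = None, float("inf")
    # Skipping the smallest unique value keeps both sides non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold


def build_tree(xs, ys, depth=0, max_depth=2):
    """Recursively grow a regression tree; leaves store the mean target."""
    threshold = best_split(xs, ys) if depth < max_depth else None
    if threshold is None:
        return {"value": mean(ys)}
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return {
        "threshold": threshold,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict_tree(tree, x):
    """Follow the decision rules from the root down to a leaf."""
    while "value" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]
```

Limiting `max_depth` is a crude stand-in for the pruning-style regularization mentioned in the introduction: a shallower tree has fewer rules and is less prone to overfitting.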

2) BAGGING
Bagging (bootstrap aggregating) is an ensemble estimator that trains each base learner on a random subset of the main dataset. The samples are picked with replacement, so every sample in the dataset has an equal chance of being selected in each draw and may appear more than once in a subset. The individual predictions are then combined through a majority voting mechanism for classification or through averaging for regression. Because each learner sees a different resample, bagging also provides a better understanding of the bias and variance of the model and of the mean and standard deviation related to the main dataset.
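The resample-then-aggregate procedure can be sketched as follows. This is a minimal, standard-library illustration for regression and not the configuration used in this work: the one-split "stump" base learner, the function names, `n_estimators=25`, and the fixed seed are all assumptions.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump; return it as a prediction function."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((y - left_mean) ** 2 for y in left) + sum(
            (y - right_mean) ** 2 for y in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: predict the mean target
        constant = mean(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Train each base learner on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Regression aggregate: average the individual predictions."""
    return mean(model(x) for model in models)
```

For a classification task the last function would take the most frequent label instead of the mean.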

3) RANDOM FOREST
Random forest is a flexible machine learning method made up of a large number of decision trees. It is an ensemble learner for classification and regression that relies on two sources of randomness: randomly selecting the training data when building each tree, and employing only a random subset of all features for splitting each node in each simple decision tree. During training, each tree learns from a random sample of the data points.

4) BOOSTING
The boosting method converts weak learners into powerful learners. In this method, each new tree is fitted on a modified version of the main dataset. AdaBoost is a well-known boosting algorithm that begins by training a decision tree in which each sample is assigned an equal weight. Next, the weights of the samples are changed after examining the first tree, based on how well each sample was predicted. During the process, these weights are adapted based on the current prediction error, so subsequent models focus more on difficult items.

5) GRADIENT BOOSTING
The gradient boosting method is similar to AdaBoost in that it creates a predictor in the form of an ensemble of weak prediction models, normally decision trees. In contrast to AdaBoost, gradient boosting fits each new predictor to the residual errors of the prior learner, using gradient descent to identify where the previous predictions fall short. Overall, each new model builds on the current ensemble as its base model to decrease the errors over successive iterations.
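The residual-fitting loop can be sketched as below for the squared-error loss, whose negative gradient is simply the residual. This is an illustrative sketch only: the stump base learner, the function names, `n_rounds=50`, and `learning_rate=0.1` are assumptions and not the settings used in this work.

```python
from statistics import mean


def fit_stump(xs, targets):
    """Fit a one-split regression stump; return it as a prediction function."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((t - left_mean) ** 2 for t in left) + sum(
            (t - right_mean) ** 2 for t in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # no valid split: predict the mean target everywhere
        constant = mean(targets)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient equals the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict_gradient_boosting(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump(x) for stump in stumps)
```

The learning rate shrinks each correction, trading more boosting rounds for a smoother fit.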

6) XGBoost
XGBoost is an ensemble tree method like gradient boosting and is designed for better speed and performance. The method applies the principle of boosting for weak learners. The power of this method lies in its fast learning through parallel and distributed computing and its efficient memory usage. Built-in cross-validation ability, efficient handling of missing data, regularization for avoiding overfitting, cache awareness, tree pruning and parallelized tree building are common advantages of the XGBoost method.

B. PERFORMANCE INDICES
The effectiveness of WPF methods can be assessed using various performance indices measuring accuracy, stability or direction. Nevertheless, most of the studies have evaluated forecast results only in terms of the following accuracy metrics.

1) MEAN ABSOLUTE ERROR (MAE)
MAE measures the mean of the absolute difference between the observation and prediction, without considering the direction of the errors, whose formula can be expressed as

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \tag{1}$$

where $N$ is the number of samples, $y_i$ is the observation and $\hat{y}_i$ is the prediction.

2) ROOT MEAN SQUARE ERROR (RMSE)
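RMSE measures the square root of the mean of the squared difference between the observation and prediction, whose formula can be expressed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \tag{2}$$

It shows how widely spread the predictions are from observations. Due to the large amplitude of the wind power, the normalized RMSE (nRMSE) can also be used, whose formula can be expressed as

$$\mathrm{nRMSE} = \frac{1}{P_{inst}}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \tag{3}$$

where $P_{inst}$ denotes the installed wind farm capacity.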

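3) COEFFICIENT OF DETERMINATION ($R^2$)
$R^2$ is a statistical measure of fit, which shows how closely predictions match observations. It is obtained by subtracting from one the ratio of the residual sum of squares to the sum of squares of the total deviations, whose formula can be expressed as

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \tag{4}$$

where $\bar{y}$ denotes the mean of observations. For a model that predicts at least as well as the mean of the observations, it has a value within [0, 1], where the closer the value to 1, the higher the prediction accuracy.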
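The four indices defined in this subsection can be computed with a few lines of plain Python. The sketch below is illustrative: the function names are hypothetical, and nRMSE is taken here as RMSE divided by the installed capacity, which is an assumption about the exact normalization.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    )


def nrmse(y_true, y_pred, installed_capacity):
    """RMSE normalized by the installed capacity (same unit as y)."""
    return rmse(y_true, y_pred) / installed_capacity


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

Note that `r2_score` becomes negative when the residual sum of squares exceeds the total sum of squares, that is, when a model performs worse than always predicting the mean.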

III. SIMULATION RESULTS
Tree-based learning algorithms were used for forecasting six-month-ahead wind power based on the mean wind speed and standard deviation. In the following, data analysis is presented to analyze datasets for gaining insights that are advantageous in making strategic decisions on WPF. Next, illustrative case studies and simulation results are presented and investigated in detail.

A. DATA ANALYSIS
Over the last decade, wind power generation in Iran has experienced a remarkable increase from 203 MW in 2010 to 302 MW in 2019. Analysis of the wind energy potential in 26 stations, with a 33% energy efficiency, shows that 6500 MW of electrical power can be produced [42]. The northeast region of Iran is a potential area for wind power generation with constant wind flow without extreme gusts. Hence, two sites in this region were considered, as displayed in Fig. 1: Ghadamgah (36.104° north latitude and 59.066° east longitude) and Khaf (34.567° north latitude and 60.148° east longitude). These regions are recognized for possessing a notable wind potential capacity accompanied by good infrastructural conditions and electrical grid connection, so they were selected for wind farm settlement.
Here, the dataset included 18 months of measured wind speed values with a 10-min sampling time as the inputs of the WPF model. A total of 236556 and 238002 observations measured at 40 m, 30 m, and 10 m heights were collected from Khaf and Ghadamgah wind farms, respectively. Figure 2 demonstrates the variation of the wind speed for selected wind farms at various heights, whose statistical information is shown in Table 1. Because of the intermittent nature of wind, it exhibits various fluctuations and spikes in different areas, heights, and lengths of time.
Probability density functions, such as Rayleigh or Weibull, are simple yet practical methods frequently adopted to describe the behavior of wind speed data in a period [46]. Figure 2c illustrates Weibull distributions for the selected locations at various measuring heights. Evidently, the peak of the curve for Ghadamgah is 0.128, whereas this value is 0.077 for Khaf. It means that the frequency of observing average levels of speed in Ghadamgah is higher than in Khaf, whereas Khaf has higher wind speed values than Ghadamgah.
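For reference, the two-parameter Weibull density mentioned above is $f(v) = (k/c)(v/c)^{k-1}\exp[-(v/c)^k]$ for $v \ge 0$, with shape $k$ and scale $c$; the Rayleigh distribution is the special case $k = 2$. A minimal sketch follows. The function name is hypothetical, and any parameter values passed to it are placeholders rather than values fitted to the Ghadamgah or Khaf data.

```python
import math


def weibull_pdf(v, shape, scale):
    """Two-parameter Weibull probability density of wind speed v (m/s).

    Assumes shape >= 1 and scale > 0; shape == 2 gives the Rayleigh case.
    """
    if shape < 1 or scale <= 0:
        raise ValueError("expected shape >= 1 and scale > 0")
    if v < 0:
        return 0.0
    ratio = v / scale
    return (shape / scale) * ratio ** (shape - 1) * math.exp(-(ratio ** shape))
```

A lower, wider curve (larger scale) corresponds to a site where high wind speeds are observed more often, which is consistent with the comparison between the two sites above.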
The power curve provides a suitable way to describe the correlation between the generated power and wind speed. The power curve and technical specifications of 1 MW wind turbines installed in the Khaf wind farm are given in Fig. 3 and Table 2, respectively. This generally S-shaped curve contains three key speeds: cut-in speed ($v_{ci}$), which refers to the minimum speed at which the turbine delivers useful power; rated speed ($v_r$), which is the wind speed at which the generator produces maximum output power ($P_r$); and cut-out speed ($v_{co}$), which denotes the maximum wind speed at which the turbine is allowed to produce power. The power curve can be modeled using

$$P_i(v) = \begin{cases} 0, & v < v_{ci} \\ \alpha_n v^n + \alpha_{n-1} v^{n-1} + \dots + \alpha_0, & v_{ci} \le v < v_r \\ P_r, & v_r \le v \le v_{co} \\ 0, & v > v_{co} \end{cases} \tag{5}$$

where $\alpha_0, \dots, \alpha_{n-1}$ and $\alpha_n$ are constants, and $P_i(v)$ is the corresponding output power. Accordingly, 1-h, 12-h, and 24-h wind power values can be computed by taking the average of the output power values over the interval, where $N$ denotes the number of samples in the considered time interval (1-h, 12-h, and 24-h), while $t$ is the sampling time (10-min) [34]. For instance, Fig. 3 represents the obtained wind power values for the 1-h interval.
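The indirect conversion from wind speed to power can be sketched as a piecewise function followed by averaging over an interval. Everything numeric below is a placeholder: the cut-in, rated, and cut-out speeds and the cubic coefficients are hypothetical and are not the values of Table 2 or the manufacturer's curve; only the 1 MW (1000 kW) rating is taken from the text.

```python
# Hypothetical turbine parameters, for illustration only.
V_CUT_IN = 3.0      # m/s
V_RATED = 12.0      # m/s
V_CUT_OUT = 25.0    # m/s
P_RATED = 1000.0    # kW (1 MW turbine)
# Polynomial coefficients [a0, a1, ..., an]; a pure cubic that reaches
# P_RATED exactly at V_RATED is assumed here.
COEFFS = [0.0, 0.0, 0.0, P_RATED / V_RATED ** 3]


def turbine_power(v, v_ci=V_CUT_IN, v_r=V_RATED, v_co=V_CUT_OUT,
                  p_rated=P_RATED, coeffs=COEFFS):
    """Output power (kW) at wind speed v (m/s) from a piecewise power curve."""
    if v < v_ci or v > v_co:
        return 0.0          # below cut-in or above cut-out: no production
    if v >= v_r:
        return p_rated      # between rated and cut-out: rated power
    # Between cut-in and rated: polynomial in v.
    return sum(a * v ** i for i, a in enumerate(coeffs))


def average_power(speeds):
    """Average output power over the samples of one time interval."""
    return sum(turbine_power(v) for v in speeds) / len(speeds)
```

With 10-min samples, an hourly value would be obtained by calling `average_power` on six consecutive speeds.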

B. METHODOLOGY
The proposed methodology is illustrated in Fig. 4, using the parameter settings listed in Table 3, which were obtained by trial and error. First, data preprocessing was performed to clean the data and derive useful information. Next, the corresponding wind power was computed based on the wind power curve. Then, actual wind speed values with 10-min time intervals were converted into hourly mean wind speed values and standard deviations, and hourly total wind power was calculated by summing the 10-min sampled power values. Thereafter, the dataset was split into training and test sets using the K-fold cross-validator and based on the specified forecasting horizon. The training set was utilized to train the candidate algorithm and fit the model, and the test set was used to generate wind power values based on the WPF model. Finally, performance assessment was performed in terms of several well-known metrics.
Multiple case studies were designed to analyze the effectiveness of machine learning algorithms in long-term WPF and validate their generalization ability.
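Two of the preprocessing steps described in this subsection, the conversion of 10-min samples into per-interval mean and standard deviation and the K-fold splitting, can be sketched as follows. The function names are hypothetical, and several details are assumptions not stated in the text: the population standard deviation is used, incomplete trailing windows are dropped, and the folds are contiguous and unshuffled.

```python
from statistics import mean, pstdev


def aggregate_wind_speed(samples, window=6):
    """Turn 10-min samples into (mean, std) features per full window.

    window=6 gives hourly features; 72 and 144 give 12-h and 24-h features.
    """
    features = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        features.append((mean(chunk), pstdev(chunk)))
    return features


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

Each sample ends up in the test set of exactly one fold, so every part of the record contributes to both training and evaluation across the folds.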

C. CASE 1
The first case was assigned to six-month-ahead WPF given 12 months of hourly mean wind speed values and standard deviation and their corresponding hourly total generated wind power values. The Ghadamgah dataset measured at a height of 40 m with a 10-min time interval was converted into another dataset with an hourly time interval. Thereafter, in order to evaluate the effect of the time interval on the accuracy of the forecasted model, another two datasets with 12- and 24-hour time intervals were created based on the original dataset. Correspondingly, 1-h, 12-h, and 24-h total wind power values were computed. Therefore, three models were fitted using 70% of the dataset for six-month-ahead WPF. Table 4 presents the calculated metrics for forecasted models. Evidently, all algorithms exhibited a powerful performance with a high coefficient of determination and small errors. However, XGBoost provided a more accurate model, while decision tree produced a less accurate model in terms of different performance indices. Moreover, performance indices revealed that longer time intervals lead to performance degradation due to the loss of representativity of the dataset. Nevertheless, all the forecasted models provided an acceptable performance against variations in the time interval, among which bagging and random forest showed better robustness, evident in the lower variations of the corresponding metrics.

D. CASE 2
Although wind data at higher heights close to wind turbines' height are required to accurately forecast wind power, the height of anemometer towers of meteorological stations is usually 10 m. A conventional solution to this problem is extrapolating wind speed values measured at lower heights to desired heights by applying [34]

$$v = v_0\left(\frac{h}{h_0}\right)^{\alpha} \tag{7}$$

where $v_0$ is the wind speed observed at the anemometer height ($h_0$), $v$ is the wind speed at the desired height ($h$), and $\alpha$ is the power-law constant that depends on the surface roughness [16]. The second case aims to investigate the impact of extrapolation on forecasting models. In this regard, the Ghadamgah datasets, measured at heights of 10 m and 30 m with a 10-min time interval, were extrapolated to a height of 40 m by using (7) and $\alpha = 0.14$. The simulation results for various algorithms are listed in Table 5, which substantiate that the use of extrapolation causes performance degradation in the forecasted models. Nonetheless, the algorithms properly forecasted six-month-ahead wind power, among which XGBoost generated a model with higher accuracy and better robustness against data inaccuracies caused by extrapolation, whereas the decision tree produced the weakest performance. Figure 5 demonstrates the WPF accuracy of machine learning algorithms using the Ghadamgah hourly dataset measured at the height of 40 m as the best case. As expected, the extrapolation slightly deteriorated the accuracy of WPF models: the larger the difference between the measuring height and the desired height, the larger the forecasting error and the lower the accuracy. However, all the algorithms yielded acceptable models for an entire gamut of wind speed values. Figure 6 displays hourly actual and forecasted power values for the six-month-ahead horizon, verifying that the algorithms can deal with the uncertain nature of inputs with high accuracy and without overfitting.
Again, XGBoost yielded a slightly superior model with lower MAE and RMSE and higher $R^2$, while the decision tree produced a slightly inferior model with higher MAE and RMSE and lower $R^2$.
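The power-law height extrapolation used in this case reduces to a one-line function. The sketch below uses the exponent of 0.14 stated in the text as the default; the function name and the input validation are additions for illustration.

```python
def extrapolate_wind_speed(v0, h0, h, alpha=0.14):
    """Extrapolate wind speed v0 measured at height h0 to height h.

    Uses the power law v = v0 * (h / h0) ** alpha, where alpha depends on
    the surface roughness (0.14 is the value used in this case study).
    """
    if h0 <= 0 or h <= 0:
        raise ValueError("heights must be positive")
    return v0 * (h / h0) ** alpha
```

For the same measured speed, extrapolating from 10 m to 40 m applies a larger correction factor than extrapolating from 30 m to 40 m, which is one way to see why the 10 m dataset is more sensitive to an imperfect choice of the exponent.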

E. CASE 3
A generalizable model is one that is neither underfit nor overfit and that makes sensible predictions on unseen validation datasets. Besides, knowing wind power at different heights is a prerequisite for having a more efficient wind farm by establishing wind turbines at various elevations.
Hence, this case explored the ability of the proposed models to forecast wind power at heights of 30 m and 10 m. Simulation results for various algorithms are summarized in Table 6, where the developed models manifest the generalization ability through predicting wind power at heights different from the model-trained height. For example, the model generated [...] All the selected algorithms yielded similar results for this case in terms of reported performance indices. This is due to their inherent features such as regularization and ensembling that enable them to successfully tackle variable and uncertain wind data with neither overfitting nor underfitting. Nevertheless, as expected, the larger the difference between the training height and validation height, the larger the forecasting error and the lower the accuracy. This emphasizes the importance of data quality and its impact on the performance of developed models.

F. CASE 4
This case was designed to appraise the proposed models' predictions based on a new and previously unseen dataset. To do this, hourly wind measurements at a 40-m height in Khaf wind farm were applied to the trained models as test datasets. Figure 2 illustrates the variations of the new dataset besides its probability density function. Khaf and Ghadamgah, which belong to different geographical regions, possess different wind profiles, and Khaf exhibits higher wind speeds with more fluctuations. Performance indices and experimental graphs are compared in Table 7 and Figs. 7 and 8, respectively. Overall, the results clearly show the reliability of previously trained models to forecast wind power in Khaf. Based on Table 7, performance indices carry a slight deviation from previous metrics, while according to Figure 8, predictions represent observations very well. Figure 7 indicates that all the considered methods can predict the generated power with an MAE of 5%-6%, and XGBoost has the best performance. This case revealed the intrinsic power of machine learning-based WPF models in making accurate predictions based on unseen datasets, which can be effectively adopted before the establishment of wind farms in another geographical region.

IV. CONCLUSION
This article proposed a comparative analysis for six-month-ahead WPF using tree-based learning algorithms. Overfitting and underfitting were considered in the algorithm selection and training procedure. Furthermore, datasets with sufficient samples were considered to bolster the prediction accuracy and generalization performance of the models. To increase the diversity of the training set and prevent overfitting, cross-validation was performed to split the initial dataset into several train-test subsets. Several models were trained with average and standard deviation values of the wind speed. The proposed models showed powerful performance in forecasting potential wind power in the Ghadamgah wind farm.
The intermittent and uncertain nature of wind poses serious problems for collecting sufficient representative datasets, which can degrade the prediction accuracy to a great extent and make the models prone to overfitting. Hence, multiple case studies were defined to investigate the impact of the training dataset on forecasting accuracy. The simulation results revealed that using longer time intervals and height extrapolation leads to considerable accuracy degradation in the forecasted models, where height extrapolation has the more adverse effect on the forecasting accuracy of developed models. Indeed, using the standard deviation of the wind speed in the training procedure preserves the representativity of the dataset and reduces the adverse effects of increasing time intervals. Despite the negative effects of increasing the time interval and decreasing the measuring height on the performance of WPF models, simulation results demonstrate the robustness of the proposed models against data uncertainties.
To reinforce machine learning in industrial WPF applications, the models were validated on an unseen dataset collected from different heights, which is useful in choosing various wind turbine heights to be installed in wind farms. Finally, the generalizability of the models was confirmed in dealing with unseen datasets collected from Khaf wind farm. Experiments revealed the reliability of the models in predicting the potential wind power at different locations and heights.

MD. JALIL PIRAN (Member, IEEE) received the Ph.D. degree in electronics and radio engineering from Kyung Hee University, South Korea, in 2016. Subsequently, he continued his work as a Postdoctoral Research Fellow, in the field of resource management and Quality of Experience in 5G cellular networks and the Internet of Things, with the Networking Laboratory, Kyung Hee University. He is currently an Assistant Professor with the Department of Computer Science and Engineering, Sejong University, Seoul, South Korea. He has published a substantial number of technical papers in well-known international journals and conferences in research fields of 5G and 6G wireless communications, the Internet of Things (IoT), multimedia communication, cognitive radio networks, applied machine learning, security, and smart grid.