Artificial Intelligence Based Hybrid Forecasting Approaches for Wind Power Generation: Progress, Challenges and Prospects

Globally, wind energy is growing rapidly and has received considerable attention as a means of fulfilling global energy requirements. Accurate wind power forecasting is crucial to achieve a stable and reliable operation of the power grid. However, the unpredictability and stochastic characteristics of wind power adversely affect grid planning and operation. To address these concerns, a substantial amount of research has been carried out to introduce an efficient wind power forecasting approach. Artificial Intelligence (AI) approaches have demonstrated high precision, better generalization performance and improved learning capability, and are thus well suited to handle unstable, inflexible and intermittent wind power. Recently, AI-based hybrid approaches have become popular due to their high precision, strong adaptability and improved performance. Thus, the goal of this review paper is to present the recent progress of AI-enabled hybrid approaches for wind power forecasting, emphasizing classification, structure, strengths, weaknesses and performance analysis. Moreover, this review explores the various influential factors in the implementation of AI-based hybrid wind power forecasting, including data preprocessing, feature selection, hyperparameter adjustment, training algorithms, activation functions and the evaluation process. In addition, various key issues, challenges and difficulties are discussed to identify the existing limitations and research gaps. Finally, the review delivers a few selected future proposals that would be valuable to industrialists and researchers in developing an advanced AI-based hybrid approach for accurate wind power forecasting toward sustainable grid operation.


I. INTRODUCTION
The demand for electricity generation has been rising significantly due to rapid urbanization and economic growth [1]. However, electricity production from fossil fuels accounts for a 63% share of global carbon emissions [2]. Given the shrinking fossil fuel reserves and the carbon emissions they cause, renewable energy (RE) could play a crucial role in mitigating global emissions as well as providing clean and sustainable energy. RE is abundant and has enormous potential capacity to fulfill the increasing electricity needs [3], [4]. Renewables made up a quarter of global electricity generation in 2020 [5]. The main reasons for the drastic increase in the power generating capacity of RE are cost competitiveness and performance optimization technologies [6]-[9]. Nevertheless, several limitations can hamper the development of RE-based power generation, such as poor infrastructure, limited technology, lack of financial incentives, high investment cost and weather unpredictability [10]-[12]. Therefore, it is important to carry out research and development on RE not only to meet the growing electricity demand but also to achieve sustainable development.
Among the various RE sources, wind energy is one of the major and growing sources of renewable energy [13], [14]. Wind energy accounts for 5% of global electricity generation with a capacity of 591 GW [15]. However, compared to the traditional power sources, wind power is extremely unstable, random, intermittent and inflexible due to the impact of meteorological and adjacent terrain environments [16], [17]. Various factors can influence wind power, such as wind speed, wind direction, temperature, humidity, atmospheric pressure and altitude [18]. These variables are also correlated with each other, which results in high wind power fluctuation and eventually makes it difficult to achieve satisfactory outcomes in wind power forecasting [19]. To enhance the reliability of the power supply system and address the intermittency of wind power, reserve capacity must be available to deliver a continuous power supply when the wind power is inadequate [20]. Nevertheless, the reserve capacity indirectly increases the overall expense of wind power; thus, it is necessary to develop an efficient forecasting method for wind power generation [21]. Accurate wind power forecasting helps to make an appropriate scheduling plan based on the variations in wind power, reduces the standby capacity of the power grid, decreases the operation cost of the power system, allows flexible dispatch strategies, improves the power quality and ensures the stable and reliable operation of the power grid [22]-[24].
Many efforts and approaches have been introduced to address the wind power prediction issues. Wind power forecasting can be categorized into physical-based methods, statistical-based methods and artificial intelligence (AI) based methods [25]. Among these methods, AI approaches can self-adapt and self-learn, and are thus suitable to handle the dynamic, non-linear and complex wind power features [26]. Moreover, AI approaches have illustrated improved learning capability, high precision and better generalization performance [27]. Various AI approaches have been reported in the literature for wind power forecasting, including backpropagation neural network (BPNN) [28], radial basis function neural network (RBFNN) [29], extreme learning machine (ELM) [30], support vector machine (SVM) [31], Gaussian process regression (GPR) [32] and adaptive neuro-fuzzy inference system (ANFIS) [33]. Recently, deep learning methods have received wide attention due to their high computation intelligence and accuracy, which comprise long short-term memory (LSTM) [34], convolutional neural network [35] and deep belief network (DBN) [36]. Moreover, various optimization algorithms are employed to determine the suitable parameters of AI approaches, including genetic algorithm (GA) [37], backtracking search algorithm (BSA) [38], coral reefs optimization (CRO) [39], particle swarm optimization (PSO) [40] and fruit fly optimization algorithm (FOA) [41]. However, single AI approaches may provide unsatisfactory prediction outcomes due to the inappropriate selection of features, hyperparameters and functions. Thus, the hybridization of AI methods has become increasingly popular not only in improving the complexity of the algorithm but also in enhancing the forecasting of wind power generation [42].
VOLUME 9, 2021
Generally, the hybrid AI models are designed by combining two or three machine learning techniques [43] or integrating optimization algorithms with AI approaches [44]. Hybrid AI approaches have overcome the shortcomings of single AI approaches by finding the optimal features, hyperparameters and training algorithms [45].
So far, a few review papers related to wind power forecasting have been reported. Wang et al. [46] overviewed wind power forecasting based on short-term and long-term approaches. However, the AI-based hybrid approaches for wind power prediction were not examined in detail.
Hanifi et al. [47] conducted a survey focusing on the physical, statistical and hybrid methods for wind power forecasting. Nevertheless, the authors explored only a few machine learning approaches to forecast wind power. Besides, the key issues and challenges were not explored explicitly. Chang [48] delivered a classification of wind power forecasting based on different horizons. Nonetheless, the hybrid AI approaches, their implementations and limitations were not discussed in detail. Dhiman and Deb [49] reviewed the wind speed and wind power forecasting techniques. Nonetheless, the deep learning approaches and their executions were not covered. Maldonado-Correa et al. [50] presented a systematic literature review of wind power forecasting with various baseline models and AI tools. However, the hybrid AI forecasting approaches were not explored thoroughly. Mao and Shaoshuai [51] delivered the classification of wind forecasting methods according to timescale, prediction models and output data. Nevertheless, the hybrid AI approaches were not studied comprehensively. Besides, the implementation factors and key issues of hybrid AI approaches were not outlined.
To bridge the existing research gaps, this review presents an in-depth investigation of wind power forecasting using AI-based hybrid approaches. Besides, the paper showcases the various combinations of hybrid AI approaches, implementation factors, issues, limitations and suggestions for wind power prediction. The main contributions of this review are summarized as follows:
• The various hybrid AI-based wind power forecasting approaches are explored in detail. Besides, the classification of AI-based hybrid approaches regarding their structure, executions, benefits and shortcomings is delivered.
• The influential factors in forecasting the wind power based on the hybrid AI methods are discussed including data preparation, feature selection, algorithm functions, hyperparameters adjustment and evaluation process.
• The key issues and limitations of AI-based hybrid methods for wind power forecasting are outlined including data diversity, implementation issues, optimization integration and hybridization issues.
• The selective future prospects for the development of advanced hybrid AI approaches for wind power forecasting are provided.
The rest of the paper is divided into five sections. Section 2 presents the methods to conduct the survey. Section 3 explains the detailed survey of hybrid AI methods for wind power forecasting. Section 4 covers the various implementation stages of AI-based hybrid wind power forecasting. The issues and challenges of hybrid AI methods are explored in Section 5. The concluding remarks with prospects are highlighted in Section 6.

II. SURVEY METHODS
The target of this review is to gather the latest information, conduct analysis and provide a critical discussion of AI-based hybrid approaches for wind power forecasting. Accordingly, the authors have collected numerous key publications related to wind power prediction using databases such as Scopus and Web of Science. Various platforms are used to search for suitable studies, including Google Scholar, IEEE Xplore, ScienceDirect and ResearchGate. The authors have used the following keywords to explore the relevant papers within the scope and target: wind power forecasting, artificial intelligence, machine learning, deep learning, optimization and hybrid approaches. The search returned several papers; the suitable studies are chosen based on title, novelty, abstract, outlines and contributions. Finally, the authors used journal quartiles, citations, novelty and impact factor to carry out the final selection of articles. The results of the survey methods were divided into four groups. Firstly, the AI-based hybrid wind power forecasting was comprehensively reviewed. Secondly, implementation and influential factors were described. Thirdly, several issues and challenges of hybrid AI approaches were identified. Finally, the conclusion along with selective prospects for further enhancement of AI-based hybrid approaches toward sustainable wind power generation is provided. The reviewing methodology is arranged into two stages as depicted in Fig. 1. The summary of results is outlined below.

A. SELECTION PROCEDURES
• In the first screening, a total of 557 papers were found using various platforms such as Google Scholar, IEEE Xplore, ScienceDirect, MDPI and ResearchGate.
• In the second assessment and screening, a total of 267 articles were analyzed using the proper keywords, title, abstract, outlines and contributions.
• In the third evaluation, a total of 140 references were cited for review based on journal quartiles, citations, novelty and impact factor.

B. RESULTS OF THE REVIEW
• AI-based hybrid wind power forecasting and their classification, structure, benefits, drawbacks, performance, error analysis, research gaps and future works were broadly reviewed.
• The implementation and influential factors including data preparation, feature selection, algorithm functions, hyperparameters adjustment, validation and verifications were explicitly discussed.
• The key issues and challenges of AI-based hybrid wind power forecasting approaches were explored.
• Selective future proposals and directions for the further improvement of AI-based hybrid forecasting approaches for wind power generation were provided.

III. PROGRESS OF ARTIFICIAL INTELLIGENCE BASED HYBRID APPROACHES FOR WIND POWER FORECASTING
AI approaches are becoming popular in renewable power systems such as solar, wind, ocean, geothermal and hydro power to enhance the system efficiency. This section provides a classification and explanation of different AI-based hybrid approaches for wind power forecasting including neural network, classification and regression, deep learning and rule-based algorithms, as shown in Fig. 2.

A. FEED-FORWARD NEURAL NETWORK-BASED HYBRID APPROACH
The feed-forward neural networks (FFNNs) have demonstrated high accuracy and robustness in predicting wind power. This section classifies various neural network approaches for wind power forecasting including feedforward neural network, radial basis function neural network, extreme learning machine and generalized regression neural network.
FIGURE 2. Classification of AI-based hybrid approaches for wind power forecasting.

1) BACK PROPAGATION NEURAL NETWORK-BASED HYBRID APPROACH
The feed-forward back propagation neural network (FFBPNN) uses artificial neurons, weights and biases to form a three-layer structure comprising an input layer, a hidden layer and an output layer, as depicted in Fig. 3. The hidden-layer activity depends on a suitable choice of activation functions, hidden neurons, hidden layers and network hyperparameters. The output layer determines the prediction from the activation function and the hidden-layer output [52]. The output-layer output, $net_k$, the hidden-layer output, $net_j$, and the sigmoid activation function are expressed in terms of the input vector $x_i$, the weights $w_{i,j}$ and $w_{j,k}$, and the biases $\theta_{i,j}$ and $\theta_{k,j}$; $O_j$ is the hidden-layer outcome. The bias and weight values are updated using the learning rate $\alpha$, the estimated output $e_k$ and the true output $T_k$. FFBPNN has the strength to address highly non-linear and complex wind power problems. Nevertheless, FFBPNN has some drawbacks, for instance, local minimum traps, overfitting, poor generalization performance and slow convergence speed.
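As a rough illustration of the forward pass and update rule described above, the following sketch trains a tiny three-layer network on one made-up sample. It is not the formulation from [52]: the linear output unit, the squared-error loss, the network size and all function names are assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(z):
    """Sigmoid activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_ij, theta_j, w_jk, theta_k):
    """Return hidden outputs O_j and the estimated output e_k (linear output unit, assumed)."""
    o = [sigmoid(sum(w_ij[i][j] * x[i] for i in range(len(x))) + theta_j[j])
         for j in range(len(theta_j))]
    e = sum(w_jk[j] * o[j] for j in range(len(o))) + theta_k
    return o, e


def train_step(x, t, w_ij, theta_j, w_jk, theta_k, alpha=0.1):
    """One gradient-descent update on the loss 0.5 * (T_k - e_k)^2.

    Lists are updated in place; the scalar output bias is returned.
    """
    o, e = forward(x, w_ij, theta_j, w_jk, theta_k)
    delta_k = t - e
    # Hidden-layer error terms, computed with the weights before they are updated.
    delta_j = [delta_k * w_jk[j] * o[j] * (1.0 - o[j]) for j in range(len(o))]
    for j in range(len(o)):
        w_jk[j] += alpha * delta_k * o[j]
        theta_j[j] += alpha * delta_j[j]
        for i in range(len(x)):
            w_ij[i][j] += alpha * delta_j[j] * x[i]
    theta_k += alpha * delta_k
    return theta_k, 0.5 * delta_k ** 2


random.seed(0)
n_in, n_hid = 3, 4
w_ij = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
theta_j = [0.0] * n_hid
w_jk = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
theta_k = 0.0
x, t = [0.2, 0.5, 0.9], 0.7  # hypothetical normalised inputs and target
losses = []
for _ in range(200):
    theta_k, loss = train_step(x, t, w_ij, theta_j, w_jk, theta_k)
    losses.append(loss)
```

In a hybrid scheme such as those reviewed below, an optimizer would typically supply the initial values of `w_ij`, `w_jk` and the biases instead of the random draw used here.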
Saroha and Aggarwal [53] proposed a genetic algorithm (GA) optimized FFBPNN technique for multi-step-ahead wind energy forecasting. Here, GA was employed to find suitable values of the weights and biases of FFBPNN. This strategy required previous values as inputs for the forecast, and accordingly the inputs were selected based on the autocorrelation function (ACF). The proposed model was trained by the Levenberg-Marquardt (LM) algorithm. The GA-based FFBPNN algorithm provided better outcomes than conventional FFBPNN with regard to mean absolute error (MAE) and mean absolute percentage error (MAPE). However, it is challenging to predict multiple steps ahead because the error grows with the prediction horizon. Moreover, it is crucial to select the appropriate time lag for data training. Hence, in-depth exploration is required to overcome such problems.
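The ACF-based input selection mentioned above can be sketched as follows. This is only one plausible reading, not the procedure of [53]: the sample autocorrelation estimator, the approximate 95% significance band `1.96 / sqrt(n)` and the function names are assumptions, and the series is synthetic.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


def select_lags(series, max_lag, threshold=None):
    """Keep the lags whose autocorrelation exceeds the threshold in magnitude."""
    if threshold is None:
        # Approximate 95% band for white noise (an assumed selection rule).
        threshold = 1.96 / math.sqrt(len(series))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) > threshold]


# Synthetic series with period 4, standing in for a wind record.
series = [1.0, 0.0, -1.0, 0.0] * 10
lags = select_lags(series, max_lag=4)
```

The selected lags would then define which past observations are fed to the network as inputs.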
Niu et al. [54] proposed FFBPNN integrated with singular spectrum analysis (SSA) for multi-step-ahead wind energy forecasting. The authors applied BAT optimization to determine the optimal feature selection. The validation was performed using different time horizons. The proposed hybrid model worked effectively under different experiments and the results indicated a root mean square error (RMSE) of 1.7332 m/s, 1.6552 m/s and 2.0424 m/s in the one-step, three-step and six-step horizons, respectively, under the 60-min wind speed series. Future research on multi-objective optimization algorithms and the development of chaotic time series can be employed to improve the accuracy of wind speed forecasting.
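The error measures quoted throughout this section (MAE, MAPE and RMSE) follow their usual definitions, sketched below for reference. The data are invented, and the MAPE helper assumes that no actual value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent (assumes non-zero actual values)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up wind power values, for illustration only.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = (mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Because MAPE divides by the actual value, it can become unstable when the measured wind power is close to zero, which is one reason studies usually report it alongside MAE or RMSE.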
Jiao et al. [55] proposed a hybrid model for wind energy forecasting based on the FFBPNN integrated with the stacked auto-encoders (SAE) technique. Initially, an SAE with three hidden layers was used to extract the features from the reference dataset. Next, the subsequent loss data was used in the pre-training cycle to obtain the connection weights of the FFBPNN. Finally, the BP strategy was used to calibrate the weights of the entire network. Moreover, particle swarm optimization (PSO) was employed to find suitable values of the number of hidden neurons and the learning rate of the proposed SAE-based FFBPNN technique, as denoted in Fig. 4. The MAPE of the proposed hybrid approach was calculated to be 15.96%, while those for SVM and BP were 27.88% and 47.33%, respectively. Even though the suggested model ensured improved prediction compared to existing techniques, deep learning can be considered in future research.
Zeng et al. [56] suggested a hybrid strategy incorporating differential evolution (DE) and the FFBPNN technique for wind power forecasting. The DE was used to determine appropriate values of the thresholds and connection weights of FFBPNN, leading to an improvement in the wind forecasting performance, as denoted in Fig. 5. The results indicated that the DE-based FFBPNN technique delivered better wind power forecasting results than the other mainstream strategies, reducing MAPE by 91.19% and 64.14% in comparison to multiple linear regression (MLR) and the FFBPNN technique, respectively. Although the proposed hybrid technique delivered satisfactory results, a comparative analysis can be carried out with other advanced optimization algorithms. Besides, further studies are required to explore the appropriate parameters and structure of FFBPNN. Also, more validations are necessary under real-world wind power data to reduce carbon emissions.
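To make the idea of an optimizer searching for model parameters concrete, here is a minimal differential evolution sketch in the common DE/rand/1/bin form. It is not the implementation of [56]: the toy linear model stands in for a network, and the population size, scale factor `f`, crossover rate `cr` and all names are assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimise objective over box bounds with a basic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)
                else:
                    v = pop[i][d]
                trial.append(v)
            s = objective(trial)
            if s <= scores[i]:  # greedy selection keeps the better vector
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy stand-in for a forecasting model: fit y = w * x + b to synthetic data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mse(params):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


best, score = differential_evolution(mse, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

For a neural network, `params` would be the flattened weights and thresholds, and the objective would be the training or validation error of the resulting network.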

2) RADIAL BASIS FUNCTION NEURAL NETWORK-BASED HYBRID APPROACH
The radial basis function neural network (RBFNN) belongs to the FFNN category and comprises three layers, known as the input, hidden and output layers, as shown in Fig. 6. RBFNN is among the most widely used networks for function approximation problems. The hidden-layer neurons use Gaussian transfer functions, whose output decreases as the distance between the input and the neuron center increases [29]. The input is a vector of real numbers, $x \in \mathbb{R}^n$, and the network output is a scalar function of the input vector, $\varphi : \mathbb{R}^n \to \mathbb{R}$. The output of the network is a weighted sum of the hidden-neuron responses, where $N$ is the number of neurons in the hidden layer, $c_i$ is the center vector of neuron $i$ and $a_i$ is the weight of neuron $i$. In RBFNN, the output functions depend solely on the distance from the neuron center. Apart from this, all the inputs are directly connected to each hidden neuron.
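The network output described above can be sketched as a weighted sum of Gaussian responses. The specific Gaussian form `exp(-(r / width)^2)`, the shared `width` parameter and the function names are assumptions made for illustration; the centers and weights are made up.

```python
import math


def gaussian(r, width=1.0):
    """Gaussian transfer function of the distance r to a neuron center."""
    return math.exp(-(r / width) ** 2)


def rbf_output(x, centers, weights, width=1.0):
    """Scalar network output: sum over neurons of a_i * gaussian(||x - c_i||)."""
    return sum(a * gaussian(math.dist(x, c), width)
               for a, c in zip(weights, centers))


centers = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical neuron centers c_i
weights = [2.0, 5.0]                  # hypothetical output weights a_i
at_center = rbf_output([0.0, 0.0], centers, weights)
nearby = rbf_output([1.0, 0.0], centers, weights)
```

An input located at a center receives that neuron's full weight, and the contribution fades quickly with distance, which is why the placement of the centers matters so much during training.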
Several advantages make RBFNN more prominent than other networks, including easy online learning ability, strong tolerance to input noise and strong generalization. These characteristics make the network flexible and suitable for precise control systems. Along with the advantages, several drawbacks still need to be addressed. Although the training speed is faster, the classification speed is slower because of the dependency on each node in the hidden layer [29].
FIGURE 5. Flowchart of DE based BPNN approach for wind power prediction [56].
Karamichailidou et al. [57] introduced an RBFNN-based hybrid approach for wind turbine power curve modeling. The authors trained the RBFNN model with the non-symmetric fuzzy means (NSFM) algorithm, resulting in high accuracy and low computation cost. Efficient training on high-dimensional datasets was ensured by integrating the NSFM algorithm with the tabu search (TS) algorithm. The proposed technique can be used conveniently for building wind turbine efficiency measurement tools. The proposed hybrid approach achieved an MAE of 18.9955 kW and 18.9325 kW for February and July, respectively. Since the outcome of this technique is promising, the proposed hybrid model can be further investigated to solve other challenging applications in wind power.

3) EXTREME LEARNING MACHINE BASED HYBRID APPROACH
The ELM is suitable for predicting the results of non-linear and complex systems. ELM demonstrates great adaptability, high predictability for any continuous operation, high learning speed and low computational difficulty, which help it achieve good prediction outcomes over other AI strategies [58]. The ELM is built from three layers consisting of an input layer, one hidden layer and an output layer, as shown in Fig. 7. The training of ELM is completed by assigning the hidden-layer biases and input weights randomly. The outcome of the hidden layer in ELM is expressed in terms of the following quantities. The hidden-layer bias and the input vector are denoted as $b_i$ and $x = [x_{i1}, x_{i2}, \ldots, x_{iN}]^T$, respectively. The number of hidden neurons is denoted as $\tilde{N}$. $a_i = [a_{i1}, a_{i2}, \ldots, a_{iN}]^T$ is the weight vector that connects the input nodes and the $i$-th hidden node. $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{iN}]^T$ is the output weight that connects the $i$-th hidden-layer neuron and the output-layer neurons. The sigmoid activation function is denoted as $f(\cdot)$.
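For orientation, a commonly used single-hidden-layer ELM formulation is written out below. It is given as an assumption rather than as the expression from the source: $b_i$ is taken as the bias of the $i$-th hidden node, $o_j$ as the network output for sample $x_j$, $H$ as the hidden-layer output matrix, $T$ as the target matrix and $H^{\dagger}$ as the Moore-Penrose pseudo-inverse of $H$.

```latex
% Assumed standard ELM formulation; symbols b_i, o_j, H, T are illustrative.
\begin{align}
  o_j &= \sum_{i=1}^{\tilde{N}} \beta_i \, f\!\left(a_i \cdot x_j + b_i\right),
        \qquad j = 1, \dots, N, \\
  H \beta &= T, \qquad H_{ji} = f\!\left(a_i \cdot x_j + b_i\right), \\
  \hat{\beta} &= H^{\dagger} T .
\end{align}
```

Since $a_i$ and $b_i$ are drawn at random and kept fixed, only the linear system for the output weights has to be solved, which accounts for the short training time attributed to ELM.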
ELM in deep networks has its benefits and drawbacks. It is noticed that ELM requires many hidden neurons. Furthermore, ELM can achieve better and faster outcomes because of the least square technique. One important advantage of ELM is the short training time. However, the ELM has some disadvantages such as the over-fitting problem [59].
Yang and Chen [60] proposed a hybrid technique combining ELM, SAE, and empirical mode decomposition (EMD), aiming to predict wind energy proficiently and precisely. The verification of the proposed hybrid approach was conducted using wide-ranging investigations including the genuine dataset. The assessment outcomes revealed that the ELM-SAE-EMD technique delivered higher accuracy, indicating 94.04% on the combined dataset and an average of 93.73% on a single dataset in terms of 12 h predicting horizon. Moreover, the evaluation report illustrated the capacity of the shared-hidden-layer-based ELM in forecasting wind power with ordinary computational intelligence. Although ELM shows great potential in forecasting wind speed, further investigation is required using the deep structure to address the complex time-series problems.
Li et al. [61] suggested a hybrid wind energy forecast strategy using the kernel mean p-power error loss (KMPE) and ELM. Moreover, principal component analysis (PCA) was used to remove some redundant dataset features, decreasing the computational burden of the large dataset. The effectiveness of the suggested technique was checked with a real-world dataset. The proposed hybrid technique was evaluated in five phases: data expansion, data preprocessing, parameter optimization, time complexity and error comparison. The outcomes demonstrated that the suggested ELM-KMPE technique was superior to the conventional FFBPNN approach. Nevertheless, ELM-KMPE showed a slower estimation speed due to the fixed-point cycle. Thus, further examination is required to overcome the calculation complexity issues. In addition, further studies can be carried out using deep learning algorithms with better computer configurations.
Qolipour et al. [62] suggested ELM and Grey model to develop a hybrid machine learning model for long-term wind speed prediction. The wind speed assessment was executed using the Homer Pro programming software with the 10-year data (2005-2015) in 24 hours. The proposed hybrid technique was excellent with respect to execution performance and accuracy. The suggested hybrid technique with MSE of 0.000376 m/s and co-efficient of determination (R 2 ) of 0.99376 achieved better forecast results than the conventional ELM model with MSE and R 2 of 0.00720 m/s and 0.98075, respectively. Future research direction includes the evaluation of the hybrid model using different optimization algorithms under a real-world dataset.
Salcedo-Sanz et al. [63] introduced a hybrid technique for short-term wind speed forecasting including ELM, Coral Reefs Optimization (CRO) and Harmony Search (HS) algorithm. The effectiveness of ELM was enhanced by the suitable input features found through the CRO. The verification of the proposed hybrid technique was performed using two meteorological towers located in USA and Spain. The CRO-HS obtained better prediction results than HS and CRO, achieving RMSE of 3.329 m/s. Table 1 shows the summary of FFNN-based hybrid approaches for wind power forecasting.

B. CLASSIFICATION AND REGRESSION-BASED HYBRID APPROACH
The classification and regression techniques have proven effective in wind power prediction. A classification strategy may predict a continuous value; however, that value takes the form of a class probability. On the other hand, a regression technique may predict a discrete value, yet that value takes the form of an integer quantity [19].

1) SUPPORT VECTOR MACHINE-BASED HYBRID APPROACH
The support vector machine (SVM) is a supervised AI strategy that can be used for both regression and classification problems. In the SVM technique, every data item is plotted as a point in n-dimensional space, with the value of each feature being the value of a specific coordinate. The classification is then performed by finding the hyperplane that separates the two classes efficiently [31]. The hyperplane used by SVM to separate distinct classes is presented in Fig. 8. The hypothesis function is expressed in terms of the following quantities: $x$ represents the points in the feature space on the hyperplane, $n$ is the number of training data points, $b$ is the offset of the hyperplane, $w$ is the normal vector to the hyperplane, and $\lambda$ sets the trade-off between increasing the margin size and ensuring that each $x_i$ lies on the correct side of the margin. SVM is effective in high-dimensional spaces, including cases where the number of dimensions exceeds the number of samples. Nevertheless, SVM cannot assign the parameters optimally for all cases. In addition, SVM does not provide probability estimates directly; these are calculated using a costly five-fold cross-validation process. Also, it does not perform satisfactorily when the dataset contains more noise [65], [66].
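The role of $\lambda$ can be illustrated with the usual soft-margin objective for a linear SVM, namely the mean hinge loss plus $\lambda \lVert w \rVert^2$. This textbook form is assumed here and may differ from the expression in the source; the data points and function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_margin_objective(w, b, xs, ys, lam):
    """Mean hinge loss plus lam * ||w||^2 for the hyperplane w . x - b = 0.

    Labels ys are +1 or -1. A point contributes zero loss only when it lies
    on the correct side of the margin, i.e. y * (w . x - b) >= 1.
    """
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) - b)) for x, y in zip(xs, ys))
    return hinge / len(xs) + lam * dot(w, w)


# Three made-up points: two outside the margin, one inside it.
xs = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
ys = [1, -1, 1]
value = soft_margin_objective([1.0, 0.0], 0.0, xs, ys, lam=0.1)
```

A larger `lam` penalises large `w` more strongly, which widens the margin at the cost of tolerating more points inside it.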
Liu et al. [67] proposed a Jaya algorithm-based SVM method for short-term wind speed estimation. Here, the Jaya algorithm was applied to optimize the hyperparameters of SVM. The performance of the Jaya-based SVM model was compared with seven other techniques including granular computing, stacked sparse autoencoder, Gaussian process regression, deep belief network, multi-layer perceptron regression model, extreme gradient boosting model and least absolute shrinkage and selection operator. The proposed hybrid model delivered the best results in comparison to the other methods in terms of MSE, MAPE and R². The report stated that when the prediction step increased from 1 to 2, the MSE of the suggested technique rose from 0.6451 m/s to 0.8623 m/s, an increase of about 34%. Similarly, from step 2 to 3, the MSE increased from 0.8623 m/s to 1.0154 m/s. The authors examined the wind speed prediction without considering the seasonal impact. Hence, further exploration is required to predict wind speed under seasonal factors and to predict wind direction.
Li et al. [68] developed a hybrid model for short-term wind power forecasting based on the least squares support vector machine (LSSVM) optimized by an improved ant colony optimization (ACO) algorithm. The LSSVM can address the two quadratic programming problems of SVM. The proposed hybrid technique was superior to the FFBPNN and SVM methods, indicating MSE, average absolute error and average relative error of 1.6 m/s, 0.4 m/s and 6.65%, respectively. The results of the proposed hybrid strategy were promising; however, further analysis can be carried out using other notable optimization techniques.
Wang et al. [69] proposed a hybrid technique for day-ahead wind power forecasting based on LSSVM integrated with the fruit fly optimization algorithm (FOA). The LSSVM technique was used to model the non-linear relationship among atmospheric pressure, temperature, wind direction and wind speed. FOA was used to search for the ideal parameters of LSSVM, including the kernel width parameter and the regularization parameter. The results illustrated that FOA had a faster convergence speed than PSO in finding the optimal parameters. The LSSVM with FOA had the lowest forecast error compared with LSSVM-PSO and LSSVM, achieving RMSE and MAE of 14.23% and 12.54%, respectively. The training times on the sample dataset for LSSVM with PSO and LSSVM with FOA were estimated to be 85 s and 65 s, respectively. In future research, other regression methods can be used to reduce the forecasting error.
Qu et al. [70] established a hybrid model with FOA-based support vector regression (SVR) to achieve precise and reliable wind speed forecasting. The performance of SVR was compared with RBFNN and general regression neural network (GRNN) methods. Due to the nonlinearity and non-stationarity of wind speed, the Ensemble Empirical Mode Decomposition (EEMD) was employed in the data-preprocessing stage to decompose the original dataset into a series of independent Intrinsic Mode Functions (IMFs). The prediction performance of SVR, RBF and GRNN was enhanced by FOA optimization. The methodological framework of the proposed hybrid technique is shown in Fig. 9. The experimental report illustrated that EEMD-FOASVR outperformed single models as well as hybrid models such as EEMD-FOAGRNN and EEMD-FOARBF, achieving the minimum statistical error with RMSE, MAE and index of agreement (IA) of 0.1301 m/s, 0.0999 m/s and 0.9978, respectively, in different seasons. Future exploration can be conducted to solve the nonlinear approximation problems.
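The index of agreement (IA) reported above is less familiar than RMSE or MAE. A minimal sketch is given below, assuming Willmott's original definition; whether [70] uses exactly this variant is not stated in the text, and the sample values are invented.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement: 1 means a perfect match, 0 means none."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


observed = [1.0, 2.0, 3.0]   # made-up wind speeds in m/s
predicted = [1.0, 2.0, 4.0]
ia = index_of_agreement(observed, predicted)
```

IA is a dimensionless ratio, so unlike RMSE and MAE it carries no unit.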
Tian et al. [71] combined the LSSVM and BSA methods for short-term wind speed forecasting. The BSA was employed to explore the key parameters. The proposed model was updated using the prediction error precision approach integrated with the sliding window mechanism, thus avoiding a mismatch between the prediction model and the actual wind speed data. The proposed hybrid model attained MAE, MAPE, RMSE and R² of 0.1374 m/s, 0.1248, 0.1589 m/s and 0.9648, respectively. Moreover, the average absolute relative prediction error and the average absolute prediction error were 8.7111% and 0.1113 m/s, respectively, while the wind speed fluctuated between 0 and 4 m/s. Nevertheless, selecting the ideal parameters is a laborious task that needs human expertise as well as in-depth experiments. Thus, further investigation is required to obtain the parameters of BSA adaptively.
Li et al. [72] proposed a cuckoo search optimized SVR approach for short-term wind power forecasting, as depicted in Fig. 10. The improved cuckoo search (ICS) algorithm was developed to optimize the parameters of the SVR, including the penalty factor and kernel function. The ICS outperformed GA, PSO and CS with regard to convergence speed, accuracy and global optimization capacity. The ICS-SVR was proven to be effective in forecasting wind power under volatile conditions, with low RMSE and MAPE and a high R². The proposed hybrid model achieved a MAPE of 7.03%, which was lower by 11.14% and 14.51% than those obtained from GA-SVR and CS-SVR, respectively. Figure a presents the graphical flowchart of the ICS algorithm. Future works can be performed using an appropriate data processing method and, accordingly, validation can be analyzed under more influencing factors.
Yang [73] proposed a hybrid technique based on the BSA-based SVM approach. BSA was employed to update the weights, leading to an increase in the prediction effectiveness of SVM. The performance of the recommended hybrid strategy was evaluated using wind energy data from China between 2001 and 2013. The investigation outcomes demonstrated that BSA-based SVM was superior to individual SVM, FFBPNN and other regression algorithms, obtaining the lowest MAPE of 4.72%. However, weight determination was a concern in the combined model since the weights can be fixed or dynamic. Hence, further exploration with the optimization method is required to overcome the weight determination issue.
Wind energy forecasting is greatly affected by sudden changes in wind speed, resulting in major issues such as excessive operational expenses, higher reserve capacity, and lower reliability and security of the power system. When the wind power output of a specific wind turbine or wind farm changes by more than a predefined threshold value, the change is called a ramp event. The predefined threshold value is normally 50% of the usual output. Conventional wind prediction approaches fail to capture ramp events in wind speed time series. Hence, Dhiman et al. [74] proposed hybrid intelligent methods based on several SVR variants including Twin Support Vector Regression (TSVR), ε-Twin Support Vector Regression (ε-TSVR), Least Square Support Vector Regression (LS-SVR), and ε-SVR to solve the ramp event issue of wind power forecasting. The investigation was performed on the ramp events of five wind farms at various hub heights, and a comparative analysis was carried out based on the performance indices. The results demonstrated that TSVR and ε-TSVR were effective in short-term wind power forecasting, while ε-TSVR exhibited low computation speed and LS-SVR achieved the minimum central processing unit (CPU) time. Additionally, ε-TSVR was superior to TSVR, LS-SVR and ε-SVR with regard to absolute error during wind power ramp events.
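The ramp-event definition above can be illustrated with a short sketch. It assumes that a ramp is flagged when the change in power over a fixed look-ahead window exceeds a fraction (50% by default) of a nominal output; the window length, the choice of nominal output and the function name `detect_ramp_events` are assumptions made for illustration and are not taken from [74].

```python
def detect_ramp_events(power, nominal_output, window=1, threshold_fraction=0.5):
    """Return (index, direction) pairs where the power change over `window`
    steps exceeds threshold_fraction * nominal_output (assumed definition)."""
    threshold = threshold_fraction * nominal_output
    events = []
    for t in range(len(power) - window):
        change = power[t + window] - power[t]
        if abs(change) > threshold:
            events.append((t, "up" if change > 0 else "down"))
    return events


# Hypothetical power series (kW) for a unit with a nominal output of 100 kW.
series = [10.0, 12.0, 70.0, 68.0, 15.0, 22.0]
ramps = detect_ramp_events(series, nominal_output=100.0)
```

In this toy series, the jump from 12 kW to 70 kW is flagged as an up-ramp and the drop from 68 kW to 15 kW as a down-ramp, while the smaller fluctuations are ignored.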
Another study by Dhiman et al. [75] developed a hybrid forecasting approach for wind power applications comprising Wavelet Transform (WT) and various forms of the SVR algorithm such as LS-SVR, ε-SVR, TSVR and ε-TSVR. WT was utilized to remove the stochastic volatility from the raw wind speed dataset. The authors trained and tested all these approaches with a dataset from a wind farm in Spain. The analysis revealed that WT-based ε-TSVR was the best regressor, while LS-SVR obtained the lowest CPU time among all. In the future, the proposed hybrid algorithm may utilize the wavelet packet transform and hyperparameter optimization to achieve more efficient outcomes.
Another work by Dhiman and Deb [76] introduced a hybrid model based on discrete wavelet transform (DWT), TSVR, random forest (RF), and convolutional neural networks (CNN) to predict wind power under ramp events for hilly, offshore, and onshore zones. DWT was used to extract features from wind speed. The outcomes illustrated that SVR was the most appropriate forecasting approach in comparison to other models and that CNN provided better ramp event prediction for larger training datasets. The proposed SVR-based hybrid approach showed significant improvement in forecasting the ramp events, reducing RMSE by 4.87% and 17.88% in comparison to the RF and TSVR models, respectively. Furthermore, the randomness of the ramp events was evaluated for all wind farms by utilizing the log-energy entropy approach. The outcome demonstrated that EMD provided the least randomness compared to DWT. Apart from the aforementioned studies, various ramp event prediction approaches are broadly discussed in the book by Dhiman et al. [77].

2) RANDOM FOREST REGRESSION-BASED HYBRID APPROACH
Random forest (RF) is a supervised learning technique that can be utilized for both regression and classification applications. RF first selects random samples from a given dataset. Next, it builds a decision tree on each sample and obtains a forecast from each tree. Voting is then performed over the forecasted outcomes. Finally, the highest voted outcome is chosen as the final forecast [78], [79].
RF can handle very high-dimensional feature data and offers high training speed. Besides, RF balances the error for unbalanced datasets. The accuracy of RF can still be maintained even if a large part of the features is lost. Moreover, RF shows good performance in certain noisy classification or regression problems. Nonetheless, RF creates a large number of trees and combines their outputs, and thus requires much more computational power and resources [80], [81].
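The sample-then-aggregate principle described above can be sketched in a few lines. The example below is a deliberately simplified illustration, not a full RF implementation: it assumes a single input feature, uses one-split regression trees (stumps) as base learners, and averages the tree outputs, which is the regression counterpart of majority voting. Random feature subsetting, which a complete RF also performs, is omitted, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all inputs identical: fall back to the sample mean
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Aggregate the individual tree forecasts by averaging."""
    return mean(predict_stump(stump, x) for stump in forest)


# Hypothetical wind speed (m/s) to power (MW) samples.
speeds = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
powers = [0.1, 0.2, 0.3, 0.5, 1.2, 1.5, 1.8, 2.0]
forest = fit_forest(speeds, powers)
```

Because every tree sees a different bootstrap sample, the individual forecasts differ, and averaging them reduces the variance of the final prediction at the cost of training many trees.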
Sun et al. [82] suggested a hybrid intelligent method combining optimized RF and deep belief network (DBN) to forecast multistep wind speed and wind power. First, DBN was used to achieve short-term wind speed prediction, and the BAT algorithm was then employed to update the parameters and further improve the performance. A weighted voting technique and a data-driven dimension reduction process were used in the prediction and training cycles, respectively, to improve the RF computational capability, as depicted in Fig. 11. The effectiveness of the proposed hybrid method was verified by different experiments, and the results revealed that the optimized RF obtained RMSE, MAE, standard deviation (SD) and average percentage error (APE) of 71.85 m/s, 44.04 m/s, 70.21 m/s and 0.69%, respectively, under different horizons. Nonetheless, the proposed approach suffers from a long learning duration with a large amount of data. Besides, it is difficult to determine the optimal number of layers, neurons and epochs simultaneously with the BAT algorithm due to the varying parameter settings. Hence, in-depth exploration is required to overcome the abovementioned issues.

3) GAUSSIAN PROCESS REGRESSION-BASED HYBRID APPROACH
The Gaussian process regression (GPR) exhibits significant features, for example, marginal log-likelihood function expansion, logically tractable interpretation, explicit probabilistic formulation, and straightforward parameterization.
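The training operation of GPR is initiated using the training data and hyperparameters. The hyperparameters of GPR are selected using the conjugate gradient strategy, which minimizes the negative marginal log-likelihood [32]. Lastly, the outcome is determined by evaluating the mean and variance of the predictive distribution, which can be expressed as,

$$\mu_* = K_*^{T} \left( K + \sigma_n^2 I \right)^{-1} y$$

$$\Sigma_* = K_{**} - K_*^{T} \left( K + \sigma_n^2 I \right)^{-1} K_*$$

where $\mu_*$ is the forecasting output and $\Sigma_*$ is its variance. $I$ denotes the identity matrix. $y$ is the training dataset output. $K$ represents the kernel matrix of the training inputs, $K_*$ is the kernel matrix between the training and test inputs, $K_{**}$ is the kernel matrix of the test inputs, and $\sigma_n^2$ is the noise variance. $(K + \sigma_n^2 I)^{-1}$ is the inverse matrix, which is estimated using the marginal log-likelihood function and its gradient.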
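The GPR prediction step can be made concrete with a small sketch that evaluates the standard predictive mean and variance for a single test input. The squared-exponential kernel, its fixed hyperparameters and the small Gaussian-elimination solver are assumptions chosen to keep the example self-contained; in practice the hyperparameters would be tuned by minimizing the negative marginal log-likelihood, as described above.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel (an assumed choice of kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gpr_predict(x_train, y_train, x_star, noise_var=1e-2):
    """Return the predictive mean and variance at a single test input x_star."""
    n = len(x_train)
    # K + sigma_n^2 I
    K_noisy = [[rbf_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
                for j in range(n)] for i in range(n)]
    k_star = [rbf_kernel(x, x_star) for x in x_train]
    alpha = solve(K_noisy, y_train)  # (K + sigma_n^2 I)^{-1} y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K_noisy, k_star)       # (K + sigma_n^2 I)^{-1} k_*
    variance = rbf_kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, variance
```

Close to the training inputs the predictive variance shrinks toward the noise level, whereas far from the data it returns to the prior variance, which is how GPR expresses its own forecast uncertainty.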
GPR can provide accurate and robust predictions together with an estimate of their own uncertainty. Nonetheless, GPR loses efficiency in high-dimensional spaces when the number of features exceeds a few dozen. As wind speed is unpredictable, GPR has recently been introduced to address the irregularity of wind power. However, the drawbacks of the GPR model are its inability to adjust to time-varying frameworks and its computational complexity [83].
Liu et al. [84] introduced a novel hybrid technique based on the combination of multiple imputation techniques and the GPR algorithm under a missing data scenario. The expectation-maximization calculation was utilized to tackle the missing information. New datasets were produced by utilizing multiple imputation techniques. The GPR approach was developed to execute the wind power forecasts for each of these datasets. A final predictive strategy was designed by utilizing the ensemble model. The outcomes demonstrated that the proposed GPR-based hybrid strategy was superior to the SVM and MLP techniques for wind energy forecasting on both incomplete datasets with missing information and complete datasets without missing information. As the performance of this model is far better than that of the other models, the proposed technique can be investigated for other renewable energy prediction applications.
Yan et al. [85] proposed a hybrid method to address the unpredictable nature of wind speed using the teaching-learning (TL) optimized GPR approach. This novel hybrid strategy proved suitable for wind power forecasting in terms of high accuracy, low computational burden and fast convergence speed. The operational process of TL-based GPR is displayed in Fig. 12. The accuracy and stability of the proposed approach can be further analyzed under uncertainty propagation with multi-step forecasting.
Dong et al. [86] proposed a hybrid wind power forecasting model based on GPR with the Bernstein polynomial. EMD was used to decompose the actual wind power data series into several IMFs. Afterward, Bernstein polynomial-based GPR was employed to forecast the wind power. Finally, a multiobjective state transition algorithm was established to find the ideal parameters of the hybrid technique. The proposed hybrid model was validated through comprehensive experiments using wind data collected from a wind farm in China. The hybrid model obtained RMSE and MAE of 0.4922 and 0.3016, respectively. The experimental results illustrated that the proposed hybrid model achieved accurate and stable forecasting results in comparison to other popular forecasting models. Nevertheless, further investigation is required to apply the proposed algorithm under different forecasting variables and geographical locations.
Hu et al. [87] suggested a short-term wind power forecast using a hybrid method including numerical weather prediction, spatial correlation (SC) and the GPR technique. Firstly, the optimal combination of GPR models based on different kernel functions was developed. Afterward, an automatic relevance determination algorithm was used to revise the errors in the primary numerical weather prediction. Then, data were extracted using the SC technique. Finally, reliable and accurate wind power was forecasted using the revised numerical weather prediction and SC integrated with GPR. The proposed hybrid approach improved the forecasting accuracy by 10.88-37.49% under different seasons. In future research, the problem of automatic scene division of complex input data can be addressed by the proposed hybrid approaches. The summary of classification and regression-based hybrid approaches for wind power forecasting is presented in Table 2.

FIGURE 11. Training operation of RF to predict wind power [82].

C. DEEP LEARNING-BASED HYBRID APPROACH
Deep learning is a new and advanced AI strategy that utilizes multiple layers to progressively extract high-level characteristics from the raw input data. Hybrid approaches based on deep learning algorithms have become popular in renewable power prediction due to their high accuracy, generalization and strong computation capability.

1) LONG SHORT-TERM MEMORY-BASED HYBRID APPROACH
The long short-term memory (LSTM) algorithm is appropriate for forecasting, processing, and classifying time series with delays of unknown duration. Conventional recurrent neural networks (RNNs) suffer from vanishing or exploding gradients during backpropagation training. To handle this problem, LSTM captures long-term dependencies through the use of memory units rather than conventional hidden layers [88]. The structure of the LSTM memory unit includes a series of gates, namely the forget gate, input gate and output gate, and memory units connected through nodes. The execution of the LSTM cell and the output computation can be written as,

$$f_k = \sigma_g\left(W_f x_k + U_f h_{k-1} + b_f\right)$$

$$i_k = \sigma_g\left(W_i x_k + U_i h_{k-1} + b_i\right)$$

$$o_k = \sigma_g\left(W_o x_k + U_o h_{k-1} + b_o\right)$$

$$c_k = f_k \circ c_{k-1} + i_k \circ \sigma_c\left(W_c x_k + U_c h_{k-1} + b_c\right)$$

$$h_k = o_k \circ \sigma_h\left(c_k\right)$$

where $c_k$ is the unit memory, $h_k$ is the hidden state, $x_k$ is the input, $b$ is the bias parameter, and $U$ and $W$ represent the weight matrices. $o_k$, $f_k$ and $i_k$ denote the activations of the output gate, forget gate and input gate, respectively. $\sigma_h$, $\sigma_c$ and $\sigma_g$ denote the hyperbolic tangent, hyperbolic tangent and sigmoid functions, respectively, and $\circ$ denotes element-wise multiplication.

LSTM is a very powerful technique that can be ideal for wind power prediction. Relative insensitivity to gap length is an advantage of LSTM over RNNs, hidden Markov models and other sequence learning methods in various real-time applications. Another advantage of the LSTM cell compared to a typical recurrent unit is its cell memory unit. One limitation of LSTM is that its training demands considerable time and computational resources [89], [90].
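A single step of the LSTM cell can be sketched as follows. The sketch assumes the widely used formulation with sigmoid gates and hyperbolic tangent activations, represents vectors and matrices as plain Python lists, and stores the parameters in a hypothetical dictionary `params` that maps each gate name to a `(W, U, b)` tuple.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'o', 'c' to (W, U, b)."""
    pre = {name: affine(W, x, U, h_prev, b) for name, (W, U, b) in params.items()}
    f = [sigmoid(z) for z in pre["f"]]          # forget gate
    i = [sigmoid(z) for z in pre["i"]]          # input gate
    o = [sigmoid(z) for z in pre["o"]]          # output gate
    c_tilde = [math.tanh(z) for z in pre["c"]]  # candidate memory
    # New cell memory: keep part of the old memory, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    # New hidden state: gated, squashed cell memory.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

For a wind speed sequence, `lstm_step` would be called once per time step, feeding the returned hidden state and cell memory back in as `h_prev` and `c_prev`.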
Chen et al. [91] proposed a hybrid forecasting model for short-term wind power generation based on LSTM. The potential feature set was selected using EEMD, where the original wind sequence was divided into several intrinsic mode functions (IMFs). Then the appropriate sub-feature was chosen using the genetic algorithm (GA). The proposed hybrid approach demonstrated superiority on a large-scale wind dataset and achieved RMSE, MAE and MAPE of 0.1337 m/s, 0.057 m/s and 1.0662%, respectively. Nevertheless, a few research gaps need to be explored in the future, including comparative analysis with other optimization algorithms, the inclusion of more features and validation under different resolutions with other large-scale wind datasets.
Yuan et al. [92] evaluated the interval of the wind power forecast using a hybrid technique with Beta distribution-based LSTM. PSO was used to optimize the parameters of the Beta distribution. A detailed comparative analysis was carried out between the proposed Beta-PSO-LSTM model and other methods including the Beta-LSTM, Beta-PSO-BP and LSSVM models. The results indicated that the proposed hybrid model achieved accurate and reliable results, with prediction interval coverage probability (PICP), average bandwidth (P), index (F) and sharpness (Sα) of 95%, 540 kW, 4.32 and 84, respectively. The proposed hybrid technique can be applied to the optimal scheduling and uncertainty analysis of wind power in further research works.
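Interval forecasts such as the one above are typically judged by how often the observations fall inside the predicted band and by how wide the band is. The sketch below assumes the usual definition of PICP as the percentage of covered observations and uses the mean interval width as a simple stand-in for the average bandwidth; the exact definitions used in [92] may differ, and the function names are illustrative.

```python
def picp(observed, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    covered = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return 100.0 * covered / len(observed)


def mean_interval_width(lower, upper):
    """Average width of the prediction intervals, in the unit of the data."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical wind power observations and interval bounds (kW).
observed = [5.0, 7.0, 9.0, 11.0]
lower = [4.0, 6.0, 10.0, 9.0]
upper = [6.0, 8.0, 12.0, 12.0]
coverage = picp(observed, lower, upper)
width = mean_interval_width(lower, upper)
```

A good interval forecast combines a PICP close to the nominal confidence level with a narrow average width, since a very wide band trivially achieves high coverage.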
Shahid et al. [93] forecasted short-term wind power with a hybrid intelligent method based on a GA-optimized LSTM approach. GA was used to find the optimal number of neurons and window size in the LSTM network. The proposed hybrid model delivered accurate and robust predictions of wind power, improving the prediction accuracy by 6% to 30% in comparison to the existing techniques. Figure 13 denotes the working principle of the proposed GA-LSTM model. Although the LSTM model provides a better solution for learning hyperparameters, it has two limitations, namely time and computational resources. Hence, in-depth investigation is required to address the abovementioned issues.
Liang et al. [94] designed intelligent wind speed prediction models based on bidirectional LSTM (Bi-LSTM) hybrid techniques using wind farm historical data under different characteristic parameters. Transfer learning was used to enhance the learning efficiency of the model. Afterward, a new multi-objective optimization algorithm was employed to estimate the optimal values of the weights. The experimental results demonstrated that the proposed hybrid model produced better prediction results than other methods, with a MAPE of 2.3604%. Although the proposed model delivered enhanced prediction results, it suffers from slow convergence speed. Hence, further investigation is required to overcome the slow convergence issue.
Jaseena and Kovoor [95] suggested a decomposition strategy-based hybrid forecasting approach for wind power generation using the Bi-LSTM technique. The authors used EMD, EEMD, WT and Empirical Wavelet Transform (EWT) to denoise wind speed data into several high- and low-frequency signals. The analysis revealed that the EWT-based Bi-LSTM delivered more accurate and stable outcomes. Bi-LSTM models offered excellent prediction results as opposed to other methods. Nonetheless, further investigation is required to reduce the computational complexity.
Memarzadeh and Keynia [96] proposed an improved short-term wind speed forecasting method integrating LSTM, feature selection (FS), WT, and the crow search algorithm (CSA). The entropy and mutual information techniques were employed to select the appropriate features. The variation characteristics of wind speed were addressed by WT. The structure of LSTM was optimized by CSA to determine the key parameters including learning rate and batch size. The proposed hybrid approach outperformed other methods including basic LSTM, WT-FS-LSTM and WT-FS-LSTM-PSO. The results illustrated higher forecasting accuracy, indicating MAE, MAPE and RMSE of 0.189 m/s, 2.588 and 0.259 m/s, respectively. Due to the accurate and reliable forecasting outcomes, the proposed approach can be applied to other power system forecasting tasks including load, price and reserve.
Hu and Chen [97] examined wind power prediction using a nonlinear hybrid approach integrating LSTM, ELM and the differential evolution (DE) algorithm. Firstly, the hysteretic character-based modified ELM was used to enhance the forecasting accuracy. Secondly, DE was employed to search for the optimal numbers of hidden layers and neurons of LSTM. The effectiveness of the proposed hybrid model was verified using two case studies under various experimental cases based on various performance indices. The RMSE of the proposed model in case studies 1 and 2 was estimated to be 0.658 m/s and 1.596 m/s, respectively. The performance of the proposed hybrid approach can be further examined using multistep-ahead wind speed forecasting with more interrelated features.

2) CONVOLUTIONAL NEURAL NETWORK-BASED HYBRID APPROACH
In the family of AI, the CNN is used for image recognition and processing. The CNN is a scheme of software and/or hardware modeled on the working principle of neurons in the human brain. CNN algorithms are constructed from several neurons with learnable biases and weights. Each neuron receives multiple inputs and computes their weighted sum. The result is then passed through an activation function to produce an output [98]. The structure of CNN is shown in Fig. 14.
A CNN contains a hierarchical architecture that starts from the input signal $x$, with each layer $x_i$ computed from the previous one as,

$$x_i = \rho\left(W_i x_{i-1}\right)$$

where $x_0 = x$, $W_i$ denotes the linear operator and $\rho$ represents the non-linearity function. Usually, in a CNN, $W_i$ is a convolution and $\rho$ is the sigmoid function.
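The layer-by-layer structure described above can be illustrated for a one-dimensional signal such as a wind speed series. The sketch assumes valid-mode cross-correlation as the linear operator (the usual meaning of convolution in CNNs), a sigmoid non-linearity, and one kernel per layer; pooling and multiple channels are omitted, and the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation of a signal with a kernel."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def cnn_forward(x, kernels):
    """Apply a linear convolution followed by the non-linearity, layer by layer."""
    for kernel in kernels:
        x = [sigmoid(z) for z in conv1d(x, kernel)]
    return x


# Hypothetical wind speed window (m/s) passed through two layers.
window = [5.1, 5.4, 6.0, 5.8, 5.2, 4.9]
features = cnn_forward(window, kernels=[[0.5, 0.5], [1.0, -1.0]])
```

Each layer shortens the signal by the kernel length minus one, so the six input samples yield four output features after the two layers used here.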
The CNN algorithm has several advantages that make it more viable than other methods. The primary benefit of CNN is that it can detect the essential characteristics automatically and can operate without any human supervision. Besides, CNN is computationally efficient. However, CNN consists of several layers that take a lot of time during the training and testing procedures.
Ju et al. [99] proposed an advanced short-term wind energy forecasting method based on CNN and the LightGBM classification algorithm. At first, a new feature set was produced through the proper examination of the raw time-series dataset. Subsequently, CNN was employed to extract the useful data from the input dataset. Afterward, CNN was combined with the LightGBM algorithm to overcome the constraints of the single-convolution technique. Lastly, the outcome of the proposed hybrid method was compared with single CNN, SVM, LightGBM and the fusion model. The results revealed that the proposed model was more effective in terms of efficiency and accuracy, improving MAE and MSE by 35% and 28.5% in comparison to the SVM algorithm. The methodological framework of the proposed hybrid approach is shown in Fig. 15. Along with the advantages, the proposed algorithm exhibited some limitations with regard to model robustness, which was challenging to achieve due to abnormal and false data.
Yildiz et al. [100] proposed a novel two-step method for wind energy forecasting based on the deep learning technique. First, the feature extraction and the conversion of features into images were executed using the variational mode decomposition (VMD) technique. Subsequently, an enhanced residual-based deep CNN was used to predict wind energy. The recommended hybrid CNN model ensured promising outcomes in short-term wind energy prediction compared with several deep learning frameworks including AlexNet, GoogLeNet, ResNet-18, VGG-16 and SqueezeNet, indicating RMSE, MAE and MAPE of 0.0499 m/s, 0.0376 m/s and 0.2535, respectively. Since the proposed model performed better, researchers can extend the investigation to various decomposition methods to further enhance the prediction accuracy.

3) DEEP BELIEF NETWORK-BASED HYBRID APPROACH
A deep belief network (DBN) is designed using several layers of hidden units, with connections between the layers but not between the units inside each layer. Greedy learning algorithms are utilized to pre-train the DBN and provide the optimal weight vectors of each layer [101]. The learning happens on a layer-by-layer basis: each layer receives a different representation of the information and utilizes the outcome from the previous layer.
The structure of a DBN is formed using stacked Restricted Boltzmann Machines (RBMs) and a regression layer. The input dataset characteristics are extracted by the RBMs and the output is then assessed by the regression layer. The DBN is structured utilizing a hidden layer and a visible layer, as shown in Fig. 16.

VOLUME 9, 2021
FIGURE 16. The structure of DBN for wind power prediction [102].
The RBM energy function is calculated as follows [73],

$$E(v, h; \theta) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i} \sum_{j} v_i w_{ij} h_j \tag{19}$$

where the weight is denoted as $w_{ij}$, $i$ indexes the visible units and $j$ indexes the hidden units. $v_i$ and $h_j$ are the states of visible unit $i$ and hidden unit $j$, and $a_i$ and $b_j$ are their biases. $\theta = \{w_{ij}, a_i, b_j\}$ represents the parameter vector of the RBM. The conditional probabilities in the hidden layer and visible layer can be written as,

$$P(h_j = 1 \mid v; \theta) = \sigma\left(b_j + \sum_{i} v_i w_{ij}\right)$$

$$P(v_i = 1 \mid h; \theta) = \sigma\left(a_i + \sum_{j} w_{ij} h_j\right)$$

where $\sigma$ denotes the sigmoid function.

DBN has numerous layers as well as a complex representation of input datasets, which are suitable for unsupervised learning techniques. Besides, DBN can be fine-tuned for a specific task in a supervised style. Moreover, DBN is powerful in addressing non-linear fitting issues. Nonetheless, DBN has the weakness of having a complex structure with a hidden layer and a visible layer.

Lin et al. [103] proposed a GA-based DBN approach using both time series data and multivariate regression data for wind power prediction. The SARIMA strategy and the LSSVR for time series GA (LSSVR-TS-GA) were utilized to predict wind speed in a time series, and the LSSVR with GA (LSSVR-GA) and DBN-GA strategies were utilized to forecast wind speed in a multivariate format. The flowchart of the proposed wind speed forecasting approach is shown in Fig. 17. The experimental outcomes demonstrated that the wind speed prediction by the DBN-GA technique was superior to the other forecasting strategies, achieving RMSE and MAPE of 0.710 m/s and 15.597%, respectively. To increase the forecasting accuracy, landform characteristics can be utilized. Moreover, the hidden layers and nodes may be included in the fitness function of GA to select the appropriate DBN structure.
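Returning briefly to the RBM building block of the DBN introduced earlier in this subsection, the energy function (19) and the associated conditional probabilities can be sketched as follows. The sketch assumes the standard binary RBM, with `a` holding the visible biases, `b` the hidden biases and `W[i][j]` the weight between visible unit i and hidden unit j; these names are chosen for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def energy(v, h, W, a, b):
    """RBM energy: -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i w_ij h_j."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def p_visible_given_hidden(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]
```

The two conditional distributions are what layer-wise pre-training alternates between: the hidden units are sampled given the data, and the visible units are then reconstructed given the hidden states.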
Hu et al. [104] introduced an enhanced hybrid wind forecasting method using DBN, SC, an adaptive learning technique and a sliding window strategy. The Gaussian-Bernoulli restricted Boltzmann machine technique of DBN and the adaptive learning technique were implemented to enhance the convergence speed. PCA was used to extract the fundamental features from the high-dimensional actual dataset. The training data of the forecasting method was updated using the sliding window strategy. Figure 18 presents the flowchart of the proposed hybrid technique for wind power forecasting. The experimental outcomes showed more accurate wind power prediction than the conventional DBN, enhancing the estimation precision by 15.8975%. However, further studies are required to address redundancy problems, computational complexity and lower forecasting accuracy.
Wang et al. [105] developed a hybrid forecasting method integrating DBN with WT and spline quantile regression (QR) approaches. The raw wind power data was decomposed into series of various frequencies by WT. The forecasting accuracy of wind speed was improved by nonlinear features extracted through the layer-wise pre-training of DBN. Then, the wind speed uncertainties were evaluated by the QR method. The proposed hybrid approach improved MAE, RMSE and MAPE by 48.99%, 50.13% and 42.48%, respectively, in comparison to the FFBPNN method. Since the proposed hybrid method achieved high accuracy under different seasons and prediction horizons, it can also be applied in electric power and energy systems. The summary of deep learning-based hybrid approaches for wind power forecasting is presented in Table 3.

D. RULE-BASED ALGORITHM-BASED HYBRID APPROACHES
Rule-based algorithms are principally based on either heuristics or human expertise. They exhibit simplicity and flexibility; nonetheless, they need substantial calibration effort and optimal control to attain a satisfactory outcome. Besides, they have a complex structure and need a time-consuming computation process.

1) ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM-BASED HYBRID APPROACHES
The adaptive neuro-fuzzy inference system (ANFIS) is designed using the advantageous features of fuzzy inference systems (FIS) and neural networks. The ANFIS maps the input dataset to the outcome by utilizing the associated parameters and membership functions (MF). For instance, x and y are considered as input variables and z is considered as the outcome. The first-order If-Then fuzzy rules are expressed by,

$$\text{First rule: if } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } f_1 = p_1 x + q_1 y + r_1 \tag{22}$$

$$\text{Second rule: if } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } f_2 = p_2 x + q_2 y + r_2 \tag{23}$$

where the MFs $A_1$, $A_2$ are the inputs for x and $B_1$, $B_2$ are the inputs for y. On the other hand, $p_1$, $q_1$, $r_1$ and $p_2$, $q_2$, $r_2$ are the related parameters [106]. The five-layered ANFIS architecture is shown in Fig. 19.

FIGURE 19. The five-layered ANFIS architecture for wind power prediction [107].

The mathematical presentations of ANFIS in five stages are as below,

$$\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}}, \quad i = 1, 2$$

$$w_i = \mu_{A_i}(x)\, \mu_{B_i}(y), \quad i = 1, 2$$

$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2$$

$$\bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right), \quad i = 1, 2$$

$$z = \sum_{i} \bar{w}_i f_i = \frac{\sum_{i} w_i f_i}{\sum_{i} w_i}$$

where $\mu_A$ and $\mu_B$ denote the bell-shaped fuzzy MFs, with $\mu_{B_i}(y)$ defined analogously to $\mu_{A_i}(x)$. $a$, $b$ and $c$ represent the parameters that change the shape of the fuzzy MFs. $w_i$ is the firing strength and $\bar{w}_i$ is the normalized firing strength. $f_i$ stands for the first-order Sugeno model. $p_i$, $q_i$ and $r_i$ denote the consequent parameters.

The ANFIS offers fast learning capacity, adaptability and the capability to capture non-linear characteristics. However, ANFIS experiences problems with huge input data, which results in high computational costs.
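The five ANFIS stages can be traced in a compact forward pass for the two-rule case. The sketch assumes generalized bell-shaped MFs and a first-order Sugeno consequent, as in the standard ANFIS formulation; the parameter containers `premise` and `consequent` and all numerical values are hypothetical, and no training of the parameters is shown.

```python
def bell_mf(x, a, b, c):
    """Generalized bell-shaped membership function with parameters a, b, c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(x, y, premise, consequent):
    """Two-input, first-order Sugeno ANFIS forward pass."""
    # Layer 1: membership degrees of the inputs
    mu_a = [bell_mf(x, *params) for params in premise["A"]]
    mu_b = [bell_mf(y, *params) for params in premise["B"]]
    # Layer 2: firing strength of each rule
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: rule outputs f_i = p_i x + q_i y + r_i
    f = [p * x + q * y + r for p, q, r in consequent]
    # Layer 5: overall output as the weighted sum of the rule outputs
    return sum(wb * fi for wb, fi in zip(w_bar, f))


# Hypothetical premise (a, b, c) and consequent (p, q, r) parameters.
premise = {"A": [(2.0, 2.0, 0.0), (2.0, 2.0, 10.0)],
           "B": [(2.0, 2.0, 0.0), (2.0, 2.0, 10.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
output = anfis_forward(5.0, 5.0, premise, consequent)
```

In the hybrid approaches reviewed below, optimizers such as GA or PSO would search over the premise and consequent parameters, whereas the forward pass itself stays as shown.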
Liu et al. [108] introduced a new hybrid technique for short-term wind energy prediction, combining ANFIS, FFBPNN, RBFNN and LSSVM methods. The Pearson correlation coefficient (PCC) was applied in the dataset preprocessing stage to choose the appropriate input features. Firstly, the individual statistical prediction models, namely FFBPNN, RBFNN and LSSVM, were used to obtain the forecasted power values. Secondly, the ANFIS algorithm was utilized to combine the predictions of the three individual algorithms and yield the final predicted wind energy. The proposed hybrid model showed significant improvement in wind power forecasting, reducing RMSE by 30.27%, 21.32% and 45.35% compared to the FFBPNN, RBFNN and LSSVM methods, respectively. Further analysis can be extended using an advanced data preprocessing method with numerical weather prediction data.
Moreno and Coelho [109] proposed a hybrid method for wind power forecasting based on ANFIS and Singular Spectrum Analysis (SSA). Initially, SSA was used to decompose the actual wind speed into different features. Accordingly, two datasets were prepared: one was the preprocessed data of the actual wind time series and the other was the remaining features clustered as noise. The ANFIS technique was applied to both of these datasets to evaluate wind speed in the next phase. The proposed hybrid model was compared with the group method of data handling (GMDH) and fuzzy c-means (FCM) methods and obtained the best results, with MAE, RMSE and R² of 0.2405 m/s, 0.452 m/s and 0.9687, respectively, in one-step wind power prediction. Further research can be extended by applying SSA to the GMDH algorithm. Moreover, the SSA-ANFIS-FCM method can be developed and compared with the proposed approach to achieve improved wind power forecasting results.
Adedeji et al. [110] introduced the PSO-optimized ANFIS approach for short-term wind power forecasting. Three clustering strategies including fuzzy-c-means (FCM), subtractive clustering (SC) and grid partitioning (GP) were utilized in the hybrid PSO-based ANFIS model. The results revealed that the proposed approach clustered with SC delivered the best results among the three hybrid models with RMSE, MAPE and computational time of 0.127%, 28.11% and 30.23 s respectively. Although the ANFIS tuned with PSO enhanced the model accuracy, it had a limitation of lengthy computational time. Thus, there should be a trade-off between accuracy and computational time.
Zheng et al. [111] proposed a hybrid technique for short-term wind power prediction based on ANFIS integrated with GA and PSO, as shown in Fig. 20. The effectiveness of the proposed hybrid method was tested using a case study of a microgrid framework in Beijing with real-world data on wind energy production and climate conditions. The performance of the proposed method was compared with the conventional ANFIS, FFBPNN and GA-based FFBPNN methods. The GA-PSO-ANFIS demonstrated excellent forecasting results, achieving an average MAPE of 6.64% over the four seasons. Moreover, the MAE was 45.73 for the proposed hybrid approach, while it was 47.51, 50.66 and 49.9 for the ANFIS, FFBPNN and GA-based FFBPNN methods, respectively.
The authors in [112] developed ANFIS-based hybrid models for long-term wind power prediction at four different locations. Different optimization algorithms, including GA, PSO and DE, were combined with ANFIS to tune the MFs. The proposed hybrid models were trained and tested with different data sizes collected from meteorological stations. The results indicated that GA-based ANFIS and PSO-based ANFIS delivered accurate forecasting results that outperformed the standalone ANFIS and DE-based ANFIS methods. Future work can be carried out with other optimization algorithms and real-world wind datasets from different locations.

2) TYPE-2 FUZZY-BASED HYBRID APPROACH
Sharifian et al. [113] proposed a new intelligent model to predict medium and long-term wind power accurately based on a PSO-optimized Type-2 fuzzy neural network (T2FNN). T2FNN combines both neural network learning and expert knowledge of the fuzzy system for precise wind power prediction, as depicted in Fig. 21. The optimal parameters of T2FNN were determined by PSO during the training phase. The proposed T2FNN-PSO model can suitably handle uncertain datasets obtained from the supervisory control and data acquisition (SCADA) system and numerical weather prediction (NWP) tools. The values of MAPE and RMSE were estimated to be 13.37% and 3.35%, respectively. The structure of the proposed model was kept simple to reduce the computational time in the training phase. Therefore, the proposed method can be utilized for precise wind energy forecasting and can provide a practical solution to power system control centers. Table 4 depicts the summary of rule-based hybrid approaches for wind energy forecasting.

IV. IMPLEMENTATIONS OF AI-BASED HYBRID WIND POWER FORECASTING
This section describes the different factors in implementing the hybrid AI approaches, including data preparation, algorithm functions, hyperparameter adjustment, validation and verification, which are discussed below.
A. DATA PREPARATION

1) DATA PREPROCESSING
Data preprocessing conditions the original signal before modeling, which can increase forecasting accuracy. Several techniques can be employed in the data preprocessing steps of wind power prediction, such as data division, data decomposition, data standardization and data normalization. Generally, the dataset is divided into two sets: training and testing. Wang et al. [114] applied 10-fold cross-validation to enhance the diversity of the training subset as well as to validate the performance evaluation in hybrid wind power forecasting. Data decomposition decomposes a high-dimensional dataset into several low-dimensional sub-datasets and is often employed in signal processing problems. Liu et al. [115] employed four signal decomposition algorithms in multiple-step wind speed forecasting. Fast EEMD was reported to achieve excellent results in three-step forecasting, while Wavelet Packet Decomposition delivered accurate results in one-step and two-step forecasting. Data normalization rescales variables measured on different scales so that all variables in the input dataset enter the AI model on the same scale. Zameer et al. [116] used normalization to adjust the input measurements of wind energy to between zero and one. Data standardization brings data of different magnitudes onto a common scale using Z-score values. Manero et al. [117] used the z-standardization technique to scale the data when forecasting wind energy with a deep learning method.
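As a minimal sketch of the two scaling operations described above, the following Python example applies min-max normalization to [0, 1] and Z-score standardization to a short series. The function names and the wind power readings are hypothetical and chosen only for illustration; they are not taken from [116] or [117].

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical wind power readings in kW (illustrative values only).
power_kw = [120.0, 340.0, 560.0, 780.0, 1000.0]
normalized = min_max_normalize(power_kw)
standardized = z_score_standardize(power_kw)
```

In a forecasting pipeline, the scaling parameters (minimum, maximum, mean, standard deviation) would normally be computed on the training subset only and then reused for the test subset, so that no information from the test period influences training.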

2) DATA FILTERING
Data filtering can improve accuracy and eliminate systematic errors in wind power prediction. The Kalman filter (KF) is an effective technique that can adapt to any change in observations, thereby reducing the uncertainty of weather prediction. Generally, KF uses a set of mathematical equations to recursively merge observations and mitigate the corresponding biases, resulting in optimal solutions. Several studies have illustrated that KF enhances the accuracy of wind power prediction. For instance, Louka et al. [118] employed KF to reduce the learning time and increase the performance in long-term wind power forecasting. A work by [119] applied KF to improve wind power forecasting by obtaining lower error rates and stable evolution. The authors in [120] used KF not only to assess the wind speed forecasting error but also to reduce the systematic errors.
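The recursive predict and update cycle of the KF can be illustrated with a scalar sketch that tracks a slowly varying forecast bias. This is a simplified example under stated assumptions, not the formulation used in [118]-[120]: the bias is modeled as a random walk, the noise variances q and r are arbitrary tuning values, and the error series and raw forecast are made up.

```python
def kalman_bias_filter(errors, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Track a slowly varying forecast bias with a scalar Kalman filter.

    errors: sequence of observed forecast errors (forecast - measurement).
    q, r:   assumed process and observation noise variances (tuning values).
    Returns the filtered bias estimate after each observation.
    """
    x, p = x0, p0
    estimates = []
    for y in errors:
        # Predict: a random-walk bias keeps its value, uncertainty grows by q.
        p = p + q
        # Update: blend the prediction and the new observation via the gain.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical wind speed forecast errors in m/s with a systematic +1.5 offset.
errors = [1.4, 1.6, 1.5, 1.7, 1.3, 1.5, 1.6, 1.4, 1.5, 1.5] * 5
bias = kalman_bias_filter(errors)

raw_forecast = 9.0  # next raw forecast in m/s (made-up value)
corrected_forecast = raw_forecast - bias[-1]
```

Subtracting the filtered bias from the next raw forecast removes the systematic part of the error, while the gain k controls how quickly the estimate follows new observations.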

3) DATA SAMPLING, DOWNSCALING AND OUTLIER DETECTION
The data sampling duration and sampling rate in wind power prediction should be neither too short nor too long. A short interval of wind data cannot provide sufficient information for training the AI models, while a long interval cannot be representative of a large wind dataset. A work by [121] evaluated wind power forecasting using training datasets covering periods between 3 months and 2.5 years. The results were quite similar for datasets longer than one year; however, accuracy declined for datasets longer than two years or shorter than one year. Data downscaling is an effective strategy to enhance the quality of weather prediction data, thereby improving the accuracy of wind power forecasting. Downscaled weather prediction data employ higher-resolution computations for wind speed estimation at wind turbine locations, which eventually enhances the accuracy of wind power prediction. The authors in [122] used a downscaling technique to improve the resolution of the proposed model to 7 km, thus reducing the wind power prediction error. Outliers in supervisory control and data acquisition (SCADA) data can be caused by uncalibrated sensors, leading to inaccurate results in wind power prediction [123]. A work in [124] introduced an effective approach for processing raw SCADA data toward reliable and cost-effective wind turbine condition monitoring. In [125], GPR was employed to identify and eliminate outliers from SCADA data, yielding a reduction of RMSE by 25% in comparison to standard forecasting methods. In [23], a deep learning approach was utilized to mitigate outliers in SCADA data, resulting in accurate wind power prediction.
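The idea of screening SCADA data for outliers can be sketched with a simple interquartile range (IQR) rule. This is a deliberately simpler stand-in for the GPR-based and deep learning methods of [125] and [23], not a reproduction of them; the readings and the 1.5 multiplier are illustrative assumptions.

```python
from statistics import quantiles


def iqr_outlier_mask(values, k=1.5):
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical SCADA power readings (kW) from one narrow wind speed bin.
# The values 20 and 1200 mimic a sensor dropout and a spike.
power_kw = [510, 495, 505, 500, 490, 515, 20, 502, 498, 1200]
mask = iqr_outlier_mask(power_kw)
cleaned = [v for v, is_outlier in zip(power_kw, mask) if not is_outlier]
```

Applying the rule within wind speed bins, rather than to the whole power series, keeps legitimate low and high power values from being flagged merely because the wind was weak or strong.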

B. FEATURE SELECTION
Selecting the right combination of input features is one of the key factors in improving the accuracy of AI-based wind power forecasting. Numerous input features have been used in the reviewed literature for wind power forecasting, such as wind speed, wind power, temperature, pressure, humidity, wind direction, location and blade pitch angle. Among them, wind speed is the most widely used input parameter. The authors in [126] found that wind speed along with location was the most influential parameter for enhancing wind forecasting accuracy using the LSTM and Gaussian mixture model. Another study [127] reported that wind power prediction using a multilayer perceptron (MLP) network was very sensitive to wind speed and wind direction. The authors in [128] considered wind speeds, blade pitch angles, temperature and nacelle orientation to predict wind power at various heights and wind shear using a deep learning neural network. The results indicated that blade pitch angle was critical in wind power generation. A work by [129] revealed that wind speed along with wind power density and power output increased the wind power prediction accuracy. A study by [130] illustrated that solar radiation, humidity and temperature improved wind power prediction accuracy by 0.3%.
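One simple way to screen candidate input features is to rank them by the absolute Pearson correlation with the measured power. The sketch below is only an illustration of that idea, not the selection procedure of [126]-[130]; the feature names and values are hypothetical, and Pearson correlation captures linear dependence only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_features(features, target):
    """Sort feature names by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical measurements (illustrative values only).
features = {
    "wind_speed_ms": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],
    "temperature_c": [15.0, 12.0, 18.0, 14.0, 16.0, 13.0],
    "pressure_hpa": [1012.0, 1009.0, 1015.0, 1010.0, 1008.0, 1013.0],
}
power_kw = [50.0, 210.0, 560.0, 1100.0, 1700.0, 2000.0]

ranking = rank_features(features, power_kw)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the wind speed to power relationship is nonlinear, a correlation ranking is best treated as a first screen before model-based or wrapper-based selection.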

C. ALGORITHM FUNCTIONS AND HYPERPARAMETERS ADJUSTMENT

1) TRAINING AND TESTING OPERATION
Generally, real wind farm datasets collected from meteorological stations are used for model performance assessment and validation. The developed model is trained and tested with different subsets of the dataset. For instance, in [55], the authors executed the training and testing operations with data collected during different timeframes. The 6057 samples obtained from 1 May 2014 to 21 June 2014 were employed for training, while data samples collected from 22 June 2014 to 1 July 2014 were used for testing. In [110], the authors verified the model performance by dividing a six-month wind dataset in a 70:30 ratio for training and testing, respectively. In [58], the authors divided the dataset into two subsets, in which 2/3 of the data were used for training while the remaining 1/3 were used for testing. In [80], the authors divided the whole dataset into training, verification and testing subsets based on a 6:2:1 ratio. In [99], the dataset was separated in an 8:1:1 ratio, where 80% of the data were used for training, 10% for validation and 10% for testing. In [72], the authors used 1000 data points with a sampling interval of 10 minutes, of which 700 data points were used for training and 300 for testing.
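The split ratios reported above can be reproduced with a short chronological split. The sketch below is an assumption-laden illustration: the helper name is hypothetical, the series is a placeholder, and the split is made in time order, which the cited studies do not all state explicitly.

```python
def chronological_split(samples, ratios):
    """Split an ordered sequence into consecutive parts by integer ratios."""
    total = sum(ratios)
    n = len(samples)
    parts, start, cumulative = [], 0, 0
    for r in ratios:
        cumulative += r
        end = n * cumulative // total  # integer boundary of this part
        parts.append(samples[start:end])
        start = end
    return parts


# Placeholder series of 1000 time-ordered data points.
series = list(range(1000))

# 70:30 split into training and testing subsets.
train, test = chronological_split(series, (7, 3))

# 8:1:1 split into training, validation and testing subsets.
tr, val, te = chronological_split(series, (8, 1, 1))
```

Keeping the subsets in time order means the model is always evaluated on data recorded after its training period, which mirrors how a forecast is used in operation.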

2) ACTIVATION FUNCTIONS
Different training algorithms and activation functions are employed in AI-based hybrid algorithms, including the sigmoid, Gaussian, rectified linear unit (ReLU) and hyperbolic tangent (tanh) functions. For instance, FFBPNN [53], ELM [62] and DBN [103] use backpropagation and the sigmoid function, while RBFNN [53] uses stochastic gradient and the Gaussian activation function. In SVM [67], logistic regression, the functional margin and radial basis kernel functions are utilized. In RF [84], bootstrap aggregating and nonlinear, differentiable functions are used for training and testing purposes. The squared exponential kernel, the marginal log-likelihood function and the kernel function are employed to implement GPR [86]. LSTM [92] and CNN [99] are trained with gradient descent-based backpropagation and use the sigmoid and tanh activation functions.
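For reference, the four activation functions named above can be written in a few lines of Python. This is a plain scalar sketch; the `center` and `spread` parameters of the Gaussian function are naming assumptions for the center and width of a radial basis unit.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def gaussian(x, center=0.0, spread=1.0):
    """Gaussian (radial basis) activation centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * spread ** 2))


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):+.3f}  "
          f"relu={relu(x):.1f}  gaussian={gaussian(x):.3f}")
```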

3) HYPERPARAMETERS ADJUSTMENT
The selection of appropriate hyperparameters has a substantial impact on wind power generation forecasting. A wrong combination of hyperparameters leads to inaccurate wind forecasts. The trial and error approach is inefficient and time-consuming. Thus, optimization is employed to find the best values of the hyperparameters. Normally, the forecasting error functions are used as the objective functions in optimization. Numerous optimization algorithms have been employed to find the hyperparameters of AI models in wind power forecasting. Wu et al. [131] used multi-objective grey wolf optimization to update the weight and threshold of the ELM model. Li et al. [132] introduced the improved dragonfly algorithm to find the weights, position vector and step vector of the SVM method. Lin et al. [103] used GA to improve the wind power forecasting accuracy by optimizing the momentum and learning rate of DBN. Sun et al. [133] proposed BSA to obtain the optimal values of the input weights and hidden thresholds of Regularized ELM. Li and Jin [134] employed multi-objective PSO to determine the optimal parameters of the least squares support vector machine. Salcedo-Sanz et al. [135] utilized the CRO algorithm to obtain the best values of the hidden neurons of ELM. Zameer et al. [116] designed a GA-optimized RBFNN for wind power forecasting with appropriate values of the center vector and spread of the Gaussian function.
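The use of a forecasting error as the objective function of an optimizer can be sketched with a minimal particle swarm optimization (PSO) loop. The sketch makes several assumptions: the objective is a made-up quadratic surface standing in for a validation error, the two hyperparameters (learning rate and momentum) and their bounds are illustrative, and the inertia and acceleration coefficients are common textbook values rather than settings from any cited study.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at (0.1, 0.9)."""
    learning_rate, momentum = params
    return (learning_rate - 0.1) ** 2 + (momentum - 0.9) ** 2


best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(0.001, 1.0), (0.0, 0.99)])
```

In a real study the toy function would be replaced by a routine that trains the AI model with the candidate hyperparameters and returns its validation RMSE or MAPE, which is what makes each objective evaluation expensive.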

D. VALIDATION AND VERIFICATION

1) COMPUTATIONAL COST
The computational cost of wind energy forecasting is often regarded as the duration required for training and testing the AI model. It depends on the volume of data, the data acquisition rate, the complexity of the AI model, the training algorithms, the functions of the AI model and the computing power of the host. Computational cost is a crucial criterion for selecting a prediction model and judging the suitability of AI models for real-time use, especially in short-term wind power forecasting. Among the numerous AI models mentioned in the literature, the ELM model has a lower computational cost in wind power forecasting than the FFBPNN and RBFNN approaches [136]. In [120], the authors performed short-term wind power forecasting based on a statistical model with 72-h simulation periods. The proposed model achieved a computational cost of approximately 60-70 min, which could be acceptable for real-time execution. In [127], the authors completed the training and testing of an MLP network in 30 min using a two-month dataset with a 10 min sampling rate. It was also reported that training is faster when a separate neural network is used for each turbine rather than for the whole wind farm, owing to the reduced size and complexity of the network. In [128], the authors predicted wind energy with a deep learning algorithm using high-frequency SCADA data at a computational cost of 0.77 minutes.
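Since computational cost is reported here as elapsed training and testing time, a small wall-clock timing helper is often enough to measure it. The sketch below assumes a hypothetical stand-in for the training step; in practice the timed function would be the model's own training or prediction routine.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fit_persistence_baseline(series):
    """Stand-in 'training' step: mean absolute one-step change of the series."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


# Placeholder series (illustrative values only).
series = [float(i % 17) for i in range(10000)]
model, train_seconds = timed(fit_persistence_baseline, series)
print(f"training took {train_seconds:.4f} s")
```

Reported times are only comparable across studies when the hardware, dataset size and sampling rate are stated alongside them.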

2) FORECASTING EVALUATION INDICATORS
The testing and validation phase of AI-driven hybrid models for wind power forecasting is based on standard performance evaluation metrics. Different studies have reported different forecasting evaluation indicators to assess the performance of AI-based hybrid techniques. The most frequently used statistical error terms include MAE, MAPE, MSE, RMSE and R^2. The mathematical expressions of these evaluation criteria are shown in the following equations, where N_sample is the number of data points, and P_measured and P_predicted denote the measured value and predicted value, respectively.
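The equations themselves do not appear in this text. For reference, the widely used textbook definitions of these indicators are given below; the sample index i and the mean measured value \bar{P}_{measured} are notational assumptions, and individual studies may normalize differently (for example, MAPE by rated capacity rather than by the measured value).

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{N_{sample}} \sum_{i=1}^{N_{sample}}
                 \left| P_{measured,i} - P_{predicted,i} \right| \\
\mathrm{MAPE} &= \frac{100\%}{N_{sample}} \sum_{i=1}^{N_{sample}}
                 \left| \frac{P_{measured,i} - P_{predicted,i}}{P_{measured,i}} \right| \\
\mathrm{MSE}  &= \frac{1}{N_{sample}} \sum_{i=1}^{N_{sample}}
                 \left( P_{measured,i} - P_{predicted,i} \right)^{2} \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
R^{2}         &= 1 - \frac{\sum_{i=1}^{N_{sample}}
                 \left( P_{measured,i} - P_{predicted,i} \right)^{2}}
                 {\sum_{i=1}^{N_{sample}}
                 \left( P_{measured,i} - \bar{P}_{measured} \right)^{2}}
\end{align}
```

Lower MAE, MAPE, MSE and RMSE indicate better forecasts, whereas R^2 closer to one indicates a better fit. MAPE is undefined when a measured value is zero, which can occur in wind power data during calm periods.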

V. ISSUES AND CHALLENGES
Although hybrid AI algorithms have provided important contributions toward accurate wind power prediction, they have some shortcomings including algorithm framework, complexity and integration issues.

A. WIND DATA DIVERSITY
A key challenge in applying AI models to wind power prediction is the variation of wind data and its collection from different locations. Variation in wind data due to storms, climate change and seasons causes inconsistency in electricity generation, which in turn affects power system operations.

D. AI OPTIMIZATION INTEGRATION ISSUES
Although the integration of optimization into AI approaches has demonstrated significant contributions in achieving precise, robust and productive wind power forecasts, it has a few drawbacks concerning longer training duration and computational complexity. For example, GA is computationally time-consuming and costly [137]. FOA can be trapped in a local minimum at the later stage of the search and has lower accuracy [138]. BSA needs a huge quantity of memory space for collecting various state principles [139], [140]. PSO can be trapped in, and converge prematurely to, a local minimum [56], [57]. Moreover, the combination of AI and optimization may deliver unacceptable results if the dimension, search capacity and convergence settings are not assigned appropriately. Thus, the choice of an appropriate optimization algorithm for the AI approach is a key issue to be explored.

E. AI HYBRIDIZATION ISSUES
The hybridization of AI algorithms has demonstrated superior performance in wind power prediction over a single AI algorithm. Generally, a hybrid is designed by integrating an AI method with other methods and strategies. For example, Yang and Chen [60] developed an AI-based hybrid method using ELM, SAE and EMD. In [61], the AI hybridization was formed using PCA, KMPE and ELM. Sewdien et al. [28] designed a hybrid AI model with RF and DBN. Jaseena and Kovoor [95] utilized EWT and Bi-LSTM to build a hybrid AI framework. Kartite and Cherkaoui [38] introduced a hybrid AI model using VMD and CNN. Generally, AI exhibits a complex computation process that needs high computing power. The hybridization of AI with another algorithm leads to more complex configurations and increases the computational burden. Hence, a trade-off should be maintained between computational complexity and prediction accuracy.

VI. CONCLUSION AND FUTURE PROSPECTS
Wind energy has received significant attention for generating electricity in either standalone or grid-connected mode. However, due to its intermittency and stochastic nature, the assessment of wind energy potential, as well as wind power forecasting, becomes challenging. Hence, an advanced and efficient approach is necessary to achieve accurate wind power forecasting results. This review showcases the application of AI-based hybrid approaches in wind power forecasting, highlighting various techniques, implementation factors, issues and limitations. As a first contribution, this review explores the recent progress of AI-based hybrid approaches for wind power forecasting, highlighting their mathematical expressions, model developments, benefits and drawbacks. Also, a classification of AI-based hybrid approaches is provided, and a comparative analysis is performed based on time resolution, parameters used, accuracy and research limitations. Although several notable AI-based hybrid wind power forecasting methods are reported in this review, not all hybrid approaches perform well across every prediction horizon. Some are excellent in the short term, while others are effective in either medium-term or long-term prediction. ELM with CRO and HS, Jaya algorithm with SVM, LSSVM with BSA, SVM with ICS, DBN with RF, GPR with SC, and EEMD with IMD combinations achieve satisfactory outcomes in short-term wind power prediction. In contrast, the T2FNN model is effective in medium-term forecasting, while ELM with Grey model, LSTM with RNN, and ANFIS with GA, PSO and DE illustrate better accuracy in long-term prediction. As a second contribution, the study highlights the various implementation factors in the development of hybrid AI approaches. Various influential factors are discussed concerning data preprocessing methods, sampling, downscaling, feature selection, algorithm functions, hyperparameters adjustment, computational cost and performance indices.
As a third contribution, the existing challenges and issues are explored such as wind data diversity, algorithm structure, implementation, hyperparameter tuning, optimization integration issues and AI hybridization issues. As a fourth contribution, the review provides some effective proposals and future opportunities for the development of an efficient AI-based hybrid approach for wind power prediction which are presented below.
• The AI-based hybrid methods require a large pool of datasets as well as high-performance computing resources to predict wind power generation. Moreover, an effective data pre-processing strategy and error post-processing approach are needed to obtain high-quality wind data, which creates a huge computational burden. Hence, further studies are required on eliminating noise from the raw data and developing a hybrid model with a lower computational burden. An effective balance should be struck between extensive computation and accuracy.
• AI-based hybrid approaches show promising performance in forecasting wind power at onshore locations. However, in recent times, wind farm installations have been shifting from onshore locations to offshore ones due to the availability of high wind velocity. Offshore sites have different topography and weather patterns. Thus, further exploration is required to validate hybrid wind power forecasting methods with offshore meteorological data.
• The integration of optimization algorithms in AI has delivered substantial contributions in wind power forecasts. Nonetheless, the appropriate combination of optimizations and AI is a laborious task and could lead to inefficiency and higher computational cost. Therefore, further examination is required to search for more advanced combinations.
• The accuracy of wind power forecasting depends on various features, and it is challenging to select the ideal combination of features. Thus, further attention is required to select the appropriate features for the development of hybrid approaches toward accurate wind power forecasting.
• Most authors assessed the performance of their hybrid methods against recently developed models using various performance metrics. However, no reliable benchmarking approach for wind power prediction has yet been established. Therefore, further investigation is required to develop appropriate standards and evaluation systems.
• The limitations of hybrid AI approaches are the long duration of training operation and calculation complexity. The training and validation operation become complex with the introduction of various features, data pre-processing, optimization and hyperparameters. Hence, further studies are required to improve the training execution.
• The accuracy and robustness of wind power forecasting using the AI-based hybrid approach in real-time can be enhanced by online wind measurement data through cloud computing platforms. Cloud computing technology helps in improving accuracy through the usage of storage, servers, databases, networking, and software. The wind speed, wind direction, temperature, atmospheric humidity and atmospheric pressure are constantly transferred to the main server. Subsequently, AI methods can be trained in real-time and accordingly can deliver better prediction results.
The outcomes of this research toward the enhancement of AI-based wind power forecasting are:
• The information, analysis, critical discussion, issues and challenges would serve as a useful forum and guide for engineers, industries, decision-makers to encourage investments and carry out further research in wind energy.
• The information provided may help researchers to select the appropriate hybrid approaches that will improve the wind power forecast toward reducing carbon emissions and achieving the global decarbonization target by 2050.
• The suggestions offered would be significant in achieving accurate wind power forecasting using efficient AI-driven hybrid approaches that can provide a pathway toward the sustainable development goals (SDGs), specifically SDG7, by 2030.