Wind Speed Ensemble Forecasting Based on Deep Learning Using Adaptive Dynamic Optimization Algorithm

The development and deployment of an effective wind speed forecasting technology can improve the safety and stability of power systems with significant wind penetration. Due to the wind's unpredictable and unstable qualities, accurate forecasting of wind speed and power is extremely challenging. Several algorithms have been proposed for this purpose to improve forecasting reliability. The Long Short-Term Memory (LSTM) network is a common method for making predictions based on time series data. This paper proposes a machine learning algorithm, called the Adaptive Dynamic Particle Swarm Algorithm (AD-PSO) combined with the Guided Whale Optimization Algorithm (Guided WOA), for wind speed ensemble forecasting. The AD-PSO-Guided WOA algorithm selects the optimal hyperparameter values of the LSTM deep learning model for wind speed forecasting. In the experiments, a wind power forecasting dataset is employed to predict hourly power generation up to forty-eight hours ahead at seven wind farms. This case study is taken from the Kaggle Global Energy Forecasting Competition 2012 in wind forecasting. The results demonstrate that the AD-PSO-Guided WOA algorithm provides high accuracy and outperforms several comparative optimization and deep learning algorithms. Statistical analysis with different tests, including Wilcoxon's rank-sum and one-way analysis of variance (ANOVA), confirms the accuracy of the presented algorithm.


I. INTRODUCTION
Due to the intermittence and unpredictability of wind power, the increasing penetration of wind power into power grids might significantly impact the safe functioning of power systems and power quality, because the amount of wind energy generated is proportional to the wind speed. As a result, the development and deployment of an effective wind speed forecasting technology can improve the safety and stability of power systems with significant wind penetration. Wind energy is one of the essential low-carbon energy technologies. It can deliver a long-term energy supply and serves as a core component for micro-grids as part of intelligent grid architecture [1]. However, wind power generation is stochastic and intermittent, posing several hurdles to its widespread adoption. With the aid of wind speed and power generation projections, it is possible to reduce energy balancing and make power generation scheduling and dispatch choices. In addition, forecasts can reduce costs by minimizing the demand for wind curtailments and, as a result, enhancing income in power market operations. Due to the wind's unpredictable and unstable qualities, however, accurately forecasting wind speed and power is extremely difficult. A wind power forecast predicts the projected output of one or more wind turbines, often known as a wind farm. When one talks about production, it usually refers to the amount of power that a wind farm can generate (with units of kW or MW, depending on the nominal capacity of the wind farm). By combining power production throughout each period, forecasts may also be stated in energy [2]. (The associate editor coordinating the review of this manuscript and approving it for publication was Dipankar Deb. This work is licensed under a Creative Commons Attribution 4.0 License; for more information, see https://creativecommons.org/licenses/by/4.0/.)
The primary purpose of forecasting wind speed and power is to offer essential information about the projected wind speed and power over the next several minutes, hours, or days. Based on power system operation requirements, the prediction can be separated into four distinct time frames: long-term (from one day to seven days), medium-term (from six hours to twenty-four hours), short-term (from thirty minutes to six hours), and extremely short-term (from a few seconds to thirty minutes). Turbine control and load tracking are based on very short-term estimates. Preload sharing is based on the short-term forecast. The medium-term projections are used for power system management and energy trading. Maintenance schedules for wind turbines are based on long-term forecasts [3]. Wind speed is considered a non-linear and time-relevant forecasting problem. This encourages researchers to make use of the knowledge included in the wind's historical data. Based on time-series data, one of the common methods for making predictions is the long short-term memory (LSTM) network [4]. Marcos et al. in [5] addressed the problem of wind power forecasting based on statistical and numerical weather prediction models. Two different areas in Brazil were studied; the Brazilian developments on the Regional Atmospheric Modeling System were employed to simulate forecasts of the wind speed seventy-two hours ahead, at every ten minutes.
Liu et al. in [6] employed backpropagation neural network (BPNN), least squares support vector machine (LSSVM), and radial basis function NN (RBFNN) methods to forecast a sixteen MW wind farm located in Sichuan, China, based on two months of data at a fifteen-minute sampling rate. Recently, Lin et al. in [3] applied Isolation Forest (IF) and a deep learning NN to SCADA data of a wind turbine in Scotland to address the problem of wind power forecasting, based on twelve months of data at a one-second sampling rate. Another method based on IF and a feed-forward NN was applied to a seven MW wind turbine in Scotland (ORE Catapult) using twelve months of data and a one-second sampling rate.
To capture the wind speed data's unsupervised temporal features, an interval probability distribution learning (IPDL) model based on rough set theory and restricted Boltzmann machines was proposed in [7]. To capture the probability distribution of the wind speed time series data, the IPDL model had a set of tunable interval latent variables. For supervised regression of future wind speed values, a real-valued interval deep belief network (IDBN) was also designed based on a fuzzy type II inference system and the IPDL model. Khodayar et al. [8] proposed a deep neural network (DNN) architecture based on a stacked auto-encoder (SAE) and a stacked denoising auto-encoder (SDAE) for short-term and ultra-short-term wind speed forecasting. The auto-encoders (AEs) are used by the authors in [8] for unsupervised feature learning from the unlabeled wind data. In addition, a supervised regression layer was employed on top of the AEs for wind speed forecasting.
Authors in [9] proposed a scalable graph convolutional deep learning (GCDL) architecture to learn powerful spatio-temporal features in neighboring wind farms from wind direction and speed data. The GCDL architecture leveraged the extracted temporal features to forecast the wind-speed time series of all graph nodes. The rough set theory was incorporated with the GCDL architecture in their model. Authors in [10] proposed a framework based on an enhanced grasshopper optimization algorithm for optimizing the hyperparameters and architecture of the LSTM deep learning model for wind speed forecasting. Table 1 shows the recent wind power prediction methods.
Hybrid machine intelligence techniques were proposed recently in the literature for wind forecasting based on different models. Authors in [11] utilized various variants of Support Vector Regression (SVR) and the wavelet transform to forecast short-term wind speed. They evaluated their proposed techniques using various performance indices to find the best regressor for wind forecasting applications. A hybrid technique was presented in [12] using learning algorithms such as Twin SVR (TSVR), convolutional neural networks (CNN), and random forest, in addition to the discrete wavelet transform (DWT), for wind forecasting. The features extracted from wind speed in their work were enhanced based on the wavelet transform. Another hybrid technique, using adaptive threshold and twin SVM (TWSVM) methods, was proposed in [13] for the anomaly detection problem in wind turbine gearboxes.
In this work, a wind power forecasting dataset is tested as a case study from the Kaggle Global Energy Forecasting Competition 2012 - Wind Forecasting for predicting hourly power generation up to forty-eight hours ahead at seven different wind farms. A proposed adaptive dynamic particle swarm algorithm (AD-PSO) with a guided whale optimization algorithm (Guided WOA) improves the forecasting performance by enhancing the parameters of the LSTM classification method. The proposed AD-PSO-Guided WOA algorithm selects the optimal hyperparameter values of the LSTM deep learning model for wind speed forecasting. A binary-based AD-PSO-Guided WOA algorithm is used for the feature selection problem on the wind power forecasting dataset. The evaluation of the binary AD-PSO-Guided WOA algorithm is presented in comparison with the Grey Wolf Optimizer (GWO) [18], Particle Swarm Optimization (PSO) [19], Stochastic Fractal Search (SFS) [20], WOA [21], [22], Genetic Algorithm (GA) [23], and Firefly Algorithm (FA) [24]. The optimized ensemble method based on the proposed algorithm is tested on the dataset, and the results of this scenario are compared with Neural Networks (NN), Random Forest (RF), LSTM, Average ensemble, and k-Nearest Neighbors (k-NN) ensemble-based methods. The main contributions of this work can be summarized as follows:
• An adaptive dynamic PSO with guided WOA algorithm (AD-PSO-Guided WOA) is suggested.
• A binary AD-PSO-Guided WOA algorithm is tested for the feature selection problem using the wind power forecasting dataset.
• Tests of one-sample t-test and ANOVA are used to evaluate the binary AD-PSO-Guided WOA algorithm's statistical difference.
• To improve the wind power forecasting accuracy, an optimized ensemble method using the AD-PSO-Guided WOA algorithm is proposed.
• Wilcoxon's rank-sum and ANOVA tests are used for evaluating the proposed optimizing ensemble method's statistical difference.
• The current work's importance is applying a new optimization algorithm to enhance LSTM classifier parameters.
• The proposed algorithms can be generalized and tested for other datasets.

II. PRELIMINARIES
A. MACHINE LEARNING
1) NEURAL NETWORKS (NNs)
Artificial neural networks (ANNs) are a type of prediction model and classification approach. ANNs are used to simulate complicated relationships when finding data patterns or cause-and-effect variable sets. Transient detection, approximation, time-series prediction, and pattern recognition are just a few of the disciplines in which they may be used. An ANN is an information processing paradigm that functions similarly to the human brain. This information processing system comprises highly linked processing elements, called neurons, that work together in tandem to solve problems. A neural network comes in handy where formulating an algorithmic solution is difficult and it is necessary to extract the structure from existing data [34]. A multilayer perceptron (MLP) has an input layer, an output layer, and one hidden layer. The weighted sum for the output of node j is computed as follows [35].
S_j = Σ_i w_ij I_i + β_j (1)

where I_i represents an input variable i and w_ij is the weight of the connection between neuron j and input I_i. The β_j parameter is a bias value. Using the sigmoid activation function, the output of node j is calculated as

f_j(S_j) = 1 / (1 + e^(−S_j)) (2)

where the value of f_j(S_j) is then used to get the network output as follows.

y_k = Σ_j w_jk f_j(S_j) + β_k (3)

where w_jk denotes the weight between output node k and neuron j in the hidden layer, and β_k indicates the output layer bias value.
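As a concrete illustration of equations 1-3, the following minimal Python sketch (the function and parameter names are ours, not from the paper) computes the weighted sums, sigmoid hidden activations, and network outputs of a one-hidden-layer MLP:

```python
import math

def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP with sigmoid hidden units.

    w_hidden[j][i] is the weight between input I_i and hidden neuron j,
    b_hidden[j] is the bias beta_j; w_out[k][j] and b_out[k] play the
    same roles for the output layer.
    """
    sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
    # S_j = sum_i w_ij * I_i + beta_j, then f_j(S_j) = sigmoid(S_j)
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # y_k = sum_j w_jk * f_j(S_j) + beta_k
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]
```

With zero weights and biases, each hidden unit outputs sigmoid(0) = 0.5, which makes the arithmetic easy to check by hand.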

2) RANDOM FOREST (RF)
As a method based on statistical learning theory, random forests provide several advantages, including fewer configurable parameters, higher prediction precision, and improved generalization ability. The method extracts numerous samples from the original sample using the bootstrap sampling approach, builds a decision tree model on each bootstrap sample, combines the predictions of the multiple decision trees, and uses a voting mechanism to determine the outcome. For the RF training algorithm, a regression/classification tree f_b is trained on the bootstrap training examples X_b and Y_b drawn from X = x_1, . . . , x_n and Y = y_1, . . . , y_n, for b = 1, . . . , B. After the training process, the prediction for an unseen sample x is calculated by averaging the predictions of all the individual regression trees on x, as in equation 4.

f̂(x) = (1/B) Σ_{b=1}^{B} f_b(x) (4)
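The bootstrap-and-average procedure of equation 4 can be sketched as follows; the helper names are illustrative, and any callable standing in for a trained tree f_b will do:

```python
import random

def bootstrap_sample(X, Y, rng):
    """Draw one bootstrap sample (X_b, Y_b) with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [Y[i] for i in idx]

def rf_predict(trees, x):
    """Random-forest regression prediction: average the B individual
    tree predictions on x, as in equation 4."""
    return sum(tree(x) for tree in trees) / len(trees)
```

For example, two "trees" that always predict 1.0 and 3.0 yield an ensemble prediction of 2.0.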
3) K-NEAREST NEIGHBORS (k-NN)
One advantage of the k-nearest neighbors technique is the model's interpretability. The findings of the prediction algorithm are based on the previous events that are most similar to the current state under a given distance metric. A simple average of the output values of the k nearest neighbors, or some weighted averaging, is used to make predictions, so experts can analyze the findings of the method. In k-NN numerical prediction, the object's predicted variable is the average of its k closest neighbors' values. The k-NN method is one of the most basic and yet most powerful machine learning algorithms.
The k-NN model employs a similarity measure, the Euclidean distance, to compare the data. Between training data x_train and testing data x_test, the Euclidean distance is calculated based on the following equation.

d(x_train, x_test) = sqrt( Σ_i (x_train,i − x_test,i)² ) (5)

To predict the output variables, k-NN determines the k training data closest to the testing data. For unknown testing data to be predicted, the output values of the k nearest neighbors among the training data are determined, and the following formula is applied for predicting the testing data.

ŷ = Σ_{j=1}^{k} w_j y_j (6)

where the jth neighbor's weight is indicated as w_j and is adjusted according to the observed data as w_j = j/n, where n indicates the number of training data. This model can be used as a k-NN time series model.
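A minimal sketch of the k-NN prediction described above (function names are ours; normalizing the w_j = j/n weights is our assumption, since unnormalized weights would not yield a weighted average):

```python
import math

def knn_predict(train_x, train_y, x_test, k):
    """k-NN regression: find the k training points nearest to x_test
    under Euclidean distance and return a weighted average of their
    outputs, using the j/n weighting described in the text."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # Sort training points by distance to the test point.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x_test))
    neighbors = order[:k]
    # Weight w_j = j / n as in the text, then normalize (our assumption).
    weights = [(j + 1) / len(train_x) for j in range(k)]
    total = sum(weights)
    return sum(w * train_y[i] for w, i in zip(weights, neighbors)) / total
```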

B. DEEP LEARNING
1) LONG SHORT-TERM MEMORY (LSTM)
The LSTM model is an improved ANN model, and it can be applied to different kinds of problems, as discussed in [36]. The LSTM's main advantage is that it can remember information for a long time. The LSTM architecture is shown in Figure 1. The first step of the LSTM model is to decide what kind of data from the cell state should be discarded. A forget gate layer, or sigmoid layer, is used for that, as presented in equation 7.

f_t = σ(W_f · [h_{t−1}, x_t] + b_f) (7)

The following step is to decide which new data should be stored in the cell state. The sigmoid layer decides which values need an update, and the new candidate is added to the generated state by the tanh layer, as presented in equations 8 and 9.

i_t = σ(W_i · [h_{t−1}, x_t] + b_i) (8)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) (9)

The old cell state C_{t−1} is then updated into the new cell state C_t by equation 10, using equations 7, 8, and 9.

C_t = f_t × C_{t−1} + i_t × C̃_t (10)

The final step is the output decision. The sigmoid layer helps in deciding which parts of the cell state should be moved to the output. The cell state then passes through tanh, forcing values between [−1, 1], and is multiplied by the sigmoid gate output, as presented in equation 11.

o_t = σ(W_o · [h_{t−1}, x_t] + b_o), h_t = o_t × tanh(C_t) (11)
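The gate computations above can be traced in a minimal scalar LSTM step; the dictionary-based parameterization below is our simplification of the standard weight matrices W_f, W_i, W_C, W_o:

```python
import math

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following the standard gate equations:
    forget gate f_t, input gate i_t, candidate state, cell-state
    update, and output gate o_t. W and b hold one (weight-row, bias)
    pair per gate for a scalar input/state; a real layer uses matrices."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    concat = [h_prev, x_t]
    act = lambda name, f: f(sum(w * v for w, v in zip(W[name], concat)) + b[name])
    f_t = act("f", sigmoid)                 # forget gate
    i_t = act("i", sigmoid)                 # input gate
    c_tilde = act("c", math.tanh)           # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde      # cell-state update
    o_t = act("o", sigmoid)                 # output gate
    h_t = o_t * math.tanh(c_t)              # hidden state, in [-1, 1]
    return h_t, c_t
```

With all-zero weights and biases, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved each step, which makes the update easy to verify by hand.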
2) ENSEMBLE LEARNING
The goal of ensemble approaches is to combine the capabilities of a variety of single base models to create a stronger predictive model. This concept can be implemented in a variety of ways. For instance, key strategies rely on resampling the training set, while others rely on alternative prediction methods or on modifying some parameters of a predictive technique. Finally, the results of the individual predictions are combined by the ensemble.

III. PROPOSED ADAPTIVE DYNAMIC PSO-GUIDED WOA ALGORITHM
This section discusses the presented AD-PSO-Guided WOA algorithm, which combines an adaptive dynamic technique, the particle swarm algorithm, and a modified whale optimization algorithm. Algorithm (1) shows the AD-PSO-Guided WOA algorithm.

A. ADAPTIVE DYNAMIC TECHNIQUE
After the initialization of the optimization algorithm, a fitness value is evaluated for each solution in the population. The optimization algorithm then identifies the best agent (solution) corresponding to the best fitness value. To start the adaptive dynamic process, the optimization algorithm splits the agents of the population into two groups, as in Fig. 2: an exploitation group and an exploration group. The main target of the individuals in the exploitation group is to move toward the optimal or best solution, while the main target of the individuals in the exploration group is to search the area around the leaders. The exchange (update) of agents between the two population groups works in a dynamic manner. To achieve a balance between the exploitation group and the exploration group, the optimization algorithm is initiated with a (50/50) population. Figure 3 illustrates the balancing and the dynamic change in the number of agents (individuals) in the two groups over the iterations until the best or optimal solution is obtained.
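The group split at the heart of the adaptive dynamic technique can be sketched as follows; the function name and the static fraction are our illustrative assumptions, since the actual re-balancing rule updates the fraction each iteration:

```python
def split_population(agents, exploit_fraction=0.5):
    """Split the population into an exploitation group (moves toward
    the best solution) and an exploration group (searches around the
    leaders). The 50/50 default mirrors the initial balance described
    in the text; the adaptive dynamic rule would adjust the fraction
    at every iteration."""
    cut = int(len(agents) * exploit_fraction)
    return agents[:cut], agents[cut:]
```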

B. GUIDED WOA ALGORITHM
The WOA algorithm has shown its advantages for different problems in the area of optimization and is considered in the literature as one of the most effective optimization algorithms [20], [37]. However, it might suffer from a low exploration capability [38]. For the mathematical formulation, let n be the dimension, or number of variables, of the search space in which the whales swim. If the positions of the agents (solutions) in the search space are updated over time, the best solution (food) will be found.
The following equation can be used in the WOA algorithm to update the agents' positions during the global search.

X(t+1) = X_rand − A · |C · X_rand − X(t)| (12)

where X_rand is a randomly selected agent from the population and A and C are coefficient vectors.

The term Guided WOA, in this work, indicates a modified version of the original WOA algorithm [37]. In Guided WOA, the drawback of the original WOA, which updates the search strategy through one agent, is alleviated: the modified algorithm moves the agents toward the prey, or best solution, based on more than one agent. Equation 12 in the original WOA algorithm forces agents to move randomly around each other to achieve the global search. In the Guided WOA algorithm, however, the exploration process is enhanced by forcing agents to follow three random agents instead of one. To prevent agents from being driven by a single leader's position, and thereby obtain more exploration, equation 12 is replaced by the following one.
X(t+1) = w_1 · X_rand1 + z · w_2 · (X_rand2 − X_rand3) + (1 − z) · w_3 · (X − X_rand1) (13)

where the three random solutions are represented in this equation by X_rand1, X_rand2, and X_rand3. The value of the w_1 term is updated in [0, 0.5], and the terms w_2 and w_3 change in [0, 1]. Finally, to smooth the transition between exploration and exploitation, the term z decreases exponentially, instead of linearly, and is calculated as follows.
where the iteration number is represented as t and Max_iter represents the maximum number of iterations.
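A hedged sketch of the Guided WOA exploration move: the agent follows three random agents, with w_1 drawn in [0, 0.5] and w_2, w_3 in [0, 1]. The combination below and the exponential decay used for z are illustrative assumptions consistent with the text, not a verbatim transcription of the paper's equations:

```python
import math
import random

def guided_woa_update(x, x_r1, x_r2, x_r3, t, max_iter, rng=random):
    """Guided WOA exploration step (sketch): combine three random
    agents rather than following a single leader. The decay of z from
    exploration toward exploitation is assumed exponential here."""
    w1 = rng.uniform(0.0, 0.5)
    w2 = rng.uniform(0.0, 1.0)
    w3 = rng.uniform(0.0, 1.0)
    z = math.exp(-t / max_iter)  # assumed exponential decrease of z
    return [w1 * a + z * w2 * (b - c) + (1 - z) * w3 * (xi - a)
            for xi, a, b, c in zip(x, x_r1, x_r2, x_r3)]
```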

C. PARTICLE SWARM OPTIMIZATION
Unlike the WOA algorithm, the PSO algorithm simulates the social behaviour of swarming patterns found in nature, such as flocks of birds [18]. The agents in the PSO algorithm search for the best solution, or food, by changing their positions according to an updated velocity. The algorithm uses particles (agents), and each agent updates its position as follows:

x_i(t+1) = x_i(t) + v_i(t+1) (15)

where the new agent position is indicated as x_i(t+1). The updated velocity v_i(t+1) of each agent is evaluated in the following form.

v_i(t+1) = ω v_i(t) + C_1 r_1 (P_i − x_i(t)) + C_2 r_2 (G − x_i(t)) (16)

where the term ω represents the inertia weight, the terms C_1 and C_2 indicate the cognition and social learning factors, P_i is the personal best position of agent i, the G parameter represents the global best position, and the values of r_1 and r_2 are within [0, 1].
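The velocity and position updates translate directly into a short routine (function and parameter names are ours):

```python
import random

def pso_update(x, v, p_best, g_best, omega=0.7, c1=2.0, c2=2.0, rng=random):
    """PSO update: v <- omega*v + C1*r1*(P - x) + C2*r2*(G - x),
    then x <- x + v, with r1 and r2 drawn uniformly in [0, 1]."""
    r1, r2 = rng.random(), rng.random()
    v_new = [omega * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
             for vi, xi, pi, gi in zip(v, x, p_best, g_best)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new
```

A particle that already sits at both its personal and the global best, with zero velocity, stays put, which is a quick sanity check of the formula.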

D. PROPOSED ALGORITHM COMPLEXITY ANALYSIS
The complexity analysis of the AD-PSO-Guided WOA algorithm is presented in this section based on Algorithm (1). With the population size denoted as n and the number of iterations as M_t, the complexity can be defined for each part of the algorithm as follows: • Initialization of the population: O(1).

E. BINARY OPTIMIZER
For the feature selection problem, the output solution should be converted to a binary solution using 0 or 1. The sigmoid function is usually employed to change the continuous solution of the optimizer into a binary solution, as in equation 17.

X_binary = 1 if Sigmoid(X_Best) ≥ 0.5, and 0 otherwise (17)

where X_Best indicates the best position at iteration t. The sigmoid function helps change the continuous values to 0 or 1: for Sigmoid(X_Best) ≥ 0.5 the value is changed to 1; otherwise, the value is changed to 0. Algorithm (2) shows the step-by-step explanation of the binary AD-PSO-Guided WOA algorithm: while the stopping criterion is not met, the AD-PSO-Guided WOA algorithm is applied, each updated solution is changed to a binary solution (0 or 1) based on equation 17, the fitness function is evaluated for each agent, the parameters and the best solution are updated, and finally the optimal solution is returned.
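The sigmoid-based binarization of equation 17 reduces to a one-liner per dimension (sketch, our naming):

```python
import math

def to_binary(solution):
    """Map a continuous solution to a 0/1 feature mask via the sigmoid:
    a position becomes 1 when Sigmoid(x) >= 0.5, i.e. when x >= 0."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    return [1 if sigmoid(x) >= 0.5 else 0 for x in solution]
```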

F. FITNESS FUNCTION
The quality of an optimizer's solutions is measured by the assigned fitness function. The function mainly depends on the classification/regression error rate and on the features selected from the input dataset. The best solution corresponds to the set of features that gives a minimum number of features with a minimum classification error rate. The following equation is applied in this work for the evaluation of solution quality.

Fitness = α · Err(O) + β · |s| / |f| (18)

where the optimizer error rate is indicated as Err(O), the selected set of features is denoted as s, and f represents the total set of existing features. The values α ∈ [0, 1] and β = 1 − α control the trade-off between the classification error rate and the number of selected features.
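The fitness trade-off can be written directly as a small helper (our naming; α = 0.99 matches the experimental setting reported in Table 3):

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Feature-selection fitness: alpha weights the classification
    error Err(O), beta = 1 - alpha weights the fraction of selected
    features |s| / |f|; smaller is better."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

For example, an error rate of 0.1 with 5 of 10 features selected scores 0.99 * 0.1 + 0.01 * 0.5 = 0.104.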

IV. EXPERIMENTAL RESULTS
The experimental settings and results for wind power forecasting problems using the presented AD-PSO-Guided WOA algorithm are presented in this section. The dataset is first discussed, and then the experiments are divided into feature selection, ensemble, and comparison scenarios.

A. DATASET DESCRIPTION
A wind power forecasting dataset for predicting hourly power generation up to forty-eight hours ahead at seven wind farms is tested in the experiments as a case study. The dataset is published on Kaggle as Global Energy Forecasting Competition 2012 - Wind Forecasting [39]. The presented AD-PSO-Guided WOA algorithm is applied in different scenarios to assess the best achievable accuracy compared to algorithms from the literature. A statistical analysis based on different tests is also applied to the tested dataset to show the algorithm's accuracy. The regression prediction is shown in Fig. 4. The figure shows the actual values from the dataset and the values predicted by the proposed AD-PSO-Guided WOA algorithm.

B. FEATURE SELECTION SCENARIO
The experiment in this scenario is designed to show the feature selection efficiency of the proposed binary AD-PSO-Guided WOA algorithm. Its performance is compared with the binary versions of GWO (bGWO) [18], PSO (bPSO) [19], SFS (bSFS) [20], WOA (bWOA) [21], [22], FA (bFA) [24], and GA (bGA) [23] using the performance metrics shown in Table 2. The variables in Table 2 are indicated as follows. The optimizer's number of runs is denoted M, the best solution at run number j is represented by g*_j, the size of the g*_j vector is denoted size(g*_j), and the number of tested points is N. A classifier's output label for a point i is C_i, the class label for a point i is L_i, the total number of features is D, and the Match function calculates the matching between two inputs. The metrics include the average error and the standard deviation of the fitness.
The configuration settings of the AD-PSO-Guided WOA algorithm in the experiments are shown in Table 3. The algorithm's initial parameters are a population size of 20, a maximum number of iterations of 20, and 20 runs on the dataset. The main parameters for the PSO algorithm are W_max and W_min, whose values are set to 0.9 and 0.6, respectively. In addition, the α parameter is assigned to be (0.99) and β is assigned to be (1 − α). The configuration of the GWO, PSO, SFS, WOA, FA, and GA algorithms is shown in Table 4.
In this scenario, Table 5 shows the results provided by the GWO, PSO, SFS, WOA, FA, and GA algorithms. The AD-PSO-Guided WOA algorithm shows a minimum average error of (0.4790) for feature selection in the presented results. Based on the minimum error of the tested problem, the AD-PSO-Guided WOA algorithm is the best and the SFS algorithm is the worst. In terms of standard deviation, the AD-PSO-Guided WOA algorithm has the lowest value of (0.1635), which indicates the algorithm's stability and robustness.
The convergence curve of the AD-PSO-Guided WOA algorithm compared to the other algorithms is shown in Figure 5. The figure shows the exploitation capability of the algorithm and its ability to avoid local optima that may occur during the optimization process. Figure 6 shows the AD-PSO-Guided WOA average error based on the objective function compared to the different algorithms, along with the minimum, maximum, and average values for the different binary algorithms. The residual values and plots can be useful for identifying datasets that are not suitable candidates for feature selection. In the ideal case, the residual values should be distributed uniformly around the horizontal axis. Considering that the sum and mean of the residuals are equal to zero, the residual value is computed as the difference between the predicted and actual values. The residual plot is shown in Figure 7; from the residual plot patterns, a nonlinear or linear model is decided and the appropriate one is determined. The heteroscedasticity plot is also shown in Figure 7; homoscedasticity describes whether the error term is the same across the values of the independent variables. Figure 7 also shows the quantile-quantile (QQ) plot, probability plot, and heat map. Since the distribution of points in the QQ plot fits well on the predetermined line, the actual and predicted residuals are considered to be linearly related. This confirms the presented AD-PSO-Guided WOA algorithm's performance.

C. ENSEMBLE FORECASTING SCENARIO
This scenario is formulated using the ensemble-based models of the average ensemble, the k-NN ensemble, and the proposed optimizing ensemble model based on the AD-PSO-Guided WOA algorithm. The ensemble models use the training instances of the three base models of NN, RF, and LSTM, which can then be used to forecast the unknown observations and predict wind speed. The hyperparameters fed to the AD-PSO-Guided WOA algorithm to train the LSTM model are the number of epochs T_e, the size of the champion attention weights subset W_a, the encoding length for each attention weight L_e, and the size of the attention weights set N_a.
The evaluation metrics in this scenario include Root Mean Squared Error (RMSE), Relative RMSE (RRMSE), Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and the correlation coefficient (r) [40]. The RMSE is calculated as follows.

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (H_p,i − H_i)² )

where H_p,i represents the predicted value, H_i represents the actual measured value, and n indicates the total number of values. The RRMSE metric normalizes the RMSE by the mean of the measured values and is expressed as a percentage. The MAE calculates the average magnitude of the errors in a set of predictions and is calculated as follows.

MAE = (1/n) Σ_{i=1}^{n} |H_p,i − H_i|

The MAPE is one of the most commonly used metrics for measuring forecast accuracy; it is similar to MAE but normalized by the true observation, and can be calculated as follows.

MAPE = (100/n) Σ_{i=1}^{n} |(H_i − H_p,i) / H_i|

The next metric is the correlation coefficient r, which can be calculated as follows.

r = Σ_i (x_i − x̄)(y_i − ȳ) / sqrt( Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² )

where x_i and y_i represent the values of variables x and y in a sample, and x̄ and ȳ are the means of the x values and y values, respectively. The results of the different ensemble-based and single models are shown in Table 8. The ensemble-based models in this table show more promising results than the single NN, RF, and LSTM models. The proposed optimizing ensemble model, based on the deep LSTM learning model, with RMSE of (0.003728832), MAE of (0.00476), and MAPE of (1.8755), gives noticeable results compared to the k-NN and average ensemble models. Table 9 shows detailed descriptive statistics of the presented optimizing ensemble model versus the other models. Figure 8 shows the Receiver Operating Characteristic (ROC) curves for the proposed optimizing ensemble model and the compared models; these indicate that the proposed ensemble model distinguishes the data with an Area Under the Curve (AUC) value near 1.0. The stability of the proposed optimizing ensemble algorithm versus the compared models is confirmed by the RMSE distribution shown in Figure 9, the histogram of RMSE shown in Figure 10, the histogram of RRMSE shown in Figure 11, and the histogram of MAPE shown in Figure 12.
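The error metrics above can be computed with a few lines of standard-library Python (the function name is ours; RRMSE is omitted here because its exact normalization is not reproduced in this sketch):

```python
import math

def forecast_metrics(actual, predicted):
    """Compute RMSE, MAE, MAPE (%), and the correlation coefficient r
    for paired actual/predicted series, following the formulas above."""
    n = len(actual)
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    mae = sum(abs(p - a) for a, p in zip(actual, predicted)) / n
    mape = 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    ma, mp = sum(actual) / n, sum(predicted) / n
    num = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    den = math.sqrt(sum((a - ma) ** 2 for a in actual)
                    * sum((p - mp) ** 2 for p in predicted))
    return rmse, mae, mape, num / den
```

A perfect forecast gives zero error on all three error metrics and a correlation of exactly 1.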
ANOVA and Wilcoxon's rank-sum tests are applied in this scenario to evaluate the statistical differences between the presented and compared models. The ANOVA output results are shown in Table 10. Wilcoxon's rank-sum statistical analysis, presented in Table 11, determines whether the models' results have a significant difference; a p-value < 0.05 indicates significant superiority. The results show the superiority of the proposed ensemble model based on the AD-PSO-Guided WOA algorithm and the algorithm's statistical significance.
The residual plot in this scenario is shown in Figure 13. The heteroscedasticity plot, QQ plot, and heat map are also shown in Figure 13. Since the distribution of points in the QQ plot fits well on the line, the predicted and actual residuals are considered linearly related, which confirms the performance of the proposed AD-PSO-Guided WOA ensemble-based algorithm for the wind speed forecasting problem.

D. COMPARISONS SCENARIO
The third and last scenario is designed to show the performance of the optimizing ensemble-based AD-PSO-Guided WOA algorithm compared with PSO [19], WOA [22], GA [23], GWO [18], HHO [25], [26], MPA [28], ChOA [29],  and SMA [27]. The AD-PSO-Guided WOA algorithm ensemble model is also compared with four deep learning techniques including TDNN [30], DNN [31], SAE [32], and BRNN [33]. Table 12 shows the comparison results of the wind speed forecasting based on the proposed algorithm compared to other optimization techniques. The results in the table show that the presented optimizing ensemble model, based on the LSTM deep learning model and the AD-PSO-Guided WOA algorithm, gives competitive results with MAPE of (1.8755), MAE of (0.00476), RMSE of (0.003728832), RRMSE of (1.279369489), and r of (0.9998878) compared to other algorithms for the wind speed forecasting tested problem. Table 13 shows the proposed algorithm's descriptive statistics compared to other optimization techniques over 20 runs.
The ANOVA test results for wind speed forecasting based on the proposed algorithm compared to other optimization techniques are shown in Table 14. The Wilcoxon signed-rank test of the wind speed forecasting results based on the proposed algorithm compared to other optimization techniques is also shown in Table 15. The results confirm the superiority of the proposed ensemble model based on AD-PSO-Guided WOA and indicate the statistical significance of the algorithm for the tested wind speed forecasting problem compared to the PSO, WOA, GA, and GWO algorithms. Table 16 shows the comparison results of wind speed forecasting based on the proposed algorithm compared to the deep learning techniques for the tested problem. Table 17 shows the proposed algorithm's descriptive statistics compared to the other deep learning techniques over 20 runs.
The ANOVA test results for wind speed forecasting based on the proposed algorithm compared to other deep learning techniques are shown in Table 18. The Wilcoxon signed-rank test of the wind speed forecasting results based on the proposed algorithm compared to other deep learning techniques is also shown in Table 19. The results confirm the superiority of the proposed ensemble model based on AD-PSO-Guided WOA and indicate the statistical significance of the algorithm for the tested wind speed forecasting problem compared to the TDNN, DNN, SAE, and BRNN techniques.

V. CONCLUSION
This paper uses a wind power forecasting dataset from Kaggle as a case study to predict hourly power generation up to forty-eight hours ahead at seven wind farms. The proposed adaptive dynamic particle swarm algorithm with a guided whale optimization algorithm improves the forecasting performance on the tested dataset by enhancing the parameters of the LSTM classification method. The AD-PSO-Guided WOA algorithm selects the optimal hyperparameter values of the LSTM deep learning model for wind speed forecasting. A binary AD-PSO-Guided WOA algorithm is applied for feature selection and is evaluated in comparison with the GWO, PSO, SFS, WOA, FA, and GA algorithms on the tested dataset. An optimized ensemble method based on the proposed algorithm is tested on the experiments' dataset, and the results of this scenario are compared with NN, RF, LSTM, average ensemble, and k-NN methods. Statistical analysis with different tests, including ANOVA and Wilcoxon's rank-sum tests, is performed to confirm the accuracy of the algorithm. The current work's importance lies in applying a new optimization algorithm to enhance the LSTM classifier parameters. The presented algorithms will be tested on other datasets in future work. The algorithm will also be tested on other binary problems for constrained engineering, classification, and feature selection. The sparsity of the proposed model will be evaluated and compared with other methods, including sparse autoencoding methods.