A Combined Model Based on CEEMDAN, Permutation Entropy, Gated Recurrent Unit Network, and an Improved Bat Algorithm for Wind Speed Forecasting

Accurate and reliable wind speed forecasting is crucial for wind farm planning and grid operation security. To improve the accuracy of wind speed forecasting, a novel combined model is proposed for wind speed forecasting in this article. First, the complete ensemble empirical mode decomposition adaptive noise (CEEMDAN) and permutation entropy (PE) are employed to decompose the original wind speed time series into the sub-series with obvious complexity different; To overcome the disadvantage of weak generalization ability of single deep learning method when facing diversiform data, a cluster of gated recurrent unit networks (GRUs) with different hidden layers and neurons are applied to capturing the unsteady characteristics and implicit information of each sub-series; The predictions of the GRUs of each sub-series are aggregated into a nonlinear-learning regression top-layer which is consisted of radial basis function neural network (RBFNN), and improved bat algorithm (IBA) is introduced to optimize the parameters of RBFNN; Lastly, the prediction values of each nonlinear-learning top-layer are superimposed to obtain the final prediction values. To validate the effectiveness of the model, 15-min and 1-h wind speed data from the wind farm in Zhangjiakou, China, are used as test cases. The experimental results demonstrate that the proposed combined model can achieve the best performance and stability compared to other models. Such as the performance evaluation indexes (RMSE = 0.3294, MAPE = 2.6169%) are smallest obtained from case study 1, and (RMSE = 0.5876, MAPE = 4.7875%) are smallest obtained from case study 2.


I. INTRODUCTION
Due to the rapid development of modern industry, the rapid consumption of fossil fuels, the problems of environmental degradation and traditional resource consumption have become some increasingly serious. Wind energy as an environmentally friendly renewable energy, has been widely recognized and promoted worldwide. According to the Global Wind Energy Council, as of the end of 2018, the total installed capacity of wind power has reached 591GW in The associate editor coordinating the review of this manuscript and approving it for publication was Xiaowei Zhao. the worldwide, 209533 MW was contributed by China [1]. However, the randomness, abruptness and uncertainty of wind resources can not only destroy the reliability and stability for the wind farms and power system operation, but also result in low efficiency and high operating cost. And these will bring great challenges to the safe and stable operation of wind farms and grids with large-scale wind power integration [2], [3]. Therefore, reliable and precise wind speed forecasting not only helps to maintain the stability of the wind turbine output and develop an optimal wind farm maintenance plan, but also can reduce the operating cost of the power system [4]. In recent years, with the increasing number of wind farms, power generation companies have established corresponding wind farm centralized monitoring centers to facilitate centralized management and dispatch. And this allows the forecasting systems originally set up in the wind farms to be moved to the centralized monitoring center, and then the centralized control center distributes the wind speed and wind power prediction results obtained by the prediction system to the corresponding wind farms, thereby greatly saving costs. However, due to the scale, geographic location, climatic conditions and wind turbine models of each wind farm are different [5], it is difficult for the wind speed forecasting strategy of the same centralized control center to adapt to the characteristics of different wind farms, which gives the centralized control center management scheduling brings many problems. Therefore, it is necessary to propose a reliable and accurate wind speed forecasting method that is universally applicable to multiple wind farm environments under the centralized control system. Moreover, the centralized management of wind farms provides environmental support for us to build a wind speed forecasting system with more complex algorithms and stronger adaptability.
In recent years, many researchers have given much attention to the research of the methods of wind speed prediction. These methods can be classified into the following four types: Physical models, conventional statistical models, artificial intelligence models, and hybrid models. Physical models always use current geographic data and a variety of meteorological data to solve complex mathematical models such as temperature, pressure and humidity, and it has a large error in wind speed forecasting in short-term or ultra-short term. The widely used conventional statistical models include the ARIMA (autoregressive integrated moving average) model, the quantile-regression model (QR), and the Kalman-filter model. Masseran [6] has proposed a prediction model consisting of ARIMA and ARCH (Autoregressive Conditional Heteroskedasticity). Kavasseri and Seetharaman [7] proposed a fraction-ARIMA to predict the wind speed of one-day and two-day ahead in North Dakota. The conventional statistical model uses historical data to establish a predictive model, which can capture the linear relationship of the training data set well. However, due to the randomness and inherent complexity of the wind speed, the conventional statistical model has a low prediction accuracy. With the rapid development of artificial intelligence technology, artificial intelligence models have been successfully applied to time series prediction. Among them, back propagation neural network (BPNN), RBFNN, extreme learning machine (ELM), wavelet neural network (WNN) and other artificial neural networks, support vector machine (SVM) and expert systems have been widely used in wind speed prediction [8], [9]. Wang et al. [10] used RBFNN to predict wind speed. The experimental results show that the method has achieved certain improvement in accuracy. Santamaríabon et al. [11] used SVM to predict wind speed in the short and medium term. Jiang et al. [12] proposed a v-SVM hybrid short-term wind speed prediction model based on the cuckoo search algorithm optimization, which has improved the nonlinear approximation ability of the prediction model.
At present, a large number of wind speed prediction methods have been proposed, which improve the accuracy of wind speed prediction to some extent. However, the original wind speed time series is noisy and unstable. Therefore, directly using the original wind speed time series to establish a prediction model will produce a large of mistakes [13], [14]. The research shows that the data analysis method for wind speed time series can reduce the influence of wind speed non-stationarity on the prediction results. The decomposition-reconstruction prediction model based on signal analysis method has received more and more attention [15]. Empirical Mode Decomposition (EMD) is an adaptive signal processing method that decomposes time series into an intrinsic mode functions (IMFs) and a residue [16], [17]. Reference [18] proposed a new method by combining the two technologies of EMD and Elman neural network (ENN). The model showed a better and more accurate prediction. However, EMD has many drawbacks, such as end effect, mode-mixing [19]- [21], and so on. As an extension of EMD, ensemble empirical mode decomposition (EEMD) decomposes signals by adding white Gaussian noise and treating the average result as a real final result [22]- [24], which improves the EMD and avoids the phenomenon of modal-aliasing. However, the white noise added by the EEMD method in practice has not been completely offset after multiple averaging. CEEMDAN is proved to be an important improvement progress on EEMD [25]. Ye et al. [26] combined EMD and its improved version, EEMD/CEEMD/CEEMDAN with Support Vector Regression (SVR), to establish a predictive model. The results show that CEEMDAN-SVR is the best method.
In recent years, deep learning has been growing rapidly, which has achieved obvious advantages in different application fields, and has received more and more attention. [27]. Among the many deep learning methods, GRU and LSTM has improved the recurrent neural network (RNN) with special memory structure and gate structure to enhance long-term memory [28], [29]. And they have compensated for the gradient explosion of RNN and the long-term dependence of time series. However, GRU uses a gated recurrent neural network structure, which has fewer training parameters than LSTM. Meanwhile GRU is superior to LSTM in forecasting accuracy and forecasting speed, and it shows more adaptability in the analysis of time series data [30], [31]. Therefore, the GRU is chosen as the principal forecasting engine in this article. In recent years, in order to obtain an advanced prediction approach for higher accuracy levels and wider forecast horizons, the approaches called as hybrid models and combined models have emerged [32]- [34]. Niu and Wang [35] proposed a including complete ensemble empirical mode decomposition with adaptive noise and a multi-objective grasshopper optimization algorithm based on a no-negative constraint theory. The advantages of each individual forecasting model are effectively utilized, the stability VOLUME 8, 2020 and accuracy of the prediction results are considered, and this method achieves better prediction.
Based on the above analysis, a novel combined model, based on CEEMDAN, PE, GRU neural network, RBFNN and IBA is proposed in this study for wind speed forecasting. First, CEEMDAN is used to decompose non-stationary wind speed time series into different IMF components with corresponding frequencies. Next, PE was applied to analyze the complexity of each IMF component, and recombine the IMFs with similar entropy values. To solve the issues of over-decomposition and computational burden [36], [37]. Then, inspired by integrated learning technology, a cluster of GRUs with diverse hidden layers and neurons are applied to capture the unsteady characteristics and implicit information of each recombination sequence. In order to overcome the shortcomings of the traditional combination model linear representation, the predictions of GRUs are aggregated into a nonlinear-learning regression top-layer to give the final ensemble prediction of each recombinant sub-series in this article. The nonlinear learning top layer used in this article is composed of RBFNN, and the improved bat algorithm (IBA) is introduced to optimize the center vector, base width and weight of RBF neural network by which to overcome the problem of easy convergence to local optimum. Finally, the prediction results of the nonlinear-learning top layer of each recombination sequence are superimposed to obtain the final wind speed prediction value. To validate the accuracy of the proposed combined model, 15-min and 1-h wind speed data from the wind farm in Zhangjiakou, China, are used as cases. The results of experiments and discussions show that the proposed combined model can achieve a better forecasting performance and stability compared to other models.
The rest of the paper is organized in the following way. The related basic theory of nonlinear-learning ensemble of deep learning time series prediction for wind speed forecasting, CEEMDAN and PE are introduced in Section 2. In Section 3 BA and IBA are described. Section 4 presents the main structure of the proposed combined model. Then, in Section 5, two case studies are demonstrated, in which the prediction results of the proposed model and other involved models are evaluated. Finally, the conclusions are presented in Section 6.

II. RELATED METHODOLOGY A. CEEMDAN
When EMD technology was proposed by Huang et al. in 1998 [16]. EMD can adaptively decompose the original signal into IMF components with different frequencies.
EEMD is an improved method based on EMD, which mainly adds different white Gaussian noises to the original sequence multiple times, then separately performs EMD decomposition, and finally obtain the final result by the average of the obtained IMF components, avoiding the phenomenon of mode-mixing [38], [39]. However, in practice, the white noise added by the EEMD method does not be canceled completely after a number of averaging. On the basis of EEMD, CEEMDAN obtains the IMF by adding adaptive white noise and calculating the unique margin signal to overcome the EEMD deficiency, so that the reconstructed signal is almost identical to the original.
Let s(n) be the wind speed time series, v i (n) be the white Gaussian noise sequence added by the i-th experiment, and the i-th decomposition wind speed time series can be expressed as s i (n) = s(n) + v i (n). The definitions of E k (.) and are the Kth modal components produced by EMD and CEEMDAN, respectively. The specific steps of the CEEM-DAN algorithm are as follows: (1) Like the EEMD method, CEEMDAN performs I-time decomposition for wind speed time series s(n) + v i (n). The first modal component is calculated by the EMD method: (2) Calculate the first residual sequence: (3) Decompose (r 1 (n)) + ε 1 E 1 v i (n) . for i(1, 2, . . . I ) times Calculate the second modal component: (4) Similarly, calculate the kth residue sequence in the remaining phase. Then, according to step (3), the (k +1)th modal component is obtained: (5) When the residual sequence cannot be decomposed, that is, the maximum number of extremum points of the residual signal is no more than two, the algorithm terminates. At this point, k modal components are obtained, and the final result of the residual sequence is: The original wind speed time series s(n) can be expressed as: s(n) = K k=1 IMF k + R(n). In order to verify the effectiveness and the advantage of the CEEMDAN, Fig. 2 shows the decomposition results of EMD, EEMD and CEEMDAN respectively applied to an original wind speed series. The original wind speed series is shown in Fig. 1. And the reconstruction error between the original wind speed series and the sum of its corresponding IMFs obtained by EMD, EEMD and CEEMDAN are shown in Fig. 3. It can be observed from Fig. 2a, the results of EMD decomposition has severe mode-mixing phenomenon (IMF3-IMF6 in Fig. 2a). Fig. 3 shows that EEMD method has the largest reconstruction error compared to other methods, it demonstrates the incomplete decomposition of EEMD algorithm. From Figs. 2 and 3, it can be seen that the CEEM-DAN method show better decomposition performance and produces a negligible reconstruction error than other two methods. Therefore, the CEEMDAN method is adopted in this study for wind speed time series decomposition.

B. PERMUTATION ENTROPY
PE is a measure of the complexity of time series [37]. It has the advantages of simple calculation, strong anti-noise ability and stable calculation value. It possesses high sensitive to time series changes and good robustness and has been widely used in various time series. The wind speed data has certain randomness and non-stationarity, which makes the IMF component obtained by CEEMDAN decomposition more. Therefore, in order to reduce the computational scale, this article uses PE algorithm to carry out IMF reorganization, the algorithm of PE can be presented as follows: (1) First, phase space reconstruction is performed on each IMF sequence {X (i), i = 1, 2, . . . , N } obtained by CEEMDAN to obtain a phase space matrix: where m and τ represent the embedding dimension and delay time, respectively; j = 1, 2, . . . , N − (m − 1)τ .
(2) Taking each row of the above matrix as a component, there are a total of K components, and there is a rela- as an example, in ascending order according to the element value: where j 1 , j 2 , . . . , j m represents the serial number of the com- according to the size of j 1 and j 2 values. In summary, for each row vector of the phase space reconstruction matrix of any wind speed time series X i , a set of sequences S(l) can be obtained.
where, l = 1, 2, . . . , K and there are k ≤ m!, m-dimensional spaces have a total of m! different mapping symbol sequences.
(3) Calculate the probability p 1 , p 2 , . . . , p k of occurrence of each symbol sequence S(l). According to the form of Shannon entropy, the permutation entropy H p (m) of the k-th different symbol sequence of time series X (i) can be defined as: Decomposition results of wind speed by EEMD, EEMD and CEEMDAN. VOLUME 8, 2020 It can be seen from the above formula that when, P j = 1/m!, H p (m) reaches the maximum value ln(m!). For the sake of convenience, standardize H p (m).
where, 0 ≤ H PE (m) ≤ 1 The size of the PE value represents the complexity and randomness of the time series. The larger the value, the stronger the randomness; on the contrary, the weaker the randomness. The embedding dimension m and the delay time τ are two important parameters of the PE algorithm. Studies have shown that when the sample size is small, the embedding dimension m is usually small; the delay time τ has less influence on the permutation entropy.
Through the above steps, each IMF arrangement entropy is calculated. According to the closeness of the arrangement entropy values, the IMF components are combined to obtain a recombinant subsequence.

C. GATED RECURRENT UNIT
Recurrent Neural Network (RNN) is a special neural network developed for processing time series data [27]. However, as the network layer deepens and the number of neurons increases, RNN often suffers from gradient disappearance and gradient explosion in actual training, which limits its application. To solve these problems, LSTM and GRU are proposed. They are similar in function. However, with the special gate structure, GRU is superior to LSTM in accuracy and speed of prediction and is widely used [30]. Fig. 4 shows the structure of GRU. It has two important gate structures, being r an update gate and z a reset gate. The following calculation formulas are used in the GRU output calculations: (1) Reset Gate: (2) Update Gate: (3) Output: Where: h t and h t−1 represent the historical information; h t represents the candidate activation value; x t represents the current sequence input; σ and tanh represent the activation function; ω r , ω z , ω, ω o are training parameter matrix.

III. THE NONLINEAR-LEARNING ENSEMBLE OF GRU TIME SERIES PREDICTION FOR SHORT-TERM WIND SPEED FORECASTING
To achieve a better wind speed forecasting, in the structure of nonlinear-learning ensemble learning, the predicted values of a cluster of GRUs, for each recombination sequence, are input to a nonlinear-learning regression top-layer to obtain the final predicted values of each recombination sequence. Considering the RBF neural network can approximate any nonlinear function and the convergence rate is fast.
Thus, RBFNN is introduced as the nonlinear-learning toplayer. To improve the predictive ability of the nonlinearlearning regression top-layer, an improved bat algorithm, called IBA, is proposed to optimize the center vector c i , base width σ i and weight w ij of RBF neural network. The basic concepts of RBFNN and IBA are described below.

A. PRINCIPLE OF RBF NEURAL NETWORK
The RBF neural network is a highly efficient three-layer neuron feedforward neural network, which can approximate any nonlinear function and has a fast convergence rate [40]. The RBF neural network consists of an input layer, an implicit layer, and an output layer, in which the input layer is used to connect the network to the external environment; There are many forms of the radial basis function of the hidden layer. The Gaussian function has the advantages of radial symmetry, good smoothness and simple expression. Therefore, Gaussian function is chosen as the radial basis function to map the input vector from the input space to the high-dimensional space. The output layer is the linear output of the hidden unit.
The typical structure of an RBF neural network is shown in Fig. 5. Where, the input is: . . , h n ] T , h 1 represents the output of the i-th neuron in the hidden layer: where 1 ≤ i ≤ t, σ i is the width of the Gaussian function of the i-th neuron of the hidden layer, c i is the central value of the t-th neuron Gaussian function of the hidden layer. w ∈ R t×n is the output weight matrix of the RBF network, y = [y 1 , y 2 , . . . , y m ] T is the output of the RBF network, y j represents the jth output of the hidden layer: where 1 ≤ j ≤ m. BA is the most recent nature inspired optimization algorithm proposed by Professor Yang in 2010 to simulate bat hunting with ultrasound. It combines the advantages of intelligent algorithms such as particle swarm and ant colony to achieve a major breakthrough in convergence rate and optimization ability. The BA is based on the analysis of bat behavior characteristics, and establishes mathematical models with pulse frequency f , pulse intensity A, velocity v, and pulse emissivity BA. The basic steps of the BA are as follows: (1) Parameter space initialization: Initialize the parameter space. That of the BA algorithm consists of the number of iterations iter, the position vector x, the velocity vector v, the pulse intensity A, the pulse emissivity r, and the pulse range [f min , f max ].
(2) Calculate the fitness value of the bat and find the current optimal solution. And record the location of the optimal solution bat. ( where β ∈ [0, 1] is a random vector that satisfies a uniform distribution, x * is the current global optimal solution, and t is the current number of iterations. In addition, during the parameter initialization process, A indicates the frequency at which the i-th bat emits sound waves, (4) Generate a random number: rand. If the condition rand > r i is satisfied, according to the Eq. (22), search for the local new solution near the current global optimal solution where ε is the random number satisfying [−1,1], and A t is the average pulse intensity of all bats. (5) Generate a random number rand, if the new solution satisfies the following condition: rand < A i &f (x i ) < f (x * ), the pulse intensity and pulse transmission frequency can be updated according to the Eq. (23) and Eq. (24).
where γ is the pulse transmission frequency enhancement coefficient, γ > 0 and β are the attenuation coefficients of the loudness, β ∈ [0, 1]. (6) Evaluate the bat population, and search for the bat individual with the minimum fitness value.
(7) Determine whether the algorithm termination condition is met, and if yes, proceed to step (8), otherwise, returns to step (2) to continue the iterative training.
(8) The algorithm ends, outputting the position vector and fitness value of the optimal bat individual.
Because BA balances the global and local search process well, it has a performance advantage over traditional intelligent algorithms. However, as the number of digits in the optimization problem increases, its performance may degrade. In order to improve that, relevant literature has proposed some research. Nevertheless, the parameters of these modified BAs are obtained through trial and error methods, while introducing cumbersome computational problems. Based on the Eq. (21), this article introduces the dynamic inertia weight, increases the speed diversity of BA in the global search and local search process, and does not increase the excessive computational burden. The update is shown in Eq. (25): where ω(t) is the dynamic inertia weighting factor, and its value can affect the convergence rates. In order to increase the diversity of the algorithm in speed, the expression of ω i (t) is: where N (0, 1) is a random variable subject to a normal distribution.
In order to enhance the diversity of the bat position, the loudness attenuation coefficient in the loudness Eq. (23) is changed to improve its adaptability.
The loudness attenuation coefficient A, when taken as a smaller value, will give the algorithm a local search capability, while the larger value will encourage the algorithm to search globally. In order to achieve an appropriate balance between global search and local search, the experiment found the following dynamic formula:  Table 3, two points can be concluded: (a) For the four test functions, the maximum, minimum, and average number of iterations of the IBA are less than that of BA. This means that IBA has successfully improved the convergence ability of BA. (b) For the Rastrigin's function and the Schaffer's function, the convergence rates of BA didn't obtain 1. For these two test functions, the convergence rate of the IBA is 1. Therefore, the optimization performance of the IBA algorithm is also significantly improved compared to BA.

D. CONSTRUCT IBA-RBF ALGORITHM
The RBF neural network has a strong advantage in dealing with complex nonlinear mapping problems. However, the performance of an RBF neural network is highly dependent on its center vector, base width, and weight. This article introduces an improved bat algorithm to optimize the center vector, base width and weight of the RBF neural network to overcome its problem of easy convergence to local optimum. The objective function chosen by this article is: where N is the number of samples, y n is the actual value, and ∧ y n is the predicted value at time n. The flow chart of IBA optimization RBFNN is shown in Fig. 6.

IV. MAIN STRUCTURE OF PROPOSED COMBINED MODEL
In this section, A novel combined model is proposed, which takes CEEMDAN-PE technique, a cluster of GRUs with diverse hidden layers and neurons and IBA optimize the parameters of RBFNN together. The main process of is demonstrated in Fig. 7. The detailed processes of the proposed combined model are as follows: A. STEP1: DATA PREPROCESSING STRATEGY Due to the intermittent characteristics and highly noisy of wind speed series seriously restrict the wind speed   prediction accuracy, the CEEMDAN technique was employed to decompose raw wind speed series into several IMF signals. Next, PE was employed to calculate the PE values of each IMF, and then to recombine the IMFs into a new subseries (PE-IMFs) to solve the problem of over-decomposition and computational burden.  It should be noted that the network structure of the GRU cannot be predefined for specific data. In the actual problem, the parameters are selected through trial and error to make a trade-off between the learning performance and the complexity of the model. After that, the GRU cluster used in the integrated learning part of this article consists of five different GRUs.
Five diverse GRUs are adopted based on trial and error: GRU1: 1 hidden layer with 20 neurons. GRU2: 1 hidden layer with 40 neurons. GRU3: 2 hidden layers with 20,40 neurons. GRU4: 2 hidden layers with 40,60 neurons. GRU5: 2 hidden layers with 60,60 neurons. Five different GRU networks are trained as a single predictive model to forecast the PE-IMFs on train and test dataset. The prediction of five GRUs models on training set is the input as train-feature into IBA-RBFNN to learn. The task of IBA-RBF is to learn the nonlinear relationship of five GRUs predictors, similar to solving multiple regression problems.
Final, the prediction results of the five GRU models on the test dataset are input to the well-trained IBA-RBFNN to obtain the prediction result of the recombinant sub-sequence PE-IMFs.

C. STEP3: ASSEMBLE FORECASTING RESULT
The combined forecasting result of each PE-IMF is obtained by step 2. Then, the forecasting results of which are superimposed to get the final result.

V. EXPERIMENTS AND ANALYSIS A. WIND SPEED DATA DESCRIPTION
The Zhangjiakou area has a rich reserve of wind energy resources, especially in the Bashang area. The Zhangjiakou Bashang area is a rare wind energy resource gathering area in China. The city's wind energy resource reserves in Zhangjiakou reach more than 20 million KW (70% of the land wind resource reserves in Hebei Province).
In this article, two historical short-time wind speed datasets i.e. dataset A with a 15-min interval, and dataset B with a 1-hour interval are regarded as two illustrative examples to test the forecasting performance of the proposed combined model. The wind speed time series is displayed in Fig. 8. Dataset A: The wind speed data were sampled at an interval of fifteen-min from November 21, 2013 to December 10, 2013, 1920 data points in total. Dataset B: The wind speed data were sampled at an interval of one-hour from January 22, 2013 to April 10, 2013, 1920 data points in total. Each dataset is classified into a training set (the first 1632 values) and a testing set (the last 288 values). Specifically, the first 816 values of training set, were applied to train five GRUs models; And the last 816 values of training set, were applied to train the nonlinear-learning regression top-layer model. The statistical information of the above dataset is showed in Table 4. And the proposed combined model was implemented though mixed language programming based on Python and MATLAB. And the GRU Neural Networks was operated by using the ''Keras'' deep learning package.

B. EVALUATION OF FORECASTING PERFORMANCE
Four evaluation criteria are considered to check the forecast performance of the proposed models [42]. And they are defined as shown in Table 6. Here y n and ∧ y n represent the actual value and predicted value at time n, respectively. N is the sample size.

C. DATA DECOMOPOSITION BY CEEMDAN-PE
CEEMDAN decomposition was performed on the wind speed time series, and I = 500 groups of white noise signals with a non-standard deviation of 0.2 were added to obtain IMF components. The permutation entropy values of the IMF components are calculated, and the results are shown in Table 6. In order to avoid excessive decomposition and reduce the calculation scale, the IMF components with similar entropy values are superimposed to obtain the recombination component PE-IMFs, according to the permutation entropy values shown in Table 5. The recombinant sequences are shown in Fig. 6 and Fig. 7.

D. CASE STUDY 1: 15-min TIME-SCALE WIND SPEED FORECASTING
In this case study, the wind speed data of Dataset A is used to perform 15-min ahead utmost short-term wind speed forecasting. The wind speed is displayed in Fig.8.

1) EXPERIMENT 1: COMPARISON WITH MODELS USING DIFFERENT DATA DECOMPOSITION TECHNIQUES
Experiment I was designed to compare proposed combined model with three models based on different data decomposition techniques, namely WPD based model, EMD based model and EEMD based model. Fig. 11 shows the forecasting performances comparison of proposed combined model and three models. The comparison results are shown in Table 7.    Table 8. Fig. 13 visualizes the comparison VOLUME 8, 2020    Moreover, from the analysis of Figs. 12 and 13, a comparison of the forecasting results indicates that the proposed combined model has the best accuracy and stability when compared with other models, shows apparently a better curve fitting of the actual wind speed time series and smaller residual errors. Meanwhile, it can be clearly seen that compared with the CEEMDAN-PE-GRU, the ensemble of GRUs has improved the forecasting performance of single GRU. performances comparison of proposed combined model and three models. The forecasting results of each model are shown in Table 9. From Fig. 14 and Table 9. It can be seen that the proposed combined model can achieve better forecasting performance, which indicates the superiority of nonlinear-learning top-layer composed by IBA-RBF model.

E. CASE STUDY 2: 1-H TIME-SCALE WIND SPEED FORECASTING
As described above, the proposed combined model has shown its superiority in 15-min time-scale wind speed forecasting. In order to investigate the potential of the model in 1-h time-scale wind speed forecasting, in case study 2 three experiments are designed to validate the accuracy of the proposed combined model.

1) EXPERIMENT 1: COMPARISON WITH MODELS USING DIFFERENT DATA DECOMPOSITION TECHNIQUES
Experiment I was designed to compare proposed combined model with three models using different data decomposition techniques, namely WPD based model, EMD based model and EEMD based model. Fig. 15 shows the forecasting performances comparison of proposed combined model and three models. The comparison results are shown in Table 10.
From Table 10 and Fig. 15, the proposed combined model achieves the best forecasting performance, with the MAE, RMSE, MAPE and SSE of 0.4453, 0.5876, 4.7875% and 96.9548. Then, the models from highest to lowest based on the forecasting accuracy are EEMD based model, EMD based model and WPD based model with MAPE values of 4.9072%, 5.2324 % and 5.4861%.   Table 11. Fig. 16 provide the bar graph of prediction VOLUME 8, 2020

FIGURE 12.
Forecasting results of combined model and models using same data decomposition technique in case study 1.     3.5795%, 3.7818%, 4.4607%, 4.6432% and 4.9521%. Similar to case study 1, compared with the CEEMDAN-PE-GRU, the ensemble of GRUs has improved the forecasting performance of single GRU.

3) EXPERIMENT 3: COMPARISON WITH MODELS USING DIFFERENT NONLINEAR-LEARNING TOP-LAYER
The aim of experiment 3 is to compare the proposed combined model with models using different nonlinear-learning VOLUME 8, 2020 top-layer. Similar to case study 1, three forecasting models of different top-layers are used to analysis of impacts of the ensemble learning top-layer on prediction performance. The forecasting results of each model are shown in Table 12. Fig. 18 shows the forecasting performances comparison of proposed combined model and other models. It can be observed from Table 12 and Fig.18 that our proposed combined model performs superior wind speed forecasting when compared with CEEMDAN-PE-GRU-SVR, CEEMDAN-PE-GRU-BP and CEEMDAN-PE-MeanGRU, which is benefited from nonlinear-learning ensemble top-layer of IBA-RBF.
Remark: Compared with Case Study 1, it can be seen that when the forecasting time is extended from fifteen minutes to one hour, influenced by the volatility and abruptness of wind resources, the forecasting errors of each model also become correspondingly larger. 1-h time-scale wind speed forecasting is more complicated and difficult than 15-min time-scale wind speed forecasting. However, the accuracy and stability of prediction of the combined model proposed in this article are still optimal compared to other models.  In order to further assessing and discussing the forecasting performance of the proposed combined model, the DM test is conducted to examine the effectiveness of the proposed combined model from a statistical perspective [42]. The specific steps are presented are as follows: The null hypothesis is that the predictions of the two models are no significant differences. In contrast, the alternative hypothesis is that a significant difference in the forecasting performances of the two models. The null hypothesis and alternative hypothesis of the DM test are expressed as follows: where e 1 t and e 2 t are the forecasting errors between real values and forecasted values of the proposed combined model and another compared model, and F represents the loss function for forecasting errors.
Further, the statistics of DM test can be expressed by: where S 2 is the estimated variance of d = F(e 1 t ) − F(e 2 t ). After obtaining values of the calculated DM and the critical value Z α/2 , if the DM statistic is greater than Z α/2 or less than −Z α/2 , the null hypothesis will be rejected, and it is deemed that the distinctions between our proposed combined model and the compared model are evident.
The values of the DM test for two time-scales are shown in Table 13. From the results given in Table 13, it can be clearly seen that the proposed combined model is a significant difference from all the other models at a 1% significance level. In addition, the smallest value of DM for two time-scales are separately 2.8906 and 2.9546. Thus, the null hypothesis could be rejected at a 1% significance level. Therefore, the proposed combined model significantly outperforms the other models, and this verifies the effectiveness of our proposed combined model for wind speed forecasting.

2) COMPUTATIONAL EFFICIENCY: RUN TIME
In order to analysis the computational efficiency of the proposed combined model. Table 14 presents the average computation time of the proposed combined model and the other models with two time-scales wind speed forecasting. Specifically, the average computation time of the proposed combined model for two time-scales are separately are separately 382.56 and 386.98, which are less than the values from models using other data decomposition technologies, and models using different nonlinear-learning top-layer except the CEEMDAN-PE-MeanGRU model. Moreover, relative to the models only based on CEEMDAN-PE, the computation time of the proposed combined model is longer, which is reasonable because it has a better forecasting performance. And in practical implementation, the time consumed is acceptable. Moreover, the run time can be further improved by using a high-performance computer.

VI. CONCLUSION
Wind energy, as a crucial type of renewable energy, has aroused widespread interest and research enthusiasm. However, the intermittent characteristics and highly noisy of wind speed series increases forecasting difficulty to a great extent. In addition, due to the scale, geographic location, climatic conditions and wind turbine models of each wind farm are different, it is difficult for the wind speed forecasting strategy of the same centralized control center to achieve the desired accuracy across different wind farms. Motivated by recent developments in combined model forecasting, we developed a novel combined model for wind speed forecasting, which can not only achieve precise forecasting, but also be applicable to various wind farms environments under the centralized control system. In this article, two wind speed series collected from the wind farm in Zhangjiakou, China are adopted as cases to validate the accuracy and stability of the proposed combined model.
The experimental results demonstrate that the advantages of our proposed combined model mainly lie in the following aspects. (1) When comparing between the proposed combined model and models using different data decomposition techniques. The MAE, RMSE, MAPE and SSE values of our proposed combined model in two time-scales wind speed forecasting are all ranked as the best, which indicates the superiority of the proposed data decomposition technique. (2) Similarly, compared with models only based on CEEMDAN-PE in two time-scales wind speed forecasting, the lowest MAPE values for two time-scales are obtained from the proposed combined model, with values of 2.6169% and 4.7875%, which are reduced by 2.3352% and 2.0951% compared with maximum MAPE values obtained from the CEEMDAN-PE-SVR, which indicates the proposed combined model has excellent forecasting accuracy relative to models only based on CEEMDAN-PE. (3) When comparing between the proposed combined model and models using different nonlinear-learning top-layer. The proposed model can achieve better forecasting accuracy and more stable forecasting performance, which indicates the superiority of nonlinear-learning top-layer composed by IBA-RBF model. (4) In two case studies, generalized from all the models that, the proposed combined model has the best forecasting performance. In