A Hybrid Short-Term Wind Power Prediction Model Combining Data Processing, Multiple Parameters Optimization and Multi-Intelligent Models Apportion Strategy

Because wind power data exhibit strong volatility and intermittency, it is difficult to guarantee the safe and stable operation of the power system. To make full use of the characteristics of wind power data and further improve wind power forecasting performance, an innovative hybrid framework is proposed in response to these challenges. It integrates a non-stationarity weakening technique, sample entropy (SE)-based prediction model allocation, an optimized reduced kernel extreme learning machine (RKELM) and a novel deep learning network. Initially, the original wind power sequence is adaptively decomposed into multiple sub-sequences by empirical wavelet transform (EWT), after which SE is employed to distinguish the sub-sequences with high and low complexity. Then, a simplified gated recurrent unit network (SGRU) is used to predict the component with the low SE value. For the components with high SE values, multiple parameters optimization of RKELM is implemented based on the slime mould algorithm (SMA), which possesses superior convergence speed and precision. Ultimately, the final prediction results are obtained by accumulating the predicted values of the multiple models. In the experiment phase, two datasets are adopted to verify the predictive ability of the proposed hybrid EWT-SMA-RKELM-SGRU model, and the results indicate that the proposed model achieves superior performance.


I. INTRODUCTION
With the striking aggravation of environmental pollution, the gradual depletion of traditional energy, and government goals for zero-carbon emissions, increasing attention has been paid to renewable energy [1]. In recent decades, wind power, as a comprehensive technology involving environmental protection and sustainable development, has become remarkably significant as a renewable source compared with traditional energy sources such as coal, petroleum and natural gas [2]. Generally, wind power operation is challenging because it involves abundant uncertainties, which include the intermittent factors of varied topography, atmospheric pressure and solar illumination, as well as the random effects of wind speed. For these reasons, precise short-term wind power forecasting undoubtedly plays a crucial part in effective dispatch planning and optimal unit combination scheduling. Because data-driven forecasting approaches are able to excavate the dynamic behavior of wind power development trends, the stability and accuracy of the prediction results are the important indicators for short-term wind power forecasting. Consequently, the development of accurate wind power models is becoming increasingly important, since it effectively promotes the economic benefit of wind power plants. It is worth noting that wind speed, the most significant factor of wind power, usually exhibits extreme abruptness and seriously influences the quality and stability of the power system [3]. Researchers have therefore devoted growing effort to wind power prediction, since it provides sensible suggestions for power system management, wind turbine control and energy negotiations [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Huai-Zhi Wang.
Wind power data change with various factors, which leads to dynamic features and time-varying information. They are generally gathered as time series, which may implicitly include a main trend, a periodic trend and a quasi-periodic trend [5]. The volatility of the wind power series therefore undoubtedly degrades the performance of a prediction engine that cannot get rid of the noise implied in the raw series [6]. To improve forecasting results, scholars have paid attention to combined prediction structures, in which a series of individual approaches are combined. For time series prediction, models based on decomposition techniques are frequently used to handle the strong non-stationarity and volatility of wind power time series [7], [8]. Hybrid or ensemble approaches mainly consist of two parts: pre-treatment and post-treatment. Pre-treatment utilizes a decomposition method to decompose and reconstruct the time series for further usage, such as empirical wavelet transform (EWT), variational mode decomposition (VMD), empirical mode decomposition (EMD), singular spectrum analysis (SSA) and their improved forms. Post-treatment usually refers to prediction modeling of the pre-treated data using artificial intelligence (AI) models such as support vector regression (SVR), extreme learning machine (ELM) and deep learning models. For example, researchers proposed a robust hybrid method, namely IVMD-SE-MCC-LSTM, for short-term wind power forecasting, in which the raw datasets are decomposed and reconstructed by an improved variational mode decomposition (IVMD) proposed in their paper and the resulting groups are modeled with long short-term memory (LSTM) networks [9]. Analogously, Safari et al. [10] developed a multi-step framework for short-term power prediction applying multi-scale singular spectrum analysis (MSSSA) and least squares support vector machine (LSSVM), where the results demonstrate better performance of the proposed model in terms of computational burden. LSTM networks have also been adopted to predict oil production: the network architecture and window size are optimized by a genetic algorithm, the oil production time series of the SL and JD oilfields are fed as input, and ensemble empirical mode decomposition (EEMD) combined with dynamic time warping (DTW) selects the model subsequences beforehand through mean value comparison [11]. The time-domain signal decomposition algorithm proposed by Huang et al. [12] reconstructs the signal into several intrinsic mode functions (IMFs), namely locally narrow-band components, yet modal aliasing remains an inherent drawback [13]. Dragomiretskiy and Zosso [14] developed a novel decomposition approach, referred to as VMD, which turns signal decomposition into a variational problem and transforms the signal into an ensemble of band-limited modes. Although VMD has such advantages and has been applied in many fields, it still suffers from the shortcoming that the penalty factor and the number of modes are not determined automatically [15]. SSA, first presented by Bonizzi et al. [16], has been widely utilized in various research fields as a well-known data analysis instrument for decomposition. However, it is often not well suited to application scenarios with comparatively low-quality and scarce data [17]. Besides, the embedding dimension should be selected carefully and experimentally when constructing SSA, since improper dimensions can lead to adverse decomposition results [18]. In contrast, EWT, a novel data preprocessing approach that combines the adaptability of EMD with the formalism of the wavelet transform (WT), possesses the admirable advantage of removing WT's need for a predefined wavelet basis and parameters [19].
The prevailing wind power prediction approaches can be briefly divided into four categories, namely physical modeling, statistical modeling, intelligent modeling and hybrid structure modeling. As to physical models, a numerical weather prediction (NWP) based approach, which can generate long-term and large-scale forecasting results and relies on an ensemble of sigmoidal power functions, was developed by Pelikan et al. [20] to calculate week-ahead or day-ahead forecasts [21]. Nevertheless, physics-based technologies typically require abundant characteristic values, such as atmospheric density, wind farm topography and surface roughness [9], so their application to short-term wind power forecasting is generally limited [22], [23]. Statistical modeling methods, roughly including ARMA (autoregressive moving average) [24], [25], ARIMA (autoregressive integrated moving average) [26], [27] and f-ARIMA (fractional ARIMA) [28], [29], are better suited to short-term wind power prediction because they consider the mapping relationship between the input and target variables [30]. However, owing to their prior assumption of linear forms, they cannot handle the nonlinear and irregular features of wind power series well [31]. In recent years, machine learning and deep learning approaches [32], widely developed in the field of prediction, have achieved high prediction accuracy for short-term wind power series, which are ordinarily non-stationary and nonlinear. As a shallow learning model, the extreme learning machine (ELM), a popular member of the machine learning family, is frequently reported in the prediction literature owing to its preeminent capacity for adaptively learning nonlinear features. For instance, Majumder et al. [33] utilized ELM to obtain reliable and accurate solar power forecasts for 5-minute and 30-minute interval series, whose final experimental results achieved index values of 4.23% and 5.26%, respectively. Nevertheless, an individual model or its improved variant, that is, a non-combination mechanism, cannot adequately capture the specific features of each component, since every component has its own volatility and complex correlations. In view of this, grouped modeling strives to retain the characteristics of the components and adapt to their markedly different complexity, where the complexity of each component is determined by an instrument that quantifies the tightness of the wind power time series. To reduce the forecasting difficulty of each component, which still contains disordered and non-stationary information, the subsequences should be categorized into several groups with similar features, usually by means of an entropy value [34], [35]. As a tool for describing the complexity of time series, sample entropy (SE), developed from approximate entropy (AE), has been widely applied in a myriad of academic fields [36]. For instance, a method was presented for diagnosis and early prediction of battery faults, in which SE was used to estimate battery health [37]. Chen et al. [38] proposed a hybrid runoff forecasting approach in which sample entropy measures the complexity of the components, and the hybrid model is shown to reduce time consumption while ensuring prediction accuracy.
Thanks to its limited need for intervention and extremely fast learning rate, the extreme learning machine neural network [39] is more efficient and simpler than other neural networks. However, when the dataset is disturbed by outliers and noise, the robustness of ELM decays dramatically [40]. Hence, a reduced kernel extreme learning machine (RKELM) is employed to build the recognition model in the initial offline phase, which is designed to provide excellent generalization and rapid learning speed. Additionally, experiments have shown that the good generalization of such a machine learning method mainly depends on the quality of the sequence rather than its length [41]. For this reason, RKELM is applied to forecast the relatively tight and complex subsequences in order to promote the generalization ability on such sequences. For the other subsets, a simplified gated recurrent unit network (SGRU), developed from the gated recurrent unit network (GRU), is used for prediction in parallel.
Considering the characteristics of wind power time series data, a novel prediction structure involving EWT, sample entropy calculation, the reduced kernel extreme learning machine, slime mould algorithm optimization and the simplified gated recurrent unit network is built in this article. First, the original wind power series are decomposed by EWT to obtain sub-sequences with distinctive entropy values, which are regarded as the principal trend, periodic trend and residual trend, respectively. Subsequently, the trend subseries with the lowest sample entropy value is modeled by SGRU. The remaining ones, with comparatively higher values, are predicted by RKELM, whose best parameters are decided by SMA. Afterwards, the predictions of the two groups of sub-sequences are integrated into the final prediction results. In summary, the primary contributions and innovations of this study are as follows:
1) Owing to EWT's insensitivity to its parameters, complex time series data are decomposed and reconstructed by EWT in order to weaken the non-stationarity of the raw data.
2) To improve the stability and accuracy of the combined wind power prediction model concurrently, the parameters of RKELM and the matrix shape of the decomposed components are both optimized by SMA.
3) A novel deep learning network layer obtained by simplifying GRU, namely SGRU, is designed in this study. It offers broadly similar performance with less computing time, which is confirmed by the forecasting indicators of the case studies later in the article.
4) A hybrid prediction method based on EWT-RKELM-SGRU with SMA consists of two branches, SGRU modeling and RKELM modeling, to which each subsequence is assigned according to the magnitude of its sample entropy. That is, the one with the lowest value belongs to the SGRU modeling group, while the others belong to the RKELM modeling group.
In addition, the theoretical foundations of EWT, SE, SMA, RKELM and SGRU are provided in Section II. Section III describes the method and steps of wind power prediction. In Section IV, the research dataset is described briefly, after which the experimental procedure and comparative assessments are presented. Lastly, the conclusions are discussed in Section V.

II. BACKGROUND KNOWLEDGE
A. EMPIRICAL WAVELET TRANSFORM
The EWT was first proposed by Gilles in 2013 as a novel decomposition method for processing series data. Compared with other kinds of wavelet transform, EWT achieves adaptive decomposition of a signal by utilizing its Fourier spectrum. To this end, the EWT employs several basis functions, mainly the empirical wavelet function and the empirical scale function. In each frequency band, the empirical wavelet function acts as a band-pass filter, while the empirical scale function defines the low-pass filter. The steps of the EWT method can be summarized as follows:
Step 1: The spectrum of the signal is obtained by its Fourier transform.
Step 2: The Fourier spectrum is to be divided into N consecutive segments. The local maxima of the spectrum are sorted in descending order; if the number of maxima M is smaller than N, all M maxima are retained, otherwise only the first N − 1 maxima are kept.
Step 3: The maxima retained in Step 2 are used to build the empirical scaling function and the empirical wavelet function according to (1) and (2), respectively.
Step 4: The approximation coefficients and the detail coefficients are calculated according to (3) and (4), respectively.
Step 5: The final signal is reconstructed according to (5). Here $\varphi_1$ indicates the empirical scale function, and the transformed versions of the approximation coefficients and of the detail coefficients $W_f^{\varepsilon}(n, t)$ are obtained using the Fourier transform.
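Steps 1 and 2 can be illustrated with a short sketch. It is not the authors' implementation: the function name `ewt_boundaries` is hypothetical, and placing each boundary at the midpoint between consecutive retained maxima (with 0 prepended) is an assumption borrowed from the usual EWT formulation, since the source does not state how the segment boundaries are derived.

```python
import numpy as np

def ewt_boundaries(signal, n_segments):
    """Sketch of EWT Steps 1-2: segment the Fourier spectrum on [0, pi].

    Assumption: boundaries are midpoints between consecutive retained
    maxima (with 0 prepended), which the source text does not specify.
    """
    # Step 1: magnitude spectrum of the (real-valued) signal.
    spectrum = np.abs(np.fft.rfft(signal))
    # Normalized frequency axis; exact for even-length signals.
    freqs = np.linspace(0.0, np.pi, len(spectrum))

    # Step 2: interior local maxima, sorted by descending magnitude.
    k = np.arange(1, len(spectrum) - 1)
    is_peak = (spectrum[k] > spectrum[k - 1]) & (spectrum[k] >= spectrum[k + 1])
    peaks = k[is_peak]
    peaks = peaks[np.argsort(spectrum[peaks])[::-1]]

    # Keep all M maxima if M < N, otherwise only the first N - 1.
    n_keep = len(peaks) if len(peaks) < n_segments else n_segments - 1
    kept = np.sort(peaks[:n_keep])

    # Midpoints between consecutive points of {0, retained maxima}.
    points = np.concatenate(([0.0], freqs[kept]))
    inner = 0.5 * (points[:-1] + points[1:])
    return np.concatenate(([0.0], inner, [np.pi]))
```

For a signal with two dominant tones and `n_segments=3`, the sketch returns four boundary values, that is, three frequency bands on which the filters of (1) and (2) would then be built.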

B. SAMPLE ENTROPY
As a measure of the complexity of a time series, sample entropy has been utilized in various fields, including signal analysis, mechanical fault diagnosis and meteorological research, and it reflects signal characteristics in view of the frequency components of the time series. Although SE is developed from AE, it depends less on the data length than AE and gives more consistent measurements. For the original bounded time series $\{x_i \mid i = 1, 2, \ldots, N\}$, the implementation steps of SE are as follows:
Step 1: Reconstruct the given time series into vectors of window length $m$:
$$X_i = [x_i, x_{i+1}, \ldots, x_{i+m-1}]$$
Step 2: Compute the distance between every pair of vectors $X_i$ and $X_j$ ($i \neq j$) as the maximum absolute difference of their corresponding elements:
$$D_m(X_i, X_j) = \max_{k = 0, \ldots, m-1} |x_{i+k} - x_{j+k}|$$
Step 3: For the i-th vector, calculate the ratio $B_i^m(r)$ corresponding to the total number of vectors satisfying $D_m(X_i, X_j) < r$, and then compute the mean of $B_i^m(r)$ over all i, defined as $B^m(r)$.
Step 4: Increase the window length to $m + 1$ and repeat Step 1 to Step 3 to obtain $B^{m+1}(r)$.
Step 5: For the given bounded time series, SE is finally formulated as
$$SE(N, m, r) = -\ln \frac{B^{m+1}(r)}{B^{m}(r)}$$
where $N$ refers to the length of the series, $m$ represents the window length and $r$ is the similarity tolerance, which is usually set in the range [0.1SD, 0.25SD] (SD indicates the standard deviation of the time series) [42]. In this study, the value of $r$ is empirically chosen as 0.1SD.
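The five steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `sample_entropy` is hypothetical, the default `m=2` is an assumption (the source does not report the window length), and the same number of templates is used for both window lengths, which is one common convention.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.1):
    """Sample entropy SE(N, m, r) with r = r_factor * SD.

    m = 2 is an assumed default; r_factor = 0.1 follows the text.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def match_count(length):
        # Steps 1-2: build templates and their pairwise Chebyshev distances.
        # The same n - m templates are used for both lengths (a convention).
        templates = np.array([x[i:i + length] for i in range(n - m)])
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Step 3: count pairs closer than r, excluding self-matches.
        return np.sum(dist < r) - len(templates)

    b = match_count(m)        # proportional to B^m(r)
    a = match_count(m + 1)    # Step 4: proportional to B^{m+1}(r)
    # Step 5: the common normalization cancels in the ratio.
    # If no matches are found (a == 0 or b == 0) the result is inf or nan.
    return float(-np.log(a / b))
```

A regular series such as a sine wave yields a lower value than white noise of the same length, which is the property the model allocation strategy relies on. The pairwise distance matrix makes the sketch quadratic in memory, so it suits series of a few thousand samples at most.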

C. SLIME MOULD ALGORITHM
Slime mould algorithm (SMA) is a novel nature-inspired stochastic optimization algorithm whose mechanisms imitate the oscillation mode of slime mould [43]. During the optimization process, adaptive weights are introduced to mimic the positive and negative feedback of the slime mould propagation wave based on a biological oscillator. The central features of SMA are described below:

1) APPROACH FOOD
The following formula is developed to construct the approaching mechanism of slime mould:
$$X(t+1) = \begin{cases} X_b(t) + v_b \cdot \left( W \cdot X_A(t) - X_B(t) \right), & r < p \\ v_c \cdot X(t), & r \geq p \end{cases}$$
with $p = \tanh \left| S(i) - DF \right|$, where $v_b$ is a parameter with a range of $[-a, a]$, $v_c$ decreases linearly from one to zero, $t$ represents the t-th iteration, $X$ represents the location of slime mould, $X_b$ represents the individual location with the highest odor concentration currently found, $X_A$ and $X_B$ are random individuals selected from the population, $W$ indicates the weight vector of slime mould, $S(i)$ represents the fitness value of $X$, and $DF$ represents the best fitness obtained in all iterations. The detailed descriptions of the above arguments can be found in [43].

2) WRAP FOOD
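The mathematical formula for wrapping food, which represents the position updating process of slime mould, is as follows:
$$X^{*} = \begin{cases} rand \cdot (UB - LB) + LB, & rand < z \\ X_b(t) + v_b \cdot \left( W \cdot X_A(t) - X_B(t) \right), & r < p \\ v_c \cdot X(t), & r \geq p \end{cases}$$
where $LB$ and $UB$ denote the lower and upper boundaries of the search range, $rand$ and $r$ denote random values in [0, 1], and $p$ and $z$ are the switching parameters described in [43].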

3) OSCILLATION
As mentioned in II-C1, $v_b$ oscillates randomly within $[-a, a]$ and gradually decays to zero as the iterations increase. The value of $v_c$ oscillates within $[-1, 1]$ and eventually tends to zero.
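The three mechanisms above can be combined into a compact minimizer. The sketch below follows the commonly published form of SMA [43] and is not the authors' code: the function name `slime_mould_minimize`, the population size, the iteration budget, the restart probability `z = 0.03` and the schedule `a = arctanh(1 - t / max_iter)` are all assumptions, since the source does not report these settings.

```python
import numpy as np

def slime_mould_minimize(objective, lb, ub, pop_size=20, max_iter=100, z=0.03, seed=0):
    """Minimal SMA sketch (assumed settings, see lead-in)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    X = rng.uniform(lb, ub, size=(pop_size, dim))
    best_pos, best_fit = X[0].copy(), np.inf   # X_b and DF

    for t in range(1, max_iter + 1):
        fitness = np.array([objective(x) for x in X])   # S(i)
        order = np.argsort(fitness)
        bF, wF = fitness[order[0]], fitness[order[-1]]
        if bF < best_fit:
            best_fit, best_pos = float(bF), X[order[0]].copy()

        # Adaptive weights W: better half gets 1 + r*log(.), the rest 1 - r*log(.).
        ratio = (bF - fitness[order]) / (bF - wF - 1e-12)   # in [0, 1]
        log_term = np.log10(ratio + 1.0)[:, None]
        r_w = rng.random((pop_size, dim))
        half = pop_size // 2
        W = np.empty((pop_size, dim))
        W[order[:half]] = 1.0 + r_w[:half] * log_term[:half]
        W[order[half:]] = 1.0 - r_w[half:] * log_term[half:]

        # Oscillation: v_b in [-a, a] and v_c in [-b, b], both shrinking to zero.
        a = np.arctanh(1.0 - t / max_iter)
        b = 1.0 - t / max_iter
        vb = a * rng.uniform(-1.0, 1.0, (pop_size, dim))
        vc = b * rng.uniform(-1.0, 1.0, (pop_size, dim))

        # Approach food when r < p, otherwise contract the current position.
        p = np.tanh(np.abs(fitness - best_fit))[:, None]
        XA = X[rng.integers(pop_size, size=pop_size)]
        XB = X[rng.integers(pop_size, size=pop_size)]
        approach = best_pos + vb * (W * XA - XB)
        X_new = np.where(rng.random((pop_size, dim)) < p, approach, vc * X)

        # Wrap food: with probability z an individual restarts at random.
        restart = rng.random(pop_size) < z
        X_new[restart] = rng.uniform(lb, ub, size=(int(restart.sum()), dim))
        X = np.clip(X_new, lb, ub)

    return best_pos, best_fit
```

In the setting of this paper the objective would be a validation error of RKELM as a function of its parameters; a simple quadratic such as `lambda x: float(np.sum(x**2))` is enough to check that the search contracts towards the optimum.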

D. REDUCED KERNEL EXTREME LEARNING MACHINE
As a frequently used learning method, ELM was put forward by Huang et al. in 2006 [44] and was mainly designed for single-hidden-layer feedforward neural networks (SLFNs). Although ELM offers outstanding generalization and rapid learning [45], it is well known that it does not handle ill-conditioned data well [46]. To address this, the kernel extreme learning machine (KELM) was developed, whose performance, however, remains barely satisfactory in real-time computation.
To remedy the extremely large time and space complexities of traditional KELM in real-time computing, an advanced method, the reduced kernel extreme learning machine (RKELM), was proposed by Deng et al. [47]. In RKELM, n samples are selected from the raw dataset of N samples, where $n \ll N$, so that both the time and space complexities are reduced. The algorithm of RKELM can be described briefly as follows:
a. Select a subset of n samples from the original dataset of N samples, where $n \ll N$.
b. Rebuild the reduced kernel matrix K with the shape of N × n.
c. Calculate the final prediction values, which tend toward better generalization performance, from the rectangular kernel matrix K.
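Steps a to c can be sketched as a small regressor. This is an illustration under stated assumptions, not the authors' implementation: the class name `ReducedKernelELM` is hypothetical, the support samples are drawn uniformly at random, an RBF kernel with width parameter `g` is assumed, and the output weights are taken as the ridge solution `(I / C + K^T K)^{-1} K^T y`, which is the form usually associated with reduced kernel ELM.

```python
import numpy as np

def rbf_kernel(A, B, g):
    """RBF kernel matrix exp(-g * ||a - b||^2) between rows of A and B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-g * np.maximum(sq, 0.0))

class ReducedKernelELM:
    """Reduced kernel ELM sketch (assumed RBF kernel and ridge solution)."""

    def __init__(self, n_support=50, C=100.0, g=1.0, seed=0):
        self.n_support, self.C, self.g, self.seed = n_support, C, g, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n = min(self.n_support, len(X))
        # Step a: pick n of the N training samples as support samples.
        idx = rng.choice(len(X), size=n, replace=False)
        self.support_ = X[idx]
        # Step b: rectangular N x n kernel matrix.
        K = rbf_kernel(X, self.support_, self.g)
        # Step c: regularized least-squares output weights.
        self.beta_ = np.linalg.solve(np.eye(n) / self.C + K.T @ K, K.T @ y)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.support_, self.g) @ self.beta_
```

Only an n × n system is solved, instead of the N × N system of full KELM, which is where the savings in time and memory come from.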

E. SIMPLIFIED GATED RECURRENT UNIT NETWORK
Similar to LSTM, a widely used recurrent neural network that effectively alleviates the vanishing gradient problem over long time lags, the GRU was proposed as a development of the LSTM structure [48]. It consists of an update gate and a reset gate, which leads to lower computing consumption than LSTM with negligible loss of modeling performance. Its inner structure is shown in Figure 1.
Taking into account the potentially very long computing times in practical modelling, a simplified version of GRU, namely SGRU, is proposed in this study, in which only two activation functions and four gates are retained and the internal data flow is basically inherited from GRU. Figure 2 illustrates the internal structure of SGRU. In this structure, $x_i$, the output of the previous layer, is passed in together with $t_i$, which contains the implied information of the previous layer. After a gated calculation by the sigmoid function, $x_i$ and $t_i$ are spliced together and sent into the tanh activation function in order to scale the value into the range $[-1, 1]$. Finally, the output carrying the implied information, namely $t_i$, is sent to the next unit, thus memorizing the current information. The sigmoid function and the tanh function are defined as follows, respectively:
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{11}$$
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{12}$$
As can be seen from the formulation above, the output of this structure is obtained via relatively few calculations, which considerably reduces the cost of operation while guaranteeing the transfer of the implied information involved in the current layer.

III. SHORT-TERM WIND POWER FORECASTING MODEL BASED ON EWT, SMA, AND RKELM MULTIPLE PARAMETERS OPTIMIZATION
A. MULTIPLE PARAMETERS OPTIMIZATION BASED ON SLIME MOULD ALGORITHM
Two dominant factors affect the forecasting performance of the developed combined system: the rationality of the reconstructed form of the input matrices and the parameters of the forecasting model. In previous investigations applying decomposition methods and parameter optimization, the input matrices of the predictor were fixed by the subjective experience of scholars, so that different components were given the same reconstruction arguments; this practice should be reconsidered in view of the diversity of the input structures. For this purpose, phase space reconstruction, which transforms the unidimensional series into a high-dimensional feature space by considering the time delay and the embedding dimension simultaneously, is employed to construct the input matrices appropriately. However, the optimization of the phase space reconstruction and that of the predictor are generally implemented separately, which ignores the correlation between the arguments of the two modules and increases the computational cost significantly. To improve the optimization efficiency and forecasting accuracy, a multiple parameters optimization strategy is proposed in this study to determine the optimal shape of the input matrix and the appropriate arguments of RKELM simultaneously.
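Phase space reconstruction can be made concrete with a short delay-embedding sketch. The function name `phase_space_reconstruct` and the one-step-ahead default `horizon=1` are assumptions for illustration; `d` and `tau` are read here as the embedding dimension and time delay, which together fix the shape of the input matrix that SMA would tune alongside the RKELM parameters.

```python
import numpy as np

def phase_space_reconstruct(series, d, tau, horizon=1):
    """Delay embedding: row i is [x_i, x_{i+tau}, ..., x_{i+(d-1)tau}].

    The target is the value `horizon` steps after the last embedded point
    (one-step-ahead by default, an assumed setting).
    """
    series = np.asarray(series, dtype=float)
    span = (d - 1) * tau
    n_rows = len(series) - span - horizon
    if n_rows <= 0:
        raise ValueError("series too short for the requested d, tau and horizon")
    # Each column is the series shifted by a multiple of the time delay.
    X = np.column_stack([series[k * tau:k * tau + n_rows] for k in range(d)])
    y = series[span + horizon:span + horizon + n_rows]
    return X, y
```

With `d=3` and `tau=2`, the series 0, 1, ..., 9 gives a first row of [0, 2, 4] with target 5. Changing `d` changes the number of columns and changing `tau` changes which lags they hold, so both have to be searched jointly with the predictor's own parameters.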

B. MODEL ALLOCATION FOR VARIOUS COMPONENTS
The application of decomposition methods has been widely investigated in the field of series prediction. It is worth noting that the previous literature employs a fixed forecasting model for every decomposed component, although some scholars pay attention to adaptive parameter optimization of the prediction model on each component. However, the computational cost of the whole combined model increases significantly as the number of components increases. Therefore, a model allocation strategy that balances forecasting accuracy and computational complexity is developed in this study. Specifically, we apply SE to measure the complexity of each component in order to separate the tendency item from the random ones. It is worth mentioning that a series becomes more difficult to predict as its SE value increases. Hence, the newly constructed deep network, namely SGRU, is adopted to predict the tendency component with the minimum entropy value, while the random items with larger entropy values are predicted by RKELM with the above multiple parameters optimization. By allocating models to the components based on SE, the training time of the entire combined model can be significantly reduced without a significant decrease in forecasting capability.
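The allocation rule itself is simple enough to state in code. In this sketch the function name `allocate_models` and the example SE values are hypothetical; the rule, lowest entropy to SGRU and everything else to RKELM, is the one described above.

```python
import numpy as np

def allocate_models(se_values):
    """Assign SGRU to the component with minimum SE, RKELM to the rest."""
    se = np.asarray(se_values, dtype=float)
    trend = int(np.argmin(se))
    return ["SGRU" if i == trend else "RKELM" for i in range(len(se))]

# Hypothetical SE values for three EWT components.
print(allocate_models([0.05, 0.62, 1.31]))
```

Only one deep network is trained regardless of the number of components, which is why the strategy limits the growth of training time.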

C. ENTIRE STRUCTURE OF THE PROPOSED MODEL
In summary, the detailed procedures of the proposed model adopting the aforementioned modules, i.e., EWT-based feature selection, SMA-based multiple parameter optimization and SE-based model allocation, can be summed up as follows:
Step 1: Employ EWT with appropriate parameters to extract the tendency item and the random ones.
Step 2: Determine the prediction model for each component in accordance with its SE value.
Step 3: Train the SGRU model for the tendency component, and train the optimal RKELM model for each random component using the SMA-based multiple parameter optimization.
Step 4: Sum all the predicted results to obtain the ultimate prediction values of the training set.
Additionally, the entire flowchart of the composite model consisting of the above strategies is exhibited in Figure 3.
Two wind power series, monitored from Jan. 18, 2020 to Jan. 29, 2020 and from Jan. 30, 2020 to Feb. 10, 2020, are considered in this study. They are sampled at ten-minute intervals and consist of 1704 samples. The two series are depicted in Figures 4 and 5, respectively, and demonstrate significant intermittency, randomness and volatility. Besides, the datasets are partitioned into three parts, namely a training set, a verification set and a testing set, used for model construction, optimal parameter selection and performance evaluation, respectively, with a partition ratio of 3:1:1.
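The 3:1:1 partition can be sketched as follows. The function name `chronological_split` is hypothetical, and keeping the samples in time order (training first, testing last) is an assumption that is customary for forecasting but not stated in the text.

```python
def chronological_split(series, ratios=(3, 1, 1)):
    """Split a series into training, verification and testing parts in time order."""
    n, total = len(series), sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]   # remainder goes to the testing set
    return train, val, test
```

Because 1704 is not divisible by 5, the exact boundary indices depend on the rounding convention; this sketch rounds down and leaves the remainder in the testing set.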

B. EXPERIMENTAL DESCRIPTION
This section introduces the contrastive models and the performance evaluation indicators employed.

1) CONTRASTIVE MODELS
In this study, a composite forecasting model integrating a decomposition approach, a prediction model allocation strategy and SMA-based multiple parameters optimization is proposed to achieve accurate prediction results. To comprehensively analyze the contribution of each developed module, a series of single forecasting models and combined models are implemented for comparison. Among the single models, SVR, RKELM, BPNN, GRU and SGRU are applied as benchmarks to reveal the differences among the single models as well as the improvement brought by the newly developed SGRU model. CEEMDAN and EWT are then combined with RKELM in turn, thus illustrating the performance improvements achieved by CEEMDAN and EWT. Moreover, EWT is further fused with SGRU for comparison with EWT-RKELM, with which we intend to demonstrate the robustness of the predictive performance of RKELM and SGRU on various subseries.

2) PERFORMANCE EVALUATION
To quantitatively analyse the improvements achieved by the different modification strategies and to reveal the differences between the proposed method and the contrastive ones, the indicators root-mean-square error (RMSE), mean absolute error (MAE) [49], mean square error (MSE), grey relational analysis (GRA) [50] and the Diebold-Mariano (DM) test [51] are employed in this study. Among these indexes, RMSE, MAE, MSE and GRA are selected to reveal the forecasting accuracy of each model, where MAE prevents errors from canceling each other out and thus reflects the actual magnitude of the error. Additionally, the DM test can demonstrate whether the superiority of the proposed forecasting approach over the contrastive models is statistically significant. RMSE, MAE and MSE are defined as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $y_i$ and $\hat{y}_i$ denote the actual value and the predicted value, respectively, and $n$ is the number of samples. In addition, GRA is computed in three sequential steps: (1) normalize the reference and comparison sequences; (2) calculate the correlation coefficient between the two normalized sequences at each time point; and (3) take the mean value of all the correlation coefficients to obtain the grey correlation degree. Furthermore, the DM test statistic is calculated as
$$DM = \frac{\bar{d}}{\sqrt{s^2 / n}}, \qquad \bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$$
where $s^2$ denotes the estimate of the variance of $d_i = L(error_i^{1}) - L(error_i^{2})$, and $L$ indicates the loss function adopted to evaluate the prediction accuracy of the two models. For a given significance level $\alpha$, the null hypothesis $H_0$ states that there is no significant performance difference between the proposed approach and the compared one, while $H_1$ is the opposite hypothesis. $H_0$ is rejected when the DM value falls outside the interval $[-Z_{\alpha/2}, Z_{\alpha/2}]$.
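The indicators can be sketched as follows. The error metrics are standard; for the other two, several details are assumptions because the source does not give them: the grey relational degree uses min-max normalization and a distinguishing coefficient `rho = 0.5`, and the DM statistic uses a squared-error loss with the plain sample variance (no autocorrelation correction). All function names are hypothetical.

```python
import numpy as np

def mse(y, y_hat):
    return float(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2))

def rmse(y, y_hat):
    return float(np.sqrt(mse(y, y_hat)))

def mae(y, y_hat):
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def grey_relational_degree(reference, comparison, rho=0.5):
    """GRA in three steps; min-max normalization and rho = 0.5 are assumed."""
    def normalize(v):
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min())
    # (1) normalize both sequences, then take pointwise absolute differences.
    delta = np.abs(normalize(reference) - normalize(comparison))
    if delta.max() == 0.0:
        return 1.0   # identical sequences after normalization
    # (2) grey relational coefficient at each time point.
    coeff = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    # (3) mean of the coefficients.
    return float(coeff.mean())

def dm_statistic(err1, err2, loss=np.square):
    """DM statistic for one-step forecasts; positive when model 2 has lower loss."""
    d = loss(np.asarray(err1, float)) - loss(np.asarray(err2, float))
    return float(np.mean(d) / np.sqrt(np.var(d, ddof=1) / len(d)))
```

The DM value is then compared with the standard normal quantile; for example, at a 5% significance level the null hypothesis is rejected when the absolute value exceeds about 1.96.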

C. EXPERIMENTAL RESULTS AND ANALYSIS
The components obtained by performing EWT on the two collected datasets are depicted in Figures 6 and 7. For the components possessing larger SE values, the optimal arguments obtained for each component are listed in Table 1. To verify the superiority and availability of the proposed model, representative single forecasting models and decomposition-based combined models are chosen for comparison. The forecasting results of the different models in the two cases are shown in Tables 2 and 3, respectively. Among the single forecasting models, the newly developed deep learning approach, SGRU, achieves better forecasting results than SVR, RKELM, BPNN and GRU in all cases. Taking case 1 as an instance, the RMSE and MAE obtained by SGRU are 66.4231 kW and 54.0535 kW, respectively, which are 52.45% and 45.45% lower than the average values of the remaining single models. The GRA of SGRU is 7.95% and 7.13% higher than that of BPNN and GRU, respectively. It can be concluded that SGRU performs better at excavating the potential information of the raw data. Thus, SGRU is proposed to predict wind power in this article.
As to the decomposition-based combined models, CEEMDAN-RKELM, EWT-RKELM and EWT-SGRU achieve better performance than RKELM and SGRU in all cases, which is attributed to the fact that CEEMDAN and EWT can reduce the non-stationarity of the raw wind power data. For case 2, the RMSE and MAE of EWT-RKELM are 50.77% and 50.61% lower than those of RKELM, respectively, while the GRA of EWT-SGRU is 8.74% higher than that of SGRU. Additionally, EWT-RKELM achieves better forecasting performance than CEEMDAN-RKELM: in case 2, the RMSE and MAE are decreased by 43.6% and 33.14%, respectively, while the GRA is increased by 9.03%. Accordingly, EWT, which possesses better decomposition capability, is selected as the preprocessing approach in this article.
Additionally, the forecasting results of the proposed model are superior to those of EWT-RKELM and EWT-SGRU; in case 2, the RMSE and MAE of the proposed model are 7.89% and 2.78% lower than those of EWT-SGRU, respectively. The comparative experiment demonstrates that the forecasting accuracy can be improved when RKELM and SGRU are employed to analyze the high-frequency series and the low-frequency series, respectively. Therefore, the proposed model can achieve satisfactory results in the field of wind power forecasting.
In this section, the prediction results of the contrastive models are visualized and discussed. The comparison between the proposed model and the other models on the metrics MSE, RMSE (kW), MAE (kW) and GRA is displayed in Figures 8-13. It can be seen from Figures 8, 9, 11 and 12 that the proposed model possesses better forecasting results and shorter error blocks than the remaining models. As to Figures 10 and 13, the maximum GRA values in both cases are obtained by the proposed model, which further testifies that the proposed model achieves higher forecasting accuracy. Subsequently, the fitting curves of the single models and of the decomposition-based combined models are displayed in Figures 14, 16 and Figures 15, 17, respectively. As can be observed from Figures 14 and 16, the fitting curves of the single models diverge from the actual values, while the fitting curves of the decomposition-based combined models in Figures 15 and 17 are relatively close to the actual values. The reason is that the decomposition-based approach can weaken the non-stationarity of the raw wind power series. Moreover, the fitted error distributions of the different models are shown in Figures 18 and 19. Taking case 1 as an example, the proposed model possesses the lowest deviation compared with the remaining models. Finally, the scatter plots in Figure 20 illustrate the correlation distribution of each model, and it can be observed directly that the points of the proposed model are uniformly distributed along the regression line. Similar conclusions can be drawn from Figure 21. Based on the above discussion, the higher forecasting accuracy of the proposed model is verified effectively.

V. CONCLUSION
In this study, a novel structure based on EWT decomposition, the SMA algorithm, sample entropy theory, RKELM and the SGRU network is presented for wind power forecasting. The EWT method is utilized as a preprocessing step to decompose the raw wind power series into three sub-sequences, which are generally regarded as the primary trend, periodic trend and residual trend. The entropy values of these sub-sequences, which possess different complexities, are calculated by sample entropy theory. The sub-sequence with the lowest entropy value is fed into SGRU, a new deep learning structure improved from GRU, for prediction modeling. Meanwhile, the SMA optimization algorithm is exploited to determine the optimal values of d, τ, g and C for the remaining sub-sequences, after which these two sub-sequences, with their optimal parameters, are fed into RKELM to build two further prediction models. Then, the final prediction results are obtained by merging the predictions of all sub-sequences produced by SGRU and RKELM. In summary, the experimental results obtained by the proposed hybrid prediction model on two short-term wind power series from different periods illustrate that the proposed model achieves significant improvements compared with the relevant contrast models.