Research on a Novel Combination System on the Basis of Deep Learning and Swarm Intelligence Optimization Algorithm for Wind Speed Forecasting

Wind speed forecasting plays a significant role in electric power systems because it strongly influences operational efficiency and economic benefits. Many wind speed prediction models have been proposed to improve forecasting performance. However, these models have disregarded the limitations of individual prediction models and the need for data preprocessing, which results in poor prediction accuracy. In this study, a novel forecasting system is proposed. It consists of three modules (a data preprocessing module, an individual forecasting module and a weight optimization module), which together achieve better forecasting ability. In the data preprocessing module, decomposition produces more regular sequences. In the individual forecasting module, deep learning algorithms extract association features. In the weight optimization module, a combination method based on a multi-objective optimization algorithm and nonnegative constraint theory is used to improve prediction effectiveness. The combination model overcomes the limitations of individual prediction models and improves prediction accuracy. The effectiveness of the developed combination system is evaluated on 10-min wind speed data from Penglai, China. Experimental results on three real wind speed datasets indicate that the proposed forecasting system outperforms other traditional forecasting models.


I. INTRODUCTION
With growing attention to clean energy, the use of these resources is rising day by day. At the same time, resource scarcity has become a critical problem. Wind energy is widely distributed and pollution-free, which makes it a major theme of new energy research and development. It is one of the fastest-growing renewables and is considered an alternative to traditional fuel-fired electricity generation. Wind speed prediction is central to wind energy systems and plays a vital role in the supervision of wind farms. Accurate wind speed forecasts are also important for improving wind energy utilization and keeping the electric system stable. Conversely, inaccurate wind speed predictions can lead to unfavorable decisions and cause huge economic losses for wind power systems. Recently, electricity generation from wind has increased greatly. Wind power offers reliability, good performance and low cost, and wind energy use helps reduce air pollution, which is the largest environmental task for most regions and countries [1]. Wind speed forecasts inevitably contain errors, so they pose decision-making challenges for electric system operation [2]. Conventional wind speed prediction approaches focus on the latent features of historical data and on the effects of numerical weather on wind speed [3]. In recent decades, the rapid development of artificial intelligence has produced several models for wind speed prediction, such as artificial neural networks (ANN) [4], [5], fuzzy logic methods [6] and support vector machines (SVM) [7]. Short-term wind speed prediction methods can be divided into four classes [8]: (a) physical approaches; (b) statistical approaches; (c) artificial intelligence methods; and (d) hybrid models.
The physical model is numerical weather prediction (NWP), which mainly analyzes, mines and forecasts from detailed information about the lower atmosphere [9]. The NWP system provides basic information about the wind turbine. The model uses this information to parameterize physical phenomena from the initial conditions and a system of nonlinear partial differential equations, and then obtains a series of meteorological parameters [10]. For example, Wilgan et al. [11] established a comprehensive neutral atmosphere model with high spatial-temporal resolution for prediction. That combined model is based on the NWP model, with high spatial resolution and vertical information about different meteorological parameters. Statistical models learn the prediction model from a large amount of historical data and do not consider the influence of many meteorological factors. Milligan et al. [12], [13] used an ARMA model to forecast the wind power of wind farms in the United States. Maatallah et al. [14] combined the Hammerstein model with an AR model to propose a new wind speed forecasting model, which achieved better prediction effectiveness. Nevertheless, statistical methods assume a linear structure in the time series, so they cannot capture nonlinear structure [15].
In addition, artificial intelligence algorithms have developed rapidly and are now widely applied, and many scholars have used them successfully for wind speed forecasting. Jursa [16] proposed a short-term wind power prediction model based on particle swarm optimization (PSO). Amjady et al. [17] built a prediction model that uses a ridge neural network (RNN) as its forecast engine to predict wind power effectively. The advanced model for very short-term wind power prediction proposed in [18] combines adaptive Bayesian learning with an approximated Gaussian process. In [19], Zhang et al. used a radial basis function (RBF) network together with a swarm intelligence optimization algorithm for interval forecasting of wind speed and achieved high prediction accuracy. Artificial intelligence models have strong nonlinear prediction ability, so they generally outperform time series models [20].
A review of the previous literature shows that each of the above prediction methods has intrinsic shortcomings. These are summarized below: (1) Physical models are easy to build and cost less. However, they require more data than statistical models. Models based on air pressure, terrain and temperature are usually used for long-term wind speed prediction under various weather variables [21].
(2) Statistical methods include ARMA [22], ARIMA [23], fractional ARIMA (FARIMA) [24] and others, and their major restriction is the pre-assumed linear form of the model. Exponential smoothing (ES) [25] is built on the relationships between variables. It uses mathematical statistics to represent the latent correlation among historical samples in wind speed prediction. Nevertheless, however elegant, statistical models based on a linear structure do not capture the nonlinear patterns of wind speed time series well. They achieve higher accuracy only in certain specific cases [26].
(3) Unlike other methods, artificial intelligence algorithms can uncover implicit nonlinear relationships in historical data. They have been widely studied and applied to modeling complex relationships and making accurate predictions. However, these methods still have shortcomings, such as convergence to local optima, overfitting and relatively slow convergence.
(4) Hybrid models integrate two or more methods, such as data preprocessing and a prediction model, and combine the advantages of each. However, they disregard the limitations of individual prediction models, so their forecasting power still needs improvement.
Several combined wind speed prediction models have therefore been proposed to address the shortcomings of the above prediction methods. Bates and Granger proposed combination forecasting theory in 1969 and obtained good results [27]. Since the 1970s, research on combination forecasting models has attracted wide attention [28], [29]. In wind speed forecasting, Xiao et al. reviewed and classified combined wind speed prediction models and proposed two combined models with good prediction ability. The first is a nonnegative constraint theory (NNCT) combined model, and the second is a meta-heuristic-algorithm combined model [27]. In 2017, Wang et al. developed a powerful combination prediction model that includes GRNN, RBF, WNN and BP networks. The model uses the meta-heuristic optimization method MOBA to combine these individual short-term wind speed prediction models. Their results show that this combination model is more accurate and reliable than the individual models.
Generally speaking, a combination forecasting mechanism searches for the best weights by minimizing the sum of squared prediction errors of the member models on the training set. Nevertheless, many conventional wind speed prediction models do not treat accuracy and stability jointly, even though both are equally significant for effective wind speed prediction. Ideally, when an optimization algorithm determines the weights of the member models, the combination model achieves excellent accuracy and stability simultaneously. Multi-objective optimization problems (MOPs) optimize several conflicting objectives simultaneously and have attracted extensive research interest. Unlike single-objective optimization, a nontrivial multi-objective problem yields a range of solutions, rather than a single solution, when all objectives are optimized concurrently.
However, wind speed series are often highly nonlinear, irregular and nonstationary, although they exhibit an upward secular trend. Many traditional prediction models neglect accuracy and stability, yet both are significant for wind speed prediction. The main contributions and innovations of this article are as follows: (1) An effective data preprocessing method is applied before forecasting. This step uses the decomposition and integration technique to reduce the adverse effects of high-frequency noise and to extract the main features of the data, which yields higher prediction accuracy.
(2) A deep recurrent neural network is used to build the combined model. Wind speed predictions from a deep recurrent neural network are more accurate than those from a model that ignores the temporal dependence encoded in the wind speed time series.
(3) A more effective combination-weight optimization method, a multi-objective optimization algorithm, is used to calculate the weights of the member models in the combination system. An MOP usually produces a set of non-dominated solutions, called Pareto optimal solutions. Each point on the Pareto front is a solution in which no objective can be improved without degrading at least one other objective.
(4) A new combination model is built on three hybrid member models and improves wind speed prediction effectiveness. It exploits the advantages of the member models and overcomes the low accuracy and instability of traditional individual models.
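The Pareto-optimality notion in contribution (3) can be made concrete with the minimal sketch below. It assumes NumPy and treats both objectives as minimized. It keeps only the non-dominated candidate solutions from a set of objective vectors. The function names and the (MAPE, RMSE) pairs are hypothetical illustrations, not results from this study.

```python
import numpy as np


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (every objective minimized)."""
    return bool(np.all(a <= b) and np.any(a < b))


def pareto_front(objectives):
    """Indices of the non-dominated rows of an (n_solutions, n_objectives) array."""
    return [
        i for i, candidate in enumerate(objectives)
        if not any(dominates(other, candidate)
                   for j, other in enumerate(objectives) if j != i)
    ]


# Hypothetical (MAPE, RMSE) pairs for four candidate weight vectors.
scores = np.array([
    [3.5, 0.32],   # A
    [3.4, 0.35],   # B: better MAPE, worse RMSE than A, so neither dominates
    [3.6, 0.33],   # C: dominated by A
    [3.9, 0.30],   # D: best RMSE
])
print(pareto_front(scores))  # [0, 1, 3]
```

Solutions A, B and D form the Pareto front. Choosing among them is a trade-off between the two objectives, not a matter of one being strictly better.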
The proposed combination prediction system works as follows: I. In the first stage, the original wind speed series is random and uncertain, so a nonparametric, data-driven and adaptive time series preprocessing technique extracts the main characteristics of the original wind speed sequences.
II. Three individual models (MLP, LSTM and ARIMA) each forecast the preprocessed wind speed sequences.
III. A combined model is built from the prediction results of MLP, LSTM and ARIMA. Fig. 1 shows its structure.
IV. MOPSO is applied to optimize the weights of the member models in the combined model. Fig. 1 also displays the MOPSO architecture.
V. A discussion of the meta-heuristic algorithm and an evaluation of the system's prediction effectiveness further verify its forecasting ability and practical applicability.
The remainder of this paper is organized as follows. Section 2 introduces EEMD and MOPSO, the construction of the combination model from the MLP, LSTM and ARIMA models, and the theory of combination forecasting. Section 3 presents the numerical experiment results of the forecasting system. Section 4 discusses the meta-heuristic algorithm and the forecasting system. Finally, Section 5 presents the conclusions.

II. COMBINED FORECASTING SYSTEM DESIGN
To obtain accurate and stable forecasting results simultaneously, the system developed in this article has three stages: data preprocessing, single-model forecasting, and combined forecasting with multi-objective optimization. MOPSO calculates the weight coefficient of each model to build the combination model. Fig. 1 displays the flowchart of the proposed combination system.

A. ENSEMBLE EMPIRICAL MODE DECOMPOSITION (EEMD)
We first review the empirical mode decomposition (EMD) proposed by Huang et al., which decomposes signals so that nonlinear and non-stationary data can be handled. EMD decomposes a complex original signal into intrinsic mode functions (IMFs) and a residue. In the EMD method [31]:
Definition 1: The data $X(t)$ are decomposed into IMFs $c_j$ and a residue $r_n$ as follows [32]:
$$X(t) = \sum_{j=1}^{n} c_j(t) + r_n(t)$$
The result is a series of IMFs $c_j$ and a residue $r_n$. The IMFs $c_1, c_2, \ldots, c_n$ cover different frequency bands.
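The sketch below illustrates the decomposition in Definition 1 and the noise-assisted ensemble procedure described next, using Python and NumPy. It is a simplified illustration under stated assumptions, not the implementation used in this study. The envelopes are linear interpolations between extrema, whereas Huang et al. use cubic splines. Each IMF gets a fixed number of sifting passes instead of a formal stopping criterion. The number of IMFs, the noise amplitude and the ensemble size are hypothetical parameters.

```python
from typing import Optional, Tuple

import numpy as np


def envelope_mean(y: np.ndarray) -> Optional[np.ndarray]:
    """Mean m(t) of the upper and lower envelopes, or None if y has too few extrema."""
    t = np.arange(len(y))
    d = np.diff(y)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Simplification: linear envelopes through the extrema, pinned to the end points.
    upper = np.interp(t, np.r_[0, maxima, len(y) - 1], np.r_[y[0], y[maxima], y[-1]])
    lower = np.interp(t, np.r_[0, minima, len(y) - 1], np.r_[y[0], y[minima], y[-1]])
    return (upper + lower) / 2.0


def emd(x: np.ndarray, n_imfs: int = 4, n_sift: int = 10) -> Tuple[np.ndarray, np.ndarray]:
    """Simplified EMD returning (imfs, residue) with x == imfs.sum(axis=0) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(n_imfs):
        h = residue.copy()
        for _ in range(n_sift):
            m = envelope_mean(h)
            if m is None:        # (nearly) monotonic: nothing left to sift
                break
            h = h - m            # h(t) = Y(t) - m(t)
        imfs.append(h)           # accept h as the next IMF c_j
        residue = residue - h    # r_j = r_{j-1} - c_j
    return np.array(imfs), residue


def eemd(x, n_trials=50, noise_std=0.2, n_imfs=4, seed=0):
    """EEMD: average the IMFs of X_i(t) = X(t) + w_i(t) over n_trials noise draws."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    imf_sum, res_sum = np.zeros((n_imfs, len(x))), np.zeros(len(x))
    for _ in range(n_trials):
        noisy = x + noise_std * x.std() * rng.standard_normal(len(x))
        imfs, res = emd(noisy, n_imfs=n_imfs)
        imf_sum += imfs
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials


t = np.linspace(0, 10, 500)
x = np.sin(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * 0.3 * t) + 0.1 * t
imfs, residue = emd(x)
print(np.allclose(imfs.sum(axis=0) + residue, x))  # True: Definition 1 holds
```

Each residue is formed by subtracting the accepted IMF, so the reconstruction in Definition 1 holds exactly for a single EMD run. EEMD averages over noisy copies, so its components sum to $X(t)$ plus the mean added noise.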
Definition 2: EEMD first generates an ensemble of datasets by adding different white-noise series $\omega_i(t)$ to the original data. The $i$-th observation is
$$X_i(t) = X(t) + \omega_i(t)$$
EMD is then applied to each of these new datasets. Finally, the ensemble average of the IMFs from the different decompositions is calculated. For a time series $X(t)$, the steps of EEMD are [33]:
Step 1. Generate a new sequence $Y(t)$ by adding white noise to the original sequence $X(t)$.
Step 2. Determine all local maxima and minima of $Y(t)$, and construct its upper envelope $e_u(t)$ and lower envelope $e_l(t)$.
Step 3. Calculate the mean of the upper and lower envelopes:
$$m(t) = \frac{e_u(t) + e_l(t)}{2}$$
Step 4. Take the difference between $Y(t)$ and $m(t)$ as the first component:
$$h(t) = Y(t) - m(t)$$
Step 5. Once $h(t)$ satisfies the IMF conditions, accept it as an IMF $c_j$. Treat the residue $r_j$ as a new series and repeat the above steps to obtain all IMFs $c_j$ and a final residue $r_n$. Summing all IMFs and the residue gives:
$$Y(t) = \sum_{j=1}^{n} c_j(t) + r_n(t)$$

B. MULTILAYER PERCEPTRON (MLP)
An MLP is a feedforward model that maps a set of inputs to a set of outputs. Its weights are adjustable model parameters [34]. An MLP consists of several layers of sigmoid processing neurons (elements) that interact through weighted connections. For the structure shown in Fig. 2, the output of each neuron in every layer except the input layer is:
$$y_j^{h+1} = f\left(\sum_i \omega_{ji}^{h}\, y_i^{h}\right), \qquad f(z) = \frac{1}{1 + e^{-z}}$$
where $\omega_{ji}^{h}$ is the connection weight between the $i$-th neuron in the $h$-th layer and the $j$-th neuron in the $(h+1)$-th layer, and $y_i^{h}$ is the state of the $i$-th neuron in the $h$-th layer. For input-layer nodes, $y_j^{0} = x_j^{0}$, the $j$-th component of the input vector. For a given network weight vector $w$, the squared error over the output vectors to be minimized is:
$$E(w) = \frac{1}{2} \sum_{c} \sum_{j} \left( y_{j,c}^{H}(w) - d_{j,c} \right)^2$$
where $d_{j,c}$ is the desired state provided by the teacher and $y_{j,c}^{H}(w)$ is the state of output node $j$ in layer $H$ for input-output case $c$. One way to minimize $E$ is gradient descent. Starting from an arbitrary set of weights, each weight is updated repeatedly as:
$$\Delta\omega_{ji}^{h}(t) = -\varepsilon \frac{\partial E}{\partial \omega_{ji}^{h}} + \alpha\, \Delta\omega_{ji}^{h}(t-1)$$
Here, $\varepsilon$ is a positive constant that controls the descent step, $0 \le \alpha \le 1$ is the momentum coefficient, and $t$ is the current iteration number. After multiple passes over the training set, the error $E$ in Eq. (8) is reduced to a minimum.
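The sketch below shows the gradient-descent-with-momentum update in action. It is a minimal NumPy illustration, not the Keras networks used in the experiments, which use ReLU activations. The network has one sigmoid hidden layer, a linear output and bias terms, and its error is the $E(w)$ defined above. The layer width, $\varepsilon$, $\alpha$, the epoch count and the synthetic data are all hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, d, n_hidden=8, eps=0.005, alpha=0.9, epochs=2000, seed=0):
    """One sigmoid hidden layer, linear output; returns params and the E(w) history."""
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(0.0, 0.5, (X.shape[1], n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, 1)), "b2": np.zeros(1),
    }
    velocity = {k: np.zeros_like(v) for k, v in params.items()}  # previous Δω
    history = []
    for _ in range(epochs):
        hidden = sigmoid(X @ params["W1"] + params["b1"])
        y = hidden @ params["W2"] + params["b2"]
        err = y - d                                     # y^H_{j,c}(w) - d_{j,c}
        history.append(0.5 * float(np.sum(err ** 2)))  # E(w)
        # Back-propagate dE/dω through the output and hidden layers.
        delta_h = (err @ params["W2"].T) * hidden * (1.0 - hidden)
        grads = {
            "W2": hidden.T @ err, "b2": err.sum(axis=0),
            "W1": X.T @ delta_h, "b1": delta_h.sum(axis=0),
        }
        for k in params:
            # Δω(t) = -ε ∂E/∂ω + α Δω(t-1)
            velocity[k] = -eps * grads[k] + alpha * velocity[k]
            params[k] += velocity[k]
    return params, history


X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)   # synthetic inputs
d = 2.0 * X - 1.0                              # synthetic targets, shape (50, 1)
_, E = train_mlp(X, d)
print(E[0], E[-1])
```

The momentum term reuses the previous step $\Delta\omega(t-1)$. This smooths the descent direction across iterations, which is why $\alpha$ is restricted to $[0, 1]$.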
The main purpose of training is to model the process that generated the data [35]. A model that generalizes well predicts the experimental data well [36]. To assess model performance, the available data must be divided into three subsets: a training set, a testing set and a validation set [37], [38]. The training set is used to compute the network weights and biases. During training, the testing set is used to ensure the generalization ability of the model being trained, and the validation set is used to test the generalization ability of the trained model. An error function assesses the prediction ability of the model by measuring the distance between the network predictions and the targets. Fig. 2 displays the structure of the MLP.

C. LONG SHORT-TERM MEMORY (LSTM)
The LSTM network is a special type of recurrent neural network (RNN) [39], and Fig. 3 shows its internal structure. LSTM provides precise control over what information is written to, and removed from, the hidden-layer memory. It achieves this with three gates that govern the flow of information into and out of the memory cell: the input, forget and output gates [43].
Definition 2: We describe the state of a memory cell at time $t$, and the same connection pattern repeats from time $t-1$ to time $t+1$. As Fig. 3 shows, two vectors are carried over from time $t-1$: the hidden vector $h_{t-1}$ and the memory cell state $s_{t-1}$. The forget gate $f_t$ decides what is removed from the memory cell state. It forces the cell to forget unimportant information, and its parameters are learned by error back-propagation:
$$f_t = \sigma\left(w_{xf}\, x_t + w_{hf}\, h_{t-1} + b_f\right)$$
where $w_{xf}$ is the weight matrix from the input $x$ to the forget gate $f_t$, $w_{hf}$ is the weight matrix from the previous hidden vector $h_{t-1}$ to the forget gate $f_t$, $x_t$ is the input at time $t$, and $b_f$ is the forget-gate bias. The input gate $i_t$ decides how much of each element of the candidate update vector is written to the corresponding memory cell element at time $t$. It does this through the recurrent connection from the hidden vector $h_{t-1}$ and the current input $x_t$:
$$i_t = \sigma\left(w_{xi}\, x_t + w_{hi}\, h_{t-1} + b_i\right)$$
Definition 3: A fully connected tanh layer outputs the candidate update vector, and the input gate scales it:
$$\tilde{s}_t = \tanh\left(w_{xs}\, x_t + w_{hs}\, h_{t-1} + b_s\right)$$
Maintaining the memory cell state over long sequences requires a delicate balance, which the cell achieves by discarding old information and merging in new information. The forget gate scales the old state, and new information enters as the element-wise product of the input gate and the candidate update vector:
$$s_t = f_t \odot s_{t-1} + i_t \odot \tilde{s}_t$$
Finally, for the given time context, the output gate determines which content of the memory cell state is passed to the hidden vector, so that the error is minimized.

Definition 4: The output gate $o_t$ and the hidden vector $h_t$ are calculated as:
$$o_t = \sigma\left(w_{xo}\, x_t + w_{ho}\, h_{t-1} + b_o\right)$$
$$h_t = o_t \odot \tanh(s_t)$$
Every gate applies the sigmoid function $\sigma$ element-wise, so each element of the gate vector lies in the range [0, 1]. Gating multiplies this [0, 1]-valued vector element-wise with another vector. The product specifies what fraction of each element of the second vector passes through the gate, and hence which parts are blocked [40]-[42].
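Definitions 2 to 4 together amount to one LSTM time step, sketched below with NumPy. This is a forward-pass illustration only, with no training. The stacked weight layout (forget, input, candidate, output) and the random initialization are assumptions made for this sketch. Keras, which the experiments use, organizes its weights differently internally.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, s_prev, W, U, b):
    """One LSTM time step.

    W: (4, n_hidden, n_in) input weights, U: (4, n_hidden, n_hidden) recurrent
    weights, b: (4, n_hidden) biases, stacked as forget, input, candidate, output.
    """
    z = [W[k] @ x_t + U[k] @ h_prev + b[k] for k in range(4)]
    f_t = sigmoid(z[0])                  # forget gate
    i_t = sigmoid(z[1])                  # input gate
    s_tilde = np.tanh(z[2])              # candidate update vector
    o_t = sigmoid(z[3])                  # output gate
    s_t = f_t * s_prev + i_t * s_tilde   # memory cell state
    h_t = o_t * np.tanh(s_t)             # hidden vector
    return h_t, s_t


def lstm_forward(xs, n_hidden=5, seed=0):
    """Run the cell over a (T, n_in) sequence with random weights; return h_T."""
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    W = rng.normal(0.0, 0.1, (4, n_hidden, n_in))
    U = rng.normal(0.0, 0.1, (4, n_hidden, n_hidden))
    b = np.zeros((4, n_hidden))
    h, s = np.zeros(n_hidden), np.zeros(n_hidden)
    for x_t in xs:
        h, s = lstm_step(x_t, h, s, W, U, b)
    return h


xs = np.array([[5.1], [5.4], [5.0], [4.8]])   # hypothetical four lagged wind speeds (m/s)
h_T = lstm_forward(xs)
print(h_T.shape)  # (5,)
```

If the forget gate saturates near 1 and the input gate near 0, then $s_t \approx s_{t-1}$. This is how the cell carries information across long sequences.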

D. AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA)
The ARIMA model, proposed by Box and Jenkins [44], is an important prediction model.
Definition 5: The ARIMA model is defined as:
$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$$
where $y_i$ ($i = 1, 2, \ldots, t$) is the actual value (after differencing of order $d$), $\phi_i$ and $\theta_i$ are coefficients, $\varepsilon_i$ ($i = 1, 2, \ldots, t$) is the random error at time $i$, and the integers $p$ and $q$ are the orders of the autoregressive and moving-average polynomials, respectively [45]. The model treats the target series as a random sequence and approximates it with a fitted model. Model building has three stages: model identification, parameter estimation and diagnostic checking. The fitted model is then used to forecast the series.
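A full ARIMA(p, d, q) fit estimates the moving-average coefficients by maximum likelihood, usually with a statistics package. The NumPy sketch below is a lighter illustration of the identify, estimate and forecast cycle. It covers only the ARIMA(p, d, 0) special case: it differences the series $d$ times, estimates an intercept and the AR coefficients by ordinary least squares, forecasts recursively, and then undoes the differencing. The function names are hypothetical, and this is not the estimation procedure used in the study.

```python
import numpy as np


def fit_ar_ls(z, p):
    """OLS fit of z_t = c + phi_1 z_{t-1} + ... + phi_p z_{t-p} + eps_t."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    lags = [z[p - i:n - i] for i in range(1, p + 1)]   # column i holds z_{t-i}
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return coef[0], coef[1:]


def forecast_arima_pd0(y, p, d, steps):
    """ARIMA(p, d, 0): difference d times, fit AR(p), forecast, integrate back."""
    y = np.asarray(y, dtype=float)
    z = np.diff(y, n=d)                     # n=0 returns y unchanged
    c, phi = fit_ar_ls(z, p)
    history = list(z)
    preds = []
    for _ in range(steps):                  # recursive multi-step forecast
        nxt = c + sum(phi[i] * history[-1 - i] for i in range(p))
        history.append(nxt)
        preds.append(nxt)
    preds = np.array(preds)
    for k in range(d, 0, -1):               # undo one differencing step at a time
        preds = np.diff(y, n=k - 1)[-1] + np.cumsum(preds)
    return preds


# A linear trend becomes a constant after one difference, so the forecast continues the trend.
print(np.round(forecast_arima_pd0(2.0 * np.arange(20) + 1.0, p=1, d=1, steps=3), 6))  # [41. 43. 45.]
```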

E. MULTI-OBJECTIVE PARTICLE SWARM OPTIMIZATION (MOPSO)
Step 1. Select $n$ individuals at random from the population. Step 2. Obtain the non-domination rank of each individual. Step 3. Calculate the crowding distance of solutions with equal rank. Step 4. Select the solution with the smallest rank. If more than one individual shares the lowest rank, choose the one with the largest crowding distance.
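Steps 1 to 4 describe the crowded-comparison tournament familiar from NSGA-II. The NumPy sketch below shows one way to compute crowding distances within a front and to run the tournament. The tournament size and function names are illustrative assumptions, and the non-domination ranks are assumed to be computed beforehand.

```python
import numpy as np


def crowding_distance(F):
    """Crowding distance for the rows of F, shape (n_solutions, n_objectives),
    all of which share the same non-domination rank."""
    n, m = F.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        dist[order[0]] = dist[order[-1]] = np.inf   # boundary solutions are always kept
        span = F[order[-1], k] - F[order[0], k]
        if n < 3 or span == 0:
            continue
        for j in range(1, n - 1):
            dist[order[j]] += (F[order[j + 1], k] - F[order[j - 1], k]) / span
    return dist


def tournament_select(rank, crowd, rng, size=2):
    """Draw `size` individuals, keep the lowest rank, and break ties by the
    largest crowding distance. Returns the index of the winner."""
    idx = rng.choice(len(rank), size=size, replace=False)
    best = idx[rank[idx] == rank[idx].min()]
    return int(best[np.argmax(crowd[best])])


F = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])  # one front of two objectives
print(crowding_distance(F))   # boundary points get inf, the two interior points get 4/3
```

Preferring larger crowding distances keeps the selected solutions spread along the Pareto front instead of clustered in one region.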
As in NSGA-II, the first three steps of MOPSO are initialization, fast non-dominated sorting and crowding-distance calculation. The fourth step updates the velocity and position of each particle (chromosome) through Eqs. (13) and (14):
$$v_i(t+1) = w\, v_i(t) + c_1 r_1 \left(pbest_i - x_i(t)\right) + c_2 r_2 \left(gbest - x_i(t)\right) \tag{13}$$
$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{14}$$
where $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration constants toward pbest and gbest, and $r_1$ and $r_2$ are uniform random numbers in [0, 1]. Studies of the inertia weight introduced a linearly decreasing inertia weight into the original particle swarm optimization algorithm, which greatly improves its performance [50]. The linearly decreasing inertia weight is [51]:
$$w = w_{max} - \frac{(w_{max} - w_{min}) \times iteration}{NOG} \tag{15}$$
where NOG is the maximum number of iterations and $iteration$ is the current iteration number. Eq. (15) updates the inertia weight, where $w_{max}$ and $w_{min}$ are the initial and final weights. In this study, we use $w_{max} = 0.9$ and $w_{min} = 0.4$, values investigated by Naka et al. [50] and Kennedy et al. [51]. In particular, the ranks and crowding distances of the new chromosomes are determined first. The pseudocode of MOPSO is given in the Appendix.
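Eqs. (13) to (15) translate directly into a vectorized update, sketched below with NumPy. The defaults $c_1 = c_2 = 2$ are a common convention assumed here, since the values used in the study are not stated. The clipping to [−2, 2] mirrors the variable bounds listed under parameter selection. In MOPSO, the leader playing the role of gbest is usually drawn from the archive of non-dominated solutions. The sketch simply takes it as an argument.

```python
import numpy as np


def inertia_weight(iteration, nog, w_max=0.9, w_min=0.4):
    """Eq. (15): inertia weight decreasing linearly from w_max to w_min."""
    return w_max - (w_max - w_min) * iteration / nog


def pso_update(x, v, pbest, gbest, w, rng, c1=2.0, c2=2.0, lb=-2.0, ub=2.0):
    """Eqs. (13)-(14) for a swarm: x, v, pbest have shape (n_particles, n_vars)."""
    r1 = rng.random(x.shape)                  # uniform random numbers in [0, 1)
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x_new = np.clip(x + v_new, lb, ub)        # keep combination weights inside [lb, ub]
    return x_new, v_new


rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, (200, 3))          # 200 particles, one weight per member model
v = np.zeros_like(x)
leader = x[0]                                 # stand-in for a leader from the archive
x, v = pso_update(x, v, pbest=x.copy(), gbest=leader, w=inertia_weight(0, 1000), rng=rng)
print(x.shape)  # (200, 3)
```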

F. COMBINED FORECASTING THEORY
Combination forecasting combines the individual prediction models with optimal weights, selected according to the forecasting results of each model. Its main purpose is to reduce the weaknesses of single prediction models and improve prediction ability.
Different prediction models have different forecasting abilities, so combining them makes fuller use of the wind speed information. Each prediction model has its own characteristics and reflects a different aspect of the prediction object. No single aspect captures all the features of the prediction object, but none of them can be ignored. Properly combining different individual artificial networks therefore yields better prediction performance. The combined forecast is a weighted sum of the $m$ individual forecasts:
$$\hat{y}_t = \sum_{i=1}^{m} l_i\, \hat{y}_{it}$$
Definition 6: Conventional combination forecasting theory obtains the optimal weights of the combined model by minimizing the SSE:
$$\min\; Q = L^{T} E L \quad \text{s.t.} \quad R^{T} L = 1$$
where $L = (l_1, l_2, \ldots, l_m)^T$ is the weight vector and $R = (1, 1, \ldots, 1)^T$ is a column vector of ones. $E_{ij} = e_i^T e_j$, where $e_i = (e_{i1}, e_{i2}, \ldots, e_{iN})^T$ is the forecast-error vector of model $i$, and $E = (E_{ij})_{m \times m}$ is called the error matrix. Because the weights sum to one, the combined error is $\sum_i l_i e_i$, and its SSE equals $L^T E L$.
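Definition 6 has a closed-form solution when $E$ is nonsingular. The Lagrange conditions give $L^{*} = E^{-1}R / (R^{T}E^{-1}R)$. The NumPy sketch below computes these weights on synthetic data, which is a hypothetical stand-in for the member-model forecasts. Nothing in the solution forces the weights into [0, 1]. This motivates the wider search range discussed next.

```python
import numpy as np


def optimal_weights(y, preds):
    """Minimize L^T E L subject to R^T L = 1.

    y: (N,) observations; preds: (m, N) forecasts of the m member models.
    Returns L* = E^{-1} R / (R^T E^{-1} R), assuming E is nonsingular.
    """
    errors = y[None, :] - preds          # row i is the error vector e_i
    E = errors @ errors.T                # E_ij = e_i^T e_j
    R = np.ones(preds.shape[0])
    Einv_R = np.linalg.solve(E, R)
    return Einv_R / (R @ Einv_R)


rng = np.random.default_rng(0)
y = 6.0 + np.sin(np.linspace(0.0, 20.0, 600))                     # synthetic "wind speed"
preds = y + rng.normal(0.0, [[0.2], [0.3], [0.5]], (3, 600))      # three noisy member forecasts
L = optimal_weights(y, preds)
combined = L @ preds
print(L, L.sum())
```

Each single model corresponds to a unit weight vector that also satisfies $R^T L = 1$. The optimal combination's in-sample SSE can therefore never exceed that of the best member model.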
In Eq. (19), the weight coefficients are not restricted to the interval [0, 1]. The results indicate that the combined model achieves ideal results when the weights take values in the interval [−2, 2]. This weight-determination approach is validated by numerical simulation rather than by theoretical proof.
VOLUME 8, 2020

G. COMBINATION SYSTEM ESTABLISHMENT
Based on the methodologies discussed above, the proposed system consists of three modules: a data preprocessing module, an individual forecasting module and a weight optimization module.
One: Data preprocessing module.
The proposed model uses the EEMD preprocessing technique to obtain a reconstructed sequence by refining and identifying the periodic and oscillatory components of the original signal. The resulting time series has less noise and random volatility and is used in the subsequent forecasting steps.
Two: Individual forecasting module.
Three individual forecasting models (MLP, LSTM and ARIMA) each forecast wind speed, which yields three sets of forecasts. Two of the models are neural networks for nonlinear prediction and one is a linear model, and both linear and nonlinear models perform well in wind speed prediction.
Three: Weight optimization module.
To obtain the weight coefficient of each model, a decision-making weight method based on the MOPSO algorithm and nonnegative constraint theory is proposed. Specifically, the last three days of the training set are held out to determine the weights of the selected models. The algorithm stops when it reaches the maximum number of iterations or the minimum fitness function value. The forecasts of the individual models are then combined according to these weight coefficients to obtain the final wind speed predictions.
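The weight optimization module can be sketched as a two-objective fitness function evaluated on the held-out validation window. The NumPy code below is a hedged illustration. The objectives follow the MAPE and RMSE choice given under parameter selection, and the three-day window (3 × 144 = 432 ten-minute points) follows the text. The function names and the synthetic member forecasts are hypothetical. The MOPSO search itself, which would minimize this fitness over weight vectors in [−2, 2], is omitted.

```python
import numpy as np

SAMPLES_PER_DAY = 144                  # 10-min sampling
VAL_POINTS = 3 * SAMPLES_PER_DAY       # last three days of the training set = 432 points


def fitness(weights, preds_val, y_val):
    """Two objectives for MOPSO: (MAPE in %, RMSE) of the combined forecast."""
    combined = weights @ preds_val
    err = y_val - combined
    mape = 100.0 * np.mean(np.abs(err / y_val))
    rmse = np.sqrt(np.mean(err ** 2))
    return np.array([mape, rmse])


def combine(weights, preds_test):
    """Final forecast: weighted sum of the member-model forecasts."""
    return weights @ preds_test


rng = np.random.default_rng(0)
y_val = 6.0 + rng.random(VAL_POINTS)                          # synthetic validation targets
preds_val = np.vstack([y_val, y_val + 0.3, y_val - 0.5])      # hypothetical member forecasts
print(fitness(np.array([1.0, 0.0, 0.0]), preds_val, y_val))  # [0. 0.]
```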

III. NUMERICAL EXPERIMENTATION
This section first introduces the study area of the three datasets and the data structure. It then presents the model parameter settings and evaluation metrics, followed by the experimental results for the three datasets.

A. STUDY SITE INTRODUCTION AND DATA ANALYSIS
As shown in Fig. 4, Penglai is located in the northeast of Shandong Province, China. Its coastal location and monsoon climate give it abundant wind energy resources. The generation capacity of the Penglai power grid reportedly exceeded 200 million kilowatts in June 2017, ranking among the top in the Shandong power system. In this paper, wind turbine No. 7 of a Penglai wind farm is selected as the experimental object. The measurement point is in a coastal hilly area, at a height of 100 m. The sampling interval is 10 min, i.e., 144 samples per day.
Three datasets from three stations in Penglai are used in the numerical simulations to assess the practicality, effectiveness and generality of the proposed combination system. Each dataset contains 3600 ten-minute wind speed observations, split into a training set and a testing set. No clear theory currently guides the choice of how many training and testing samples to use. Too few samples cannot train a neural network adequately, while too many make the network prone to overfitting. In practice, two thirds of a dataset is usually used for training and one third is retained for testing [52]. In this article, the ratio of training to testing data is 5:1: the first 3000 data points form the training set and the remaining 600 form the testing set. Fig. 4 shows statistical measures of the wind speed datasets, including the minimum, maximum, mean and standard deviation.
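The chronological 5:1 split can be combined with the four-value input layer described under parameter selection. The NumPy sketch below turns a series into lagged input windows and splits them by the time index of the target, without shuffling. The lag count of 4 is read from the input-layer dimension, and the placeholder series and function names are hypothetical.

```python
import numpy as np


def make_windows(series, n_lags=4):
    """Rows [x_j, ..., x_{j+n_lags-1}] paired with the next value x_{j+n_lags}."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags
    X = np.column_stack([series[i:i + n_rows] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def chronological_split(series, n_train=3000, n_lags=4):
    """5:1 split without shuffling: test targets are the points after n_train."""
    X, y = make_windows(series, n_lags)
    cut = n_train - n_lags     # first row whose target falls in the test period
    return X[:cut], y[:cut], X[cut:], y[cut:]


series = np.arange(3600.0)     # placeholder for one 3600-point wind speed dataset
X_tr, y_tr, X_te, y_te = chronological_split(series)
print(X_tr.shape, X_te.shape)  # (2996, 4) (600, 4)
```

The first test window uses the last four training observations as inputs, but no test target ever appears in the training set.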

B. PARAMETER OPTION OF EXPERIMENT AND FORECASTING
The three wind speed datasets A, B and C are used to evaluate the effectiveness of the new combination system. For comparison, the study uses three prediction models, EEMD-LSTM, EEMD-MLP and EEMD-ARIMA, which are the member models of the combined model. In 2011, the National Energy Administration issued the energy industry standard NB/T31046-2011, which sets out rules for wind energy measurement. Under this standard, the time interval of the wind speed data provided by a wind farm must be at least 10 minutes, the maximum error of the next-day wind energy curve forecast should not exceed 0.25, and the RMSE (root mean square error) must not exceed 0.2.

1) OPERATING ENVIRONMENT
The ARIMA method and the MOPSO algorithm run on a 3.20 GHz CPU with 8.00 GB RAM, Windows 7 and MATLAB R2016a. The MLP and LSTM run on a 3.08 GHz CPU with 4 GB RAM, Windows 10, Anaconda 3, TensorFlow 1.2 and Keras 2.0. The parameter settings of the models are specified below.

2) PARAMETER SELECTION
After many trials, the model parameters were set as follows: (a) The LSTM network is built with Keras. Its layer dimensions are: input layer 4, first hidden layer 50, second hidden layer 100 and output layer 1. The other parameters are activation='relu', loss='mean_squared_error', optimizer='nadam', epochs=400, batch_size=16 and verbose=2.
(b) The MLP network is built with Keras. Its layer dimensions are: input layer 4, first hidden layer 12, second hidden layer 8 and output layer 1. The other parameters are activation='relu', loss='mean_squared_error', optimizer='nadam', epochs=400, batch_size=16 and verbose=2. (c) For MOPSO, the fitness functions are MAPE and RMSE. The number of decision variables is 3, the lower and upper bounds of the variables are −2 and 2, the maximum number of iterations is 1000 and the population size is 200.

C. EVALUATION METRICS
Evaluation indicators are needed to verify the prediction ability of the combination model with changing weights. This study adopts eight performance criteria. Five of them measure the average error between forecasted and actual values: AE (average error) [53], MAE (mean absolute error) [54], MAPE (mean absolute percentage error) [55], MSE (mean square error) [56] and RMSE (root mean square error) [57]. The other three, $I_{MAE}$, $I_{MSE}$ and $I_{MAPE}$, express the percentage improvement of the developed combination model over the benchmark wind speed forecasting models. Smaller values of the error criteria indicate better prediction performance. Table 1 gives the details of these metrics.
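Table 1 is not reproduced in this text, so the sketch below assumes the standard definitions of these metrics. It takes AE as the mean signed error (observed minus forecast) and takes each $I$ index as the relative reduction of the error metric against a benchmark. These sign conventions are assumptions for illustration, and the sample arrays are hypothetical.

```python
import numpy as np


def error_metrics(y, yhat):
    """AE, MAE, MAPE (%), MSE and RMSE; AE is taken here as the mean signed error."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(yhat, dtype=float)
    mse = np.mean(e ** 2)
    return {
        "AE": np.mean(e),
        "MAE": np.mean(np.abs(e)),
        "MAPE": 100.0 * np.mean(np.abs(e / y)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }


def improvement(benchmark, proposed):
    """I_metric (%): relative reduction of an error metric versus a benchmark model."""
    return 100.0 * (benchmark - proposed) / benchmark


m = error_metrics([2.0, 4.0, 5.0], [1.0, 4.0, 6.0])
print(round(m["MAPE"], 4))               # 23.3333
print(round(improvement(0.5, 0.4), 4))   # 20.0
```

A positive $I$ value means the proposed model has the smaller error on that metric.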

D. EXPERIMENT I: THE PROPOSED MODEL COMPARED WITH INDIVIDUAL MODELS
This experiment uses datasets A, B and C (10-min wind speed data). For each dataset, the first 3000 samples form the training set, and the combination model is trained to predict the remaining 600 samples. Fig. 5 and Fig. 6 show the time series of the predicted values.

1) EXPERIMENTAL RESULTS DESCRIPTION
This experiment compares the forecasting performance of the proposed model with that of three single models: EEMD-MLP, EEMD-LSTM and EEMD-ARIMA. Tables 2 and 3 and Figs. 5 and 6 show the results, from which the following can be observed: (1) Table 2 gives the statistical properties of the forecasting results of the four models. This statistical description shows the prediction behavior of each model more intuitively. Table 3 lists the prediction errors of EEMD-MLP, EEMD-LSTM, EEMD-ARIMA and the combined model for datasets A, B and C, which reflect the prediction ability of each model. The developed model outperforms all others on every evaluation index. For example, for dataset A, the MAE, MSE, RMSE and MAPE of the combined model are 0.2259, 0.0994, 0.3153 and 3.5271%, respectively, while the other models all have higher errors. Among the three comparison models, EEMD-MLP performs best. Its MAE, MSE and RMSE are each 0.01 higher than those of the combined model, and its MAPE is 0.212 higher. The same conclusion holds for the other datasets, and EEMD-ARIMA performs worst. The combined model therefore improves prediction accuracy. Note, however, that accurate prediction is difficult when the observed wind speed changes sharply (on steep slopes), and prediction performance deteriorates in these cases.
(3) Fig.6 displays the observed wind speed together with the wind speed predicted by the combination model and the other models for datasets A, B and C. The combination model obtains accurate predictions in most cases. Part A shows the sequence of prediction errors, which always fluctuate around 0. In addition, part B shows a strong correlation between the observed wind speed and the wind speed predicted by the combined model. Meanwhile, the predictions for dataset B are more dispersed than those for the other two datasets, which is also confirmed by Table 3.
Remark: According to the results, the developed model has the best performance and better preserves the shape of the original curve. Moreover, the proposed model is the best according to all five error indexes. In general, the combined model exploits the advantages of the three individual models, reduces redundant information and improves on their locally optimal solutions. The prediction accuracy is improved noticeably, and the prediction errors fluctuate only slightly.
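The four error measures reported in Tables 2 and 3 can be computed as in the following minimal sketch. It assumes the standard definitions (MAPE expressed in percent, no zero observations); the function name `error_metrics` is hypothetical and not taken from the authors' code.

```python
import numpy as np


def error_metrics(y, y_hat):
    """Return MAE, MSE, RMSE and MAPE (%) for observations y and forecasts y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mse = float(np.mean(err ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        # Assumes no observation is zero; otherwise the relative error is undefined.
        "MAPE": float(np.mean(np.abs(err / y)) * 100.0),
    }


print(error_metrics([2.0, 4.0], [1.0, 5.0]))
```

Because RMSE is the square root of MSE, the dataset A values quoted above are mutually consistent: the square root of 0.0994 is approximately 0.3153.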

2) EXPERIMENTAL RESULTS ANALYSIS
By comparing the combination model with the three benchmark models, the evaluation metrics are calculated and shown in Table 4 and Fig.7. The analysis leads to the following conclusions: (1) Table 4 clearly shows that, according to the three criteria I_MAE (%), I_MSE (%) and I_MAPE (%), the proposed combination model is more accurate than the other models discussed in this work. The results indicate that the model extracts more information from the wind speed series and therefore predicts more precisely. More concretely, taking the prediction results of dataset C as an example, compared with the three benchmark models, the combined model improves MAPE by 4.8046%, 17.8058% and 37.6125%, MAE by 6.1584%, 17.1700% and 37.4185%, and RMSE by 13.3891%, 34.6316% and 63.7266%, respectively. These results show that the combined model does exploit the three individual models and improves the prediction accuracy effectively.
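The improvement percentages appear to follow the usual relative-reduction formula $I_X = (X_{\text{benchmark}} - X_{\text{proposed}}) / X_{\text{benchmark}} \times 100\%$. Assuming that definition, the short check below uses the dataset C MAPE values listed in Section V and reproduces the MAPE improvements quoted above; the helper name `improvement` is hypothetical.

```python
def improvement(benchmark_error: float, proposed_error: float) -> float:
    """Relative error reduction (%) of the proposed model over a benchmark model."""
    return (benchmark_error - proposed_error) / benchmark_error * 100.0


# Dataset C MAPE values (%) as listed in Section V.
combined_mape = 4.0142
benchmarks = {"EEMD-LSTM": 4.2168, "EEMD-MLP": 4.8838, "EEMD-ARIMA": 6.4343}

for name, mape in benchmarks.items():
    print(f"I_MAPE vs {name}: {improvement(mape, combined_mape):.4f}%")
```

This prints 4.8046%, 17.8058% and 37.6125%, matching the dataset C figures in the text. The same function applies unchanged to MAE, MSE or RMSE.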
(2) Fig.7 includes five parts; the top three boxplots show the errors of EEMD-MLP, EEMD-LSTM, EEMD-ARIMA and the combined model on the three datasets. The error of the proposed model is the smallest, which means that its predictive ability is better than that of the other three models. The point-wise errors of the combination model not only lie close to the zero axis but also have a very small dispersion. Part MAE shows that the surface rises from left to right along the X axis, which means that, for the same dataset, the combined method produces the smallest MAE and EEMD-ARIMA the largest. The other two parts, RMSE and MAPE, likewise show that the combination model is better than the individual models in every respect.
Remark: Compared with the other three models, the combination model achieves the highest prediction precision. The comparison among EEMD-MLP, EEMD-LSTM and EEMD-ARIMA shows that the prediction ability of EEMD-MLP is better than that of the other two individual methods, while EEMD-ARIMA performs worst. For all three datasets A, B and C, EEMD-MLP performs best among the individual models.

E. EXPERIMENT II: THE EEMD COMPARED WITH OTHER PREPROCESSING METHODS
This experiment compares different preprocessing methods: EMD, SSA and WDD, as well as EEMD, which is applied in the proposed model. These models are built to emphasize the importance of selecting an appropriate data preprocessing technique, and the comparison further illustrates the superiority of the proposed model. The prediction errors are shown in Table 5 and Table 6. The details are as follows: (1) In Table 5, the proposed model has the best MAE, MSE, RMSE and MAPE (%). The EEMD-combined, EMD-combined, SSA-combined and WDD-combined models rank from highest to lowest prediction accuracy, with MAPE gradually increasing, which shows that EEMD is the most suitable preprocessing method for these data. Although SSA and WDD do not follow this order in a few cases, EEMD is always the best preprocessing method. The MAPE of the EEMD-combined model is about 3%-5%, the lowest on all three datasets. The results show that the EEMD-combined forecasting model is the best of the compared models.
(2) Table 6 clearly shows that, for the three criteria I_MAE, I_MSE and I_MAPE, the EEMD-combined model improves significantly over the other preprocessing methods. Specifically, taking dataset A as an example, EEMD improves MAPE by 21.0003%, 42.4767% and 43.5149% compared with the other three preprocessing methods, and MAE and RMSE improve by around 30%-50%. For the other two datasets, the prediction accuracy of the proposed model is also significantly improved, which shows that EEMD is indeed more suitable for processing wind speed data than the other preprocessing methods: its signal decomposition is more complete, which effectively increases the ability of the forecasting model.
Remarks: This experiment shows that the EEMD-combined model has the highest prediction accuracy and the best MAPE value, with significant improvements in MAE, MSE and MAPE. In addition, the comparison across different preprocessing methods shows that the proposed model is superior to the other models, which verifies its effectiveness.

IV. ANALYSIS AND DISCUSSION
A great number of statistical and machine learning models can be adopted to forecast time series such as economic growth, currency inflation and wind speed. This section focuses on the efficiency of the developed model in terms of the performance of the meta-heuristic algorithm and the computational ability of the system.

A. META-HEURISTIC ALGORITHM DISCUSSION
To test the performance of the multi-objective optimization algorithm, this section discusses the fitness functions and the convergence of the MOPSO algorithm in detail.

1) FITNESS FUNCTIONS
Accuracy and stability are two commonly used criteria for assessing the performance of a prediction model, and considering only one of them, whether accuracy or stability, is not enough. In the combination prediction model, the optimization of the weight coefficients should therefore target both good accuracy and good stability.
The bias-variance error framework [58] is used to evaluate both the accuracy and the stability of the prediction model. Within this framework, the accuracy and stability of the prediction model are reflected by Bias(ŷ) and Variance(ŷ), respectively. Based on this framework, a fitness function covering both accuracy and stability is established for the optimization algorithm: the difference between the original and the predicted values is treated as the error caused by bias, and the variation of the prediction results as the error caused by variance.
The term $y_i - \hat{y}_i$ denotes the difference between the original value $y_i$ and the predicted value $\hat{y}_i$. The expected value of the predictions is calculated as $E(\hat{y}) = \frac{1}{N}\sum_{i=1}^{N} \hat{y}_i$, and the expected value of the observations as $E(y) = \frac{1}{N}\sum_{i=1}^{N} y_i$, where $N$ is the number of data points compared. The bias-variance architecture is decomposed below: Therefore, the fitness function of the combined model aims to minimize both the bias and the variance of the predictions, that is, to maximize accuracy and stability, and it is obtained as: where a smaller Bias(ŷ) indicates higher prediction accuracy and, similarly, a smaller Variance(ŷ) indicates more stable predictions. The parameters of objectives 1 and 2 are defined in Table 7.
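The two objectives can be illustrated with the following sketch of how a candidate weight vector might be scored. It rests on assumptions not stated in the text: the combined forecast is taken as a weighted sum of the individual forecasts, negative weights are simply clipped to zero as a stand-in for the nonnegative constraint, objective 1 is taken as $|E(\hat{y}) - E(y)|$, and objective 2 as the variance of the errors $y_i - \hat{y}_i$. The names `combine` and `fitness` are hypothetical.

```python
import numpy as np


def combine(forecasts, weights):
    """Weighted sum of individual forecasts (one row per model).

    Negative weights are clipped to zero as a simple stand-in for the
    nonnegative constraint; any normalisation is left to the optimiser.
    """
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    return w @ np.asarray(forecasts, dtype=float)


def fitness(y, y_hat):
    """Two objectives to minimise: |bias| (accuracy) and error variance (stability)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    bias = abs(y_hat.mean() - y.mean())   # |E(y_hat) - E(y)|
    variance = float(np.var(y - y_hat))   # assumption: spread of the errors
    return float(bias), variance


y = np.array([5.0, 6.0, 7.0, 8.0])
forecasts = np.vstack([y + 0.5, y - 0.5])            # two hypothetical biased models
print(fitness(y, forecasts[0]))                      # bias 0.5, zero error variance
print(fitness(y, combine(forecasts, [0.5, 0.5])))    # equal weights cancel the bias
```

A multi-objective optimizer such as MOPSO would call `fitness` on many candidate weight vectors and keep the trade-offs between the two objectives rather than a single scalar score.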

2) CONVERGENCE
Fig.8 shows that the fitness value drops rapidly as the number of iterations increases. EEMD-MLP and EEMD-LSTM converge within 400 iterations, and the convergence performance of the deep neural networks is superior to that of the other comparison models. In the initial stage, the fitness of EEMD-MLP is about 0.45, indicating that the model with random initial parameters performs poorly. As the number of iterations increases, however, the fitness decreases rapidly, indicating that the neural network finds better parameters in a short time. After 200 iterations the fitness no longer changes significantly, indicating that the model has obtained its best parameters. The fitness value of EEMD-LSTM is already very small in the initial state, and it converges to the best region within the first 10 iterations, which shows that the deep neural network mines data features efficiently during learning. In addition, the right side of Fig.8 shows the global optimization ability of MOPSO during the iterations. The two objective functions, defined from the perspectives of accuracy and stability, serve as the fitness functions and jointly determine the final convergence position; the final global optimal solution is the point marked in red in the figure. The overall optimization process takes more than 20 minutes (see Table 8), but for all three datasets the convergence interval is located within the first 5 minutes, and the later iterations move almost entirely within the non-inferior solution set. Table 8 also shows that, under multi-objective optimization, dataset B has a MAPE higher than 0.05 and an RMSE greater than 0.5, both larger than the values for the other two datasets. This indicates that dataset B is more strongly nonlinear and its trend is harder to capture, an inference that is also consistent with the experimental results.
In highly nonlinear prediction, the proposed combination method shows better convergence than the other methods when searching for the optimal threshold of each hidden neuron.
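The "non-inferior" solutions mentioned above are the Pareto-optimal points that a MOPSO keeps in its external archive. The sketch below shows only the dominance test and archive filtering for two minimized objectives; the velocity updates and leader selection of a full MOPSO are omitted, and the candidate (bias, variance) values are invented purely for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep the non-inferior (Pareto-optimal) points, as in a MOPSO external archive."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (bias, variance) values of four candidate weight vectors.
candidates = [(0.10, 0.50), (0.20, 0.20), (0.30, 0.30), (0.05, 0.90)]
print(non_dominated(candidates))
```

Here (0.30, 0.30) is discarded because (0.20, 0.20) is better in both objectives, while the remaining three points represent different accuracy/stability trade-offs from which a final solution is chosen.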

B. DISCUSSION OF THE FORECASTING SYSTEM
This section presents five in-depth discussions of the established forecasting system: data processing, train-to-verify ratios, the Diebold-Mariano test, forecasting effectiveness and the bias-variance test.

1) DATA PROCESSING
EEMD is applied to preprocess the original time series before the models forecast the wind speed. Compared with the original data, the processed data are clearly more regular and stable because random perturbations are removed. The theory of EEMD is described in Section III, and Fig.9 shows the details of the data preprocessing.
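For readers who want to experiment, the following is a deliberately simplified EEMD sketch, not the implementation used in this study. It assumes a fixed number of IMFs, a fixed number of sifting passes instead of a stopping criterion, and envelopes pinned to the series end points; the noise level, ensemble size and the names `emd` and `eemd` are illustrative assumptions. A dedicated EMD library would handle stopping criteria and boundary effects more carefully.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def _mean_envelope(h):
    """Mean of the upper and lower cubic-spline envelopes, or None if h has too few extrema."""
    t = np.arange(len(h))
    peaks = argrelextrema(h, np.greater)[0]
    troughs = argrelextrema(h, np.less)[0]
    if len(peaks) < 2 or len(troughs) < 2:
        return None
    # Simplification: pin both envelopes to the end points to limit edge effects.
    peaks = np.r_[0, peaks, len(h) - 1]
    troughs = np.r_[0, troughs, len(h) - 1]
    upper = CubicSpline(t[peaks], h[peaks])(t)
    lower = CubicSpline(t[troughs], h[troughs])(t)
    return (upper + lower) / 2.0


def emd(x, n_imfs=4, n_sift=10):
    """Simplified EMD: fixed number of IMFs and sifting passes (no stopping criterion)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(n_imfs):
        h = residue.copy()
        for _ in range(n_sift):
            m = _mean_envelope(h)
            if m is None:  # too few extrema left to build envelopes
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue


def eemd(x, n_imfs=4, trials=50, noise_std=0.2, seed=0):
    """Ensemble EMD: average the EMD of many white-noise-added copies of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(trials):
        noisy = x + noise_std * np.std(x) * rng.standard_normal(len(x))
        imfs, res = emd(noisy, n_imfs)
        imf_sum += imfs
        res_sum += res
    return imf_sum / trials, res_sum / trials


t = np.arange(400)
x = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 7)  # synthetic stand-in series
components, trend = eemd(x, trials=20)
print(components.shape)
```

Each EMD run reconstructs its input exactly (IMFs plus residue), so the ensemble average reconstructs the original series up to the averaged added noise, which shrinks as the number of trials grows.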

2) THE TRAIN-VERIFY RATIO
The train-verify ratio indicates the extent to which the latest part of the sequence is used, or its impact on the forecast results. To study the influence of the train-verify ratio, several ratios are configured for the wind speed data of the datasets: 1:1, 2:1, 3:1, 4:1 and 5:1. A large ratio means that more samples are used for training, while a small ratio means that fewer samples are involved in training. The experimental results show that better precision is obtained as the ratio increases, because more up-to-date training data improve the accuracy of training. Nevertheless, the train-verify ratio cannot be expanded indefinitely in practice, because when few verification samples remain, the verification result lacks reliability. Therefore, we suggest choosing a higher ratio while keeping a sufficient number of verification samples.
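The ratio settings above can be sketched as a chronological split, in which the earliest samples train the model and the most recent ones are held out, since shuffling would leak future information into training. The 3600-point series length is an illustrative assumption, chosen because the 5:1 ratio then reproduces the 3000/600 split of Experiment I; `train_verify_split` is a hypothetical name.

```python
from typing import Sequence, Tuple


def train_verify_split(series: Sequence[float], train_parts: int,
                       verify_parts: int = 1) -> Tuple[Sequence[float], Sequence[float]]:
    """Chronological split: earliest samples train, the most recent ones verify."""
    n_train = len(series) * train_parts // (train_parts + verify_parts)
    return series[:n_train], series[n_train:]


series = list(range(3600))  # placeholder for 3600 consecutive 10-min observations
for ratio in (1, 2, 3, 4, 5):
    train, verify = train_verify_split(series, ratio)
    print(f"{ratio}:1 -> {len(train)} train / {len(verify)} verify")
```

The printed sizes (1800/1800 up to 3000/600) make the trade-off concrete: each step toward 5:1 adds training data but leaves fewer points on which to verify.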

3) DIEBOLD-MARIANO TEST
To further assess the effectiveness of the developed combination prediction model, this section applies the Diebold-Mariano (DM) test to evaluate its predictive performance [58].
The DM test is used to verify whether the forecasting effectiveness of the established model differs from that of the other models. The hypotheses are

$$H_0: E\left[F(e^1_t)\right] = E\left[F(e^2_t)\right], \qquad H_1: E\left[F(e^1_t)\right] \neq E\left[F(e^2_t)\right]$$

where $F$ represents the loss function of the prediction error, and $e^1_t$, $e^2_t$ are the prediction error sequences of the two selected models. The DM test statistic is defined as

$$DM = \frac{\frac{1}{n}\sum_{i=1}^{n} d_i}{\sqrt{S^2/n}}, \qquad d_i = F(e^1_i) - F(e^2_i)$$

where $S^2$ is an estimate of the variance of $d_i$. For a given significance level $\alpha$, the DM value is compared with $z_{\alpha/2}$. If the DM statistic falls outside the interval $[-z_{\alpha/2}, z_{\alpha/2}]$, $H_0$ is rejected and $H_1$ is accepted, meaning that the predictive performance of the established model differs significantly from that of the comparison model.
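A minimal sketch of the DM statistic follows, assuming a squared-error loss for $F$ and the plain sample variance of $d_i$ as $S^2$. That choice of $S^2$ is appropriate for one-step-ahead forecasts; multi-step forecasts would normally need an autocorrelation-robust variance estimate, which is not attempted here. The error series and function names are hypothetical.

```python
import math
import statistics
from typing import Callable, Sequence


def dm_statistic(e1: Sequence[float], e2: Sequence[float],
                 loss: Callable[[float], float] = lambda e: e * e) -> float:
    """Diebold-Mariano statistic for two error series under loss F (squared error by default)."""
    d = [loss(a) - loss(b) for a, b in zip(e1, e2)]   # loss differential d_i
    n = len(d)
    d_bar = sum(d) / n
    s2 = statistics.variance(d)   # sample variance of d_i; must be nonzero
    return d_bar / math.sqrt(s2 / n)


def reject_h0(dm: float, z_crit: float = 1.96) -> bool:
    """Reject H0 when DM falls outside [-z_crit, z_crit] (1.96 for 5%, 2.58 for 1%)."""
    return abs(dm) > z_crit


# Hypothetical errors of a benchmark model (e1) and the proposed model (e2).
e_benchmark = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
e_proposed = [0.5, -0.5, 0.1, -0.1, 0.5, -0.5]
dm = dm_statistic(e_benchmark, e_proposed)
print(round(dm, 2), reject_h0(dm, 2.58))
```

A positive DM here means the first model has the larger average loss, so rejecting $H_0$ favors the second (proposed) model; swapping the arguments flips the sign.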
In this part, the DM test is used to verify the validity of the proposed model. The proposed model is compared with each of the other models, and the individual models are also compared pairwise: EEMD-LSTM and EEMD-ARIMA against EEMD-MLP, and EEMD-ARIMA against EEMD-LSTM. According to the principle of the DM test, the null hypothesis states that there is no significant difference in forecasting effectiveness between the two models, while the alternative hypothesis states that there is a significant difference. Table 9 shows the average DM test results over the three datasets.
As Table 9 shows, in multi-step prediction the established combination model differs from the other models at the 1% significance level. In addition, at the 5% significance level, for the comparison with models using different preprocessing methods, the minimum value of |DM| is 0.629747, which leads to rejection of the null hypothesis. Moreover, for the individual models EEMD-MLP, EEMD-LSTM and EEMD-ARIMA, all the values are far greater than $z_{\alpha/2}$ ($z_{0.005} = 2.58$, $z_{0.025} = 1.96$), so at the 1% significance level the combined model differs significantly from the individual models. Hence, the established combination model is significantly superior to the other comparison models.

4) FORECASTING EFFECTIVENESS
This part verifies the forecasting effectiveness of the developed model. The forecasting effectiveness of a model reflects not only its average prediction accuracy but also the variance of that accuracy [59]. The main idea of forecasting effectiveness is given below.
The $k$-th order forecasting effectiveness unit is obtained by

$$m_k = \sum_{i=1}^{n} Q_i A_i^k, \qquad k = 1, 2, \ldots$$

where $Q_i$ is a discrete probability distribution and $A_i$ is the prediction accuracy at time $i$, with $\sum_{i=1}^{n} Q_i = 1$ and $Q_i > 0$. When no prior information about $Q_i$ is available, $Q_i$ is defined as $Q_i = \frac{1}{n}$, $i = 1, 2, \ldots, n$.
The $k$-th order forecasting effectiveness is then given by $H(m_1, m_2, \ldots, m_k)$, where $H$ is a continuous function of the effectiveness units. When $H(x) = x$ is a continuous function of one variable, the first-order forecasting effectiveness is $H(m_1) = m_1$. When $H(x, y) = x\left(1 - \sqrt{y - x^2}\right)$ is a continuous function of two variables, $H(m_1, m_2) = m_1\left(1 - \sqrt{m_2 - m_1^2}\right)$ is called the second-order forecasting effectiveness.
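The first- and second-order effectiveness can be computed as in the sketch below. It assumes uniform weights $Q_i = 1/n$ and defines the accuracy as $A_i = 1 - |(y_i - \hat{y}_i)/y_i|$, set to 0 when the relative error exceeds 1; that definition of $A_i$ is an assumption, since the text does not spell it out. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def accuracy_series(y: Sequence[float], y_hat: Sequence[float]) -> List[float]:
    """A_i = 1 - |relative error|, set to 0 when the relative error exceeds 1 (assumed definition)."""
    acc = []
    for obs, pred in zip(y, y_hat):
        rel = abs((obs - pred) / obs)
        acc.append(1.0 - rel if rel <= 1.0 else 0.0)
    return acc


def effectiveness_unit(acc: Sequence[float], k: int,
                       q: Optional[Sequence[float]] = None) -> float:
    """k-th order unit m_k = sum_i Q_i * A_i**k; uniform Q_i = 1/n when q is None."""
    n = len(acc)
    if q is None:
        q = [1.0 / n] * n
    return sum(qi * a ** k for qi, a in zip(q, acc))


def forecasting_effectiveness(y, y_hat) -> Tuple[float, float]:
    """First order H(m_1) = m_1 and second order m_1 * (1 - sqrt(m_2 - m_1**2))."""
    acc = accuracy_series(y, y_hat)
    m1 = effectiveness_unit(acc, 1)
    m2 = effectiveness_unit(acc, 2)
    # m_2 - m_1**2 is the variance of A_i; clamp tiny negative rounding errors.
    return m1, m1 * (1.0 - math.sqrt(max(m2 - m1 * m1, 0.0)))


print(forecasting_effectiveness([10.0, 10.0], [8.0, 10.0]))
```

Because $m_2 - m_1^2$ is the variance of the accuracy, the second-order value penalizes a model whose accuracy fluctuates: with accuracies 0.8 and 1.0 the mean is 0.9 but the second-order effectiveness drops to 0.81, whereas a constant accuracy leaves both orders equal.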
In this section, forecasting effectiveness is used to assess and compare the prediction accuracy of the developed combination model and the other comparison models. A larger forecasting effectiveness value indicates a more effective model. The mean values over the three datasets are presented in Table 10.
Table 10 shows, on the one hand, that the forecasting effectiveness of the combined model is always higher than that of the other models, for both the first and the second order. Hence, the proposed combination model is clearly more effective than all the other comparison models. On the other hand, the forecasting effectiveness of the EEMD-MLP model is only slightly lower than that of the combined model, making it the second most effective model.

5) BIAS-VARIANCE TEST
The bias-variance framework is selected to assess the effectiveness of the combination model and the individual models. The validity of a forecasting model is a comprehensive criterion that covers both its accuracy and its stability, and considering only one of the two is not sufficient. The absolute value of Bias(ŷ) is the average difference between the observed values y_n and the predicted values ŷ_n over all data points, so a smaller absolute bias indicates more accurate predictions. Likewise, the smaller Var(ŷ), the more stable the predictions. The bias values in Table 11 show that the absolute bias of every other model is greater than that of the combined model, indicating that the combination model is more accurate than the other models. The variance values show that the combination model also has the most stable performance. Overall, the combined model is the most accurate and stable model for wind speed forecasting, and its prediction performance is clearly superior to that of the individual models.

V. CONCLUSION
With the increasing demand for pollution-free renewable energy, the requirements for integrating and managing renewable energy in the electric system are also increasing. In forecasting, accuracy and stability should be regarded as equally important, so it is imperative to develop a technique that achieves satisfactory accuracy and stability simultaneously. However, because of the randomness and intermittency of wind speed, it is hard for a single model to achieve both. To address this problem, this paper develops a combined model based on the EEMD-MLP, EEMD-LSTM and EEMD-ARIMA models and uses the MOPSO algorithm to calculate the weights of the combined wind speed prediction model. In particular, EEMD is applied in each individual model to preprocess the wind speed sequence and improve prediction performance.
In terms of prediction performance, for dataset A the mean MAPE values of the EEMD-MLP, EEMD-LSTM, EEMD-ARIMA and combined models are 4.2318%, 3.7391%, 5.5545% and 3.5271%, respectively. For dataset B they are 5.7187%, 5.1124%, 9.2347% and 5.0176%, and for dataset C they are 4.8838%, 4.2168%, 6.4343% and 4.0142%, respectively. Moreover, the error fluctuation at each prediction point is smallest for the combination model, which shows that the combination model enhances both the accuracy and the stability of the prediction. In this paper, LSTM, a deep neural network and an improved form of the recurrent neural network (RNN), is adopted for wind speed prediction. Furthermore, a deep multi-layer perceptron is employed to enhance the accuracy and stability of wind speed prediction. Both perform well in terms of forecasting accuracy and stability.
An example based on a power grid shows that improving prediction accuracy and stability is of great significance for the grid connection of wind power. The combination model offers higher precision and steady performance, so it can be used in electric system dispatching and brings wide economic and social benefits: for instance, the scheduling plan can be adjusted in time, system reserve capacity can be reduced, power quality can be ensured and environmental pollution can be reduced. In future work, the combination prediction model proposed in this study can be applied to stock index prediction, traffic flow prediction, power load prediction, product sales prediction and other forecasting fields. In addition, the model can be adapted flexibly to the required levels of accuracy and stability.