A Combined Strategy for Wind Speed Forecasting Using Data Preprocessing and Weight Coefficients Optimization Calculation

Wind speed forecasting is an essential procedure in electric grid dispatching. Short-term wind speed forecasting remains a considerable challenge for increasing wind energy output and guaranteeing power safety. Most current wind speed forecasting approaches rely on a single model, generally an artificial intelligence model or a statistical forecasting method. However, no single model performs well in all cases. An effective combined model is proposed in this paper, and the work comprises four parts: weight coefficient optimization calculation based on the nonnegative constraint combination theory, singular spectrum analysis, combined forecasting, and discussion of results. The developed model can reduce the negative influence of weak component models and exploit the advantages of each component model. To evaluate the forecasting accuracy of the proposed model, ten-minute wind speed data from the Shandong Peninsula, China, were used as test cases. The results demonstrate that the developed combined strategy outperforms the individual forecasting methods in terms of forecast performance and stability.


I. INTRODUCTION
The background and motivation, literature review, contributions of our work and organization of this paper are detailed and introduced in this section.

A. BACKGROUND AND MOTIVATION
Due to the increasing amount of industrial and economic activity, people are becoming increasingly dependent on energy. However, traditional fossil fuels can no longer meet the demands of sustainable economic development; thus, people are shifting their focus to renewable energy [1]. Wind power, with the advantages of being renewable and pollution-free, has been universally considered to be one of the most important renewable energy sources [2]. According to the statistics, by the end of 2018, the global cumulative installed wind capacity was approximately 591.55 GW, of which 209,533 MW (about 209.53 GW) was contributed by China, whose share of cumulative installed wind capacity increases annually [3].
However, because wind speed, the most important factor in wind power, is fluctuating and uncertain, wind power generation may be seriously affected by variable and uncontrollable wind speed conditions [4]. Thus, it is important to establish a wind speed prediction system that captures the random and erratic characteristics of wind speed as well as possible.
The associate editor coordinating the review of this manuscript and approving it for publication was Xiaowei Zhao.

B. LITERATURE REVIEW
Recently, with the goal of efficient forecasting, many scientists have presented important forecasting strategies, which can be divided into four categories: i) physical methods, ii) statistical methods, iii) artificial intelligence methods, and iv) combined methods [5]. More specifically, the physical methods use physical considerations (e.g., temperature, topography, and roughness) to achieve wind speed forecasting [6]. For example, Di et al. [7] aimed to increase wind speed forecasting precision by optimizing the parameters of numerical weather prediction (NWP) based on an adaptive surrogate modeling-based optimization (ASMO) approach. To further emphasize the significance of parameter selection, elaborate physical interpretations were given to illustrate the necessity of using the best parameters for the improvement of wind speed forecasting. Zhao et al. [8] recognized that errors in numerical technologies are inevitable; thus, to calculate and assess the inherent errors, the residuals between the forecasting results and the observed series were investigated, and a nonlinear and nonparametric method was presented to correct the raw forecasting results. The experimental results indicated that the error correction of numerical methods could greatly enhance prediction accuracy. However, compared to statistical methods and artificial neural networks (ANNs), physical strategies are less reliable for short-term prediction because they generally need more computational time and many physical considerations [9], [10].
The statistical forecasting methods use statistical equations to predict wind speed. Compared with other prediction methods, statistical strategies are easier to model, and they generally exhibit satisfactory real-time performance [11]. Among previous studies, Aasim et al. [12] proposed a novel autoregressive integrated moving average (ARIMA) model based on a repeated wavelet transform (WT), aiming to enhance wind speed forecasting capacity over very short horizons. In comparisons with other related models at different time scales, the proposed model was shown to be the most effective for very short-term forecasting. Pearre and Swan [13] developed two forecasting methods; one method was based on a statistical correction, and the other method relied on interpolating correction topographies and instantaneous forecast errors. Before forecasting, classification in terms of site-specific corrections was conducted according to wind speed and direction, which effectively enhanced forecasting performance.
With the improvement of forecasting technologies, artificial intelligence methods are being developed and have been shown to represent an improvement over statistical models with regard to wind speed forecasting [14]-[16]. Artificial intelligence strategies mainly include support vector machines (SVMs) [17], fuzzy logic methods [18], and ANNs [19]. Among these methods, ANNs have wider applications in wind speed forecasting, in which feedforward and recurrent networks are the key types. Among feedforward networks, the back-propagation neural network (BPNN) plays a dominant role due to its ability to approximate nonlinear functions. Among recurrent networks, the Elman neural network (ENN) performs well in terms of sensitivity to the original series. This good performance is mainly due to its feedback connection, which enhances its dynamic properties and nonlinear mapping capacities [20]. Support vector machines have also developed rapidly over the past few years, and their forecasting capacity has been shown to be satisfactory. For instance, it was demonstrated that an SVM was superior to an ANN model and an adaptive neuro-fuzzy inference system model [21].
Although intelligent techniques are adept at capturing the nonlinear characteristics of time series, intrinsic challenges remain because of the randomness and instability of raw wind speed data [22]. Therefore, increasing numbers of studies have adopted data preprocessing technology to extract the main features of the time series and filter out noise, which improves wind speed forecasting performance and accuracy.
Data preprocessing technology is adopted to filter out high-frequency noise in the original data, which can greatly enhance the precision of the prediction method. Wavelet decomposition (WD) is a widely utilized technique for denoising nonlinear and unstable time series data. Liu et al. [23] put forward a method including WD, a genetic algorithm (GA) and an SVM. In this method, WD is used to decompose the wind speed data. As a frequently used wind data processing technique, empirical mode decomposition (EMD) uses the Hilbert-Huang transform to sift the data into a series of intrinsic mode functions (IMFs). Liu et al. [24] integrated EMD and an ANN to predict wind speed. It was verified that the EMD-ANN method is more effective than an individual ANN. Considering the drawbacks of EMD, such as mode mixing, ensemble empirical mode decomposition (EEMD) was proposed. In addition, there are many improved methods, such as fast ensemble empirical mode decomposition (FEEMD) and wavelet packet decomposition (WPD). Singular spectrum analysis (SSA) is a new denoising technology; it is efficient in decomposing original data into independent and interpretable components, such as trend, cyclic, oscillatory and noise components [25]. SSA is currently widely used in different fields, particularly in wind speed forecasting, pollution, finance, and economics. For instance, Gao et al. [26] developed a novel hybrid forecasting strategy integrating SSA, a firefly algorithm and an artificial neural network. The developed model provides satisfactory forecasting performance compared with a single artificial neural network. Liu et al. [27] applied SSA to decompose raw data into several components. With the trend component forecasted by a convolutional neural network (CNN) and the detail components forecasted by SVR, the developed hybrid model was shown to outperform the compared methods in wind speed prediction.
The overall disadvantages of the mentioned approaches are summarized below:
(1) Relative to other forecasting methods, a physical method is not the best choice for short-term wind speed prediction. Moreover, substantial physical information is needed, which increases the consumption of time and materials.
(2) Wind speed series are nonlinear and unstable, indicating that a forecasting model should perform under the assumption of nonlinearity. However, the assumption of statistical modeling is that the original series is linear; thus, it is difficult to greatly improve forecasting accuracy.
(3) The artificial intelligence method is an outstanding improvement over other compared models. However, there are still intrinsic drawbacks including overfitting and falling easily into a local optimum.
(4) To filter out the noise in volatile sequences, many data preprocessing approaches have been considered. However, we should not ignore the disadvantages of the previous methods, such as the mode mixing in EMD and the residual noise in EEMD.
33040 VOLUME 8, 2020
To find more high-accuracy prediction models, a great deal of research has been devoted to combined forecasting methods that integrate the abovementioned individual forecasting models. A combined forecasting method was developed in 1969 by Bates and Granger [28], which in general sets the weight of each component model in accordance with its past prediction accuracy [29]. Combined methods can overcome the limitations of individual models and achieve satisfactory accuracy, and they are considered an advanced approach to improving forecasting ability. Regarding weight coefficient optimization in a combined model, many recent studies have sought ways to determine the weights of the component models that achieve high accuracy. The traditional combination method operates on the basis of the minimum sum of squared prediction errors; regardless of whether the component models have different or similar forecasting accuracies, the individual model that provides the best performance is assigned a weight of 1, and the rest are assigned 0. Considering this drawback of the conventional method, a novel combination strategy using nonnegative constraints and artificial intelligence algorithms with a weight-determined combination method (AI-CM-NNTC) was developed by Xiao et al. [30]. The AI-CM-NNTC combined method was verified to be superior to previous combination strategies and component models in prediction performance. For example, Jiang and Liu [31] proposed a new combined system employing variable-weight combination theory for wind speed prediction. An ELM optimized by the multi-objective grasshopper optimization algorithm was applied to combine the predicted values of all hybrid models. The experimental results demonstrated the validity of the proposed combined strategy.
In addition, other AI-CM-NNTC combined methods, including an ELM optimized by a multi-objective salp swarm algorithm (MSSA) [32] and a least-squares support vector machine (LSSVM) optimized by a multi-objective ant lion algorithm (MOALO) [33] have also been employed for wind speed prediction.

C. CONTRIBUTIONS
To achieve better forecasting results in our study, a combined model was developed, which uses the cuckoo search (CS) algorithm to optimize the weight coefficients of each component model based on nonnegative constraint theory. Because an individual forecasting model cannot supply stable and high accuracy in every situation, and because artificial intelligence models generally achieve high precision in wind speed forecasting, the combined model integrates SVM, Elman, BP optimized by the firefly algorithm (FA-BP) and BP based on particle swarm optimization and the genetic algorithm (GA-PSO-BP) into the developed system. Meanwhile, SSA is adopted to preprocess the original data, and ultimately, we employ the CS to optimize the weight of each component model based on nonnegative constraints.
The experiments demonstrate that the proposed combined model is consistently more effective for wind speed forecasting because it is stable and has high accuracy.
The primary contributions of our research for wind speed forecasting are as follows:
(1) Two individual ANNs based on a data preprocessing strategy and two hybrid systems based on a data pretreatment strategy and different optimization algorithms are employed to establish the proposed combined model, which guarantees that the advantages of each component model are exploited. SSA-SVM, SSA-Elman, SSA-FA-BP, and SSA-GA-PSO-BP are simultaneously employed to represent the nonlinear characteristics of the original wind speed series and to improve forecasting performance.
(2) Three different optimization algorithms are used in our study: one algorithm is used to provide the optimal weight for each forecasting component, and the other two algorithms are used to optimize the weights and thresholds of the benchmark neural network in the hybrid component models. The cuckoo search algorithm is adopted to determine the optimal weights due to its excellent optimization ability and simple operation process. GA-PSO is developed to find the global optimal solution and improve the optimization efficiency of the hybrid component model.
(3) Considering that the parameters of the data preprocessing technology can influence forecasting performance to a great extent, a comparison experiment with the combined model and the component models is conducted to determine the appropriate range of the correlation coefficient and the number of components r. This approach of defining the r value can be applied well to the forecasting process.

D. ORGANIZATION
The main content of this paper can be summarized as follows. Section II gives the definition of the SSA methodology. Section III introduces the component forecasting models (SVM, Elman, FA-BP and GA-PSO-BP). The combined model theory is proposed in Section IV. The simulation process and forecasting results are analyzed in Section V and Section VI, respectively. Finally, Section VII draws conclusions.

II. SINGULAR SPECTRUM ANALYSIS METHODOLOGY
SSA decomposes a raw time series into a small number of components, which contain the "clean" sequences, oscillations and noise, on the basis of singular value decomposition (SVD) [34], [35]. The specific process is demonstrated below.

A. FIRST STAGE: DECOMPOSITION
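Step 1: Transform the original time series $X = (x_1, x_2, \ldots, x_N)$ of length $N$ into a trajectory matrix, which is defined by Eq. (1):
$$T = \begin{pmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{pmatrix} \qquad (1)$$
The matrix $T$ is a Hankel matrix of dimension $L \times K$, where $K = N - L + 1$; all entries $x_{ij}$ on an anti-diagonal, where $i + j$ is constant ($i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, K$), are equal.
For example, if we assume that $N = 10$ and $L = 5$, then $K = 6$, and the trajectory matrix is
$$T = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 & x_5 & x_6 \\ x_2 & x_3 & x_4 & x_5 & x_6 & x_7 \\ x_3 & x_4 & x_5 & x_6 & x_7 & x_8 \\ x_4 & x_5 & x_6 & x_7 & x_8 & x_9 \\ x_5 & x_6 & x_7 & x_8 & x_9 & x_{10} \end{pmatrix} \qquad (2)$$
where the dimension of this Hankel matrix is $L \times K$ ($5 \times 6$), and $x_{i,j} = x_{i-1,j+1}$. Specifically, when $i = 2$ and $j = 3$, $x_{2,3} = x_{1,4}$, and both entries equal the series value $x_4$.
Step 2: The eigenvalues of $TT^{T}$ are computed by SVD and sorted in decreasing order, that is, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L \ge 0$. The corresponding eigenvectors are denoted by $(U_1, \ldots, U_L)$. Subsequently, the SVD can be written as Eq. (3):
$$T = T_1 + T_2 + \cdots + T_d \qquad (3)$$
where $T_i = \sqrt{\lambda_i}\, U_i V_i^{T}$, $V_i = T^{T} U_i / \sqrt{\lambda_i}$, and $d$ is the number of nonzero eigenvalues.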

B. SECOND STAGE: RECONSTRUCTION
The reconstruction stage is subdivided into two parts: grouping and averaging.

2) AVERAGING
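The last step is transforming the grouped matrix $T_S$ into a new sequence of length $N$. Define $Y_1 = (y_1, y_2, \ldots, y_N)$ as the converted one-dimensional sequence; the elements of $Y_1$ are given in Eq. (4):
$$y_k = \begin{cases} \dfrac{1}{k} \sum_{m=1}^{k} t^{*}_{m,k-m+1}, & 1 \le k < L^{*} \\ \dfrac{1}{L^{*}} \sum_{m=1}^{L^{*}} t^{*}_{m,k-m+1}, & L^{*} \le k \le K^{*} \\ \dfrac{1}{N-k+1} \sum_{m=k-K^{*}+1}^{N-K^{*}+1} t^{*}_{m,k-m+1}, & K^{*} < k \le N \end{cases} \qquad (4)$$
where $L^{*} = \min(L, K)$, $K^{*} = \max(L, K)$, $t_{ij}$ denotes the entries of $T_S$, and $t^{*}_{ij} = t_{ij}$ if $L < K$ and $t^{*}_{ij} = t_{ji}$ otherwise. In other words, each $y_k$ is the average of the entries of $T_S$ on the anti-diagonal $i + j = k + 1$.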
The operational process of SSA is summarized in Fig. 1, A.
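The two SSA stages can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: it assumes NumPy is available, the function name `ssa_denoise` is hypothetical, and the grouping step is assumed to keep the r leading components.

```python
import numpy as np

def ssa_denoise(x, L, r):
    """Reconstruct a series from its r leading SSA components (illustrative)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1
    # Stage 1, Step 1: trajectory (Hankel) matrix, entry (i, j) is x[i + j].
    T = np.column_stack([x[j:j + L] for j in range(K)])
    # Stage 1, Step 2: SVD of T; squared singular values are the
    # eigenvalues of T T^T, already sorted in decreasing order.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Stage 2, grouping (assumption): sum of the r leading elementary matrices.
    T_S = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Stage 2, averaging: mean of T_S over each anti-diagonal (i + j constant).
    total = np.zeros(N)
    count = np.zeros(N)
    for i in range(L):
        total[i:i + K] += T_S[i, :]
        count[i:i + K] += 1
    return total / count

if __name__ == "__main__":
    series = np.arange(1.0, 11.0)          # N = 10, as in the worked example
    print(ssa_denoise(series, L=5, r=2))   # a linear series has rank 2
```

With N = 10 and L = 5 this builds the same 5 × 6 trajectory matrix as the worked example; keeping all components returns the input series unchanged, while a smaller r discards the trailing components that are treated as noise.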

C. PARAMETER SELECTION
There are two significant parameters that play a vital role: the window length L and the number of components r. The selection of these two parameters directly influences the decomposition of the time series and, in turn, influences the forecasting precision.
• The window length determines the dimension of the Hankel matrix. The L value is in the range N/3 ≤ L ≤ N/2.
• The number of components r defines the number of decomposed series employed for reconstruction. The general criterion for r value selection is based on the contribution of each T_i to the trajectory matrix T: the sum of the contributions of the retained components must reach a preset threshold, for instance above 90% [36], [37]. In our experiments, we use historical data to determine r and then apply it to the final wind speed forecasting; the detailed method will be discussed in Section V.
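The threshold criterion for r can be illustrated as follows. The sketch assumes that the contribution of a component is its eigenvalue's share of the sum of all eigenvalues, which is a common convention but is not stated explicitly in the text; `choose_r` is a hypothetical helper name.

```python
def choose_r(eigenvalues, threshold=0.90):
    """Smallest r whose leading eigenvalues reach the contribution threshold.

    Assumption: the contribution of component i is lambda_i / sum(lambda).
    """
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for r, value in enumerate(lam, start=1):
        running += value
        if running / total >= threshold:
            return r
    return len(lam)

if __name__ == "__main__":
    # Cumulative shares are 0.5, 0.8, 0.9, 1.0, so r = 3 for an 85% threshold.
    print(choose_r([5.0, 3.0, 1.0, 1.0], threshold=0.85))
```

This rule gives only a starting point; the paper instead tunes r on historical data, as described in Section V.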

III. WIND SPEED FORECASTING MODELS
Numerous ANNs have been applied to wind speed prediction, including several classical forecasting models and several forecasting models based on stochastic heuristic optimization algorithms. In our paper, Elman, SVM, FA-BP, and GA-PSO-BP are chosen as the component prediction models used to build the developed combined model.

A. ELMAN NEURAL NETWORK (ENN)
The ENN is a partial recurrent neural network introduced by Elman in 1990 [38]. An ENN consists of four layers: input, hidden, output and feedback layers. The trait of time-delay memory increases the sensitivity of the ENN to historical data, a local feedback network internally enhances the capacity of the ENN to address dynamic information, and these together give ENNs an advantage over static neural networks in modeling hydraulic systems [39], [40].
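The role of the feedback layer can be made concrete with a small forward pass. The sketch below is an assumption-laden illustration, not the trained network from the experiments: the function name and argument layout are hypothetical, the hidden activation is tanh (mathematically the same as MATLAB's tansig), and the output is linear.

```python
import math

def elman_forward(sequence, w_in, w_ctx, b_hid, w_out, b_out):
    """Run an Elman network over a sequence of input vectors (illustrative).

    w_in[j]  : input weights of hidden unit j
    w_ctx[j] : feedback (context) weights of hidden unit j
    """
    n_hidden = len(b_hid)
    context = [0.0] * n_hidden          # feedback layer starts at zero
    outputs = []
    for x in sequence:
        hidden = []
        for j in range(n_hidden):
            net = b_hid[j]
            net += sum(w * xi for w, xi in zip(w_in[j], x))
            net += sum(w * c for w, c in zip(w_ctx[j], context))
            hidden.append(math.tanh(net))
        # Linear output layer.
        outputs.append(b_out + sum(w * h for w, h in zip(w_out, hidden)))
        context = hidden                # hidden state is fed back at the next step
    return outputs

if __name__ == "__main__":
    # One input, one hidden unit: the second output depends on the first input
    # only through the feedback layer.
    print(elman_forward([[1.0], [0.0]], [[1.0]], [[1.0]], [0.0], [2.0], 0.5))
```

Because the previous hidden state enters the next step, the second output in the example is nonzero-driven even though the second input is zero, which is the time-delay memory described above.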

B. SUPPORT VECTOR MACHINE (SVM)
The SVM was first developed by Vapnik in 1979 [41] on the basis of structural risk minimization and statistical learning theory. For nonlinear regression, the SVM uses a kernel function to map the data into a high-dimensional feature space, which transforms the nonlinear regression problem into a linear regression problem.

C. FORECASTING MODELS BASED ON A BPNN
BPNN is a multilayer mapping network with numerous applications. However, the shortcoming of BPNN is that the network cannot forecast precisely because the training result easily falls into a local optimum. For this reason, many stochastic heuristic optimization algorithms, such as CS, the FA and the PSO algorithm, have been used for optimizing BPNN. These optimization algorithms can overcome the shortcomings of BP, namely easily falling into a local optimum, missing the globally optimal solution and slow convergence. Li et al. [42] optimized the initial weights and biases in BPNN with an adaptive GA. In addition, some improved or hybrid algorithms have been proposed to address the weaknesses of the basic algorithms; for instance, the authors of [43] proposed an improved cuckoo search (ICS) by changing the method of adjusting $p_a$ and $\alpha$. The experimental results verified the validity of the developed algorithm.
Chen et al. [44] used a hybrid algorithm that combines PSO with an artificial fish swarm algorithm for neural network training, and their proposed hybrid algorithm was shown to be efficient. In this paper, we use the firefly algorithm and a hybrid GA-PSO algorithm to optimize the initial parameters in BP.

D. BACK PROPAGATION NEURAL NETWORK BASED ON THE FIREFLY ALGORITHM
The firefly algorithm (FA) was initially proposed by Yang [45] for solving optimization problems; it is inspired by the flashing behavior of fireflies, in which the fireflies are attracted to each other by flashing lights. The FA can be used to address many difficult optimization problems.

E. BACK PROPAGATION NEURAL NETWORK OPTIMIZED BY A HYBRID GA-PSO ALGORITHM
PSO was suggested by Kennedy and Eberhart in 1995 [46]. The flowchart of PSO is presented in Fig. 1, B. However, the diversity of PSO is insufficient, so it is difficult to find a global optimum with a small population. Nevertheless, the selection, crossover and mutation in GA can compensate for this deficiency. The genetic algorithm, developed by Holland [47], is a heuristic and stochastic optimization algorithm using evolutionary and genetic theories. The genetic algorithm can quickly converge to achieve optimization by genetic operations. Fig. 1, C, shows the flowchart of GA. Our hybrid algorithm using GA-PSO is developed in this paper; the GA-PSO algorithm can use the advantages of the component algorithms to improve optimization efficiency.
The steps of the developed optimization algorithm can be summarized as follows:
Step 1: There are N particles in each generation, and their fitness values are assessed and ranked. The N/2 best particles are enhanced by performing PSO steps, and these N/2 particles are carried over to the next generation.
Step 2: The remaining N/2 particles are discarded. In the GA process, N/2 new individuals are then created by crossover and mutation applied to the surviving N/2 PSO members.
Step 3: A new population is formed with N particles, whose fitness values are assessed and ranked once more.
A rudimentary GA-PSO algorithm is outlined in Algorithm 1.
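The three steps above can be sketched as a small optimizer. This is an illustration under stated assumptions, not the paper's Algorithm 1: the function name `ga_pso_minimize`, the inertia and acceleration constants (0.7, 1.5), the uniform crossover, and the mutation rate are all hypothetical choices, and a generic objective `f` stands in for the BP training error.

```python
import random

def ga_pso_minimize(f, dim, n=20, iters=200, bounds=(-5.0, 5.0), seed=0):
    """Minimise f over a box with a simple GA-PSO hybrid (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    half = n // 2

    def clip(v):
        return min(hi, max(lo, v))

    # Each particle is [position, velocity, personal best position].
    swarm = []
    for _ in range(n):
        p = [rng.uniform(lo, hi) for _ in range(dim)]
        swarm.append([p, [0.0] * dim, p[:]])
    gbest = min((s[0] for s in swarm), key=f)[:]

    for _ in range(iters):
        # Step 1: rank all particles; the better half takes a PSO step.
        swarm.sort(key=lambda s: f(s[0]))
        elites = swarm[:half]           # Step 2: the worse half is discarded
        for pos, vel, pbest in elites:
            for d in range(dim):
                vel[d] = (0.7 * vel[d]
                          + 1.5 * rng.random() * (pbest[d] - pos[d])
                          + 1.5 * rng.random() * (gbest[d] - pos[d]))
                pos[d] = clip(pos[d] + vel[d])
            if f(pos) < f(pbest):
                pbest[:] = pos
        # Step 2 (GA part): crossover and mutation among the PSO survivors.
        children = []
        while len(children) < n - half:
            a, b = rng.sample(elites, 2)
            child = [a[0][d] if rng.random() < 0.5 else b[0][d]
                     for d in range(dim)]
            if rng.random() < 0.2:      # mutate one coordinate
                d = rng.randrange(dim)
                child[d] = clip(child[d] + rng.gauss(0.0, 0.1 * (hi - lo)))
            children.append([child, [0.0] * dim, child[:]])
        # Step 3: the new population again has n particles.
        swarm = elites + children
        best = min((s[2] for s in swarm), key=f)
        if f(best) < f(gbest):
            gbest = best[:]
    return gbest, f(gbest)

if __name__ == "__main__":
    sphere = lambda v: sum(c * c for c in v)
    print(ga_pso_minimize(sphere, dim=3))
```

In the hybrid component model, `f` would be the training error of a BP network as a function of its initial weights and thresholds; the sphere function is used here only so the sketch runs on its own.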

IV. THE COMBINED FORECASTING MODEL
From a conceptual perspective, weighting-based combined methods can be defined as combining several independent forecasting models and assigning each of them an appropriate weight that represents the effectiveness of that component. The general prediction process is shown in Fig. 2, A. The process of these approaches can be summarized as follows.
$$\hat{F}_{t+h} = \sum_{i=1}^{M} w_i \hat{f}_{i,t+h|t}$$
where $\hat{F}_{t+h}$ is the final predicted value for $h$ steps ahead at time $t$; $\hat{f}_{i,t+h|t}$ is the predicted value of the $i$th component at $h$ steps ahead at time $t$; $M$ is the number of individual strategies; and $w_i$ is the weight coefficient of the $i$th individual model, which satisfies $w_1 + w_2 + \cdots + w_M = 1$. For weight coefficient calculation, the traditional combination method based on the minimum sum of the squared forecast error has been shown to be inefficient; therefore, this paper develops a combined model using nonnegative constraint theory and CS to decide the weights of the component models.
Definition 2: The combined approach, on the basis of nonnegative constraint theory and the CS algorithm weight-determined combination method, is given as follows: The weight vector has a maximum value of 1 and a minimum of 0, and the sum of the weights is approximately 1.
Let $x_i^{t}$ be the current solution for the $i$th cuckoo; then, the new solution $x_i^{t+1}$ is generated as follows:
$$x_i^{t+1} = x_i^{t} + \alpha \oplus \text{Lévy}(\lambda)$$
d) Keep the best solution $X = (x_1, x_2, x_3, x_4)^{T}$, which represents the weight coefficients. Thus, the combined forecasting result is:
$$\hat{F}_{t+h} = \sum_{i=1}^{4} x_i \hat{f}_{i,t+h|t}$$
Fig. 2 demonstrates the construction of the proposed combination strategy. The rudimentary CS algorithm is outlined as pseudocode in Fig. 2, B. Table 1 presents the parameters of the CS.
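A minimal sketch of weight determination with cuckoo search is given below. It is not the configuration in Table 1: the function names, the nest count, the step size `alpha`, the abandonment fraction `pa`, and the use of the sum of squared errors as the objective are assumptions, and the constraint is enforced here by clipping each weight to [0, 1] and rescaling so that the weights sum to 1. Lévy steps are drawn with Mantegna's algorithm.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """One Levy-distributed step via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0) or 1e-12    # avoid division by zero
    return u / abs(v) ** (1 / beta)

def project(w):
    """Clip weights to [0, 1] and rescale them so that they sum to 1."""
    w = [min(1.0, max(0.0, x)) for x in w]
    s = sum(w)
    return [x / s for x in w] if s > 0 else [1.0 / len(w)] * len(w)

def sse(weights, forecasts, actual):
    """Sum of squared errors of the weighted combination."""
    return sum((sum(w * f[t] for w, f in zip(weights, forecasts)) - y) ** 2
               for t, y in enumerate(actual))

def cuckoo_weights(forecasts, actual, n_nests=15, iters=300,
                   pa=0.25, alpha=0.05, seed=0):
    """Search combination weights with a basic cuckoo search (illustrative)."""
    rng = random.Random(seed)
    m = len(forecasts)

    def cost(w):
        return sse(w, forecasts, actual)

    nests = [project([rng.random() for _ in range(m)]) for _ in range(n_nests)]
    best = min(nests, key=cost)
    for _ in range(iters):
        for i in range(n_nests):
            # Levy flight: x_new = x + alpha * Levy step, then re-project.
            trial = project([x + alpha * levy_step(rng) for x in nests[i]])
            j = rng.randrange(n_nests)  # compare with a randomly chosen nest
            if cost(trial) < cost(nests[j]):
                nests[j] = trial
        # Abandon the worst fraction pa of nests and rebuild them at random.
        nests.sort(key=cost)
        for k in range(int((1 - pa) * n_nests), n_nests):
            nests[k] = project([rng.random() for _ in range(m)])
        best = min(nests + [best], key=cost)
    return best

if __name__ == "__main__":
    actual = [5.0, 5.5, 6.0, 5.8, 5.2]
    f1 = actual[:]                      # a perfect component model
    f2 = [y + 2.0 for y in actual]      # a biased component model
    print(cuckoo_weights([f1, f2], actual))
```

In the toy example the search should shift almost all weight to the first component, since it reproduces the observations exactly.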

V. SIMULATION PROCESS AND ANALYSIS
For this study, to further investigate the stability and credibility of the developed strategy, wind speed data from three different wind turbines in Penglai, China, are used to demonstrate the performance of the developed model.

A. DATA COLLECTION
Penglai city lies in Shandong Province in eastern China, and it is adjacent to the Yellow Sea and the Bohai Sea. The location of the research sites is presented in Fig. 3, A. The city has a monsoon climate with distinct continental climate features; its annual average wind speed is 5.2 m/s and it is not affected by tropical storms, typhoons or other extreme weather.
With the aim of fully verifying the proposed combined model in terms of stability and universality, we choose wind speed data from different turbines and from different months, which helps to show the applicability of the combined model to different months and turbines.
The total available wind speed data include 1748 samples, of which 1508 samples (approximately 251.3 hours) are training data, and 240 samples (40 hours) are test data. A rolling forecasting strategy is applied to predict the wind speed over the next 40 hours (240 test points). To forecast more accurately, the 8 previous ten-minute observations are employed to predict the wind speed at the next step; that is, the input datasets are {x(t − 8), x(t − 7), . . . , x(t − 1)} and the output datasets are {x(t)}, t = 9, 10, . . . , 1748, where x denotes the wind speed time series and the real value is continuously employed to update the latest value. This principle applies to all models presented in this paper. The rolling process can be seen in Fig. 4, part 3.
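The input/output construction and the rolling one-step forecast can be sketched as follows. The function names are hypothetical, and `predict` stands for any fitted component model that maps the 8 most recent observations to the next value; a persistence rule is used in the example only so that the sketch runs on its own.

```python
def make_windows(series, lag=8):
    """Build (input, target) pairs: x(t-lag), ..., x(t-1) -> x(t)."""
    inputs = [list(series[t - lag:t]) for t in range(lag, len(series))]
    targets = [series[t] for t in range(lag, len(series))]
    return inputs, targets

def rolling_forecast(series, predict, n_test=240, lag=8):
    """One-step-ahead forecasts for the last n_test points.

    Each forecast uses the observed values of the preceding `lag` steps,
    so the real value is always used to update the input window.
    """
    start = len(series) - n_test
    return [predict(series[t - lag:t]) for t in range(start, len(series))]

if __name__ == "__main__":
    series = [float(i) for i in range(1, 1749)]   # stand-in for 1748 samples
    X, y = make_windows(series)
    print(len(X), X[0], y[0])
    # Persistence model (placeholder): the next value equals the last observed one.
    preds = rolling_forecast(series, lambda window: window[-1])
    print(len(preds))
```

With 1748 samples and a lag of 8, this yields 1740 input/output pairs, the last 240 of which correspond to the test period.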

B. ASSESSMENT INDICATORS FOR PREDICTION EFFECTIVENESS
To assess the forecasting ability of our developed combined strategy, three performance evaluation indices must be introduced to comprehensively measure the proposed model.
The three indices adopted in this study are the mean absolute error (MAE), mean square error (MSE) [48] and mean absolute percent error (MAPE) [49]. The detailed definitions of the indicators are given in Table 2.
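For reference, the three indices can be computed as below. The sketch uses the standard textbook definitions, which are assumed to match those in Table 2; MAPE is expressed in percent and is undefined when an observed value is zero.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    """Mean square error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percent error, in percent (requires nonzero actual values)."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)

if __name__ == "__main__":
    observed = [2.0, 4.0]
    predicted = [1.0, 5.0]
    print(mae(observed, predicted), mse(observed, predicted),
          mape(observed, predicted))
```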

C. EXPERIMENT PREPARATION
Before the experiment, the parameters in the model and optimization algorithms must be defined. For the Elman neural network, the numbers of input layer and output layer neurons are set to 8 and 1, respectively, and the number of hidden layer neurons is set to 9. The training function is trainlm, the hidden layer function is tansig, and the output layer uses purelin as the output function; the details are given in Table 3.
For the FA-BP model and GA-PSO-BP model, based on Kolmogorov's theorem that 2i + 1 hidden neurons can map any function of i inputs, the number of nodes in the hidden layer is set to 17 (i = 8 inputs, so 2 × 8 + 1 = 17). Table 4 shows the parameters of FA.
The number of variables for the GA-PSO is eight. The details are given in Table 5.

D. DATA PREPROCESSING AND SIMULATION
To eliminate instability and to guarantee that the forecasting performance is reliable and does not depend on randomly initialized parameters, we repeated the experiments for each model ten times and then averaged the values for the three indices. The experiments were conducted in MATLAB R2014b on Windows 7 with a 3.30 GHz Intel(R) Core(TM) i5-4590 64-bit CPU with 8 GB of RAM.

1) DATA PREPROCESSING IN OUR DEVELOPED MODEL
As the two most important parameters in the SSA algorithm, the values of L and r directly affect the accuracy of forecasting. The general rule for L value selection is that it is in the range N/3 ≤ L ≤ N/2, where N denotes the length of the sequence, so in this paper, the window length L is set to 700. Because the wind speed is elusive and changeable, denoising should keep the main characteristics and the integrity of the information while filtering out noise. Therefore, to find a suitable r value, we determine r by studying historical data and then apply it to the final forecasting. The idea is to use BP to forecast the previous wind speed while adjusting the r value in each of the five cases and recording the corresponding MAPE values. Finally, we compare the MAPE values and determine the r value. A BP neural network is used to find the r value for two reasons: first, two of the component models in the combined model are based on a BP neural network; second, all four component models are artificial intelligence models, so a BP neural network yields a more representative r value.
First, the historical data (1748 samples) in each of the five cases are divided into two parts: one part is used for training, and the other part is used for forecasting; the number of nodes in each layer and the other parameters of the BP neural network are consistent with the settings above. Second, we start from r = 50, increase the r value in steps of 10, record the forecasting accuracy, and compare the precision under the different r values. Finally, we obtain a suitable r value. By means of these experiments, r was set to 220, 220, 220, 240, and 210 for Cases 1-5, respectively. The process of denoising is shown in Fig. 4, part 2.
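The search over r amounts to a one-dimensional grid search on the MAPE. A minimal sketch is shown below; `select_r` is a hypothetical helper, and `mape_for_r` is a placeholder for the full pipeline (SSA reconstruction with r components, BP training, and evaluation on the held-out historical data). The toy error curve in the example is invented purely so that the sketch runs and is not experimental data.

```python
def select_r(candidates, mape_for_r):
    """Return the candidate r with the lowest MAPE, plus all recorded scores."""
    scores = {r: mape_for_r(r) for r in candidates}
    best_r = min(scores, key=scores.get)
    return best_r, scores

if __name__ == "__main__":
    # Placeholder error curve with a minimum at r = 220 (not real results).
    toy_mape = lambda r: 3.0 + abs(r - 220) / 100.0
    best_r, scores = select_r(range(50, 251, 10), toy_mape)
    print(best_r)
```

Because each evaluation requires retraining the network, the cost of this search grows linearly with the number of candidate r values, which is why a later experiment starts from a larger initial r.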

2) SIMULATION PROCESS OF OUR DEVELOPED MODEL
To demonstrate the prediction capacity of our proposed model, multiple experiments are performed to compare the accuracy of all forecasting results. The experimental process consists of the following steps:
(a) Use SSA to decompose and reconstruct the raw data, where the r value is chosen by studying the historical data (training data). We conduct this step with the aim of enhancing the forecasting precision of our proposed model.
(f) Based on experimental studies, the proposed model carries out the wind speed forecasting computation.
Fig. 4 is the schematic of our developed combined strategy.

VI. EXPERIMENTAL ANALYSIS AND COMPARISON
In this section, to test the proposed model globally, four simulations are carried out to demonstrate the effectiveness of the developed model.

A. EXPERIMENT I: ANALYSIS AND COMPARISON OF INDIVIDUAL MODELS
The optimization algorithm can determine the optimal parameters for BP to enhance its global search performance; thus, hybrid algorithms based on FA and GA-PSO are used. Table 6 presents the forecasting results of BP, FA-BP, and GA-PSO-BP in the five cases.
As we can see from Table 6, the performance of individual BP is always stable; it follows that the set of experimental parameters in the BP model is reasonable. On the other hand, the values of the MSE, MAE and MAPE of the hybrid approaches are superior to those of BP. For instance, in Case 1, the MAPE of BP is 6.71%; however, the MAPE of FA-BP and GA-PSO-BP are 6.65% and 6.66%, respectively.
The simulation results for SSA-BP, SSA-FA-BP, and SSA-GA-PSO-BP are listed in Table 7. Fig. 5 shows the MAPE values obtained from each involved model in every case. Table 7 and Fig. 5 show the following:
(1) SSA improves performance: all the models have more satisfactory forecasting precision than in Table 6, which indicates that the r value obtained in this paper is effective.
(2) The overall performance of SSA-BP is not completely satisfactory, and higher volatility appears when SSA-BP and algorithm optimization are combined.
(3) BP already yields good forecasts after the SSA technique is applied, so the optimization algorithm can improve the result only marginally. It is clear that, compared with the optimization algorithm, SSA contributes more to improving the forecasting accuracy.
Table 8 presents the forecasting results of Elman, SSA-Elman, SVM and SSA-SVM in two different situations.
In the five cases, the value of MAPE is reduced by 1.42%, 2.87%, 2.03%, 2.26% and 2.39% for Elman and 2.56%, 2.52%, 2.64%, 3.74% and 2.70% for the SVM. Clearly, SSA greatly reduces the forecasting errors, which also shows that data preprocessing technology is vital for wind speed forecasting.
In addition, for wind speed forecasting, the data preprocessing technique is extremely necessary. For example, the average MAPE value of the proposed strategy in Case 1 is 3.35% lower than that of the weighting-based combined model without SSA.

C. EXPERIMENT III: COMPARISON OF DIFFERENT WIND TURBINES AND MONTHS
To fully test the proposed combination model for stability and universality, Cases 1-3 provide measurement data at the same time for different turbines, and Cases 4 and 5 are from different months. We can see from Table 10 that the combined method using SSA and the weight coefficient optimization calculation achieves higher accuracy. For all turbines and months, under the five sets of data, the MAPE values are always approximately 3%, and the performance of the combined model is stable.

D. EXPERIMENT IV: TEST WITH R VALUES
To determine whether the r value obtained by studying the historical data is appropriate for the final test set and to identify a reasonable range of the correlation coefficient as well as the sum of component contributions for future wind speed forecasting in Penglai, we carried out Experiment IV. The experimental results are shown in Tables 11-15. The MAPE (BP) represents the performances of FA-BP and GA-PSO-BP. The correlation coefficient denotes the degree of relevance between the reconstructed time series and the true time series. The MAPE is the mean absolute percent error between the reconstructed time series and the true time series. Table 11 shows that when the r value is 50, the forecast errors of all the models are high; as r increases, the forecasting accuracy improves, but when r exceeds 220, the forecasting accuracy deteriorates. Thus, we can see that the proposed approach of defining r is effective and reasonable.
It is noteworthy that when r is small, the reconstructed time series is a poor description of the original series, which leads to poor forecasting performance; thus, to save time in the experiment, we set r = 170 as the initial value in the succeeding cases. For Cases 2 and 3, as Tables 12 and 13 show, all the models achieve their best forecasting accuracy when r = 220, and the trends in Tables 12 and 13 agree with those in Table 11. Table 14 indicates that all the models provide the best forecasting accuracy when r = 240. However, the MAPE values of the SVM are all 3.83% when r = 190, 200, and 240, so the forecasting results of the SVM show a different trend.
The test results of the single models for different r values in Case 5 are shown in Table 15. When r = 210, the forecasting accuracy of BP and Elman is the best across the experimental cases, and the SVM also obtains a higher accuracy. This situation shows that the method of defining r has some limitations: it cannot obtain the best r value for every model, but it can supply an appropriate r value that reduces the forecasting error.
In addition, through the series of experiments and Fig. 6, it can be concluded that: (a) When the r value is small, the forecasting accuracy is undesirable because the reconstructed time series is a poor description of the original series. As r increases, the forecasting accuracy improves or is sustained. When r exceeds a certain value, the forecasting accuracy approaches that obtained with the original time series data because the reconstructed series is almost the same as the original series.
(b) The method of defining the r value proposed in this paper can apply to the forecasting process well, even if it cannot apply the best r value for every model.
For wind speed forecasting in the Penglai area, the appropriate range of the correlation coefficient between the original series and the reconstructed series is approximately 99.5%, and the sum of the r component contribution is at least 99%; these results can provide a reference for the selection of r for the wind farm in Penglai.
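The selection of r discussed in this experiment can be illustrated with a short Python sketch. It reconstructs a series from its first r SSA components and returns the smallest candidate r that meets both thresholds. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the function names `ssa_reconstruct` and `select_r`, the window length, and the definition of the component contribution as the share of squared singular values are all assumptions introduced for illustration.

```python
import numpy as np


def ssa_reconstruct(series, window, r):
    """Reconstruct `series` from its first r SSA components.

    Returns the reconstructed series and the contribution of the first r
    components, taken here (an assumption) as the share of squared
    singular values of the trajectory matrix.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    if not 1 <= r <= min(window, k):
        raise ValueError("r must lie between 1 and min(window, n - window + 1)")
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Rank-r approximation built from the leading r singular triples.
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]
    # Diagonal averaging maps the matrix back to a one-dimensional series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    recon /= counts
    contribution = float(np.sum(s[:r] ** 2) / np.sum(s ** 2))
    return recon, contribution


def select_r(series, window, candidates, min_corr=0.995, min_contribution=0.99):
    """Return (r, correlation, contribution) for the smallest candidate r
    that satisfies both thresholds, or None if no candidate does."""
    x = np.asarray(series, dtype=float)
    for r in sorted(candidates):
        recon, contribution = ssa_reconstruct(x, window, r)
        corr = float(np.corrcoef(x, recon)[0, 1])
        if corr >= min_corr and contribution >= min_contribution:
            return r, corr, contribution
    return None
```

The default thresholds mirror the values reported for Penglai (a correlation coefficient of approximately 99.5% and a contribution sum of at least 99%); for another site they would need to be re-estimated from historical data.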

E. EXPERIMENT V: THE DETERMINATION OF FORECASTING SUB-MODELS
In this section, to verify the effectiveness of the selected forecasting sub-models of the combined model, we conducted two simulation experiments on the forecasting performance of several individual forecasting models (ARIMA, BP, SVM, ELM, Elman, and a general regression neural network (GRNN)) and of some hybrid models using different optimization algorithms (GA-BP, PSO-BP, FA-BP, and GA-PSO-BP). To enhance the reliability of the forecasting results, we repeated the simulation for each model ten times and took the mean values of the three evaluation criteria as the final criteria values. The detailed forecasting results are shown in Tables 16 and 17. From these tables, we can observe that: (1) Among the six individual forecasting models, the SVM provides the best forecasting performance in Cases 1 to 5, followed by BP and Elman. The forecasting capacity of ARIMA, ELM, and GRNN is inferior to that of SVM, BP and Elman in all cases. For instance, the mean MAPE values of SVM, BP, Elman, ARIMA, ELM, and GRNN are 7.11%, 7.31%, 7.52%, 7.96%, 8.47%, and 8.94%, respectively. Thus, the SVM and Elman, as two effective individual forecasting models, are used to construct our developed combined model.
(2) Moreover, to overcome the drawbacks of BP (i.e., it easily falls into local optima, misses the globally optimal solution, and converges slowly) and to further enhance the forecasting performance of the sub-models, the weights and thresholds of the BP neural network are optimized by optimization algorithms, including GA, PSO, FA, and GA-PSO. The comparative results of hybrid models integrating the BP neural network and these four optimization algorithms are presented in Table 17, from which we can observe that optimization algorithms can enhance the forecasting capacity of a BP neural network to some extent, with lower MAE, MAPE, and MSE values. Furthermore, the optimization performance of FA and GA-PSO is better than that of GA and PSO. Thus, the hybrid FA-BP and GA-PSO-BP are also adopted as sub-models to build the final combined model. In brief, the SVM, Elman, FA-BP, and GA-PSO-BP are used as benchmark forecasting models to construct the combined model.
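The averaging procedure used in this experiment (ten repeated simulations per model, with the mean of MAE, MAPE, and MSE taken as the final value) can be sketched as follows. The function names `evaluate` and `mean_over_runs` are hypothetical, and the metric formulas are the standard textbook definitions, assumed here to match those used in this paper.

```python
import numpy as np


def evaluate(actual, forecast):
    """Return MAE, MSE, and MAPE (in percent) for one forecast run."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": float(np.mean(err ** 2)),
        # MAPE is undefined if any actual value is zero.
        "MAPE": float(np.mean(np.abs(err / actual)) * 100.0),
    }


def mean_over_runs(actual, run_forecasts):
    """Average each criterion over repeated runs of the same model."""
    scores = [evaluate(actual, forecast) for forecast in run_forecasts]
    return {name: float(np.mean([s[name] for s in scores])) for name in scores[0]}
```

In use, `run_forecasts` would hold the ten forecast series produced by retraining one model ten times on the same data, and the returned dictionary would supply that model's row in a results table.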

F. EXPERIMENT VI: COMPUTATION TIME OF THE TRAINING MODEL
In this section, we used 1508 samples as training data and calculated the running time for training the proposed combined strategy and all compared models in all datasets. The computation results are presented in Table 18, from which we can obtain the following results: (1) The running time of training individual models, such as BP, ELM, and Elman, is obviously shorter than that of the hybrid models and our proposed combined model.
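A training-time measurement of this kind can be sketched in Python as below. This is only an illustration under assumptions: `time_training` is a hypothetical helper, `train_fn` stands for any model's training routine, and the timing procedure actually used for Table 18 is not described in this section.

```python
import time


def time_training(train_fn, *args, repeats=1, **kwargs):
    """Return the mean wall-clock time in seconds of calling
    train_fn(*args, **kwargs), averaged over `repeats` calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        train_fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)
```

Averaging over several repeats reduces the influence of background load on the measured time, which matters when comparing fast individual models with slower hybrid ones.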

G. TESTS WITH DIFFERENT CAPACITY
To further demonstrate the effectiveness of our developed combined model in wind speed forecasting with different capacity, we conduct a discussion in this section, which considers the original wind speed data with lengths of 874, 1748, 3496, and 5244. The detailed results are presented in Table 19. From the table, we can observe that the proposed combined model can be used in wind speed forecasting with different capacity, because the evaluation criteria values in the five cases vary within reasonable limits. For example, the MAPE values are within the range of 3%∼5%. Moreover, when the data length is 1748, the forecasting capacity of our developed combined model is best with MAPE values of 3.20%, 4.01%, 3.98%, 3.12%, and 3.84% from case 1 to case 5, respectively. Too few data samples, such as 874, may cause a high forecasting error due to inadequate training, while too many data samples can enhance forecasting accuracy but result in a much higher cost. Thus, in our paper, we adopted 1748 samples to verify the forecasting performance of our developed combined model. The simulation results demonstrate that our developed combined model can provide accurate and stable wind speed forecasting results.

VII. CONCLUSION
Anticipating the characteristic of wind speed with precision and accuracy is significant because of the increasingly crucial effect of wind energy in power systems. Wind speed data are intermittent and random, which increases forecasting difficulty to a great extent. Most previous studies have made efforts to develop a precise and stable forecasting model. However, many of these studies cannot achieve the desired accuracy across various sites and various wind speed time series. Motivated by recent progress in combined model forecasting, we developed a combined strategy based on data preprocessing and parameter optimization calculation, which is able to achieve precise forecasting for different turbines and months. In this paper, ten-minute wind speed series of three wind turbines in Penglai, China, were adopted as cases to validate the effectiveness and stability of the model.
We evaluated and compared the combined model with the component models and obtained the appropriate range of the correlation coefficient and the sum of the contribution of r components to define the r value. Based on empirical results and discussion, the following conclusions can be drawn. (a) The forecasting capacity of our proposed model is the best among the component forecasting models, with MAPE values of 3.20%, 4.01%, 3.98%, 3.12%, and 3.84% from case 1 to case 5, respectively, which are improvements of 2.49%, 1.29%, 1.74%, 2.36%, and 1.60%, respectively, over the worst comparative model, SSA-Elman; (b) The SSA data pretreatment technique shows strong performance in dealing with erratic and fluctuating wind speed series, which contributes greatly to improving forecasting performance and accuracy. The MAPE values of the proposed combined model without SSA are 6.55%, 6.53%, 7.34%, 7.48%, and 7.33%, and those with SSA are 3.20%, 4.01%, 3.98%, 3.12%, and 3.84%, from case 1 to case 5, respectively; (c) The method of defining the r value proposed in this paper also applies to the forecasting process. In addition, for wind speed forecasting in the Penglai area, the appropriate range of the correlation coefficient between the original series and the reconstructed series is approximately 99.5%, and the sum of the contribution of the r component is at least 99%. Our developed combined model can effectively integrate the advantages of different sub-models and further enhance the forecasting accuracy and stability. Therefore, the proposed model can be utilized in wind speed forecasting with great promise in the future and provide a reference for the management of large wind farms.
The proposed combined system has two main drawbacks. First, other variables that affect wind speed forecasting are not included. Second, only forecasting accuracy is considered in the fitness function of the optimization algorithm; forecasting stability, which is also important in optimization problems, is ignored. To overcome these drawbacks, more potential factors that may impact wind speed forecasting performance must be investigated. Moreover, to enhance optimization ability, more novel and valid optimization algorithms with high universality must be explored in future research.
TONGJI GUO is currently pursuing the Ph.D. degree with the Jiangxi University of Finance and Economics, Nanchang, China. His research interests are energy economic analysis and health statistics.
LIFANG ZHANG received the bachelor's degree in economics from Qingdao University, Qingdao, China, in 2018. She is currently pursuing the master's degree with the Dongbei University of Finance and Economics, Dalian, China. Her main research interests are the economic loss assessment of haze pollution in a big data environment, wind speed forecasting, data mining, and machine learning, and her work tends to be multidisciplinary in approach. She has published two SCI-indexed academic articles.
ZHENKUN LIU received the bachelor's degree in science from Shandong Technology and Business University, Yantai, China, in 2018. He is currently pursuing the master's degree with the Dongbei University of Finance and Economics, Dalian, China. His research interests include time series analysis, applied statistics, artificial intelligence, high-dimensional data analysis, and energy prediction theory and methods. He has published two SCI-indexed academic articles in top international journals.
JIANZHOU WANG is currently working as a Professor at the Dongbei University of Finance and Economics, Dalian, China, and a Doctoral Supervisor of big data at the School of Engineering and Information Technology, University of Technology Sydney, Sydney. He has published more than 150 academic articles in his academic career, and in the past five years, he has published 100 articles in SSCI and SCI retrieval, most of which have been published in top international academic journals. His research interests include economic loss assessment of haze pollution in big data environment, big data mining, artificial intelligence prediction theory and method, energy economic analysis, financial mathematical model, and complexity analysis. VOLUME 8, 2020