Short-Term Wind-Speed Forecasting Based on Multiscale Mathematical Morphological Decomposition, K-Means Clustering, and Stacked Denoising Autoencoders

Wind energy plays an increasingly important role in economic development. In this study, we propose a hybrid short-term wind-speed forecasting model comprising multiscale mathematical morphological decomposition (MMMD), a K-means clustering algorithm, and stacked denoising autoencoder (SDAE) networks. First, in contrast to traditional signal-decomposing tools, the original wind-speed sequence is decomposed directly in the time domain into a series of subsequences with different frequencies and fluctuation levels using the adaptive multiscale mathematical morphological algorithm. The signal does not need to be transferred from the time domain to the frequency domain; hence, the accuracy can be considerably improved. Moreover, this is the first study that uses a time-domain signal-decomposing tool in a hybrid wind forecasting model. Next, the data are split into clusters of subsequences with similar frequencies and fluctuation levels using the K-means algorithm. The characteristics of each cluster are then captured using the SDAE as the core forecasting unit. Finally, the predictions of all subsequences are aggregated to obtain the final wind speed. Data from two real wind turbines are used to evaluate the performance of the proposed model, and the forecasting results are compared at multiple time scales with five benchmark models, namely, backpropagation neural network (BPNN), SDAE, mathematical morphology–backpropagation, mathematical morphology–SDAE, and K-means–SDAE, and with two novel hybrid wind forecasting models, namely, wavelet transform (WT)-K-means-SDAE and variational mode decomposition (VMD)-K-means-long short-term memory networks (LSTMs). The results demonstrate that, although the prediction accuracy of the proposed model decreases as the forecasting horizon increases, the proposed model outperforms the other existing models. At the same time, the proposed model significantly increases the prediction accuracy of wind-speed forecasting and can serve as a reference for future research in this area.


I. INTRODUCTION
Developing a low-carbon economy and striving for sustainable development have become a common aspiration of a rapidly progressing human society. Recent developments in renewable energy have made wind the third-largest power source after coal and hydropower. However, wind power is naturally characterized by rapid fluctuations and intermittency that severely restrict its large-scale development. Therefore, formulating a reasonable wind power scheduling plan based on accurate prediction and coordinating it with traditional energy supplies are the most important steps toward realizing wind power scaling and regularization [1]-[3]. Wind-speed forecasting forms the basis of wind power forecasting and has been studied separately. Wind speed is not only characterized by abundant noise, nonlinearity, and nonstationarity but also, more importantly, presents wideband multiscale characteristics. Therefore, extracting robust and stable characteristics from complex wind-speed signals is quite challenging. The current level of wind-speed prediction is insufficient to meet the requirements of actual engineering applications [4]. Many recent studies have focused on wind-speed forecasting models at multiple scales [5]. Based on time scales, these forecasting models can be separated into very short-term, short-term, medium-term, and long-term forecasting. Very short-term forecasting ranges from a few seconds to 30 min ahead. Short-term forecasting ranges from 30 min to 6 h ahead. Medium-term forecasting ranges from 6 h to 1 day ahead. Lastly, long-term forecasting ranges from 1 day to 1 week or more ahead. These models can be further classified into physical, statistical, intelligent, and hybrid models based on forecasting approaches [6], [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Canbing Li.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Physical approaches perform wind-speed prediction using a detailed physical description of the atmosphere. The most common method is numerical weather prediction (NWP), which has been widely adopted to predict wind-speed conditions. NWP predicts the atmospheric motion state and weather phenomena for a certain period in the future by numerically solving the equations of fluid mechanics and thermodynamics that describe the weather evolution process. However, the computation process is time-consuming, and the performance suffers for short-term predictions [8], [9].
Statistical methods use statistical theory to establish the forecasting model based on historical data. Such methods are easy to model and are not based on any predefined conditions. The conventional model is an autoregressive integrated moving average (ARIMA) model. Yatiyana et al. [10] constructed a wind power forecasting model based on ARIMA time series. Their method achieved good prediction results, but the prediction process was time-consuming owing to a single ARIMA model being employed to simultaneously complete multiple tasks. In [11], many different statistical models are proposed to predict wind speed and power. However, statistical methods are not suitable to deal with nonlinear patterns.
Intelligent models have demonstrated good performance in processing nonlinear patterns, and artificial neural networks (ANNs) are the most prevalent predictors in intelligent models. Peng et al. [12] proposed using a stacked denoising autoencoder (SDAE) model to predict wind power. They used the data from real wind turbines to evaluate the performance of the proposed model compared to two different models, namely backpropagation neural network (BPNN) and support vector machines, in terms of prediction results. The comparison illustrated that SDAE has a more robust prediction ability to deal with nonlinear data. A bidirectional gated recurrent unit-based deep learning model demonstrated superior wind power forecasting in [13], and the results were verified using real data from a wind farm. Wang et al. [14] introduced a deep belief network (DBN) on a multi-dimensional phase space to predict wind power. Compared with other benchmark models, using data from a real wind turbine, the DBN model showed better ability when processing nonlinear systems. Lin et al. [15] used an improved DBN model with genetic algorithms for wind-speed prediction with increased accuracy.
Conventional ANNs have serious drawbacks such as falling into local minima and overfitting. Several hybrid models combining different approaches have been widely applied to solve these problems. The signal-decomposing technique has been widely used to construct hybrid models. Sun et al. [16] introduced a hybrid model integrating variational mode decomposition (VMD), K-means clustering, and long short-term memory networks to forecast wind power. The prediction result demonstrated its superior performance when compared with six different benchmark models. In [17]-[19], wavelet transform (WT) was used as a signal-decomposing tool for the input signal. However, both VMD and WT accomplish data processing in the frequency domain, which requires transferring the data from the time domain to the frequency domain and back, thereby increasing signal error. Chen et al. [20] proposed a wind-speed prediction model based on multiscale mathematical morphology and support vector regression. The mathematical morphology algorithm can decompose wind-speed sequences into a series of subsequences directly in the time domain with large improvements in accuracy. However, the structure of their entire model is simple and offers poor prediction performance.
This study introduces a novel short-term wind-speed forecasting model based on multiscale mathematical morphological decomposition, K-means clustering, and SDAE. The contributions of this study can be summarized as follows:
1. A multiscale mathematical morphology algorithm is used to decompose wind-speed sequences into a series of subsequences in the time domain.
2. A K-means clustering algorithm is applied to classify the data into different clusters of subsequences with similar frequencies and fluctuation levels.
3. SDAE, a deep learning method, is proposed as the core forecasting unit to capture each cluster's characteristics.
4. The predictions of all subsequences are aggregated to obtain the final predicted wind speed.
5. Data from two real wind turbines are used to comparatively evaluate the performances of the proposed model, five different benchmark models, and two novel hybrid wind forecasting models.
The remainder of this article is organized as follows. Section II presents the methodologies used in this study. Section III evaluates the case studies of the proposed wind-speed forecasting framework. Finally, Section IV presents concluding remarks.

A. MULTISCALE MATHEMATICAL MORPHOLOGICAL DECOMPOSITION
Mathematical morphology (MM) is a nonlinear analysis method based on strict mathematical theory. The basic principle is to use a probe, called a structuring element (SE), to move through the signal and perform basic operations to extract useful feature information. The basic operations of MM include dilation, erosion, opening, and closing [21]-[23].
In the decomposition process, the SE has a function similar to that of a filtering window for general signal processing. The signal is extracted when the shape of the signal matches the shape of the SE. In practical applications, the shape of the SE can be determined according to the target signal characteristics. The common SE types include linear, triangular, circular, and cosine.
Let $f(n)$ be the pending signal, a discrete function over the domain $D_f = \{0, 1, 2, \cdots, N\}$, and let $g(n)$ be the 1-D SE, a discrete function over the domain $D_g = \{0, 1, 2, \cdots, P\}$. Both $N$ and $P$ are integers, and $N \geq P$. The morphological operators (i.e., erosion and dilation) can be defined as
$$(f \ominus g)(n) = \min_{m \in D_g} \{ f(n+m) - g(m) \}$$
$$(f \oplus g)(n) = \max_{m \in D_g} \{ f(n-m) + g(m) \}$$
where $\ominus$ and $\oplus$ denote the erosion and dilation operators, respectively. Two other basic morphological operators, namely, the opening and closing, can be further defined based on dilation and erosion, as follows:
$$(f \circ g)(n) = ((f \ominus g) \oplus g)(n)$$
$$(f \bullet g)(n) = ((f \oplus g) \ominus g)(n)$$
where $\circ$ is the opening operator, and $\bullet$ is the closing operator.
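The four basic operators can be illustrated with a minimal NumPy sketch. It assumes the standard min/max definitions of grayscale erosion and dilation with the SE indexed from 0 to P, and it simply ignores SE taps that would fall outside the signal; the boundary handling, the function names, and the toy signal are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def erode(f, g):
    """Erosion: min over m of f(n + m) - g(m), using only in-range samples."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    n = len(f)
    out = np.full(n, np.inf)
    for m in range(min(len(g), n)):
        # f(n + m) exists for n in [0, n - m)
        out[:n - m] = np.minimum(out[:n - m], f[m:] - g[m])
    return out

def dilate(f, g):
    """Dilation: max over m of f(n - m) + g(m), using only in-range samples."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    n = len(f)
    out = np.full(n, -np.inf)
    for m in range(min(len(g), n)):
        # f(n - m) exists for n in [m, n)
        out[m:] = np.maximum(out[m:], f[:n - m] + g[m])
    return out

def opening(f, g):
    """Opening: erosion followed by dilation (removes narrow peaks)."""
    return dilate(erode(f, g), g)

def closing(f, g):
    """Closing: dilation followed by erosion (fills narrow valleys)."""
    return erode(dilate(f, g), g)

# Toy signal (assumed for illustration): a flat line with one narrow spike.
signal = np.array([1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0])
flat_se = np.zeros(3)  # flat SE of length 3

opened = opening(signal, flat_se)  # the one-sample spike is removed
closed = closing(signal, flat_se)  # the spike is preserved
```

In this example the opening suppresses the one-sample spike because it is narrower than the SE, whereas the closing leaves the signal unchanged, which is the shape-matching behaviour described above.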
The nonlinear and nonstationary characteristics of the wind that determine the wind speed span multiple time scales. We therefore introduce a multiscale morphology analysis based on the traditional single-scale morphology filter. By defining different SE sizes, we perform omnidirectional scans of the wind-speed curve and extract fluctuation characteristics at different scales. In this way, we can depict the morphological characteristics of the wind-speed curve hierarchically.
Let $f$, $g$, and $T$ denote a discrete pending signal, the SE of the MM, and a morphological operator, respectively. The multiscale morphological operation is based on the set $\{T_s \mid s > 0, s \in \mathbb{Z}\}$, where $s$ is a positive integer representing the scale of the SE. Accordingly, the multiscale erosion and dilation operators can be respectively expressed as follows:
$$(f \ominus g)_s = f \ominus sg$$
$$(f \oplus g)_s = f \oplus sg$$
where $sg = g \oplus g \oplus \cdots \oplus g$ ($s-1$ times). The opening and closing operations of the multiscale morphology are respectively defined as
$$(f \circ g)_s = (f \ominus sg) \oplus sg$$
$$(f \bullet g)_s = (f \oplus sg) \ominus sg$$
The signal decomposition process can be regarded as a multiple filtering process. The SEs of different scales can adapt to different signal shapes. Before the signal decomposition, two preparatory tasks need to be performed: choosing a suitable SE size and choosing a suitable filter.
The size of the SE is determined by the signal length and height. The local peak values of the original signal $X = \{x_n \mid n = 0, 1, 2, \cdots, N-1\}$ are calculated, where $N$ is the signal length. Let $P = \{p_n \mid n = 0, 1, 2, \cdots, N_p\}$ be the series of peaks, where $N_p$ is the number of peaks. The peak interval $I$ is defined as $I = \{i_n \mid i_n = p_{n+1} - p_n, n = 0, 1, 2, \cdots, N_p - 1\}$. $L_{max}$ and $L_{min}$ denote the maximum and minimum lengths of the SEs, respectively; in their definitions, $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ represent rounding toward infinity and rounding toward minus infinity, respectively.
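The peak series and peak intervals can be computed with a few lines of NumPy. This is a sketch only: it assumes that the peaks are identified by their sample positions, and the comparison rule used to detect a local peak (strictly greater than the left neighbour, not smaller than the right one) is an illustrative choice. The formulas that turn the intervals into the SE length limits are not reproduced here.

```python
import numpy as np

def local_peak_positions(x):
    """Return the indices of the interior local peaks of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    interior = np.arange(1, len(x) - 1)
    is_peak = (x[interior] > x[interior - 1]) & (x[interior] >= x[interior + 1])
    return interior[is_peak]

def peak_intervals(x):
    """Return the distances i_n = p_{n+1} - p_n between consecutive peaks."""
    return np.diff(local_peak_positions(x))

# Toy wind-speed samples (assumed values, for illustration only).
x = np.array([0.0, 2.0, 1.0, 0.0, 3.0, 1.0, 1.0, 4.0, 0.0])
peaks = local_peak_positions(x)
intervals = peak_intervals(x)
```

Short intervals indicate fast fluctuations and call for short SEs, whereas long intervals call for long SEs, which is why the interval set bounds the SE length.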
The length and height of the SEs at each scale $j$ are then determined. To compensate for the shortcomings of the morphological open-closing (OC) and close-opening (CO) filters, they are often combined to form a hybrid alternating filter known as the OCCO filter, expressed as
$$h_{occo}(f) = \frac{1}{2}\left[ OC(f) + CO(f) \right]$$
where $OC(f) = (f \circ g) \bullet g$ and $CO(f) = (f \bullet g) \circ g$. In this study, the signal characteristics of each scale can be effectively extracted. Moreover, more ideal morphological features can be obtained using the weighted multiscale morphology filter (WMMF).
Let the scales of the SE be $S = \{S_1, S_2, \cdots, S_k\}$. We can realize the OCCO morphological filtering $h_{occo}(f)_{s_i}$ at $k$ different effective scales, where $i = 1, 2, \cdots, k$. The WMMF is defined as a weighted combination of these filters, where $\omega_{s_i}$ and $\sigma^2_{s_i}$ represent the weight factor and variance of each SE, respectively. Because small-scale SEs have a weak denoising ability, the corresponding $\omega_{s_i}$ is also small. The WMMF combines filters with different-scale SEs, ensuring the preservation of the original signal characteristics as much as possible. Fig. 1 shows the raw input wind-speed processing after completion of the two preparatory tasks.
Step 1: Choose triangular SEs to perform the filtering task, then design the SEs at different scales, $G_j = (SE_{tri})_j$, obtained from (12) and (13).
Step 2: Let the input signal be denoted as $F$; the output $y_j(x)$ at scale $j$ is obtained using (17).
After decomposition, the original signal sequence is split into a series of $m + 1$ detail components, $f_0$ to $f_m$, and a principal component $f_{m+1}$, giving a total of $m + 2$ subsequence layers. According to (19) and (20), $m$ depends on $j$, which is the scale coefficient of the SEs. Equations (10)-(12) show that $j$ depends on the peak interval of the original input signal. In other words, the number of subsequence layers depends on the original input signal.
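The multiscale filtering idea can be sketched in NumPy as follows. The sketch assumes the common form of the OCCO filter (the average of the open-closing and close-opening results) and builds the scaled SE by repeated dilation of the SE with itself. The equal weights, the unit triangular SE, the synthetic signal, and all function names are hypothetical placeholders; the adaptive SE sizes and variance-based weights used in this study are not reproduced.

```python
import numpy as np

def erode(f, g):
    """Erosion: min over m of f(n + m) - g(m), using only in-range samples."""
    n = len(f)
    out = np.full(n, np.inf)
    for m in range(min(len(g), n)):
        out[:n - m] = np.minimum(out[:n - m], f[m:] - g[m])
    return out

def dilate(f, g):
    """Dilation: max over m of f(n - m) + g(m), using only in-range samples."""
    n = len(f)
    out = np.full(n, -np.inf)
    for m in range(min(len(g), n)):
        out[m:] = np.maximum(out[m:], f[:n - m] + g[m])
    return out

def opening(f, g):
    return dilate(erode(f, g), g)

def closing(f, g):
    return erode(dilate(f, g), g)

def scale_se(g, s):
    """Build sg by dilating g with itself s - 1 times (the support grows each time)."""
    g = np.asarray(g, dtype=float)
    sg = g.copy()
    for _ in range(s - 1):
        # Pad with -inf so the dilation can extend the support of sg.
        padded = np.concatenate([sg, np.full(len(g) - 1, -np.inf)])
        sg = dilate(padded, g)
    return sg

def occo(f, g):
    """Average of the open-closing and close-opening filters."""
    oc = closing(opening(f, g), g)
    co = opening(closing(f, g), g)
    return 0.5 * (oc + co)

def weighted_multiscale_filter(f, g, scales, weights):
    """Weighted sum of OCCO outputs over several SE scales (weights are normalized)."""
    f = np.asarray(f, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * occo(f, scale_se(g, s)) for w, s in zip(weights, scales))

# Synthetic wind-speed-like signal (assumed, for illustration only).
rng = np.random.default_rng(0)
t = np.arange(200)
signal = 8.0 + 2.0 * np.sin(2 * np.pi * t / 50) + 0.5 * rng.standard_normal(200)

tri_se = np.array([0.0, 1.0, 0.0])  # hypothetical unit triangular SE
filtered = weighted_multiscale_filter(signal, tri_se, scales=[1, 2, 3], weights=[1, 1, 1])
```

Subtracting the filtered output at one scale from the output at the previous scale is one way to obtain detail layers, but the exact decomposition rule used here depends on equations that are not available in this text.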

B. K-MEANS ALGORITHM
K-means is a simple and classical clustering algorithm based on distance. The idea of K-means was first presented by Hugo Steinhaus in 1957 [24]. As a data-mining approach, the K-means algorithm automatically groups the input data into the corresponding predefined clusters by minimizing the distance function in an unsupervised manner. The points within a cluster are as close to each other as possible, and the distance between the clusters is as large as possible [25], [26]. Fig. 2 illustrates the procedure of the K-means algorithm [27].
Step 1: Determine the number of clusters $K$ in advance.
Step 2: Randomly choose $K$ points as the initial clustering centers.
Step 3: Calculate the Euclidean distance $D(x, y)$ between each point $x$ and the clustering centers $y$, and assign each point to the cluster with the shortest distance using the Euclidean distance formula
$$D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Step 4: Redefine the cluster centers by calculating the mean vectors based on the following equation:
$$\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} data_i^j$$
where $\mu_j$ is the center vector of the $j$th cluster; $data_i^j$ is the $i$th data point in cluster $j$; and $N_j$ is the number of samples in cluster $j$.
Step 5: Repeat steps 3 and 4 until the center vectors converge.
The K-means algorithm can be used on each subsequence of the decomposition result F to cluster the wind-speed data into different categories.
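Steps 1 to 5 can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name, the iteration cap, the fixed random seed, the handling of empty clusters, and the toy points are all assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means: returns (labels, centers) for an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Step 2: pick k distinct samples as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: Euclidean distance from every point to every center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 5: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated toy groups (assumed data, for illustration only).
toy_points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centers = kmeans(toy_points, k=2)
```

In the proposed framework, each row of `points` would be a feature vector derived from a subsequence of the decomposition result; how those feature vectors are built is not specified in this sketch.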

C. STACKED DENOISING AUTOENCODER
The autoencoder (AE) is an ANN with three layers (the input, hidden, and output layers) that is mainly used for dimensionality reduction or feature extraction. The hidden layer performs the dimensionality reduction and can help reconstruct the input data. Fig. 3 displays the basic structure of an autoencoder [28], [29]. The autoencoder maps the input data $x$ into a hidden representation $y$ using
$$y = f_\theta(x) = \sigma(Wx + b)$$
where $\sigma$ is the sigmoid activation function, $\theta = \{W, b\}$ is the parameter set containing the weight matrix $W$ and bias vector $b$, and $f_\theta(x)$ is the function that encodes the features from the input layer to the hidden layer. Then, the hidden layer maps the features back to the output layer based on
$$z = g_v(y) = \sigma(W'y + b')$$
where $v = \{W', b'\}$ is the parameter set containing the weight matrix $W'$ and bias vector $b'$. The function $g_v(y)$ decodes the features back to the output layer. The decoding weight matrix is $W' = W^T$. The AE is trained by minimizing the reconstruction loss between the input $x$ and the output $z$ over the $n$ training samples. However, the structure of the AE is very simple, leading to overfitting and thereby reducing the efficiency of the feature extraction. The denoising autoencoder (DAE) is an improved version of the AE, with a similar network structure and operating process. However, the DAE reconstructs the input signal from a corrupted version of it to obtain a more robust system (Fig. 4(a)) [30].
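The encoding and decoding steps, together with the input corruption used by the DAE, can be sketched as a forward pass in NumPy. The sketch assumes sigmoid encoding and decoding with tied weights; the masking-noise corruption, the mean squared reconstruction loss, the layer sizes, and all names are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(X, W, b):
    """Hidden representation y = sigmoid(W x + b) for each row x of X."""
    return sigmoid(X @ W.T + b)

def decode(Y, W, b_dec):
    """Reconstruction z = sigmoid(W' y + b') with tied weights W' = W^T."""
    return sigmoid(Y @ W + b_dec)

def corrupt(X, rate, rng):
    """Masking noise (assumed corruption type): zero each entry with probability `rate`."""
    keep = rng.random(X.shape) >= rate
    return X * keep

def reconstruction_loss(X, Z):
    """Mean squared reconstruction error over the samples (assumed loss form)."""
    return float(np.mean(np.sum((X - Z) ** 2, axis=1)))

rng = np.random.default_rng(0)
X = rng.random((5, 8))                  # 5 samples with 8 features, scaled to [0, 1)
W = 0.1 * rng.standard_normal((3, 8))   # 3 hidden units
b = np.zeros(3)
b_dec = np.zeros(8)

X_noisy = corrupt(X, rate=0.3, rng=rng)  # the DAE encodes the corrupted input ...
Y = encode(X_noisy, W, b)
Z = decode(Y, W, b_dec)
loss = reconstruction_loss(X, Z)         # ... but is scored against the clean input
```

Comparing the reconstruction with the clean input rather than the corrupted one is what forces the hidden layer to learn features that are robust to noise.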
The original input $x$ is stochastically corrupted to $\tilde{x}$, and the encoding process of the corrupted input is
$$y = f_\theta(\tilde{x}) = \sigma(W\tilde{x} + b)$$
An SDAE is made up of multiple DAEs, with the aim of building a deep architecture. In general, the SDAE has two learning steps: an unsupervised pretraining step and a supervised fine-tuning step (Figs. 4(b) and (c), respectively). The learning procedure starts with a greedy layer-wise pretraining procedure. Each DAE layer is trained in the same manner, and the output of each DAE is the input of the next DAE. The first step performs unsupervised training of each DAE layer separately to minimize the error between the input and the reconstruction result. After pretraining of the DAEs, all hidden layers are trained. A logistic regression layer is then added on top of the hidden layers; subsequently, the labeled data are used to fine-tune the network further through a backpropagation algorithm.
(1) For the given training dataset, the input signals are decomposed into a series of subsequences with different frequencies and fluctuation levels using the MMMD. The triangular SE shape and WMMF filters are then selected.
(2) The K-means algorithm is used for each subsequence of the decomposition result F to cluster the wind-speed data into several categories. The Euclidean distances between each point and clustering centers are calculated, and each point is assigned to the cluster with the shortest distance.
(3) Optimal SDAE models are established based on the clustering result. The predictions of all subsequences from the hybrid model are then aggregated to obtain the final result.
The model proposed in this study is for short-term wind-speed forecasting; therefore, the effective prediction time scale is 30 min to 2 h. Beyond this prediction range, the prediction sensitivity decreases with time. When the time scale exceeds 48 h, the model will not provide a valid prediction.

III. CASE STUDY
The experiments were run on a platform with the following configuration: AMD Ryzen 2600 six-core processor, 3.40 GHz, 16.0 GB RAM. The models were implemented in Python 3.7 with TensorFlow-GPU 1.15.0 and Keras 2.1.4. A comparison of the total computation time of the different models is shown in Table 1. As a three-layer hybrid model, the proposed model has a computation time of 513 s; therefore, its computational efficiency is very high compared with that of the other models.

A. DATA DESCRIPTION
Data from two randomly selected wind turbines in Hebei Province, China, collected during the last quarter of 2017, were used. Approximately 25,000 SCADA data points were available for each wind turbine. Of these, 15,000 were randomly selected as the training data, and the remainder were used as the testing data. The proposed method can be universally applied to all wind farm scenarios; therefore, there are no special requirements for the datasets. The cut-in wind speed of the wind turbine was 2.5-3 m/s, the cut-off wind speed was 25 m/s, and the rated speed of the wind turbine was set to 11 m/s. The mean relative error (MRE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE) were selected as indicators to compare the proposed model with the benchmark models. The MRE represents the average of the absolute error between the predicted and actual values. It is a linear function, and all individual differences are weighted equally in the average. The RMSE indicates the sample standard deviation between the predicted value and the actual value, demonstrating the degree of dispersion of the predicted value. The MAPE is similar to the MRE but expresses the error as a percentage of the actual value. All three indicators are used to evaluate prediction accuracy, which decreases as the values of these three indicators increase. In the definitions of these indicators, $x(t)$ is the actual data; $x_r$ is the rated value; $x'(t)$ is the predicted data; and $N$ is the number of forecasting samples. For further comparison of the performances of two different models, the improvement percentages of the MRE ($P_{MRE}$), RMSE ($P_{RMSE}$), and MAPE ($P_{MAPE}$) were introduced. The $P_{MRE}$, $P_{RMSE}$, and $P_{MAPE}$ represent the result of comparing two different forecasting models and are presented as percentages; thus, the relative performance of the two models can be further analyzed.
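The three indicators and the improvement percentage can be sketched in NumPy as follows. The exact formulas are assumptions based on common definitions: the MRE is taken here as the mean absolute error normalized by the rated value, the MAPE as the mean absolute error relative to the actual value, and the improvement as the relative error reduction with respect to a reference model. The function names and toy numbers are illustrative only.

```python
import numpy as np

def mre(actual, predicted, rated):
    """Mean absolute error as a percentage of the rated value (assumed definition)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)) / rated * 100.0)

def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean absolute error as a percentage of the actual value (actual must be nonzero)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

def improvement(error_reference, error_new):
    """Relative error reduction of a new model over a reference model, in percent (assumed form)."""
    return (error_reference - error_new) / error_reference * 100.0

# Toy values in m/s (assumed, for illustration only); 11 m/s is the rated speed stated above.
actual = [10.0, 8.0, 12.0]
predicted = [9.0, 8.0, 14.0]

mre_value = mre(actual, predicted, rated=11.0)
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
gain = improvement(2.0, 1.5)  # e.g., reference error 2.0 reduced to 1.5
```

A positive improvement value means the new model has a lower error than the reference model, which is how the comparison tables are read.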

B. DECOMPOSITION RESULTS OF THE ORIGINAL WIND-SPEED SIGNAL
The first step in the prediction process is the decomposition of the original wind-speed sequences into several subsequences. The wind-speed signal was decomposed into seven and five subsequences for wind turbines #1 and #2, respectively, based on the original input signal. The testing data of the wind-speed sequences were obtained from the two wind turbines (Fig. 6). Figs. 7(a) and (b) illustrate the decomposition results of the two wind turbines using the MMMD technique. For wind turbine #1, $f_0$-$f_5$ are the detail components, and $f_6$ is the principal component. Tables 2 and 3 present the estimated errors of the different models.
Figs. 8-13 show that the results from all forecasting models share the same characteristics. To further prove the performance of the proposed hybrid model, $P_{MRE}$, $P_{RMSE}$, and $P_{MAPE}$ were introduced to compare the eight different models. Tables 4-13 present the comparison results for wind turbine #1, whereas Tables 14-23 show the comparison results for wind turbine #2. Using the proposed model as a reference, a comparison of the $P_{MRE}$, $P_{RMSE}$, and $P_{MAPE}$ of all the other models for the two wind turbines is shown in Tables 24 and 25.
The following inferences can be drawn from the comparison results given in Tables 2-25.

1) SDAE VS. BPNN
The SDAE approach can provide more accurate forecasting results than the BPNN, as indicated by the P MRE , P RMSE , and P MAPE for the 30

2) MMMD-SDAE VS. SDAE
The SDAE model demonstrates a stronger generalization ability when the MMMD technique is applied before forecasting. For wind turbine #1, the time scales were 30 min, 1 h, and 2 h. The improvements in $P_{MRE}$ were 72.01, 64.20, and 56.62, respectively, whereas those in $P_{RMSE}$ were 70.84, 64.34, and 55.61, respectively, and those in $P_{MAPE}$ were 47.16, 40.17, and 34.14, respectively. For wind turbine #2 [...] subsequences directly in the time domain; hence, it can avoid errors during the signal transfer process and greatly increases the forecasting accuracy and stability.

3) K-MEANS-SDAE VS. SDAE
Relative to the traditional SDAE model, the prediction accuracy improves when the K-means clustering algorithm is added. For wind turbine #1, the time scales were 30 [...] MMMD-K-means-SDAE model were closer to the actual values.

5) MMMD-K-MEANS-SDAE VS. BPNN
Compared to the BPNN approach, the proposed hybrid MMMD-K-means-SDAE produces a better forecasting performance. For wind turbine #1, the time scales were 30 min, 1 h, and 2 h. The improvements in P MRE were 86.22, 80.25, and 76.78, respectively, whereas those in P RMSE were 85.20, 79.81, and 77.16, respectively, and those in P MAPE

10) MMMD-K-MEANS-SDAE VS. VMD-K-MEANS-LSTM MODELS
Compared to the VMD-K-means-LSTM model, the proposed hybrid MMMD-K-means-SDAE produced a better forecasting performance. For wind turbine #1, the time scales were 30 min, 1 h, and 2 h. The improvements in $P_{MRE}$ were 13.45, 12.92, and 23.04, respectively, whereas those in $P_{RMSE}$ were 11.94, 9.62, and 15.26, respectively, and those in $P_{MAPE}$ were 6.96, 6.69, and 12.27, respectively. For wind turbine #2 at the same time scales, the proposed hybrid model could improve $P_{MRE}$ by 2.55, 17.31, and 25.62, respectively, $P_{RMSE}$ by 1.32, 14.01, and 23.37, respectively, and $P_{MAPE}$ by 1.28, 9.06, and 13.76, respectively. The prediction results from the proposed MMMD-K-means-SDAE model are closer to the actual values.

IV. CONCLUSION
This article proposed a hybrid MMMD-K-means-SDAE model for short-term wind-speed forecasting on multiple scales. In the proposed model, MMMD analysis and K-means clustering are used as signal-decomposing and data-mining algorithms, respectively, and SDAE is utilized as the core forecasting unit to capture the characteristics of each cluster. The predictions of all subsequences are then aggregated to obtain the final predicted wind speed. The complexity of the proposed model is constant, and it does not vary with the size of the datasets. However, when the amount of data increases, the operating cost of the model, such as the computing time and hardware cost, will increase; therefore, the total cost of the model will change with the size of the datasets.
Five different benchmark models and two novel hybrid wind forecasting models were implemented for comparison with our proposed model. First, the results show the superior ability of the SDAE in processing nonlinear and nonstationary wind-speed signals compared to the traditional BP network. Second, we use MMMD to decompose wind-speed sequences into time-domain subsequences in order to avoid errors during the signal transfer process and significantly improve the reliability and precision of wind-speed forecasting. Third, the K-means algorithm, as a clustering analysis approach, can further enhance the prediction ability of the MMMD-SDAE model. Finally, the main advantage of the proposed model is that its prediction accuracy is higher than that of the five benchmark models and, in particular, that of the two novel hybrid models. Furthermore, we use MMMD to decompose the original wind-speed sequence directly in the time domain; this is the first time a time-domain signal-decomposing tool has been used in a hybrid wind-speed forecasting model.
However, many outliers exist in the data owing to the natural characteristics of wind speed. When the volatility of wind speed is high, the amount of outlier data will increase, and the accuracy of the prediction results as well as the reliability of the entire system will be significantly reduced. Furthermore, the effective prediction time range is short (30 min to 2 h). Beyond this prediction range, the prediction sensitivity decreases with time. When the time scale exceeds 48 h, the model cannot provide a valid prediction. These two limitations are the main topics that will be addressed in future research.
WEICHAO DONG was born in Shijiazhuang, Hebei, China, in 1989. He received the B.Sc. degree in electrical engineering and the M.Sc. degree in electrical and computing engineering from Cornell University, NY, USA, in 2012 and 2013, respectively. He is currently pursuing the Ph.D. degree in control theory and control engineering with the Hebei University of Technology, Tianjin, China. He is a Lecturer with the College of Electrical Engineering, Hebei University of Science and Technology, China. He has published two articles on SCI. His research interests include optimization of wind engine structure, wind power, and the application of deep learning in wind energy.
HEXU SUN (Senior Member, IEEE) received the Ph.D. degree in automation from Northeastern University, Shenyang, China, in 1993.
He has been a Professor with the School of Control Science and Engineering, Hebei University of Technology, Tianjin, China, and the School of Electrical Engineering, Hebei University of Science and Technology, Shijiazhuang, China. He has authored five books and more than 130 journal and conference papers. He holds 13 U.S. patents and five computer software copyrights. His current research interests include robotics and complex engineering systems.
Dr. Sun was a recipient of many prestigious national awards from China. He has been the Director in many societies and committees in China. He is currently the invited Plenary Speaker and the General Co-Chair of many international conferences.
Since 2007, he has been a Lecturer, an Associate Professor, and a Professor with the School of Electrical Engineering, Hebei University of Science and Technology. From July 2013 to July 2014, he was a Visiting Scholar and a part-time Faculty with the College of Engineering, Wayne State University, USA. He is the author of more than 160 published articles. His current research interests include design, analysis, and control of novel motors and actuators, intelligent control, and power electronics.
Dr. She has been a Professor with the School of Chemical Engineering, Shijiazhuang Tiedao University, since 2002. In 2012, she was a Visiting Scholar and a part-time Faculty with the College of Chemical Engineering, University of Delaware, USA. She has authored more than 30 journal and conference papers. Her current research interests include nanomaterials, materials science, and the application of materials science in new energy sources. She was a recipient of many prestigious national awards from China.