Wind Speed Forecasting Using the Stationary Wavelet Transform and Quaternion Adaptive-Gradient Methods

Accurate wind speed forecasting is a fundamental requirement for advanced and economically viable large-scale wind power integration. The hybridization of quaternion-valued neural networks and the stationary wavelet transform has not been proposed before. In this paper, we propose a novel wind-speed forecasting model that combines the stationary wavelet transform with quaternion-valued neural networks. The proposed model represents wavelet subbands as quaternion vectors, which avoids separating the naturally correlated subbands. The model consists of three main steps. First, the wind speed signal is decomposed into sublevels using the stationary wavelet transform. Second, a quaternion-valued neural network is used to forecast wind speed components in the stationary wavelet domain. Finally, the inverse stationary wavelet transform is applied to estimate the predicted wind speed. In addition, a softplus quaternion variant of the RMSProp learning algorithm is developed and used to improve the performance and convergence speed of the proposed model. The proposed model is tested on wind speed data collected from different sites in China and the United States, and the results demonstrate that it consistently outperforms similar models. In the meteorological terminal aviation routine (METAR) dataset experiment, the proposed wind speed forecasting model reduces the mean absolute error and root mean squared error of predicted wind speed values by 26.5% and 33%, respectively, in comparison to several existing approaches.


I. INTRODUCTION
Renewable energy plays an increasingly important role in the global energy market [1]. Among renewable energy resources, wind energy has attracted much attention due to its mature technology, low cost, and role in mitigating climate change by reducing environmental pollution. Currently, wind power is one of the fastest-growing renewable energy technologies. According to the Global Wind Energy Council report [2], new wind power installations surpassed 60 GW globally in 2019, a 19% increase compared with 2018, bringing the total installed capacity to 650 GW, a rise of 10% compared with the preceding year. The world's top market in new installations in 2019 was China, with the installation of more than 2.3 GW of offshore wind in a single year. Generally, accurate wind speed forecasting is a precondition for constituting an advanced and economically viable control strategy in a modern power system, e.g., model predictive control [3]. In addition, forecasting errors can significantly affect the cost of balancing the power system [4]. Therefore, accurate short-term wind speed prediction is essential for reducing wind farm operations and maintenance costs [5]. The generated power depends on both the wind speed and the design of the turbine. Forecasting wind speed instead of generated power avoids dependencies on generator design.
The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Asif.
In recent decades, a large number of forecasting models for wind speed have been developed and used to compute estimates of wind energy. Wind speed prediction models can be divided into three categories: physical models, statistical models, and machine learning models. Physical models consider weather conditions such as wind speed time series, air pressure, humidity, and temperature to obtain wind speed forecasts. These numerical weather prediction models (NWPs) divide the atmosphere into 3D cubes and solve weather parameter equations for each atmospheric variable at each grid point. The weather research and forecasting model (WRF) [6], the COSMO model [7] and MM5 [8] are examples of prediction models in this category.
In general, NWP models are not suitable for short-term wind speed forecasting because of their complex calculation processes and poor performance [9]-[11].
Statistical models extract rules that govern the relationship between sequences of previous measurements and use these rules in prediction. When compared with physical methods, these methods can provide more accurate results for short-term wind speed forecasting. Examples of methods used in this category include the autoregressive integrated moving average (ARIMA) [12], Markov chains [13], and Kalman filtering [14].
Machine learning models are the third class of forecasting algorithms. They have been widely applied in predicting wind speed because of their good learning ability and nonlinear mapping ability [15]. Examples of these methods include neural networks, fuzzy-based systems, and decision trees. For instance, Zhou et al. proposed a long short-term memory (LSTM) based lower and upper bound estimation model to construct the prediction intervals of wind power [16]. Kisvari et al. [17] presented a predictive model that uses gated recurrent units (GRU) and 12 features for wind speed forecasting. Examples of these features include wind speeds at four different heights, generator temperature, and gearbox temperature. Extreme gradient boosting (XGBoost), which is a form of gradient boosting decision tree, was used in [18] for wind speed forecasting.
Hybrid models can be formed by combining methods from the preceding categories. The main reason for integrating forecasting models is that a single forecasting method often has some essential weaknesses that lead to poor prediction results that do not adapt to the complex and changeable environment [19]. For example, a hybrid deep neural network model based on a stacked autoencoder and an LSTM network to forecast wind speed was proposed [20]. The presented hybrid model reduced the mean absolute error in prediction by 13% compared with a nonhybrid version of the model. In [21], a short-term multimodal wind-speed prediction framework was proposed based on rough artificial neural networks and stacked denoising autoencoders. Khosravi et al. [22] combined an adaptive neuro-fuzzy inference system with a particle swarm optimization algorithm to predict wind speed, wind direction, and a wind turbine's output power. The combined model showed considerable improvements in prediction accuracy compared with models that use either adaptive inference systems or particle swarm optimization. Statistical and neural network-based approaches were combined to predict hourly wind speed data [23]. The hybridization of ant colony optimization and particle swarm optimization for forecasting wind speed was proposed in [24]. This hybridization approach showed better wind speed forecasting results than other nonhybrid models.
Signal decomposition algorithms, such as the wavelet transform and empirical mode decomposition, provide efficient means for adapting the learned forecasting models to the different components of the original wind speed series. These methods were used successfully to enhance the performance of several wind speed forecasting methods. For example, Hu et al. [15] proposed a prediction model based on variational mode decomposition and an improved echo state network. The proposed algorithm outperformed nine comparative models on four wind speed datasets. Variational mode decomposition (VMD) was used in [25] to decompose wind speed data into different subbands. LSTM units were used to predict the main trend, and a kernel density estimation method was used to perform predictions on the residual part. A similar approach was offered in [26], where VMD decomposition signals were used as input to convolutional LSTM prediction units.
The wavelet transform and packet decomposition are often combined with other AI models for wind power and wind speed forecasts [27]. For example, in [28], wavelet soft threshold denoising and gated recurrent units were combined to forecast wind speed. In the proposed model, denoising by wavelet soft threshold was used to filter the noisy samples from the wind signal, and a gated recurrent unit was used as a forecaster. Liu et al. [29] used the empirical wavelet transform to decompose raw wind speed data into several sublayers. The signal's low-frequency components were forecasted using a long short-term memory neural network, and an Elman neural network was used to predict high-frequency components. Aasim et al. [30] combined the wavelet transform with an autoregressive integrated moving average (ARIMA) model to forecast wind speed data. The wavelet representation was combined with adapted LSTM units to predict wind speed in [31].
The existing hybrid models based on the wavelet decomposition technique and neural networks treat each wavelet subband individually. However, significant correlations exist between these subbands [32], and improved forecasting results can be achieved by developing algorithms that exploit these dependencies between wavelet subbands.
To address this limitation, we propose using quaternion vectors to represent wavelet subbands. A quaternion-valued neural network (QVNN) is used in this paper to model the relationship between quaternion inputs and outputs representing previous and forecasted wind speed values. QVNNs have quaternion inputs and outputs and use quaternion weights and bias parameters. Representing wavelet subbands as quaternion vectors forms a unifying representation that avoids separating the naturally correlated sequences. QVNNs have achieved improvements in several tasks in image, speech, and signal processing [33]. They were recently used for time-series forecasting [34], where they achieved performance levels surpassing real-valued neural networks.
In the proposed system, wind speed data are first decomposed into wavelet levels using the stationary wavelet transform (Fig. 1). The resulting coefficients are used to train the QVNN to predict the wavelet subbands representing the forecasted value. The inverse stationary wavelet transform is then used to reconstruct the predicted wind speed. An adaptive quaternion learning rate algorithm is developed and used to enhance the performance of the model. Finally, the performance and convergence speed are improved by using a softplus function within the RMSProp-based optimizer [35]. The introduction of the softplus function was shown to calibrate the learning rate and lead to improvements in convergence speed.
The major contributions of this paper are as follows:
• We propose a novel wind-speed forecasting model that combines the stationary wavelet transform (SWT) and quaternion-valued neural networks (QVNN). To the best of our knowledge, the hybridization between the QVNN and SWT has not been proposed before in the literature. This hybridization provides a compact and unifying representation of the different wavelet subbands.
• We propose a quaternion version of RMSProp, the adaptive learning rate algorithm, to improve accuracy and convergence speed of the proposed quaternion neural network model.
• We enhance the developed quaternion RMSProp algorithm with a softplus function to further improve the performance and convergence speed of the proposed model. In addition, the proposed softplus function prevents the forecasting model from overfitting.
The developed wind speed forecasting method is tested on wind speed data collected from different sites in China and the United States. The results demonstrate that the developed model outperforms several widely used and recently proposed wind speed prediction models.

II. PROPOSED METHODOLOGY
This section describes the three main components used in the presented model: the stationary wavelet transform, the QVNN, and the quaternion softplus RMSProp learning algorithm.

A. THE STATIONARY WAVELET TRANSFORM
The stationary wavelet transform (SWT) [36] is designed to solve the shift-invariance issue in the discrete wavelet transform (DWT). The DWT decomposes the source signal into different levels; at each level, the resulting subsignals are half the length of the approximation signal at the preceding wavelet level. The SWT removes the downsampling operator from the usual implementation of the DWT, so the resulting subsignals have the same length as the source signal. This is a desired property for the proposed wind speed forecasting model because the equal length of the wavelet subbands allows the formation of a quaternion vector representing the four different wavelet components of the signal at each time step. This paper develops a quaternion-based forecasting system where each input signal is represented by a quaternion vector containing information from the four different wavelet subbands (Fig. 1). Similar to the DWT, the SWT decomposes the input series into sets of low- and high-frequency coefficients called approximation and detail coefficients; however, the output signal is nondecimated (i.e., not downsampled). The approximation components present the general trend of the time series, whereas the detail coefficients describe the small variations in the series (i.e., high-frequency components). The decomposition can be represented as a dyadic tree [37]. Fig. 2 shows an example of a two-level decomposition based on the SWT. For a given signal $u(t)$, the SWT decomposes it into two sets of coefficients: approximation coefficients $A_1(t)$ and detail coefficients $D_1(t)$. These coefficients represent the convolution results produced by low- and high-pass filters. This decomposition process is repeated using the approximation coefficients as input in each subsequent decomposition level.
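The two-level decomposition described above can be illustrated with a minimal pure-Python sketch. It assumes the Haar wavelet and periodic signal extension; the function names `haar_swt_level`, `haar_swt`, and `haar_iswt` are hypothetical, and the sign and ordering conventions may differ from those of a library implementation such as `pywt.swt` and `pywt.iswt` in PyWavelets.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_swt_level(a, step):
    """One undecimated Haar level with periodic extension.

    `step` is the spacing between the two filter taps (1 at level 1,
    2 at level 2). Both outputs have the same length as the input.
    """
    n = len(a)
    approx = [(a[t] + a[(t + step) % n]) / SQRT2 for t in range(n)]
    detail = [(a[t] - a[(t + step) % n]) / SQRT2 for t in range(n)]
    return approx, detail

def haar_swt(u, levels=2):
    """Return (A_levels, [D_1, ..., D_levels]) for the signal u."""
    approx, details = list(u), []
    for j in range(levels):
        # The approximation of the previous level is the next input.
        approx, d = haar_swt_level(approx, 2 ** j)
        details.append(d)
    return approx, details

def haar_iswt(approx, details):
    """Invert haar_swt by averaging the two redundant reconstructions."""
    a = list(approx)
    for j in reversed(range(len(details))):
        step, d, n = 2 ** j, details[j], len(a)
        a = [0.5 * ((a[t] + d[t]) + (a[(t - step) % n] - d[(t - step) % n])) / SQRT2
             for t in range(n)]
    return a

u = [3.0, 5.0, 4.0, 6.0, 7.0, 5.0, 6.0, 8.0]
A2, (D1, D2) = haar_swt(u, levels=2)   # all three have len(u) samples
u_rec = haar_iswt(A2, [D1, D2])        # recovers u up to rounding
```

Because no downsampling takes place, $A_2(t)$, $D_1(t)$, and $D_2(t)$ are aligned sample by sample, which is what makes it possible to pack them into one vector per time step.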

B. THE QVNN AND THE QUATERNION SOFTPLUS RMSPROP LEARNING ALGORITHM
The proposed wind speed forecasting system uses a QVNN with three layers. The input layer receives the quaternion-valued form of the two-level SWT approximation and detail coefficients. The hidden layer has m neurons, and the output layer, with one neuron, produces a one-step forecast of the SWT approximation and detail coefficient levels (Fig. 1). The inverse SWT is used to compute the predicted time series by combining the three predicted coefficients, i.e., $A_2(t)$, $D_1(t)$, and $D_2(t)$.
The QVNN layers are connected with quaternion-valued weights $w^{I}$ and $w^{II}$. The hidden and output layers have quaternion-valued biases $b^{I}$ and $b^{II}$.
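The predicted quaternion QVNN output can be computed as:
$$\hat{y} = \phi(\mathrm{Re}(\tilde{y})) + i\,\phi(\mathrm{Im}_i(\tilde{y})) + j\,\phi(\mathrm{Im}_j(\tilde{y})) + k\,\phi(\mathrm{Im}_k(\tilde{y})),$$
where $\hat{y}$ is the predicted vector, Re is the real part, $\mathrm{Im}_i$, $\mathrm{Im}_j$, and $\mathrm{Im}_k$ are the imaginary parts in the i, j, and k complex dimensions of the quaternion vector, respectively, and $\phi$ is the nonlinear sigmoid function given by the following equation:
$$\phi(x) = \frac{1}{1 + e^{-x}}.$$
The quaternion vector $\tilde{y}$ is computed using
$$\tilde{y} = \sum_{p=1}^{m} w^{II}_{p} h_{p} + b^{II},$$
where $p = 1, \ldots, m$ and $h_p$ is the $p$-th hidden neuron's output, given as follows:
$$h_p = \phi(\mathrm{Re}(\tilde{h}_p)) + i\,\phi(\mathrm{Im}_i(\tilde{h}_p)) + j\,\phi(\mathrm{Im}_j(\tilde{h}_p)) + k\,\phi(\mathrm{Im}_k(\tilde{h}_p)),$$
where $\tilde{h}_p$ is given by
$$\tilde{h}_p = w^{I}_{p} u + b^{I}_{p},$$
where $u$ is the input, which contains the quaternion-valued representation of the two-level SWT approximation and detail coefficients.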
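The forward pass described above can be sketched in a few lines of pure Python. This is a minimal illustration rather than the authors' implementation: quaternions are stored as `(re, i, j, k)` tuples, the helper names (`qmul`, `qadd`, `split_sigmoid`, `qvnn_forward`) are hypothetical, and the weights are assumed to multiply the inputs from the left, which matters because quaternion multiplication is not commutative.

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions stored as (re, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qadd(a, b):
    """Component-wise quaternion addition."""
    return tuple(x + y for x, y in zip(a, b))

def split_sigmoid(q):
    """Apply the real sigmoid to each of the four components separately."""
    return tuple(1.0 / (1.0 + math.exp(-x)) for x in q)

def qvnn_forward(u, w1, b1, w2, b2):
    """One quaternion input u, m hidden neurons, one quaternion output.

    w1, b1, w2 are lists of m quaternions; b2 is a single quaternion.
    """
    hidden = [split_sigmoid(qadd(qmul(w, u), b)) for w, b in zip(w1, b1)]
    y_tilde = b2
    for w, h in zip(w2, hidden):
        y_tilde = qadd(y_tilde, qmul(w, h))
    return split_sigmoid(y_tilde)

# Toy call with m = 2 hidden neurons and arbitrary illustrative values.
u = (0.4, 0.1, -0.2, 0.3)
w1 = [(0.5, 0.5, 0.5, 0.5), (1.0, 0.0, 0.0, 0.0)]
b1 = [(0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0)]
w2 = [(0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)]
b2 = (0.0, 0.0, 0.0, 0.0)
y_hat = qvnn_forward(u, w1, b1, w2, b2)
```

Since the sigmoid is applied to every component, each part of the output lies strictly between 0 and 1, which is consistent with scaling the targets into a bounded range before training.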
The objective of this model is to find the network's optimal weights and bias parameters that minimize the sum-squared error at the output layer, where $e_l$ is the $l$-th error between the $l$-th desired output $y$ and the $l$-th estimated output $\hat{y}$, $l = 1, \ldots, N$. The superscript '*' represents the conjugate operator, and H is the Hermitian operator.
We develop a quaternion version of the RMSProp learning algorithm to train the proposed QVNN. We also equip this algorithm with a quaternion softplus function (Algorithm 1) to accelerate the convergence rate [35]. $v_\theta(k)$ is the second-order quaternion momentum calculated as a combination of previous and current squared stochastic gradients.
Algorithm 1 Quaternion SRMSProp
1: Inputs: $\theta$, $u$, $y(k+1) \in \mathbb{Q}$, learning rate $\eta = 0.001$, parameters $\beta_2 = 0.999$, $\beta = 50$
2: Initialize: $m_\theta(0) = 0$, $v_\theta(0) = 0$
3: for k = 1 to T do
4: Compute the stochastic gradient of all weights and biases

The gradient is computed in turn for the weight and the bias term, and then for the weights $w^{I} = \mathrm{Re}\,w^{I} + i\,\mathrm{Im}_i w^{I} + j\,\mathrm{Im}_j w^{I} + k\,\mathrm{Im}_k w^{I}$ and the bias.
The real-valued RMSProp gradient update equation is given by [41]:
$$\theta(k+1) = \theta(k) - \eta\,\frac{\nabla_\theta E}{\sqrt{v_\theta(k)} + \varepsilon}, \quad (17)$$
where $v_\theta(k) = \beta_2 v_\theta(k-1) + (1 - \beta_2)(\nabla_\theta E)^2$, $\theta$ can be any weight or bias of the QVNN, and $\varepsilon$ is a small value that is set to $10^{-8}$. The moving average parameter $\beta_2$ should be strictly greater than 0 and less than 1.
RMSProp converges if $\beta_2$ is large enough, i.e., close to one. $\beta_2 = 0.999$ was found to be optimal for most applications.
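As a point of reference, a single real-valued RMSProp step can be sketched as follows. This is the commonly used form of the update, with $\varepsilon$ added outside the square root; the function name `rmsprop_step` and the toy error function $E(\theta) = \theta^2$ are assumptions made for illustration only.

```python
import math

def rmsprop_step(theta, grad, v, eta=0.001, beta2=0.999, eps=1e-8):
    """One real-valued RMSProp update for a single scalar parameter."""
    # Moving average of squared gradients (second-order moment).
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Gradient step scaled by the root of the moving average.
    theta = theta - eta * grad / (math.sqrt(v) + eps)
    return theta, v

# Toy example: a few steps on E(theta) = theta**2, whose gradient is 2*theta.
theta, v = 1.0, 0.0
for _ in range(5):
    theta, v = rmsprop_step(theta, 2.0 * theta, v)
```

With $v_\theta(0) = 0$ and $\beta_2$ close to one, the denominator is small during the first iterations, so the early steps are considerably larger than $\eta$ itself.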
In the proposed forecasting model, we extend Eq. (17) into the quaternion domain and equip it with a softplus activation function. The softplus RMSProp gradient update is computed using [35]
$$\theta(k+1) = \theta(k) - \eta\,\frac{\nabla_\theta E}{\mathrm{softplus}\left(\sqrt{v_\theta(k)}\right)}.$$
In this paper, we use a quaternion softplus function, which has the same form as its real-valued analog and is given by
$$\mathrm{Qsoftplus}(q) = \frac{1}{\beta}\,\mathrm{Qlog}\left(1 + e^{\beta q}\right), \quad (19)$$
where Qlog is the quaternion natural logarithm and is defined as
$$\mathrm{Qlog}(q) = \ln|q| + \frac{i x_1 + j x_2 + k x_3}{q_p}\,\arccos\left(\frac{x_0}{|q|}\right),$$
where $q_p = \sqrt{x_1^2 + x_2^2 + x_3^2}$ is the magnitude of the imaginary component in the quaternion number $q = x_0 + i x_1 + j x_2 + k x_3$, and $|q| = \sqrt{x_0^2 + x_1^2 + x_2^2 + x_3^2}$ is the magnitude of the whole quaternion number. The parameter $\beta$ in Eq. (19) can be any positive real number.
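The quaternion softplus can be sketched with the standard polar-form definitions of the quaternion exponential and logarithm. The sketch below is an illustration under stated assumptions: the names `qexp`, `qlog`, and `qsoftplus` are hypothetical, quaternions are `(x0, x1, x2, x3)` tuples, and the default $\beta = 5$ is the value selected in Section III.

```python
import math

def qexp(q):
    """Quaternion exponential: e^x0 * (cos|v| + (v/|v|) sin|v|)."""
    x0, x1, x2, x3 = q
    vn = math.sqrt(x1*x1 + x2*x2 + x3*x3)
    e = math.exp(x0)
    if vn == 0.0:
        return (e, 0.0, 0.0, 0.0)
    s = e * math.sin(vn) / vn
    return (e * math.cos(vn), s * x1, s * x2, s * x3)

def qlog(q):
    """Quaternion natural logarithm: ln|q| + (v/|v|) arccos(x0/|q|)."""
    x0, x1, x2, x3 = q
    vn = math.sqrt(x1*x1 + x2*x2 + x3*x3)
    norm = math.sqrt(x0*x0 + vn*vn)
    if vn == 0.0:
        # Purely real input; this branch is only valid for x0 > 0.
        return (math.log(norm), 0.0, 0.0, 0.0)
    # Clamp to guard against rounding slightly outside [-1, 1].
    angle = math.acos(max(-1.0, min(1.0, x0 / norm)))
    a = angle / vn
    return (math.log(norm), a * x1, a * x2, a * x3)

def qsoftplus(q, beta=5.0):
    """(1/beta) * Qlog(1 + exp(beta * q)), same form as the real softplus."""
    e = qexp(tuple(beta * x for x in q))
    l = qlog((1.0 + e[0], e[1], e[2], e[3]))
    return tuple(x / beta for x in l)

q = (0.2, 0.1, -0.3, 0.4)
sp = qsoftplus(q)
```

For a purely real input the imaginary parts stay zero and the result reduces to the real-valued softplus $\frac{1}{\beta}\ln(1 + e^{\beta x})$, which is a convenient sanity check for any implementation.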
The square root of a quaternion is computed using the Euler rotation angles $[\phi, \vartheta, \psi]$ [42]. We use quaternion division to divide the quaternion gradient $\nabla_\theta E$ by the term $\mathrm{Qsoftplus}\left(\sqrt{v_\theta(k)}\right)$. The reciprocal of a quaternion number $S$ is computed using $\frac{1}{S} = \frac{S^*}{S S^*} = \frac{S^*}{|S|^2}$. The quaternion weights and biases are initialized as random uniformly distributed unit quaternions.
The performance of the network is strongly influenced by the number of neurons in the hidden layer and the parameter β (eq. 19). The relationship between these two parameters is studied in detail in Section III.

C. PROPOSED FORECASTING METHOD
The proposed forecasting method uses a training stage, where the developed system is trained, and a testing stage where the developed system is used to forecast wind speed values. These stages are summarized below.

1) TRAINING STAGE
1) Obtain a window of n wind speed measurements. This window represents a single training sample. In our experiments, we used a value of n = 100.
2) Apply the SWT with two decomposition levels to the wind speed values (Fig. 2).
3) Use the SWT coefficients representing the last sample, i.e., sample n, to train a QVNN to forecast the SWT coefficients representing the next sample n + 1.
4) Apply the inverse SWT to a vector composed from the SWT coefficients predicted in step 3 and concatenated with the past n coefficients.
5) Use the (n + 1)th value of the inverse SWT output as the predicted wind speed value.
6) Compute the mean squared error between the actual and predicted wind speed values.
7) Update the network parameters.
8) Shift the training window by one sample to obtain a new vector of wind speed measurements and iterate until the desired number of epochs is reached or the mean squared error is below or equal to 0.001.
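The windowing in steps 1 and 8 can be sketched as a small generator. The name `sliding_windows` is hypothetical, and the sketch only covers how the windows and their targets are formed, not the SWT or the network update.

```python
def sliding_windows(series, n=100):
    """Yield (window, target) pairs: n past samples and the next sample.

    Consecutive windows are shifted by one sample, as in step 8.
    """
    for start in range(len(series) - n):
        yield series[start:start + n], series[start + n]

# Toy series with a short window so the pairs are easy to inspect.
series = list(range(10))
pairs = list(sliding_windows(series, n=4))
# pairs[0] is ([0, 1, 2, 3], 4) and pairs[-1] is ([5, 6, 7, 8], 9)
```

A series of length L therefore yields L - n training samples, each paired with the measurement that immediately follows its window.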

2) TESTING STAGE
1) Obtain a vector of n past wind speed samples (100 samples in our case).
2) Use the pretrained QVNN to forecast the SWT coefficients representing the next sample n + 1 based on the SWT coefficients representing sample n.
3) Apply the inverse SWT to a vector composed of the SWT coefficients predicted in step 2 and concatenated with the past n coefficients.
4) The predicted wind speed value is the (n + 1)th value of the inverse SWT output vector.

D. EVALUATION CRITERIA
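The model's performance is evaluated using the root mean squared error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). These metrics are given as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2},$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N}\left|y_k - \hat{y}_k\right|,$$
$$\mathrm{MAPE} = \frac{100}{N}\sum_{k=1}^{N}\left|\frac{y_k - \hat{y}_k}{y_k}\right|,$$
where $y_k$ is the $k$-th sample value in $y$, $\hat{y}_k$ is the $k$-th forecasted value, and $N$ is the total number of samples.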
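The three metrics can be computed with a few lines of Python. The sketch uses their standard definitions, with the MAPE expressed in percent; the function names are chosen for illustration, and the MAPE is undefined whenever a measured value equals zero.

```python
import math

def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (requires nonzero y)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

# Small worked example: the errors are 1, 0, and -2.
y = [2.0, 4.0, 5.0]
y_hat = [1.0, 4.0, 7.0]
scores = (rmse(y, y_hat), mae(y, y_hat), mape(y, y_hat))
```

In this example the MAE is 1, the RMSE is $\sqrt{5/3} \approx 1.29$, and the MAPE is 30%; the RMSE exceeds the MAE because squaring weights the larger error more heavily.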

III. RESULTS AND DISCUSSIONS
To evaluate the proposed QVNN-SWT-SRMSProp model, we use four wind datasets, three of which are obtained from the MERRA-2 project for three different areas in China: Peng Lai, Hebei, and Inner Mongolia [46]. The fourth dataset is a real-world wind speed dataset provided by Meteorological Terminal Aviation Routine (METAR) [47].

A. CASE STUDY I: FORECASTING WIND SPEED AT THREE SITES IN CHINA
This example investigates the developed system's performance on data from three different areas in China: Peng Lai, Hebei, and Inner Mongolia. The real-valued neural network (RVNN), i.e., the multilayer perceptron, the complex-valued neural network (CVNN), the long short-term memory (LSTM) network, and quaternion-valued neural networks (QVNN) are considered for comparison purposes. The stopping criteria are either a root mean squared error less than 0.001 or a number of epochs higher than 100. The data are sampled at a standard height of 10 meters, with a temporal resolution of 10 minutes.
Wind speed values sampled at the standard height are first extrapolated to values at the hub height using the following empirical power law [48]:
$$v = v_1 \left(\frac{h}{h_{10}}\right)^{\alpha}, \quad (26)$$
where $v_1$ and $v$ are the wind speeds at the standard height $h_{10} = 10$ meters and the hub height $h$ in meters, respectively, and $\alpha$ is the roughness factor equal to 1/7 [48]. By substituting $h = 50$ meters and the measured wind speed data at the standard height of 10 meters into Eq. (26), the wind speed values at the hub height are obtained. The wind speed data were time-averaged over an hour and timestamped with the central time of the interval, starting at 00:30 UTC [46]. Wind speed data for the three areas were obtained from May 15, 2019, to May 15, 2020. The first nine months of data (i.e., May 15, 2019, to Feb 15, 2020) are used to train the networks, and the remaining three months of data are used to validate the network's performance. Wind speed data were scaled to be in the range of 0.1-0.9. The quaternion weights and biases were initialized as random uniformly distributed unit quaternions.
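The preprocessing described above can be sketched as follows. The extrapolation follows the power law with $h = 50$ m, $h_{10} = 10$ m, and $\alpha = 1/7$; the scaling step assumes a linear min-max mapping onto [0.1, 0.9], which is one common choice that the text does not spell out. Both function names are hypothetical.

```python
def extrapolate_to_hub(v1, h=50.0, h_ref=10.0, alpha=1.0 / 7.0):
    """Power-law extrapolation of a wind speed from h_ref to hub height h."""
    return v1 * (h / h_ref) ** alpha

def scale_to_range(values, lo=0.1, hi=0.9):
    """Linear min-max scaling onto [lo, hi] (assumed scaling scheme)."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]

# Illustrative 10 m wind speeds in m/s (not taken from the datasets).
v10 = [3.2, 4.8, 6.1, 5.5]
v50 = [extrapolate_to_hub(v) for v in v10]
scaled = scale_to_range(v50)
```

With these settings every 10 m value is multiplied by the same factor $5^{1/7} \approx 1.26$, so the extrapolation changes the magnitude of the series but not its shape.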
We first studied the effects that the parameter β and the number of hidden neurons have on the performance of the developed model. We trained the developed system on wind speed data from the Inner Mongolia region using numbers of hidden neurons that varied between 5 and 200. In addition, we tested the performance of the forecasting network using different values of the parameter β = {0.1, 0.5, 1, 5, 50, 100, 500, 1000}. The results are shown in Fig. 3, where one can observe a surface of multiple groups of local minima that are close in magnitude at β values between one and five. The overall minimum in terms of MAE (MAE = 0.056) and RMSE (RMSE = 0.90) is achieved using β = 5 and 110 neurons. The MAPE value at these parameters is close to the MAPE global minimum. This pair of values is used in the remaining experiments in this paper. In addition, we tested different wavelet families; the Haar wavelet achieved the best results, so we use it in all experiments.
The results from the three sites are presented in Table 1, in which we can see that our model outperforms all other models in all metrics. In this table, methods that do not use the SRMSProp algorithm are optimized using SGD (i.e., a fixed learning rate). We note that the CVNN-SWT results indicate overfitting, which can be fixed by reducing the number of hidden neurons n to 60 or fewer. The performance of the proposed system at this setting remained better than that of the other forecasting models.
In general, models that use the stationary wavelet transform and the SRMSProp optimizer outperform the remaining methods. Tables 2 and 3 (Table 2), and significantly improved the results when n = 10 (Table 3).
Comparing the proposed model with LSTM-SWT-SRMSProp, which is the model that produced the closest results, the proposed QVNN-SWT-SRMSProp model  with high-frequency content. Similar behavior can be observed in the results from Hebei and Inner Mongolia in Fig. 6 and Fig. 8.
A zoomed-in comparison between the proposed model and its real-valued counterpart for the Hebei and Inner Mongolia sites is presented in Figs. 7 and 9. The proposed model outperforms the equivalent RVNN-SWT-SRMSProp model.
Bar graph plots comparing the proposed model with other SRMSProp models in terms of the MAE and RMSE are shown in Fig. 10. The proposed model offers the best wind speed forecasting predictions among the presented systems. This figure indicates that for all sites, the introduction of the SWT improves the correlation values between the measured and forecasted time series and produces outputs with close standard deviation and less root mean square difference.
and New Hampshire weather forecasts provided by the Meteorological Terminal Aviation Routine (METAR) [47]. This dataset consists of 6361 data points that are not incorporated into a standard grid and have an hourly frequency. We use 5700 samples to train the networks, 300 samples for validation, and 361 for testing. This setup allows comparisons with the existing methods in the literature. The data were measured every 6 h, and the goal was to forecast each hour until the next measurement.
The METAR case results are summarized in Table 4. We computed the Persistence, MLP, and LSTM models; other results are from [47]. We can see that the proposed model outperforms all other forecasting strategies in both evaluation metrics.
with the LI-LW-CNN, which is the model that produces the nearest results to the proposed model.
A bar graph comparing the proposed model with the remaining models in terms of the MAE and RMSE is shown in Fig. 11.
Despite the effective results achieved by the proposed SWT-QVNN-SRMSProp wind speed forecasting model, we discuss two limitations of the proposed approach.
First, our proposed system is learning-based and may fail when faced with circumstances not observed during training. One potential way to alleviate this issue is to dynamically update the model with new training data in an attempt to increase the size and variability of input data. The second limitation is that our proposed model utilizes quaternion multiplications that can slow the training process.
Computing the Hamilton product of two quaternion neurons requires 28 operations, whereas a single multiplication is required to multiply two real neurons [55]. In our experiments, we observe that training a single epoch requires approximately 0.3 seconds in a real-valued neural network and 102.6 seconds in the equivalent quaternion-valued neural network. This problem can be mitigated by developing efficient GPU-based implementations of quaternion multiplications.
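The cost of a Hamilton product can be read directly from a straightforward implementation. The sketch below is illustrative (the function name `hamilton_product` is hypothetical) and counts each real multiplication, addition, or subtraction as one operation, which is one way to arrive at the figure of 28.

```python
def hamilton_product(a, b):
    """Hamilton product of two quaternions given as (re, i, j, k) tuples.

    Each of the four output components uses 4 real multiplications and
    3 additions/subtractions: 16 + 12 = 28 real operations in total.
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0*b0 - a1*b1 - a2*b2 - a3*b3,  # real part
        a0*b1 + a1*b0 + a2*b3 - a3*b2,  # i component
        a0*b2 - a1*b3 + a2*b0 + a3*b1,  # j component
        a0*b3 + a1*b2 - a2*b1 + a3*b0,  # k component
    )

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
ij = hamilton_product(I, J)   # equals K
ji = hamilton_product(J, I)   # equals -K: the product is not commutative
```

The four components are independent of one another, which is why the product lends itself to the parallel GPU implementations mentioned above.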
In this work, the proposed forecasting model relies only on wind speed data, but other weather parameters like temperature or humidity can be used to improve forecasting performance.

IV. CONCLUSION
In this paper, a novel approach for forecasting wind speed is presented. The developed model uses the stationary wavelet transform and quaternion-valued neural networks. Furthermore, a novel quaternion version of the softplus RMSProp learning algorithm was developed to improve the proposed model's prediction accuracy. Four real-world wind prediction datasets were used to demonstrate the excellent forecasting performance of the proposed model.
Experimental results indicate that the proposed model can effectively forecast wind speeds, particularly over the short term. Therefore, the proposed model is reliable and useful for predicting wind speeds in modern power system management systems.

He has published more than 300 papers in multimedia signal and image processing. He has supervised more than 40 M.Sc. and Ph.D. students. His current research interests include different aspects of multimedia signal and image processing, seismic applications, biomedical signal processing, and diverse applications of artificial intelligence and machine learning. He was a recipient of the IEEE Third Millennium Medal. He also received Shauman Best Researcher Award, and both the Excellence in Research and Excellence in Teaching Awards at KFUPM. He delivered numerous invited talks and chaired several conferences, including GlobalSIP-MPSP, IEEE Gulf (GCC), Image Processing Tools and Applications, and TENCON (a Region ten conference). He is the Founding Associate Editor for the International Journal of Sensors, Transducers and Instrumentation Systems. He serves as an Associate Editor for EURASIP Journal for Image and Video Processing, and was with Electronics Letters (IET), International Journal of Digital Signals and Smart Systems, International Journal on Imaging and Graphics, and Electronic Letters on Computer Vision and Image Analysis.

VOLUME 9, 2021