Wind Speed Prediction using Hybrid 1D CNN and BLSTM Network

As the world's population grows, global power demand is increasing, and exploring alternative clean and renewable sources of energy such as wind has become necessary. For optimal operation of wind farms and stability of the grid, predicting wind speed ahead of time is of key importance. An accurate forecast of wind speed is often difficult due to the unpredictable nature of the wind. In this work, we utilized different machine learning models and proposed a hybrid machine learning approach. This approach combines a 1D convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM) network for accurate short-term wind speed prediction at different heights above ground level (AGL). The 1D CNN model extracts high-level features of the input wind speed data. The extracted features are then fed as input to the BLSTM network for wind speed prediction. The wind speed time series data used in this study are measured at 18 and 98 meters AGL. The study further presents a relationship between the utilized models and prediction accuracy at different heights. The forecasting performance of the models tends to improve as the height AGL increases. A real-world case study in Saudi Arabia is implemented to demonstrate the effectiveness of the proposed CNN-BLSTM method. The mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) are used as performance indices to evaluate the proposed CNN-BLSTM model. The corresponding results show that the proposed method outperforms other benchmark models.


I. INTRODUCTION
Wind power plays an important role in global energy growth due to its clean, pollution-free, and self-renewing nature. To integrate wind energy into the existing power system, advance knowledge of the wind speed is required. In the past decades, researchers and electric utility planners have been driven to provide more advanced solutions for wind forecasting [1]-[7]. The approaches include physical [8], statistical [9], and hybrid frameworks [10]-[12] for more accurate and stable deterministic wind forecasts. The physical methods use a physical model and meteorological data such as atmospheric pressure, temperature, humidity, and dew point for wind speed prediction. One example of the physical method is numerical weather prediction (NWP), which adopts meteorological data series for prediction. Because of its high computational complexity and its need for high-end computational resources, this approach is less popular for short-term wind speed forecasting. In some cases, the physical models are used as the initial step of a wind speed forecast, and their output may be supplied as input to other statistical models. Meteorologists often employ NWP models to predict weather over large areas; such models do not work well in short-term prediction scenarios. To achieve more accurate results, NWP solves the conservation equations numerically. Moreover, to enhance the model accuracy, topography may be represented in the model by incorporating digital elevation models (DEMs) [13]. Finally, the remaining error is further reduced by using model output statistics (MOS). The authors in [14] discussed an automatic online forecasting system that utilizes NWP for large-scale forecasting, employs the wind atlas analysis and application program (WAsP), and also considers the effect of obstacles and roughness. The shadowing effect of the turbines is tackled with the PARK program. Finally, the results are supplied to the MOS model. The results show that the proposed model significantly outperforms the persistence model after 6 hours, with a mean absolute error of about 15% of the installed capacity. The authors in [15] attempted to use the NWP model for short-term prediction with the help of DEMs and MOS correction. However, the performance deteriorates in the case of very short-term time frames.
The second category is the statistical approach. These methods utilize large amounts of historical data for wind speed forecasting and are easy to implement. Popular methods in this category include the autoregressive (AR), AR integrated moving average (ARIMA), fractional-ARIMA (F-ARIMA), and seasonal-ARIMA (SARIMA) methods. The authors in [16] used the AR model to predict wind speed at three Mediterranean cities in Corsica. The authors demonstrated that the mean statistical characteristics of the observed wind speed data can be reproduced using the AR model. Another study [17] employed the F-ARIMA model for wind speed prediction in North Dakota for day-ahead and two-days-ahead scenarios and concluded that the proposed model performed better than the persistence model. The authors in [18] adopted a combined model based on a nonlinear AR model and the Gauss-Newton algorithm to forecast ten-minute-ahead and one-hour-ahead wind speed. The Kalman filter is another notable statistical model, which is highly suitable for online forecasting of wind speed. The model works by establishing a state-space model that considers wind speed as a state variable. The authors in [19] employed Kalman filters to predict wind speed for hourly data and for a 5 min time step. The results showed that the Kalman filter model outperformed the persistence method in the case of the 5 min time step. However, the persistence method performed better in the case of hourly data. The linear nature of the statistical methods serves as a hindrance in tackling wind speed prediction with nonlinear factors.
In recent decades, several wind speed forecasting models have emerged in the literature [1], [20]-[25]. These models can be classified into three categories based on their forecast time horizon. The first category is short-term forecasting, which involves forecasting of wind speed from several minutes to hours ahead. Most real-time electricity market operations, grid regulation, and economic dispatch decisions depend on this type [26]. The next category is medium-term forecasting, which ranges from several hours to weeks. Medium-term forecasting is mainly used in the reserve market and unit commitment. The final category is long-term prediction, which ranges from a week to years ahead. It is mainly applied for maintenance planning, expansion, and study of wind power plants.
In recent years, artificial intelligence methods have become popular for wind speed forecasting due to the development of soft computing technologies. These techniques help in capturing the randomness, non-stationarity, and non-linearity associated with wind speed data, which in turn facilitates accurate prediction ahead of time. The artificial neural network (ANN) has recorded much success in wind speed prediction problems. Generally, the current artificial intelligence-based prediction methods can be divided into two classes: conventional machine learning and deep learning methods [27]. A dominant example of the conventional methods, which has been highly exploited for wind speed prediction, is the support vector machine (SVM) [28]-[30]. For example, the authors in [28] and [29] discussed the particle swarm optimization (PSO)-based reduced SVM and the genetic algorithm (GA)-based SVM for wind speed prediction, respectively. The authors in [30] employed the cuckoo search algorithm (CSA) to alleviate the influence of the parameters of the variant SVM (v-SVM) and to enhance its prediction results. Other examples in the conventional group are the ANN-based methods, which include the Bayesian network (BN) [31], backpropagation (BP) [32], radial basis function neural network (RBFNN) [33], wavelet neural network (WNN) [34], Elman neural network (ENN) [35], and extreme learning machine (ELM) [36]. The ANN has been widely used for wind speed prediction [37]. The authors in [38] developed an RBFNN model for short-term wind speed forecasting. The proposed model outperformed four other models developed based on ANN. The authors in [39] proposed a model based on ELM. The numerical results show that ELM is effective for wind speed forecasting.
The other class of artificial intelligence-based prediction methods, namely the deep learning approaches, has received wide acceptance among researchers. For example, the authors in [40] developed six different LSTM networks for wind speed prediction with 10 min and 1 hour intervals. In recent times, recurrent neural network (RNN) methods have been widely adopted by researchers. The RNN can maintain a state between successive inputs, which gives it an edge in handling time series or sequences. However, it suffers from vanishing and exploding gradients. To address this problem, a variant of the RNN known as the long short-term memory (LSTM) network is used. The vanishing and exploding gradient problems are mitigated using memory cells that are controlled by gates. The LSTM has been applied extensively for time series prediction [41], [42]. Recently, an RNN type that uses both past and future context to obtain its output has emerged. This network is termed the bidirectional LSTM (BLSTM). It connects two hidden layers that process the sequence in opposite directions to the same output. Given the fluctuating nature of wind speed (and hence of wind power), the strong feature detection capability of the convolutional neural network (CNN), and the advantages of the BLSTM, a combination of these two algorithms is a promising route to accurate wind speed prediction. The CNN is another deep learning approach that has been used to solve different problems such as object detection, natural language processing, pattern recognition, image classification, and many others. As a result of the increasing demand for deep learning accuracy and efficiency, many researchers have proposed the use of CNN with other deep learning approaches to form a hybrid network that can improve time series forecasting performance.
The authors of [43] proposed a deep learning approach for the automatic recognition of workers' unsafe actions using a hybrid CNN and LSTM. The method outperformed the state-of-the-art descriptor-based method for detecting points of interest in images. The authors in [44] used a hybrid CNN-LSTM approach to increase the accuracy of emotional models by 20 percent compared to the conventional MLP model using raw data. The authors of [45] improved urban traffic flow prediction using a hybrid neural network and the greedy algorithm. The authors of [46] proposed an LSTM fully connected (LSTM-FC) network to predict PM2.5 concentration. The authors of [47] used a hybrid CNN-LSTM model to predict the next day's ozone concentration. It is important to point out that there are other hybrid approaches. The authors in [48] proposed a hybrid method based on the discrete wavelet transform (WT), twin support vector regression (TSVR), random forest regression (RFR), and CNN, deployed for wind speed prediction at onshore, offshore, and hilly sites. The results showed that the proposed hybrid methods outperformed the SVM, ANN, and ELM methods. The authors in [49] proposed a hybrid machine learning method that deployed a variant of SVR built on the wavelet transform. The results showed the effectiveness of the proposed approach. The authors in [50] developed a hybrid framework that estimates the prediction intervals of wind speed using machine learning methods in combination with a multi-objective salp swarm algorithm. The results showed that the proposed method outperformed other benchmark methods significantly.
Accurate and reliable prediction of wind speed is often a challenging task due to its stochastic nature, high rate of change, and dependency on elevation, as well as the atmospheric pressure, terrain, and atmospheric temperature, which influence the wind speed at any given time. A hybrid algorithm that is capable of mitigating these effects is expected to be useful in resolving the present problem. It is well established that wind speed varies with height due to the obstacles present at different heights within the environment. Such obstacles do not just affect the wind speed but also influence its flow direction. Furthermore, hybrid machine learning algorithms have evolved in the quest for higher forecasting accuracy. The main motivation of this work is to examine the forecasting error at different heights using a hybrid algorithm. Hence, we developed a hybrid CNN-BLSTM model for forecasting wind speed. The accuracy of the proposed approach is compared with other methods. Finally, four performance indicators are used to evaluate the models: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE).
The main contributions of the proposed work can be summarized as follows:
• Exploring the bidirectional processing and long-range memory of the BLSTM as a deep learning architecture for wind speed forecasting, which has the merit of capturing the detailed deep temporal characteristics of the data.
• Verifying the ability of the CNN to extract high-level features from the input wind speed data and ensuring high prediction capability, accuracy, and stability of the proposed hybrid model that combines the merits of the CNN and the BLSTM.
• Investigating the forecasting performance of the proposed method and comparing it with some of the other existing methods for different wind speed time series measured at different heights AGL.

II. METHODOLOGIES
In this section, the fundamentals of the convolutional neural network (CNN) and long short-term memory (LSTM) methods are discussed. These methods have been used extensively for image analysis, speech recognition, text understanding [51], and natural language processing. The main advantage of the CNN method is its ability to extract high-level features [52], while the LSTM has the advantage of mining time series data [53]. Hence, the two models are combined to build a hybrid model that takes advantage of the high-level feature extraction of the CNN and the time series mining of the LSTM. First, the baseline and linear models, which are used as benchmarks, are discussed, followed by the more complex LSTM, CNN, and DNN models.

A. BASELINE MODEL
To establish a reference level of performance, a baseline model was developed against which the machine learning models are compared. The baseline model returns the current value of the wind speed as the forecast, with no changes. This is reasonable when the wind speed changes slowly. However, its accuracy deteriorates as one tries to predict further into the future.
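The persistence idea described above can be illustrated with a minimal Python sketch. The function name `persistence_forecast`, the `horizon` parameter, and the sample wind speeds are illustrative assumptions and are not taken from the study.

```python
def persistence_forecast(series, horizon=1):
    """Pair each observation with the value observed `horizon` steps later.

    The forecast for time t + horizon is simply the value at time t.
    Returns (forecasts, targets) as two lists of equal length.
    """
    if horizon < 1 or horizon >= len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    forecasts = list(series[:-horizon])  # value carried forward unchanged
    targets = list(series[horizon:])     # value actually observed later
    return forecasts, targets


# Hypothetical wind speeds in m/s
speeds = [5.0, 5.2, 5.1, 6.0, 5.8]
forecasts, targets = persistence_forecast(speeds, horizon=1)
```

Comparing `forecasts` with `targets` using any error index gives the reference error that a trained model should improve on.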

B. LINEAR MODEL
The main idea behind the linear model is to establish a linear transformation between the input and the output. It is often the simplest trainable model that can be used. The linear model is easy to interpret, and its output at a given time step depends only on the input at that time step.
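As a rough illustration, a one-input linear model of the form $\hat{y} = w\,x + b$ can be fitted in closed form by ordinary least squares. This is a sketch under assumptions: the study does not state how its linear model was trained (a gradient-based fit is equally plausible), and the name `fit_linear` and the toy data are hypothetical.

```python
def fit_linear(xs, ys):
    """Fit y ~ w * x + b by ordinary least squares and return (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        raise ValueError("inputs must not all be identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Toy data generated from y = 2x + 1, so the fit should recover w = 2, b = 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
prediction = w * 5.0 + b  # forecast for a new input
```

For a forecasting task, `xs` would hold the wind speed at each time step and `ys` the value to be predicted from it.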

C. LSTM
The long short-term memory (LSTM) network [54] is a subclass of the RNN model which is formed by adding a memory cell into the hidden layer to control the memory information of the time series. It consists of three different control gates, namely the forget, input, and output gates. The state of the memory cell of the LSTM is controlled by two of these gates. The forget gate indicates how much memory of the last moment can be saved, while the input gate determines how much input of the current moment can be saved and also controls the fusion of information and stimulus. The output gate mainly controls how much of the cell state is passed to the output. The transmitted information passes through the controllable gates to different cells in the hidden layer. This enables control over how much of the prior and current information is remembered or forgotten. In contrast to the standard RNN, the LSTM has a long-term memory function and largely avoids the problem of gradient disappearance. Figure 1 shows the structure of the LSTM network. σ is the sigmoid function used in Equations (1)-(3); its value lies between zero and one, where 0 indicates that nothing passes while 1 means everything passes. The hyperbolic tangent function is used to overcome the problem of gradient disappearance.
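The subscripts i, f, and o represent the input, forget, and output gates, respectively, and the subscript t represents the time step index. The equations are given as follows:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{1}$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{2}$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{3}$$
$$\tilde{c}_t = \sigma_h\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{4}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$
$$h_t = o_t \odot \sigma_h\left(c_t\right) \tag{6}$$

where $W_f$, $W_i$, $W_o$, and $W_c$ are matrices representing the weights of the forget gate, input gate, output gate, and memory cell, respectively. $U_f$, $U_i$, $U_o$, and $U_c$ are the matrices representing the weights of the recurrent connections of the forget gate, input gate, output gate, and memory cell, respectively. $x_t$ denotes the input vector to the LSTM network at time step t, $h_t$ denotes the hidden state (output) vector, $f_t$ denotes the forget gate's activation vector, $i_t$ represents the input gate's activation vector, $o_t$ is the output gate's activation vector, and $\odot$ represents element-wise multiplication. $\tilde{c}_t$ and $c_t$ represent the cell input activation vector and the cell state vector. Here, $b_f$, $b_i$, $b_o$, and $b_c$ represent the forget gate bias vector, input gate bias vector, output gate bias vector, and memory cell bias vector, respectively. The sigmoid function $\sigma(x)$ and the hyperbolic tangent function $\sigma_h(x)$ are defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
$$\sigma_h(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$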
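To make the roles of the gates concrete, the following Python sketch advances a single LSTM time step using only the standard library. It assumes the standard LSTM formulation, in which each gate combines the current input, the previous hidden state, and a bias, and the hidden state is the output gate applied to the hyperbolic tangent of the cell state. The function names and dictionary keys are hypothetical, and all-zero parameters are used only so that the result can be checked by hand.

```python
import math


def sigmoid(x):
    """Logistic sigmoid; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lstm_step(x_t, h_prev, c_prev, params):
    """Advance one LSTM time step and return (h_t, c_t).

    params maps "W_f", "U_f", "b_f", ... (hypothetical key names) to the
    input weights, recurrent weights, and biases of each gate.
    """

    def gate(name, activation):
        wx = matvec(params["W_" + name], x_t)
        uh = matvec(params["U_" + name], h_prev)
        bias = params["b_" + name]
        return [activation(a + b + c) for a, b, c in zip(wx, uh, bias)]

    f_t = gate("f", sigmoid)        # forget gate: how much old memory to keep
    i_t = gate("i", sigmoid)        # input gate: how much new input to store
    o_t = gate("o", sigmoid)        # output gate: how much state to expose
    c_tilde = gate("c", math.tanh)  # candidate cell input
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """Build all-zero parameters, convenient for a hand-checkable example."""
    params = {}
    for name in "fioc":
        params["W_" + name] = [[0.0] * n_in for _ in range(n_hidden)]
        params["U_" + name] = [[0.0] * n_hidden for _ in range(n_hidden)]
        params["b_" + name] = [0.0] * n_hidden
    return params


# One input feature, two hidden units, previous cell state [1.0, -2.0]
h_t, c_t = lstm_step([5.0], [0.0, 0.0], [1.0, -2.0], zero_params(1, 2))
```

With zero parameters every gate evaluates to 0.5 and the candidate input to 0, so the new cell state is half the previous one, which shows how the forget gate scales the retained memory.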

D. CNN MODEL
CNN models were originally developed for image classification. These models accept a two-dimensional input image with color channels and learn its features. Such models are deep learning methods and have achieved tremendous success in the past. A one-dimensional version of the CNN is termed the 1D CNN. The 1D CNN is mainly applied to one-dimensional sequences of data. It extracts important features from the input sequence data and maps the internal features of the sequence. The 1D CNN has been successfully applied to time series and fixed-length signal data analysis, such as audio recordings and natural language processing. Figure 2 shows the CNN model architecture. The CNN consists of a 1D convolutional layer, a pooling layer, a flatten layer, and an output layer. The input signal can be either a multivariate or a univariate time series. The input has a width equal to the number of features K and a length N. The convolutional filters have the same width as the time series, but their lengths may differ. The filters move in one direction, performing a convolution operation from the starting point of the time series to its endpoint. The output of the convolutional layer consists of new filtered time series vectors, whose number depends on the number of convolution kernels. This layer also captures the features of the initial time series. The next stage involves the pooling of each time series vector of the convolutional layer to form new vectors. The layer responsible for pooling is termed the pooling layer. The vectors from the pooling layer are passed to the flatten layer or a fully connected layer. In the present case, the output of the flatten layer is passed to the LSTM-based network, as discussed in Section III.
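The convolution and pooling stages can be sketched for a univariate series in a few lines of Python. This is an illustration under assumptions rather than the configuration of the study: it uses a single hand-picked kernel, no padding, a stride of one, and a pooling size of two, and the function names are hypothetical. As in most deep learning libraries, the "convolution" is computed as a sliding dot product without flipping the kernel.

```python
def conv1d_valid(series, kernel, bias=0.0):
    """Slide the kernel along the series (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(w * series[i + j] for j, w in enumerate(kernel)) + bias
        for i in range(len(series) - k + 1)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, pool_size=2):
    """Keep the maximum of each non-overlapping window of pool_size values."""
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


# Hypothetical series; the kernel [-1, 0, 1] responds to the local trend
series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
kernel = [-1.0, 0.0, 1.0]
features = conv1d_valid(series, kernel)  # length 6 - 3 + 1 = 4
activated = relu(features)
pooled = max_pool1d(activated, pool_size=2)
```

In a trained network the kernel weights are learned, and a layer with several kernels produces one such filtered vector per kernel.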

III. HYBRID 1D CNN-BLSTM MODEL
In this section, the hybrid architecture of the CNN and BLSTM is presented for time series prediction, as shown in Fig. 3. The CNN extracts important high-level features from the input time series. After pooling and flattening, these features are sent as input to the BLSTM to support prediction. The convolutional layer is initialized with 32 different kernels of the same size (3 × 3), and the output of this layer is passed to the ReLU activation function. To reduce the sensitivity of the feature maps to location, max pooling is employed to select the maximum value, thereby reducing the size of the feature maps. A BLSTM with 32 output units is used. The network output is obtained from the dense output layer. It is worth noting that the network output could be extended to predict multiple features.

IV. PERFORMANCE EVALUATION INDEXES
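In this work, several performance indices are used to comprehensively evaluate the forecasting capabilities of all the models. The performance indices used include MAE, MSE, RMSE, and MAPE. The mathematical expressions of these indices are given as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y(i) - \hat{y}(i)\right|$$
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y(i) - \hat{y}(i)\right)^{2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y(i) - \hat{y}(i)\right)^{2}}$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y(i) - \hat{y}(i)}{y(i)}\right|$$

where $y(i)$ is the measured wind speed at time i, $\hat{y}(i)$ is the predicted value of the wind speed, and N is the number of data points. The smaller the values of these indices, the better the performance of the model.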
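The four indices can be computed with a short Python function. This is a minimal sketch using the standard definitions, with MAPE expressed in percent; the function name `evaluate` and the sample values are hypothetical. Note that MAPE is undefined when a measured value is zero, which can occur for calm wind conditions.

```python
import math


def evaluate(measured, predicted):
    """Return MAE, MSE, RMSE, and MAPE (in percent) as a dictionary."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    errors = [y - y_hat for y, y_hat in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Division by the measured value: fails if any measurement is zero
    mape = 100.0 * sum(abs(e / y) for e, y in zip(errors, measured)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical measured and predicted wind speeds in m/s
measured = [2.0, 4.0]
predicted = [1.0, 5.0]
metrics = evaluate(measured, predicted)
```

Here both errors have magnitude 1 m/s, so MAE, MSE, and RMSE all equal 1, while MAPE averages the relative errors of 50% and 25%.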

B. EXPERIMENTS
The proposed CNN-BLSTM network has a CNN of 32 units connected to a BLSTM of 32 units; it is trained for 100 epochs with a batch size of 32. The simulation of each model is performed 100 times to reduce the influence of randomness and avoid a premature solution. A computationally efficient optimizer known as Adaptive Moment Estimation (Adam) is employed to minimize the loss function and obtain the weights and biases of the network. Adam is an adaptive optimizer that often performs well in practical applications compared with other optimizers such as stochastic gradient descent (SGD), RMSProp, Adadelta, and Adagrad. To test the wind speed forecasting performance at different heights using the CNN, Dense, LSTM, and CNN-BLSTM models, the models were applied to three wind speed time series, using a multi-step input of three hours to predict the next hour. The performance of the models is compared with that of the baseline and linear models. The error evaluation performance of the different forecasting models is given in Table 1.
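The input-output arrangement described above, in which a window of past values is used to predict the next value, can be prepared with a sliding window. The sketch below assumes hourly samples, so that three consecutive values cover three hours; the sampling interval is not stated in the text, and the function name `make_windows` and the sample data are hypothetical.

```python
def make_windows(series, n_in=3):
    """Split a series into (inputs, targets) for one-step-ahead forecasting.

    Each input is a window of n_in consecutive values and the matching
    target is the value that immediately follows the window.
    """
    if n_in < 1 or n_in >= len(series):
        raise ValueError("n_in must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - n_in):
        inputs.append(list(series[start:start + n_in]))
        targets.append(series[start + n_in])
    return inputs, targets


# Hypothetical hourly wind speeds in m/s
hourly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
inputs, targets = make_windows(hourly, n_in=3)
```

A series of length N yields N - 3 training pairs, and the same windows can be fed to every model so that the comparison is made on identical samples.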

• There is significant consistency among the trends of the performance indices at different heights. In general, the performance of all the models tends to improve with increasing height.
• The CNN-BLSTM model performs best, i.e., has the least error for all the performance indices, followed by the LSTM, the CNN, and then the Dense model, although the performance of the Dense model is competitive with that of the CNN model.

VI. CONCLUSION
To manage the operation and schedules of a smart grid, it is necessary to have accurate knowledge of the availability and variation of wind speed. In the present work, we proposed a hybrid deep learning-based model for short-term wind speed forecasting. First, the CNN is employed to effectively extract high-level features of the input wind speed time series. Next, the BLSTM is incorporated to capture the deep temporal characteristics. The proposed model is used to forecast wind speed measured at different heights (98 m and 18 m) and is compared comprehensively with multiple benchmarks. The results for the proposed CNN-BLSTM and the other machine learning benchmarks show that the performance of the models decreases with a decrease in height. The results also show that the CNN-BLSTM model outperformed the other models for all the time series at different heights. In the future, the model could be enhanced further through the use of more features such as ambient temperature and atmospheric pressure. The use of data clustering is another key approach that needs to be investigated for improvement.