Time Series Forecasting Based on Empirical Mode Decomposition and the Varying-Coefficient DBN-AR Model

To improve the prediction accuracy for time series, this paper proposes a new one-step-ahead hybrid model that combines empirical mode decomposition (EMD) with the DBN-AR model and a back propagation (BP) neural network. The proposed approach first uses EMD to decompose the complicated original time series into several intrinsic mode functions (IMFs) and a residue. The IMF components and the residue are then modelled and forecasted using the DBN-AR model. Finally, the predicted results for all IMFs and the residue are combined by a BP neural network to obtain an aggregated output for the time series data. To evaluate the performance of the proposed hybrid model, Beijing PM2.5 level time series data and the weekly rates of British Pound/US dollar (GBP/USD) exchange rate data are used as illustrative examples. Experimental results demonstrate the attractiveness of the proposed hybrid model in terms of both prediction accuracy and efficiency compared with other methods.


I. INTRODUCTION
In recent years, time series prediction has become a very active research topic, and the analysis of time series is of practical value. For example, predicting traffic conditions helps people arrange their travel, and predicting disasters in advance helps people avoid them. However, different time series data have different characteristics [1]. Some data are highly volatile, for example, wind speed data series, and some data are less volatile, such as annual rainfall [1]. Some data are linear in nature, such as human beings, but most data series are nonlinear in nature [1]. Several challenges in time series research still need to be addressed, including prediction accuracy. A time series prediction method analyses historical observations of a variable to establish a model that describes the underlying relationship. However, it is difficult to use the same model to predict different time series [2].
Many prediction models or methods have been proposed in the literature for time series forecasting, which can be classified into three categories: statistical models, artificial intelligence (AI) models and hybrid models [3]. In the first category, the widely used models mainly include autoregressive (AR), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models. In [4], an AR model is used to predict the daily 10.7 cm solar radio flux, and the experimental results show that the AR model performs well. Zhang et al. [5] predicted particulate matter time series in Taiyuan, China by using ARIMA and ARMA models. The results show that the traditional ARMA/ARIMA can reduce the forecasting error. However, real-world time series data are often nonlinear, so it is inappropriate to use such linear prediction models when the original time series data are nonlinear.
The associate editor coordinating the review of this manuscript and approving it for publication was Sajid Ali.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the second category, many AI models have been proposed to predict time series, such as artificial neural networks (ANNs), support vector machines (SVMs) and deep belief networks (DBNs). For example, Khaniani et al. [6] used a neural network model to predict rainfall, and the results showed that the ANN is highly efficient. In [7], an SVM is used to predict financial series, and the experiments show that the SVM is a useful prediction model. Hu et al. [8] predicted the remaining useful life based on a DBN, and the experiment showed that the DBN model is effective and superior. Shan et al. [9] proposed a new neural network model for mathematical expression recognition. Xun et al. [10] proposed a dynamic ANN model called the meta-ANN for forecasting short-term grid loads, and a numerical study showed that the proposed model can be used to obtain more accurate and robust predictions for grid loads.
However, traditional single prediction models often cannot describe the complicated relations existing in a data series. Therefore, many hybrid models have been proposed by researchers. For example, in [11], a hybrid local linear neuro-fuzzy model is proposed and used for nonlinear time series forecasting, and a comparison of the results shows the superiority and promising performance of the proposed hybrid approach. Diao et al. [12] proposed a hybrid model for short-term traffic volume prediction, and the simulation results demonstrate that their hybrid model can be used to obtain better prediction accuracy. In [13], a novel hybrid model combining a neural network with multiobjective optimization was used to predict effective load data series, and the experimental results showed that the hybrid model can outperform baseline models.
A special class of hybrid models applies data decomposition techniques to the time series before forecasting [14], [15]. For example, Wang et al. [16] proposed a hybrid model based on a two-layer decomposition technique and a BP neural network for multistep-ahead electricity price forecasting, and the experiment demonstrated that the proposed model has superior electricity price forecasting performance. In [17], Qiu et al. proposed a hybrid approach composed of a discrete wavelet transform (DWT), empirical mode decomposition (EMD) and random vector functional link network (RVFL) for short-term electric load forecasting, and the experiment verified the effectiveness of the proposed method.
The aim of this paper is to develop a hybrid model based on EMD, the DBN-AR model and a BP neural network to predict time series with high accuracy. EMD can be used to analyze the nonlinear and non-stationary properties of time series, and it is used here to decompose the original time series into several modes [18]. The time series obtained for each decomposed mode is predicted by the DBN-AR model, and then the prediction results for all intrinsic mode functions (IMFs) and a residue are combined by a BP neural network to obtain an aggregated output for the time series data. The prediction results in this paper demonstrate that the proposed hybrid model outperforms the single model for the PM2.5 time series and the weekly rates of British Pound/US dollar (GBP/USD) exchange rate data.
The rest of this paper is organized as follows: Section 2 briefly introduces the EMD. The nonlinear DBN-AR model is listed in Section 3. The basic framework of the proposed hybrid model is given in Section 4. Section 5 gives evaluation indicators. The experimental prediction results for the PM2.5 time series dataset and the weekly rates of British Pound/US dollar (GBP/USD) exchange rate data are demonstrated in Section 6. Finally, conclusions are stated in Section 7.

II. EMPIRICAL MODE DECOMPOSITION
In 1998, Huang et al. proposed the empirical mode decomposition (EMD) method, which decomposes the original data series into several intrinsic mode functions (IMFs) and a residue [18]. The EMD method makes it possible to obtain instantaneous frequency data from nonstationary and nonlinear data series. Each IMF component must satisfy two criteria: first, the number of zero crossings and the number of local extrema differ by at most one; second, the mean value of the upper and lower envelopes, which are defined by the local maxima and minima, is zero at any time. For a given original data series $\upsilon(t)$, the decomposition steps for EMD are as follows:
(1) A cubic-spline interpolation of the local maxima and minima of $\upsilon(t)$ is used to create the upper and lower envelopes of the data series.
(2) The mean of the upper and lower envelopes, $m_1(t)$, is computed.
(3) The mean $m_1(t)$ is subtracted from the original data series $\upsilon(t)$ to obtain a series with the low-frequency content removed: $h_1(t) = \upsilon(t) - m_1(t)$.
(4) If $h_1(t)$ satisfies the IMF criteria, it is the first component of the original data series $\upsilon(t)$; otherwise, steps (1) to (3) are repeated on $h_1(t)$ until the stopping criteria are satisfied. The first IMF component is then obtained as $c_1(t) = h_1(t)$.
(5) The first component $c_1(t)$ is subtracted from the original data series $\upsilon(t)$, and the residue signal is computed by $r_1(t) = \upsilon(t) - c_1(t)$.
(6) The obtained residual signal $r_1(t)$ is used as a new original data series to find the next IMF. Steps (1) to (5) are repeated until the last residual data series becomes a monotonic function.
Finally, the original data series $\upsilon(t)$ is decomposed as
$$\upsilon(t) = \sum_{i=1}^{n} c_i(t) + r_n(t), \tag{1}$$
where $n$ is the number of IMFs, $c_i(t)$ are the IMFs and $r_n(t)$ is the residue.
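The sifting procedure can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: to stay dependency-free it uses piecewise-linear envelopes instead of the cubic-spline interpolation of step (1), a fixed number of sifting passes in place of the unspecified stopping criteria, and the function names (`local_extrema`, `envelope`, `sift`, `emd`) are hypothetical.

```python
import math

def local_extrema(x):
    """Return index lists of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, knots):
    """Piecewise-linear envelope through x at the knots (assumption: the
    paper uses cubic splines; the end points are added as extra knots)."""
    idx = [0] + knots + [len(x) - 1]
    env = [0.0] * len(x)
    for a, b in zip(idx[:-1], idx[1:]):
        for i in range(a, b + 1):
            w = (i - a) / (b - a)
            env[i] = (1 - w) * x[a] + w * x[b]
    return env

def sift(x, n_sift=10):
    """Steps (1)-(4): repeatedly subtract the envelope mean.
    A fixed pass count stands in for the stopping criteria (assumption)."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=10):
    """Steps (5)-(6): peel off IMFs until the residue has almost no extrema."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 2:
            break
        c = sift(residue)
        imfs.append(c)
        residue = [r - ci for r, ci in zip(residue, c)]
    return imfs, residue

# Synthetic test signal: two oscillations plus a slow trend.
signal = [math.sin(2 * math.pi * t / 10) + 0.5 * math.sin(2 * math.pi * t / 50)
          + 0.01 * t for t in range(200)]
imfs, residue = emd(signal)
# Eq. (1): the IMFs and the residue add back up to the original series.
error = max(abs(sum(c[t] for c in imfs) + residue[t] - signal[t])
            for t in range(len(signal)))
```

Whatever envelope method is used, the components sum back to the original series by construction, which is the property expressed by Eq. (1).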

III. DEEP BELIEF NETWORK-BASED VARYING-COEFFICIENT AUTOREGRESSIVE MODEL
A. STRUCTURE OF THE DBN-AR MODEL
A DBN is built by stacking several restricted Boltzmann machines (RBMs), as proposed by Hinton [19]. The hidden layer of each RBM serves as the visible layer of the next RBM. Finally, a logistic regression layer is added to the top of the stack [20]. A schematic of a DBN is given in Fig. 1. The input-output relationship for the DBN model is shown in Eq. (2):
$$\begin{cases} h^{(0)}(t) = \mathbf{W}(t-1), \\ h^{(\ell)}(t) = \phi\left(w^{(\ell)} h^{(\ell-1)}(t) + b^{(\ell)}\right), \quad \ell = 1, 2, \dots, N_r, \\ \vartheta(\mathbf{W}(t-1)) = h^{(N_r)}(t), \end{cases} \tag{2}$$
where $w^{(\ell)}$ is the parameter weight between the $\ell$-th layer and the $(\ell-1)$-th layer in the single DBN model, $b^{(\ell)}$ is the bias in layer $\ell$, $n_y$ is the output lag, $Q_{\ell}$ represents the number of nodes on layer $\ell$, $N_r$ represents the total number of layers, $h^{(\ell)}(t)$ represents the output of the $\ell$-th layer, $\vartheta(\mathbf{W}(t-1))$ represents the output of the single DBN model, $\phi(x) = 1/(1 + e^{-x})$ is a sigmoid function, and $\mathbf{W}(t-1)$ is the input state vector of the DBN model.
In general, a nonlinear system can be described by Eq. (3):
$$y(t) = f\left(y(t-1), \dots, y(t-n_y)\right) + \varepsilon(t), \tag{3}$$
where $f(\cdot)$ represents a nonlinear map, $y(t) \in \mathbb{R}$ is the output, $n_y$ is the output lag, and $\varepsilon(t) \in \mathbb{R}$ denotes the noise.
In this paper, a state-dependent autoregressive (SD-AR) model is used to approximate the nonlinear mapping $f(\cdot)$ in Eq. (3), which gives the following expression:
$$y(t) = \gamma_0(\mathbf{W}(t-1)) + \sum_{m=1}^{n_y} \gamma_{y,m}(\mathbf{W}(t-1))\, y(t-m) + \varepsilon(t), \tag{4}$$
where $\gamma_0(\mathbf{W}(t-1))$ and $\gamma_{y,m}(\mathbf{W}(t-1))$ $(m = 1, \dots, n_y)$ are the state-dependent function-type coefficients of the SD-AR model (4), and $\mathbf{W}(t-1) = [y(t-1), \dots, y(t-n_y)]^T$ represents the state vector at time $t$, which is the variable used to change the system's working point with time; in some cases, this can be the output data series and/or the input data series.
A set of DBNs is used to approximate the function-type coefficients of the SD-AR model (4), which yields the DBN-AR model [2]. The input-output relationship for the DBN-AR model is described by Eq. (5), which has the structure of Eq. (4) with each coefficient produced by a DBN module of the form in Eq. (2). The DBN-AR model (5) is used to predict each IMF component and the residue in this paper. In Eq. (5), the parameters of the $j$-th DBN module include the biases of its hidden layers, and $\gamma_0(\mathbf{W}(t-1))$, $\gamma_{y,m}(\mathbf{W}(t-1))$, $m = 1, 2, \dots, n_y$, are the outputs of the DBN modules in the DBN-AR model (5), which correspond to the state-dependent function-type coefficients. $\mathbf{W}(t-1) = [\kappa_1, \kappa_2, \dots, \kappa_{n_w}]^T$ denotes the input vector, and $n_w$ is the dimension of $\mathbf{W}(t-1)$.
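To make the varying-coefficient structure concrete, the following Python sketch computes a one-step prediction in which every AR coefficient is the output of a small stack of sigmoid layers fed with the state vector. It is only an illustration under stated assumptions: the modules have random, untrained parameters, each has a single hidden layer of four nodes, and the helper names (`random_module`, `module_forward`, `dbn_ar_predict`) are hypothetical rather than taken from this paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(weights, biases, inputs):
    """One sigmoid layer: phi(w h + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def module_forward(layers, state):
    """Propagate the state vector through stacked sigmoid layers.
    The last layer has a single node, whose output is one coefficient."""
    h = state
    for weights, biases in layers:
        h = layer_forward(weights, biases, h)
    return h[0]

def random_module(n_in, n_hidden, rng):
    """Hypothetical helper: one hidden layer, random (untrained) parameters."""
    def layer(n_out, n_prev):
        weights = [[rng.uniform(-1, 1) for _ in range(n_prev)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        return weights, biases
    return [layer(n_hidden, n_in), layer(1, n_hidden)]

def dbn_ar_predict(modules, state):
    """y(t) = gamma_0(W) + sum_m gamma_m(W) * y(t-m),
    where W = [y(t-1), ..., y(t-n_y)] is the state vector."""
    gammas = [module_forward(m, state) for m in modules]
    return gammas[0] + sum(g * y for g, y in zip(gammas[1:], state))

rng = random.Random(0)
n_y = 5
# One module for gamma_0 plus one per lag: n_y + 1 modules in total.
modules = [random_module(n_y, 4, rng) for _ in range(n_y + 1)]
state = [0.42, 0.40, 0.37, 0.35, 0.30]  # illustrative y(t-1), ..., y(t-5) in [0, 1]
y_hat = dbn_ar_predict(modules, state)
```

Because the coefficients are recomputed from the state at every step, the model behaves like a linear AR model locally while its coefficients vary with the operating point.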

FIGURE 1. Structure of a single DBN model.

B. ESTIMATION OF THE DBN-AR MODEL
First, the original data series $y(t)$ is normalized to $y^{\circ}(t) = \frac{y(t) - \min(y(t))}{\max(y(t)) - \min(y(t))}$, so that the value of $y^{\circ}(t)$ is scaled between 0 and 1. Then, the reference output data series for $\gamma_0(\mathbf{W}(t-1))$ and $\gamma_{y,m}(\mathbf{W}(t-1))$, $m = 1, 2, \dots, n_y$, are assigned to each DBN module in the DBN-AR model (5) based on the pseudo-inverse matrix and the least squares solution. For the normalized data series $\{y^{\circ}(i), y^{\circ}(i+1), \dots, y^{\circ}(n_y+i)\}$, $i \in \{1, 2, \dots, N-n_y\}$, the reference outputs of $\gamma_0(\mathbf{W}(t-1)), \gamma_{y,1}(\mathbf{W}(t-1)), \gamma_{y,2}(\mathbf{W}(t-1)), \dots, \gamma_{y,n_y}(\mathbf{W}(t-1))$ in the DBN modules at sample instant $n_y+i-1$ are calculated from the least squares solution of the DBN-AR model (5). Eq. (6) is used to calculate the output of the DBN-AR model (5):
$$y^{\circ}(n_y+i) = M_{n_y+i-1}\, \Gamma_{n_y+i-1}, \tag{6}$$
with
$$M_{n_y+i-1} = \left[1,\; y^{\circ}(n_y+i-1),\; y^{\circ}(n_y+i-2),\; \dots,\; y^{\circ}(i)\right], \qquad \Gamma_{n_y+i-1} = \left[\gamma_0(\mathbf{W}(n_y+i-1)),\; \gamma_{y,1}(\mathbf{W}(n_y+i-1)),\; \dots,\; \gamma_{y,n_y}(\mathbf{W}(n_y+i-1))\right]^T,$$
where $\Gamma_{n_y+i-1}$ represents the reference output of the DBN modules, and $M_{n_y+i-1}$ is the coefficient matrix in Eq. (6) at time $n_y+i-1$. The target value of the DBN modules in the DBN-AR model (5) can be calculated from
$$\Gamma_{n_y+i-1} = M^{+}_{n_y+i-1}\, y^{\circ}(n_y+i), \tag{7}$$
where $M^{+}_{n_y+i-1}$ denotes the pseudo-inverse matrix of $M_{n_y+i-1}$. Next, in the pretraining stage, the parameters of each DBN module are trained by the deep learning algorithm using the reference outputs of the DBN modules computed by Eq. (7). After pretraining, the prediction output of the DBN-AR model (5) is calculated using Eq. (8).
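The normalization and the pseudo-inverse step can be sketched in plain Python. The sketch assumes that the coefficient matrix at each sample instant is the single row $[1, y^{\circ}(t-1), \dots, y^{\circ}(t-n_y)]$, for which the pseudo-inverse reduces to $M^{T}(MM^{T})^{-1}$ and gives the minimum-norm least squares solution. The function names and the short data list are illustrative only.

```python
def min_max_normalize(y):
    """Scale a series to [0, 1] using its minimum and maximum."""
    lo, hi = min(y), max(y)
    return [(v - lo) / (hi - lo) for v in y]

def reference_outputs(y_norm, n_y, i):
    """Minimum-norm solution Gamma = M^+ y for a single-row M.

    i is a 0-based start index: the window y_norm[i : i + n_y] is used
    to explain the next value y_norm[i + n_y].
    """
    window = y_norm[i:i + n_y]
    target = y_norm[i + n_y]
    m_row = [1.0] + window[::-1]            # [1, y(t-1), ..., y(t-n_y)]
    scale = sum(v * v for v in m_row)       # M M^T, a positive scalar
    gamma = [v * target / scale for v in m_row]   # M^T (M M^T)^{-1} y
    return m_row, gamma, target

raw = [3.0, 5.0, 4.0, 8.0, 7.0, 6.0, 9.0, 2.0]   # made-up values
y_norm = min_max_normalize(raw)
m_row, gamma, target = reference_outputs(y_norm, n_y=5, i=0)
# The reference coefficients reproduce the target exactly: M * Gamma = y.
reconstructed = sum(m * g for m, g in zip(m_row, gamma))
```

With inputs scaled to [0, 1], every entry of `gamma` also lies in [0, 1], so the reference outputs fall inside the range of a sigmoid output node.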
Finally, a specially designed back propagation (BP) algorithm (see Appendix) is used to fine-tune the parameters of the DBN-AR model (5). After pretraining, the modelling error of the DBN-AR model (5) is calculated by Eq. (9), and the objective function for the parameter optimization in this stage is given by Eq. (10), where $y^{\circ}(t)$ is the actual value and $\hat{y}(t)$ is the predicted value.
Using the normalized training data $\{y^{\circ}(t), t = 1, 2, \dots, N\}$, all parameters of model (5) are fine-tuned by Eq. (A.20) (see Appendix). The predicted value from the DBN-AR model (5) in the fine-tuning stage is recalculated according to Eq. (11). The parameter fine-tuning process stops once the mean square error, $\mathrm{MSE} = \frac{1}{N-n_y} \sum_{t=n_y+1}^{N} \left(y^{\circ}(t) - \tilde{y}_r(t)\right)^2$, is smaller than a given value.

IV. THE HYBRID PREDICTION MODEL
In this section, the proposed hybrid prediction model based on EMD, the DBN-AR model and a BP neural network is described. The main structure of the hybrid prediction model follows a decomposition-and-ensemble scheme, as shown in Fig. 2. The steps of the proposed hybrid model are briefly introduced as follows.
(1) First, EMD is used to decompose the original time series into IMF components and one residual component.
(2) Second, the DBN-AR model (5) is used to build a prediction model for each extracted IMF component and the residual component, and the corresponding prediction is obtained for each component.
(3) Finally, the prediction results for all extracted IMFs and the residual component obtained by the DBN-AR model (5) are combined by a BP neural network to generate an output, which is used as the final prediction result for the original data series.
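The three steps form a generic decomposition-and-ensemble pipeline, which can be sketched in Python with pluggable stages. The stand-ins below are deliberately trivial and are not the methods of this paper: a moving-average split replaces EMD, a persistence forecast replaces the DBN-AR model, and a plain sum replaces the BP neural network. All function names are hypothetical.

```python
def hybrid_one_step(series, decompose, predict_component, combine):
    """Decompose, forecast each component, then combine the forecasts."""
    components = decompose(series)                           # step (1)
    forecasts = [predict_component(c) for c in components]   # step (2)
    return combine(forecasts)                                # step (3)

def toy_decompose(series, width=3):
    """Stand-in for EMD: a moving-average trend plus the remaining detail."""
    trend = []
    for i in range(len(series)):
        window = series[max(0, i - width + 1):i + 1]
        trend.append(sum(window) / len(window))
    detail = [s - t for s, t in zip(series, trend)]
    return [detail, trend]

def persistence(component):
    """Stand-in for the DBN-AR model: repeat the last observed value."""
    return component[-1]

series = [1.0, 2.0, 4.0, 3.0, 5.0]   # made-up values
forecast = hybrid_one_step(series, toy_decompose, persistence, sum)
```

Replacing `sum` with a trained combiner is what distinguishes the proposed scheme from a simple additive ensemble: the combiner can weight the component forecasts unequally.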

V. EVALUATION INDICATORS
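To evaluate the efficiency of the hybrid model proposed in this paper, three evaluation indicators are used: the root mean square error (RMSE), the normalized mean squared error (NMSE) and the mean absolute percentage error (MAPE). In their standard forms, they are expressed as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left(c(t) - \hat{c}(t)\right)^2}, \tag{12}$$
$$\mathrm{NMSE} = \frac{\sum_{t=1}^{N} \left(c(t) - \hat{c}(t)\right)^2}{\sum_{t=1}^{N} \left(c(t) - \bar{c}\right)^2}, \tag{13}$$
$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left|\frac{c(t) - \hat{c}(t)}{c(t)}\right| \times 100\%, \tag{14}$$
where $c(t)$, $\hat{c}(t)$ and $\bar{c}$ are the actual value, the predicted value and the mean of the actual values, respectively, and $N$ is the length of the actual data series.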
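The three indicators are straightforward to compute. The Python sketch below assumes the common definitions: NMSE is normalized by the sum of squared deviations of the actual values from their mean, and MAPE is reported in percent and requires nonzero actual values. The sample numbers are made up for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def nmse(actual, predicted):
    """Squared error normalized by the spread of the actual values (assumed form)."""
    mean = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean) ** 2 for a in actual)
    return num / den

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = (rmse(actual, predicted), nmse(actual, predicted), mape(actual, predicted))
```

An NMSE below 1 means the forecast beats the trivial predictor that always outputs the mean of the actual series.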

VI. EMPIRICAL STUDY
The validity of the proposed hybrid model is verified in this section through a comprehensive experimental evaluation. Modelling problems for the PM2.5 levels in Beijing and the weekly rates of British Pound/US dollar (GBP/USD) exchange rate data series are studied, and the modelling results are obtained using a PC with an Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz and MATLAB 2010b.
According to the PM2.5 meteorological records, the seriousness of PM2.5 in Beijing has attracted much attention [21], and many prediction methods have been proposed to predict the value of PM2.5 [22], [23], [24]. In this subsection, using the hourly PM2.5 meteorological records from 1/1/2010 to 3/26/2010, we use the historical data series for PM2.5 concentrations (µg/m^3) to predict the future value of PM2.5 [25]. To facilitate the PM2.5 data analysis, some abnormal values were removed, leaving a total of 2023 data points, as shown in Fig. 3. The PM2.5 data series is a highly volatile and uncertain time series. Therefore, EMD is first used to decompose the original PM2.5 series, and the decomposition results are given in Fig. 4. The original PM2.5 series is decomposed into 10 components in total: IMF1, IMF2, . . . , IMF9 and a residue. Each subseries is divided into two parts: the first 1500 data points are used as the training dataset, and the remaining 523 data points are used as the testing dataset. To clearly demonstrate the prediction process for the proposed hybrid method, one-step-ahead forecasting is taken as an example in this section.
After the original PM2.5 data series is decomposed, the DBN-AR model is used to predict each IMF and the residue. The input order for the DBN-AR model is chosen as $n_y = 5$, i.e., every five successive data points are used to predict the sixth in a rolling manner. The number of iterations for each DBN module is 1000 in the pretraining stage, and the number of iterations of the DBN-AR model is chosen to be 2000 in the fine-tuning stage. The structure of each DBN module is chosen as N (j) Q (j) 2 = 1; (j = 0, 1, · · · , 5). Based on these parameter settings, the DBN-AR model is used to predict each IMF component and the residue component. The constructed DBN-AR model is given in Eq. (15). Figs. 5 and 6 show a comparison of the real values and predicted data for the training and testing datasets, respectively. It can be observed from Figs. 5 and 6 that the predicted values for each IMF and the residue are almost consistent with the real values.
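The rolling scheme with input order five and the 1500/523 split can be sketched as follows. The series below is a placeholder ramp, not the PM2.5 data, and `make_windows` is a hypothetical helper name.

```python
def make_windows(series, n_y):
    """Pair each n_y consecutive values with the value that follows them."""
    inputs, targets = [], []
    for t in range(n_y, len(series)):
        inputs.append(series[t - n_y:t][::-1])   # [y(t-1), ..., y(t-n_y)]
        targets.append(series[t])
    return inputs, targets

series = [float(v) for v in range(2023)]   # placeholder values only
train, test = series[:1500], series[1500:]
x_train, y_train = make_windows(train, n_y=5)
```

Each training subseries of 1500 points yields 1495 input-target pairs, because the first five points have no complete history.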
Next, the predicted results for all IMFs and a residue are combined by a BP neural network to obtain an output for the PM2.5 data series. The input value for the BP neural network is composed of all the IMFs and a residue, and the output value of the BP neural network is the PM2.5. Therefore, the numbers of input and output nodes are selected to be 10 and 1, respectively. After a series of experiments, the number of hidden nodes in this experiment was chosen to be 15. Finally, the predicted result for the proposed hybrid method is shown in Fig. 7 for the testing data series. It can be observed from Fig. 7 that the curves for the forecast value are very similar to the real value.
To verify the effectiveness of the proposed hybrid method for one-step-ahead PM2.5 data series forecasting, the prediction results for the traditional AR model, the DBN-AR model, the EMD and DBN-AR method, and the proposed hybrid method are given in Table 1 for comparison. Two criteria, RMSE and NMSE, are employed to evaluate the performance of all prediction models. It can be observed from Table 1 that both the RMSE and NMSE values for the proposed hybrid method are smaller than those of all the other models, which confirms that the proposed method has better prediction accuracy than the other models.

B. FOREIGN CURRENCY EXCHANGE RATE DATA SERIES
The weekly rates of British Pound/US dollar (GBP/USD) exchange rate data are used to further verify the superiority of the proposed hybrid model. The data set contains 937 observations from the beginning of 1976 to the end of 1993 and is divided into 885 training observations and 52 testing observations. As in [2] and [26], the input order of the proposed hybrid model is six. These data are obtained from the database retrieval system of the "Pacific Exchange Rate Service" (http://fx.sauder.ubc.ca/data.html) [26]. The RMSE and MAPE values obtained on the testing data for the proposed hybrid model and for other models from the literature [2], [26] are given in Table 2. It can be seen from Table 2 that the prediction accuracy of the proposed hybrid model is better than that of the other models. Therefore, this novel prediction model can provide an effective reference for time series forecasting.
…, y(t − 2), …, y(t − 5))^T (15)
FIGURE 6. The predicted data and the real data for each IMF and a residue from the PM2.5 data series for the testing data.

VII. CONCLUSION
In this paper, we propose a hybrid model for time series forecasting composed of EMD, the DBN-AR model and a BP neural network. The original time series signal is first decomposed into several IMFs and a residue by EMD, and a DBN-AR model is then used to model each extracted IMF and the residue. Finally, the prediction results for all IMFs and the residue are combined by a BP neural network to obtain an aggregated output for the time series data. Case studies indicate that the proposed hybrid model achieves better modelling accuracy than other single and hybrid models.

APPENDIX FINE TUNING PROCESS OF THE NONLINEAR DBN-AR MODEL [2]
Using Eq. (9), the objective function in the fine-tuning stage is designed in terms of the modelling error, where $y^{\circ}(t)$ is the actual value and $\hat{y}(t)$ is the prediction value of the DBN-AR model. All the parameters in the DBN-AR model are fine-tuned by a specially designed gradient descent method. For a neuron in the last output layer, which is the $N^{(j)}_r$-th layer, Eq. (10) is used to update the gradient for the parameter.
For a neuron in the $(N^{(j)}_r - 2)$-th layer, Eq. (A.10) can be used to update the gradient for the parameter. After an intermediate substitution, Eq. (A.10) is simplified, and Eq. (A.13) can be obtained similarly. Therefore, the local gradient for each neuron of a layer in the $j$-th DBN module can be obtained by the derivation process detailed above and computed using the following equation: