Traffic Flow Forecast Through Time Series Analysis Based on Deep Learning

Traffic congestion is a thorny issue for many large and medium-sized cities, posing a serious threat to sustainable urban development. Recently, the intelligent traffic system (ITS) has emerged as an effective tool to mitigate urban congestion. The key to the ITS lies in the accurate forecast of traffic flow. However, the existing traffic flow forecast methods cannot adapt to the stochasticity and sheer length of traffic flow time series. To solve this problem, this paper relies on deep learning (DL) to forecast traffic flow through time series analysis. The authors developed a traffic flow forecast model based on the long short-term memory (LSTM) network. The proposed model was compared with two classic forecast models, namely, the autoregressive integrated moving average (ARIMA) model and the backpropagation neural network (BPNN) model, through long-term traffic flow forecast experiments on an actual traffic flow time series from OpenITS. The experimental results show that the proposed LSTM network outperformed the classic models in prediction accuracy. Our research reveals the dynamic evolution law of traffic flow and facilitates decision-making in traffic management.


I. INTRODUCTION
Owing to economic growth and urbanization, many large and mid-sized cities are increasingly troubled by traffic congestion, which brings a series of social problems (e.g. long travel times, frequent traffic accidents, and severe environmental pollution) [1]. Many measures have been developed to mitigate urban congestion, such as improving transport infrastructure, charging congestion fees, providing route guidance, promoting public transit, and implementing traffic control [2], [3]. However, if a city's transport infrastructure is already complete, it is difficult to ease traffic congestion by adding more road facilities. In this case, congestion can be effectively reduced by making traffic management digitized and intelligent, i.e. by building an intelligent traffic system (ITS) [4]. As a key aspect of the ITS, short-term traffic flow forecast is the basis and premise of traffic management measures (e.g. traffic planning, route guidance, and traffic control). The effectiveness of the ITS therefore relies on accurate and reliable forecasts of short-term traffic flow.
The associate editor coordinating the review of this manuscript and approving it for publication was Dalin Zhang.
Short-term traffic flow forecast aims to predict the traffic state of a road section or intersection in the near future based on historical traffic data and travel experience. It has long been a research hotspot in the field of the ITS. The early methods for traffic flow forecast are model-driven, and they work only if the data and model parameters satisfy specific assumptions. As a result, model-driven methods cannot describe the complex nonlinearity of the traffic system and have not been widely applied [5]. In the era of big data, many scholars have developed and implemented data-driven methods that predict traffic flow from massive traffic data without requiring such assumptions [6], [7]. Typical data-driven forecast methods are machine learning (ML) tools like neural networks (NNs). Novel traffic forecast methods continue to emerge; among them, deep learning (DL) strategies stand out for their effective use of massive raw data.
Despite extensive studies and fruitful results on traffic flow, the complex evolution of traffic flow on surface roads is not yet clear. There are no definite answers to the following questions: Is the traffic flow time series continuous? Does the traffic flow time series share the same statistical features as other time series? Is traffic flow predictable? How far in advance can traffic flow be predicted? Time series analysis offers a possible way to answer these questions. Over the years, much attention has been paid to the analysis of classic and nonlinear time series, and the results have been successfully applied to traffic management tasks such as traffic control and route guidance.
Considering the superiority of DL in processing big traffic data, this paper identifies the periodicity and stationarity of traffic flow time series measured on roads in Changsha, in Central China's Hunan Province, and then constructs a traffic flow forecast model based on the long short-term memory (LSTM) network. The proposed model was compared with two classic forecast models over different prediction periods, in rush hours, and in non-rush hours. Our research reveals the dynamic evolution law of traffic flow, facilitating decision-making in traffic management.
The remainder of this paper is organized as follows: Section II reviews the literature on traffic flow forecast; Section III introduces the proposed model and two contrastive models; Section IV evaluates the performance of our model with real-world data; Section V draws conclusions and discusses future research.

II. LITERATURE REVIEW
In recent years, the limited road resources in many cities can no longer accommodate the surging number of vehicles. The imbalance between travel demand and transport capacity has worsened urban congestion. The ensuing problems, including rising traffic accidents, growing environmental pollution, and reduced travel efficiency, severely bottleneck the sustainable development of urban transport.
As mentioned before, traffic flow forecast is the precondition for traffic management measures like traffic planning, route guidance, and traffic control. An accurate forecast of future traffic flow makes it possible to evaluate the economy and effectiveness of the ITS. To ensure forecast accuracy, the daily mean traffic flow must be sufficiently smooth: the smoother the daily mean traffic flow, the lower the forecast error.
The application of time series analysis in traffic flow forecast can be traced back to the 1970s [8]. Time series analysis enjoys immense popularity in practice, because its models are simple and rely only on the traffic flow time series. There are currently three kinds of traffic flow forecast methods based on time series analysis: model-driven methods, data-driven methods, and combinatory models.
The emergence of artificial intelligence (AI) has made it possible to manage traffic based on big data. Thus, data-driven methods have become the mainstream strategy for traffic flow prediction. The best-known data-driven methods are NNs, such as networks built from fully-connected layers and radial basis function (RBF) networks [24]-[27]. Nonetheless, traditional NNs are too shallow to effectively handle massive amounts of long time series.
Recently, more and more scholars have applied DL to traffic flow forecast. DL can effectively utilize the information in videos, images, time series, and spatial series of traffic data, which greatly enhances forecast accuracy [28]-[36]. Lv et al. [37] designed a stacked autoencoder (SAE) model for traffic flow forecast and proved the superiority of the SAE model over the multilayer perceptron (MLP). Xiang et al. [28], [38] developed a stacked denoising autoencoder (SDAE) to predict traffic flow with missing data. Xie et al. [39] and other scholars analyzed transport problems with deep belief networks (DBNs) [40]-[43]. Sun et al. [44] and other scholars applied convolutional neural networks (CNNs) to process time series data for traffic forecast [45]-[47]. Hu et al. [48] and other scholars demonstrated the excellence of recurrent neural networks (RNNs) in traffic flow prediction based on time series analysis [49]-[55].
With a feedback loop in its hidden layer, the RNN can capture the correlations between data at different moments. If the time series is very long, however, vanishing and exploding gradients may occur during RNN training.
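Why long sequences hurt plain RNN training can be shown with a toy example that is not part of the paper's experiments. The minimal NumPy sketch below assumes a hypothetical scalar RNN $h_t = \tanh(w h_{t-1} + x)$ with a constant input $x$ and tracks $\partial h_T / \partial h_0$, which by the chain rule is the product of the per-step derivatives $w(1 - h_t^2)$; the function name and the values of $w$ and $x$ are illustrative assumptions.

```python
import numpy as np


def hidden_state_gradient(w: float, steps: int, x: float = 0.5) -> float:
    """Return dh_T/dh_0 for the toy scalar RNN h_t = tanh(w * h_{t-1} + x).

    The chain rule turns the gradient into a product of per-step factors
    w * (1 - h_t**2); when every factor is smaller than 1 in magnitude,
    the product shrinks geometrically with the sequence length.
    """
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = np.tanh(w * h + x)
        grad *= w * (1.0 - h**2)  # d h_t / d h_{t-1}
    return float(grad)


for T in (5, 20, 80):
    print(T, hidden_state_gradient(w=0.9, steps=T))
```

With $|w| < 1$ every factor is smaller than $|w|$ in magnitude, so here the gradient decays at least as fast as $0.9^T$. If tanh were replaced by the identity and $w = 1.1$, the same product would equal $1.1^T$, which is the exploding case.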
To solve these problems, Hochreiter and Schmidhuber [56] proposed the LSTM network, in which the traditional hidden layer neurons are replaced with a memory block. The memory block alleviates the vanishing gradient problem when training on long sequences, so the LSTM enables the RNN to process long time series. Many LSTM-based DL models have been successfully applied to forecast traffic flow from time series [57]-[63]. However, it remains a challenge to forecast traffic flow accurately and reliably in a complex, nonlinear, time-varying traffic system.
Each forecast model has its own merits, defects, and scope of application. To combine the merits and offset the defects, single models have been integrated into various combinatory forecast models. For example, Luo et al. [64] proposed a short-term traffic flow forecast model based on the k-nearest neighbors (k-NN) algorithm and the LSTM. Guo et al. [65] combined support vector regression (SVR) and the LSTM into a hybrid forecast model for traffic flow. Based on the RNN and the LSTM, Wang et al. [50] presented a combinatory forecast model for traffic flow on ground-level roads and verified its forecast performance. Despite their high forecast accuracy, combinatory models are not widely applied in traffic engineering, because of their complexity and poor real-time performance.
To sum up, model-driven methods predict future traffic flow by setting up accurate mathematical models based on the statistical features of the raw data. These models are easy to interpret, but many parameters and assumptions need to be determined in advance. Thus, few model-driven methods can adapt to stochastic traffic flows or make highly accurate predictions. Combinatory models are more accurate than model-driven methods, but they are not widely applied in traffic engineering, due to their complexity and poor real-time performance. By contrast, ML-based forecast methods are not constrained by such preconditions and are strong in nonlinear approximation and self-learning. Among them, artificial neural networks (ANNs) are well received, owing to their mature theory and good predictive performance. The LSTM, a superior DL network for time series analysis, has attracted growing attention from researchers engaged in traffic flow forecast.

III. PREDICTION MODELS
A traffic flow time series is a set of observations arranged in chronological order. If the observations contain only one variable, the series is a univariate time series. Time series analysis rests on a basic assumption: future traffic flow is affected by its past values. Unlike other regression models, time series models use past values of the target variable as the predictor variables.
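Using past values of the target as predictors amounts to reshaping a univariate series into supervised (input, target) pairs. The following minimal NumPy sketch illustrates this under stated assumptions and is not the authors' code; the function name `make_lagged_pairs` and its `n_lags` and `horizon` parameters are hypothetical.

```python
import numpy as np


def make_lagged_pairs(series, n_lags: int = 1, horizon: int = 1):
    """Turn a univariate series into (inputs, targets) for supervised learning.

    Row i of the inputs holds series[i : i + n_lags]; its target is the value
    `horizon` steps after the last input, so past values of the target
    variable act as the predictors.
    """
    x = np.asarray(series, dtype=float)
    n_rows = len(x) - n_lags - horizon + 1
    if n_rows <= 0:
        raise ValueError("series too short for the requested lags/horizon")
    inputs = np.stack([x[i : i + n_lags] for i in range(n_rows)])
    start = n_lags + horizon - 1
    targets = x[start : start + n_rows]
    return inputs, targets


X, y = make_lagged_pairs([10, 12, 15, 11, 9], n_lags=2)
# X holds [[10, 12], [12, 15], [15, 11]] and y holds [15, 11, 9]
```

With `n_lags=1` and `horizon=1` this reduces to pairing each $x_t$ with $x_{t+1}$.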
In this paper, three representative time series analysis methods, namely, the ARIMA model, the backpropagation neural network (BPNN), and the LSTM, are compared in terms of traffic flow forecast performance. The ARIMA model has been broadly adopted to forecast traffic flow, owing to its strong potential for online implementation. The BPNN is the most popular method for processing traffic data without an accurate mathematical model. However, the shallow structure of the BPNN handles massive data inefficiently. Many DL models have been developed to deal with massive data effectively; among them, the LSTM is the most promising for time series analysis.

A. ARIMA MODEL
The ARIMA model provides a regression method for time series. First, the model judges whether the target time series is stationary; if it is not, the series is differenced to make it stationary before modelling. For traffic flow on roads, successive observations in the time series are correlated, and the data may be non-stationary due to the stochasticity and complexity of the traffic system. Hence, the ARIMA model has often been used to predict traffic flow from time series [66].
If the traffic flow time series $X_t$ is stationary, it can be described by the autoregressive moving average (ARMA) model as a linear combination of previous traffic flows and residuals:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + u_t - \theta_1 u_{t-1} - \theta_2 u_{t-2} - \cdots - \theta_q u_{t-q} \tag{1}$$

where $p$ and $q$ are the orders of the model; $\phi_1, \phi_2, \cdots, \phi_p$ are autoregressive coefficients; $\theta_1, \theta_2, \cdots, \theta_q$ are moving average coefficients; $u_t$ is the residual at time $t$. By introducing the backshift operator $B$, defined by $B^j X_t = X_{t-j}$, formula (1) can be simplified as:

$$\Phi(B) X_t = \Theta(B) u_t \tag{2}$$

where $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\Theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$. Since most traffic flow time series are non-stationary, differencing is applied by replacing $X_t$ with $\nabla^d X_t = (1 - B)^d X_t$, where $d$ is the order of differencing. Thus, the ARIMA model can be formulated as:

$$\Phi(B)(1 - B)^d X_t = \Theta(B) u_t \tag{3}$$

The key to ARIMA modelling is to determine the orders $p$, $d$ and $q$. The parameters $\phi_i$ and $\theta_i$ can then be derived through least squares estimation, moment estimation, maximum likelihood estimation, etc. Once the parameter values are identified, the ARIMA model can be employed to predict future traffic flow.
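The differencing, estimation and forecasting steps described above can be illustrated for $p = d = q = 1$, the orders used in the experiments. This is a minimal NumPy sketch, not the authors' MATLAB implementation: it estimates $\phi_1$ and $\theta_1$ by conditional least squares over a coarse grid, a deliberately simple stand-in for the estimators named above, and the function names, the grid and the synthetic series are all hypothetical.

```python
import numpy as np


def arima111_residuals(x, phi, theta):
    """Differenced series w_t and conditional residuals u_t of ARIMA(1,1,1).

    Sign convention of (1): w_t = phi * w_{t-1} + u_t - theta * u_{t-1},
    where w_t = (1 - B) X_t. The first residual is fixed at zero.
    """
    w = np.diff(np.asarray(x, dtype=float))  # first difference (d = 1)
    u = np.zeros_like(w)
    for t in range(1, len(w)):
        u[t] = w[t] - phi * w[t - 1] + theta * u[t - 1]
    return w, u


def fit_arima111(x, grid=np.linspace(-0.95, 0.95, 39)):
    """Conditional least squares: grid search for the smallest residual sum of squares."""
    best_css, best_phi, best_theta = np.inf, 0.0, 0.0
    for phi in grid:
        for theta in grid:
            _, u = arima111_residuals(x, phi, theta)
            css = float(np.sum(u[1:] ** 2))
            if css < best_css:
                best_css, best_phi, best_theta = css, float(phi), float(theta)
    return best_phi, best_theta


def forecast_next(x, phi, theta):
    """One-step forecast: X_hat_{n+1} = X_n + phi * w_n - theta * u_n."""
    w, u = arima111_residuals(x, phi, theta)
    return float(x[-1] + phi * w[-1] - theta * u[-1])


# Hypothetical synthetic series generated from an ARIMA(1,1,1) process
rng = np.random.default_rng(1)
e = rng.normal(size=400)
w_sim = np.zeros(400)
for t in range(1, 400):
    w_sim[t] = 0.6 * w_sim[t - 1] + e[t] - 0.3 * e[t - 1]
x_sim = 100 + np.cumsum(w_sim)
phi_hat, theta_hat = fit_arima111(x_sim)
print(phi_hat, theta_hat, forecast_next(x_sim, phi_hat, theta_hat))
```

The forecast drops the unknown future residual $u_{n+1}$ (its expectation is zero) and then undoes the differencing by adding the last observation $X_n$.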

B. BPNN
The BPNN is a multi-layer feedforward NN, in which the signal propagates forward and the error propagates backward through each layer. In the BPNN, the neurons on the current layer only affect the state of those on the next layer. The signal passes from the input layer through the hidden layer to the output layer. Meanwhile, the error passes through the network in the opposite direction and adjusts the weights and biases of each layer, thereby optimizing the model continuously. In this way, the prediction of the BPNN gradually approaches the desired value [24], [26].
To describe the iterative learning of the BPNN, the following parameters are defined: the input $x$; the input of the hidden layer $h_i$; the output of the hidden layer $h_o$; the input of the output layer $y_i$; the output of the output layer $y_o$; the desired output $d_o$; the connection weight between the input layer and the hidden layer $w_{ih}$; the connection weight between the hidden layer and the output layer $w_{ho}$; the bias of each hidden layer neuron $b_h$; the bias of each output layer neuron $b_o$. The learning process of the BPNN can then be explained as follows:
Step 1. Network initialization: According to the prediction target and the known conditions, determine the number of neurons on each layer, define the biases and initial weights of the hidden layer and output layer, and specify the calculation precision $\varepsilon$ and the maximum number of iterations $M$.
Step 2. Hidden layer calculation: Derive $h_o$ from $x$, $w_{ih}$ and $b_h$.
Step 3. Output layer calculation: Derive $y_o$ from $h_o$, $w_{ho}$ and $b_o$ (via the output layer input $y_i$).
Step 4. Error calculation: Derive the error $e$ of the BPNN from $y_o$ and $d_o$.
Step 5. Weight update: Update the weights and biases of each layer based on $e$.
Step 6. Termination: Terminate the iterative learning once the error meets the calculation precision $\varepsilon$ or the maximum number of iterations $M$ is reached. The trained BPNN is then ready to predict new samples.
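Steps 1-6 can be sketched for the 1-10-1 configuration used in Section IV. This is a minimal NumPy illustration under assumptions the paper does not state: a sigmoid hidden layer, a linear output layer, mean squared error, full-batch gradient descent with a fixed learning rate, and the initialization scale and test function shown. The names `train_bpnn` and `predict_bpnn` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bpnn(x, d, n_hidden=10, lr=0.05, eps=1e-4, max_iter=5000, seed=0):
    """Train a 1-n_hidden-1 BP network following Steps 1-6 (assumed details)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    d = np.asarray(d, dtype=float).reshape(-1, 1)
    n = len(x)
    # Step 1: initialisation (small random weights, zero biases, eps and max_iter)
    w_ih = rng.normal(scale=0.5, size=(1, n_hidden))
    b_h = np.zeros((1, n_hidden))
    w_ho = rng.normal(scale=0.5, size=(n_hidden, 1))
    b_o = np.zeros((1, 1))
    history = []
    for _ in range(max_iter):
        # Step 2: hidden layer, h_i = x w_ih + b_h and h_o = sigmoid(h_i)
        h_o = sigmoid(x @ w_ih + b_h)
        # Step 3: output layer, y_i = h_o w_ho + b_o; linear output y_o = y_i
        y_o = h_o @ w_ho + b_o
        # Step 4: error between the network output and the desired output
        e = y_o - d
        mse = float(np.mean(e**2))
        history.append(mse)
        # Step 6 (checked before updating): stop once the precision is met
        if mse < eps:
            break
        # Step 5: back-propagate the error and update weights and biases
        g_y = 2.0 * e / n                          # dMSE / dy_o
        g_h = (g_y @ w_ho.T) * h_o * (1.0 - h_o)   # through the sigmoid
        w_ho -= lr * (h_o.T @ g_y)
        b_o -= lr * g_y.sum(axis=0, keepdims=True)
        w_ih -= lr * (x.T @ g_h)
        b_h -= lr * g_h.sum(axis=0, keepdims=True)
    return (w_ih, b_h, w_ho, b_o), history


def predict_bpnn(params, x):
    w_ih, b_h, w_ho, b_o = params
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return (sigmoid(x @ w_ih + b_h) @ w_ho + b_o).ravel()


x = np.linspace(0.0, 1.0, 50)
params, history = train_bpnn(x, np.sin(np.pi * x))
print(len(history), history[0], history[-1])
```

The gradient for the hidden layer is computed before `w_ho` is updated, so both layers are adjusted using the same forward pass, as in standard backpropagation.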

C. LSTM NETWORK
As mentioned before, the RNN (Figure 1) relies on the feedback loop in the hidden layer to capture the correlations between data at different moments, where $x_t$, $h_t$ and $o_t$ are the input, hidden layer and output values at time $t$. However, the standard RNN may suffer from vanishing and exploding gradients during training if the data form a long time series. Hence, Hochreiter and Schmidhuber [56] extended the RNN into the LSTM network. Compared with the RNN, the LSTM network has a memory block instead of hidden layer neurons, which effectively alleviates vanishing gradients when training on long sequences.
In the LSTM network, several gates are introduced to control the memory of the RNN. During training, the weights and bias of each gate are learned from the historical time series, and the features of historical states are identified and memorized. On this basis, the trained network can estimate the future state from new input samples. As a result, the LSTM network [28] can fully consider the long-term correlations between traffic flows and make effective forecasts of future traffic flow.
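The memory-block operations detailed in the following subsections (forgetting, memorization and information output) can be condensed into a single time step. The minimal NumPy sketch below assumes the standard formulation in which every gate reads the concatenation of $h_{t-1}$ and $x_t$; the function names, weight shapes and random initialization are illustrative assumptions, not the MATLAB network used in the experiments.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_input, n_hidden, seed=0):
    """Random weights of shape (hidden, hidden + input) and zero biases per gate."""
    rng = np.random.default_rng(seed)
    return {
        gate: (rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_input)),
               np.zeros(n_hidden))
        for gate in ("f", "i", "c", "o")
    }


def lstm_step(x_t, h_prev, c_prev, params):
    """One memory-block update; returns the new hidden output and cell state."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    (W_f, b_f), (W_i, b_i) = params["f"], params["i"]
    (W_c, b_c), (W_o, b_o) = params["c"], params["o"]
    f_t = sigmoid(W_f @ z + b_f)               # forget gate: how much of C_{t-1} to keep
    i_t = sigmoid(W_i @ z + b_i)               # input gate: which candidates to write
    c_tilde = np.tanh(W_c @ z + b_c)           # candidate values
    c_t = f_t * c_prev + i_t * c_tilde         # updated long-term cell state
    o_t = sigmoid(W_o @ z + b_o)               # output gate
    h_t = o_t * np.tanh(c_t)                   # hidden output
    return h_t, c_t


params = init_params(n_input=1, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for value in [0.2, -0.1, 0.5]:                 # a toy normalised input sequence
    h, c = lstm_step(np.array([value]), h, c, params)
print(h, c)
```

Setting the forget-gate bias very high and the input-gate bias very low makes $C_t \approx C_{t-1}$, which is how the cell can carry information across many steps without the repeated shrinking seen in a plain RNN.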
As shown in Figure 2, the memory block, the core of the LSTM network, consists of a memory cell $C_t$, an input gate $i_t$, a forget gate $f_t$, and an output gate $o_t$. The operations of the memory block are explained as follows:

1) FORGETTING
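The forget gate discards redundant information from the long-term cell state. Based on the input $x_t$ and the previous hidden output $h_{t-1}$, the forget gate outputs a vector $f_t$ whose elements lie in $(0, 1)$. Each element is multiplied by the corresponding element of the previous cell state $C_{t-1}$, so values near 0 erase that part of the memory and values near 1 keep it. The forget gate is computed as:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{4}$$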

2) MEMORIZATION
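The new information is stored in the long-term cell state in three steps: first, a tanh layer creates a vector of new candidate values $\tilde{C}_t$; next, the sigmoid function of the input gate $i_t$ decides which elements of the candidate vector are written; finally, the filtered candidates are added to the long-term cell state:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{5}$$

$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{6}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{7}$$

3) INFORMATION OUTPUT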
The output gate determines the output information $h_t$: its sigmoid function selects which parts of the cell state to output, and the long-term cell state, after being processed by the tanh function, is multiplied by the output gate $o_t$:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{8}$$

$$h_t = o_t \odot \tanh(C_t) \tag{9}$$

where $\tilde{C}_t$ is the candidate cell state at time $t$; $C_t$ and $C_{t-1}$ are the cell states at time $t$ and time $t-1$, respectively; $W_c$ and $b_c$ are the weight and bias of the cell, respectively; $i_t$, $f_t$, $o_t$ and $h_t$ are the outputs of the input gate, the forget gate, the output gate and the hidden layer at time $t$, respectively; $x_t$ is the input at time $t$; $h_{t-1}$ is the output of the hidden layer at time $t-1$; $W_i$, $W_f$ and $W_o$ are the weights of the input gate, forget gate and output gate, respectively; $b_i$, $b_f$ and $b_o$ are the biases of the input gate, forget gate and output gate, respectively; $[h_{t-1}, x_t]$ denotes the concatenation of $h_{t-1}$ and $x_t$, and $\odot$ denotes element-wise multiplication. The activation functions sigmoid and tanh can be respectively expressed as:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{10}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{11}$$

IV. EXPERIMENTS AND RESULTS ANALYSIS
The original data were transformed into 960 samples (Figure 3) at 15-min intervals, i.e. 96 sampling points per day. The data of the first eight days were taken as the training set to build the forecast models, and the data of the remaining two days were taken as the test set to verify the effectiveness of each model. Hence, the training set and the test set contain 768 samples and 192 samples, respectively.
Like most traffic flow time series, the preprocessed data change periodically between days and show similarities between certain periods of the same day [67]. Morning and evening rush hours are clearly visible in the daily variation of traffic flow. The periodicity of traffic flow is a feature of the traffic system that should be considered to achieve accurate prediction of future traffic flow.
To judge whether our traffic flow time series is periodic, autocorrelation analysis was carried out. The autocorrelation coefficient $r_k$ can be computed by:

$$r_k = \frac{\sum_{t=k+1}^{n} (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2} \tag{12}$$

where $X_{t-k}$ is the traffic flow time series that lags $X_t$ by $k$ intervals; $\bar{X}$ is the mean traffic flow; $n$ is the length of the time series.
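The autocorrelation coefficient $r_k$ defined above takes only a few lines of NumPy. The sketch below is an illustration, not the authors' code: the function name is hypothetical, and it is exercised on a hypothetical noise-free daily sine cycle sampled every 15 minutes rather than on the Changsha data.

```python
import numpy as np


def autocorrelation(x, k):
    """Sample autocorrelation coefficient r_k: lag-k products over total variance."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    if k == 0:
        return 1.0
    # dev[k:] plays the role of X_t (t = k+1..n), dev[:-k] the role of X_{t-k}
    return float(np.sum(dev[k:] * dev[:-k]) / np.sum(dev**2))


t = np.arange(960)                                   # ten days of 15-min samples
synthetic = 300 + 100 * np.sin(2 * np.pi * t / 96)   # hypothetical daily cycle
print(autocorrelation(synthetic, 96), autocorrelation(synthetic, 48))
```

For this series $r_{96} = 0.9$ and $r_{48} = -0.95$: the one-day lag lines up matching phases and the half-day lag opposite ones, while the magnitudes fall below 1 only because $n - k$ terms enter the numerator. A peak at a lag of 96 samples is the signature of daily periodicity.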
According to the autocorrelation curve (Figure 4), our traffic flow time series has significant periodicity. Despite the fluctuations, the data in the series tend to be stable.

B. PARAMETER SETTINGS
The LSTM network for our experiments contains five layers: an input layer, three hidden layers and an output layer. Every layer contains 96 neurons. The preprocessed data were normalized into a time series with zero mean and unit variance. During training, each output is one interval ahead of its input: if the input is $x_t$, then the output is $x_{t+1}$. The maximum number of iterations was set to 200, the gradient threshold was set to 1, and the learning rate was initialized at 0.005. After 100 iterations, the learning rate was multiplied by a drop factor of 0.2. The dropout rate of each layer was set to 0.2.
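The data split, normalization, one-step target shift and optimizer settings above can be mirrored in a small NumPy sketch. It is not the authors' MATLAB configuration and rests on labelled assumptions: normalization statistics are taken from the training portion only, the "gradient threshold" is read as L2-norm gradient clipping, and a synthetic 960-sample series stands in for the Changsha data. All function names are hypothetical.

```python
import numpy as np

SAMPLES_PER_DAY = 96   # 15-min sampling
TRAIN_DAYS = 8         # first eight days for training, last two for testing


def split_and_normalise(series):
    """Split 960 samples into 768 training / 192 test samples and z-score them.

    Assumption: mean and standard deviation come from the training part only.
    """
    series = np.asarray(series, dtype=float)
    n_train = TRAIN_DAYS * SAMPLES_PER_DAY
    train, test = series[:n_train], series[n_train:]
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma, mu, sigma


def learning_rate(iteration, initial=0.005, drop_period=100, drop_factor=0.2):
    """Piecewise schedule: multiply the rate by drop_factor every drop_period iterations."""
    return initial * drop_factor ** (iteration // drop_period)


def clip_gradient(grad, threshold=1.0):
    """Rescale a gradient whose L2 norm exceeds the threshold (assumed clipping rule)."""
    norm = np.linalg.norm(grad)
    return grad * (threshold / norm) if norm > threshold else grad


rng = np.random.default_rng(0)
flow = 300 + 100 * np.sin(2 * np.pi * np.arange(960) / 96) + rng.normal(0, 10, 960)
train_z, test_z, mu, sigma = split_and_normalise(flow)
inputs, targets = train_z[:-1], train_z[1:]   # input x_t, output x_{t+1}
print(train_z.shape, test_z.shape, learning_rate(0), learning_rate(150))
```

Predictions made in normalized units are mapped back to vehicles per interval with `y * sigma + mu` before computing error metrics.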
For the ARIMA model, p, d and q were all set to 1. Meanwhile, the BPNN was configured as a three-layer network with one input layer neuron, ten hidden layer neurons and one output layer neuron.

C. RESULTS ANALYSIS
Based on the above parameters, our experiments were conducted in MATLAB. Each forecast model, namely, the ARIMA model, the BPNN and the LSTM network, was trained on the training set and verified on the test set.
Two plans were designed to compare the forecast effects of the three models: (1) the daily predictions of the three models were compared to reveal how prediction performance changes as the forecast extends further in time; (2) the predictions of the three models in 2h intervals were compared to reveal the difference in prediction performance between rush hours and non-rush hours.
First, the ARIMA model, BPNN and LSTM network were tested on the 96 samples of September 25th, 2013. The predicted values, actual values and residual errors of the three models are displayed in Figures 5-7, respectively. The LSTM network achieved the best prediction performance (i.e. the smallest deviation between the predicted and actual values), followed in turn by the BPNN and the ARIMA model.
Second, the ARIMA model, BPNN and LSTM network were tested on the 96 samples of September 26th, 2013. The predicted values, actual values and residual errors of the three models are displayed in Figures 8-10, respectively. Again, the LSTM network achieved the best prediction performance (i.e. the smallest deviation between the predicted and actual values), followed in turn by the BPNN and the ARIMA model.
The prediction performance of each model was measured by three metrics: mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE):

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| \tag{13}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{14}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2} \tag{15}$$

where $y_t$ is the actual traffic flow; $\hat{y}_t$ is the predicted traffic flow; $n$ is the number of samples. The three metrics are negatively correlated with the goodness-of-fit and prediction accuracy: the smaller the metrics, the better the model.
Table 1 lists the metrics of each model on daily predictions. The LSTM network achieved the best RMSE, MAE and MAPE, followed in turn by the BPNN model and the ARIMA model. Hence, the LSTM network boasts the best prediction performance. Across time steps, the LSTM predictions on September 26th, 2013 were less accurate than those on September 25th, 2013, indicating that a longer forecast horizon reduces prediction accuracy. On the samples of both days, the LSTM network outperformed the two contrastive models, showcasing the superiority of DL.
Figure 11 presents the RMSE curve of each model in 2h intervals. The 4th and 16th intervals (7-9am) are the morning rush hours of September 25 and September 26, 2013, respectively; the 10th and 22nd intervals (7-9pm) are the evening rush hours of September 25 and September 26, 2013, respectively. The ARIMA model had a far greater RMSE in the morning and evening peak hours than in the non-peak hours, so it cannot accurately forecast traffic flow in peak hours. The BPNN model also showed a large RMSE in the morning and evening peak hours, though not as high as that of the ARIMA model. The LSTM network kept the RMSE low in the morning and evening peak hours, which is evidence of its excellent forecast performance on long time series.
VOLUME 8, 2020
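The three metrics defined above, together with the 2h-interval RMSE plotted in Figure 11, can be computed as follows. This minimal NumPy sketch uses hypothetical function names and toy numbers; the window of 8 samples assumes the 15-min sampling described earlier (8 x 15 min = 2 h), so the 192 test samples give 24 intervals.

```python
import numpy as np


def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))


def mape(y, y_hat):
    """In percent; assumes no actual value is zero."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)


def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def rmse_per_window(y, y_hat, window=8):
    """RMSE of consecutive, non-overlapping windows (8 samples = one 2h interval)."""
    sq = (np.asarray(y, float) - np.asarray(y_hat, float)) ** 2
    n_win = len(sq) // window
    return np.sqrt(sq[: n_win * window].reshape(n_win, window).mean(axis=1))


y_true, y_pred = [100, 200], [110, 190]
print(mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE squares the errors, a few large peak-hour misses raise it more than MAE, which is why the peak intervals stand out so clearly in a per-interval RMSE curve.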

V. CONCLUSION
Traffic flow forecast is critical to traffic management measures such as route guidance and traffic control, and its significance is growing with the proliferation of the ITS and autonomous driving. Large volumes of traffic flow data are now available thanks to the development of information technology (IT) and big data. It is therefore meaningful to uncover the evolution law of traffic flow through time series analysis and to make accurate forecasts of future traffic flow. To this end, this paper compares the traffic flow forecast performance of the LSTM network, the BPNN model and the ARIMA model on time series captured at a single point. The main conclusions are as follows:
(1) The contrastive experiments show that the LSTM network outperformed the ARIMA model and the BPNN model in prediction accuracy: the RMSEs of the ARIMA, BPNN and LSTM over the whole two-day test set were 61.1699, 26.8773 and 14.4438, respectively.
(2) Across time steps, the RMSE of the LSTM network was 12.9668 on September 25, 2013 and 15.7832 on September 26, 2013. Hence, the LSTM network predicted more accurately on the first test day, which immediately follows the training data; this reflects the high correlation between adjacent data in the time series.
(3) In 2h intervals, the ARIMA model had relatively large prediction errors in the morning and evening peak hours. This is because traffic flow tends to fluctuate in peak hours under various disturbances, and the ARIMA model is poor at predicting highly volatile data.
(4) As shown in Figure 12, the predicted values of the three models basically agreed with the observed values, due to the relative stability of traffic flow time series under normal conditions. The LSTM network made the most realistic predictions. Moreover, the superiority of DL in traffic flow prediction is confirmed by the results in Table 1: the MAPEs of the ARIMA, BPNN and LSTM were 20.97%, 9.06% and 4.82%, respectively.
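The LSTM figures in conclusions (1) and (2) can be cross-checked with a little arithmetic. Assuming each test day contributes 96 samples, pooling the two daily LSTM RMSEs over all 192 test samples reproduces the reported 14.4438, whereas a plain average of the two daily values would give 14.375. The helper name `pooled_rmse` is hypothetical.

```python
import math


def pooled_rmse(rmses, counts):
    """Combine per-subset RMSEs into the RMSE over all samples.

    Each subset contributes rmse**2 * count to the total squared error.
    """
    total_sq = sum(r * r * n for r, n in zip(rmses, counts))
    return math.sqrt(total_sq / sum(counts))


daily = [12.9668, 15.7832]          # LSTM RMSEs on September 25 and 26, 2013
pooled = pooled_rmse(daily, [96, 96])
average = sum(daily) / len(daily)
print(round(pooled, 4), round(average, 4))
```

Because RMSE is a square root of a mean, the overall value is the root of the averaged squares, which always lies at or above the plain average of the daily RMSEs.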
The proposed LSTM network can accurately predict traffic flow from relatively stable time series under normal conditions. However, the traffic system on roads is stochastic and complex, and is often affected by abnormal factors like bad weather, traffic accidents and large events. Future research will therefore make full use of the capabilities of ML (e.g. big data processing and self-learning) to predict traffic flow under various influencing factors.