Expressway Exit Traffic Flow Prediction for ETC and MTC Charging System Based on Entry Traffic Flows and LSTM Model

China's expressway (controlled-access highway) network is the longest in the world and plays an important role in people's daily lives. Accurate short-term traffic prediction is essential for travel scheduling and active traffic management. Two charging systems coexist on Chinese expressways: Electronic Toll Collection (ETC) and Manual Toll Collection (MTC). They differ in passing capacity and variation pattern. In this work, we use a Long Short-Term Memory (LSTM) model to predict the exit traffic flow at the Shanghai Xinqiao toll station from the entry traffic flows of multiple closely related stations. Based on one month of origin-destination (OD) traffic data, we present a new method to predict the exit station's traffic flow over the next 5 minutes. After deleting abnormal data, we select 12 of the 109 entry toll stations for the experiment. The traffic from these 12 entry stations accounts for 86% of the total exit traffic flow. The method uses a spatial-temporal matrix to handle three scenes: ETC alone, MTC alone, and mixed ETC and MTC. For each scene, we apply the LSTM model with various flow-sequence lengths and numbers of hidden-layer neurons. Finally, we validate the model and select the hyperparameters that give the best prediction accuracy according to three evaluation metrics. The experimental results show that prediction is most accurate in the ETC scene.


I. INTRODUCTION
With the development of China's economy, the number of motor vehicles is increasing rapidly. This growth leads to a series of traffic problems, such as congestion, traffic accidents, and environmental pollution [1]. To alleviate these problems, researchers have paid increasing attention to Intelligent Transportation Systems (ITS) [2]- [5]. ITS are the set of applications and technological systems designed to improve safety and efficiency in road transport. Accurate prediction of ITS traffic information (e.g., travel time, traffic speed, and traffic flow) gives travelers and managers reliable real-time road-traffic information. This helps reduce environmental pollution and alleviate congestion and traffic accidents [6]- [8]. Traffic flow prediction is an essential component of ITS. By the length of the forecast period, it can be divided into three categories: short-term, medium-term, and long-term forecasting [9]. Prediction of traffic flow less than one hour ahead is usually called short-term traffic flow forecasting [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Shaohua Wan.
In this paper, we use the Long Short-Term Memory (LSTM) model for short-term traffic flow prediction, based on origin-destination (OD) data from the Xinqiao toll station in Shanghai, China. The main contributions of the paper are as follows: 1) We apply the model to real traffic data. Notably, we find that the traffic flow of ETC differs from that of MTC, with MTC showing more peak values than ETC. Separating the ETC and MTC datasets is therefore expected to improve prediction accuracy. 2) We use the traffic data of multiple entry stations to predict the corresponding exit station's traffic flow over the next 5 minutes.
The rest of this paper is organized as follows. Section II summarizes the existing literature on short-term traffic flow prediction. Section III presents the LSTM architecture and builds the traffic flow prediction network. Section IV describes the data and the data preprocessing. Section V gives the experimental results. Finally, Section VI concludes the paper.

II. LITERATURE REVIEW
Over the past few decades, many methods have been proposed to predict short-term traffic flow. The existing methods fall mainly into three categories: parametric, non-parametric, and hybrid approaches [11].
Because traffic flow is stochastic and nonlinear [20], more researchers have turned to non-parametric approaches for traffic flow prediction. Examples include k-nearest neighbor (k-NN) [21], Bayesian networks [22], support vector regression (SVR) [23], and support vector machines (SVM) [24]. Hybrid models combine the advantages of different methods. For example, Tan et al. [25] proposed an aggregation approach for short-term traffic flow prediction based on moving average (MA), exponential smoothing (ES), ARIMA, and neural network (NN) models. Hong et al. [26] combined SVR with the genetic algorithm-simulated annealing algorithm (GA-SA). Tang et al. [27] integrated Fuzzy C-Means (FCM) with the Genetic Algorithm (GA) to estimate missing traffic volume data. Chen et al. [28] examined the impact of the periodic component on three statistical models (i.e., the space time (ST) model, the vector autoregressive (VAR) model, and the ARIMA model) and on three machine learning approaches (i.e., SVM, multi-layer perceptron (MLP), and recurrent neural network (RNN) models). Their results showed that the proposed hybrid prediction approach is effective for both statistical and machine learning models in short-term speed prediction.
However, compared with hybrid models, a single model is simpler to apply and faster to compute. Its prediction accuracy can also satisfy the requirements of practical engineering applications. With the development of artificial intelligence, short-term traffic forecasting based on deep learning has become a new trend [29]. Deep learning methods can learn traffic features without prior knowledge because of their strong ability to capture uncertainty and complex nonlinearity [30]. As a result, they achieve better performance in traffic parameter prediction [2]. Common deep learning algorithms include deep residual networks, recurrent neural networks, and convolutional neural networks. Their self-learning capability makes them well suited to short-term traffic flow prediction [31]. A variety of deep learning methods have been used in traffic forecasting, such as the feed forward neural network (FFNN) [32], the backpropagation (BP) neural network [33], the stacked autoencoder (SAE) model [2], and the radial basis function (RBF) network [34]. Zhang et al. [35] proposed a Deep Multi-Scale Learning Model for trajectory classification and used the classification results to understand the mobility of moving objects. Among deep learning models for traffic flow prediction, the recurrent neural network (RNN) is considered suitable for capturing the temporal and spatial evolution of traffic flow [36]. However, traditional RNNs suffer from the major drawback of vanishing or exploding gradients [37]. To solve this problem, Hochreiter and Schmidhuber [38] proposed the long short-term memory (LSTM) architecture. Recent research has found that LSTM achieves high accuracy in short-term traffic flow prediction. Yu et al. [39] used LSTM to forecast short-term traffic from data on private cars and minibuses operating on the Chang Tai highway.
Their predictions were accurate on workdays and weekends, and also under unusual traffic conditions such as holidays and rainy days. Poonia et al. [40] applied the LSTM model to four months of data collected by MNIT College, aggregated into 5-min intervals. Ma et al. [41] compared LSTM with three different RNN topologies and with other nonparametric and parametric approaches (SVM, time series, and Kalman filter) for urban travel time prediction. Their numerical experiments show that LSTM outperforms the other algorithms in accuracy and stability. Tian et al. [42] compared LSTM with other models for short-term traffic flow prediction, such as random walk (RW), SVM, FFNN, and SAE. They found that the LSTM model achieved the highest accuracy and generalized best. Wang et al. [43] applied the LSTM and gated recurrent unit (GRU) models to trucks' GPS data. In average prediction accuracy over both peak and off-peak periods, LSTM outperformed GRU by 4.1%. Li et al. [44] compared models built from different RNN layers, and their experimental results indicate that LSTM performs better than GRU.

III. METHODOLOGY
The LSTM architecture consists of one input layer, one recurrent hidden layer, and one output layer [49]. The basic unit of the recurrent hidden layer is a memory block. Each block contains one or more self-connected memory cells and three multiplicative units: the input gate, the output gate, and the forget gate. These gates provide continuous analogs of the write, read, and reset operations for the cells [45].
Short-term prediction of expressway exit flow can help identify the future state of the traffic network [29]. In this paper, we predict the exit station's traffic flow in the next time period from the previous and current traffic data of the entry stations. First, we build a spatial-temporal matrix from the entry stations' traffic data. Second, based on this matrix, we use the LSTM network to predict traffic flow under three scenes: mixed ETC and MTC, ETC only, and MTC only. The details of the methodology are described in this section.

A. TRAFFIC FLOW FORECAST RELATIONSHIP BETWEEN ENTRY STATIONS AND THE EXIT STATION
The exit station's traffic flow depends on sequential patterns in the temporal dimension and also on the corresponding entry stations in the spatial dimension. Temporal correlation links the traffic flow of the next time interval to the historical traffic data. Spatial correlation is the dependence of the exit station's traffic flow on the traffic flow of its corresponding entry stations.
In this paper, we select Q entry stations and set the length of the flow sequence to T. Figure 1 shows the prediction relationship for the ETC and MTC datasets.

B. INPUT MATRIX OF THE MODEL
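Based on this analysis, the input is the traffic flow of the entry stations over the previous T time periods up to time t. We write the input matrix as

$$X = \begin{bmatrix} x^{1}_{t-T+1} & x^{2}_{t-T+1} & \cdots & x^{Q}_{t-T+1} \\ x^{1}_{t-T+2} & x^{2}_{t-T+2} & \cdots & x^{Q}_{t-T+2} \\ \vdots & \vdots & \ddots & \vdots \\ x^{1}_{t} & x^{2}_{t} & \cdots & x^{Q}_{t} \end{bmatrix} \in \mathbb{R}^{T\times Q} \tag{1}$$

where T is the temporal length, Q is the spatial length, and $x^{i}_{t}$ is the i-th entry station's traffic flow in time period t. The matrix is determined by the time period and serves as the input to the LSTM network. The feature data must be normalized, because the model has difficulty converging when the value ranges of the features vary widely [44]. In this paper, we apply linear (min-max) normalization to scale the traffic data to the range [0, 1]:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the corresponding feature.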
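The construction of the input matrix and the min-max normalization described above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code. Flows are held in an array with one row per 5-min interval and one column per entry station. Each sample is a T × Q window, and its target is the exit flow in the interval immediately after the window. The function names `min_max_normalize` and `build_samples` are hypothetical.

```python
import numpy as np


def min_max_normalize(flows):
    """Linearly scale each column (one entry station) of `flows` to [0, 1].

    Returns the scaled array together with the per-column minima and maxima,
    so that scaled values can later be mapped back to vehicle counts.
    """
    flows = np.asarray(flows, dtype=float)
    x_min = flows.min(axis=0)
    x_max = flows.max(axis=0)
    # Avoid division by zero for a station whose flow never changes.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (flows - x_min) / span, x_min, x_max


def build_samples(entry_flows, exit_flow, T):
    """Cut the series into T x Q input windows and next-interval targets.

    entry_flows: shape (N, Q), one row per 5-min interval, one column per station.
    exit_flow:   shape (N,), exit-station flow per interval.
    Returns X with shape (N - T, T, Q) and y with shape (N - T,), where X[k]
    covers intervals k .. k+T-1 and y[k] is the exit flow at interval k+T.
    """
    entry_flows = np.asarray(entry_flows, dtype=float)
    exit_flow = np.asarray(exit_flow, dtype=float)
    n = entry_flows.shape[0]
    X = np.stack([entry_flows[k:k + T] for k in range(n - T)])
    y = exit_flow[T:]
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    entry = rng.integers(0, 60, size=(288, 12))  # one synthetic day, 12 stations
    exit_flow = entry.sum(axis=1)                # toy target, used only to check shapes
    entry_scaled, lo, hi = min_max_normalize(entry)
    X, y = build_samples(entry_scaled, exit_flow, T=7)
    print(X.shape, y.shape)  # (281, 7, 12) (281,)
```

Keeping `x_min` and `x_max` lets normalized predictions be mapped back to vehicle counts via $x = x'(x_{\max} - x_{\min}) + x_{\min}$. The exit-flow target can be scaled the same way, which matches the [0, 1] range of a sigmoid output layer.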

C. SHORT-TERM TRAFFIC FLOW PREDICTION MODEL BASED ON LSTM
In this paper, we use LSTM as our experimental model. The model consists of one input layer, one LSTM layer, one fully connected layer, and one output layer.
Figure 2 shows the structure of the model, and Figure 3 shows the data flow at the first time step.
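The following equations describe how the LSTM layer works:

$$f_t = \sigma\left(h_{t-1}\cdot W_{fh} + b_{fh} + X_t\cdot W_{fx} + b_{fx}\right) \tag{3}$$

$$i_t = \sigma\left(h_{t-1}\cdot W_{ih} + b_{ih} + X_t\cdot W_{ix} + b_{ix}\right) \tag{4}$$

$$C_t = f_t * C_{t-1} + i_t * \tanh\left(h_{t-1}\cdot W_{ch} + b_{ch} + X_t\cdot W_{cx} + b_{cx}\right)$$

$$o_t = \sigma\left(h_{t-1}\cdot W_{oh} + b_{oh} + X_t\cdot W_{ox} + b_{ox}\right)$$

$$h_t = o_t * \tanh\left(C_t\right)$$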
Equation (3) represents the Forget Gate. $h_{t-1}$ is the hidden state of this layer at time t−1, and $X_t$ is the normalized input at time t. $W_{fh}$ and $W_{fx}$ are the weight matrices of the Forget Gate, and $b_{fh}$ and $b_{fx}$ are its bias vectors. The symbol · denotes matrix multiplication, and σ is the sigmoid activation function.
Equation (4) describes the activation of the Input Gate. $W_{ih}$ and $W_{ix}$ are the weight matrices of the Input Gate, and $b_{ih}$ and $b_{ix}$ are its bias vectors.
The cell state $C_t$ is computed from the outputs of both the Forget Gate and the Input Gate. Here $C_{t-1}$ is the cell state at time t−1, and tanh is the hyperbolic tangent activation function.
Finally, the output of the LSTM layer, $h_t$, is obtained from the cell state and the Output Gate. The symbol * denotes element-wise multiplication. $W_{oh}$ and $W_{ox}$ are the weight matrices of the Output Gate. The dimensions are $f_t, i_t, o_t, C_t, h_t \in \mathbb{R}^{1\times u}$; $W_{fh}, W_{ih}, W_{ch}, W_{oh} \in \mathbb{R}^{u\times u}$; and $W_{fx}, W_{ix}, W_{cx}, W_{ox} \in \mathbb{R}^{Q\times u}$, where u is the number of hidden-layer neurons.
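The NumPy sketch below lets readers trace the gate computations numerically. It implements one forward pass of the LSTM layer in the row-vector convention used above ($X_t \in \mathbb{R}^{1\times Q}$, $h_t \in \mathbb{R}^{1\times u}$). It is an illustrative reimplementation under our own assumptions, not the Keras layer used in the experiments: the random initialization and the helper names (`init_params`, `pre_activation`, `lstm_step`, `lstm_forward`) are hypothetical. Because each gate's two bias vectors only ever appear as a sum, they are equivalent to a single bias per gate.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(Q, u, seed=0):
    """Randomly initialise one LSTM layer (row-vector convention, names as in the text).

    For each gate g in {f, i, c, o}: W_gh is u x u, W_gx is Q x u,
    and b_gh, b_gx are bias vectors of length u.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for g in "fico":
        params[f"W_{g}h"] = rng.normal(0.0, 0.1, size=(u, u))
        params[f"W_{g}x"] = rng.normal(0.0, 0.1, size=(Q, u))
        params[f"b_{g}h"] = np.zeros(u)
        params[f"b_{g}x"] = np.zeros(u)
    return params


def pre_activation(p, g, h_prev, x_t):
    """Compute h_{t-1} . W_gh + b_gh + X_t . W_gx + b_gx for gate g."""
    return h_prev @ p[f"W_{g}h"] + p[f"b_{g}h"] + x_t @ p[f"W_{g}x"] + p[f"b_{g}x"]


def lstm_step(p, h_prev, c_prev, x_t):
    f_t = sigmoid(pre_activation(p, "f", h_prev, x_t))          # forget gate
    i_t = sigmoid(pre_activation(p, "i", h_prev, x_t))          # input gate
    c_candidate = np.tanh(pre_activation(p, "c", h_prev, x_t))  # candidate cell content
    c_t = f_t * c_prev + i_t * c_candidate                      # '*' is element-wise
    o_t = sigmoid(pre_activation(p, "o", h_prev, x_t))          # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def lstm_forward(p, X, u):
    """Feed a T x Q input matrix row by row and return the last hidden state."""
    h = np.zeros(u)
    c = np.zeros(u)
    for x_t in X:  # x_t holds the Q entry-station flows of one time period
        h, c = lstm_step(p, h, c, x_t)
    return h


if __name__ == "__main__":
    window = np.random.default_rng(1).random((7, 12))  # T = 7 periods, Q = 12 stations
    h_T = lstm_forward(init_params(Q=12, u=50), window, u=50)
    print(h_T.shape)  # (50,)
```

Since $o_t \in (0, 1)$ and $|\tanh(C_t)| < 1$, every component of $h_t$ lies strictly between −1 and 1. This bound holds for any weights.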
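To convert $h_t$ into the predicted value, one fully connected layer is added after the LSTM layer. The calculation formula is

$$\hat{y}_{t+1} = \sigma\left(h_t \cdot W_T + b_T\right)$$

where $W_T \in \mathbb{R}^{u\times 1}$ and $b_T \in \mathbb{R}$. Finally, the model is fitted with Adam, an efficient gradient-descent optimizer, using the Mean Square Error (MSE) loss function. Adam is defined by the following equations [46]:

$$g_n = \nabla_{\theta}\,\mathrm{MSE}\left(\theta_{n-1}\right)$$

$$m_n = \beta_1 m_{n-1} + \left(1-\beta_1\right) g_n$$

$$v_n = \beta_2 v_{n-1} + \left(1-\beta_2\right) g_n^{2}$$

$$\hat{m}_n = \frac{m_n}{1-\beta_1^{\,n}}, \qquad \hat{v}_n = \frac{v_n}{1-\beta_2^{\,n}}$$

$$\theta_n = \theta_{n-1} - \alpha\,\frac{\hat{m}_n}{\sqrt{\hat{v}_n}+\epsilon}$$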
where n is the iteration step, $m_n$ is the biased first-moment estimate, $v_n$ is the biased second raw-moment estimate, $\hat{m}_n$ is the bias-corrected first-moment estimate, $\hat{v}_n$ is the bias-corrected second raw-moment estimate, and $\theta_n$ denotes the parameters [49]. In this paper, the parameters optimized with Adam are $\theta_n = \{W_{fh}, W_{ih}, W_{ch}, W_{oh}, W_{fx}, W_{ix}, W_{cx}, W_{ox}\}$. The optimizer settings are $\alpha = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$.
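The Adam update can be written as a small NumPy function, shown below as a sketch. It follows the standard update rule with the settings listed above as defaults. The experiments themselves rely on Keras, whose optimizer may differ in minor details such as exactly where ε enters the update. The function name `adam_step` and the toy least-squares problem in the usage part are our own, chosen only to show the MSE decreasing.

```python
import numpy as np


def adam_step(theta, grad, m, v, n, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update at iteration n (n starts at 1); returns (theta, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # biased second raw-moment estimate
    m_hat = m / (1.0 - beta1 ** n)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** n)              # bias-corrected second raw moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


if __name__ == "__main__":
    # Toy problem: fit w in y = A @ w by minimising the MSE with Adam.
    rng = np.random.default_rng(0)
    A = rng.random((50, 3))
    y = A @ np.array([0.5, -1.0, 2.0])
    w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
    for n in range(1, 3001):
        grad = 2.0 / len(y) * A.T @ (A @ w - y)  # gradient of the MSE w.r.t. w
        w, m, v = adam_step(w, grad, m, v, n, alpha=0.01)
    print(np.mean((A @ w - y) ** 2))  # much smaller than the MSE at w = 0
```

On the first step, the bias corrections make the update roughly $\alpha$ times the sign of the gradient, whatever the gradient's scale. This scale invariance is one reason Adam works well with the default $\alpha = 0.001$.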

IV. DATA DESCRIPTION AND PREPROCESSING
A. DATA DESCRIPTION
In Shanghai, the Xinqiao toll station carries the largest traffic volume and is heavily congested, especially during the morning and afternoon rush hours. The studied traffic network includes 109 entry stations and one corresponding exit station.
For each vehicle entering and exiting an expressway toll station, the data record the location, time, vehicle type, mileage, and whether the vehicle passed through an ETC toll lane.
In this study, the traffic data are aggregated into 5-min intervals, 24 hours a day, from August 1 to August 31, 2019. We choose 12 main stations from the 109 entry stations to predict the corresponding exit station's traffic flow in the next 5 minutes. These 12 entry stations account for 86% of the exit station's traffic flow. Table 1 lists the percentage of traffic flow from each entry station to the exit station. Figure 4 marks the locations of the 12 selected entry toll stations and the exit station on the map of Shanghai.
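The following sketch shows one way to turn per-vehicle OD records into 5-min flow series, split by charging system. The record layout (dictionaries with keys `entry_station`, `entry_time`, and `is_etc`) is a hypothetical stand-in for the real toll-data schema, which the paper does not specify. Called with exit-side keys, the same function bins the exit flow. `station_shares` computes per-station percentages of the kind reported in Table 1.

```python
from collections import Counter
from datetime import datetime

INTERVAL_SECONDS = 5 * 60  # 5-min aggregation interval


def interval_index(ts, start):
    """Index of the 5-min interval that contains timestamp ts, counted from start."""
    return int((ts - start).total_seconds() // INTERVAL_SECONDS)


def aggregate(records, start, station_key="entry_station", time_key="entry_time", scene="mixed"):
    """Count vehicles per (station, 5-min interval) for one scene.

    scene: "mixed" keeps every record, "etc" keeps ETC records only,
    "mtc" keeps MTC records only (hypothetical boolean field "is_etc").
    """
    counts = Counter()
    for r in records:
        if scene == "etc" and not r["is_etc"]:
            continue
        if scene == "mtc" and r["is_etc"]:
            continue
        counts[(r[station_key], interval_index(r[time_key], start))] += 1
    return counts


def station_shares(records, station_key="entry_station"):
    """Fraction of the exit station's vehicles that entered at each entry station."""
    per_station = Counter(r[station_key] for r in records)
    total = sum(per_station.values())
    return {s: c / total for s, c in per_station.items()}


if __name__ == "__main__":
    start = datetime(2019, 8, 1)
    records = [
        {"entry_station": "I1", "entry_time": datetime(2019, 8, 1, 0, 2), "is_etc": True},
        {"entry_station": "I1", "entry_time": datetime(2019, 8, 1, 0, 4), "is_etc": False},
        {"entry_station": "I2", "entry_time": datetime(2019, 8, 1, 0, 7), "is_etc": True},
    ]
    print(aggregate(records, start, scene="etc"))
    print(station_shares(records))
```

A `Counter` returns 0 for missing keys, so intervals with no traffic need no special handling when the counts are later laid out as a dense N × Q array.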

B. DATA PREPROCESSING
Our analysis shows that the original dataset contains abnormal records. Because some detectors were damaged or traffic accidents occurred, some records have negative or zero travel time and some have zero mileage. Such abnormal data can degrade the forecasting models and should be filtered out before forecasting [47]. We simply delete the abnormal records from the datasets. The deleted records account for 3.35% of the entry stations' traffic data and 3.63% of the exit station's traffic data.
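The deletion rule described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions. Travel time is derived from hypothetical `entry_time` and `exit_time` fields, and `mileage` is a hypothetical field name, since the paper does not give the raw schema. The function also reports the deleted share, comparable to the 3.35% and 3.63% figures above.

```python
from datetime import datetime


def is_abnormal(record):
    """Flag a record with negative or zero travel time, or with zero mileage.

    Travel time is derived from hypothetical 'entry_time'/'exit_time' fields.
    """
    travel_seconds = (record["exit_time"] - record["entry_time"]).total_seconds()
    return travel_seconds <= 0 or record["mileage"] == 0


def drop_abnormal(records):
    """Return the normal records and the share of records that were deleted."""
    kept = [r for r in records if not is_abnormal(r)]
    removed_share = 1 - len(kept) / len(records) if records else 0.0
    return kept, removed_share


if __name__ == "__main__":
    t0, t1 = datetime(2019, 8, 1, 8, 0), datetime(2019, 8, 1, 8, 30)
    sample = [
        {"entry_time": t0, "exit_time": t1, "mileage": 25.0},  # normal
        {"entry_time": t1, "exit_time": t0, "mileage": 25.0},  # negative travel time
        {"entry_time": t0, "exit_time": t1, "mileage": 0},     # zero mileage
    ]
    kept, share = drop_abnormal(sample)
    print(len(kept), round(share, 4))  # 1 0.6667
```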
For direct visualization, Figures 5 to 8 plot the traffic flow curves of the 12 entry stations and the exit station for the three scenes: mixed ETC and MTC, ETC, and MTC. We also analyze the data statistically; Figure 9 shows the box plots. ETC has a relatively larger variance, whereas MTC has significantly more outliers, especially at entry stations $I_8$, $I_9$, and $I_{11}$. In short, the ETC and MTC distributions differ. Motivated by this, we predict the traffic flow of ETC and MTC separately to obtain better prediction accuracy.
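Box-plot outliers such as those mentioned for MTC are conventionally the points beyond the whiskers at 1.5 × IQR from the quartiles. This is also the default `whis=1.5` in matplotlib's `boxplot`. The paper does not state which whisker rule Figure 9 uses, so the following NumPy sketch of counting such outliers per station is an assumption. The function name `tukey_outliers` is hypothetical.

```python
import numpy as np


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], i.e. beyond the box-plot whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


if __name__ == "__main__":
    flows = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # synthetic 5-min counts with one spike
    print(tukey_outliers(flows))  # [100.]
```

Applying this per station and per scene, and comparing the counts together with `np.var`, gives a numerical counterpart to the visual comparison of ETC and MTC distributions.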
In this study, the traffic flows of the entry stations and the exit station are divided chronologically into two datasets. The first 75% of the data is used for training and the remaining 25% for testing.
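A chronological split keeps every test sample later in time than every training sample. This avoids leaking future traffic into training. A minimal sketch follows. The function name is hypothetical, and the 8928-interval example (31 days × 288 five-minute intervals) is only an illustration, since the real sample count depends on the deleted records and the window length T.

```python
def chronological_split(X, y, train_fraction=0.75):
    """Split samples in time order without shuffling: first 75% train, last 25% test."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


if __name__ == "__main__":
    intervals = list(range(8928))  # 31 days x 288 five-minute intervals
    X_train, y_train, X_test, y_test = chronological_split(intervals, intervals)
    print(len(X_train), len(X_test))  # 6696 2232
```

The function works on Python lists and NumPy arrays alike, because both support slicing.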

V. EXPERIMENTS
To compare the performance across the three scenes, we evaluate the forecasting results with Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE). MAE reflects the actual magnitude of the forecast error. MAPE expresses the prediction error as a percentage, which helps identify significant errors even when MAE is very small [18]. The equations are as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the ground truth, $\hat{y}_i$ is the predicted value, and n is the number of test samples. Lower values of these metrics indicate better model performance.
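The three metrics translate directly into NumPy, as sketched below. The function names are our own. The paper does not say how intervals with zero observed flow are handled in MAPE, where the ratio is undefined. The sketch therefore assumes all ground-truth values are nonzero.

```python
import numpy as np


def mae(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))


def mape(y, y_hat):
    """MAPE in percent; undefined if any ground-truth value is zero."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)


def rmse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


if __name__ == "__main__":
    truth, pred = [100, 200, 400], [110, 190, 400]
    print(mae(truth, pred), mape(truth, pred), rmse(truth, pred))
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE and penalizes a few large misses more heavily, which is relevant at traffic peaks.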
To show that our network architecture is effective, we compare the performance of the LSTM network under different hyperparameters. The two hyperparameters are the length of the flow sequence T and the number of hidden-layer neurons u, with $T \in \{1, 2, \ldots, 12\}$ and $u \in \{10, 30, 50, 70, 90, 110\}$.
VOLUME 9, 2021
All experiments are implemented with the Keras framework. In the LSTM network, the LSTM layer uses the tanh activation function, and the output layer is a fully connected layer with the sigmoid activation function. We set the number of training epochs to 100. For each combination of T and u, we average each evaluation metric over five runs. Tables 2, 3, and 4 report the average RMSE, MAE, and MAPE, respectively, with the best results in bold. We focus mainly on RMSE because Chai [48] indicated that RMSE is usually better at revealing model performance. Figure 10 shows the trend of the RMSE results. These results show that:
1) From Figure 10 …
2) From Figure 10(a) and (b), the RMSE decreases as the length of the flow sequence increases from 1 to 7 and increases from 8 onward. Hence, 7 is the best flow-sequence length for our model. When u ranges from 70 to 110, the network overfits.
3) Increasing the number of hidden-layer neurons typically helps the network extract more features, which can improve training accuracy. However, too many neurons may cause overfitting, as Figure 10 confirms. In our experiment, we choose 50 hidden-layer neurons for better prediction accuracy.
As a result, we set the flow-sequence length to 7 and u to 50, which gives the best performance for the network. To show the prediction results in more detail, we highlight the peak hours of the prediction results on Monday and Saturday.
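The hyperparameter procedure above (every T and u pair, metrics averaged over five runs, lowest RMSE kept) has the structure of a plain grid search. In the sketch below, `evaluate(T, u, seed)` is a hypothetical callback standing in for building, training, and testing the Keras model once. The score surface in the usage part is synthetic and is not the paper's results.

```python
import itertools
import statistics


def grid_search(evaluate, T_values, u_values, runs=5):
    """Average a score over `runs` repetitions for every (T, u) and return the lowest.

    `evaluate(T, u, seed)` is a hypothetical callback that would build, train and
    test the LSTM once and return its test RMSE.
    """
    results = {}
    for T, u in itertools.product(T_values, u_values):
        scores = [evaluate(T, u, seed) for seed in range(runs)]
        results[(T, u)] = statistics.mean(scores)
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Synthetic stand-in score surface, NOT real results.
    def fake_rmse(T, u, seed):
        return (T - 7) ** 2 + abs(u - 50) / 10 + 0.01 * seed

    best, table = grid_search(fake_rmse, range(1, 13), [10, 30, 50, 70, 90, 110])
    print(best, len(table))  # (7, 50) 72
```

Averaging over several seeds smooths out run-to-run variation from random weight initialization, so differences between neighbouring (T, u) settings are less likely to be noise.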
Figure 12 shows that the MTC dataset has more peak hours on workdays, which suggests that MTC lanes may contribute to traffic congestion. The government should therefore consider building more ETC lanes, which would help the traffic management department alleviate congestion.

VI. CONCLUSION
In this paper, we study short-term traffic flow prediction using traffic data from the Xinqiao toll station in Shanghai, China, and the LSTM deep learning method. Applying the LSTM model to real traffic data shows that the model achieves its intended purpose of fitting actual data.
First, we divide the real traffic data into ETC and MTC sequences. Second, we test different flow-sequence lengths in the three scenes. We find that the best flow-sequence length is 7, and the experimental results show that prediction is most accurate in the ETC scene.
Our results show that dividing the traffic data into ETC and MTC conditions can improve prediction accuracy. As the proportion of ETC toll lanes increases, the traffic efficiency of the station improves and traffic flow prediction can become more accurate. In short, our study has practical significance for the industry.