Prediction of Traffic Congestion Based on LSTM Through Correction of Missing Temporal and Spatial Data

With the rapid increase in vehicle use during the fourth Industrial Revolution, road resources have reached their supply limit. Active studies have therefore been conducted on intelligent transportation systems (ITSs) to realize traffic management systems utilizing fewer resources. As part of an ITS, real-time traffic services are provided to improve user convenience. Such services are applied to prevent traffic congestion and disperse existing traffic. Therefore, these services focus on immediacy at the expense of accuracy. As these services typically rely on measured data, the accuracy of the models is contingent on the data collection. Therefore, this study proposes a long short-term memory (LSTM)-based traffic congestion prediction approach based on the correction of missing temporal and spatial values. Before making predictions, the proposed prediction method applies pre-processing that consists of outlier removal using the median absolute deviation of the traffic data and the correction of temporal and spatial values using temporal and spatial trends and pattern data. In previous studies, data with time-series features have not been appropriately learned. To address this problem, the proposed prediction method uses an LSTM model for time-series data learning. To evaluate the performance of the proposed method, the mean absolute percentage error (MAPE) was calculated for comparison with other models. The MAPE of the proposed method was the lowest among the compared models, at approximately 5%.


I. INTRODUCTION
Based on the core technologies of the fourth industrial revolution, smart vehicles are being produced in diverse forms [1]. The role of the automobile has been extended from a simple means of transportation to a living space and finally, to a type of infotainment system that provides new forms of user convenience [2], [3]. With the increase in the demand for smart automobiles, it is extremely important to collect and process traffic information to enable smooth traffic management. Furthermore, it is necessary to take a qualitative rather than a quantitative approach [4]. To this end, research has been conducted on intelligent transportation systems (ITSs) developed in concert with conventional traffic management systems and information technology [5]-[7]. To improve user convenience, studies on the problems that have arisen with the increased demand for road resources have focused on traffic welfare. Traffic welfare consists of factors including operation service costs, passage of time, accident costs, parking costs, punctuality, and accessibility, with the most important being traffic congestion. As a part of an ITS, traffic surfaces can be put in place to collect traffic information on all roads in real time in order to provide users with information including which regions are congested, traffic volumes, and the locations of traffic accidents. In this way, an ITS can improve the functionality of a road traffic network. An ITS can also provide a real-time traffic-information service. By suggesting an optimal path to each driver, road congestion decreases and traffic is dispersed. An ITS thus focuses on immediacy but achieves relatively low accuracy.
The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Afzal.
To solve this problem, active research has been conducted on real-time traffic pattern predictions based on deep learning models and multiple prediction modes, with a particular focus on traffic predictions based on time-series data.
Weilin et al. [8] proposed a multi-resolution support vector regression (SVR) traffic flow prediction model based on wavelet decomposition and topological space reconstruction. For their experiment, the researchers utilized data collected from January to December 2011 by performance measurement systems, which collect data in 5-min intervals. The mean absolute percentage error (MAPE) rate for their model was 12.8%.
Filmon et al. [9] proposed a nonparametric, data-centric methodology to achieve short-term traffic predictions based on the identification of similar traffic patterns through the improved K-nearest neighbor (K-NN) algorithm. Recently, the weighted Euclidean distance has also been used as a similarity measurement for K-NN. For their experiment, the researchers used 12 datasets from highways in the UK and 24 datasets from highways in the US. A MAPE rate of 22% was achieved.
Zhang [10] proposed a short-term traffic prediction model based on a convolutional neural network (CNN) deep learning framework. In the proposed framework, the optimal input data time delay and amount of spatial data are determined based on the space-time feature selection algorithm. The selected space-time traffic feature is then transformed into a two-dimensional matrix after being extracted from the actual data. The function is learned by the CNN, and a prediction model is constructed. According to a performance analysis, the MAPE rate was approximately 8.3% on average.
The methods described above tend to achieve higher prediction accuracies than those focusing on immediacy. The prediction modes used in these studies are based on one of three models: SVR [11], [12], CNNs [13], [14], and KNN [15], [16]. Because these models fail to consider the features of time-series data, they may be inappropriate. For prediction, this study therefore utilizes the long short-term memory (LSTM) model, which provides accurate predictions and makes it possible to account for the time-series features of traffic data. The LSTM model solves the problem of the long-term dependence inherent in recurrent neural network (RNN) models [17]- [19]. With the LSTM model, the result of a hidden layer is passed to the same hidden layer as an input. Owing to the recursive construction of hidden layers, it is possible to consider sequential or temporal aspects. For this reason, this model is conducive for learning the time-series features of traffic data. Traffic data include outliers or missing values due to unexpected traffic variables. Outliers and missing values lower model performance and therefore should be corrected when designing an accurate prediction model. The correction can be achieved by removing outliers, correcting missing temporal and spatial values, and applying pattern data. Then a system can be established to provide the predicted traffic information to users. With more accurate data, it is possible to increase the accuracy of predictions and to provide a smooth flow of traffic information to users [20], [21]. This paper is organized into the following sections. Section 2 describes the relevant studies on ITSs and ITS-based traffic predictions. Section 3 details the data collection process, data pre-processing, and model design for traffic congestion prediction. In Section 4, an experiment conducted to evaluate the model performance and its results are described. 
In addition, a comparison of different methods used to verify the performance and a description of the system implementation are also provided. Finally, Section 5 presents the concluding remarks regarding this study.

II. RELATED WORK
A. RESEARCH OF TRAFFIC CONGESTION PREDICTION
In traffic data, outliers and missing values negatively influence traffic control and traffic congestion prediction in intelligent traffic systems. To address this problem, many missing value correction methods have been proposed. Conventional methods of missing value correction focus on the correction of individual missing values. Although these methods provide a simple and fast estimate for the missing value, they often produce biased results. To resolve this, historical imputation methods (HIMs) that provide multiple estimation values for one missing value have been proposed [22], [23]. In these methods, a missing value is replaced by the mean value of multiple data points collected at the same position and date. Correction methods based on nearest neighbor imputation (NIM) use the mean value from the neighboring roads to estimate a missing value [24], [25]. However, such methods cannot be applied when there is no data from neighboring roads. The missing value correction method proposed in this study makes it possible to correct a missing value and thereby to design complete data, using past data patterns even when there is no information from neighboring roads. In addition, machine learning and deep learning are applied to model more complicated data for traffic prediction. The deep learning model exhibits better performance since it has more functions and more complicated architecture than the conventional model.
Sun et al. [26] proposed a traffic prediction method using GPS trajectory data based on an RNN. Their method used the missing values from existing road speed data to estimate the average speeds on stretches of road with GPS trajectory data. However, because an RNN gradually loses the features of past data as time passes, it has problems dealing with long-term dependency. Accordingly, traffic prediction based on the LSTM model, which resolves problems associated with RNNs, is actively being researched. Mou et al. [27] proposed the temporary information improvement (T-LSTM) model to predict the traffic flow on a single stretch of road. In consideration of the similar features exhibited each day by traffic flows at a given time and place, the model extracted the unique correlation between the traffic flow and time information, thereby improving the prediction accuracy. Yu et al. [28] proposed STGCN to solve the problem of previous studies that had ignored spatial and temporal attributes in traffic prediction. They argued that the method was able to obtain a faster training speed with a smaller number of parameters since it formulated the problem on a graph and established a model with a complete convolution structure, rather than applying regular convolution and recurrent units. Many researchers have tried to increase the accuracy of traffic predictions and reduce the calculation time through their theories and experiments.
VOLUME 8, 2020

B. RESEARCH ON ITS-BASED TRAFFIC PREDICTION
The University of Southern California Information Lab has established spatial and temporal data using sensors for road measurements and traffic information (e.g., CCTV and GPS) and uses real-time data and past traffic data to predict on-road traffic [29]. An important task is to evaluate the extent to which a prediction model built on past data depends on the current state of real-time traffic. It is also necessary to overcome the limitation of previous data becoming irrelevant in the model over time. To this end, in the USC model, current traffic information is learned in real time and is used as historical data. The framework can predict traffic at an accuracy comparable to that of the most effective prediction-trained model. Fig. 1 shows the transportation prediction system architecture of USC media. The artificial intelligence (AI)-based transportation prediction system offered by Blue Signal in the Republic of Korea provides road map information and predicts traffic flows and accident risk through big data analysis [30]. An AI-based transportation prediction engine was also developed based on transportation theory. Whereas a conventional GPS service provides information such as routes around traffic jams, the shortest travel time, and the shortest path, the prediction engine of Blue Signal predicts the safest and most convenient route. This engine can achieve 98% accuracy for traffic accident prediction on domestic highways.
As shown in Fig. 1, data are received in real time by the User Interface and Data Interface. Through the adaptive segmentation of the Context Space, the effect of each base prediction device is efficiently estimated. In this way, it is possible to predict traffic conditions in diverse situations.

III. PREDICTION OF TRAFFIC CONGESTION BASED ON LSTM THROUGH CORRECTION OF MISSING TEMPORAL AND SPATIAL DATA
The congestion prediction method developed to provide traffic information to users consists of data collection, correction of missing data, and prediction modeling. In this study, the collected data include node/link and traffic speed data provided by an ITS. The node/link data represent a road region or road connection point. The traffic speed data from the ITS are collected by traffic information collectors installed on the roads or along the roadsides. The traffic data include missing values and outliers. An outlier may be generated by an information collection failure, when there are errors in the collectors, or by shaded zones without automobiles travelling in them. The traffic data also include time-series features. For this reason, a missing value makes it difficult to extract the feature values when a deep learning model is used for prediction. Therefore, preprocessing of the outliers and missing values is required [31], [32]. During the data pre-processing, an outlier is processed, and filtering is then applied using the median absolute deviation [33], [34]. Missing data are corrected using spatial trends, temporal trends, and pattern utilization. With the pre-processed data, an LSTM model is used to predict traffic congestion. Fig. 2 shows the entire process of LSTM-based traffic congestion prediction through the correction of temporal and spatial data. Traffic speed data include outliers that distort the flow of the average traffic speed and missing values. An outlier represents a value that is either too small or too large in the context of the average traffic flow on each road. Such values are removed to avoid influencing the feature values at the time of prediction. There are two types of missing values. The first type is missing temporal values that occur when not all of the traffic data (which are collected every 5 min) are gathered. 
The second type is missing spatial values that occur when data are not collected at each road in a given collection interval. Fig. 3 shows the time in a link matrix with examples of an outlier and each type of missing value.
To correct for outliers and missing values in the traffic data, the outlier removal process is first applied. There are a variety of typical outlier removal methods that use, for example, the median absolute deviation, truncated mean, or Winsorized mean. Methods may be combined depending on the features of the roads and traffic data. This study applies the median absolute deviation to identify and remove outliers. That is, the median value of the collected data is used to detect whether a value is abnormally large or small. When a value is identified as an outlier, it is removed. Algorithm 1 shows the outlier removal algorithm. Fig. 4 shows the outlier removal process using the median absolute deviation. Missing value correction is an algorithm-based process for filling in data, including values that were removed after being identified as outliers. The missing-value correction methods include the application of spatial trends using data from regions with a similar traffic pattern, the use of temporal trends to correct the value in question using past data, and the use of pattern data. Each method corrects missing data values from a temporal or spatial perspective. The spatial trends are used to correct missing values in regions with similar traffic patterns under the assumption that the traffic flow of the upper regions influences that of the lower regions. Algorithm 2 is a missing value correction method based on the use of spatial trends. For instance, if detector $x_b$ has a problem and its data are missing, the mean of the adjoining link data from $x_a$ and $x_c$ is used for the correction. The spatial trend-based missing value correction process is shown in Fig. 5.
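The outlier removal step described above can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: the function name, the cutoff of 3.5, and the 0.6745 scaling (the common modified z-score rule) are assumptions, since the text only states that the median absolute deviation is used to flag abnormally large or small speeds.

```python
from statistics import median


def remove_outliers_mad(speeds, threshold=3.5):
    """Return a copy of `speeds` with outliers replaced by None.

    None marks a missing value. `threshold` and the 0.6745 factor are
    assumed (modified z-score convention), not taken from the paper.
    """
    observed = [s for s in speeds if s is not None]
    if not observed:
        return list(speeds)
    med = median(observed)
    mad = median(abs(s - med) for s in observed)
    if mad == 0:
        # A MAD of zero means at least half the values equal the median,
        # so no scale is available; leave the series unchanged.
        return list(speeds)
    cleaned = []
    for s in speeds:
        if s is None:
            cleaned.append(None)
            continue
        modified_z = 0.6745 * (s - med) / mad
        # Values far from the median become missing values to be corrected later.
        cleaned.append(None if abs(modified_z) > threshold else s)
    return cleaned


# Hypothetical 5-min speeds (km/h) for one link; None is an uncollected value.
link_speeds = [42.0, 40.5, 41.0, 3.0, 39.5, 120.0, None, 41.5]
print(remove_outliers_mad(link_speeds))
```

In this example the readings 3.0 and 120.0 are flagged and become missing values, which the later correction steps are then expected to fill.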

Algorithm 1 Outlier Removal Algorithm

Algorithm 2 Spatial Data Correction Algorithm
Input: X_a (Adjoining Northbound Link), X_b (Target Link)
Output: X_b (Target Link)

If missing data occur at three continuous points, the spatial trend correction is not possible because there are no adjoining links. In this case, the temporal trend is applied. The temporal method calculates the mean of the n previous observations at the location of the missing observation. Equation 1 shows the correction equation using the temporal trend.

$$F_t = \frac{1}{n} \sum_{k=1}^{n} A_{t-k} \qquad (1)$$

In the equation, $F_t$ is the missing value at the current time t and is to be estimated, $A_{t-k}$ is the detected data at time t − k, and n is the number of past detected observations. In Fig. 6, the use of the temporal trend-based correction procedure is illustrated. Using the temporal trend, the missing values in the traffic data can be fully corrected. Nevertheless, if there are many sequential missing values, the estimated temporal values become constant when the method is applied, and the data pattern disappears, as shown in Fig. 7. Therefore, if the temporal trend is not useful, the pattern data are applied. This final method estimates the missing values by applying data collected in the connected parts, such as the data entrance and the entrance access parts. For the pattern data generation procedure, the data from previous days are checked to find the passage features of each day, and the data are saved as one of six types: a special day, Sunday, Saturday, Monday, weekday (Tuesday through Thursday), or Friday. The pattern data of each type are generated every 5 min and are updated by applying a weight to the current collection speed.
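The order in which the three correction methods are applied can be illustrated with a short sketch. It is a simplified reading of the text, not the authors' implementation: the function name, the choice n = 3, the rule that the temporal trend is used only when all n preceding values were actually detected, and the precomputed `pattern` list are all assumptions made for illustration.

```python
def correct_missing(target, upstream, downstream, pattern, n=3):
    """Fill None entries of `target` (one link, one value per 5-min slot).

    Fallback order: spatial trend, then temporal trend, then pattern data.
    `upstream`/`downstream` are the adjoining links; `pattern` is a
    hypothetical precomputed day-type profile with no missing values.
    """
    filled = list(target)
    for t, value in enumerate(target):
        if value is not None:
            continue
        # 1) Spatial trend: mean of the two adjoining links at the same slot.
        if upstream[t] is not None and downstream[t] is not None:
            filled[t] = (upstream[t] + downstream[t]) / 2
            continue
        # 2) Temporal trend: mean of the n previous detected observations.
        previous = target[max(0, t - n):t]
        if len(previous) == n and all(p is not None for p in previous):
            filled[t] = sum(previous) / n
            continue
        # 3) Pattern data: fall back to the stored profile for this slot.
        filled[t] = pattern[t]
    return filled


# Hypothetical speeds (km/h); None marks a missing value.
target = [50.0, 48.0, 46.0, None, None, 41.0, None]
upstream = [52.0, 50.0, 47.0, None, 44.0, 42.0, None]
downstream = [49.0, 47.0, 45.0, None, 42.0, 40.0, None]
pattern = [51.0, 49.0, 47.0, 45.0, 43.0, 41.0, 39.0]
print(correct_missing(target, upstream, downstream, pattern))
```

Here slot 3 is filled by the temporal trend, slot 4 by the spatial trend, and slot 6 by the pattern data. Restricting the temporal mean to detected values is one way to avoid the constant estimates that arise under long runs of missing data.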

B. LSTM-BASED TRAFFIC CONGESTION PREDICTION
For traffic speed prediction, we use a time-series-based deep learning model, long short-term memory (LSTM), for modeling [35], [36]. The data used for prediction are pre-processed using the method described in the previous section. The input data used for modeling are the mean speeds from 10 min and 5 min earlier, the current speed, and the speed of the adjoining upper region. The output data is the predicted speed 5 min after the current time. Fig. 8 shows the LSTM-based traffic prediction process proposed in this study. An LSTM cell consists of a memory cell and gates. Input information is saved in the memory cell, and a gate controls the saved information. The parameters of the proposed LSTM model are shown in Table 1. The learning rate is a hyperparameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function [37]. Dropout is used to prevent overfitting, which can occur during the learning process [38]. In other words, dropout is applied when a model overfits, that is, when the error is small on the training data but large on the test data, so that the model cannot be generalized. The batch size is the number of training samples input to the model at once. The optimization function is an algorithm for updating the weights. The number of hidden neurons and layers, the number of epochs, and the loss function, all of which affect performance, are frequently changed to induce improved performance [39], [40].
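The input and output layout described above can be made concrete with a small sketch that turns two aligned 5-min speed series into training samples. The function name is hypothetical, and the sketch assumes that "mean speeds from 10 min and 5 min earlier" refers to the per-slot mean speeds two and one slots before the current one.

```python
def build_samples(target, upstream):
    """Build (features, label) pairs for one link sampled every 5 min.

    features = [speed at t-10 min, speed at t-5 min, speed at t,
                speed of the adjoining upper link at t]
    label    = speed at t+5 min
    Both series are assumed to be fully pre-processed (no missing values).
    """
    samples = []
    # t needs two earlier slots for the inputs and one later slot for the label.
    for t in range(2, len(target) - 1):
        features = [target[t - 2], target[t - 1], target[t], upstream[t]]
        samples.append((features, target[t + 1]))
    return samples


# Hypothetical speeds (km/h) for a target link and its adjoining upper link.
target = [40.0, 38.0, 35.0, 30.0, 28.0, 33.0]
upstream = [45.0, 41.0, 36.0, 31.0, 30.0, 36.0]
samples = build_samples(target, upstream)
for features, label in samples:
    print(features, "->", label)
```

Six slots yield three samples, because the first two slots have no full history and the last slot has no label. Before training a recurrent layer, each feature list would typically be reshaped into a (time steps, features) array.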

IV. EXPERIMENT AND RESULTS
The LSTM-based traffic congestion prediction method proposed in this study was implemented using the following hardware and operating system: Windows 10 Pro, an AMD Ryzen 5 1600 6-Core processor, an NVIDIA GeForce GTX 1070, and 16 GB of RAM. In terms of software, a TensorFlow back-end engine and the deep learning library Keras were used in the design. The traffic speed data used in this study were collected in Gangnam-gu, Seoul, over one month, November 2018. There are a total of 1,630 links in Gangnam-gu, and data were collected at each link [41], [42]. With a 5-min collection cycle (288 observations per day) and a 30-day collection period, a total of 8,640 observations were collected at each link. Some data were missing; data may have failed to be collected due to a sensor or software error in the process of data collection. The average missing rate of the Gangnam-gu traffic speed data is approximately 33%.

A. IMPLEMENTATION OF TRAFFIC CONGESTION PREDICTION SYSTEM
In this study, a system for pre-processing traffic data and a traffic congestion prediction model were established. Fig. 9 shows the pre-processing system for the traffic data. The table in Fig. 9 shows an example of the speed data for all regions in Gangnam-gu. By entering a LINK_ID in the Setting field at the bottom right, selecting a pre-processing method (outlier removal, correction of missing spatial or temporal values, or the use of pattern data), and clicking the Start button, data pre-processing is applied. The pre-processed region data appear in the bottom left of Fig. 9. It is possible to save the pre-processed data by clicking the Save button. A region is selected in the region selection window in the top-left of the prediction system. In the LINK overview window, the description of the selected region (LINK_ID, LINK_NAME, Velocity) is displayed. In the data collection [...] 'smooth' refers to speeds of 30 km/h or higher, 'congested' to 15 km/h ∼ 30 km/h, and 'very congested' to less than 15 km/h. For highways, 'smooth' refers to speeds of 70 km/h or higher, 'congested' to 40 km/h ∼ 70 km/h, and 'very congested' to less than 40 km/h. These criteria are suggested by the Ministry of Land, Infrastructure and Transport [41]. Numerical information for the expected congestion region is provided in the table below the simulation map. Fig. 10 shows the LSTM-based traffic congestion prediction system [43], [44].
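The speed thresholds quoted above map directly onto a small classification function. The sketch below is illustrative only: the function name and the `highway` flag are assumptions, and the label 'smooth' for non-highway speeds of 30 km/h or higher is inferred by analogy with the highway criteria.

```python
def congestion_level(speed_kmh, highway=False):
    """Map a predicted speed (km/h) to a congestion label.

    Thresholds follow the criteria quoted in the text:
    highways: smooth >= 70, congested 40 to 70, very congested < 40;
    other roads: smooth >= 30, congested 15 to 30, very congested < 15.
    """
    smooth_min, congested_min = (70.0, 40.0) if highway else (30.0, 15.0)
    if speed_kmh >= smooth_min:
        return "smooth"
    if speed_kmh >= congested_min:
        return "congested"
    return "very congested"


for speed in (12.0, 22.5, 45.0):
    print(speed, congestion_level(speed), congestion_level(speed, highway=True))
```

A speed of 45 km/h is therefore 'smooth' on an ordinary road but 'congested' on a highway, which is why the road type has to be known before a predicted speed is shown on the map.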

B. COMPARATIVE EVALUATION OF PERFORMANCE ACCORDING TO THE MISSING VALUE CORRECTION METHOD
If a model learns on data that include missing values, its prediction ability can be diminished. For this reason, it is necessary to correct missing values, and the model accuracy may change according to the correction method.
We therefore evaluated the performance of our correction methods through repeated experiments varying the missing rate. In the experiments, historical imputation methods (HIM) and nearest neighbor imputation (NIM) are used as conventional missing value correction methods for comparison with the missing value correction method proposed in this study. The performance comparison was conducted through the data missing rate based MAPE. The data missing rate ranged from 10% to 90% in increments of 10%. Fig. 11 shows the results of the performance evaluation for each of the missing value correction methods.
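The evaluation protocol described above (hide a fixed share of known values, correct them, and score the corrected values with the MAPE at missing rates from 10% to 90%) can be sketched as follows. Everything here is illustrative: the synthetic series, the fixed random seed, and in particular the carry-forward imputer, which is only a stand-in and is neither HIM, NIM, nor the proposed method.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mask_values(series, missing_rate, rng):
    """Hide round(missing_rate * len(series)) randomly chosen entries as None."""
    n_missing = round(missing_rate * len(series))
    hidden = set(rng.sample(range(len(series)), n_missing))
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    return masked, sorted(hidden)


def fill_with_last_observation(series, default):
    """Placeholder imputer (not from the paper): carry the last seen value forward."""
    filled, last = [], default
    for v in series:
        last = v if v is not None else last
        filled.append(last)
    return filled


rng = random.Random(0)
# Synthetic one-day series (288 five-minute slots), speeds between 30 and 40 km/h.
truth = [30.0 + 10.0 * ((i % 12) / 11.0) for i in range(288)]

results = {}
for rate in [r / 10 for r in range(1, 10)]:
    masked, hidden = mask_values(truth, rate, rng)
    filled = fill_with_last_observation(masked, default=truth[0])
    # Score only the positions that were hidden, against their true values.
    results[rate] = mape([truth[i] for i in hidden], [filled[i] for i in hidden])

for rate, error in results.items():
    print(f"missing rate {rate:.0%}: MAPE {error:.2f}%")
```

Scoring only the hidden positions keeps the comparison focused on the quality of the correction itself rather than on values that were never missing.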
As shown in Fig. 11, the proposed method performed better in terms of MAPE than the conventional missing value correction methods. HIM corrects temporal missing values but fails to correct spatial missing values. In addition, its performance deteriorates when a large proportion of the data is missing. Unlike the HIM, the NIM cannot correct the temporal missing value, but it is possible to correct the data when the data in the neighboring space is not recorded. In contrast, the data correction method proposed in this study is able to correct both spatial and temporal data and exhibits excellent performance in terms of MAPE.

C. PERFORMANCE EVALUATION OF PREDICTION MODEL
For the performance evaluation and loss function of the model used in this study, the MAPE was used [45], [46]. The MAPE can be applied to overcome the effect of size-dependent error and represents the mean of the absolute error relative to the actual value. It was used for the loss function because it is sensitive to small values in low-speed regions such as congested areas. It was also used for the performance evaluation of the proposed method. The MAPE can be calculated by Equation (2), where $A_i$ is an actual value, $F_i$ is the predicted value, and n is the number of observations.

$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right| \qquad (2)$$

That is, the absolute difference between the actual and predicted values is divided by the actual value; this quantity is summed over all of the observations, the sum is divided by n, and the result is expressed as a percentage. The lower the MAPE value is, the higher the model accuracy is. In addition, the data used in the experiment are the traffic data of one day. The data include an urban area with high congestion and a suburban area with relatively low congestion. The performance of the LSTM model for congestion prediction was evaluated using uninterrupted and interrupted flow regions. An uninterrupted flow region has no external influences that control the traffic flow. An interrupted flow region has crossroads and trunk lines where the traffic flow is interrupted by traffic signals or traffic control facilities. An example of an uninterrupted flow region is a suburban area with highways, while an example of an interrupted flow region is an urban area with traffic signals and traffic control facilities. Fig. 12 shows a graph of the MAPE results for suburban and urban areas. For the suburban areas, three regions were extracted, and the northbound and southbound speeds were predicted. As shown in the graphs of the prediction results, the average MAPE was approximately 4.297%.
As with the suburban areas, three regions were extracted from the urban areas, and the northbound and southbound speeds were predicted. The average MAPE for the urban areas was approximately 6.087%. The urban areas showed a somewhat lower accuracy than the suburban areas, and the reasons for this were analyzed. The suburban areas included fewer surrounding buildings and no traffic signals, and the speed limit within these regions was higher than in the urban areas. By contrast, the urban areas included numerous buildings, the large influence of a floating population other than drivers, traffic signals at crossroads, and numerous variables interrupting the traffic flow. For these reasons, it is more difficult to predict the traffic flow in urban areas. In addition, Fig. 13 shows the results of a comparative analysis of the actual and predicted values in terms of the MAPE for three sections of the city center, while Fig. 14 shows the same comparison for three sections on the outskirts of the city. However, the denominator of the MAPE shrinks as the actual measurement approaches 0. This results in a significant increase in the absolute percentage error (APE) even if the absolute error is small, which biases the value when the average is taken. Therefore, the root mean squared error (RMSE) and mean absolute error (MAE) are also used for measuring performance in order to prevent the distortion of the overall prediction performance. The MAE calculates results using identical standards in different circumstances. The RMSE, by taking the square root, reduces the size-dependent distortion of errors that is the problem of the mean squared error (MSE), and it expresses the average error intuitively in the original units.
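The sensitivity of the MAPE to small actual values, and the reason the MAE and RMSE are reported alongside it, can be checked with a short numerical sketch. The speed values below are made up for illustration; the three functions implement the standard definitions of the metrics.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, in the units of the data (km/h here)."""
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data (km/h here)."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual))


# Hypothetical speeds: every prediction is off by exactly 2 km/h in both cases.
congested_actual, congested_pred = [5.0, 10.0, 8.0], [7.0, 12.0, 10.0]
free_actual, free_pred = [50.0, 100.0, 80.0], [52.0, 102.0, 82.0]

for name, a, f in (("congested", congested_actual, congested_pred),
                   ("free-flow", free_actual, free_pred)):
    print(f"{name}: MAPE={mape(a, f):.2f}%  MAE={mae(a, f):.2f}  RMSE={rmse(a, f):.2f}")
```

With identical absolute errors, the MAE and RMSE are the same for both series, whereas the MAPE of the low-speed series is ten times larger, which is the bias the text describes.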
In this study, the prediction performance is compared between urban and suburban areas. Figures 13 and 14 show the results of the performance evaluation using the RMSE and MAE for urban and suburban areas. For the RMSE results in Figure 13, the southbound direction of Seocho-daero shows the best performance, at 1.543, whereas the northbound direction of National Route 47 shows relatively poor performance, at 5.524. The average RMSE over the 12 routes is 3.27. For the MAE results in Figure 14, the southbound direction of Seocho-daero again shows the best performance, and the northbound direction of Teheran-ro shows the poorest, at 3.83. The average MAE over the 12 routes is 2.24. The MAPE results indicate that congestion in urban areas is predicted less accurately. However, the RMSE and MAE results show that the prediction for some suburban areas is poorer than that for urban areas. This is because the MAE and RMSE do not depend on the speed level or on the differing conditions of highly congested urban areas and less congested suburban areas; they are calculated using identical standards. Therefore, when the three performance evaluation indexes are analyzed together, the prediction performance for urban areas, except for the northbound direction of Teheran-ro, is mostly better than that for suburban areas.

D. EVALUATION OF MODEL GOODNESS-OF-FIT IN COMPARISON WITH DIFFERENT MODELS
To demonstrate the reliability of the model proposed in this study, the goodness-of-fit of the model was evaluated. The proposed model was compared with other models presented in relevant studies. The data used for the comparison were pre-processed by the method proposed in this study. The performance index used for the comparison was the MAPE. The models used for comparison are the RNN, LSTM, and STGCN models. Table 2 presents the prediction results for the different data and models in the comparison. As shown in Table 2, in terms of the MAPE, the proposed method had better goodness-of-fit than the other methods. The RNN performed worse than the LSTM models. This is because the RNN has the problem of long-term dependency. According to the comparison, the proposed model shows a performance improvement of 0.97 over that of Mou et al. [27]. The LSTM model used in this study is therefore well suited to traffic congestion prediction, since it accounts for temporal features.

V. CONCLUSION
In this study, an LSTM-based traffic congestion prediction method using a correction for missing temporal and spatial data was proposed. Based on experimental results, outliers and missing values in the traffic data influenced the prediction results. To improve the model performance, the outliers were removed, and the data were pre-processed using spatial and temporal trends and pattern data. As a predictive model, LSTM was applied. It is derived from the RNN model and solves the problem of long-term dependency. In the LSTM model, the result of a hidden layer is passed into the same hidden layer as an input. Because the model considers sequential or temporal aspects, it can be applied to learn the time-series features of traffic data. In an experiment to evaluate the model performance, suburban areas were used as an example of uninterrupted flow regions and urban areas as an example of interrupted flow regions. The suburban areas were less influenced by the traffic flows with external interference than the urban areas, and therefore had fewer variables at the time of prediction. The model thus demonstrated higher prediction accuracy for suburban areas. In comparison with relevant models, the proposed method was found to achieve better performance with a difference in the MAPE of 3%-17%. As a future study, we plan to increase the accuracy of the traffic congestion prediction in low-speed regions and urban areas and to establish a model with better user performance.