MF-TCPV: A Machine Learning and Fuzzy Comprehensive Evaluation-Based Framework for Traffic Congestion Prediction and Visualization

A framework for traffic congestion prediction and visualization based on machine learning and Fuzzy Comprehensive Evaluation named MF-TCPV is proposed in this paper. The framework uses DataX and DataV to implement the integration of multi-source heterogeneous traffic data and the visualization of congestion prediction results. A deep prediction model named LSTM-SPRVM based on deep learning algorithms, machine learning algorithms, and Spark parallelization technology for the prediction of traffic congestion features in the future is proposed. In MF-TCPV, traffic congestion is divided into six levels based on Fuzzy Comprehensive Evaluation and traffic congestion features such as average speed, road occupancy rate, and traffic flow density. MF-TCPV is validated based on the real data of Whitemud Drive in Canada. The experimental results demonstrate that MF-TCPV is capable of predicting the traffic congestion accurately and displaying prediction results visually. LSTM-SPRVM is better than other existing deep learning models in terms of prediction accuracy, and MF-TCPV can intuitively visualize the prediction results of traffic congestion.


I. INTRODUCTION
Intelligent Traffic Systems (ITS) are integrated systems that combine advanced technologies such as electronic information technology, data communication technology, sensor technology, control theory, operational research, and artificial intelligence to improve the transportation industry. Applications of and research on ITS cover highway, railway, civil aviation, water carriage, and other modes of transportation [1]. Traffic congestion prediction, also called traffic flow state prediction, is an essential part of ITS. Untimely and unreasonable traffic congestion prediction causes huge economic losses to society, a significant increase in exhaust gas pollution, and a reduction in citizens' living standards [2], [3]. Therefore, congestion prediction is of great value to traffic management.
The associate editor coordinating the review of this manuscript and approving it for publication was Nabil Benamar.
The treatment of traffic congestion should focus on prevention, that is, on predicting the short-term trend of the traffic state from the current traffic flow information and providing early warning of possible congestion. Existing research on traffic congestion has mostly focused on the prediction of traffic congestion features such as traffic volume [4] and average vehicle speed [5], or on the determination of the current congestion [6], [7]. A framework for traffic congestion prediction and visualization that considers most traffic congestion features and predicts future traffic congestion is needed.
In this paper, we combine the prediction of traffic congestion features with the determination of the current congestion to build a framework for traffic congestion prediction and visualization, called Machine Learning and Fuzzy Comprehensive Evaluation based Framework for Prediction and Visualization of Traffic Congestion (or simply MF-TCPV), which can evaluate short-term future traffic congestion by using the prediction results of traffic congestion features. We use several of Alibaba's components to implement data integration and data visualization in MF-TCPV. To predict traffic congestion features, we develop a deep prediction model combining deep learning and machine learning. We extend our previous research to train the deep model in parallel based on parametric parallelism to improve training efficiency [8], [9].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The main contributions of this paper are summarized as follows:
• A novel framework named MF-TCPV is proposed for the prediction and the visualization of traffic congestion, which can be divided into three layers: raw data layer, data processing layer, and data presentation layer.
• In the data processing layer, a deep prediction model based on deep learning and machine learning is proposed for congestion evaluation parameter prediction, called long short-term memory networks combined relevance vector machine based on spark parallelization (or simply LSTM-SPRVM). Fuzzy Comprehensive Evaluation is used to divide the traffic congestion into six levels by prediction results of LSTM-SPRVM. The weight of each traffic congestion feature is determined by the entropy method.
• In the data presentation layer, we set up a cache database and design the visualization of traffic congestion prediction by DataV.
• We validate the performance of MF-TCPV based on the real data of Whitemud Drive in Canada.
The remainder of this paper is organized as follows: the related works are discussed in Section 2. Section 3 presents the MF-TCPV framework. Section 4 describes the experimental results. Finally, Section 5 draws the conclusion and future research directions.

II. RELATED WORK

A. PREDICTION OF TRAFFIC CONGESTION
The existing frameworks for predicting traffic congestion have significant drawbacks in how fully they use traffic flow data and in their ability to evaluate future traffic congestion. Some frameworks use clustering and classification algorithms to identify the current traffic congestion situation and therefore cannot predict future congestion directly [6], [7], [10]. Ruiqi et al. [6] proposed a congestion recognition method based on a multi-class support vector machine (SVM). Congestion was first divided into three classes, ''traffic,'' ''congestion,'' and ''traffic paralysis,'' by using pattern recognition technology. An SVM then separated the road traffic situation into ''traffic'' and ''congestion,'' and a second discrimination step split the ''congestion'' state into ''congestion'' and ''traffic paralysis.'' Zhang et al. [7] collected traffic flow velocity, traffic flow density, and traffic volume to judge the traffic congestion level with a grey relational membership degree rank clustering algorithm. Zhu et al. [10] extracted feature parameters from video data by gray level co-occurrence histogram and histogram of oriented gradient, and then used SVM to classify traffic congestion levels. These frameworks and methods are simple in process but cannot predict future traffic congestion.
Most researchers used traffic congestion features obtained from the original traffic flow data to evaluate congestion. Traffic congestion features can be divided into three categories: features based on the V/C ratio [11], features based on the time ratio [12], and features based on economics [13]. Features based on the V/C ratio include traffic flow, average speed, road occupancy rate, traffic flow density, etc. Because they are easy to operate on and to collect, features based on the V/C ratio have been widely used. In previous research, the determination of traffic congestion was based on a specific feature such as vehicle speed or occupancy rate [14], [15]. Islam et al. [14] proposed a smart traffic control system that detects traffic congestion by measuring traffic density with a real-time video processing technique. Yang et al. [15] proposed a framework for the prediction of traffic congestion. In the framework, a backpropagation neural network (BPNN) was used to predict future traffic volume. Road occupancy rate and traffic flow density were calculated from the prediction results. Finally, traffic congestion was evaluated according to the future road occupancy rate and traffic flow density.
In addition to traffic congestion features, some frameworks use other types of data to describe congestion. These data include real-time monitoring video data [16], large volumes of GPS data [17], [18], and road network map data containing traffic flow information [19]. These data are difficult to obtain, which limits the practicality of the frameworks that use them. Furthermore, the existing frameworks do not design data integration and visualization in enough detail, which also affects their practicality. Besides, some frameworks are based on predicting a single traffic congestion feature [20], [21]. Table 1 shows the comparison of MF-TCPV with existing frameworks or methods.
Therefore, there is a need to develop a framework as a benchmark to:
(1) use most of the traffic congestion features based on the V/C ratio.
(3) have the ability for short-term prediction.
(4) have data integration and visualization functions.
Our proposed framework, MF-TCPV, addresses these shortcomings of existing frameworks. In MF-TCPV, we use LSTM-SPRVM to predict traffic volume and vehicle speed. Traffic congestion features such as average speed, road occupancy rate, and traffic flow density are calculated from the prediction results and used by Fuzzy Comprehensive Evaluation to describe congestion. Fuzzy Comprehensive Evaluation requires little computation and does not need a complex classification model to be trained. Besides, combining Fuzzy Comprehensive Evaluation with the prediction of traffic congestion features makes it possible to predict future congestion conditions. In the application scenario of traffic congestion prediction, MF-TCPV is more practical than approaches based on classification and clustering algorithms.

B. PREDICTION OF TRAFFIC CONGESTION FEATURES
The prediction of traffic congestion features is a time series prediction problem [22]. The prediction accuracy of traffic congestion features determines the usability of the Fuzzy Comprehensive Evaluation results. At present, there are two main types of research on predicting traffic congestion features. Research of the first type is model-driven. It predicts traffic congestion features by using the Auto Regression Moving Average Model (ARMA) [23], the Autoregressive Integrated Moving Average Model (ARIMA) [24], Kalman Filtering [25], and other models. These methods usually assume that time series data is generated by a linear process and model the dependence in the series as a linear relationship. Therefore, although the calculation is simple, the accuracy is poor.
Research of the second type is data-driven and uses SVM [26], Recurrent Neural Network (RNN) [27], [28], RVM [4], and other machine learning and deep learning algorithms. These models generalize well and are suitable for most data. Among them, RVM has extremely fast test speed and is widely used to predict traffic congestion features. Xin et al. [29] used Genetic Algorithm (GA) to optimize RVM and showed that heuristic algorithms can improve the capability of RVM. Li et al. [30] analyzed the correlation between various traffic flow features and used LSTM to predict vehicle speed. Shen et al. [31] proposed a hybrid algorithm called Chaos Simulated Annealing Algorithm (CSA) and used it to optimize RVM to predict traffic flow. This method has better prediction accuracy than model-driven methods. However, none of them considered the efficiency of heuristic algorithms as parameter optimization algorithms. For common machine learning algorithms such as SVM and RVM, we found that the time consumed by the parameter optimization algorithm accounts for 80%-95% of the overall training time [8], [9]. The low efficiency of parameter optimization significantly affects the training efficiency of the prediction model.
On the other hand, deep models that first extract features from the traffic flow data with deep learning algorithms are increasingly favored by scholars. LSTM-SVR is the most commonly used deep model to fit traffic flow data. Zheng et al. [32] proposed a deep model named LSTM-SVR, which uses the advantage of LSTM in processing time series data to extract features of traffic flow and the high-dimensional expression ability of SVM to predict subway traffic flow. The result shows that LSTM-SVR is superior to LSTM and SVR. Li et al. [33] extracted multi-dimensional traffic flow features through Deep Belief Networks (DBN) to create a new feature dataset, which was then input into SVR to predict short-term traffic flow. The method considers the number of lanes, the time ratio of traffic lights, whether to turn left, and other information to obtain better results. Besides LSTM and DBN, deep learning algorithms such as CNN and GRU have also been applied to feature extraction [34], [35].
The literature shows that LSTM and GRU are suitable for extracting single-dimensional and low-dimensional features of traffic flow data, while DBN and CNN are suitable for extracting high-dimensional features. RVM, which is prominent among machine learning algorithms, is rarely used in deep models. Based on the above literature review, we propose LSTM-SPRVM as a part of MF-TCPV to predict traffic congestion features.

III. MF-TCPV FRAMEWORK
This section presents the detailed description of MF-TCPV. Fig. 1 illustrates the architecture of MF-TCPV.

A. RAW DATA LAYER
Raw traffic data is scattered across different data sources, such as relational databases (MySQL, Oracle), NoSQL data storage (HBase, Hive), and unstructured data storage (HDFS). It is necessary to collect, organize, and clean the data from these sources and load it into a new data source. As the first step of MF-TCPV, we introduce DataX into the framework. DataX is a tool for offline synchronization and integration of heterogeneous data sources, whose design concept is shown in Fig. 2 [36].
DataX is widely used within Alibaba and handles all of Alibaba's offline big-data synchronization services. More than 80,000 synchronization jobs are completed every day by DataX, and the daily data transmission volume exceeds 300 TB. DataX is therefore fully capable of traffic data integration. In MF-TCPV, multi-source traffic data is loaded into a data warehouse by DataX. The advantage of this design is that complex meshed multi-source data links are no longer needed.

B. DATA PROCESSING LAYER

1) LSTM-SPRVM FOR PREDICTION OF TRAFFIC CONGESTION FEATURES
Architecture: The principal part of the data processing layer is LSTM-SPRVM, which predicts traffic congestion features. In LSTM-SPRVM, LSTM is chosen for feature extraction of traffic flow data, RVM is selected as the output layer, a heuristic algorithm is selected as the parameter optimization algorithm of RVM, and Spark is selected for parallel training to improve training efficiency. LSTMs can fit the statistical relationship of time series data through their structure to extract features from original data [37]. Fig. 3 shows the feature extraction and prediction process in LSTM-SPRVM.
In LSTM-SPRVM, raw data is transformed into multiple ''windows,'' which are input into LSTMs for feature extraction to generate a new feature dataset. According to the experimental data and results, we set n to 10. The LSTM layers and RVM of our deep prediction model are shown in Fig. 4.
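The windowing step can be illustrated with a minimal sketch. It assumes that n denotes the window length and that the value to be predicted is the observation immediately following each window; the function name `make_windows` and the toy series are hypothetical and are not taken from the paper.

```python
def make_windows(series, n=10):
    """Split a 1-D series into (window, target) pairs.

    Each window holds n consecutive observations; the target is the
    observation that immediately follows the window (an assumption).
    """
    if n <= 0:
        raise ValueError("window length n must be positive")
    windows, targets = [], []
    for start in range(len(series) - n):
        windows.append(list(series[start:start + n]))
        targets.append(series[start + n])
    return windows, targets


# Toy series standing in for 5-minute traffic volumes (hypothetical values).
volumes = list(range(100, 115))  # 15 observations
X, y = make_windows(volumes, n=10)
print(len(X), X[0], y[0])
```

Each row of `X` would correspond to one input window for the LSTM layer, and `y` to the value the output layer is trained to predict.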
To extract traffic data features, a matrix of traffic data is input into a single LSTM layer. Because traffic data is sparse, optimizers with an adaptive learning rate perform better. We therefore choose NADAM as the optimizer of LSTM, which incorporates Nesterov momentum into ADAM [38]. A neural network trained on a small data set overfits easily. To prevent overfitting, a dropout layer can be added to the neural network [39]. In LSTM-SPRVM, the dropout layer disconnects 5% of neurons. The extracted features are output through a Dense layer. Finally, the new feature set is input into RVM for prediction.
In past research, traffic flow data was mostly processed as time series data with intervals of 10 minutes, 5 minutes, or 2 minutes. For the time series prediction problem, the shorter the time interval, the greater the practical application value, but also the stronger the nonlinearity of the data [40], [41]. Meanwhile, for traffic flow data, short intervals increase the sparsity of the data, and the sparser the data, the more difficult the prediction. Moreover, intervals that are too short leave decision-makers without enough time to make reasonable decisions within a decision-making cycle (e.g., Observation-Orientation-Decision-Action, OODA). After careful consideration, we process traffic flow data with a 5-minute interval. The method of predicting traffic speed is the same as the method of predicting traffic flow.
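As an illustration of resampling to a 5-minute interval, the sketch below aggregates fixed-rate readings into coarser intervals. It assumes 20-second readings (the collection frequency reported for the experimental data), so that 15 readings form one interval, that vehicle counts are summed, and that speeds are averaged with a plain arithmetic mean. The function name and the sample values are hypothetical.

```python
def aggregate(readings, per_interval=15, how="sum"):
    """Aggregate fixed-rate readings into coarser intervals.

    With 20-second readings, 15 readings make one 5-minute interval.
    A trailing partial interval is dropped.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    out = []
    for start in range(0, len(readings) - per_interval + 1, per_interval):
        chunk = readings[start:start + per_interval]
        total = sum(chunk)
        out.append(total if how == "sum" else total / per_interval)
    return out


# Hypothetical 20-second vehicle counts and speeds.
counts = [2] * 30 + [1] * 15 + [5] * 7        # last 7 readings are incomplete
speeds = [60.0] * 15 + [90.0] * 15
volume_5min = aggregate(counts, how="sum")
speed_5min = aggregate(speeds, how="mean")
print(volume_5min, speed_5min)
```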
Parameter optimization: Single kernel functions perform poorly when the samples are unevenly distributed in the high-dimensional space [42]. We therefore combine single kernel functions linearly to construct a combined kernel function, as shown in (1).
There is no need to prove the validity of the combined kernel function, because the kernel functions of RVM do not need to satisfy Mercer's theorem. The combined kernel function gives RVM the local learning ability of the Gaussian kernel and the generalization ability of the Polynomial kernel [43]. In (1), σ is the width of the combined kernel function, d determines the distribution of the data in high-dimensional space, and λ is the weight coefficient, with 0 ≤ λ ≤ 1.
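A linear combination of a Gaussian and a Polynomial kernel can be sketched as follows. The concrete forms used here, exp(-||x - y||^2 / (2σ^2)) for the Gaussian kernel and (x·y + 1)^d for the Polynomial kernel, and the choice of letting λ weight the Gaussian term, are assumptions made for illustration and may differ from the exact expression in (1).

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel with width sigma (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, d):
    """Polynomial kernel of degree d with unit offset (assumed form)."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** d


def combined_kernel(x, y, sigma, d, lam):
    """Convex combination of the two kernels, weighted by lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must satisfy 0 <= lam <= 1")
    return (lam * gaussian_kernel(x, y, sigma)
            + (1.0 - lam) * polynomial_kernel(x, y, d))


k = combined_kernel([0.2, 0.4], [0.1, 0.3], sigma=1.5, d=2, lam=0.7)
print(k)
```

Setting `lam` to 1 or 0 recovers the pure Gaussian or pure Polynomial kernel, which is why the three values σ, d, and λ can be tuned jointly.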
σ, d, and λ in (1) are also hyperparameters that need to be optimized. The purpose of the parameter optimization algorithm is to find the set $\{\sigma_{best}, \lambda_{best}, d_{best}\}$, which is input as the parameters of RVM to calculate the accuracy $p_{Accuracy}$ of the prediction model. The mathematical model of the parameter optimization algorithm is as follows

$$\max \ \mathrm{Accuracy} = \mathrm{RVM}(P) \tag{2}$$

where $2^{-\sigma_{\min}}$ and $2^{\sigma_{\max}}$ bound the variation range of $\sigma_{best}$. Generally, $\sigma_{\min} = \sigma_{\max} = 3$. $d_{\min}$ and $d_{\max}$ bound the variation range of $d_{best}$. Generally, $d_{\min} = 0$ and $d_{\max} = \infty$.
GA is one of the most commonly used parameter optimization algorithms for RVM [44]-[46]. Moreover, the optimization performance of GA is generally better than that of other common algorithms [47], [48]. Therefore, we use GA to solve the mathematical model shown in (2).
Spark and MapReduce: Currently, the most commonly used parallel computing frameworks are Apache MapReduce and Apache Spark. When executing tasks, MapReduce uses a multi-process model, while Spark uses a multi-thread model. Moreover, Spark is based on in-memory computing, which is more suitable for machine learning algorithms, iterative algorithms, and stream computing [49]. Since heuristic algorithms are iterative optimization algorithms, we choose Spark for parallelization.
Parallel design: Adding a parameter optimization algorithm to RVM will double the training time of RVM. To reduce the training time, we design a parallel training method for RVM based on parametric parallelism. The time-consumption analysis of GA can be found in our previously published paper [9]. It shows that fitness calculation consumes the most time and accounts for more than 90% of the total time. The overall parallelization process based on parametric parallelism is shown in Fig. 5.
The parallelization method designed in this paper mainly relies on Spark's core data abstraction, Resilient Distributed Datasets (RDD). In traditional parallel methods based on data parallelism, the training data is created as an RDD [50]-[52]. In the parallelization method based on parametric parallelism proposed in this paper, the initial population of GA is created as an RDD. This method is suitable for prediction problems on small and medium-sized data, such as time series prediction. The pseudocode of GA combined with the parallelization method designed in this paper is shown in Procedure 1.
Procedure 1 evaluates the initial population with a map function and then updates the population according to the evaluation results. Finally, Procedure 1 finds the index of the best individual that meets the conditions for terminating the iteration and decodes the individual into $\{\sigma_{best}, \lambda_{best}, d_{best}\}$.
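The idea of evaluating a GA population through a map step can be sketched in plain Python. This is a simplified illustration, not the authors' Procedure 1: the built-in `map` stands in for a Spark RDD map, `toy_fitness` is a hypothetical stand-in for training an RVM and returning its error, the upper bound on d and the crossover, mutation, and elitism details are assumptions, and the σ range assumes the interval from 2^-3 to 2^3. The population size, iteration count, and rates follow the values reported in the experiments.

```python
import random

# Search ranges; the cap on d and the sigma interval are assumptions.
BOUNDS = {"sigma": (2 ** -3, 2 ** 3), "lam": (0.0, 1.0), "d": (1, 5)}


def random_individual(rng):
    return (rng.uniform(*BOUNDS["sigma"]),
            rng.uniform(*BOUNDS["lam"]),
            rng.randint(*BOUNDS["d"]))


def toy_fitness(ind):
    """Hypothetical error surface standing in for the MSE of a trained RVM."""
    sigma, lam, d = ind
    return (sigma - 1.5) ** 2 + (lam - 0.7) ** 2 + (d - 2) ** 2


def evaluate(population, fitness):
    # Parametric parallelism: each individual is scored independently.
    # With PySpark this step would be roughly
    # sc.parallelize(population).map(fitness).collect().
    return list(map(fitness, population))


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def mutate(ind, rng, rate):
    # Each gene is resampled within its bounds with probability `rate`.
    fresh = random_individual(rng)
    return tuple(f if rng.random() < rate else g for g, f in zip(ind, fresh))


def run_ga(fitness, pop_size=20, max_iter=10,
           crossover_rate=0.6, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(max_iter):
        scores = evaluate(population, fitness)
        ranked = [ind for _, ind in
                  sorted(zip(scores, population), key=lambda p: p[0])]
        history.append(min(scores))
        parents = ranked[:pop_size // 2]
        children = [ranked[0]]  # elitism keeps the best individual
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng) if rng.random() < crossover_rate else a
            children.append(mutate(child, rng, mutation_rate))
        population = children
    scores = evaluate(population, fitness)
    best_index = min(range(pop_size), key=scores.__getitem__)
    return population[best_index], scores[best_index], history


best, best_score, history = run_ga(toy_fitness)
print(best, best_score)
```

Because only the small population is distributed while every worker holds the full training set, this style of parallelism fits small and medium-sized data, as noted above.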

2) CALCULATION OF TRAFFIC CONGESTION FEATURES
This section uses the prediction results of LSTM-SPRVM to calculate the future average speed, road occupancy, and traffic flow density. The specific calculation method is shown below.
Average speed: Average speed refers to the average distance traveled by all vehicles on the road in a unit time. Since one road may correspond to multiple lanes, the calculation method is as follows

$$\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i \tag{3}$$

where $v_i$ means the average speed corresponding to the ith lane and $N$ means the total number of lanes on the road. In general, the higher the average speed, the smoother the road; the lower the average speed, the more blocked the road.
Road occupancy: Road occupancy means the ratio of the actual traffic flow to the maximum capacity in a specific section. It reflects the actual load of the road. The calculation method is as follows

$$S = \frac{V}{C} \tag{4}$$

where $V$ means the current traffic volume and $C$ means the maximum capacity.
Traffic flow density: Traffic flow density means the number of vehicles on a particular length of road within a unit time. The calculation method is as follows

$$D = \frac{f}{\bar{v}} \tag{5}$$

where $f$ means the traffic volume and $\bar{v}$ means the average speed.
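The three features can be computed with a few lines of code. The sketch below assumes that the average speed is the arithmetic mean of the lane speeds, that road occupancy is the ratio V/C, and that traffic flow density is the traffic volume divided by the average speed. The function names, units, and sample values are hypothetical.

```python
def average_speed(lane_speeds):
    """Mean of the per-lane average speeds."""
    if not lane_speeds:
        raise ValueError("at least one lane is required")
    return sum(lane_speeds) / len(lane_speeds)


def road_occupancy(volume, capacity):
    """Ratio of the current traffic volume V to the maximum capacity C."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return volume / capacity


def traffic_flow_density(volume, avg_speed):
    """Traffic volume divided by the average speed."""
    if avg_speed <= 0:
        raise ValueError("average speed must be positive")
    return volume / avg_speed


# Hypothetical predicted values: speeds in km/h, volumes in vehicles per hour.
v_bar = average_speed([80.0, 90.0, 100.0])
S = road_occupancy(volume=900.0, capacity=1800.0)
D = traffic_flow_density(volume=900.0, avg_speed=v_bar)
print(v_bar, S, D)
```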

3) FUZZY COMPREHENSIVE EVALUATION
This section evaluates future traffic congestion using the entropy method and the Fuzzy Comprehensive Evaluation method. Following related work such as the ''Highway Capacity Manual,'' we divide traffic congestion into six levels in MF-TCPV: especially unblocked, unblocked, light congestion, moderate congestion, heavy congestion, and lock up. Suppose the prediction results of LSTM-SPRVM are $R = (R_1, R_2, \cdots, R_i)^T$. We have selected three traffic congestion features (i.e., average speed, road occupancy, and traffic flow density), so $i = 3$. The evaluation matrix in Fuzzy Comprehensive Evaluation is

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{16} \\ r_{21} & r_{22} & \cdots & r_{26} \\ r_{31} & r_{32} & \cdots & r_{36} \end{pmatrix}. \tag{6}$$
For each factor i, the membership degree $r_{ij}$ of the jth level is obtained through a membership function. We choose the trapezoidal membership function in this paper, which is shown in Fig. 6, where $\{u_1, u_2, \cdots, u_5\}$ is the threshold range of each factor indicator and $\{k_1, k_2, \cdots, k_{10}\}$ is the proximity threshold of each factor indicator. On the Y-axis, ''1'' means that the element belongs to this level and ''0'' means that it does not. For positive factors such as road occupancy and traffic flow density, the status of the function runs from ''especially unblocked'' to ''lock up'' (i.e., the lower the road occupancy or traffic flow density, the smoother the road). For negative factors such as average speed, the status of the function runs from ''lock up'' to ''especially unblocked'' (i.e., the greater the speed, the smoother the road). The value of $r_{ij}$ in (6) is calculated from this membership function.
We use the entropy method to calculate the weights of average speed, road occupancy, and traffic flow density, $W = \{w_1, w_2, w_3\}$. Because traffic flow differs significantly between peak hours and regular hours, we calculate the weights for peak hours and regular hours separately and denote them $W_{peak}$ and $W_{regular}$.
The Fuzzy Comprehensive Evaluation result is calculated with a composite operator as follows

$$B = W \circ R = (b_1, b_2, \ldots, b_6)$$

We use $M(\cdot, \oplus)$ as the fuzzy composition operator, which makes full use of the information in $R$. Based on the maximum membership principle, if $b_1 = \max(b_1, b_2, \ldots, b_6)$, the level of traffic congestion at that moment is especially unblocked.
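The evaluation pipeline can be sketched end to end under stated assumptions. The thresholds, transition half-widths, and weights below are hypothetical values chosen only for illustration; the entropy weighting follows the standard entropy weight method; and $M(\cdot, \oplus)$ is taken to be the weighted-average operator $b_j = \min(1, \sum_i w_i r_{ij})$. None of these details are confirmed by the text above.

```python
import math

INF = float("inf")


def trapezoid(x, a, b, c, d):
    """Membership that rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def membership_row(x, thresholds, half_width, larger_is_worse=True):
    """Memberships of x in six levels, ordered from level 1 to level 6."""
    edges = [(-INF, -INF)]
    edges += [(t - half_width, t + half_width) for t in thresholds]
    edges += [(INF, INF)]
    row = [trapezoid(x, *edges[j], *edges[j + 1]) for j in range(6)]
    return row if larger_is_worse else row[::-1]


def entropy_weights(samples):
    """Entropy-method weights from rows of positive indicator values."""
    n, m = len(samples), len(samples[0])
    divergence = []
    for j in range(m):
        column = [row[j] for row in samples]
        total = sum(column)
        probs = [v / total for v in column if v > 0]
        entropy = -sum(p * math.log(p) for p in probs) / math.log(n)
        divergence.append(max(0.0, 1.0 - entropy))
    norm = sum(divergence)  # assumes at least one indicator varies
    return [dv / norm for dv in divergence]


def evaluate(R, W):
    """Weighted-average composition followed by maximum membership."""
    B = [min(1.0, sum(w * row[j] for w, row in zip(W, R))) for j in range(6)]
    return B, B.index(max(B)) + 1


# Hypothetical thresholds and readings: speed (km/h), occupancy, density.
R = [
    membership_row(95.0, [20, 40, 60, 80, 100], 5, larger_is_worse=False),
    membership_row(0.3, [0.2, 0.4, 0.6, 0.8, 1.0], 0.05),
    membership_row(11.0, [10, 20, 30, 40, 50], 2),
]
W = [0.4, 0.3, 0.3]  # hypothetical; entropy_weights(history) could supply them
B, level = evaluate(R, W)
print(B, level)
```

Level 1 corresponds to especially unblocked and level 6 to lock up. Separate weight vectors for peak and regular hours could be obtained by calling `entropy_weights` on the corresponding subsets of historical data.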

C. DATA PRESENTATION LAYER

1) DATA PRESENTATION LAYER CACHE DATABASE
In MF-TCPV, the evaluation results described in Section 3.2.3 are uploaded to a data presentation layer cache database, which only saves the data that needs to be displayed. The advantage of this design is that it increases data security in the framework and raises the attack cost for potential attackers. If the data presentation layer is attacked, the raw data will not be leaked, and the attacker can only obtain the data in the data presentation layer cache database. The amount of data that needs to be displayed is much smaller than the amount of raw data. Therefore, the data presentation layer cache database does not need to be large.

2) VISUALIZE DATA BY DataV
In MF-TCPV, we choose DataV to visualize data. DataV is a Software-as-a-Service (SaaS) visual deployment tool that enables rapid construction and cross-platform publishing of interactive visualization on large high-resolution displays (LHDs), which consists of four parts, including data importing, visual components, editor toolchain, and application publishing [53].
DataV mainly targets visualization on LHDs. This is an advantage that other visualization components on the market, such as D3.js, Dygraphs.js, and ECharts, do not have.
Firstly, load the data presentation layer cache database into the data importing center of DataV, and then add visual components for visual mapping. Control components and design visual layout through editor toolchain. Finally, use the one-click SaaS publishing function provided by DataV to publish the visualization design to Wide Area Network.

IV. PERFORMANCE EVALUATION AND RESULT ANALYSIS

A. INTEGRATION OF ALL COMPONENTS IN MF-TCPV
We implement MF-TCPV on two servers with 6-core CPUs, 64 GB RAM, 2 TB disks, and CentOS 7.8. The specific information about the software we used is shown in table 2. DataX and Spark were deployed on Server 1. The data warehouse was deployed on Server 2. In LSTM-SPRVM, the activation function is ReLU, the batch size is 32, the number of epochs is 100, the population size is 20, the maximum number of iterations is 10, the mutation rate is 0.2, and the crossover rate is 0.6. We use the MSE of RVM as the fitness function of GA, which is calculated as follows

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(s_i - \hat{s}_i\right)^2$$

where $N$ is the number of samples, $s_i$ means the actual value, and $\hat{s}_i$ means the predictive value of LSTM-SPRVM. Before predicting, we apply min-max normalization to compress the experimental data into [0,1] by

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is a raw value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the data.
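The normalization and the fitness measure can be illustrated with a short sketch. It assumes min-max (extremum) normalization into [0,1] and the usual mean squared error over paired actual and predicted values; the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical traffic volumes and a hypothetical set of predictions.
scaled = min_max_normalize([10, 20, 30])
error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(scaled, error)
```

In a GA that minimizes this error, each individual's fitness would be the MSE of an RVM trained with that individual's parameters.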

B. EXPERIMENTAL RESULTS ANALYSIS
To verify its performance, the MF-TCPV framework used traffic flow data from the Whitemud Drive motorway in Canada, collected by the Intelligent Transportation Research Center at the University of Alberta [54]. The data was collected every 20 seconds. These raw data were stored in MySQL. We use the traffic volume data obtained from 6 August 2015 to 27 August 2015 as a training set and the traffic volume data obtained from 0:00 to 24:00 on 28 August 2015 as a testing set. The information about the traffic sections corresponding to the experimental data is shown in table 3. Because the data at station ID 1042 was collected from a Whitemud Drive ramp, it has high randomness and is difficult to fit.

1) FEATURE EXTRACTION
We should justify the choice of LSTM. In this experiment, we design a comparison between LSTM and other types of RNNs, such as GRU and Bi-LSTM. In the process of feature extraction, the change trends of their loss are shown in Fig. 7.
As shown in Fig. 7, in terms of performance, LSTM is better than GRU and slightly worse than Bi-LSTM. We further analyze their efficiency, and the results are shown in table 4. As table 4 shows, the efficiency of Bi-LSTM is significantly lower than that of LSTM. After comprehensive consideration, we choose LSTM for feature extraction.

2) KERNEL FUNCTION PERFORMANCE
In MF-TCPV, we provide six kernel functions for LSTM-SPRVM. The purpose of the kernel function performance experiment is to find the kernel function that is most suitable for the experimental data.
In terms of evaluation indicators, we use accuracy (i.e., 1-MAPE) to verify kernel function performance. In addition, the use of LSTM is treated as a variable factor in the experiment. The relevant experimental results are shown in table 5.
Among the compared models, SPGAPSO-CKRVM is the model we proposed before, which uses parallelized GA and PSO to optimize the combined kernel RVM [9]. LSTM-SVR represents a model that uses SVR to predict traffic flow after feature extraction by LSTM [32]. LSTM-SVR is the most commonly used deep model in recent years and the closest model to LSTM-SPRVM. Table 6 lists the MSE, RMSE, and MAPE of the different prediction models. As shown in table 6, LSTM has the worst performance. Other deep models based on RNN, such as CNN-LSTM, CNN-GRU, and CNN-Bi-LSTM, are significantly better than LSTM. LSTM-SVR is slightly better than the above models on some data sets. On all data sets, LSTM-SPRVM is superior to the other compared models. Compared with LSTM-SVR, the RMSE of LSTM-SPRVM is reduced by 2.14%, and the MAPE is reduced by 4.31%. Fig. 8, Fig. 9, and Fig. 10 show the prediction results of LSTM-SPRVM.
We also use LSTM-SPRVM to predict the traffic speed data in three sections. In this experiment, we use the traffic speed data obtained from 6 August 2015 to 27 August 2015 as a training set and the traffic speed data obtained from 0:00 to 24:00 on 28 August 2015 as a testing set. The speed prediction results are shown in table 7.
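The evaluation indicators can be written out as a short sketch. It assumes that MAPE is expressed as a fraction, that accuracy is 1 - MAPE as stated above, and that a reported reduction between two models is a relative reduction; the function names and sample values are hypothetical and do not reproduce any result from the tables.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    n = len(actual)
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n


def accuracy(actual, predicted):
    """Accuracy defined as 1 - MAPE."""
    return 1.0 - mape(actual, predicted)


def relative_reduction(baseline, improved):
    """Fractional reduction of an error measure relative to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical actual and predicted traffic volumes.
y_true = [100.0, 200.0]
y_pred = [90.0, 220.0]
print(rmse(y_true, y_pred), mape(y_true, y_pred), accuracy(y_true, y_pred))
```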
As shown in table 7, the comparison results for speed prediction are similar to those for traffic flow prediction. The result predicted by LSTM-SPRVM is the best. Compared with LSTM-SVR, the MSE of LSTM-SPRVM is reduced by an average of 9.37%, and the MAPE is reduced by 0.6%.
Next, we compare LSTM-SPRVM with papers that used similar data (i.e., the traffic flow data from Whitemud Drive) for experiments. The comparison results are shown in table 8.
In the above comparison experiment, to ensure the credibility of the results, we use the same training set and test set as those used in these papers. As shown in table 8, LSTM-SPRVM is better than the other models.
The scalability of the parallelized training method proposed in this paper has been demonstrated in our previous work, so LSTM-SPRVM has good scalability. See [9] for details.

4) TRAFFIC CONGESTION PREDICTION RESULTS USING LSTM-SPRVM
According to the ''Highway Capacity Manual,'' the maximum capacity C of the road in Whitemud Drive is about 180pch per lane. We calculate the future road occupancy S and traffic flow density D of the ID1027 section by (4) and (5). The congestion evaluation results are shown in Fig. 11. The ''1,2,3,4,5,6'' on the y-axis in Fig. 11 correspond to especially unblocked, unblocked, light congestion, moderate congestion, heavy congestion, and lock up road conditions, respectively. As shown in Fig. 11, the ID1027 section is more congested during the morning peak than during the evening peak. There was even a lock up road condition in the morning peak. The prediction is wrong at 22 of the 288 five-minute moments of the day, so the prediction accuracy is 92.36%. This also illustrates the rationality of evaluating traffic congestion with the Fuzzy Comprehensive Evaluation method.

C. VISUALIZATION OF TRAFFIC CONGESTION PREDICTION
The visualization of traffic congestion prediction results is implemented with DataV. The traffic congestion is described by a heat map. We use the data from stations ID1008, ID1017, ID1037, ID1034, ID1033, ID1031, ID1029, and ID1019 for traffic congestion prediction. The settings of the training set and test set are the same as in the above experiment. The ''1,2,3,4,5,6'' in the heat map correspond to especially unblocked, unblocked, light congestion, moderate congestion, heavy congestion, and lock up road conditions, respectively. The visualization result at 17:00 on August 28, 2015 is shown in Fig. 12. As shown in Fig. 12, we divide the road into multiple road sections according to the positions of the surveillance cameras. The visualized results are intuitive, show the congestion situation well, and conform to the actual condition.

V. CONCLUSION AND FURTHER RESEARCH
In this paper, we constructed a framework for traffic congestion prediction and visualization called MF-TCPV, which consists of a raw data layer, a data processing layer, and a data presentation layer. A deep model for the prediction of traffic congestion features called LSTM-SPRVM was proposed. The real data of Whitemud Drive in Canada were used to verify the framework. Experimental results show that LSTM-SPRVM is superior to other methods in accuracy. The prediction accuracy of future traffic congestion can reach 92.36%, and the visualization results of MF-TCPV are intuitive. Further research will include:
• Consider more factors affecting traffic flow, such as traffic speed and volume on the adjacent roads. We plan to use a graph neural network to solve this problem.
• We used the original GA in this paper. For parameter optimization of RVM, most heuristic algorithms have the problem that the convergence speed is too fast in the early period of iterations, which leads to a decrease of the population diversity in the later period of iterations. We plan to combine RVM with complex heuristic algorithms, such as Adaptive Genetic Algorithm (AGA) to solve the problem.
• We will try to combine the science of cause and effect to explore the causes of congestion [59]. Construct a traffic congestion cause-effect diagram and transform a data-driven model into a cause-effect driven model.