Combined Prediction of Short-Term Travel Time of Expressway Based on CEEMDAN Decomposition

Travel time is the basis for intelligent emergency control and guidance in expressway networks. To achieve accurate prediction and improve the expressway service level during emergencies, this study uses a combined model to predict the short-term travel time of expressway sections based on expressway gantry data from Sichuan Province. First, the travel time series was extracted using a data matching algorithm, and the double standard deviation-cyclic elimination (2SD-CE) algorithm was used to clean the data. Then, the travel time subsequences were extracted with the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm, and the subsequences were divided by frequency using the sample entropy (SampEn) algorithm. On this basis, bidirectional long short-term memory (BiLSTM), long short-term memory (LSTM), and vanilla recurrent neural network (vanilla RNN) models were used to construct combined prediction model 1 (CM1) under the single-feature condition. Subsequently, the CEEMDAN and empirical mode decomposition (EMD) algorithms were combined with the LSTM algorithm to obtain combined models without frequency division (CM2 and CM3). Example calculation and analysis show that, under different time granularities (5 min, 10 min, and 15 min) and different highway sections, the combined model integrates the advantages of all prediction models and has higher prediction accuracy and stability; in particular, CM1 can reduce the root mean squared error (RMSE) by 18.8~26.4%, 0.8~41%, and 4.1~13.3%.

gency rescue. Therefore, there is an urgent need to improve the intelligent control and guidance ability of expressway networks, strengthen the close cooperation between people, vehicles, and roads, improve road traffic efficiency, and create an efficient, accurate, and real-time expressway operation system [1]. In recent years, with the continuous development of big data and artificial intelligence technology, the traffic data collected by sensors have gradually improved, providing a data basis for the construction of machine learning models. As a new research direction in machine learning, deep learning has been increasingly applied in the field of traffic prediction [2], [3], [4], [5]. Deep learning can not only learn internal laws and high-order representations from massive traffic data but also offers an end-to-end learning method that is suitable for short-term travel time prediction problems with high nonlinearity.

[...] prediction models [13]. Ahmed and Cook [14] applied the ARIMA model to traffic flow prediction for the first time. Li et al. [15] predicted traffic flow based on an improved ARIMA model. These time series models are simple to build, but they have high requirements for data continuity, and it is difficult for them to handle complex prediction problems with multi-dimensional inputs.

Nonlinear theoretical models include nonparametric regression [16] and chaos theory [17]. The k-nearest neighbor (KNN) algorithm is a typical nonparametric regression model. On the one hand, the algorithm is simple and easy to understand; on the other hand, it relies heavily on the training data and has poor fault tolerance to errors in the training data.
Disbro and Frame [18] introduced chaos theory into the field of transportation for the first time. Wang and Shi [19] built a nonlinear chaos prediction model based on phase-space reconstruction theory to predict urban road traffic flow. The chaos theory model obtains the chaotic characteristic parameters of the system from measured data, which avoids the influence of subjective factors and achieves high prediction accuracy, but it is only suitable for short-term traffic flow prediction.

The second category is artificial intelligence (AI) technology based on neural networks. In recent years, with the rapid development of artificial intelligence, deep learning theory and neural network models have provided additional modeling ideas for the study of traffic flow prediction. Wu et al. [20] applied SVR to travel time prediction and compared it with the historical mean and other methods; the results showed that the model could significantly reduce the prediction error. Su et al. [21] proposed a short-term traffic flow prediction method based on incremental support vector regression (ISVR), and the results showed that its prediction accuracy was better than that of the BP neural network model. Luo et al. [22] used the least squares support vector machine method to predict traffic flow and adopted a fusion optimization algorithm to select the optimal parameters, which improved the prediction ability and computational efficiency of the model.
Although these traditional neural networks can learn the characteristics of traffic flow and predict future traffic flow according to its temporal and spatial variation characteristics, most of them use single-hidden-layer networks, which cannot learn the deeper variation characteristics of traffic flow data, and their prediction accuracy is often lower than that of deep network prediction methods.

Deep learning has a strong learning ability for time series and can better handle spatially or temporally related data structures [23]. The depth of a recurrent neural network (RNN) is reflected not only in its multiple hidden layer structures but also in its time memory function. RNNs can be used for the recognition of text, speech, and other data sequences, and are well suited to prediction tasks on time series data [24]. However, serious gradient vanishing and gradient explosion problems exist in RNNs. Subsequently, Hochreiter and Schmidhuber [25] designed the long short-term memory (LSTM) unit to overcome this defect, which enabled recurrent neural networks, represented by LSTM, to be applied on a large scale in the field of time-series prediction. Ma et al. [26] used the LSTM structure to establish a traffic speed prediction model and verified it with microwave traffic speed data from Beijing. The experimental results showed that the network effectively captured the temporal correlation and nonlinearity of the traffic state, and its prediction accuracy was better than that of most statistical methods.

In conclusion, the deep learning model has a better prediction effect than traditional traffic prediction methods. Therefore, in the face of a large amount of diversified traffic data, selecting an appropriate model, or combining models with different structures to realize their complementary advantages, extracting significant traffic features, and setting appropriate parameters to further improve accuracy is the future development direction of deep learning in the field of expressway short-term travel time prediction. The main contributions of this study are as follows.

(1) A short-term travel-time prediction model based on

[...]ing that EMD can achieve improved prediction accuracy. Du et al. [30] proposed a prediction model based on empirical mode decomposition (EMD) and a gated recurrent unit (GRU) neural network for a more comprehensive characterization of network traffic: EMD decomposes the traffic data into multiple components, each component is used to train a corresponding GRU neural network, and finally the predicted values of all components are combined to obtain the final result. Although the EMD algorithm can effectively cope with nonlinear sequences, mode aliasing occurs during the decomposition process, which is a limitation of the EMD algorithm. EEMD [31] solved the problem of mode mixing by adding white noise. Tang et al. [32] compared five denoising schemes and showed that EEMD is superior to the other algorithms. Liu et al. [33] used the EEMD algorithm to decompose the time series, extracted the basic feature subset of each component using the minimum redundancy maximum relevance feature selection algorithm, trained a deep belief network on each component, and finally aggregated the prediction results into the output of the integrated model; the results show that the method achieves a significant performance improvement over a single deep belief network and the other selected methods. On this basis, Torres et al. [34] proposed the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm [28], which adds adaptive white noise to each decomposition in order to improve the completeness of EEMD and reduce reconstruction errors. Its basic principle is to adaptively add white noise during each stage of the decomposition.

The steps of using the CEEMDAN algorithm to decompose the travel time series are as follows.

Step 1: Add a series of adaptive white noise signals to the resampled travel time series:
$$T_i(t) = T(t) + \phi_0 \theta_i(t), \quad i = 1, 2, \cdots, I$$
where $T(t)$ is the travel time series, $\theta_i(t)$ is the $i$-th white noise realization, $\phi_0$ is the noise coefficient, and $I$ is the ensemble size.

Step 2: Combined with the EMD algorithm, the first travel time series component is obtained by averaging the first IMF of each noisy copy:
$$d_1(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\left(T_i(t)\right)$$

Step 3: The first margin sequence can be obtained by removing $d_1(t)$ from $T(t)$:
$$r_1(t) = T(t) - d_1(t)$$
where $r_1(t)$ represents the first margin sequence.

Step 4: Continue to decompose $r_1(t) + \phi_1 E_1\left(\theta_i(t)\right)$ and obtain the second travel time series component:
$$d_2(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\left(r_1(t) + \phi_1 E_1\left(\theta_i(t)\right)\right)$$
where $E_j(\cdot)$ is the $j$-th IMF component obtained by EMD decomposition, and $d_2(t)$ represents the second travel time series component.

Step 5: Calculate the remaining IMF components:
$$r_k(t) = r_{k-1}(t) - d_k(t)$$
$$d_{k+1}(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\left(r_k(t) + \phi_k E_k\left(\theta_i(t)\right)\right), \quad k = 2, \cdots, K$$
where $K$ is the total number of modes, $r_k(t)$ represents the $k$-th margin sequence, $d_{k+1}(t)$ represents the $(k+1)$-th travel time series component, and $\phi_k$ is the noise coefficient.

Step 6: When the travel time margin cannot be further decomposed, the final margin is calculated as follows:
$$R(t) = T(t) - \sum_{k=1}^{K} d_k(t)$$
where $R(t)$ represents the margin sequence.
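As a rough illustration of Steps 1-6, the sketch below implements the noise-assisted recursion in NumPy. The true EMD sifting operator $E_1(\cdot)$ is replaced by a hypothetical moving-average detail extractor (a real implementation would use an EMD library such as PyEMD), and the per-mode noise term is simplified to scaled raw white noise; only the structure of the recursion is faithful to the steps above.

```python
import numpy as np

def extract_detail(signal, width=5):
    """Stand-in for the EMD operator E_1(.): the signal minus a
    moving-average trend, i.e. its highest-frequency 'detail'.
    Real CEEMDAN would use EMD sifting here instead."""
    kernel = np.ones(width) / width
    trend = np.convolve(signal, kernel, mode="same")
    return signal - trend

def ceemdan_sketch(T, num_modes=3, noise_coeff=0.05, ensemble=50, seed=0):
    """Sketch of the CEEMDAN recursion: each component d_k is the
    ensemble mean of E_1 applied to the current margin plus scaled
    white noise; the margin is updated by subtraction."""
    rng = np.random.default_rng(seed)
    noises = rng.standard_normal((ensemble, len(T)))
    components = []
    margin = T.copy()
    for _ in range(num_modes):
        # Steps 1/4/5: add scaled white noise to the current margin,
        # extract the first 'IMF' of each noisy copy, then average.
        d_k = np.mean(
            [extract_detail(margin + noise_coeff * w) for w in noises], axis=0
        )
        components.append(d_k)
        # Steps 3/5: subtract the component to obtain the next margin.
        margin = margin - d_k
    # Step 6: the final margin R(t) is whatever remains.
    return components, margin

t = np.linspace(0, 1, 200)
series = np.sin(2 * np.pi * 5 * t) + 0.5 * t
imfs, residue = ceemdan_sketch(series)
```

By construction the components and the margin sum back to the input exactly, mirroring the completeness property that distinguishes CEEMDAN from EEMD.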

SampEn measures the complexity of a time series well. Its calculation does not depend on the length of the travel-time subsequence, and it has excellent consistency. The smaller the sample entropy, the higher the self-similarity of the time series; conversely, the larger the sample entropy, the greater the nonlinearity of the travel time series. Assuming that the travel time sequence $\{x(n)\} = x(1), x(2), \cdots, x(N)$ consists of $N$ sample time points, the calculation steps of the entropy are as follows.

Step 1: Convert the travel time series into a vector series with dimension $m$, $X_m(1), \cdots, X_m(N-m+1)$, where
$$X_m(i) = \{x(i), x(i+1), \cdots, x(i+m-1)\}, \quad 1 \le i \le N-m+1$$
These vectors represent $m$ consecutive values starting from the $i$-th time point.

Step 2: $d[X_m(i), X_m(j)]$ is the absolute value of the maximum difference between the corresponding elements of $X_m(i)$ and $X_m(j)$, which is calculated as follows:
$$d[X_m(i), X_m(j)] = \max_{k=0,\cdots,m-1} \left|x(i+k) - x(j+k)\right|$$

Step 3: For a given $X_m(i)$, count the number of $X_m(j)$ ($j \ne i$) whose distance from $X_m(i)$ is less than or equal to $r$, and record it as $B_i$. For $1 \le i \le N-m$, define:
$$B_i^m(r) = \frac{B_i}{N-m-1}$$

Step 4: Define $B^m(r)$ as:
$$B^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m} B_i^m(r)$$

Step 5: Increase the dimension to $m+1$, count the number of $X_{m+1}(j)$ whose distance from $X_{m+1}(i)$ is less than or equal to $r$, and record it as $A_i$. Define $A_i^m(r)$ as:
$$A_i^m(r) = \frac{A_i}{N-m-1}$$

Step 6: Define $A^m(r)$ as:
$$A^m(r) = \frac{1}{N-m}\sum_{i=1}^{N-m} A_i^m(r)$$

The sample entropy is then estimated as
$$\text{SampEn}(m, r, N) = -\ln\frac{A^m(r)}{B^m(r)}$$
where $N$ is the number of data points in the travel time series. A threshold on the sample entropy is set to divide the subsequences by frequency: according to their randomness from high to low, they are divided into high-, medium-, and low-frequency subsequences.

The calculation of each LSTM layer can be explained using the following formulas:
$$f_t = \sigma\left(W_f\left[h_{t-1}, x(t)\right] + b_f\right)$$
$$i_t = \sigma\left(W_i\left[h_{t-1}, x(t)\right] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C\left[h_{t-1}, x(t)\right] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$o_t = \sigma\left(W_o\left[h_{t-1}, x(t)\right] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$
where $x(t)$ is the input signal and $\sigma(\cdot)$ is the activation function.
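Steps 1-6 of the sample entropy calculation can be condensed into a short NumPy function. This is a plain illustrative implementation (using the common convention $r = 0.2\,\mathrm{std}$ when no tolerance is given), not the authors' code.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn(m, r, N): count pairs of m- and (m+1)-dimensional
    template vectors whose Chebyshev distance is <= r, then take
    -ln(A / B)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # common convention for the tolerance

    def count_matches(dim):
        # Step 1: build template vectors of length `dim`, keeping the
        # first N - m so that both dimensions use the same count.
        templates = np.array([x[i:i + dim] for i in range(N - m)])
        total = 0
        for i in range(len(templates)):
            # Step 2: Chebyshev (max-abs) distance to every template.
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            # Steps 3/5: count matches within r, excluding self-match.
            total += np.sum(dist <= r) - 1
        return total

    B = count_matches(m)       # Steps 3-4: matches at dimension m
    A = count_matches(m + 1)   # Steps 5-6: matches at dimension m + 1
    return -np.log(A / B)
```

For a strictly periodic series the entropy is near zero, while a random series yields a clearly larger value; this is the property used to divide the travel time subsequences by frequency.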

In the process of gantry data acquisition and transmission, data errors can easily occur owing to problems with the system itself or other factors. When extracting travel time from gantry data, obviously erroneous records are first eliminated through simple screening rules, such as duplicate data records or negative travel times. The screening mainly includes the following rules.

Rule 1: Exclude data whose vehicle model field is 0.

[...]

Step 2: Calculate the mean $\mu$ and standard deviation $\sigma$ of the travel time samples in each statistical time window.

Step 3: Judge whether any data fall outside the range $[\mu - 2\sigma, \mu + 2\sigma]$; if so, eliminate them and return to Step 2; otherwise, go to Step 4.

Step 4: Retain the travel time samples after the final screening.

Taking a section of the Yalu expressway as an example, the scatter diagram of the travel time distribution after eliminating obviously wrong data can be obtained according to the above cleaning method, as shown in Figure 5(b). It can be seen from the figure that, although the travel time shows a certain trend after the initial cleaning of the obviously wrong records, there are still many noise points; that is, there are still some data with large travel times during off-peak periods. If an appropriate outlier elimination method is not adopted, this noise is retained to a great extent. Therefore, the 2SD-CE algorithm was applied to address the outliers of the travel time.
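A minimal sketch of the cleaning idea for one statistical time window might look as follows. This is a hypothetical stand-alone routine, not the authors' implementation; the pre-screening step drops negative travel times as described above, and the loop repeats the two-standard-deviation elimination until no outliers remain.

```python
import numpy as np

def clean_2sd_ce(samples, n_sd=2.0):
    """Sketch of the double standard deviation-cyclic elimination
    (2SD-CE) idea for one time window: repeatedly drop samples
    outside [mu - 2*sigma, mu + 2*sigma] until none remain."""
    data = np.asarray(samples, dtype=float)
    # Pre-screening: drop obviously erroneous records (negative times).
    data = data[data > 0]
    while True:
        mu, sigma = data.mean(), data.std()
        mask = np.abs(data - mu) <= n_sd * sigma
        if mask.all():          # no outliers left: stop cycling
            return data
        data = data[mask]       # eliminate and re-check (Steps 2-3)

# Travel times (seconds) with one gross outlier and one invalid record.
window = [118, 120, 122, 119, 121, 117, 123, 900, -5]
cleaned = clean_2sd_ce(window)
```

Note that the cyclic re-computation of $\mu$ and $\sigma$ matters: a gross outlier inflates the standard deviation on the first pass, so a single pass can leave moderate outliers undetected.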

The cleaned data were resampled to a time granularity of 5 min, leaving few missing data points, which were filled using the historical average method. Because the travel time series generally fluctuates significantly, the MinMaxScaler in Scikit-learn was used to normalize the data to the range [0, 1].

A historical travel time series based on expressway gantry data can be used to predict future travel time after preprocessing. However, to ensure that the prediction results meet the needs of expressway intelligent control and guidance, it is often necessary to measure whether the model meets certain requirements, such as accuracy, efficiency, and portability. Therefore, to measure the performance of the different models, the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used as evaluation indices; the smaller their values, the better the prediction performance of the model. With the observed value $y_i$ and the predicted value $\hat{y}_i$, $i \in (1, \ldots, n)$, the evaluation indices are defined as follows:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

The experiment was implemented in the Python IDE PyCharm, and the main software environments were Keras Version 2.4.3 and TensorFlow Version 2.3.0. The hardware environment was a Lenovo Y7000 personal computer (Intel Core i7-9750H CPU @ 2.60 GHz, NVIDIA GeForce GTX 1660 Ti graphics card, 16 GB RAM, 512 GB hard disk). When setting the model training parameters, the ratio of the training set to the test set was 9:1.
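The three evaluation indices can be computed directly from the definitions above. The sketch below is illustrative only, with hypothetical observed and predicted travel times; MAPE is reported in percent.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the three evaluation indices used in the study:
    RMSE, MAE, and MAPE (in percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))          # root mean square error
    mae = np.mean(np.abs(err))                 # mean absolute error
    mape = 100.0 * np.mean(np.abs(err) / y_true)  # percentage error
    return rmse, mae, mape

# Hypothetical travel times in seconds.
observed = [120.0, 130.0, 125.0, 140.0]
predicted = [118.0, 133.0, 124.0, 138.0]
rmse, mae, mape = evaluate(observed, predicted)
```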

The grid search method with internal nested cross-validation was used to optimize the network parameters. The steps of this method are as follows: first, the pre-selected values of the various parameters, such as the number of neurons in the hidden layer, the learning rate, and the batch size, are listed in a dictionary, and the ''grid'' is generated exhaustively from all combinations of these parameters; the combinations are then fed to the model in batches for prediction performance evaluation, where in each batch the cross-validation method is used to fully evaluate the model performance of a single parameter group; finally, the optimal parameter combination is selected through comparison (Table 3).

To evaluate the prediction performance and stability of CM1, two combined models of convolutional neural network (CNN) and LSTM were selected as controls: CNN-LSTM and Convolutional LSTM (ConvLSTM). The specific parameters of the various control models are set as follows:

(1) LSTM: The number of neurons was set to 48, the learning rate to 0.001, the batch size to 64, and the number of iterations to 30.

(2) CNN-LSTM: The convolutional part of the model processes the data, and the processed one-dimensional array is input into the LSTM model. In the parameter setting, the number of filters was 64, and the size of the convolution kernel was 1 × 2. The activation function is the rectified linear unit (ReLU).

[...]

It can be seen from Table 4 that, compared with the combined prediction mode of CNN and LSTM, the combined prediction mode of step-by-step prediction using the time-series decomposition algorithm has higher prediction accuracy. In addition, the advantage of CM1 over CM2 is that it uses the BiLSTM model to predict the high-frequency travel time subsequences; compared with the LSTM model, the BiLSTM model can better predict high-frequency subsequences.
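The grid search with internal cross-validation used for parameter tuning can be sketched as follows. To stay self-contained, a simple moving-average forecaster stands in for the neural models, the folds are plain contiguous splits rather than the exact scheme used in the study, and `window` is a hypothetical tuned parameter playing the role of the neuron count, learning rate, or batch size.

```python
import itertools
import numpy as np

def moving_average_forecast(series, window):
    """Toy one-step-ahead predictor used as a stand-in for the LSTM:
    predict each point as the mean of the previous `window` points."""
    preds = [np.mean(series[t - window:t]) for t in range(window, len(series))]
    return np.array(preds), np.array(series[window:])

def grid_search_cv(series, param_grid, n_folds=3):
    """Sketch of grid search with cross-validation: enumerate every
    parameter combination from the dictionary, score each on several
    validation folds, keep the combination with the lowest mean RMSE."""
    keys = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        fold_scores = []
        for fold in np.array_split(np.arange(len(series)), n_folds):
            segment = series[fold[0]:fold[-1] + 1]
            preds, truth = moving_average_forecast(segment, params["window"])
            fold_scores.append(np.sqrt(np.mean((preds - truth) ** 2)))
        score = np.mean(fold_scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Synthetic "travel time" series with a 12-step period plus mild noise.
rng = np.random.default_rng(0)
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12) + 0.05 * rng.standard_normal(120)
grid = {"window": [1, 2, 6, 12]}
best, score = grid_search_cv(series, grid)
```

Enumerating the full Cartesian product is affordable only when the grid is small, which is why the pre-selected candidate values are listed explicitly in the dictionary rather than searched continuously.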

E. PREDICTION EFFECT UNDER DIFFERENT EXPRESSWAYS
To better analyze the applicability of the model to different expressway datasets, the travel time on September 30 was selected as the prediction object, the time granularity was set to 5 min, and the traffic volume of each section of the Yalu, Chengnan, and Chengya expressways was counted. According to the statistical results, sections with large and small traffic flows on the Yalu expressway were determined, and the CM1 model was used to predict the travel time of these sections. The prediction effect of each model in the section with a small traffic flow was significantly better than that in the section with a large traffic flow. This is because vehicle speed changes more frequently in the section with a large traffic flow; it is sometimes difficult to capture slow driving during peak hours, resulting in differences in the travel time prediction results.

Expressway travel time data series have significant nonlinear and nonstationary characteristics, and it is difficult for a single prediction model to meet the increasing demand for prediction accuracy. This study combines the CEEMDAN algorithm with RNNs to build a prediction model, which is verified under different time granularities and on different expressways. The main conclusions are as follows. (1) The CM1 combined prediction model proposed in this paper achieves high accuracy for travel time prediction at different time granularities and on different expressways, and the model has a certain degree of generalization and robustness. (2) At time granularities of 5 min, 10 min, and 15 min, the prediction performance of CM1, CM2, and CM3 was better than that of LSTM, CNN-LSTM, and ConvLSTM, which indicates that the combined prediction method is better than a single model.