Copula-Based Travel Time Distribution Estimation Considering Channelization Section Spillover

Travel time in urban networks is inherently uncertain due to volatile traffic flows, signal control, bus stops and pedestrian disturbances. An effective way to characterize this uncertainty is to estimate the Travel Time Distribution (TTD). However, conventional TTD models do not consider the interactions between turning movements within a link, even though Channelization Section Spillover (CSS) is very common and leads to strong interactions between turning movements. In this study, copula-based link-level and path-level TTD models are built by incorporating the correlation of turning movements, that is, by considering CSS. First, based on empirical data, the correlations of turning movements are analyzed and the applicability of various copula models is examined. The marginal distribution of each turning movement is described using parametric and non-parametric estimation methods, respectively. Then, the best-fitting copula is determined based on correlation parameters and goodness-of-fit tests. In a case study, the chosen model is applied to estimate link-level and path-level TTD on an arterial road in Hangzhou, China, and is compared with a previous model that does not consider CSS. Both results indicate that the copula-based approach can capture the positive correlation between turning movements during peak hours. Furthermore, the higher TTD estimation accuracy demonstrates the importance of considering CSS, particularly in peak hours.


I. INTRODUCTION
Travel Time Distribution (TTD) is a fundamental component of travel time reliability and variability estimation, which is essential for route guidance [1] and advanced traveler information systems in urban traffic [2]. TTD is also useful for predicting travel times on highways [3], evaluating traffic control policies [4] and providing detailed model support for simulation software. Moreover, TTD enables the detection of congestion and incidents and the assessment of the intersection level of service, which is crucial to traffic operators and planners [5].
TTD estimation is challenging because of the complexity and interrupted nature of urban traffic flows [6], which are usually caused by signal control, bus stops, roadside parking, pedestrian disturbances and vehicle interactions. Under such complicated interactions, arterial travel times often exhibit distributions of distinct shapes. Thus, most studies have focused on estimating the average travel time.

The associate editor coordinating the review of this manuscript and approving it for publication was Dalin Zhang.
We found no TTD model that considers the correlation of turning movements within a link. Moreover, little work has been done to quantify the variability of link-level or path-level TTD and to explore the interdependence between link TTDs along arterials. Related work on link and path TTD is reviewed below.
Link-level TTD can capture the nature of interrupted flow under traffic signal control and has wide practical applications. Initially, based on Automatic Number Plate Recognition (ANPR) data, link TTD was obtained directly and estimated by assuming unimodal distributions, such as the Weibull, exponential, lognormal [7] or normal distributions. For instance, Hunter proposed a probabilistic model and employed an EM algorithm to simultaneously learn the likely paths and the parameters of lognormal link travel time distributions from travel time observations [8]. Later, multimodal TTD models based on GPS data were presented. Ji and Zhang used high-resolution bus probe data to estimate link travel times and revealed bimodal TTDs at the segment level.
To evaluate such bimodal TTDs, they then adopted a hierarchical Bayesian mixture model [5]. Ma et al. modeled link-level bus travel time reliability on different types of roads and identified the causes of service unreliability [9]. They further used a Gaussian Mixture Model (GMM) to fit the link-level TTD and demonstrated the superiority of GMM in supporting service reliability analysis [10]. Using floating car data, Yun and Qin developed a framework for determining the minimum sample size of floating cars needed to estimate TTD when link travel times may follow multistate distributions [11]. Hans and Chiabaut proposed an aggregated diagram that provides a direct assessment of vehicle travel times with respect to their departure times and traffic flow [12]. Another line of research focuses on delay distributions at signalized intersections. Zheng and Van investigated urban link-level TTD and derived the delay distribution to quantify delay uncertainty [13].
For path-level TTD analysis, one common approach is to fit statistical models to real travel time observations. A growing number of researchers use multimodal or mixture models to characterize path-level TTD [14]. However, most studies assume that individual link-level TTDs are independent [15], which is often inappropriate. Geroliminis and Skabardonis computed the path-level travel time variance under the assumption of linear correlations between consecutive link travel times [16]. Extending this work, they used Markov chains to estimate path-level TTD by integrating correlated link TTDs; the sequence of pairwise Markov matrices was calculated by assuming that the transitions between different link pairs are conditionally independent [17]. Chen et al. introduced a copula-based approach to model arterial TTD considering spatial link correlations [18]. Considering both temporal and spatial correlation, Ma et al. further developed a heuristic clustering method based on link TTD and traffic conditions to obtain trip TTDs [19]. Recently, Zhang et al. developed a deep learning method to estimate trip TTDs in an urban road network; by modeling the joint distribution of the travel times of link pairs with spatiotemporal correlations taken into account, their DCWD algorithm successfully characterized traffic state transitions and improved the accuracy of the trip TTD model [20]. Beyond TTD models, other traffic variables used in transportation system simulation, such as travel demand, have also been shown to exhibit dependence. Ng et al. proposed a correlation-based approach that can represent a wider range of dependence structures than the multivariate normal distribution [21]. Zeng et al. developed a reliable path-finding method that considers the correlation between links and verified it on a real-world network in Tokyo, Japan; the results showed that it validly characterizes link and path TTD [22].
In summary, many efforts have been made to characterize link and path TTD. However, little research has paid attention to the interaction between turning movements within a link, which is not negligible. The most common manifestation of this interaction is Channelization Section Spillover (CSS), as illustrated in Fig. 1. Urban arterials customarily incorporate a channelization design, in which the three turning movements are mixed in the upstream section and separated in the channelization section. As a result, the three turning movements can exhibit strong nonlinear dependence. As illustrated in Fig. 1, during peak hours, left-turn vehicles overflow the channelization section and block the path of through vehicles. It is therefore necessary to take this interplay into account in travel time modeling.
Since vehicles' turning-movement data can be obtained from roadside detectors, the correlation among turning movements within a link can be studied. Even so, this dependence structure has not been explicitly addressed so far. To address this gap, we introduce copula models for link-level and path-level TTD. Compared with previous path-level TTD studies, one distinct improvement of our work is that it is the first to consider both the CSS within a link and the correlation among links. Meanwhile, because a copula separates the dependence structure from the marginal distributions, the dependence structure is unaffected by the types of marginal distributions, which gives greater flexibility in correlating individual movement TTDs. The case studies show that these efforts yield more accurate link-level and path-level TTD estimates on urban arterials. The inputs to the proposed copula model are the travel times of the different turning movements, and the output is the TTD over the study periods.
This paper is organized as follows. Section II presents the concept of copulas and compares various types of copulas. Section III elaborates the method of modeling TTD, including estimating the marginal distributions of travel times and identifying the best-fitting copula. In Section IV, we conduct two case studies on an arterial road in Hangzhou, China, to illustrate the application of the proposed method. In Case one, we statistically examine the correlations between left-turn and through travel times and then estimate link-level TTD with copula models. In Case two, we model the path-level TTD by taking into account the CSS in the downstream link and the correlation between the upstream and downstream links. The results are compared with empirical bimodal distributions and with previous research that did not consider CSS [18]. The last section draws conclusions and provides recommendations for future work.

II. BASIC PROPERTIES OF COPULA
To incorporate the impact of CSS in estimating TTD, we introduce the copula approach, which is widely used to model the joint distribution of multiple random variables with given marginal distributions [23]. A copula is a multivariate joint distribution function that describes the dependence among several random variables separately from their marginal distributions, and different types of marginal distributions are allowed. Specifically, by selecting appropriate marginal distributions from the parametric families of univariate distributions and choosing the best-fitting copula, we can characterize TTD with a copula-based model that accounts for CSS and link correlations. The variables in the copula-based model represent the travel times of different turning movements, while the model parameters reflect the correlations between links and the influence of CSS. The fundamental theorem on copulas is given below.
A. CONCEPT OF COPULA
Sklar first proposed the term copula, meaning to connect or to join [23]. The primary purpose of copulas is to describe the joint behavior of several random variables. Over the last decades, copulas have been applied widely in fields such as quantitative finance, economics and hydrology. More recently, they have been introduced into the traffic field. Bhat and Eluru first proposed a copula-based approach to model the effects of the built environment on travel behavior [24]. Wan and Kornhauser modeled the dependence structure of a sequence of link travel time distributions with a lagged Gaussian copula to approximate the path travel time distribution [25]. Lately, Chen applied a copula-based approach to characterize the travel time reliability of urban arterials [26].
The copula is a multivariate joint distribution function whose marginal distributions are all uniform on [0, 1]. Sklar's theorem states that any n-variate joint distribution $F(x_1, x_2, \ldots, x_n)$ can be written in terms of the univariate marginal distribution functions $F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$ and a copula function $C$:

$$F(x_1, x_2, \ldots, x_n) = C\big(F_1(x_1), F_2(x_2), \ldots, F_n(x_n)\big)$$

Equivalently, in terms of the joint probability density function (PDF) $f(\cdot)$:

$$f(x_1, x_2, \ldots, x_n) = c\big(F_1(x_1), F_2(x_2), \ldots, F_n(x_n)\big) \prod_{i=1}^{n} f_i(x_i)$$

where $f_i(\cdot)$ is the marginal PDF of $X_i$ and $c(\cdot)$ is the copula density function:

$$c(u_1, u_2, \ldots, u_n) = \frac{\partial^n C(u_1, u_2, \ldots, u_n)}{\partial u_1 \, \partial u_2 \cdots \partial u_n}$$

Here $C(\cdot)$ is the copula function; if all marginal distributions are continuous, $C(\cdot)$ is unique.
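1) ELLIPTICAL COPULA
The Gaussian copula is the most frequently applied copula function; it is the elliptical copula of the multivariate normal distribution, parameterized by Pearson's correlation [26]. The bivariate Gaussian copula can be written as:

$$C^{Ga}_{\theta}(u_1, u_2) = \Phi_{\theta}\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2)\big) = \int_{-\infty}^{\Phi^{-1}(u_1)} \int_{-\infty}^{\Phi^{-1}(u_2)} \frac{1}{2\pi\sqrt{1-\theta^2}} \exp\left(-\frac{s_1^2 - 2\theta s_1 s_2 + s_2^2}{2(1-\theta^2)}\right) \mathrm{d}s_2 \, \mathrm{d}s_1$$

where $u_1 = F_1(x_1)$ is the marginal distribution of $X_1$ and $u_2 = F_2(x_2)$ is the marginal distribution of $X_2$; $u_1$ and $u_2$ can be any marginal CDFs, either parametric or non-parametric, which is the key difference from a joint normal CDF; $u_1, u_2 \in [0, 1]$; $\Phi^{-1}$ is the inverse CDF of the standard normal distribution; and $\Phi_{\theta}$ is the bivariate standard normal CDF with correlation parameter $\theta$. The copula density is:

$$c^{Ga}_{\theta}(u_1, u_2) = \frac{1}{\sqrt{1-\theta^2}} \exp\left(-\frac{\theta^2 (w_1^2 + w_2^2) - 2\theta w_1 w_2}{2(1-\theta^2)}\right), \quad w_i = \Phi^{-1}(u_i)$$

In terms of its copula density, the joint PDF under a bivariate Gaussian copula is:

$$f(x_1, x_2) = c^{Ga}_{\theta}\big(F_1(x_1), F_2(x_2)\big)\, f_1(x_1)\, f_2(x_2)$$

To characterize fat tails, the copula of the bivariate t-distribution, called the t copula, is widely used in finance. The joint CDF of the t copula is:

$$C^{t}_{v,\theta}(u_1, u_2) = t_{v,\theta}\big(t_v^{-1}(u_1), t_v^{-1}(u_2)\big)$$

and its density function is:

$$c^{t}_{v,\theta}(u_1, u_2) = \frac{f_{v,\theta}\big(t_v^{-1}(u_1), t_v^{-1}(u_2)\big)}{f_v\big(t_v^{-1}(u_1)\big)\, f_v\big(t_v^{-1}(u_2)\big)}$$

where $u_1, u_2 \in [0, 1]$; $t_v^{-1}$ denotes the inverse CDF of the standard t-distribution with $v$ degrees of freedom; $t_{v,\theta}$ is the bivariate t-distribution CDF with correlation parameter $\theta$ and $v$ degrees of freedom; $\theta$ and $v$ are usually estimated from the data by maximum likelihood; $f_v$ is the standard t-distribution PDF; and $f_{v,\theta}$ is the bivariate t-distribution PDF with correlation parameter $\theta$.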
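Sklar's decomposition and the Gaussian copula density can be checked numerically. The following minimal sketch, assuming NumPy and SciPy are available, evaluates the closed-form density above, compares it with the ratio $\phi_\theta(w_1, w_2) / (\phi(w_1)\phi(w_2))$, and assembles a joint PDF from arbitrary marginals. The function names are illustrative and are not taken from the original study.

```python
import numpy as np
from scipy import stats


def gaussian_copula_density(u1, u2, theta):
    """Closed-form bivariate Gaussian copula density c(u1, u2; theta)."""
    w1, w2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
    num = theta**2 * (w1**2 + w2**2) - 2.0 * theta * w1 * w2
    return np.exp(-num / (2.0 * (1.0 - theta**2))) / np.sqrt(1.0 - theta**2)


def gaussian_copula_density_ratio(u1, u2, theta):
    """Same density written as phi_2(w1, w2) / (phi(w1) * phi(w2))."""
    w = np.array([stats.norm.ppf(u1), stats.norm.ppf(u2)])
    cov = np.array([[1.0, theta], [theta, 1.0]])
    joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=cov).pdf(w)
    return joint / (stats.norm.pdf(w[0]) * stats.norm.pdf(w[1]))


def joint_pdf(x1, x2, marg1, marg2, theta):
    """Sklar: f(x1, x2) = c(F1(x1), F2(x2)) * f1(x1) * f2(x2).

    marg1 and marg2 are any objects exposing .cdf and .pdf (e.g. frozen
    scipy.stats distributions standing in for fitted travel time marginals)."""
    c = gaussian_copula_density(marg1.cdf(x1), marg2.cdf(x2), theta)
    return c * marg1.pdf(x1) * marg2.pdf(x2)
```

With standard normal marginals, the Gaussian-copula joint PDF reduces to the bivariate normal PDF, which is a convenient sanity check; for travel times, the marginals would instead be the fitted GMM or kernel estimates of Section III.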

2) ARCHIMEDEAN COPULA
The Archimedean family of copulas is also widely used. It is constructed in a completely different way from elliptical copulas, using a generator function $\phi$, which uniquely determines the Archimedean copula.
The general form of the bivariate Archimedean copula is [27]:

$$C(u_1, u_2) = \phi^{-1}\big(\phi(u_1) + \phi(u_2)\big)$$

where $u_1, u_2 \in [0, 1]$ and $\phi$ is the generator of the Archimedean copula, a continuous, strictly decreasing and convex function with $\phi(1) = 0$.
The function $\phi^{-1}$ is the pseudo-inverse of $\phi$; it is also continuous, and it is strictly decreasing on $[0, \phi(0)]$.
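The generator construction can be made concrete in a few lines. The sketch below builds $C(u_1, u_2) = \phi^{-1}(\phi(u_1) + \phi(u_2))$ from the Clayton, Gumbel and Frank generators given in the following subsections. It assumes strict generators ($\phi(0) = \infty$, which holds for Clayton with $\theta > 0$, Gumbel with $\theta \ge 1$, and Frank with $\theta \ne 0$), so the pseudo-inverse coincides with the ordinary inverse; the helper names are hypothetical.

```python
import math


def archimedean_cdf(u1, u2, phi, phi_inv):
    """C(u1, u2) = phi^{-1}(phi(u1) + phi(u2)) for a strict generator phi."""
    return phi_inv(phi(u1) + phi(u2))


def clayton_generator(theta):
    """phi(t) = (t^-theta - 1) / theta, theta > 0."""
    phi = lambda t: (t ** (-theta) - 1.0) / theta
    phi_inv = lambda s: (1.0 + theta * s) ** (-1.0 / theta)
    return phi, phi_inv


def gumbel_generator(theta):
    """phi(t) = (-ln t)^theta, theta >= 1."""
    phi = lambda t: (-math.log(t)) ** theta
    phi_inv = lambda s: math.exp(-(s ** (1.0 / theta)))
    return phi, phi_inv


def frank_generator(theta):
    """phi(t) = -ln[(e^{-theta t} - 1) / (e^{-theta} - 1)], theta != 0."""
    d = math.expm1(-theta)  # e^{-theta} - 1
    phi = lambda t: -math.log(math.expm1(-theta * t) / d)
    phi_inv = lambda s: -math.log1p(math.exp(-s) * d) / theta
    return phi, phi_inv
```

Since $\phi(1) = 0$ for every generator, each resulting copula satisfies the boundary condition $C(u, 1) = u$, which is a quick check that a generator and its inverse have been coded consistently.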

a: CLAYTON COPULA
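The generator function of the Clayton copula is $\phi(t) = \frac{1}{\theta}\left(t^{-\theta} - 1\right)$, $\theta > 0$. The Clayton copula is:

$$C_{\theta}(u_1, u_2) = \left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-1/\theta}$$

and its density function is:

$$c_{\theta}(u_1, u_2) = (1+\theta)\,(u_1 u_2)^{-\theta-1}\left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-2-1/\theta}$$

where $u_1, u_2 \in [0, 1]$. As $\theta \to 0$, $u_1$ and $u_2$ tend to be independent; as $\theta \to \infty$, they tend to be fully dependent.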
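The two limiting cases of the Clayton parameter can be verified directly: $C_\theta(u_1, u_2)$ approaches the independence copula $u_1 u_2$ as $\theta \to 0$ and the comonotonic copula $\min(u_1, u_2)$ as $\theta \to \infty$. The sketch below is illustrative only (plain Python, hypothetical function names) and also provides the density, which can be checked against a finite-difference mixed partial of the CDF.

```python
def clayton_cdf(u1, u2, theta):
    """Clayton copula C(u1, u2) = (u1^-theta + u2^-theta - 1)^(-1/theta), theta > 0."""
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)


def clayton_density(u1, u2, theta):
    """Clayton copula density c(u1, u2), i.e. the mixed partial of clayton_cdf."""
    s = u1 ** -theta + u2 ** -theta - 1.0
    return (1.0 + theta) * (u1 * u2) ** (-theta - 1.0) * s ** (-2.0 - 1.0 / theta)


# Small theta: close to independence, u1 * u2 = 0.18.
near_independent = clayton_cdf(0.3, 0.6, 1e-4)
# Large theta: close to full dependence, min(u1, u2) = 0.3.
near_comonotonic = clayton_cdf(0.3, 0.6, 200.0)
```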

b: GUMBEL COPULA
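The generator function of the Gumbel copula is $\phi(t) = (-\ln t)^{\theta}$, $\theta \ge 1$. The Gumbel copula is:

$$C_{\theta}(u_1, u_2) = \exp\left\{-\left[(-\ln u_1)^{\theta} + (-\ln u_2)^{\theta}\right]^{1/\theta}\right\}$$

and its density function is:

$$c_{\theta}(u_1, u_2) = \frac{C_{\theta}(u_1, u_2)}{u_1 u_2} \cdot \frac{(\tilde{u}_1 \tilde{u}_2)^{\theta-1}}{\left(\tilde{u}_1^{\theta} + \tilde{u}_2^{\theta}\right)^{2-1/\theta}} \left[\left(\tilde{u}_1^{\theta} + \tilde{u}_2^{\theta}\right)^{1/\theta} + \theta - 1\right], \quad \tilde{u}_i = -\ln u_i$$

where $u_1, u_2 \in [0, 1]$. $\theta = 1$ corresponds to independence, and as $\theta \to \infty$, $u_1$ and $u_2$ tend to be fully dependent.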
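Because the Gumbel density is easy to mistype, a numerical cross-check is useful. The following sketch (plain Python, illustrative names) implements the CDF and density above and also the standard Gumbel upper-tail dependence coefficient $\lambda_U = 2 - 2^{1/\theta}$, a known property of this family that is not stated in the original text; it is included because upper-tail behavior is discussed in the case study.

```python
import math


def gumbel_cdf(u1, u2, theta):
    """Gumbel copula C(u1, u2), theta >= 1."""
    a, b = -math.log(u1), -math.log(u2)
    return math.exp(-((a**theta + b**theta) ** (1.0 / theta)))


def gumbel_density(u1, u2, theta):
    """Gumbel copula density c(u1, u2) in the closed form given above."""
    a, b = -math.log(u1), -math.log(u2)
    s = a**theta + b**theta
    return (gumbel_cdf(u1, u2, theta) / (u1 * u2)
            * (a * b) ** (theta - 1.0) / s ** (2.0 - 1.0 / theta)
            * (s ** (1.0 / theta) + theta - 1.0))


def gumbel_upper_tail_dependence(theta):
    """lambda_U = lim_{q->1} P(U2 > q | U1 > q) = 2 - 2^(1/theta)."""
    return 2.0 - 2.0 ** (1.0 / theta)
```

The tail coefficient can be approximated from the CDF alone as $(1 - 2q + C(q, q)) / (1 - q)$ for $q$ close to 1, which gives an independent check of the closed form.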

c: FRANK COPULA
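The generator function of the Frank copula is $\phi(t) = -\ln\dfrac{e^{-\theta t} - 1}{e^{-\theta} - 1}$, with $\theta \neq 0$. The Frank copula is defined as:

$$C_{\theta}(u_1, u_2) = -\frac{1}{\theta} \ln\left[1 + \frac{(e^{-\theta u_1} - 1)(e^{-\theta u_2} - 1)}{e^{-\theta} - 1}\right]$$

and its density function is [28]:

$$c_{\theta}(u_1, u_2) = \frac{\theta\,(1 - e^{-\theta})\, e^{-\theta(u_1 + u_2)}}{\left[(1 - e^{-\theta}) - (1 - e^{-\theta u_1})(1 - e^{-\theta u_2})\right]^2}$$

where $u_1, u_2 \in [0, 1]$.

To choose the best copula function, we compare the characteristics of the different copulas, as shown in Table 1.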
By considering the relationship between the travel times of turning movements within the link together with goodness-of-fit measures, the best-fitting copula can be selected. Due to space limitations, only the bivariate forms of the copulas are given here; the Gaussian and t copulas extend directly to multivariate copulas.
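For the Frank copula of subsection c, the sketch below (plain Python, hypothetical names) evaluates the CDF and density in a numerically stable way using `expm1` and `log1p`. It also illustrates the symmetry listed in Table 1: the Frank density is radially symmetric, $c(u_1, u_2) = c(1 - u_1, 1 - u_2)$, so it assigns the same weight to the lower and upper corners of the unit square.

```python
import math


def frank_cdf(u1, u2, theta):
    """Frank copula C(u1, u2), theta != 0; expm1/log1p avoid cancellation."""
    d = math.expm1(-theta)  # e^{-theta} - 1
    return -math.log1p(math.expm1(-theta * u1) * math.expm1(-theta * u2) / d) / theta


def frank_density(u1, u2, theta):
    """Frank copula density c(u1, u2) in the closed form given above."""
    e = -math.expm1(-theta)  # 1 - e^{-theta}
    den = e - (-math.expm1(-theta * u1)) * (-math.expm1(-theta * u2))
    return theta * e * math.exp(-theta * (u1 + u2)) / den ** 2
```

A consequence worth keeping in mind when interpreting fitted Frank copulas is that, unlike the Gumbel copula, the Frank family has no asymptotic tail dependence; its corner mass comes from the overall strength of dependence.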

III. MODELING LINK AND PATH TTD
When CSS is taken into account, the link TTD cannot be obtained by simply combining the travel times of the turning movements as if they were independent; instead, it must be derived from their joint PDF, which is constructed with the copula method described above.

A. ESTIMATION OF THE MARGINAL DISTRIBUTION
Due to heterogeneous traffic conditions in the urban network, travel time may follow a multimodal distribution. Therefore, the Gaussian Mixture Model (GMM), a popular parametric estimator, is used to fit the TTDs of the turning movements.
The finite GMM with k components can be described as:

$$p(y \mid u_1, \ldots, u_k; s_1, \ldots, s_k; \pi_1, \ldots, \pi_k) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}\big(y \mid u_j, s_j^{-1}\big)$$

where $u_j$ is the mean of component $j$; $s_j$ is the inverse variance (precision) of component $j$; $\pi_j$ is the mixing proportion of component $j$, with $\sum_j \pi_j = 1$; and $\mathcal{N}$ is a Gaussian distribution with the specified mean and variance.

VOLUME 8, 2020

A kernel smoothing estimator is also applied to describe the TTD of the turning movements. The density at $x$ given by the kernel smoothing estimator is:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

where $K$ is the kernel function; $h$ is the bandwidth of the kernel smoothing estimator; and $X_i$, $i = 1, \ldots, n$, are the observed data. The choice of $h$ involves a trade-off between the bias and the variance of the estimate; the bandwidth $h$ is chosen according to optimality rules [29].
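Both marginal estimators can be written compactly. The sketch below, assuming NumPy, evaluates the GMM density in the precision parameterization used above and a Gaussian-kernel smoothing estimate. The paper chooses $h$ by optimality rules [29]; as a stand-in assumption, this sketch uses Silverman's rule of thumb, $h = 0.9 \min(\hat\sigma, \mathrm{IQR}/1.34)\, n^{-1/5}$, which is only one of several such rules.

```python
import numpy as np


def gmm_pdf(y, means, precisions, weights):
    """Finite GMM density sum_j pi_j N(y | u_j, 1/s_j), with s_j the precisions."""
    y = np.asarray(y, dtype=float)[..., None]
    means, precisions, weights = (np.asarray(a, dtype=float)
                                  for a in (means, precisions, weights))
    comp = np.sqrt(precisions / (2.0 * np.pi)) * np.exp(-0.5 * precisions * (y - means) ** 2)
    return comp @ weights


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth (an assumed stand-in for the rules cited as [29])."""
    data = np.asarray(data, dtype=float)
    sigma = data.std(ddof=1)
    iqr = np.subtract(*np.percentile(data, [75, 25]))
    return 0.9 * min(sigma, iqr / 1.34) * data.size ** (-0.2)


def kde_pdf(x, data, h):
    """Gaussian-kernel estimate f_h(x) = 1/(n h) * sum_i K((x - X_i) / h)."""
    x = np.asarray(x, dtype=float)[..., None]
    z = (x - np.asarray(data, dtype=float)) / h
    return np.exp(-0.5 * z**2).sum(axis=-1) / (len(data) * h * np.sqrt(2.0 * np.pi))
```

Both functions accept a scalar or an array of evaluation points; with synthetic bimodal travel times, the two densities should each integrate to approximately one over a wide grid.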
To select the marginal distribution, we assume that the turning-movement and upstream travel times each conform to a specific distribution. Two statistical tests, the Kolmogorov-Smirnov (K-S) test and the Cramér-von Mises (CvM) test, together with the Hellinger Distance (HD), are applied to assess the goodness of fit of the GMM and the kernel smoothing estimator.
Here the K-S test is used to assess whether the values obtained by the probability integral transform conform to the uniform distribution. The K-S test yields a p-value $p_{ks}$; a larger $p_{ks}$ indicates better performance in the probability integral transform. The CvM test measures the similarity between the estimated CDF and the empirical CDF, yielding a p-value $p_{CvM}$; a larger $p_{CvM}$ indicates a better estimate of the empirical CDF. To confirm the fitting performance of the non-parametric approach, the HD is applied to evaluate the similarity between the estimated and empirical distributions [30]; a smaller HD indicates closer distributions.
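The three goodness-of-fit measures can be sketched with SciPy (version 1.6 or later is assumed for `scipy.stats.cramervonmises`). The K-S test is applied to the probability-integral-transform values against the uniform distribution, the CvM test compares the fitted CDF with the empirical CDF, and the HD is computed on a grid using one common normalization, $H = \sqrt{1 - \int \sqrt{f g}\,\mathrm{d}x}$, which lies in $[0, 1]$. The normalization and the helper names are assumptions of this sketch.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid


def pit_tests(samples, fitted_cdf):
    """Return (p_ks, p_cvm) for a fitted marginal CDF given as a callable."""
    samples = np.asarray(samples, dtype=float)
    u = fitted_cdf(samples)  # probability integral transform
    p_ks = stats.kstest(u, "uniform").pvalue
    p_cvm = stats.cramervonmises(samples, fitted_cdf).pvalue
    return float(p_ks), float(p_cvm)


def hellinger_distance(f_vals, g_vals, grid):
    """H = sqrt(1 - BC), with the Bhattacharyya coefficient BC integrated on a grid."""
    bc = trapezoid(np.sqrt(f_vals * g_vals), grid)
    return float(np.sqrt(max(0.0, 1.0 - bc)))
```

For two unit-variance normal densities whose means differ by 1, this normalization gives $H = \sqrt{1 - e^{-1/8}}$, which is a convenient reference value.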

B. MEASURING VARIABLE DEPENDENCE
Many approaches can be used to measure the dependence between variables. For two variables, a common measure is Pearson's linear correlation:

$$\rho_P = \frac{\mathrm{cov}(X, Y)}{\sigma(X)\,\sigma(Y)}$$

where $\mathrm{cov}(X, Y)$ is the covariance of $X$ and $Y$, and $\sigma(X)$ and $\sigma(Y)$ are their standard deviations. Another commonly used dependence measure is Kendall's tau, defined as the probability of concordance minus the probability of discordance:

$$\tau = P\big[(X - \tilde{X})(Y - \tilde{Y}) > 0\big] - P\big[(X - \tilde{X})(Y - \tilde{Y}) < 0\big]$$

where $(\tilde{X}, \tilde{Y})$ is an independent copy of $(X, Y)$. A positive $\tau$ means that the ranks of the two variables tend to rise together, whereas a negative $\tau$ means that when one variable's rank goes down, the other's tends to go up.
Besides Pearson's linear correlation and Kendall's tau, another traditional correlation coefficient is Spearman's rho, which, like Kendall's tau, is rank-based.
Comparing these three dependence measures, Kendall argued that the confidence intervals of Spearman's rho are less reliable than those of Kendall's tau, and that the absolute value of Pearson's linear correlation coefficient is generally overestimated [31]. Holding a similar view, Embrechts pointed out that, unlike rank-based correlations, linear correlation is not invariant under nonlinear strictly increasing transformations. Moreover, for multivariate distributions with a simple closed-form copula, the moment-based correlation (Pearson's correlation coefficient) may be difficult to calculate, whereas the rank-based correlations (Kendall's tau and Spearman's rho) may be easier to determine [30]. In Section IV, all three measures are applied to characterize the dependence of turning movements, but Kendall's tau is used as the main indicator for estimating the copula parameters.
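The invariance argument can be illustrated numerically. In the sketch below, which assumes SciPy and uses synthetic data only, the three coefficients are computed before and after a strictly increasing transformation of one variable: Kendall's tau and Spearman's rho are unchanged because the ranks are unchanged, whereas Pearson's coefficient moves.

```python
import numpy as np
from scipy import stats


def dependence_measures(x, y):
    """Return (Pearson rho_P, Kendall tau, Spearman rho_S) for paired samples."""
    rho_p, _ = stats.pearsonr(x, y)
    tau, _ = stats.kendalltau(x, y)
    rho_s, _ = stats.spearmanr(x, y)
    return float(rho_p), float(tau), float(rho_s)


# Synthetic, positively dependent pair (not travel time data).
rng = np.random.default_rng(5)
x = rng.normal(size=400)
y = 0.6 * x + 0.8 * rng.normal(size=400)

original = dependence_measures(x, y)
transformed = dependence_measures(np.exp(x), y)  # strictly increasing transform of x
```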

C. ESTIMATION OF θ IN COPULA
After the marginal distributions are estimated, the observed data are converted to the copula scale by the probability integral transform. The next step is to estimate θ in the copula. Spearman's rho or Kendall's tau can be used to estimate θ; here, we take the Gaussian copula and Kendall's tau as an example.
If the dependence structure of a random pair $(X, Y)$ is appropriately modeled by the Gaussian copula, its parameter $\theta$ is related to Kendall's tau by $\tau = \frac{2}{\pi}\arcsin(\theta)$, where $\tau \in [-1, 1]$; hence $\theta = \sin(\pi\tau/2)$. Once the dependence measure has been estimated from the data, the correlation parameter $\theta$ of the copula therefore follows directly.
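The same inversion idea applies to the other candidate copulas. The sketch below collects the standard Kendall's tau relations: closed forms for the Gaussian (and t), Clayton and Gumbel copulas, and a numerical root search for the Frank copula, whose relation $\tau = 1 - \frac{4}{\theta}\left[1 - D_1(\theta)\right]$ involves the first Debye function $D_1(\theta) = \frac{1}{\theta}\int_0^\theta \frac{t}{e^t - 1}\,\mathrm{d}t$. The root-finding bracket is an assumption that covers moderate positive dependence; these functions are illustrative and do not reproduce the values in Table 3.

```python
import math
from scipy.integrate import quad
from scipy.optimize import brentq


def gaussian_theta(tau):
    """Gaussian (and t) copula: tau = (2/pi) arcsin(theta), so theta = sin(pi*tau/2)."""
    return math.sin(math.pi * tau / 2.0)


def clayton_theta(tau):
    """Clayton copula: tau = theta / (theta + 2), so theta = 2*tau / (1 - tau)."""
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta(tau):
    """Gumbel copula: tau = 1 - 1/theta, so theta = 1 / (1 - tau)."""
    return 1.0 / (1.0 - tau)


def frank_tau(theta):
    """Frank copula: tau = 1 - (4/theta) * (1 - D1(theta))."""
    integrand = lambda t: t / math.expm1(t) if t != 0.0 else 1.0
    d1 = quad(integrand, 0.0, theta)[0] / theta
    return 1.0 - 4.0 / theta * (1.0 - d1)


def frank_theta(tau, lo=1e-2, hi=50.0):
    """No closed form: solve frank_tau(theta) = tau; bracket covers ~0.001 < tau < 0.92."""
    return brentq(lambda th: frank_tau(th) - tau, lo, hi)
```

For example, `{f.__name__: f(0.21) for f in (gaussian_theta, clayton_theta, gumbel_theta, frank_theta)}` gives the parameters implied by a Kendall's tau of 0.21 under each family.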

D. IDENTIFICATION OF THE BEST-FITTING COPULA
The CvM test, K-S test, log-likelihood, RMSE and Akaike Information Criterion (AIC) values are used as goodness-of-fit measures for comparing alternative copulas. The CvM and K-S statistics are based on the empirical process comparing the empirical copula with a parametric estimate of the copula derived under the null hypothesis. The CvM test has been shown to be more powerful than the K-S test against small deviations from the hypothesized distribution [32]. Models with higher $p_{ks}$ and $p_{CvM}$ values are better in that they are less likely to be rejected. At a significance level of 0.05, a larger log-likelihood together with lower AIC and RMSE values indicates a better-fitting copula.
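The log-likelihood and AIC part of this comparison can be sketched as follows. As assumptions of this sketch, the data are mapped to rank-based pseudo-observations $\mathrm{rank}/(n+1)$ rather than fitted marginals, each copula parameter is obtained by Kendall's tau inversion as in Section III-C, only the Gaussian, Clayton and Frank families are included, positive dependence ($\tau > 0$) is assumed, and the CvM, K-S and RMSE measures are omitted. With one parameter per copula, $\mathrm{AIC} = 2 - 2\ln L$.

```python
import math
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.optimize import brentq


def pseudo_obs(x):
    """Rank-based pseudo-observations rank / (n + 1), strictly inside (0, 1)."""
    return stats.rankdata(x) / (len(x) + 1.0)


def log_c_gaussian(u, v, th):
    a, b = stats.norm.ppf(u), stats.norm.ppf(v)
    return (-0.5 * np.log(1.0 - th**2)
            - (th**2 * (a**2 + b**2) - 2.0 * th * a * b) / (2.0 * (1.0 - th**2)))


def log_c_clayton(u, v, th):
    s = u ** (-th) + v ** (-th) - 1.0
    return np.log1p(th) - (th + 1.0) * np.log(u * v) - (2.0 + 1.0 / th) * np.log(s)


def log_c_frank(u, v, th):
    e = -np.expm1(-th)  # 1 - e^{-theta}
    den = e - (-np.expm1(-th * u)) * (-np.expm1(-th * v))
    return np.log(th * e) - th * (u + v) - 2.0 * np.log(den)


def frank_theta(tau):
    """Numerically invert tau = 1 - 4/theta * (1 - D1(theta)) for 0 < tau < ~0.9."""
    def tau_of(th):
        d1 = quad(lambda t: t / math.expm1(t) if t else 1.0, 0.0, th)[0] / th
        return 1.0 - 4.0 / th * (1.0 - d1)
    return brentq(lambda th: tau_of(th) - tau, 1e-2, 50.0)


def compare_copulas(x, y):
    """Log-likelihood and AIC of each candidate copula, theta from Kendall's tau."""
    u, v = pseudo_obs(x), pseudo_obs(y)
    tau, _ = stats.kendalltau(x, y)
    candidates = {
        "Gaussian": (log_c_gaussian, math.sin(math.pi * tau / 2.0)),
        "Clayton": (log_c_clayton, 2.0 * tau / (1.0 - tau)),
        "Frank": (log_c_frank, frank_theta(tau)),
    }
    results = {}
    for name, (log_c, th) in candidates.items():
        loglik = float(np.sum(log_c(u, v, th)))
        results[name] = {"theta": th, "loglik": loglik, "aic": 2.0 - 2.0 * loglik}
    return results
```

The candidate with the smallest AIC (equivalently, the largest log-likelihood, since every candidate here has one parameter) would be preferred; in the full procedure this ranking is weighed together with the test p-values and RMSE.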

E. MODELING LINK TTD
After estimating the copula model, the link travel time distribution can be expressed as: where F λt 0 x represents the probability of link travel time below λt 0 x ; p (·) is the density of the estimated copula where

F. MODELING PATH TTD
The path travel time distribution can be expressed as: where F z λt 0 x represents the probability of path travel time below λt 0 x ; T u1 is the travel time of left-turn vehicles; T u2 is the travel time of through vehicles; T u3 is the travel time of right-turn vehicles; T v is the travel time of upstream vehicles; p T u1 + T v ≤ λt 0 x is the cumulative density of the estimated copula where T u1 + T v ≤ λt 0 x .

IV. CASE STUDY
The proposed methodology was evaluated on an arterial in Hangzhou, China. The study area and data processing are described first. Based on this investigation, the copula was selected and the corresponding model was constructed for TTD estimation. Then, the link-level TTD model was compared with the empirical distribution fitting approach, and the results showed that the copula model performed better than the GMM. Next, the path-level TTD model considering CSS was compared with a copula model ignoring CSS, and the former performed better.

A. STUDY AREA DESCRIPTION
The study site, shown in Fig. 2(a), is located on Nanxiu Road, Hangzhou, Zhejiang Province. As a main commuter arterial of the city, Nanxiu Road carries heavy traffic demand throughout the day. The signal cycle length at the intersection of link 1 ranges from 60 s to over 120 s, and that of link 2 ranges from 100 s to over 170 s. Right-turn vehicles on link 1 are released without restriction. The travel time data were collected by Automatic Number Plate Recognition (ANPR) systems, whose cameras are located at the entrance of each approach at the intersections, as illustrated in Fig. 2(b). Using image processing, ANPR cameras capture the license plates of passing vehicles and record the plates, time stamps, turning movements and vehicle types. From the data collected by the upstream and downstream ANPR cameras, actual travel times through the links and intersections can be obtained by matching license plates and comparing their time stamps.
Given the characteristics of urban roads, the travel time samples contain many outliers, which make the data noisy. A major part of this noise is attributable to vehicles that stop after passing the upstream camera (for loading and unloading, parking, shopping, etc.) before continuing their trip. In such cases, the vehicles are detected by the downstream ANPR cameras significantly later, which makes their travel times notably higher than usual. Similarly, buses stopping as part of their regular routes are another source of noise. We therefore applied a simple filtering method based on a field investigation, which confirmed that during peak hours most delayed vehicles queued for less than three red phases. Thus, 400 s, approximately three cycle lengths and the 90th-percentile travel time of all vehicles, was set as the upper bound for outlier elimination on link 1, and 10 s was set as the lower bound to remove repeated or erroneous detections. After filtering, ground-truth travel times of 4954 vehicles on weekdays in December 2018 were obtained for copula modeling. The dataset was divided into a training set (70%) and a testing set (30%).
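The plate-matching and filtering steps can be sketched as follows. The matching rule is an assumption, since the text does not specify it: each downstream detection is paired with the latest earlier upstream detection of the same plate, and the resulting travel time is kept only if it lies within the 10 s to 400 s bounds used for link 1. The record layout and function names are hypothetical.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def match_travel_times(upstream, downstream, lower=10.0, upper=400.0):
    """upstream: iterable of (plate, t_up); downstream: iterable of (plate, t_down, movement).

    Returns (plate, movement, travel_time) for matched pairs within [lower, upper] s."""
    ups = defaultdict(list)
    for plate, t in upstream:
        ups[plate].append(t)
    for times in ups.values():
        times.sort()
    records = []
    for plate, t_down, movement in downstream:
        times = ups.get(plate)
        if not times:
            continue  # plate never seen upstream
        i = bisect_right(times, t_down)
        if i == 0:
            continue  # no upstream passage before this downstream detection
        tt = t_down - times[i - 1]
        if lower <= tt <= upper:  # drop stops, parking, repeated or erroneous detections
            records.append((plate, movement, tt))
    return records


def train_test_split(records, train_frac=0.7, seed=0):
    """Random 70/30 split of the matched records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    k = int(round(train_frac * len(shuffled)))
    return shuffled[:k], shuffled[k:]
```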
To demonstrate the need to consider CSS and the advantages of copula modeling, two case studies were conducted in the study area. The first tested the copula modeling of link-level TTD, considering the CSS within link 1. Since right-turn vehicles on link 1 are released without restriction, only left-turn and through vehicles were considered in case one. The second case modeled the path-level TTD, considering both the CSS within link 2 and the correlation between link 1 and link 2.

… the traffic conditions. Thus, we first evaluated the hourly dependence between left-turn and through travel times on link 1, based on vehicle-to-vehicle ANPR data on Dec. 4, 2018. For every hour, Pearson's linear correlation coefficient $\rho_P$, Kendall's tau $\tau$ and Spearman's rho $\rho_S$ were calculated. The results are presented in Table 2.
As shown in Table 2 and Fig. 3, the three correlation coefficients follow similar trends. The values of Spearman's $\rho_S$ are mostly higher than those of Kendall's $\tau$, and Pearson's $\rho_P$ is higher still in the weekday morning peak ($\rho_P$ = 0.35, $\rho_S$ = 0.28, $\tau$ = 0.21), which is consistent with the view, noted in Section III, that the absolute value of Pearson's linear correlation coefficient is generally overestimated. Fig. 3 also shows that the dependence structure varies significantly over the day; the correlation is generally strong in peak hours, especially in the afternoon peak.
The row 'count' in Table 2 gives the vehicle counts through the left-turn lane on Nanxiu Road, collected at 60-min intervals. Notably, the number of samples at midnight and in the early morning is insufficient, which makes the correlation coefficients abnormally high or zero. Furthermore, the hourly dependence curve in Fig. 3 shows that the afternoon peak lasts from 16:00 to 20:00. The afternoon peak is less sharply defined in time than the morning peak, possibly because commercial activities extend the peak period from the early afternoon into the late evening.
In essence, the level of correlation between the travel times of turning movements may be influenced by numerous factors, such as traffic volume and signal settings. To test the goodness of fit of the copula model reliably, the same period on each day was chosen so that the dependence structure is stable. In the morning peak hours (07:00-09:00), the correlation coefficients are somewhat higher than in off-peak hours, which makes this period suitable for characterization. During this period, the signal cycle is 120 s.
Given the sample sizes and the features of the dependence structure, data from the weekday morning peak period were chosen to construct the copula model. The dependence estimates are: Kendall's tau $\tau$ = 0.21, Pearson's $\rho_P$ = 0.35 and Spearman's rho $\rho_S$ = 0.28, based on 1,109 samples in total. In the following sections, the parameters of the alternative copulas are determined from these dependence measures.

2) INVESTIGATION OF MARGINAL TTD
The estimation of each turning movement's marginal distribution serves as the foundation of the copula approach. The kernel smoothing estimates, together with the density curves in Fig. 5, suggest that the travel times of the turning movements follow multimodal distributions. To choose a reasonable number of GMM components, AIC and BIC values were calculated for different numbers of components. As shown in Fig. 4, the GMM with 3 components has the minimum AIC value and a relatively small BIC value. Therefore, GMM3 was chosen to fit both the left-turn and through travel times. The red curves in Fig. 5 show the fitting results.
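The component-selection step can be sketched with a small EM implementation for a one-dimensional GMM, followed by AIC $= 2p - 2\ln L$ and BIC $= p\ln n - 2\ln L$ with $p = 3k - 1$ free parameters. This is a NumPy-only sketch under stated assumptions: the deterministic quantile-based initialization and the small variance floor are choices of this sketch, not the study's procedure, and each initial group is assumed to be non-empty. A library implementation such as scikit-learn's `GaussianMixture` would normally be used instead.

```python
import numpy as np


def _log_weighted_densities(y, means, variances, weights):
    """log(pi_j * N(y_i | mean_j, var_j)) for every observation i and component j."""
    return (np.log(weights) - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (y[:, None] - means) ** 2 / variances)


def fit_gmm_1d(y, k, n_iter=1000, tol=1e-8):
    """EM for a k-component 1-D GMM; returns (means, variances, weights, loglik)."""
    y = np.asarray(y, dtype=float)
    # Initialization (assumption): quantile centres, then one nearest-centre assignment.
    centres = np.quantile(y, (np.arange(k) + 0.5) / k)
    labels = np.argmin(np.abs(y[:, None] - centres), axis=1)
    weights = np.array([np.mean(labels == j) for j in range(k)])
    means = np.array([y[labels == j].mean() for j in range(k)])
    variances = np.array([y[labels == j].var() for j in range(k)]) + 1e-6
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities via a numerically stable log-sum-exp.
        log_p = _log_weighted_densities(y, means, variances, weights)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        loglik = log_norm.sum()
        if loglik - prev < tol:
            break
        prev = loglik
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: closed-form updates of weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / y.size
        means = (resp * y[:, None]).sum(axis=0) / nk
        variances = (resp * (y[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    loglik = np.logaddexp.reduce(
        _log_weighted_densities(y, means, variances, weights), axis=1).sum()
    return means, variances, weights, float(loglik)


def information_criteria(y, k_max=5):
    """{k: (AIC, BIC)} for k = 1..k_max."""
    y = np.asarray(y, dtype=float)
    out = {}
    for k in range(1, k_max + 1):
        loglik = fit_gmm_1d(y, k)[3]
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        out[k] = (2 * n_params - 2 * loglik, n_params * np.log(y.size) - 2 * loglik)
    return out
```

On synthetic data drawn from three well-separated Gaussian components, BIC should select three components; on real travel times, AIC and BIC can disagree, which is why both are reported in Fig. 4.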

3) OPTIMAL COPULA MODEL SELECTION
As mentioned above, Kendall's tau $\tau$ for the travel times of the two turning movements is 0.21, indicating that their dependence is positive but weak. According to the characteristics of the copulas in Table 1, all candidate copulas can be applied in this case. The parameters θ of the different copulas were calculated from Kendall's tau $\tau$, as shown in Table 3, and they reflect the interactions between left-turn and through movements. Comparing the θ values in Table 3 shows that the copula parameters capture the correlation of travel times within the link: during peak hours, θ values are generally higher, especially for the Frank and Clayton copulas, and the correlation between left-turn and through travel times is also high, as shown in Table 2. The degree of freedom v of the t copula was estimated by maximum likelihood; in this case, v = 2.47 in peak hours and v = 8.96 in off-peak hours.
Next, to determine the best-fitting copula, the goodness-of-fit statistics of the different copulas were calculated; they are presented in Table 4. The preferred copula should have the largest $p_{ks}$ value, the largest $p_{CvM}$ value together with the smallest CvM statistic, a larger log-likelihood and a smaller AIC. Overall, the Frank copula performed best.
We then plotted the surface of the estimated Frank copula density and the frequency histogram of the empirical data, as presented in Fig. 6. Note the very high values of the frequency histogram near the corner (1, 1). The estimated Frank copula density is likewise elevated toward this corner, implying that slow travel on the left-turn and through lanes occurs simultaneously and frequently during peak hours. This needs additional attention because it suggests the possibility of CSS. (Strictly speaking, the Frank copula has no asymptotic tail dependence, so the fitted model reflects this co-occurrence through its overall positive dependence rather than through tail dependence.) In the field investigation, we did observe through vehicles spilling over in the channelization section during peak hours and blocking the path of left-turn vehicles, as shown in Fig. 7.
As noted in Table 1, the Frank copula is symmetric; the fact that it fits best indicates that, after the probability integral transform, the dependence structure of left-turn and through travel times tends to be symmetric. The concentrations near the corners (0, 1) and (1, 0) of the frequency histogram are remarkable but are not captured by the estimated copula density, which indicates that the Frank copula cannot represent these corner dependences. More types of copulas should be tested in future work.
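The maximum likelihood estimate of the t copula's degree of freedom $v$ can be sketched as a one-dimensional profile search. As an assumption of this sketch, $\theta$ is fixed by the tau inversion $\theta = \sin(\pi\tau/2)$ (which holds for elliptical copulas, including the t copula) and only $v$ is optimized over a bounded interval; the study may instead have maximized over both parameters jointly. The log-density follows the ratio form of Section II, with the bivariate t PDF written explicitly.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar
from scipy.special import gammaln


def t_copula_loglik(u, v, theta, nu):
    """Sum over pairs of log f_{nu,theta}(x1, x2) - log f_nu(x1) - log f_nu(x2)."""
    x1, x2 = stats.t.ppf(u, nu), stats.t.ppf(v, nu)
    q = (x1**2 - 2.0 * theta * x1 * x2 + x2**2) / (1.0 - theta**2)
    log_joint = (gammaln((nu + 2.0) / 2.0) - gammaln(nu / 2.0) - np.log(nu * np.pi)
                 - 0.5 * np.log(1.0 - theta**2) - (nu + 2.0) / 2.0 * np.log1p(q / nu))
    log_marg = stats.t.logpdf(x1, nu) + stats.t.logpdf(x2, nu)
    return float(np.sum(log_joint - log_marg))


def fit_t_copula_nu(u, v, tau, nu_bounds=(1.0, 50.0)):
    """Fix theta from Kendall's tau, then maximise the copula log-likelihood over nu."""
    theta = float(np.sin(np.pi * tau / 2.0))
    res = minimize_scalar(lambda nu: -t_copula_loglik(u, v, theta, nu),
                          bounds=nu_bounds, method="bounded")
    return theta, float(res.x)
```

A small estimated $v$ (as in peak hours) indicates heavier joint tails, while a large $v$ makes the t copula approach the Gaussian copula.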

4) ESTIMATION OF LINK TTD
Unimodal distributions, e.g., the normal, lognormal, gamma and Weibull distributions, were first used to characterize TTD; GMMs have since attracted interest as multimodal alternatives. To compare the performance of these distributions, goodness-of-fit values were calculated, and the results are listed in Table 5. They show that GMM3 generally fits better than the unimodal distributions and GMM2, so GMM3 was chosen as the representative multimodal distribution for comparison with the copula model. Fig. 8 shows the fitting results, and the goodness-of-fit measures of the Frank copula and GMM3 are also presented in Table 5. The higher $p_{ks}$ and $p_{CvM}$ values and the lower CvM and HD values indicate that the Frank copula outperforms GMM3. Moreover, the difference between GMM2 and GMM3 shows that the GMM fit depends on the assumed number of components, whereas the copula model is more flexible because it requires no such assumption about the input variables. The copula model also avoids the instability of the GMM caused by the random initialization of its parameters.

C. APPLICATION OF THE COPULA MODEL IN PATH-LEVEL TTD
CSS within a link also has a great influence on the path-level TTD. Considering both the CSS and the correlation between links, we constructed a path-level TTD model and compared it with existing research [26], which only considered the correlation of link travel times. Based on the above analysis, the study periods were set to the morning peak and the afternoon off-peak on weekdays.

1) DEPENDENCE BETWEEN LEFT-TURN, THROUGH, RIGHT-TURN MOVEMENTS, AND LINKS
For the peak and off-peak hours, Pearson's linear correlation coefficient ρ_P, Kendall's tau τ, and Spearman's rho ρ_S were calculated; the results are presented in Table 6. The travel time correlation within the link is relatively high, particularly in the peak hour, whereas the correlation between the upstream and downstream links is weak. The strong within-link correlation in peak hours indicates possible CSS in the downstream link. The weak correlation between upstream and downstream means that the spillover was confined to the channelization section: vehicles did not overflow the link or affect traffic upstream.
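The three coefficients in Table 6 can be computed pairwise with SciPy, as in the minimal sketch below. It assumes the movement-level and link-level travel times have already been aligned on common time intervals (for example, per signal cycle), since individual vehicles in different lanes are not directly paired; the alignment step, the function name and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def dependence_table(samples):
    """Pairwise Pearson, Kendall and Spearman coefficients.

    samples: dict mapping a movement/link name to a 1-D array of travel times,
    all arrays aligned on the same time intervals (assumption).
    """
    names = list(samples)
    table = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            x, y = np.asarray(samples[a]), np.asarray(samples[b])
            table[(a, b)] = {
                "pearson": stats.pearsonr(x, y)[0],
                "kendall": stats.kendalltau(x, y)[0],
                "spearman": stats.spearmanr(x, y)[0],
            }
    return table


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    queue = rng.gamma(2.0, 10.0, n)  # hypothetical shared spillover effect per interval
    samples = {
        "left": 60 + 0.8 * queue + rng.normal(0, 5, n),
        "through": 45 + 0.6 * queue + rng.normal(0, 5, n),
        "right": 20 + rng.normal(0, 3, n),  # right turns unaffected by the signal
        "upstream": 50 + rng.normal(0, 6, n),
    }
    for pair, coef in dependence_table(samples).items():
        print(pair, {k: round(float(v), 3) for k, v in coef.items()})
```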

2) INVESTIGATION OF MARGINAL TTD
Kernel smoothing estimation and GMM were used for marginal distribution estimation, as presented in Fig. 9. Since right-turn vehicles are not restricted by the signal, the right-turn travel time follows a unimodal distribution. In addition, there are few visual discrepancies between the GMM and kernel estimates, so statistical tests are required to decide between them.
Larger p_KS and p_CvM values and smaller HD values indicate a better overall fit. As reported in Table 7, although GMM characterizes the TTD similarly well, the kernel estimator performs better on the probability integral transform, as indicated by larger p_KS values. Overall, kernel smoothing estimation provided performance comparable to GMM in this case.
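The comparison of the two marginal estimators through the probability integral transform (PIT) can be sketched as follows. Under a well-fitting marginal F, the values F(x_i) should be close to Uniform(0, 1), and p_KS measures this. The sketch is a simplified assumption, not the study's implementation: the GMM is fitted with a plain expectation-maximization loop using a deterministic quantile initialization (which sidesteps the random-initialization instability mentioned earlier), the kernel estimator is SciPy's gaussian_kde with its default bandwidth, and all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_gmm_1d(x, k=2, n_iter=300):
    """Plain EM for a 1-D Gaussian mixture with quantile-based initialization."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    sd = np.full(k, x.std() / k)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        dens = w * stats.norm.pdf(x[:, None], mu, sd)              # E-step, shape (n, k)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        nk = resp.sum(axis=0)                                      # M-step
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return w, mu, sd


def gmm_cdf(x, w, mu, sd):
    """CDF of the fitted mixture evaluated at each point of x."""
    return (w * stats.norm.cdf(np.asarray(x)[:, None], mu, sd)).sum(axis=1)


def kde_cdf(x, kde):
    """CDF of a 1-D gaussian_kde obtained by integrating the kernel density."""
    return np.array([kde.integrate_box_1d(-np.inf, xi) for xi in x])


def pit_pvalues(x, k=2):
    """KS p-values of the PIT values against Uniform(0, 1) for both estimators."""
    w, mu, sd = fit_gmm_1d(x, k)
    kde = stats.gaussian_kde(x)
    return {"GMM": stats.kstest(gmm_cdf(x, w, mu, sd), "uniform").pvalue,
            "Kernel": stats.kstest(kde_cdf(x, kde), "uniform").pvalue}


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Hypothetical bimodal through-movement travel times (s).
    x = np.concatenate([rng.normal(40, 5, 240), rng.normal(90, 12, 160)])
    print(pit_pvalues(x))
```

As with the previous tests, the PIT is evaluated in-sample, so the p-values compare the two estimators rather than certify either of them.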

3) OPTIMAL COPULA MODEL SELECTION
For the multivariate distribution, the candidate copulas are the Gaussian copula and the t copula. The two models are compared in Table 8. Their goodness-of-fit statistics are close to each other; overall, the Gaussian copula is the best-fitting copula function.
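A possible way to fit and compare the two candidate copulas is sketched below. It is an assumption-laden illustration: rank-based pseudo-observations replace the fitted marginals, the Gaussian copula correlation is taken from normal scores, the t copula correlation is obtained from Kendall's tau via R_ij = sin(πτ_ij/2) with the degrees of freedom chosen on a grid, and the two models are ranked by AIC, a criterion swapped in here for illustration rather than the goodness-of-fit statistics reported in Table 8. Function names are hypothetical.

```python
import numpy as np
from scipy import stats


def pseudo_obs(x):
    """Rank-based pseudo-observations in (0, 1), one column per movement/link."""
    return stats.rankdata(x, axis=0) / (x.shape[0] + 1)


def fit_gaussian_copula(u):
    """Correlation from normal scores and the resulting copula log-likelihood."""
    z = stats.norm.ppf(u)
    R = np.corrcoef(z, rowvar=False)
    d = u.shape[1]
    ll = np.sum(stats.multivariate_normal.logpdf(z, mean=np.zeros(d), cov=R)
                - stats.norm.logpdf(z).sum(axis=1))
    return R, ll


def fit_t_copula(u, nus=range(2, 31)):
    """t copula: R from Kendall's tau, degrees of freedom profiled over a grid."""
    d = u.shape[1]
    R = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau = stats.kendalltau(u[:, i], u[:, j])[0]
            R[i, j] = R[j, i] = np.sin(np.pi * tau / 2.0)
    best_nu, best_ll = None, -np.inf
    for nu in nus:
        x = stats.t.ppf(u, nu)
        ll = np.sum(stats.multivariate_t.logpdf(x, loc=np.zeros(d), shape=R, df=nu)
                    - stats.t.logpdf(x, nu).sum(axis=1))
        if ll > best_ll:
            best_nu, best_ll = nu, ll
    return R, best_nu, best_ll


def select_copula(x):
    """Return the copula with the lower AIC, the AIC values, and the t copula's df."""
    u = pseudo_obs(x)
    d = x.shape[1]
    k = d * (d - 1) // 2  # number of free correlation parameters
    _, ll_g = fit_gaussian_copula(u)
    _, nu, ll_t = fit_t_copula(u)
    aic = {"Gaussian": 2 * k - 2 * ll_g, "t": 2 * (k + 1) - 2 * ll_t}
    return min(aic, key=aic.get), aic, nu
```

Because copulas are invariant to monotone transformations of the margins, the pseudo-observations can be computed directly from the raw travel times of the left-turn, through, right-turn and upstream series.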

4) ESTIMATION OF PATH TTD
Since integrating the multivariate TTD model analytically is difficult, Monte Carlo simulation was used to obtain the CDF of path travel time. Compared with the path TTD model that considers only the correlation between links, the path TTD model that also considers the influence of CSS within the downstream link performs better, particularly in the peak hour, as shown in Fig. 10.
For a more comprehensive comparison, the statistics are presented in Table 9. The numerical values show that it is necessary to consider the mutual influence of CSS within the downstream link: the p_KS and p_CvM values are much higher than those of the model ignoring the CSS. Because of the CSS during peak periods, the correlation of turning movements within the link is enhanced, which makes the superiority of the copula model more prominent. The copula-based model captures this correlation mathematically and appropriately characterizes the path-level TTD during peak hours. During off-peak hours, however, the two models behave similarly.
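The Monte Carlo step and the two-sample comparison can be sketched as below. Under stated assumptions, and not as the study's exact procedure: path travel time is taken as the sum of paired upstream and downstream travel times, correlated uniforms are drawn from a Gaussian copula, empirical quantile functions stand in for the kernel or GMM marginals, and the simulated path times are compared with observed ones using SciPy's two-sample KS and Cramér-von Mises tests. The independence baseline (identity correlation) is a simplification for illustration and is not the comparison model of [26]. Function names and data are hypothetical.

```python
import numpy as np
from scipy import stats


def gaussian_copula_corr(obs):
    """Gaussian copula correlation estimated from normal scores of the ranks."""
    u = stats.rankdata(obs, axis=0) / (obs.shape[0] + 1)
    return np.corrcoef(stats.norm.ppf(u), rowvar=False)


def simulate_path_times(obs, R, m=20000, seed=0):
    """Monte Carlo path travel times from a Gaussian copula with empirical marginals.

    obs: (n, d) array of paired travel times for the components of the path.
    """
    rng = np.random.default_rng(seed)
    d = obs.shape[1]
    z = rng.multivariate_normal(np.zeros(d), R, size=m)
    u = stats.norm.cdf(z)  # correlated uniforms
    sims = np.column_stack([np.quantile(obs[:, j], u[:, j]) for j in range(d)])
    return sims.sum(axis=1)  # path time = sum of component times (assumption)


def compare(observed_path, simulated_path):
    """Two-sample KS and Cramer-von Mises p-values (larger means closer agreement)."""
    ks = stats.ks_2samp(observed_path, simulated_path)
    cvm = stats.cramervonmises_2samp(observed_path, simulated_path)
    return {"p_KS": ks.pvalue, "p_CvM": cvm.pvalue}


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n = 2000
    q = rng.gamma(2.0, 10.0, n)  # hypothetical shared peak-hour congestion effect
    obs = np.column_stack([30 + 0.7 * q + rng.normal(0, 4, n),
                           40 + 0.9 * q + rng.normal(0, 4, n)])
    observed_path = obs.sum(axis=1)
    print("copula:", compare(observed_path, simulate_path_times(obs, gaussian_copula_corr(obs))))
    print("independent:", compare(observed_path, simulate_path_times(obs, np.eye(2))))
```

When the components are strongly correlated, as in the hypothetical peak-hour data above, ignoring the dependence understates the spread of path travel time, which the two-sample tests detect; with weak correlation the two simulations become nearly indistinguishable, mirroring the off-peak result.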
Path-level TTD is usually complicated and difficult to calibrate because urban traffic flows are complex and interrupted by signal control, bus stops, roadside parking, pedestrians, and vehicle interactions. The copula-based function simplifies these relationships into a mathematical model whose parameters describe the traffic flow correlation within the link and between links. The copula-based model proved accurate in this case and may have wide practical applications. However, due to the limited amount of data and the weak correlation between links, the path-level TTD model is less reliable than the link-level model. In future research, more data on various paths should be collected and modeled.

V. CONCLUSIONS AND DISCUSSIONS
This paper introduced a copula-based method to estimate link-level and path-level TTD, taking both the CSS and the correlation between links into consideration. The method, which requires no assumptions about the input variables, was applied to model TTD on an arterial road in Hangzhou, China. We compared the link-level TTD estimated by the copula model with that of the Gaussian mixture model. Moreover, the path-level TTD model considering CSS was compared with a previous copula model that considers only the correlation between links [20]. All results demonstrate the necessity of considering CSS and illustrate the superiority of the copula model in travel time modeling. The main findings are summarized as follows:
(1) Movement-level travel times exhibit significantly different dependence structures at different times of day. This changing correlation can be regarded as the overall effect of the CSS, which should not be ignored.
(2) For link-level TTD estimation, by taking CSS into account, the copula model with marginal distributions from movement-level travel times performs better than the empirical Gaussian mixture model, which demonstrates the statistical superiority of the copula model.
(3) When modeling path-level TTD in off-peak periods, the copula model considering CSS shows no apparent superiority over the copula model ignoring CSS, owing to the weak correlation. In peak hours, as the correlation among movements and links increases, taking CSS into account makes the TTD model much more accurate. This demonstrates the necessity of considering CSS in path-level TTD.
The proposed TTD model can make travel time estimation more accurate. In applications, if the underlying determinants of travel time are to be analyzed, a more precise TTD model can serve as a research basis. Furthermore, these determinants could be introduced as variables in the copula-based TTD model, making the model more accurate and general. With additional factors, the improved statistical TTD fitting could support effective travel time reliability analysis and average travel time prediction. Besides, a reliable TTD model is expected to have wide applications at a more macroscopic level, such as capturing network-wide traffic state indices and designing targeted strategies to improve traffic conditions.
It is worth noting that in this study we only compared the copula model with unimodal distributions and GMM when characterizing link-level TTD, so the results and conclusions above may have limitations. To extend these initial findings, other types of distribution models should be tested with more data. Likewise, the Gaussian and t copulas adopted in the path-level TTD model may have limitations, and more multidimensional copulas will be examined in future work. To have practical value, additional inputs should be introduced into the copula-based model, including link length, lane count, signal parameters, day of the week, time of day, and so on. With adequate factors, a general and complete spatial-temporal TTD model could be used to predict the average travel time and its reliability. Moreover, as this paper focuses only on link-level and short path-level TTD models, future research will extend the TTD model to paths with more than two links, and to the route and network levels. The algorithm will also be optimized for faster application.