Estimating Travel Times of Mid-Volume Buses Considering Right-of-Way Variation

Transit agencies often provide estimates of bus travel times to downstream stops. This study aims to improve the perceived reliability of bus transit systems and enhance their competitiveness. This study considers the characteristics of low headway and high demand for mid-volume bus lanes. Considering the variation in right-of-way with respect to both time and space, a stop-based bus route is built to divide the road into sections. Available real-time data from a schedule-based mid-volume bus route are used, including bus global positioning system (GPS) data, road condition information, and weather. Based on the accelerated failure time (AFT) model, a dynamic travel time model considering right-of-way variation is established to estimate bus travel times between adjacent stops and explore the specific impact of bus right-of-way variation on the travel time. The AFT model is chosen because it can reveal the significance of different variables to estimate travel times, and simultaneously estimate expected travel times as well as travel time uncertainty. The experimental results indicate that bus right-of-way variation significantly affects travel times. In contrast to the linear model, the parameter estimated by the AFT model conforms better to expectations, especially for long-distance travel.


I. INTRODUCTION
With rapid urbanization and expanding motorization, the increase in household income has led to a rise in private car ownership in China. Residents are increasingly dependent on commuting by private car, which exacerbates traffic congestion and emissions in a metropolis [1], [2]. For instance, in Shanghai, a megacity in China, the urban area has reached 3,640 km², with the population density increasing to 3,814 people per square kilometer, and private car ownership increasing to 616 vehicles per square kilometer in 2017. The number of passengers using public transportation daily has reached 17.96 million, which accounts for 49% of the total travel modes [3]. Therefore, methods need to be developed to relieve traffic congestion, protect the travel rights of low-income people, and implement traffic demand management fairly and productively within the limits of urban public resources [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Nilanjan Dey.
Effective right-of-way allocation policies can improve the efficiency of public transportation and guarantee the reliability of public transport travel times, which can attract more people to travel by public transport [5]. In the public transportation system, mid-volume buses have a higher travel time reliability when the right-of-way is not fully guaranteed, which is in contrast to the subway and regular small-volume buses. Mid-volume buses play an important role in improving the functional hierarchy of the public transportation system, serving passengers over short and medium distances. The transport capacity and speed of mid-volume buses are 5,000-20,000 people per hour and 20-35 km per hour, respectively, with departure intervals as short as 2.5 min in peak hours. Mid-volume buses are a popular mode of public transportation considering passenger corridors, lack of subway coverage, and road conditions. Many mid-volume bus routes have achieved operational benefits in China. For example, the number of passengers using the mid-volume No. 71 bus route in Shanghai, China, has reached 51,000 people daily.
Compared with the subway, the mid-volume bus is more susceptible to traffic factors, weather, and road conditions, especially the right-of-way variation. Therefore, the reliability of mid-volume bus travel time is a crucial aspect of bus service quality, in which the right-of-way even plays a decisive role. Furthermore, in the process of the promotion of mid-volume buses, some roads cannot guarantee the right-of-way, which is limited by red lines and restrictions (e.g., the change in the number of lanes and the availability of bus lanes). In addition, the right-of-way use time for many bus lanes is limited in the morning and evening rush hours, such as during the hours of 7:00-9:00 and 17:00-19:30. During the non-use time of the bus lane, other vehicles can also pass. It is of considerable significance to estimate the travel time while considering the right-of-way from perspectives of time and space, which can improve the punctuality rate of a bus, attract people to take public transportation, and alleviate traffic congestion.
Various methods have been used to estimate bus travel time. These methods can be classified into four categories: models based on historical data, statistical models (time series, classic regression analysis), machine learning models (artificial neural network, support vector machines), and Kalman filtering models [6].
Models based on historical data assume that the current and future travel patterns depend on historical data, i.e., average travel time and average speed [7]. Existing studies have combined real-time location data from global positioning system (GPS) receivers with historical average travel speed, calculated the weighted mean of the historical and current real-time bus speeds, and predicted the travel time through the ratio of distance to speed [8]-[10]. Cortés et al. presented a method based on GPS data that systematically analyzed average bus speed [11]. However, the models developed with historical data cannot respond to changing real-time conditions. Statistical models such as time series and classic regression models are commonly used to estimate travel times. Time series models assume that the historical traffic patterns will remain the same in the future, and their precision is highly dependent on the correspondence between real-time and historical traffic patterns [12]. Regression models build transparent relationships between travel times and a set of independent variables that can affect travel times [13]-[15]. Patnaik et al. developed a set of regression models to estimate bus travel times using data collected by automatic passenger counters installed in buses [16]. In general, the applicability of the regression models is limited because variables in transportation systems are highly inter-correlated [17].
Machine learning algorithms are a powerful tool for modeling complicated problems in which the relationship between input and output variables is not clear [18], [19]. The artificial neural network (ANN) is a popular approach to solving complex non-linear problems. The ANN models have shown good performance in predicting bus arrival times [20], [21]. The support vector machine is a specific type of statistical learning algorithm, which has been shown to outperform the historical average time-series methods [22], [23]. Julio et al. implemented and compared different machine learning methods to estimate bus travel speeds using real-time information about traffic conditions [24]. Although machine-learning models perform better than linear regression models, they are difficult to interpret and need a large amount of data to train the system [25].
Kalman filtering is an efficient recursive procedure that estimates the future states of dependent variables. It originated from the state-space representations in modern control theory [26]. Chien et al. [12] as well as Cathey and Dailey [27] introduced Kalman filtering methods to estimate bus travel time by using real-time and historical data. Bai et al. applied the Kalman filtering-based dynamic algorithm to adjust bus travel times with the latest bus operation information and estimated baseline travel times [28]. A model based on the Kalman filtering algorithm needs continuous real-time feeds, and data fluctuations might cause difficulties in solving the time lag problem [14].
Besides the estimation methods, the factors affecting the distribution of travel times have been divided into the road, traffic environment, vehicle, and sudden factors [16]. Geometric conditions, route length, number of intermediate stops, and intersections are usually included in road factors. Traffic environment factors are generally expressed by weather and temperature, while the vehicle and sudden factors include speed, arrival time, departure time, travel time, incidents, and accidents. For a bus travel time estimation system, the primary data sources are tracking devices, such as a GPS. GPS-equipped buses provide a trajectory of the buses along their respective routes throughout the day by updating the location information every 5-10 s [17]. The bus transport agency uses the advanced traveler information systems (ATIS). This is one of the main intelligent transportation systems (ITS) that tracks their vehicles and collects bus operation information. By using ATIS, bus transit agencies have increasingly adopted the automatic vehicle location (AVL) and automatic passenger counter (APC) systems [29]. Moreover, information on travel time, arrival time, departure time, and speed along a route are available for these buses. However, no previous study has incorporated the right-of-way constraints into the estimation of bus travel times, such as the change of lane numbers and the availability of bus lanes.
To fill these knowledge gaps, this study used the regression model, considering the right-of-way and other influential factors, to estimate the mid-volume bus travel times. The most significant advantage of the regression model is that it reveals the significance of variables for estimating travel times. In addition, it estimates the expected travel times and travel time uncertainty. Through the analysis of the right-of-way factors, this study provides some recommendations to promote the mid-volume bus.
The remainder of this study is organized as follows. Section 2 describes the data sources and variables used to develop the travel time estimation model and the data processing procedure. Section 3 describes the methodological approach.
Data used for this study were collected from the 17 km mid-volume No. 71 bus route, known as the "ground subway," that connects Hongqiao Hub, the largest transportation hub, with the Bund, the most prosperous business circle in Shanghai, China. In particular, only the data of the section containing stops 1-5 were used in this study. These stops were monitored from August 26 to September 8, 2018. The selected section covered complex bus right-of-way variations between adjacent stops. For stops 1 to 3, the road has no bus lane, and the number of lanes changes. For stops 3 to 4, only part of the road has a bus lane. For stops 4 to 5, the entire road has a bus lane. The route and selected sections are shown in Fig. 1.
The average daily passenger flow of this line has reached 51,000 people, which ranks first among bus lines in Shanghai. Various factors may affect the uncertainty of travel times, which in turn causes unstable operations. The high arrival frequency of mid-volume buses can make the real travel time of the bus in front between adjacent bus stops unavailable. This is because the headway is often less than the travel time between adjacent bus stops, which means that many buses are running between adjacent stops simultaneously. In this case, the travel time of the bus in front, which partially reflects the road condition, is still missing and cannot be used to estimate the travel time of the bus behind. Therefore, the influence of the road conditions should be considered, especially the bus right-of-way variation, such as the bus lane, the number of lanes, and the intersections. The methods developed in this study will likely provide more precise estimates when applied to routes with lower demand patterns, lower departure frequencies, and more consistent right-of-way conditions.
The No. 71 bus operates using a schedule-based scheme in which buses are dispatched regularly by schedule. Forty buses were generally in operation during the study periods, with scheduled headways ranging from 2-3 min during high peak departures and 4 min during non-peak departures. All buses in the No. 71 fleet are equipped with AVL systems. In these systems, GPS receivers are usually interfaced with the global system for mobile communications (GSM) modems and placed in the buses. The systems basically record point locations in latitude-longitude pairs, date, time, speed, and direction of the buses. The acquisition frequency of the GPS data is 6-12 times/min, which means that the GPS data is updated every 5-10 s.

B. DATA PROCESSING
The goal of this study is to estimate bus travel times between adjacent stops primarily using real-time GPS data. A total of 2,262,911 observations were available for modeling. These data were examined and cleaned to eliminate erroneous or potentially inaccurate information. The data processing procedure is shown in Fig. 2.

1) DATA TRANSFORMING
A stop-based approach considering right-of-way is proposed in this study. Two major technical challenges were addressed. The first is to accurately divide a bus route into sections based on bus stops, because the existing data do not record the times at which buses go "in" and "out" of bus stops. The second is to reflect the impact of right-of-way on mid-volume bus travel times. The central data transformations are described below.
Arrival, departure, and travel time
Arrival and departure time information from the GPS data was considered for the development of the estimation model. First, the distance between a bus and bus stop i was calculated, and observations at distances of less than 50 m were selected. The zero-velocity observations among them were then sorted by time. The first zero-velocity observation corresponds to the arrival time, and the last one corresponds to the departure time. The travel time of bus k between bus stop i and bus stop i + 1 can be expressed as the arrival time at stop i + 1 minus the departure time from stop i, as shown in equation (1). Relevant variable descriptions can be seen in Table 1.
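The stop-matching rule above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the `Ping` record, the function names, and the use of the haversine distance are assumptions introduced here; only the 50 m radius and the first/last zero-velocity rule come from the text.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Ping:
    """One GPS record (hypothetical layout, not the dataset's schema)."""
    t: float      # time in seconds since midnight
    lat: float    # latitude in degrees
    lon: float    # longitude in degrees
    speed: float  # speed in km/h


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = radians(lat1), radians(lat2)
    a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2
    return 2 * r * asin(sqrt(a))


def stop_times(pings, stop_lat, stop_lon, radius_m=50.0):
    """Return (arrival, departure) at a stop, or None if the bus never halted there.

    Arrival is the first zero-velocity ping within radius_m of the stop,
    departure is the last one.
    """
    halted = sorted(
        p.t for p in pings
        if p.speed == 0 and haversine_m(p.lat, p.lon, stop_lat, stop_lon) < radius_m
    )
    return (halted[0], halted[-1]) if halted else None


def travel_time(pings, stop_i, stop_next):
    """Arrival time at stop i + 1 minus departure time from stop i."""
    at_i = stop_times(pings, *stop_i)
    at_next = stop_times(pings, *stop_next)
    if at_i is None or at_next is None:
        return None
    return at_next[0] - at_i[1]


# Made-up coordinates and pings for one bus, for illustration only.
stop_a, stop_b = (31.2000, 121.4000), (31.2000, 121.4100)
pings = [
    Ping(100.0, 31.2000, 121.4000, 0.0),
    Ping(110.0, 31.2000, 121.4000, 0.0),
    Ping(120.0, 31.2000, 121.4050, 30.0),
    Ping(300.0, 31.2000, 121.4100, 0.0),
    Ping(320.0, 31.2000, 121.4100, 0.0),
]
print(travel_time(pings, stop_a, stop_b))
```

A bus that passes a stop without halting produces no zero-velocity ping, so the sketch returns `None` for that pair of stops; how such cases were handled in the study is not stated in the text.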

Right-of-way
Right-of-way is separated into two main aspects: time and space. The spatial right-of-way of the bus is composed of the right-of-way on the road sections and at the intersections. The data transformations for the two aspects are described below.
First, to express the impact of the temporal right-of-way of the bus, a 0-1 variable t is defined.
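As a rough illustration of such an indicator, the sketch below sets t = 1 when a timestamp falls inside a bus lane operating window and t = 0 otherwise. The interpretation that t = 1 means "bus lane in effect" is an assumption, and the windows are hypothetical values borrowed from the rush-hour example in the introduction, not the operating hours of the studied route.

```python
from datetime import time

# Hypothetical bus lane operating windows (start inclusive, end exclusive).
BUS_LANE_WINDOWS = [(time(7, 0), time(9, 0)), (time(17, 0), time(19, 30))]


def temporal_right_of_way(clock, windows=BUS_LANE_WINDOWS):
    """Return 1 if the bus lane is assumed to be in effect at `clock`, else 0."""
    return 1 if any(start <= clock < end for start, end in windows) else 0


print(temporal_right_of_way(time(8, 15)), temporal_right_of_way(time(12, 0)))
```

For a bus lane that is reserved all day, the window list would simply cover the whole service period and t would equal 1 for every observation.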
When it comes to the spatial right-of-way of the bus, three key factors should be considered, namely bus lanes, the number of lanes, and intersections. Roads are segmented based on the right-of-way to ensure that a section has the same number of lanes and bus lane setting. In this study, it is assumed that the bus lane usage time is the same and synchronized.
where i is the identifier for a bus stop. Between adjacent stops i and i + 1, a is the number of sections with a bus lane; b is the number of sections without a bus lane; and j is the number of lanes other than bus lanes in one section. In addition, M represents the equivalent number of non-bus lanes assigned to a bus lane when t = 1. This study assumes that M is large enough for buses to run without disturbance from other vehicles. Specifically, the value of M is set to 5. All variable descriptions are shown in Table 1.

2) DATA SCREENING
First, some observations showed incomplete GPS track information or severe deviation from the bus line. Traffic accidents or GSM system errors may cause these phenomena. Operational interruptions may result from various causes, including driver rests, driver shifts, mechanical breakdowns, and emergencies. When an interruption occurs, travel time estimations are no longer reliable, which has a significant impact on operational continuity. The data were screened to retain the period from 6:30 am to 11:30 pm; observations generated in other periods were deleted. Finally, unreasonably short travel times were also deleted based on the maximum speed of the bus, which was 60 km/h, and the distance between adjacent stops. Table 2 shows descriptions of the variables used in this study. Variables can be divided into two categories: weather and peak time. The standard deviation of the travel time is larger than that of other variables, which indicates that many factors influence the bus travel time.
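The two numeric screening rules (time-of-day window and minimum plausible travel time) can be sketched as a simple filter. The function and argument names are hypothetical, and using the departure time to decide whether an observation falls inside the window is an assumption; the 6:30 am to 11:30 pm window and the 60 km/h maximum speed are taken from the text.

```python
MAX_SPEED_KMH = 60.0                   # maximum bus speed stated in the text
WINDOW_START_S = 6 * 3600 + 30 * 60    # 6:30 am, seconds since midnight
WINDOW_END_S = 23 * 3600 + 30 * 60     # 11:30 pm, seconds since midnight


def min_travel_time_s(distance_m, max_speed_kmh=MAX_SPEED_KMH):
    """Shortest physically plausible travel time over distance_m."""
    return distance_m / (max_speed_kmh / 3.6)  # km/h -> m/s


def keep_observation(departure_s, travel_time_s, distance_m):
    """True if the observation passes both screening rules."""
    in_window = WINDOW_START_S <= departure_s <= WINDOW_END_S
    plausible = travel_time_s >= min_travel_time_s(distance_m)
    return in_window and plausible


# A 1 km section cannot be covered in 45 s at or below 60 km/h.
print(keep_observation(8 * 3600, 45.0, 1000.0))   # rejected
print(keep_observation(8 * 3600, 200.0, 1000.0))  # kept
```

Screening for incomplete tracks and route deviations would need the raw trajectory and the route geometry, which are outside the scope of this sketch.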

3) DATA DESCRIPTION
Because the westward and eastward bus stops are set slightly differently, the selected section was numbered based on trips, as shown in Fig. 3, from trip 0 to trip 5. Clustering travel times by period is vital because travel times between stops usually vary over the time of day, resulting in different travel patterns. The travel time of the six trips (trip 0 to trip 5 are drawn in sequence) every day over two weeks is shown in Fig. 4. Fig. 4 shows that the travel time of trip 0, the first trip of every day, is higher (approximately 1000 s) than that of the other five trips (200 s to 300 s on average) as a result of the longer distance between bus stop 1 and bus stop 3. The travel time distributions over different days of the week seem to be nearly the same. Compared with weekdays, the travel times during weekends are relatively flat.

III. METHODS
This section describes two types of models. By using a stop-based model, linear regression models were considered for comparison purposes because this modeling method has been used extensively in the literature to estimate bus travel times. The AFT survival models were explicitly proposed because they can simultaneously estimate expected travel times and travel time uncertainty [18]. Fig. 5 shows an overview of the proposed approach. First, the inputs required for using the proposed model are real-time GPS data [8], weather data, and road conditions containing bus right-of-way information. Second, through the bus GPS data, including longitude, latitude, and speed information, and the actual stop location information, the bus route is segmented based on the location of the stops, and the time period segmentation is obtained. Then the dynamic travel time estimation model of the stop-based bus considering right-of-way is established based on the AFT survival model.
The travel time of bus k between two adjacent stops i and i + 1 includes the running time, delay on the road, and delay at the intersections [13], [30]. There are many reasons for the delay, such as distance, change of lane number, signal control, and interference from pedestrians and other vehicles. The stop-based model is given in equation (6). To estimate the unknown parameters in (6) considering the right-of-way, the data transformation described in Section 2 can be used.
The distance between adjacent stops represents the running time. The percentage of bus lanes and the weighted number of lanes represent road delays, which reflect the right-of-way of the bus on the road. The number of bus non-priority intersections represents the delay at intersections, which reflects the right-of-way of the bus at the intersections.

A. LINEAR REGRESSION MODEL
The linear regression model is specified as:
$$T_{trav} = \sum_{n} \beta_n x_n + \varepsilon$$
where $T_{trav}$ is the dependent variable to be estimated, $\beta_n$ are the coefficients to be estimated, $x_n$ are the observed explanatory (independent) variables, and $\varepsilon$ is an unobserved random error term associated with each observation. In a linear model, the effects of all variables are constrained to be additive. For example, a unit change in any independent variable $x_n$ corresponds to an additive change in the expected value of the dependent variable by an amount equal to the associated model coefficient $\beta_n$. These model coefficients are typically estimated by using the ordinary least squares (OLS) procedure, which selects the coefficients that minimize the sum of squared errors between the actual observations and those estimated by the model.
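The OLS procedure can be illustrated with a small, dependency-free sketch that solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination. The data are synthetic and noise-free, the variable choice (section length and number of intersections) is only an example, and the leading column of ones is an assumption that adds an intercept; none of the numbers are estimates from this study.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows; include a leading 1.0 in each row for an intercept.
    Assumes X'X is non-singular (no perfectly collinear columns).
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic rows: [1, section length (m), number of intersections].
X = [[1.0, 400.0, 1.0], [1.0, 600.0, 2.0], [1.0, 800.0, 1.0],
     [1.0, 1000.0, 3.0], [1.0, 500.0, 0.0]]
# Travel times generated exactly as 20 + 0.15 * length + 25 * intersections.
y = [105.0, 160.0, 165.0, 245.0, 95.0]

beta = ols(X, y)
print([round(b, 6) for b in beta])
```

Because the synthetic data contain no noise, the recovered coefficients match the generating values, which makes the additive interpretation easy to see: each extra intersection adds the same number of seconds regardless of section length.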

B. ACCELERATED FAILURE TIME SURVIVAL MODEL
The survival model estimates the distribution of the time remaining until the occurrence of a specific event. The event considered in this study is the arrival of a bus at a downstream stop location. An AFT survival model assumes that an independent variable $x_n$ acts multiplicatively on the dependent variable $T$ (i.e., additively on the natural logarithm of $T$) [31]. The model form can be written generally as follows:
$$\log(T_{trav}) = \sum_{n} \beta_n x_n + \mu$$
where $T_{trav}$ is the time before the occurrence of the event and $\mu$ is an unobserved random error term. When there is no censoring of $T_{trav}$, the coefficients $\beta_n$ can be estimated by OLS regression of $\log(T_{trav})$; with censored data, maximum likelihood estimation is applied instead. The distributional form of the unobserved error term $\mu$ determines the regression model. Four typical duration distribution alternatives for $T_{trav}$ are the Weibull, generalized Gamma, log-normal, and log-logistic distributions.
The survival function represents the probability that the event happens after a certain time [32]. An important function in the survival model is the hazard function $h(t)$. It is the instantaneous probability, per unit time, that a bus that has traveled up to time $t$ without arriving will arrive at the stop during a very short interval $(t, t + \Delta t)$. The hazard function can be expressed as follows:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{1 - F(t)}$$
Another essential concept in duration analysis is the survival probability $s(t)$, which gives the probability that the bus has not arrived at the stop by time $t$. The survival function can be expressed as follows:
$$s(t) = P(T \ge t) = 1 - F(t) \qquad (12)$$
where $f(t)$ and $F(t)$ denote the probability density function and the cumulative distribution function of $t$, respectively [33].
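To make the survival and hazard functions concrete, the sketch below evaluates them for the Weibull case, one of the four distributions listed above. It uses the standard Weibull AFT parameterization in which the scale is the exponential of the linear predictor; the coefficient values, the shape parameter, and the single covariate (share of the section with a bus lane) are purely illustrative assumptions and not the fitted values from Table 3.

```python
from math import exp, log


def weibull_scale(beta, x):
    """AFT link: scale = exp(sum of beta_n * x_n)."""
    return exp(sum(b * xi for b, xi in zip(beta, x)))


def survival(t, scale, shape):
    """s(t) = P(T >= t) for a Weibull travel time."""
    return exp(-((t / scale) ** shape))


def density(t, scale, shape):
    """Probability density f(t) of the Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1) * exp(-((t / scale) ** shape))


def hazard(t, scale, shape):
    """h(t) = f(t) / (1 - F(t)) in closed form."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def median_time(scale, shape):
    """Time at which the survival probability equals 0.5."""
    return scale * log(2.0) ** (1.0 / shape)


# Illustrative values only: intercept 5.5, bus lane share coefficient -0.3.
beta, shape = [5.5, -0.3], 2.0
no_lane = weibull_scale(beta, [1.0, 0.0])    # section without a bus lane
full_lane = weibull_scale(beta, [1.0, 1.0])  # section fully covered by a bus lane

for t in (100.0, 200.0, 300.0):
    print(t, round(survival(t, no_lane, shape), 3), round(survival(t, full_lane, shape), 3))
print(median_time(full_lane, shape) / median_time(no_lane, shape))
```

The last line shows the multiplicative reading of an AFT coefficient: moving the covariate from 0 to 1 rescales the median travel time by exp(-0.3), whatever the baseline travel time is.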

IV. REGRESSION RESULTS AND MODEL EVALUATION
Eighty percent (80%) of the data were randomly selected for modeling, and 20% of the data were used to test the model. Both the linear regression models and survival models were developed using the analysis dataset described in Section 2. The mean absolute error (MAE) and root mean squared error (RMSE) were used as the measures of model performance, which represent the average magnitude of the difference between the observed value (in this case the observed arrival times at a bus stop) and the estimated value (in this case the estimated arrival times at a bus stop). In addition, the Akaike information criterion (AIC) is an estimated measure of the quality of each model relative to the others for a given set of data, which makes it well suited to model selection. More specifically, based on the likelihood under the assumed distribution, the AIC balances the trade-off between the complexity of a given model and its goodness of fit:
$$AIC = 2k - 2\log(L)$$
where k represents the number of parameters and log(L) is the log-likelihood, which is a measure of model fit. The higher the log-likelihood, the better the fit; the lower the AIC, the better the model.
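The evaluation measures and the random 80/20 split can be written down directly. The sketch below is a plain-Python illustration with made-up numbers; the function names and the fixed random seed are assumptions introduced here.

```python
import random
from math import sqrt


def split_80_20(rows, seed=0):
    """Randomly split rows into 80% for modeling and 20% for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mae(observed, estimated):
    """Mean absolute error, in the units of the data (seconds here)."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root mean squared error; penalizes large errors more than MAE."""
    return sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed))


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 log(L); lower is better."""
    return 2 * k - 2 * log_likelihood


# Made-up observed and estimated travel times in seconds.
observed = [200.0, 300.0, 1000.0]
estimated = [210.0, 290.0, 1030.0]
print(mae(observed, estimated), rmse(observed, estimated), aic(-100.0, 5))

train, test = split_80_20(range(10))
print(len(train), len(test))
```

In the example, one large error (30 s on the long trip) pulls the RMSE above the MAE, which is the behavior that makes the pair of measures useful for spotting a few poorly estimated observations.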

A. REGRESSION RESULTS
The regression results of the AFT and the linear models are shown in Table 3. A smaller MAE for the survival models indicates that the magnitudes of the error terms for the survival models are generally smaller than the error terms for the linear model. The AIC value for the AFT survival model indicates that the AFT survival model fits the data better than the linear model from a probabilistic perspective. The close values of the RMSE for all the models indicate that there are a small number of observations with larger error terms in the survival models than for the linear model. From these observations, it can be concluded that the survival models provide slightly better overall predictions than the linear model and that the performance of the various forms of the survival models is approximately equivalent. Furthermore, the R-squared value for the linear model is 0.547, which means that approximately 54.7% of the observed variation can be explained by the model's inputs. This result shows that the linear model fits the data only moderately well.
According to the AFT model regression, considering the right-of-way variation, all three variables (percentage of bus lanes, weighted number of lanes, number of non-priority intersections) are significant in influencing the bus travel times. The most significant factor is the percentage of bus lanes followed by the number of non-priority intersections, and the number of weighted lanes. Meteorology, temperature, and humidity have a negative influence on travel time, whereas weekdays and peak times have a positive association with travel time. In contrast to the AFT model, some of the linear model parameter estimates were not consistent with expectations, such as peak time, percentage of bus lanes, the number of weighted lanes, and the number of non-priority intersections.
Two of the most important outputs of the AFT modeling technique are the survival probability and the hazard function. Fig. 6 illustrates how the survival probability and hazard function change as the travel time increases. The survival probability of the AFT model decreases as the travel time grows, while the hazard function increases. Fig. 6 (a) and (b) show that when the travel time is more than 300 s, the survival probability is low, and that the hazard function increases sharply when the travel time is more than 1,000 s. Notably, the larger the percentage of bus lanes is, the faster the survival probability decreases and the hazard function increases, as shown in Fig. 6 (c) and (d).

B. MODEL EVALUATION
After the model estimation, the 20% validation dataset (2,198 samples) was used to compare the estimation capabilities of the linear and survival models. Because the travel time of trip 0 is clearly greater than that of the other five trips, the validation dataset was divided into two groups: trip 0 and trips 1 to 5. Table 4 shows the evaluation of the regression models. The AFT model estimates the travel time more accurately, as indicated by its lower RMSE, MAE, and AIC values. In addition, the AFT model performs better than the linear model in estimating travel times over longer distances. Moreover, if the AFT model is allowed to accommodate missing data, the estimation precision could be improved further. The AFT model can also calculate the cumulative probability that a trip ends within a given travel time.

V. DISCUSSION
The survival model was specifically considered because it could estimate the distribution of travel times. Variables such as the percentage of bus lanes, length, number of weighted lanes, number of non-priority intersections, and weekday and peak times were found to be statistically significant at a confidence level of 0.95 in the models, and their parameter estimates were all consistent with expectations. The categories of the weather variable (clear, cloudy, and rain) were not found to be significant even at a confidence level of 0.60. A possible reason is that, although rainfall and snowfall may have a great impact on bus travel times, the data were collected in the summer: from August 26 to September 8, 2018, it rained for only a few hours, and there was no snow. Linear regression models were also estimated for comparison with the survival model, but they performed less well. In particular, the parameter estimates of some variables did not conform to expectations.
Variables about right-of-way were more important than other variables in the model. On the one hand, when the headway of the mid-volume bus is very small, and many buses are traveling between two adjacent stops at the same time, the travel time of the bus in front cannot be used as a reference to reflect the road conditions. It is necessary to consider various factors of complicated road conditions separately. On the other hand, bus travel times change with right-of-way variations. Take the three right-of-way parameters as an example. The most significant parameter, the percentage of bus lanes, is negatively correlated with travel time and has the highest Wald statistic, suggesting that increasing the proportion of bus lanes can significantly reduce bus travel times. It can be seen from the survival probability in Fig. 6 that, as the percentage of bus lanes increases, the reliability of bus travel times improves, and the probability of arriving at the bus stop in less time increases. Similar to the first parameter, the coefficient of the number of weighted lanes is also negative. This indicates that, with more lanes on a road without a bus lane or with more bus lanes on the road, bus travel times will be shorter. Finally, there is a positive correlation between bus travel times and the number of non-priority intersections, the third parameter, which implies that the more non-priority intersections there are, the longer the travel time is. The uncertainty of bus intersection delays can be reduced by green wave controls at the intersections.
For mid-volume buses with high travel time reliability requirements, it would be helpful if bus temporal and spatial right-of-way can be ensured by establishing bus lanes and increasing dedicated time for bus lanes. However, due to the restrictions of road red line and the limitation of the number of lanes, it is generally impossible to expand the capacity of the road by adding lanes or changing one lane into the bus lane. Therefore, it is suggested that high occupancy vehicle lanes be built for buses and vehicles with multiple passengers. In addition, mid-volume buses are usually on schedule-based routes, and the actual headway is usually less than the scheduled headway, which may contribute to the condition that two buses arrive at the same stops within a very short time. It is suggested that the signal controls at the intersections can be used to adjust the actual headway of buses.
The estimation of bus travel times between two adjacent bus stops uses real-time data and GPS data. By combining the longitude and latitude information of the bus with that of the stops, the arrival and departure time data of the bus can be obtained. Subsequently, the AFT model is used to process the obtained data and provide travel time information at the next stops with minimal error. Apart from this, the static data, right-of-way information, and variables about time, peak time, and weekday reflect the traffic condition of the road. The model will improve the reliability of the public transport system and assist passengers in planning their trips to minimize waiting time, thus attracting more travelers to bus trips and helping relieve traffic congestion.

VI. CONCLUSIONS
This study investigated the problem of estimating mid-volume bus travel times between two adjacent stops with real-time traffic information. The AFT survival model was applied to estimate the bus travel times to the next stop as a function of bus GPS data, right-of-way information, and weather data that would be available in real time. A stop-based bus route was built to divide the road into sections considering two aspects of right-of-way variation: time and space. The spatial right-of-way of the bus, involving both road sections and intersections, is influenced by three primary factors, namely, the percentage of bus lanes, the weighted number of lanes, and the number of non-priority intersections. The travel times between adjacent stops were then divided into running time, road delay, and intersection delay. The AFT model was chosen because it reveals the significance of variables in estimating travel times as well as simultaneously estimating expected travel times and travel time uncertainty. Compared with the linear model, the parameter estimation by the AFT model conformed well to expectations, and the AFT model performed better in the regression estimation of travel times for longer distances. The methods developed in this study will likely provide more precise estimations when applied to routes with low demand patterns, low departure frequencies, and more consistent road conditions. The findings of this study could improve the accuracy of estimations of bus travel times and could be used in various ITS applications, such as passenger information systems.
One of the limitations of this study is that the data were collected in August and September, when there was no snow and little rain in Shanghai, so the estimation model may not extend to other seasons. In the future, the travel time in other seasons can be modeled more accurately by considering the effects of rain and snow. Another limitation is that the bus lane for the No. 71 mid-volume bus is in use 24 h a day, so the impact of bus lanes used only during certain periods on travel time remains to be explored. In addition, this study only estimated the mid-volume bus travel times between two adjacent stops. A long-period study might increase the inaccuracy of the results because of fluctuations in traffic volume and accumulation of deviations between the actual and scheduled headways. Future studies will extend to the estimation of travel times to any downstream bus stop by allowing for the individual bus delay at stops. Such a problem will become more challenging when data on the large numbers of passengers boarding and alighting from mid-volume buses are not available. Due to the limited data, this research does not focus on model evaluation and simulation to verify the validity and superiority of the survival model. By collecting data from multiple bus lines in multiple cities, future studies can focus on verifying the uncertainty of the model prediction, the factors affecting the accuracy, and the sensitivity of the prediction.