Estimate Passengers’ Walking and Waiting Time in Metro Station Using Smart Card Data (SCD)

Passengers’ walking time and waiting time at metro stations are important indicators for evaluating the level of service (LOS) of a metro station. This paper aims to estimate passengers’ walking time and waiting time at metro stations based on a proposed passenger-to-train assignment method using smart card data (SCD). Firstly, two algorithms were developed to determine the possible train choice set for all passengers. Secondly, passengers’ walking time distribution was estimated for different groups of passengers categorized by travel period, cardholder type and social demographics. Lastly, passengers’ waiting time at origin stations was explored based on the train choice model proposed in this study. The study found that passengers traveling at peak hours tend to walk faster than they do at non-peak hours. The overall waiting time at origin stations is highly related to the train headway (the time interval between adjacent trains arriving at the same station in the same direction) and passenger volume. More importantly, the results suggest that when passenger volume is low, train headway significantly affects waiting time; however, when passenger volume exceeds a certain level (more than 1000 passengers in 15 minutes), volume becomes the major factor increasing waiting time. Elderly and disabled passengers experience longer waiting times during peak hours than other passengers. The proposed method offers a new perspective on using SCD in travel behavior analysis. In addition, the results provide insights for future metro station construction and operation design, as well as for evaluating the LOS of metro systems.


I. INTRODUCTION
The metro system, with its high efficiency and sustainability, is one of the most popular public transportation modes and has been widely developed in many cities in China. Compared with other public transportation systems, the metro system provides more reliable travel time owing to its highly organized and independent operation. In Beijing, the daily maximum passenger volume of the metro system has reached ten million since 2018 [1]. Passengers' experience in the metro system is closely related to architectural design and the system's level of service (LOS). To enhance passenger satisfaction, metro system designers and operators should focus on improving the service level from different perspectives.
The associate editor coordinating the review of this manuscript and approving it for publication was Roberto Sacile.
Passengers' waiting time at the origin station is an important indicator of the operational performance of the metro system. Previous studies have pointed out that passengers are more sensitive to waiting time than to in-vehicle time [2]-[4]. Waiting time is related to several factors, such as station environment, time of day and travel purpose. It has been reported that random passengers (who do not plan their arrival time) are more likely to arrive during non-peak hours, whereas passengers traveling at peak hours tend to experience shorter waiting times [5], [6]. A recent study found that station auxiliary facilities, such as shelters and benches, affect passengers' perception of waiting time [3].
Although waiting time plays an important role in LOS evaluation, challenges remain in obtaining it in practice. Many public transit assignment models, especially frequency-based models, assume that passengers arrive randomly and that the average waiting time equals half of the headway in both bus and metro systems [7]-[13]. Some studies investigating passengers' arrival patterns have also applied a simple linear relationship between waiting time and headway [14]. However, other studies argued that passengers should not all be treated as random arrivals [15]. With open access to public transit timetables, researchers have explored the threshold between arranged arrivals (passengers who time their arrival according to the train schedule to minimize waiting time) and random arrivals. Some researchers suggest that a headway of 10 to 11 minutes is the dividing line between random and nonrandom passengers, and that passengers tend to arrange their arrival time to minimize waiting time when the headway exceeds 38 min [16]-[18].
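The half-headway rule holds exactly only when headways are constant. For a passenger arriving uniformly at random, the standard random-incidence result is $E[W] = \frac{E[H]}{2}\left(1 + \frac{\mathrm{Var}(H)}{E[H]^2}\right)$, so irregular headways lengthen the expected wait. The Python sketch below computes this expected wait from a list of departure times; the function name and the departure times are hypothetical illustrations, not part of the study.

```python
def expected_random_wait(departures):
    """Expected wait for a passenger arriving uniformly at random between the
    first and last departure in `departures` (sorted ascending, same time unit)."""
    headways = [b - a for a, b in zip(departures, departures[1:])]
    if not headways or any(h <= 0 for h in headways):
        raise ValueError("need at least two strictly increasing departure times")
    # Arriving inside a headway of length h gives an average wait of h / 2, and the
    # chance of landing in that headway is proportional to h: sum(h^2) / (2 * sum(h)).
    return sum(h * h for h in headways) / (2 * sum(headways))


# Regular 6-min headways reproduce "half the headway".
print(expected_random_wait([0, 6, 12, 18]))  # 3.0
# Same 6-min mean headway, but irregular (2 and 10 min): (4 + 100) / 24
print(expected_random_wait([0, 2, 12]))
```

With equal headways the variance term vanishes and the result collapses to $h/2$, which is the assumption used by the frequency-based models cited above.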
The conventional methods for directly obtaining passengers' waiting time are field observation and personal surveys. In recent decades, the widespread use of smart cards for public transport has brought a new perspective to waiting time estimation. Smart card data (SCD), which record travel information at both the entry and exit gates of the metro system, minimize the human role in data collection [19]. Ingvardson et al. developed a method to estimate waiting time from smart card data by assigning each traveler to the first train running between the origin and destination stations [20]. Specifically, the study counted waiting time as the period from tap-in to the departure of the assigned train. This overestimates waiting time, because passengers' walking time from the entry gate to the platform is included in it.
Walking time, defined as the time from tapping the card at the entry gate to reaching the platform, is a period distinct from waiting time and another important indicator that should be considered in every trip from entry to exit. For passengers, the walking time needed to reach the train directly influences their train choice and their waiting time on the platform. For metro system operators, walking time reflects the rationality of station design and provides valuable information for improving system LOS.
Previous studies have applied different methods to estimate passengers' walking time. Some studies assumed that passengers with the shortest travel time for each origin-destination (OD) pair have zero waiting time and applied regression models to estimate the total walking time of the trip [21], [22]. However, defining the "shortest" travel time is a practical problem. Therefore, field surveys have commonly been used to collect walking time information. Sun and Xu selected eight stations with a sample size of 40 passengers to determine the proper distribution of walking time [23]. Chen et al. analyzed passenger travel trajectories and estimated walking time through a field survey [24]. Similarly, some studies explored route choice behavior or passenger flow assignment using a fixed walking time determined from survey data [25], [26]. In addition, some studies obtained walking time by dividing walking distance by walking speed. For example, Zhu et al. divided the walking distance obtained from station plans by an average walking speed of 1.12 m/s (derived from agency observation) to obtain the exit time distribution [27], [28]. Zou et al. calculated access and egress walking times from the walking distance and the average walking speed [29].
Traditional methods that rely on surveys or observations to obtain passengers' walking and waiting times are usually time-consuming and labor-intensive. With the availability of smart card data (SCD), this study provides a new approach to estimating passengers' walking and waiting times. Specifically, the study has four objectives: 1) to deduce passengers' train choice set using smart card data and the train timetable; 2) to estimate passengers' walking time and waiting time at a given metro station; 3) to explore the walking time distribution across time of day (peak vs. non-peak hours), passenger characteristics (elder & disabled vs. normal passengers) and cardholder type (long-term vs. one-time); and 4) to explore the relationship among train headway, passenger volume and waiting time. Following these objectives, several assumptions need to be clarified before proceeding. These assumptions are: 1) all passengers arrive at the station randomly; 2) passengers entering a station share the same walking path as passengers exiting the station; 3) all trains run punctually in accordance with the timetable; 4) passengers walk out of the destination station without lingering after they get off the train; and 5) capacity constraints are not considered when trains are assigned.
Section 2 introduces the concept of travel trajectory in the metro system and defines passengers with a single feasible train choice. Section 3 presents the design of the passenger-to-train assignment algorithms, which determine the possible train choice set for all passengers, together with the methodology for walking time estimation, train selection and waiting time estimation. Section 4 presents the estimated walking time, waiting time and passenger delay for four selected stations in the Nanjing Metro system. Section 5 discusses practical applications, and Section 6 draws conclusions.

II. CONCEPTS IN THE STUDY
A. TRAVEL TRAJECTORY AND TIME PERIODS
Determining travel trajectories in a metro system is relatively simpler than in a roadway system, especially with the information provided by smart card data (SCD). A metro station can be divided into two main areas, i.e., the pay zone and the free zone. SCD cover only the time spent within the pay zone: when passengers swipe the smart card at the station entry or exit gate, the station ID and entry/exit time are recorded. In general, travel time without transfer contains four parts: (1) access walking time, (2) waiting time on the platform, (3) in-vehicle time and (4) egress walking time. Figure 1 illustrates the spatiotemporal travel process of a passenger from entry to exit without transfer. In this particular situation, two trains (No. 2 and No. 3) operate within the interval between tap-in and tap-out.
Because SCD provide limited time information within the station, it is difficult to determine which train a passenger boarded. However, combining the train operation timetable with SCD allows more information to be extracted. Figure 2 illustrates the time periods of traveling in the metro system. Four types of activities occur in four periods, and the time nodes separating them can be determined once the boarded train is identified.
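Once a boarded train is fixed, the tap-in and tap-out records split into the periods described above. The Python sketch below assumes times in seconds after midnight, and its class and field names are hypothetical. Note that it can only return walking plus waiting as a single access period; separating the two requires an estimate of walking time.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    tap_in: int   # entry gate time at the origin (s after midnight)
    tap_out: int  # exit gate time at the destination (s after midnight)


@dataclass
class Train:
    dep_origin: int  # scheduled departure from the origin platform
    arr_dest: int    # scheduled arrival at the destination platform


def split_trip(trip: Trip, train: Train) -> dict:
    """Split a transfer-free trip into the periods fixed by the boarded train."""
    access = train.dep_origin - trip.tap_in  # access walking + platform waiting
    in_vehicle = train.arr_dest - train.dep_origin
    egress = trip.tap_out - train.arr_dest   # egress walking
    if access < 0 or egress < 0:
        raise ValueError("train is not feasible for this trip")
    return {"access": access, "in_vehicle": in_vehicle, "egress": egress}


print(split_trip(Trip(tap_in=28800, tap_out=30000), Train(dep_origin=29100, arr_dest=29820)))
# {'access': 300, 'in_vehicle': 720, 'egress': 180}
```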

B. PASSENGERS WITH SINGLE FEASIBLE TRAIN CHOICE
The travel time between the same origin-destination (OD) stations can vary considerably among passengers. Figure 3 describes two common situations of traveling between an OD pair. In situation 1, the passenger has only one feasible train choice within the travel period. In situation 2, as the travel time extends, more train choices become available. Passengers with a single feasible train choice can be regarded as a group with minimum waiting time and optimized travel time between any OD pair. If we assume that passengers walk out of the station without lingering (assumption 4), which largely holds in practice, and combine this with assumption 2, the egress walking time can represent the access walking time at the same station. With the access walking time available, passengers' waiting time on the platform can then be obtained.

III. METHODOLOGY
The methodology consists of three parts. The first part (3.1 & 3.2) describes the two algorithms developed to form the train choice set for all passengers. Meanwhile, the

A. NOTATIONS
The notations and parameters applied in passenger-to-train algorithms are listed in Table 1.
Parameter calculation equations:

B. PASSENGER-TO-TRAIN ALGORITHMS
To identify the possible train set for passengers, two algorithms are developed based on different assumptions and constraints. The first algorithm assumes that passengers board the first arriving train to minimize their access time at the origin station, without considering train capacity constraints (equation 5).
Accordingly, the selected train must satisfy the constraint in equation 6, which requires the egress time of passengers on the assigned train to be non-negative.
If the constraint is satisfied, the train is selected and included in the set $H_{S1}$.
The second algorithm assumes that passengers exit the destination station without lingering (equation 7). Passengers are therefore assigned to the last train arriving before they tap out at the destination station, again without considering train capacity constraints.
Similarly, the selected train must satisfy the constraint that the access time is non-negative (equation 8).
If the constraint is satisfied, the train is selected and included in the set $H_{S2}$.
Using the two algorithms, the assigned train number and the passenger's access time and egress time for the selected train can be obtained (equations 2 and 3).
The flowchart of the algorithms is shown in Figure 4.
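The two assignment rules can be sketched for a single OD pair as follows, assuming the timetable lists trains in departure order and that trains do not overtake one another. The function names and the (departure, arrival) tuple layout are hypothetical illustrations; equations 5 to 8 themselves are not reproduced here.

```python
from typing import List, Optional, Tuple

# Each train: (scheduled departure at origin, scheduled arrival at destination), in seconds.
Timetable = List[Tuple[int, int]]


def first_train(tap_in: int, tap_out: int, trains: Timetable) -> Optional[int]:
    """Algorithm 1 sketch: take the first train departing after tap-in and keep it
    only if its egress time (tap_out - arrival) is non-negative."""
    for idx, (dep, arr) in enumerate(trains):
        if dep >= tap_in:
            return idx if tap_out - arr >= 0 else None
    return None


def last_train(tap_in: int, tap_out: int, trains: Timetable) -> Optional[int]:
    """Algorithm 2 sketch: take the last train arriving before tap-out and keep it
    only if its access time (departure - tap_in) is non-negative."""
    for idx in range(len(trains) - 1, -1, -1):
        dep, arr = trains[idx]
        if arr <= tap_out:
            return idx if dep - tap_in >= 0 else None
    return None


trains = [(28800, 29520), (29100, 29820), (29400, 30120)]
print(first_train(28900, 30300, trains), last_train(28900, 30300, trains))  # 1 2
```

Here the passenger could have boarded either train 1 or train 2, so the two rules disagree; a `None` result means no train satisfies the corresponding constraint.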

C. ESTIMATION OF WALKING TIME DISTRIBUTION
The passenger train choice set can be deduced from the two algorithms. The first algorithm determines the first feasible train within the travel period, while the second determines the last feasible train. Combining the two identifies passengers for whom both algorithms return the same train, namely, passengers with a single feasible train choice. By assumption 2, their egress walking time provides valuable information for estimating the overall average walking time.
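If trains on the line keep their order (no overtaking), the first-train and last-train rules return the same train exactly when only one train both departs after tap-in and arrives before tap-out. The Python sketch below uses that equivalent check and collects the egress times of such passengers as walking-time samples. The trip records and function names are hypothetical.

```python
from typing import List, Tuple


def feasible_trains(tap_in: int, tap_out: int, trains: List[Tuple[int, int]]) -> List[int]:
    """Indices of trains departing the origin after tap-in and arriving before tap-out."""
    return [i for i, (dep, arr) in enumerate(trains) if dep >= tap_in and arr <= tap_out]


def walking_samples(trips: List[Tuple[int, int]], trains: List[Tuple[int, int]]) -> List[int]:
    """Egress times of passengers with a single feasible train choice; under
    assumptions 2 and 4 these serve as samples of station walking time."""
    samples = []
    for tap_in, tap_out in trips:
        choice = feasible_trains(tap_in, tap_out, trains)
        if len(choice) == 1:
            _, arr = trains[choice[0]]
            samples.append(tap_out - arr)
    return samples


trains = [(28800, 29520), (29100, 29820), (29400, 30120)]
trips = [(28700, 29700), (28900, 30300), (29000, 30060)]
print(walking_samples(trips, trains))  # [180, 240]; the second trip has two feasible trains
```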
For a given walking distance, a lognormal distribution is adopted for egress walking time, since walking speed in metro stations is usually modeled as normally or lognormally distributed [30]-[33]. Maximum Likelihood Estimation (MLE) is applied in this study to obtain consistent and asymptotically efficient estimates of the population mean and variance of walking time.
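For lognormal data the maximum likelihood estimates have a closed form: $\hat{\mu}$ is the mean of the log walking times and $\hat{\sigma}^2$ is their variance with divisor $n$ (not $n-1$). The minimal Python sketch below computes both and converts them back to the mean and variance of walking time. The sample values (in seconds) and function names are hypothetical.

```python
import math
from typing import List, Tuple


def lognormal_mle(samples: List[float]) -> Tuple[float, float]:
    """MLE of (mu, sigma^2) for lognormal data: mean and population variance of ln(t)."""
    if len(samples) < 2 or any(t <= 0 for t in samples):
        raise ValueError("need at least two positive walking times")
    logs = [math.log(t) for t in samples]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((x - mu) ** 2 for x in logs) / n  # divisor n: the MLE, slightly biased
    return mu, sigma2


def lognormal_moments(mu: float, sigma2: float) -> Tuple[float, float]:
    """Mean and variance of walking time implied by (mu, sigma^2)."""
    mean = math.exp(mu + sigma2 / 2)
    var = (math.exp(sigma2) - 1) * math.exp(2 * mu + sigma2)
    return mean, var


mu, s2 = lognormal_mle([120, 150, 180, 200, 260, 310])
mean, var = lognormal_moments(mu, s2)
print(f"mu={mu:.3f}, sigma^2={s2:.3f}, mean walking time={mean:.1f} s")
```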
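VOLUME 8, 2020

Let $t_1, \dots, t_n$ denote the observed egress walking times of passengers with a single feasible train choice at a station. The maximum likelihood function is

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{t_i \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln t_i - \mu)^2}{2\sigma^2}\right) \tag{9}$$

Taking the logarithm of the likelihood function gives

$$\ln L(\mu, \sigma^2) = -\sum_{i=1}^{n} \ln t_i - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (\ln t_i - \mu)^2 \tag{10}$$

Differentiating with respect to $\mu$ and $\sigma^2$ and setting the derivatives to zero gives

$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(\ln t_i - \mu) = 0, \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(\ln t_i - \mu)^2 = 0 \tag{11}$$

Solving (11) yields (12) and (13):

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln t_i \tag{12}$$

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (\ln t_i - \hat{\mu})^2 \tag{13}$$

The mean and variance of the lognormal distribution are

$$E(t) = \exp\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right) \tag{14}$$

$$\mathrm{Var}(t) = \left(\exp(\hat{\sigma}^2) - 1\right)\exp\left(2\hat{\mu} + \hat{\sigma}^2\right) \tag{15}$$

where $\ln t \sim N(\mu, \sigma^2)$. The train is selected when the egress walking time has the highest estimated value in equation 16.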
where $c^{s_o}_{walking}$ represents the mean walking time at origin station $s_o$ obtained by MLE, and $t^{access}_{p,h}$ is the access time of passenger $p$ assigned to train $h$.
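For passengers with several feasible trains, one reading of the selection rule is to pick the train whose implied egress time is most probable under the fitted lognormal walking-time density, and then to take waiting time as access time minus the station's mean walking time. The Python sketch below follows that reading only as an assumption; the exact form of equation 16 may differ, and the parameter values and function names are hypothetical. Individual estimates can come out negative for unusually fast walkers and are left unclipped so that they still average correctly.

```python
import math
from typing import List, Optional, Tuple


def lognormal_pdf(t: float, mu: float, sigma2: float) -> float:
    """Lognormal density of walking time t with log-mean mu and log-variance sigma2."""
    if t <= 0:
        return 0.0
    return math.exp(-(math.log(t) - mu) ** 2 / (2 * sigma2)) / (t * math.sqrt(2 * math.pi * sigma2))


def assign_and_wait(tap_in: int, tap_out: int, trains: List[Tuple[int, int]],
                    mu: float, sigma2: float, mean_walk: float) -> Optional[Tuple[int, float]]:
    """Pick the feasible train whose egress time has the highest density and
    return (train index, access time minus mean walking time)."""
    feasible = [i for i, (dep, arr) in enumerate(trains) if dep >= tap_in and arr <= tap_out]
    if not feasible:
        return None
    best = max(feasible, key=lambda i: lognormal_pdf(tap_out - trains[i][1], mu, sigma2))
    access = trains[best][0] - tap_in
    return best, access - mean_walk


mu, s2 = math.log(180), 0.1              # hypothetical fitted parameters
mean_walk = math.exp(mu + s2 / 2)        # mean walking time implied by (mu, s2)
trains = [(28800, 29520), (29100, 29820), (29400, 30120)]
print(assign_and_wait(28900, 30300, trains, mu, s2, mean_walk))
```

In the usage line both train 1 (egress 480 s) and train 2 (egress 180 s) are feasible. Train 2 is chosen because 180 s is far more plausible as a walking time.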
To evaluate the average waiting time in different time periods, the entry times recorded in the smart card data are divided into 15-minute slices from 6:00 to 24:00. The average waiting time is calculated by equation 18, where $c$ is the estimated mean walking time and $N_p$ is the total number of entering passengers in a 15-minute slice.
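The slicing and averaging can be sketched as follows. The sketch assumes that each passenger's tap-in time and access time (in seconds) are already known from the assignment step and takes the estimated mean walking time $c$ as given. The record layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SLICE = 15 * 60                    # 15-minute slices, in seconds
START, END = 6 * 3600, 24 * 3600   # 6:00 to 24:00


def average_waiting(records: List[Tuple[int, float]], c: float) -> Dict[str, float]:
    """records: (tap-in time in s after midnight, access time in s).
    Returns the mean of (access - c) for every 15-min slice that has passengers."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for tap_in, access in records:
        if START <= tap_in < END:
            k = (tap_in - START) // SLICE
            sums[k] += access - c
            counts[k] += 1
    out = {}
    for k in sorted(counts):
        begin = START + k * SLICE
        label = f"{begin // 3600:02d}:{begin % 3600 // 60:02d}"
        out[label] = sums[k] / counts[k]  # counts[k] plays the role of N_p
    return out


print(average_waiting([(28800, 300), (29000, 260), (29800, 400)], c=190.0))
# {'08:00': 90.0, '08:15': 210.0}
```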

IV. RESULTS
Four stations in the Nanjing Metro system are selected to present the estimation results. They are categorized into small-scale and large-scale stations according to their total entry and exit passenger volume on a workday. The results are presented in four parts: (1) a brief description of the input data; (2) walking time estimation results for the four selected stations; (3) waiting time estimation results for the four selected stations; and (4) estimation of passengers' total delay at the origin stations.

A. DATA DESCRIPTION
The data used in this study were provided by Nanjing (China) Metro Group Company, Ltd. in 2014. Figure 5 shows a simplified Nanjing Metro network in 2014. It consisted of five lines, one of which (S8) was disconnected from the other four. There were 88 stations on these lines, and over 1500 trains operated every day. Data from four stations on a typical workday (November 3, 2014) were used in this study, and trips with transfers were excluded. By applying the algorithms proposed in the previous section, the possible train set for each trip was determined. Data descriptions of the selected stations are listed in Table 2. Specifically, station 14 is close to Nanjing train station, and station 31 is near one of the most famous tourist attractions in Nanjing. Trips generated and terminated at each station consist of two types of passengers: passengers with a single feasible train choice (1) and passengers with more than one train choice within their travel period (>1).

B. PASSENGERS' WALKING TIME DISTRIBUTION
Walking time estimation was based on passengers with a single feasible train choice. Three potential factors relevant to the walking time distribution in stations of different sizes are explored, namely, passenger characteristics (elder/disabled passengers vs. normal passengers), travel period (non-peak vs. peak) and cardholder type (long-term vs. one-time). Specifically, the long-term card refers to the stored-value card, and the one-time card refers to the single-use recycled card.
The walking time distributions of the four stations and box plots for different passenger groups are shown in Figure 6. The estimation results suggest that most passengers spend two to five minutes walking. The long tail in large-scale stations indicates that a small group of passengers walks much more slowly than the others.
In small-scale stations, average walking time differs noticeably between passenger groups, whereas the differences in large-scale stations are very small. In most stations, normal passengers who hold long-term cards and travel in peak hours have the shortest average walking time. Moreover, elder/disabled passengers and passengers with one-time cards traveling in non-peak hours have longer walking times than other groups. To explore the variance within and between groups, the walking time distributions are compared in Figure 7, from which three major tendencies can be observed. Firstly, elder/disabled passengers mostly walk at an average-level speed. Secondly, the walking time distribution during peak hours is more concentrated than that in non-peak hours. Lastly, passengers holding one-time cards tend to have longer walking times than passengers holding long-term cards.

C. PASSENGERS' WAITING TIME
Waiting time estimation is based on the train assignment method proposed in the previous section. The average waiting times for all passengers and for elder/disabled passengers are presented in Figure 8. The results show that waiting time is greatly affected by train headway and passenger volume. In most situations, the average waiting time decreases as train headway is reduced. However, when the headway exceeds 6 minutes, the overall waiting time increases remarkably. In addition, large passenger volume contributes to long waiting time even when the headway is small. For example, at Station 14, when the passenger volume exceeds 1500 within 15 minutes, the average waiting time increases to 2 minutes, which exceeds half of the train headway. Station 16 shows a worse situation: when the passenger volume exceeds 3000 within 15 minutes, the average waiting time is longer than the headway. This means that some passengers are unable to board the first arriving train due to overcrowding.
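The reasoning in this paragraph compares the estimated average wait with the headway. Under random arrival, regular headways and unlimited capacity, the average wait should stay near half the headway. A wait above half the headway suggests that some passengers are left behind, and a wait longer than the full headway implies that some cannot board the first arriving train. The small Python helper below applies those thresholds to a time slice; its name and labels are hypothetical.

```python
def crowding_flag(avg_wait_min: float, headway_min: float) -> str:
    """Classify a time slice by comparing its average waiting time with the headway."""
    if headway_min <= 0:
        raise ValueError("headway must be positive")
    if avg_wait_min > headway_min:
        return "some passengers miss the first train"
    if avg_wait_min > headway_min / 2:
        return "above random-arrival expectation"
    return "consistent with random arrival"


print(crowding_flag(2.0, 3.0))  # above random-arrival expectation
```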
As shown in Figure 8, elder/disabled passengers are more likely to wait longer during peak hours. This is probably because elder/disabled passengers are more willing than other passengers to extend their waiting time to increase their chance of getting a seat.
Figure 8 also shows that headway strongly influences average waiting time, especially when passenger volume is low. However, once passenger volume reaches a critical point, shortening the train headway no longer substantially reduces the average waiting time. To identify this critical point, the relationship between passenger volume and average waiting time at the four stations is explored and presented in Figure 9. The results suggest that when more than 1000 passengers enter within 15 minutes, waiting time starts to increase even though the train headway is small.

D. PASSENGERS' DELAY AT ORIGIN STATIONS
The LOS of a metro system can be enhanced by reducing passengers' walking distance in the station and their waiting time on the platform. In this study, the total time of walking and waiting (access time) in the metro station is used as the passengers' delay in reaching the train. To examine differences in passenger delay between passenger groups and stations, the Mann-Whitney U test is adopted because the data are not normally distributed. The results are listed in Table 3. According to the results, passenger characteristics, travel period, cardholder type and station scale all have a significant impact on passengers' delay. Elder/disabled passengers have significantly longer delays than normal passengers. People travelling at peak hours have shorter delays than those travelling at non-peak hours, and long-term card users have shorter delays than one-time card users. Moreover, large-scale stations are associated with significantly shorter delays than small-scale stations.
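The Mann-Whitney U test compares two independent samples through ranks, so it does not require normally distributed delays. The pure-Python sketch below computes U with average ranks for ties and a two-sided p-value from the large-sample normal approximation. It omits the tie and continuity corrections, so a statistics library such as SciPy's `scipy.stats.mannwhitneyu` is the more robust choice for real analyses. The function name and sample values are hypothetical.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a run of tied values
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of N(0, 1)
    return u1, p


# Hypothetical access delays (min) for two passenger groups.
print(mann_whitney_u([5.1, 6.3, 7.0, 8.2, 6.8], [3.9, 4.4, 5.0, 5.5, 4.1]))
```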

V. DISCUSSIONS
Smart card data in metro systems contain a substantial amount of relatively accurate information on passengers' spatiotemporal movement within the system. The algorithms developed in this study make full use of SCD and the scheduled train timetable to provide valuable information (e.g., passenger walking time and waiting time) for metro system LOS evaluation. The algorithms are based on several reasonable assumptions that have also been adopted in previous research [34]-[36]. Table 4 compares the passenger-to-train methods proposed in different studies.
Previous studies suggested that passengers with a single feasible train choice represent only passengers with faster walking speeds [27], [28]. In this study, passengers with a single feasible train choice account for about half of the travelers, a considerable share. Moreover, the average walking time deduced from this group is around two to five minutes, which is consistent with the estimated walking time in the Beijing Metro network [24]. This result suggests that passengers with a single feasible train choice benefit from a short waiting time on the platform rather than from a fast walking speed. Another study, conducted in Singapore, reported a much shorter walking time than the present study, 20 to 70 seconds [22]. We suspect that this large difference in walking time depends mainly on station design in different cities/countries rather than on passengers' walking speed.
Passengers with different characteristics, holding different types of cards and travelling in different periods and stations, show different walking patterns. During peak hours, passengers tend to walk faster than they do at non-peak hours. This may be because peak-hour travelers are mostly commuters who are familiar with the route and have strict time limits, so a short walking time can reduce their overall travel time to some extent.
Elder/disabled passengers travelling at non-peak hours were the group with the longest walking time. Passengers holding one-time recycled cards walk more slowly than long-term card holders, probably because they are unfamiliar with the route. Large-scale stations handle more trips than small-scale stations, and the differences between passenger groups are accordingly reduced in large-scale stations.
The waiting time estimation results suggest that train headway is a critical factor, and a long headway can greatly extend the overall waiting time. In addition, the increase in passenger volume during peak hours could increase the overall waiting time, but the increment is not remarkable. A possible reason is that more frequent train arrivals during peak hours partly offset this tendency; a similar finding was reported by Sun and Xu [23]. However, a critical threshold of entering passenger volume (1000 in 15 minutes) was found, above which waiting time starts to increase even though the train headway is short.
Overall access time differs between passenger groups, and specific groups of passengers experience longer delays than others. For example, elder/disabled passengers and passengers holding one-time cards are more likely to spend a longer time at origin stations. Furthermore, passengers travelling at peak hours and at large-scale stations experience shorter delays than those travelling at non-peak hours and at small-scale stations.
Overall, the results of this study have several practical implications for metro station design, train operation and passengers' travel planning. Walking time is an important measure for evaluating the rationality of a station design. The study suggests that 2-5 min is an average range for walking time, although it also depends on passenger features and station design. The walking time estimation results in this study indicate that some stations in the Nanjing Metro system need improved access facilities and guide signs to reduce access time for special groups of passengers.
Passengers' waiting time should be managed by considering both headway and passenger volume. Meanwhile, elder/disabled passengers, as a vulnerable group, deserve more attention, since they have longer walking and waiting times than other passengers, especially during peak hours.
Some limitations of this study should be mentioned. Firstly, there is always a minor gap between the scheduled timetable and actual operation times, so using the scheduled timetable may bias the estimation results. The actual operation timetable (if available) should be applied in future studies to improve the results. Secondly, the amount of SCD input could be enlarged; larger inputs can improve estimation accuracy and decrease the variance within each group. Thirdly, the walking time distribution captured in this study represents only the average speed level, because the passengers with a single feasible train choice tend to exclude people with slower walking speeds. Moreover, the walking time in this study covers only the distance from the entry gate to the platform due to the lack of information in SCD. Lastly, the assumption of neglecting train capacity constraints should be relaxed in future studies.

VI. CONCLUSION
In summary, this study developed two algorithms to determine the train choice set for all passengers. The algorithms identified passengers with only one possible train choice within the travel period and provided dependable estimates of station walking time and waiting time. With the detailed information provided by SCD, the differences in walking time in relation to travel periods and personal characteristics were investigated. Moreover, based on the walking time estimation, a new train assignment method was proposed in this study. The assignment results provided explicit travel information for all passengers, including their waiting time on the platform. The results revealed the effects of passenger volume and headway design on overall waiting time and demonstrated differences in travel behavior related to passenger attributes and travel period.
The application of SCD in this study brings a new perspective to passenger-to-train assignment analysis. The findings have implications for the practical evaluation of metro system operation and management performance, and offer reference information for timetable optimization and metro station design. Future work will focus on developing an integrated train assignment method for the metro network that accounts for transferring passengers.

CONFLICTS OF INTEREST
The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.
WANJUN LI received the B.S. and M.S. degrees in civil engineering from Iowa State University, Ames, IA, USA. She is currently pursuing the Ph.D. degree with the School of Traffic and Transportation, Beijing Jiaotong University, China. Her research interests include transportation big-data analysis, travel behavior analysis, metro planning, and traffic engineering.
XUEDONG YAN received the M.S. and Ph.D. degrees from the University of Central Florida, Orlando, FL, USA. He was the Director of Traffic Safety with the Transportation Research Center, University of Tennessee, Knoxville, TN, USA. His research interests include road safety analysis, travel behavior and performance analysis, and transportation big-data analysis.
Dr. Yan is the Chief Editor of the Journal of Transportation Safety and Security.
XIAOMENG LI received the Ph.D. degree from Beijing Jiaotong University in January 2017. She is currently a Research Fellow with the Centre for Accident Research and Road Safety-Queensland (CARRS-Q), Queensland University of Technology. Her research mainly focuses on road traffic safety, driving performance, big-data analysis, ITS, and traffic engineering.
JINGSI YANG received the B.S. and M.S. degrees from Beijing Jiaotong University, China, where she is currently pursuing the Ph.D. degree with the School of Traffic and Transportation. Her major fields of study include traffic safety analysis, driving simulator analysis, and transportation big data analysis.