Dynamic Robustness Analysis for Subway Network With Spatiotemporal Characteristic of Passenger Flow

Robustness is a crucial problem for a subway network (SN), and understanding it can help improve the efficiency of a transportation system. Several existing studies have analyzed SN robustness based on the rail structure or the static distribution of passenger flow. However, the spatiotemporal characteristics of passenger flow also play an important role in SN robustness, since they can trigger unexpected cascading failures in the SN. How to characterize the effect of such cascading failures on SN robustness therefore remains an important open problem. In this paper, we address this problem as follows: (1) we propose a temporal subway network (TSN) that accounts for the dynamics of passenger flow in the SN; (2) we adopt the linear threshold (LT) model to simulate the cascading failure process of the TSN and propose a new robustness metric $R(t)$ to evaluate the effect of this cascading failure on SN robustness. Based on Shanghai subway smart card data, we carry out extensive experiments to analyze the effects of cascading failures on the robustness of the Shanghai SN. The experiments show that the robustness of the Shanghai TSN varies over time. More significantly, a large volume of passenger flow can amplify the impact of failure modes (i.e., random and malicious failure modes) on the robustness of the Shanghai TSN.


I. INTRODUCTION
Owing to its high speed, safety and environmental friendliness, urban rail transit has become a preferred mode of transportation [1], [2]. However, incidents such as severe weather, sudden disasters and terrorist attacks may disrupt subway stations [3]-[5]. It is therefore essential to investigate the robustness of subway systems, which largely determines the reliability and efficiency of the services they provide.
Drawing on network science, the robustness analysis of subway networks (SNs) has made great progress [6], [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Dominik Strzalka .
Currently, there are two types of methods for SN robustness analysis, namely, rail structure-based analysis [8]-[11] and passenger flow-based analysis [12]-[15]. Rail structure-based analysis focuses on the topological properties of the subway. For example, Derrible et al. studied the subway systems of 33 cities [9] and found that SNs exhibit scale-free and small-world characteristics. They also proposed suggestions (such as adding transfer stations) to improve SN robustness. Mouronte et al. analyzed the urban bus and subway networks of Madrid [10] and found that structural parameters (such as the shortest distance between stations, betweenness, and cluster detection) can help improve the robustness of transport networks. Based on FMECA (failure mode, effects and criticality analysis) [16], Deng et al. proposed a framework for studying the physical vulnerability of subway systems [11]. Their results show that the train is the most vulnerable functional module in a subway system.
Given the large volume of passenger flow in SNs [17], [18], network robustness is likely to go beyond the pure topology of the rail structure. Passenger flow-based analysis therefore focuses on the properties of the SN with respect to time and passenger flow. For example, Sun et al. addressed SN robustness from the perspective of line operation, taking the Shanghai subway system as an example [12]. Their results show that subway lines with large passenger flow volumes generally have a significant impact on network vulnerability. Xiao et al. proposed dynamic metrics that reflect the local and global features of node degree and betweenness, and found that the heterogeneity and vulnerability of the Beijing subway network vary over time as passenger flow changes [13].
Furthermore, owing to the spatiotemporal characteristics of passenger flow in SNs [19], a failure caused by disrupted stations can propagate to the whole SN or at least to a large part of it [20]-[22]; that is, cascading failures occur in the SN [23], [24]. To explore cascading failures caused by passenger flow in SNs, several cascading failure models (such as the load-capacity model [25], the coupled map lattice model [26] and the linear threshold model [27], [28]) have been extended to simulate this process. For example, Ma et al. and Shen et al. each proposed an improved coupled map lattice model to analyze the cascading failure process of the Xi'an SN [14] and the Nanjing SN during a certain time period [15], respectively. Ma et al. found a two-stage cascading failure process under passenger flow overload. Shen et al. found that disrupting the station with the largest strength easily triggers a global network failure.
However, if stations or lines are disrupted at a time of heavy passenger flow, the resulting cascading failure is likely to affect more people than one occurring at other times. It therefore matters to investigate how cascading failures occurring at different times affect SN robustness. In this paper, we aim to answer two main questions:
• How can the spatiotemporal characteristics of passenger flow in the SN be reflected?
• How can the dynamic effects of passenger flow on SN robustness be characterized?
To address these research problems, we first design a temporal subway network (TSN) that reflects the spatiotemporal characteristics of passenger flow. More specifically, we construct the TSN based on the L-Space method [33]: a subway station is denoted as a node and the pathway between two adjacent stations is formulated as an edge. Each edge weight is calculated by estimating the complete travel routes of passengers and then assigning passenger flows to each pathway at a specific time. Second, we adopt the linear threshold (LT) model to simulate the cascading failure process of TSNs at different times. Third, to evaluate the dynamic effect of this cascading failure on network robustness, we propose a new robustness metric R(t). Finally, we construct the Shanghai TSN from published subway datasets (e.g., the rail structure, running data, and smart card-based travel data), simulate its cascading failure process with the LT model, and analyze its dynamic robustness.
The main contributions of this paper are as follows:
1) We propose a temporal subway network (TSN) to characterize the dynamics of passenger flow.
2) We propose a new robustness metric R(t) to evaluate the dynamic robustness of the TSN.
3) We carry out extensive experiments, using the LT model and R(t), to reveal the dynamic robustness of Shanghai TSNs at different times.
The rest of this paper is organized as follows. Sec. II first describes the formulation of the TSN and then introduces the LT model and the robustness metric. Sec. III presents analyses that reveal the dynamic robustness of the Shanghai TSN. Finally, Sec. IV concludes this paper.

II. PRELIMINARIES
This section first introduces the formulation of temporal subway network (TSN), followed by some node property metrics of TSN in Sec. II-A. Then, Sec. II-B presents the linear threshold (LT) model. Finally, Sec. II-C describes a new robustness metric.

A. TEMPORAL SUBWAY NETWORK
1) FORMULATION OF TEMPORAL SUBWAY NETWORK
In this paper, a temporal subway network (TSN) is constructed with the L-Space method [33], which captures station connectivity and allows passenger flows to be assigned to each pathway. More specifically, a station is denoted as a node and the pathway between two adjacent stations is formulated as an edge. Let N denote the total number of stations. A denotes the N × N adjacency matrix of the TSN, whose elements are defined in Eq. (1):
$$a_{ij} = \begin{cases} 1, & \text{if there is an edge } e_{ij} \text{ from } v_i \text{ to } v_j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
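Generally speaking, subways run in upstream and downstream directions [34]. Based on this fact, Def. 1 defines the TSN, and Fig. 1 shows a simple example of TSNs at different times.
Definition 1: At time t, the TSN is a directed weighted graph $G_t = (V, E, W_t)$, where V is the set of nodes (stations), E is the set of directed edges $e_{ij}$ between adjacent stations, and $W_t = \{w_{ij}(t)\}$ is the set of edge weights. $w_{ij}(t)$ stands for the number of passengers passing through $e_{ij}$ at time t.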
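A minimal sketch of the L-Space construction described above, assuming each line is given as an ordered list of station names (a hypothetical input format): consecutive stations on a line are joined by one edge in each travel direction, and a transfer station shared by several lines becomes a single node. Edge weights $w_{ij}(t)$ are not computed here; they come from the flow-assignment step.

```python
from typing import Dict, List, Tuple

def build_l_space(lines: Dict[str, List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build the directed L-Space adjacency matrix of a subway network.

    lines maps a (hypothetical) line name to its ordered station list.
    Returns the station list (node order) and the N x N matrix A with
    a_ij = 1 when an edge runs from station i to station j.
    """
    stations: List[str] = []
    index: Dict[str, int] = {}
    for seq in lines.values():
        for name in seq:
            if name not in index:          # transfer stations appear only once
                index[name] = len(stations)
                stations.append(name)

    n = len(stations)
    adj = [[0] * n for _ in range(n)]
    for seq in lines.values():
        for a, b in zip(seq, seq[1:]):     # adjacent stations on the same line
            i, j = index[a], index[b]
            adj[i][j] = 1                  # one travel direction
            adj[j][i] = 1                  # the opposite direction
    return stations, adj
```

For two lines crossing at station "B", `build_l_space({"L1": ["A", "B", "C"], "L2": ["D", "B", "E"]})` yields five nodes, with "B" having four outgoing edges.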
To assign passenger flows to each pathway at time t, we first estimate the complete travel routes based on the origin-destination (OD) data of subway passengers. Such data consist of many OD pairs $(v_i, v_j, w_{ij})$ [35], [36], each indicating that $w_{ij}$ passengers travel from $v_i$ to $v_j$. In general, factors such as route length, transfer time and weather conditions may affect passengers' route choice [37], [38]. In this paper, if there is more than one route from $v_i$ to $v_j$, the route is selected according to the behaviors and purposes of passenger travel [39], [40], such as the shortest route [41], the minimum number of transfers and the shortest transfer time. If more than one route still meets these screening criteria, one of them is selected at random.
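The route screening can be sketched as a lexicographic minimum. The sketch assumes the three criteria are applied in the order listed (route length, then number of transfers, then transfer time); the text does not fix this priority, and the `Route` record and its field names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Route:
    stations: List[str]       # ordered stations from origin to destination
    length_km: float          # total route length
    transfers: int            # number of line changes
    transfer_time_s: float    # total time spent transferring

def choose_route(candidates: List[Route], rng: Optional[random.Random] = None) -> Route:
    """Shortest route first, then fewest transfers, then shortest transfer
    time; any routes still tied are broken uniformly at random."""
    if not candidates:
        raise ValueError("no candidate routes")
    rng = rng or random.Random()

    def key(r: Route):
        return (r.length_km, r.transfers, r.transfer_time_s)

    best = min(key(r) for r in candidates)
    ties = [r for r in candidates if key(r) == best]
    return rng.choice(ties)
```

Passing a seeded `random.Random` makes the random tie-break reproducible across experiments.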
Then, we estimate the time at which passengers depart from the first station of their travel routes. Generally, after passengers pass the automatic fare gate, it takes a while for them to walk to the platform and wait for the train to arrive [42], [43]. Passengers can only alight or board during the dwell time of the train at the platform. Therefore, we calculate the departure time of passengers ($T_d$) by Eq. (2).
$$T_d = \begin{cases} t_{in} + t_{walk}, & \text{if a train has arrived} \\ t_{ac} + t_{dwell}, & \text{if no train arrives} \end{cases} \tag{2}$$
where $t_{in}$ denotes the check-in time of the passenger, $t_{walk}$ denotes the walking time from the automatic fare gate to the platform, and $t_{ac}$ denotes the earliest train arrival time later than $t_{in} + t_{walk}$. All arrival times at each station can be calculated from the timetable of the metro system. $t_{dwell}$ denotes the dwell time of the train at the platform, which can also be obtained from the timetable. In this paper, when the passenger's arrival at the platform coincides with a train stop, we set $t_{dwell} = 0$. Finally, we estimate the time passengers take to traverse the pathway between any two adjacent stations. Starting from $T_d$, we can then calculate the moments at which passengers arrive at each station along the travel route, so that the passenger flows can be assigned to the corresponding pathway at time t. If passengers do not need to transfer, this timing information can be obtained from the timetable. If passengers need to transfer, we add the transfer time, which is calculated by Eq. (3).
In Eq. (3), $DIS_i(m, n)$ denotes the distance from $v_i$ on Line m to $v_i$ on Line n, NWS denotes the normal walking speed of passengers [44], and α is the congestion coefficient. If passengers arrive at the platform just as a train has stopped there, we set $t_{dwell} = 0$.
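The sketch below implements a reading of Eq. (2) in which a passenger leaves at $t_{in} + t_{walk}$ if a train is already at the platform and at $t_{ac} + t_{dwell}$ otherwise, plus one possible way of assigning passengers to pathways at a snapshot time t. Assumptions: times are in seconds, the timetable supplies a sorted list of train arrival times at the boarding platform, a passenger counts on edge $e_{ij}$ at time t after leaving $v_i$ and before reaching $v_j$, and any transfer time from Eq. (3) is already folded into the per-station times. Function names are hypothetical.

```python
import bisect
from collections import Counter
from typing import List, Tuple

def departure_time(t_in: float, t_walk: float,
                   arrivals: List[float], t_dwell: float) -> float:
    """Departure time T_d from the first station, following Eq. (2).

    arrivals: sorted train arrival times at the boarding platform (seconds).
    """
    t_platform = t_in + t_walk
    k = bisect.bisect_right(arrivals, t_platform)   # first arrival > t_platform
    # Case "a train has arrived": the previous train is still dwelling.
    if k > 0 and t_platform < arrivals[k - 1] + t_dwell:
        return t_platform
    # Case "no train arrives": wait for t_ac, then leave after the dwell.
    if k == len(arrivals):
        raise ValueError("no later train in the timetable")
    t_ac = arrivals[k]
    return t_ac + t_dwell

def edge_loads_at(trips: List[Tuple[List[str], List[float]]], t: float) -> Counter:
    """Number of passengers on each directed edge (u, v) at snapshot time t.

    Each trip is (stations on the chosen route, time at each station), with
    the first time equal to T_d. A passenger counts on (u, v) while
    time_at_u <= t < time_at_v (an assumed snapshot rule).
    """
    loads: Counter = Counter()
    for stations, times in trips:
        for u, v, t_u, t_v in zip(stations, stations[1:], times, times[1:]):
            if t_u <= t < t_v:
                loads[(u, v)] += 1
    return loads
```

The counts returned by `edge_loads_at` play the role of $w_{ij}(t)$ for one snapshot; aggregating over a time window instead of an instant would be an equally plausible design choice.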

2) NODE PROPERTY METRICS OF NETWORK
Many metrics have been proposed to estimate the centrality of nodes in a network [45], [46]. In this paper, we use four node property metrics (i.e., degree, betweenness, strength and flow betweenness) to evaluate the properties of nodes in the TSN, as detailed below.

a: DEGREE
In $G_t$, $v_i$ has incoming and outgoing degrees. The incoming degree $D_i^{in}$ denotes the number of edges that point to $v_i$, and the outgoing degree $D_i^{out}$ denotes the number of edges from $v_i$ to other nodes; they are given by Eq. (4) and Eq. (5), respectively. The degree of $v_i$ is given by Eq. (6).
$$D_i^{in} = \sum_{j=1}^{N} a_{ji} \tag{4}$$
$$D_i^{out} = \sum_{j=1}^{N} a_{ij} \tag{5}$$
$$D_i = D_i^{in} + D_i^{out} \tag{6}$$
b: BETWEENNESS
In $G_t$, the betweenness of $v_i$ ($B_i$) is defined as Eq. (7).
$$B_i = \sum_{o \ne i \ne d} \frac{\sigma_{od}^{i}}{\sigma_{od}} \tag{7}$$
where $\sigma_{od}^{i}$ is the number of shortest paths from $v_o$ to $v_d$ in $G_t$ that pass through $v_i$, and $\sigma_{od}$ is the number of shortest paths from $v_o$ to $v_d$. The larger $B_i$ is, the more important $v_i$ is as a connector in $G_t$.

c: STRENGTH
The strength of a node $v_i$, $S_i(t)$, is the sum of the weights of the edges that $v_i$ shares with other nodes at time t. In $G_t$, $v_i$ therefore has incoming and outgoing strengths. The incoming strength $S_i^{in}(t)$ is the sum of the weights of edges pointing to $v_i$ at time t, and the outgoing strength $S_i^{out}(t)$ is the sum of the weights of edges from $v_i$ to other nodes at time t; they are given by Eq. (8) and Eq. (9), respectively. The total strength $S_i(t)$ is given by Eq. (10).
$$S_i^{in}(t) = \sum_{j=1}^{N} w_{ji}(t) \tag{8}$$
$$S_i^{out}(t) = \sum_{j=1}^{N} w_{ij}(t) \tag{9}$$
$$S_i(t) = S_i^{in}(t) + S_i^{out}(t) \tag{10}$$
A large $S_i(t)$ means that many passengers pass through $v_i$ at time t.
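For a single snapshot $G_t$, the degree and strength definitions (Eqs. (4)-(6) and (8)-(10)) reduce to column and row sums of the adjacency matrix and of the edge-weight matrix. A minimal sketch, assuming `adj[i][j]` holds $a_{ij}$ and `w[i][j]` holds $w_{ij}(t)$, with zeros where no edge exists:

```python
from typing import List, Tuple

def degrees(adj: List[List[int]], i: int) -> Tuple[int, int, int]:
    """Incoming, outgoing and total degree of node i."""
    d_in = sum(row[i] for row in adj)   # edges pointing to v_i (column sum)
    d_out = sum(adj[i])                 # edges leaving v_i (row sum)
    return d_in, d_out, d_in + d_out

def strengths(w: List[List[float]], i: int) -> Tuple[float, float, float]:
    """Incoming, outgoing and total strength of node i at one snapshot t,
    where w[j][i] is the number of passengers on edge e_ji."""
    s_in = sum(row[i] for row in w)
    s_out = sum(w[i])
    return s_in, s_out, s_in + s_out
```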
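d: FLOW BETWEENNESS
The flow betweenness of $v_i$ in $G_t$, $F\_B_i(t)$, is the fraction of the passenger flow on the shortest paths between all pairs of nodes that passes through $v_i$ at time t, as defined in Eq. (11).
$$F\_B_i(t) = \sum_{o \ne i \ne d} \frac{f\sigma_{od}^{i}}{f\sigma_{od}} \tag{11}$$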
where $f\sigma_{od}^{i}$ is the total passenger flow volume on the shortest paths from $v_o$ to $v_d$ that pass through $v_i$ at time t, and $f\sigma_{od}$ is the total passenger flow on the shortest paths from $v_o$ to $v_d$ at time t. A large $F\_B_i(t)$ means that many passengers travel on shortest paths through $v_i$ at time t.
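Betweenness and flow betweenness can be computed for small networks by enumerating all hop-count shortest paths. This sketch assumes that the passenger flow carried by a shortest path is the sum of the edge weights along it and that Eq. (11) sums per-pair ratios in the same way as Eq. (7); both are assumptions, since the text does not pin down these details.

```python
from collections import deque
from typing import List, Tuple

def all_shortest_paths(adj: List[List[int]], o: int, d: int) -> List[List[int]]:
    """All hop-count shortest paths from node o to node d (BFS + backtracking)."""
    n = len(adj)
    dist = [-1] * n
    preds: List[List[int]] = [[] for _ in range(n)]
    dist[o] = 0
    queue = deque([o])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not adj[u][v]:
                continue
            if dist[v] == -1:              # first time v is reached
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u lies on a shortest path to v
                preds[v].append(u)
    paths: List[List[int]] = []

    def backtrack(node: int, suffix: List[int]) -> None:
        if node == o:
            paths.append([o] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    if dist[d] > 0:                        # skip o == d and unreachable d
        backtrack(d, [])
    return paths

def betweenness_and_flow(adj: List[List[int]], w: List[List[float]],
                         i: int) -> Tuple[float, float]:
    """B_i as in Eq. (7) and an assumed reading of F_B_i(t) in Eq. (11)."""
    def path_flow(p: List[int]) -> float:
        # Assumption: a path carries the summed weights w_ab(t) of its edges.
        return sum(w[a][b] for a, b in zip(p, p[1:]))

    n = len(adj)
    b, fb = 0.0, 0.0
    for o in range(n):
        for d in range(n):
            if len({o, d, i}) < 3:         # require o, d and i to be distinct
                continue
            paths = all_shortest_paths(adj, o, d)
            if not paths:
                continue
            through = [p for p in paths if i in p[1:-1]]
            b += len(through) / len(paths)
            total = sum(path_flow(p) for p in paths)
            if total > 0:
                fb += sum(path_flow(p) for p in through) / total
    return b, fb
```

For a full subway network, a Brandes-style accumulation would avoid materializing every path, but the explicit enumeration keeps the correspondence with Eqs. (7) and (11) easy to check.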

B. LINEAR THRESHOLD MODEL
In this paper, the cascading failure process of the TSN is modeled with the linear threshold (LT) model [27], [28]. Each node $v_i$ has two states (i.e., failed and normal) and a threshold $\theta_i$ selected randomly from the interval [0,1] [29], [30]. Node $v_i$ is influenced by its incoming neighbors $v_j$ through the edge influence $EI_{ij}$, which is calculated by Eq. (12).
Here, $\Gamma(V)$ denotes the set of incoming neighbors of $v_i$. At time step $t_0$, each currently normal node $v_i$ fails at time step $t_0 + 1$ if and only if the total edge influence of its failed incoming neighbors is at least $\theta_i$, that is, $\sum_{v_j \in \Gamma_{t_0}(V)} EI_{ij} \ge \theta_i$, where $\Gamma_{t_0}(V)$ denotes the set of incoming neighbors of $v_i$ that have failed by time step $t_0$. Fig. 2 illustrates an example of this process.
Given an initial set of failed nodes, the cascading failure spreads deterministically in discrete steps. The initial failed nodes are used to simulate the network under the different failure modes encountered in real life [31], [32]. The process terminates when no new node fails in $G_t$. Let F denote the total number of failed nodes at that point; we call F the cascading failure size of $G_t$. In the example of Fig. 2, since the edge influence $EI_{mj}$ is greater than the threshold $\theta_m$, $v_m$ fails at time step $t_0 + 1$. Similarly, $v_k$ fails at time step $t_0 + 1$. However, $EI_{ni} < \theta_n$, so $v_n$ remains in the normal state at time step $t_0 + 1$.
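A minimal sketch of the LT cascade. Eq. (12) is not reproduced in this section, so the code uses a hypothetical stand-in: the influence of $v_j$ on $v_i$ is the share of $v_i$'s incoming passenger flow that arrives on $e_{ji}$, i.e. $EI_{ij} = w_{ji}(t) / S_i^{in}(t)$. All nodes that meet the threshold condition at step $t_0$ fail together at step $t_0 + 1$, and the loop stops when no new node fails.

```python
from typing import Iterable, List, Set

def edge_influence(w: List[List[float]]) -> List[List[float]]:
    """Hypothetical stand-in for Eq. (12): EI[i][j] = w_ji(t) / S_i^in(t)."""
    n = len(w)
    ei = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s_in = sum(w[j][i] for j in range(n))      # incoming strength of v_i
        if s_in > 0:
            for j in range(n):
                ei[i][j] = w[j][i] / s_in
    return ei

def lt_cascade(ei: List[List[float]], seeds: Iterable[int],
               thresholds: List[float]) -> Set[int]:
    """Run the LT cascade from the initial failed nodes; return all failed nodes."""
    n = len(ei)
    failed = set(seeds)
    while True:
        # Nodes whose failed in-neighbours exert influence >= theta_i at step t0
        # all fail together at step t0 + 1.
        newly = {i for i in range(n)
                 if i not in failed
                 and sum(ei[i][j] for j in failed) >= thresholds[i]}
        if not newly:                               # no new failure: stop
            return failed
        failed |= newly
```

The cascading failure size is then `F = len(lt_cascade(ei, seeds, thresholds))`.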

C. EVALUATION OF ROBUSTNESS
To evaluate the dynamic effect of the cascading failure caused by passenger flow on SN robustness, we propose a robustness metric R(t). R(t) couples the relative size of the largest component [47], [53] with the operational efficiency [13], [48] through a coupling coefficient ε, as given by Eq. (13).
$$R(t) = \varepsilon \cdot LC + (1 - \varepsilon) \cdot \frac{F\_OE(t)}{\max_{T \in [t_1, t_2, t_3, \ldots]} \{OE(T)\}} \tag{13}$$
where R(t) denotes the robustness of $G_t$, LC denotes the relative size of the largest component defined in Eq. (14), and OE(t) denotes the operational efficiency of $G_t$ defined in Eq. (15). $\max_{T \in [t_1, t_2, t_3, \ldots]}\{OE(T)\}$ denotes the maximum operational efficiency over the times T; in particular, the networks at the times $[t_1, t_2, t_3, \ldots]$ should reflect different passenger flow volumes. $F\_OE(t)$ denotes the operational efficiency of $G_t$ when some nodes in $G_t$ have failed. ε is a coupling coefficient that weights the effect of network topology on the robustness of $G_t$, and (1 − ε) weights the effect of passenger flow. In this paper, ε is quantified by Eq. (16).
In Eq. (14), LC = N'/N, where N' is the number of nodes in the largest component after some nodes have failed and N is the total number of nodes. LC characterizes the robustness of $G_t$ from the perspective of network scale: the higher LC is, the more robust the network is under attack. In Eq. (15), OE(t) denotes the operational efficiency of $G_t$, $d_{ij}$ is the number of edges along the shortest path from $v_i$ to $v_j$ in $G_t$, and $w_{ij}(t)$ denotes the weight of the edges along the shortest path from $v_i$ to $v_j$ at time t. OE(t) estimates the average passenger flow volume per edge on the shortest paths: the higher OE(t) is, the more passengers are transported through the pathways between stations at time t.
In Eq. (16), $\sum_{v_i \in V} S_i(t)$ denotes the total node strength of $G_t$, and $\max_{T \in [t_1, t_2, t_3, \ldots]}\{\sum_{v_i \in V} S_i(T)\}$ denotes its maximum over the TSNs at different times.
Based on the above, Fig. 3 presents the flowchart for evaluating the dynamic robustness of a real TSN. Specifically, we first extract the OD data from the raw data by cleaning and processing it. Then, we use the OD data to construct real TSNs at different times based on the formulation in Sec. II-A1. Next, we use the LT model to simulate cascading failures in these TSNs. Finally, we evaluate the effect of the cascading failures on their robustness using the proposed metric R(t).
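The coupling in Eq. (13) and the largest-component ratio in Eq. (14) can be sketched as follows, assuming Eq. (13) takes the form ε·LC + (1 − ε)·F_OE(t)/max_T{OE(T)} suggested by the definitions above. The sketch treats the largest component as the largest weakly connected component of the surviving nodes (an assumption for the directed TSN) and takes $F\_OE(t)$, $\max_T\{OE(T)\}$ and ε as precomputed inputs rather than reproducing Eqs. (15) and (16).

```python
from typing import Iterable, List, Set

def largest_component_ratio(adj: List[List[int]], failed: Iterable[int]) -> float:
    """LC = N'/N: size of the largest weakly connected component among the
    surviving nodes divided by the original number of nodes N."""
    n = len(adj)
    seen: Set[int] = set(failed)           # failed nodes are never visited
    best = 0
    for s in range(n):
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:                       # depth-first search, ignoring direction
            u = stack.pop()
            size += 1
            for v in range(n):
                if v not in seen and (adj[u][v] or adj[v][u]):
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / n

def robustness(lc: float, f_oe: float, oe_max: float, eps: float) -> float:
    """Assumed form of Eq. (13): eps * LC + (1 - eps) * F_OE(t) / max_T OE(T)."""
    return eps * lc + (1.0 - eps) * f_oe / oe_max
```

Removing the middle station of a five-station line splits it into two pieces of two stations each, so LC drops from 1.0 to 0.4.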

III. SIMULATION AND ANALYSIS
In this section, we first introduce the datasets and statistical analysis in Sec. III-A. Then, following the construction of a Shanghai TSN in Sec. III-B, we carry out some experiments to explore the dynamic robustness of Shanghai TSN in Sec. III-C.

A. DATASETS AND STATISTICAL ANALYSIS
1) DATASET
The datasets used in this work are (1) Shanghai subway line and station data, (2) Shanghai subway running data, and (3) Shanghai subway smart card data. Table 1 lists sample records from each of the three datasets, taken from 13 April 2015.

2) STATISTICAL ANALYSIS
Existing studies have shown the dynamic features of human behavior, such as human mobility [49] and online behavior [50]. The spatiotemporal characteristics of passenger flow largely shape its dynamic distribution [51], [52], which guides our construction of the Shanghai TSN. We select three weeks' data (i.e., April …). April 6 is an exception because it is the Tomb-Sweeping Festival; otherwise, residents switch back and forth between a working mode (Monday to Friday) and a holiday mode (Saturday and Sunday) [53]. Moreover, we use the Pearson correlation coefficient [54] to analyze the correlation among the three weeks' data. The correlation coefficients between W1 and W2, W1 and W3, and W2 and W3 are 0.89, 0.86 and 0.95, respectively. We can thus conclude that the weekly distribution of passenger flow is regular and similar across weeks.
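The week-to-week comparison can be reproduced with a plain Pearson correlation over aligned passenger-flow series. A minimal sketch, assuming each week is represented as an equally long list of counts (for example, hourly entries; the exact aggregation is not stated here) keyed by labels such as "W1":

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def week_correlations(weeks: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations between weekly passenger-flow series."""
    return {(a, b): pearson(weeks[a], weeks[b])
            for a, b in combinations(sorted(weeks), 2)}
```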
In particular, as shown in Fig. 6, we use box-whisker plots to analyze differences in the distribution of passenger flow across weekday time periods. The passenger flow data lie close to their mean or median, with few outliers, so the passenger flow data of the three weeks are similar. The distribution of passenger flow clearly exhibits morning and evening rush hours on weekdays. Based on the above analyses, this paper divides each weekday into several travel periods [55], as shown in Table 2.

B. SHANGHAI TEMPORAL SUBWAY NETWORK
Before constructing the Shanghai TSN, we make the following assumptions to simplify the network construction process:
1) The first train departs from the origin station of each line at 5:30 am.
2) The departure interval between two adjacent trains is 3 minutes during rush hours and 6 minutes during off-peak hours.
3) The dwell time of each train at a platform is 30 seconds.
4) The normal walking speed of each passenger is 60 m/min [44].
5) After passing the automatic fare gate, passengers go directly to the platform to take the subway.
6) The congestion coefficient is 0.8 during rush hours and 1 during off-peak hours.
Assumptions 1) and 3) use the most frequent values of the first-train departure time at the origin station of each line and of the train dwell time at the platform, respectively. In 2), the intervals of 3 and 6 minutes are obtained by rounding the average departure interval between two adjacent trains during rush hours and off-peak hours, respectively.
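A sketch of the origin-station departure schedule implied by assumptions 1) and 2). The rush-hour windows come from Table 2, which is not reproduced in this section, so the windows used below are hypothetical placeholders; times are kept in minutes after midnight.

```python
from typing import List, Tuple

FIRST_DEPARTURE = 5 * 60 + 30          # assumption 1): first train at 5:30 am
RUSH_HEADWAY, OFFPEAK_HEADWAY = 3, 6   # assumption 2): intervals in minutes

def origin_departures(last: int, rush_windows: List[Tuple[int, int]]) -> List[int]:
    """Departure times from a line's origin station, from 5:30 am up to `last`,
    using the rush-hour headway inside any [start, end) rush window."""
    times: List[int] = []
    t = FIRST_DEPARTURE
    while t <= last:
        times.append(t)
        in_rush = any(start <= t < end for start, end in rush_windows)
        t += RUSH_HEADWAY if in_rush else OFFPEAK_HEADWAY
    return times

def hhmm(minutes: int) -> str:
    """Format minutes after midnight as HH:MM."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"
```

With a hypothetical rush window of 7:00-9:00, trains leave every 6 minutes from 5:30 until 7:00 and every 3 minutes afterwards.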
As the distribution of passenger flow varies over time, the TSN differs from moment to moment. Nevertheless, it is neither realistic nor necessary to construct a TSN for every moment. In Sec. III-A2, we found that weekday passenger flow is regular and similar from week to week, so a single weekday is representative. Therefore, following the travel periods in Table 2, we construct a Shanghai TSN every 5 minutes using the data of 8:00-9:00 (morning rush hour), 12:00-13:00 (off-peak hour) and 18:00-19:00 (evening rush hour) on April 13th (Monday) in Dataset 3. Fig. 7 shows the resulting 36 Shanghai TSNs at different times on weekdays.
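The count of 36 snapshots follows from three one-hour windows sampled every 5 minutes (3 × 60 / 5 = 36). A small sketch, assuming each window is half-open so that 9:00, 13:00 and 19:00 themselves are not sampled (the text does not say which endpoint is included):

```python
from typing import List, Tuple

def snapshot_times(windows: List[Tuple[int, int]], step: int = 5) -> List[str]:
    """HH:MM labels of TSN snapshots every `step` minutes in each [start, end) window."""
    return [f"{t // 60:02d}:{t % 60:02d}"
            for start, end in windows
            for t in range(start, end, step)]

# 8:00-9:00, 12:00-13:00 and 18:00-19:00, in minutes after midnight
WINDOWS = [(8 * 60, 9 * 60), (12 * 60, 13 * 60), (18 * 60, 19 * 60)]
```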

C. DYNAMIC ROBUSTNESS ANALYSIS
To trigger the linear threshold (LT) model, we first define five failure modes based on the node property metrics in Sec. II-A2, as follows.
(1) Degree failure mode: the failed nodes are selected from the network in descending order of node degree.
Then, we select 5, 10 and 30 initial failed nodes under each failure mode to trigger the cascading failure in each Shanghai TSN. To eliminate the influence of the random node thresholds $\theta_i$ in the LT model, we repeat each cascading failure experiment under a given failure mode 500 times and report the average of these 500 runs.
Figure 8 compares the cascading failure size (F) of Shanghai TSNs under various failure modes at different failure times. Figs. 8(a)-(f) show that the cascading failure size of the Shanghai TSN is smaller under the random failure mode than under the other failure modes. This suggests that when the number of initial failed nodes is relatively small (i.e., 5 or 10), a cascading failure is unlikely to occur in the Shanghai TSN under the random failure mode but likely to occur under the other failure modes. However, as shown in Figs. 8(g)-(i), when the number of initial failed nodes is large (e.g., 30), the random failure mode can also cause larger cascading failures in the Shanghai TSN. Moreover, the gap between the cascading failure size under the degree failure mode and that under the other failure modes widens. We can therefore conclude that, as the number of initial failed nodes increases, larger cascading failures in the Shanghai TSN are most likely under the degree failure mode. In addition, for identical initial failed nodes and failure modes, the cascading failure size of the Shanghai TSN during rush hours is far greater than during other hours. The cascading failure size during rush hours also fluctuates, and this fluctuation is relatively large during the morning rush hour.
For example, the cascading failure size of the Shanghai TSN at 8:15 is larger than that at 8:40. These results suggest that the effect of passenger flow on the cascading failure size of the Shanghai TSN varies over time: the larger the passenger flow volume, the more likely a cascading failure is to occur under the same failure mode.
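The experimental protocol above can be sketched as follows. Only the degree failure mode is spelled out in this section; the sketch assumes the other malicious modes likewise take the top-k nodes by their metric (betweenness, strength or flow betweenness) and that the random mode samples k nodes uniformly. The cascade itself is passed in as a function (for example, an LT simulation), so the helper only handles seed selection, fresh thresholds $\theta_i$ per run, and averaging F over 500 runs. Function and mode names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Set

def initial_failures(scores: Dict[int, float], k: int, mode: str,
                     rng: random.Random) -> List[int]:
    """Pick k initial failed nodes: 'random' samples uniformly; any other mode
    takes the k nodes with the largest metric value in `scores`."""
    nodes = list(scores)
    if mode == "random":
        return rng.sample(nodes, k)
    return sorted(nodes, key=lambda v: scores[v], reverse=True)[:k]

def mean_cascade_size(cascade: Callable[[List[int], List[float]], Set[int]],
                      seeds: List[int], n_nodes: int,
                      runs: int = 500, seed: int = 0) -> float:
    """Average cascading failure size F over `runs` independent draws of
    the node thresholds theta_i from [0, 1)."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(runs):
        thresholds = [rng.random() for _ in range(n_nodes)]
        sizes.append(len(cascade(seeds, thresholds)))
    return mean(sizes)
```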
Finally, we use the robustness metrics LC, OE(t) and R(t) to evaluate the effect of the above cascading failures on the Shanghai TSN robustness. Because the node threshold $\theta_i$ is randomly selected in each experiment, the final failed nodes may differ even under the same failure mode. We therefore record the final failed nodes in each cascading failure experiment and sort the nodes by how often they fail across experiments. Nodes are then removed from the networks in this order, and we evaluate the effect of the cascading failures on the Shanghai TSN robustness by analyzing the resulting changes in the robustness metrics. Figure 9 uses LC and OE(t) to compare the Shanghai TSN robustness under various failure modes at different failure times; here, we remove the final failed nodes from the experiments with 10 initial failed nodes. The changes in LC and OE(t) show that different failure modes affect the Shanghai TSN robustness differently. For example, Figs. 9(a)-(f) show that the Shanghai TSN is more robust under the random failure mode than under the other failure modes, indicating that both LC and OE(t) can reflect the effects of failure modes on network robustness to a certain extent. However, even during rush hours, LC changes little when the network is under the degree, strength or random failure mode. This is because LC evaluates network robustness from the perspective of network scale, without considering the impact of passenger flow. In fact, even when the number of final failed nodes is the same, the passenger flow volume changes over time.
VOLUME 8, 2020
In particular, for stations in special functional areas (e.g., working areas), passenger flow may be concentrated at certain moments. If these stations are disrupted at times of heavy passenger flow, network robustness is affected more than at other times. OE(t) evaluates network robustness from the perspective of passenger flow: the higher OE(t) is, the more robust the network is at time t. The changes in OE(t) show that OE(t) is small during off-peak hours. However, since far fewer nodes finally fail during off-peak hours than during rush hours, concluding that the Shanghai TSN is most vulnerable during off-peak hours would contradict intuition. We therefore propose the new robustness metric R(t) to overcome these limitations of LC and OE(t) in evaluating the dynamic robustness of the Shanghai TSN. Figure 10 uses R(t) to compare the Shanghai TSN robustness under various failure modes at different failure times. Compared with the other failure modes, the Shanghai TSN is most robust under the random failure mode. This is understandable because the initial failed nodes in the other failure modes are either hub nodes or nodes with a large passenger flow volume. In addition, as the number of initial failed nodes increases, the downward trend of the Shanghai TSN robustness under each failure mode becomes increasingly pronounced. In particular, when the number of initial failed nodes is 30, the Shanghai TSN robustness is weakest under the degree failure mode. Meanwhile, as shown in Figs. 10(a), (c), (d), (f), (g) and (i), the Shanghai TSN robustness varies over time during rush hours. However, as shown in Figs. 10(b), (e) and (h), during off-peak hours it follows an almost straight line.
Therefore, the Shanghai TSN is most robust under the random failure mode during off-peak hours. These findings suggest that when the network carries a large volume of passenger flow, the Shanghai TSN robustness varies markedly over time. However, when the passenger flow volume is comparatively small, the Shanghai TSN robustness depends mainly on changes in network topology [56]. Furthermore, comparing the Shanghai TSN robustness under the same failure mode at different times shows that passenger flow volume amplifies the impact of failure modes on the Shanghai TSN robustness: when the passenger flow volume is large, it is harder for the Shanghai TSN to tolerate the cascading failure caused by each failure mode.
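The removal sequence used for Figs. 9 and 10 can be sketched as a frequency count over the final failed sets of the repeated LT runs. Breaking ties by node index is an assumption made here to keep the order deterministic; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

def removal_order(final_failed_sets: Iterable[Set[int]]) -> List[int]:
    """Nodes ordered by how often they ended up failed across LT runs,
    most frequent first; ties are broken by node index."""
    counts: Counter = Counter()
    for failed in final_failed_sets:
        counts.update(failed)              # +1 for every node in this run's set
    return sorted(counts, key=lambda v: (-counts[v], v))
```

Removing the first m nodes of this order and recomputing LC, OE(t) and R(t) for each m traces the robustness curves.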

IV. CONCLUSION
To reflect the dynamic effect of passenger flow on subway network robustness, we first propose a temporal subway network (TSN). Then, we use the linear threshold (LT) model to characterize the cascading failure process of the TSN. In addition, we propose a new robustness metric R(t) to evaluate the effect of this cascading failure on TSN robustness. By applying these methods to 36 Shanghai TSNs, we obtain the following main practical findings:
• Different failure modes cause cascading failures of different sizes. Cascading failures rarely occur in the Shanghai TSN under the random failure mode but readily occur under the malicious failure modes (i.e., degree-, betweenness-, strength- and flow betweenness-oriented failure modes).
• Under the same failure mode, cascading failures in the Shanghai TSN are larger during rush hours than during off-peak hours.
• When the passenger flow volume is comparatively small, the Shanghai TSN robustness depends mainly on changes in network topology. Moreover, a large passenger flow volume amplifies the influence of failure modes on the Shanghai TSN robustness.
These findings suggest that stations should be managed with different measures at different times. We need to focus not only on the hub stations of the subway system but also on stations that carry a large passenger flow at a given moment. In particular, during rush hours, the large passenger flow volume makes SN robustness vary over time. Therefore, to improve SN robustness, we suggest (1) tracking changes in passenger flow in a timely manner to capture the dynamics of passenger travel patterns [57]; (2) adding travel routes between stations to decrease the edge influence $EI_{ij}$ [58] and thereby reduce the cascading failure size; and (3) identifying priority restoration stations based on both passenger flow volume and rail structure [59].