Revealing Urban Traffic Demand by Constructing Dynamic Networks With Taxi Trajectory Data

As a crucial travel mode, taxis play a significant role in residents’ daily travel, and uncovering taxi traffic demand has become a hotspot in transport studies. Previous research has paid more attention to the statistical characteristics of taxi trips, while few studies focus on their dynamic features in different periods of a day. In this article, we study taxi travel demand by constructing dynamic networks based on taxi trajectory data. In addition, the relationship between travel intensity and points of interest (POIs) in Xiamen, China is discussed. Firstly, the study area is divided into uniform cells of 1 km $\times$ 1 km, and the pick-up and drop-off activities of passengers are recorded for each cell. Secondly, the networks are constructed by regarding each cell as a node and the taxi trips from one cell to another as an edge. On this basis, we divide a day into 12 two-hour periods and construct a network for each period. Finally, the correlation between travel intensity and POI intensity is examined with regression analysis. Results show that the taxi trip networks have a large clustering coefficient and a small average shortest path length, which indicates that they are ‘small world’ networks. Moreover, the taxi trip networks are disassortative: hotspot areas tend to connect with common areas. Furthermore, the taxi trip length in a day follows a lognormal distribution, and the peak hour of taxi trips appears around midnight. Finally, a cubic polynomial curve fits the relationship between travel intensity and POI intensity. Our findings provide new insight for understanding taxi traffic demand.


I. INTRODUCTION
Traffic demand plays an important role in urban traffic planning, traffic management and city planning. However, it is hard to grasp urban traffic demand because the structure of traffic flow is complex and changeable. Traditional traffic surveys cannot provide enough information to reflect the actual traffic demand of big cities because they are time-consuming and expensive. Big geospatial data (e.g., mobile phone data, bike-sharing order data and taxi trajectory data) provide a large number of human positions from which traffic demand can be uncovered. Such data are more accurate, continuous and cost-effective than traffic survey data. In recent years, researchers have investigated traffic problems with geospatial data in several aspects, such as traveler behavior, travel patterns and spatial-temporal characteristics [1]-[3]. To understand traffic demand, a common approach is to divide the study area into small parts [4]-[7].
The associate editor coordinating the review of this manuscript and approving it for publication was Yilun Shang.
Taxis are a significant component of urban transit systems and can provide 24-hour service. They are an important supplement to other transit options, especially at night when most transit options stop operating. Taxi trajectory data, which record a series of GPS points together with other information such as instantaneous velocity, status and time, have been widely used in analyzing the characteristics of taxicab movement [8]-[10], travelers' mobility [11]-[13], travel patterns [14], [15], traffic congestion [16], [17], route choice [18]-[20], taxi trip characteristics [21], [22], etc. Taxis operate without fixed lines, schedules or stations, so their trajectories can capture the spatiotemporal characteristics of travelers. There have been many studies focusing on the spatiotemporal characteristics of urban travel demand by using taxi trajectory data. Cui et al. extracted travel demand from taxi trajectory data and identified the mismatch between travel demand and transport network services [23]. Sun et al. proposed a framework to derive the dynamics of the pick-up and drop-off locations of taxis [24]. To understand travel demand, study areas are usually divided into small grids, cells or zones. Tang et al. proposed an entropy-maximizing method to extract OD information by dividing the study area into 40 × 60 grids [22]. Zhang et al. studied taxi OD trips with different cell sizes and found that the trip length distribution is negative exponential across scales [25]. Ke et al. divided Hangzhou, China into a 7 × 7 grid in which each cell is 4.77 km × 4.81 km, and studied the short-term forecasting of passenger demand using DiDi data [26].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, complex network theory has been introduced to analyze the taxi trip structure, where small grids or cells are taken as nodes and trips from one node to another are taken as directed edges. Such a network can be called a spatial-interaction network, which provides new insight for understanding the taxi trip structure and the functions of cities. Liu et al. proposed a complex network method to study the community structure of taxi trips [27]. He et al. partitioned the study area into grids of 100 m × 100 m and constructed spatiotemporal traffic diagrams to extract traffic dynamics [28]. Yang et al. divided the study area into zones by a transportation planning method and found that the complex taxi network exhibited the 'small world' characteristic [29]. Yang et al. proposed a weighted k-core decomposition method incorporating complex network theory and weighted distance, and found three layers in the spatial-interaction network: a core layer, a bridge layer and a periphery layer [30]. He studied taxi networks with different cell scales and observed that the degree distributions follow a power law and that the power-law exponents decrease as the square cells grow [31].
Urban traffic flows are of many types, such as commuter flow, leisure flow and transfer flow. Some types depend heavily on the urban form, for example commuter flow, which moves between residences and workplaces. The impact of the built environment on residents' travel has been widely discussed in terms of three factors: density, diversity and design [32]-[35]. A previous study indicates that there is a close relationship between taxi driver activities and built environment characteristics [36]. Li et al. used points of interest (POIs) in a gravity model to evaluate interzonal commuting patterns [37]. Zhang et al. studied the relationship between travel intensity and POIs based on car-hailing data, and the results show that some types of POIs, such as traffic facilities, have great impacts on pick-ups and drop-offs [38]. Liu et al. argued that a higher proportion of commercial area and public service area produces greater taxi demand, while the proportion of residential area and the land use mix have negative impacts on taxi demand [39]. Based on the strong relationship between taxi traffic demand and POIs, taxi trajectory data can be used to identify urban functional regions [40].
Many studies have focused on the spatial-interaction network by dividing the study area into small grids. However, little work has examined the dynamic change of taxi networks caused by random trip demands in different periods. Traffic demand fluctuates across time periods, which is very important for urban traffic management. For example, when traffic demand is high at peak time, traffic managers should provide more service to satisfy it. At present, the fluctuation of traffic demand during a day is not clear from a network perspective, which hinders the improvement of taxi service. To fill this gap, this article proposes a spatiotemporal analysis method that constructs taxi trip networks to detect the spatiotemporal characteristics of taxi trips. Moreover, we intend to build a relationship between the taxi trip demand in cells and the number of POIs. This article intends to provide new insight for understanding traffic demand by constructing dynamic taxi trip networks. The rest of the paper is organized as follows. Section II gives the study area and data description. Section III introduces the method. Spatiotemporal characteristics of taxi trips are given in Section IV. Section V describes network-based characteristics. Section VI introduces the relationship between taxi trips and POIs. Conclusions are given in Section VII.

II. STUDY AREA AND DATA DESCRIPTION
A. XIAMEN CITY
The case study area is Xiamen, which is located in the southeast of China (see Fig. 1). The main study area spans longitudes from 117.89°E to 118.35°E and latitudes from 24.42°N to 24.74°N. At present, the population of Xiamen is approximately 4.29 million. The study area is about 34 km long from east to west and about 44 km long from north to south. The entire region is divided into 1496 uniform grid cells, each of 1 km × 1 km.
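As a minimal sketch of how a GPS point might be assigned to a grid cell, the function below maps a longitude/latitude pair to a row, a column and a consecutive node number. The bounding box is taken from the text; the grid shape (`N_COLS`, `N_ROWS`), the equal-angle split of the bounding box and the function name `cell_of` are assumptions made for illustration and are not necessarily the authors' procedure.

```python
# Bounding box of the study area as stated in the text.
LON_MIN, LON_MAX = 117.89, 118.35
LAT_MIN, LAT_MAX = 24.42, 24.74
# Assumed grid shape: 34 columns (east-west) x 44 rows (north-south) = 1496 cells.
N_COLS, N_ROWS = 34, 44


def cell_of(lon, lat):
    """Return (row, col, node_id) for a point, or None if it lies outside the box.

    node_id is a consecutive number in [1, N_ROWS * N_COLS].
    """
    if not (LON_MIN <= lon < LON_MAX and LAT_MIN <= lat < LAT_MAX):
        return None
    # Equal-angle split of the bounding box (an approximation of 1 km cells).
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * N_COLS), N_COLS - 1)
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * N_ROWS), N_ROWS - 1)
    return row, col, row * N_COLS + col + 1
```

A projected coordinate system would give cells that are closer to exactly 1 km on each side; the equal-angle split is used here only to keep the sketch short.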

B. TAXI GPS TRAJECTORY DATA
We collected one week of taxi GPS trajectory data, from July 2, 2018 to July 8, 2018, and selected a weekday, July 3, to study the dynamic characteristics of taxi trips within a day. There are 4973 vehicles and approximately 7 million GPS trajectory records on July 3. Each record contains the taxi ID, timestamp, longitude, latitude, instantaneous speed and the status of the taxi (vacant or occupied). In the dataset, '0' represents the vacant status, while '1' represents the occupied status. Table 1 shows a sample of the GPS records.
The GPS trajectory data record the information of taxi vehicles, including whether a passenger is on board. Fig. 2 illustrates the GPS trajectory of one taxi in a day. The red points stand for the occupied status, while the green points stand for the vacant status. Before the data are used, a data cleaning process is applied: a GPS point is removed if it is not within the study area, and a trip is removed if its length is smaller than 200 m. From the trajectory data, we can extract the information of each trip, such as its start point, end point, trip distance, start time and duration.
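The trip extraction described above can be sketched as follows. The sketch assumes that a trip is a maximal run of consecutive occupied points of one taxi and that trip length is the sum of great-circle distances between successive points; the record layout, the function names and the spherical Earth radius are hypothetical choices for illustration.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def extract_trips(records, min_length_m=200.0):
    """records: time-ordered (timestamp_s, lon, lat, status) tuples for ONE taxi.

    A trip is a maximal run of consecutive occupied (status == 1) points.
    Trips shorter than min_length_m are discarded, as in the cleaning step.
    """
    trips, run = [], []
    # A vacant sentinel record closes a run that reaches the end of the data.
    for rec in list(records) + [(None, None, None, 0)]:
        if rec[3] == 1:
            run.append(rec)
            continue
        if len(run) >= 2:
            length = sum(
                haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(run, run[1:])
            )
            if length >= min_length_m:
                trips.append({
                    "start": (run[0][1], run[0][2]),
                    "end": (run[-1][1], run[-1][2]),
                    "start_time": run[0][0],
                    "duration_s": run[-1][0] - run[0][0],
                    "length_m": length,
                })
        run = []
    return trips
```

Points outside the study area would be filtered before this step, so that each run only contains valid positions.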

C. POI DATA
In this article, we use Python to extract POI data from Baidu Map (https://map.baidu.com/). The POI data cover 13 types, such as dining facilities, shopping facilities, service facilities and traffic facilities. There are 83816 POIs in total in the study area. Each POI consists of four fields: type, name, longitude and latitude. An example of the POI data is given in Table 2. Since travelers' behaviors are highly correlated with the type and scale of POIs, we intend to identify the relationship between trips and the number of POIs in the grid cells. Fig. 3 shows the distribution of POIs in Xiamen. As can be seen, most POIs are concentrated on Xiamen island, which includes the Huli and Siming districts.

A. NETWORK CONSTRUCTION
In urban areas, travelers' trips connect different areas according to their needs. However, it is hard to know the exact places that travelers come from and go to. Therefore, we divide the study area into small square cells. The cells can be regarded as nodes, and the trips between nodes can be regarded as directed links or edges. We can then construct spatiotemporal networks to detect taxi trip patterns from taxi trajectory data. Fig. 4 shows the process of constructing a network. To build a network from the taxi trips of a period $T$, the spatial area is divided into equal-sized square cells with side length $L$. An area of $ML \times NL$ can be divided into $M \times N$ cells, and each cell $C_i$ stands for a node $v_i$. For convenience, the nodes are numbered consecutively in $[1, M \times N]$. We use $G = (V, E, W)$ to represent a directed network, where $V$ is the set of nodes, $E$ is the set of edges and $W$ is the set of edge weights. $e_{ij} = 1$ if there is an edge from node $v_i$ to $v_j$; otherwise, $e_{ij} = 0$. $w_{ij}$ is the weight of the edge $e_{ij}$, which stands for the number of taxi trips from node $i$ to node $j$.
Node degree $k_i$ is defined as the number of nodes connected to node $i$; a node with a large degree is important in the network. For a directed network, node degree consists of in-degree and out-degree. In-degree is the number of nodes pointing to node $i$, which is calculated by
$$k_i^{in} = \sum_{j} e_{ji},$$
while out-degree is the number of nodes that node $i$ points to, which is defined as
$$k_i^{out} = \sum_{j} e_{ij}.$$
So, the degree of a node is the sum of its in-degree and out-degree:
$$k_i = k_i^{in} + k_i^{out}.$$
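The construction of $G = (V, E, W)$ from the trips of one period can be sketched in a few lines. The input format (a list of origin/destination node pairs) and the function name are assumptions for illustration; trips that start and end in the same cell are kept here as self-loops, which the text does not specify.

```python
from collections import Counter


def build_network(trips):
    """trips: iterable of (origin_node, destination_node) pairs for one period T.

    Returns (V, E, W): the node set, the directed edge set, and a dict of
    weights w_ij = number of trips from node i to node j.
    """
    W = Counter((o, d) for o, d in trips)
    E = set(W)                           # e_ij = 1 exactly for the pairs in E
    V = {n for edge in E for n in edge}  # only cells that appear in some trip
    return V, E, dict(W)
```

Because only cells that appear in at least one trip enter `V`, isolated cells are excluded from the network by construction.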

2) WEIGHTED DEGREE
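In this article, we define a weighted degree of nodes to measure the number of trips. Let $w_{ij}$ stand for the number of trips from node $i$ to node $j$. Similarly to the node degree, the weighted in-degree $\tilde{k}_i^{in}$, the weighted out-degree $\tilde{k}_i^{out}$ and the weighted degree $\tilde{k}_i$ are defined as follows:
$$\tilde{k}_i^{in} = \sum_{j} w_{ji}, \qquad \tilde{k}_i^{out} = \sum_{j} w_{ij}, \qquad \tilde{k}_i = \tilde{k}_i^{in} + \tilde{k}_i^{out}.$$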
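For illustration, the degree and weighted degree of every node can be computed in one pass over the weighted edges. The dictionary layout and the key names (`k_in`, `kw_in`, and so on) are hypothetical.

```python
from collections import defaultdict


def degree_table(W):
    """W: dict mapping a directed edge (i, j) to its number of trips w_ij.

    Returns node -> dict with in/out/total degree (k_*) and
    weighted in/out/total degree (kw_*).
    """
    table = defaultdict(lambda: {"k_in": 0, "k_out": 0, "kw_in": 0, "kw_out": 0})
    for (i, j), w in W.items():
        table[i]["k_out"] += 1    # one more node that i points to
        table[i]["kw_out"] += w   # trips leaving i
        table[j]["k_in"] += 1     # one more node pointing to j
        table[j]["kw_in"] += w    # trips arriving at j
    for row in table.values():
        row["k"] = row["k_in"] + row["k_out"]
        row["kw"] = row["kw_in"] + row["kw_out"]
    return dict(table)
```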

3) AVERAGE SHORTEST PATH LENGTH (ASPL)
The average shortest path length is a key indicator for detecting the small world property of networks. It is defined as the average number of steps along the shortest paths over all possible pairs of network nodes, and is calculated as
$$L = \frac{1}{N(N-1)} \sum_{i \neq j} d_{ij},$$
where $N$ is the total number of nodes in the network and $d_{ij}$ is the shortest path length between node $i$ and node $j$.
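A minimal sketch of the ASPL computation with breadth-first search is given below. It treats every edge as one step and averages over ordered pairs of nodes that can reach each other; skipping unreachable pairs is an assumption made here, since the text does not state how such pairs are handled.

```python
from collections import deque


def average_shortest_path_length(nodes, edges):
    """Unweighted ASPL of a directed graph, averaged over reachable ordered pairs."""
    adj = {n: [] for n in nodes}
    for i, j in edges:
        if i != j:  # self-loops never shorten a path
            adj[i].append(j)
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable targets, excluding the source itself
    return total / pairs if pairs else float("nan")
```

When every node can reach every other node, `pairs` equals $N(N-1)$ and the result is the average over all ordered pairs.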

4) CLUSTERING COEFFICIENT
The clustering coefficient describes the local cohesiveness of a node, i.e., the extent to which the nodes in the network cluster together. For a node $i$ with $k_i$ connected neighbor nodes, there are at most $k_i(k_i - 1)/2$ edges among the $k_i$ nodes. So, the clustering coefficient is defined as
$$C_i = \frac{2E_i}{k_i(k_i - 1)},$$
where $E_i$ is the actual number of edges among the $k_i$ nodes. Obviously, $0 \le C_i \le 1$. The clustering coefficient of the network is the average value over all nodes.
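The clustering coefficient can be sketched as below. The sketch ignores edge direction when collecting neighbors, which matches the bound of $k_i(k_i - 1)/2$ edges among neighbors but is an assumption about how the directed taxi network is handled; nodes with fewer than two neighbors are given a coefficient of 0.

```python
from itertools import combinations


def clustering_coefficients(edges):
    """Local clustering C_i = 2 * E_i / (k_i * (k_i - 1)), ignoring direction."""
    nbrs = {}
    for i, j in edges:
        if i == j:
            continue  # self-loops do not create neighbors
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    coeff = {}
    for node, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            coeff[node] = 0.0
            continue
        # E_i: number of neighbor pairs that are themselves connected.
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        coeff[node] = 2.0 * links / (k * (k - 1))
    return coeff


def average_clustering(edges):
    """Network clustering coefficient: the mean of C_i over all nodes."""
    coeff = clustering_coefficients(edges)
    return sum(coeff.values()) / len(coeff) if coeff else 0.0
```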

5) MODULARITY
In complex networks, community structure is a common phenomenon. Nodes within a community are closely tied, while the connections between communities are sparse. Newman and Girvan [41] proposed an index $Q$ to measure community structure. For a division with $g$ communities, define a $g \times g$ matrix $\mathbf{e}$ whose component $e_{ij}$ is the fraction of edges in the original network that connect nodes in community $i$ to those in community $j$. The modularity is defined as
$$Q = \sum_{i} \left(e_{ii} - a_i^2\right) = \mathrm{Tr}\,\mathbf{e} - \left\|\mathbf{e}^2\right\|, \qquad a_i = \sum_{j} e_{ij},$$
where $\|\mathbf{x}\|$ means the sum of all elements of $\mathbf{x}$. Typically, $0 \le Q \le 1$. $Q = 0$ means that the community structure is no stronger than that of a random division. A larger value of modularity indicates a stronger community structure.
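As an illustration of the index $Q$ of Newman and Girvan, the sketch below evaluates $Q = \sum_i (e_{ii} - a_i^2)$, with $a_i = \sum_j e_{ij}$, for a given division of an undirected, unweighted edge list. Treating the taxi network as undirected and unweighted is a simplification assumed here; the function name and input format are hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan Q for an undirected edge list and a given division.

    edges: iterable of (u, v); community: dict node -> community label.
    """
    edges = list(edges)
    m = len(edges)
    e = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        # Each undirected edge contributes half of 1/m to e[cu, cv] and half
        # to e[cv, cu], so that e is symmetric and its entries sum to 1.
        e[(cu, cv)] = e.get((cu, cv), 0.0) + 0.5 / m
        e[(cv, cu)] = e.get((cv, cu), 0.0) + 0.5 / m
    labels = set(community.values())
    # a_c: fraction of edge ends attached to community c (row sum of e).
    a = {c: sum(e.get((c, d), 0.0) for d in labels) for c in labels}
    return sum(e.get((c, c), 0.0) - a[c] ** 2 for c in labels)
```

For two triangles joined by a single edge and divided into the two triangles, this gives $Q = 5/14 \approx 0.357$, while putting all nodes into one community gives $Q = 0$.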

6) ASSORTATIVE COEFFICIENT
The assortative coefficient describes the tendency of connections between nodes according to node degree. It is a crucial indicator for measuring network structure, and is defined as follows [42]:
$$r = \frac{M^{-1} \sum_{i} j_i k_i - \left[ M^{-1} \sum_{i} \frac{1}{2}(j_i + k_i) \right]^2}{M^{-1} \sum_{i} \frac{1}{2}\left(j_i^2 + k_i^2\right) - \left[ M^{-1} \sum_{i} \frac{1}{2}(j_i + k_i) \right]^2},$$
where $M$ represents the total number of edges, and $j_i$ and $k_i$ are the degrees of the two nodes connected by edge $i$, with $i = 1, \cdots, M$. The assortative coefficient is in the range $[-1, 1]$. The closer the value is to 1, the more assortative the network is, which means that nodes with high degree tend to connect to nodes with high degree; the closer the value is to $-1$, the more disassortative the network is, which means that nodes with high degree tend to connect to nodes with low degree.
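A minimal sketch of the assortative coefficient is given below, using Newman's edge-based form in which each edge contributes the degrees $j_i$ and $k_i$ of its two end nodes. It assumes an undirected edge list in which each connected pair appears once; applying it to the directed taxi network would require first collapsing reciprocal edges, which is an assumption of this sketch.

```python
def assortativity(edges):
    """Degree assortativity r of an undirected edge list (Newman's formula)."""
    edges = [(u, v) for u, v in edges if u != v]  # drop self-loops
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    m = len(edges)
    # Edge averages of j*k, (j + k)/2 and (j^2 + k^2)/2.
    s_prod = sum(deg[u] * deg[v] for u, v in edges) / m
    s_mean = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / m
    s_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / m
    return (s_prod - s_mean ** 2) / (s_sq - s_mean ** 2)
```

A star graph, in which one hub connects to several leaves, gives $r = -1$, the fully disassortative case. The denominator vanishes when all nodes have the same degree, so such inputs need separate handling.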

IV. SPATIOTEMPORAL CHARACTERISTICS OF TAXI TRIPS
A. POINTS OF PICK-UP AND DROP-OFF
The density of taxi pick-up and drop-off points reflects the traffic demand in an area. Generally, well-developed areas have more traffic demand. Identifying the hotspots is crucial for many fields, such as city management and traffic control. To illustrate the spatiotemporal characteristics of taxi trips, we have mapped the taxi trips on a weekday (Fig. 5). Fig. 5(a) shows the distribution of pick-up points and Fig. 5(b) shows the distribution of drop-off points. Note that the pick-up points are the start points of trips and the drop-off points are the end points of trips. We can see that the pick-up points are more concentrated than the drop-off points. A likely reason is that travelers tend to call taxis from conspicuous places so that taxi drivers can find them easily. From the figure, we can also see that the taxi data are mainly distributed on Xiamen island.

B. CHARACTERISTICS OF TAXI TRIPS
In general, traffic demand fluctuates across the time periods of a day. Fig. 6 shows the number of taxi trips per hour from July 2, 2018 to July 8, 2018. Note that the smallest number of trips appears between 5:00 and 6:00. Another low point appears between 17:00 and 18:00, which is the time for changing shifts, when some taxi drivers stop service. The peak values emerge around midnight: on weekdays they appear a little before midnight, while on weekends they appear a little after midnight. This suggests that many travelers have to use taxis once the public transport system stops operating around midnight.
Fig. 7(a) shows the fitted curve of the trip length distribution. Fig. 7(b) shows the trip time distribution, which can also be fitted by a lognormal distribution. We can see that a small proportion of trips cover a long distance and take a long time. Most trips are within 10 km and 40 minutes.
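For illustration, a lognormal distribution can be fitted to trip lengths or trip times by maximum likelihood, in which case the parameters are simply the mean and standard deviation of the logarithms of the observations. The fitting procedure used for Fig. 7 is not described in the text, so the estimator and the function names below are assumptions.

```python
import math


def fit_lognormal(values):
    """Maximum-likelihood lognormal fit.

    mu and sigma are the mean and the population standard deviation of ln(x).
    Non-positive values are ignored because their logarithm is undefined.
    """
    logs = [math.log(v) for v in values if v > 0]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Density of the lognormal distribution at x > 0."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )
```

The fitted density can then be drawn over a histogram of the observed trip lengths to judge the quality of the fit visually.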
To explore the occurrence and attraction of each cell specifically, we divide a day into 12 periods: 0:00-2:00, 2:00-4:00, 4:00-6:00, 6:00-8:00, 8:00-10:00, 10:00-12:00, 12:00-14:00, 14:00-16:00, 16:00-18:00, 18:00-20:00, 20:00-22:00 and 22:00-24:00. In this article, we use $P_1, \cdots, P_{12}$ to represent the 12 periods. For a cell in one period, the numbers of pick-ups and drop-offs are different. The numbers of pick-ups and drop-offs reflect the occurrence and attraction of cells, respectively, and are counted separately. Fig. 8 illustrates the distributions of the occurrence and attraction of cells. We remove the cells whose value equals 0, which means that they have no trips. The figures show that a small number of cells have a very large number of pick-up and drop-off points. These cells are concentrated in the downtown areas. Taking the number of pick-up points as an example, the value of most cells is smaller than 200 in each period of a day. There exist some hotspots in the figure. We can see that the hotspots appear around midnight, which indicates that many travelers use taxis when public transport stops operating. The hotspots at night are located in the city center and at the transportation hubs.
We construct the taxi trip network by dividing the study area into small grid cells. Trips that start in the same cell are regarded as starting from the central point of the cell, and trips that end in the same cell are regarded as ending at the central point of the cell. Fig. 9 shows the distributions of taxi trips in different periods, each of which is two hours long, to exhibit how trips change over time. The trip density is smallest during 4:00-6:00, which confirms that the traffic demand is smallest in that period. Moreover, the trips within Xiamen island have a high density.
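The assignment of pick-ups or drop-offs to the 12 two-hour periods can be sketched as follows. The event format (seconds since midnight plus a cell identifier) and the function names are hypothetical; the same function would be applied once to pick-up events and once to drop-off events.

```python
from collections import Counter

PERIOD_SECONDS = 2 * 60 * 60  # each period is two hours long


def period_index(seconds_since_midnight):
    """Map a time of day to the period number 1..12 (1 is 0:00-2:00)."""
    return int(seconds_since_midnight // PERIOD_SECONDS) % 12 + 1


def counts_per_cell_and_period(events):
    """events: iterable of (seconds_since_midnight, cell_id), e.g. all pick-ups.

    Returns a Counter keyed by (period, cell_id).
    """
    return Counter((period_index(t), cell) for t, cell in events)
```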

V. NETWORK-BASED CHARACTERISTICS
In this part, we discuss the structure of the taxi trip network using common complex network indicators. In the networks, there exist some nodes that are not connected with any other node, which are isolated nodes. We remove the isolated nodes when analyzing the network. Fig. 10 shows the taxi trip network on July 3, 2018, which contains 630 nodes and 21349 edges. The colors indicate the community structures.

A. CHARACTERISTICS OF NETWORKS IN DIFFERENT PERIODS
To explore the characteristics of the networks in different periods, we calculate the basic measures of the networks in Table 3. The average degree of the taxi trip network in a day reaches 33.88, which means that the network is tightly connected. Furthermore, we note that all the networks in the table have a large clustering coefficient and a small average shortest path length, which demonstrates that the taxi trip networks are 'small world' networks. The modularity values of all networks are very small, which indicates that the community structures of the taxi trip networks are not obvious. The reason is that taxis have no fixed routes, so regions that are far apart can be connected by taxi trips. Moreover, the assortative coefficients of all networks are negative, which means that the taxi trip networks are disassortative networks. In reality, travelers prefer to use taxis from the suburbs to the core urban area when public transport cannot satisfy their travel requirements. The average degree is smallest in $P_3$, with a value of 10.812, which demonstrates that the traffic demand is smallest in that period. The transportation efficiency is worst in $P_3$, where the ASPL reaches 2.744, while it is best in $P_{11}$, with a value of 2.455. The network of the whole day outperforms the networks of the individual time periods.

B. DISTRIBUTIONS OF DEGREE DISEQUILIBRIUM FACTOR
Generally, the inbound and outbound traffic flows of nodes are in disequilibrium for many reasons, such as the competition from other traffic modes. The traffic flow disequilibrium could result in heavy traffic congestion and a mismatch between traffic demand and supply. There have been many works that measure the disequilibrium with the Gini coefficient, the Theil index and complex network indicators [43]-[46]. In this article, we use the degree disequilibrium factor (DDF) to represent the disequilibrium of taxi traffic demand for nodes. It is defined as follows.
Obviously, DDF ranges from 1 to 2. A larger value means that the node is more unbalanced. Fig. 11 shows the distributions of DDF in the 12 periods. As the figure shows, a large proportion of nodes have DDF values under 1.5, which indicates that the disequilibrium is not serious. The proportion tends to decrease between DDF = 1 and DDF = 2. However, it increases to a large value when DDF equals 2. DDF = 2 means that either the in-degree or the out-degree of the node equals 0. We analyze the nodes with DDF = 2 and find that the degrees of almost all of these nodes are smaller than 4, which means that these nodes have few connections with other nodes. Furthermore, their connections are concentrated in urban core areas. Additionally, these nodes are located in suburban areas with few transit connections to the core areas. The proportion of these nodes is very large (approximately 30%). The results imply that taxis play an important role in connecting peripheral areas and urban core areas.
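Purely as an illustration, the sketch below uses one expression that is consistent with the properties stated above: it equals 1 when in-degree and out-degree are balanced and equals 2 when one of them is 0. The expression $2\max(k^{in}, k^{out})/(k^{in} + k^{out})$ is an assumption chosen for this sketch and may differ from the authors' definition of DDF.

```python
def ddf(k_in, k_out):
    """ASSUMED form of the degree disequilibrium factor (not from the source).

    Returns 1.0 for a perfectly balanced node and 2.0 when either the
    in-degree or the out-degree is zero.
    """
    total = k_in + k_out
    if total == 0:
        raise ValueError("isolated node: the factor is undefined")
    return 2.0 * max(k_in, k_out) / total
```

Under this assumed form, a node with in-degree 3 and out-degree 1 has a factor of 1.5, the threshold used in the discussion of Fig. 11.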

VI. RELATIONSHIP BETWEEN TAXI TRIPS AND POI NUMBER
Generally, the number of POIs in an area reflects the prosperity of the area. In other words, a region with a large number of POIs would have more pick-ups and drop-offs, which means that the area is better developed. In this part, we study the relationship between the number of POIs and the number of taxi trips. Fig. 12 illustrates the number of POIs in cells, sorted from large values to small values. The cells that have a very large number of POIs account for a small proportion, and they are mostly concentrated in the downtown area. An exponential curve could fit the value distribution. We use the weighted degree to represent the travel activity because it is the sum of the outbound and inbound flows. We plot the relationship between the weighted degree and the number of POIs in cells in Fig. 13. From the figure, we can see that about 70% of the cells have a weighted degree smaller than 50, and a small number of cells have a very large weighted degree. As can be seen, the weighted degree increases as the number of POIs increases. A cubic polynomial curve could fit the relationship. The results show that there is a close relationship between the number of POIs and traffic demand.
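A cubic polynomial fit of weighted degree against POI number can be sketched with ordinary least squares. The implementation below solves the normal equations with Gauss-Jordan elimination using only the standard library; it is a generic illustration and not the authors' regression code, and the function name is hypothetical.

```python
def polyfit(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations.

    Returns the coefficients [c0, c1, ..., c_degree] of
    c0 + c1 * x + ... + c_degree * x ** degree.
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]
```

With raw POI counts in the hundreds or thousands the normal equations become poorly conditioned, so rescaling the POI numbers (for example, dividing by their maximum) before fitting is advisable.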

VII. CONCLUSION
This article proposes a framework to analyze taxi traffic demand by constructing complex networks based on taxi trajectory data. The study area is divided into small cells of 1 km × 1 km. Each cell is regarded as a node, and taxi trips that start from one node and end in another node are regarded as edges. In order to study the fluctuation of traffic demand, we divide a day into 12 two-hour periods and construct a network for each period. Moreover, we count the number of POIs in each cell and build the relationship between travel intensity and POI intensity.
The results show that the number of taxi trips fluctuates during a day and reaches its peak around midnight, because most public transport stops operating at that time. Statistical results show that the taxi trip length on a weekday follows a lognormal distribution. The pick-up points are more concentrated than the drop-off points. The taxi trip networks exhibit a 'small-world' characteristic, with a large clustering coefficient and a small average shortest path length. The community structures of the taxi trip networks are not obvious in any period, while the networks show a disassortative characteristic. The results show that residents prefer to use taxis between hotspots and common regions, which indicates that taxis are a significant supplement to public transportation. According to the degree and the ASPL, the network performance is worst during 4:00-6:00, when the traffic demand is smallest. The network performance improves as time goes on and reaches its peak at night. Furthermore, we count the number of POIs in each cell and find that the cells with large POI intensity account for a small proportion. A cubic polynomial curve could fit the relationship between travel intensity and POI intensity.
This study could provide new insight for understanding the travel demand of taxis. Future research should focus on the following aspects. First, the partition method and the scale of the study area should be discussed specifically. Second, different POI types should be given different weights. Finally, more kinds of data, such as bus, metro and bike-sharing data, should be introduced into the analysis of traffic demand to capture residents' activities more accurately.