A Spatio-Temporal Structured LSTM Model for Short-Term Prediction of Origin-Destination Matrix in Rail Transit With Multisource Data

Passenger assignment in rail transit has recently attracted increasing research interest because of its potential applications in large-scale intelligent transportation systems. In a rail transit system, the foundation of passenger assignment is passengers’ origin-destination demand (the OD matrix). However, because the short-term dynamic OD matrix is stochastic, accurately predicting the spatio-temporal distribution of passenger travel remains an open challenge. This paper combines multisource data with a deep learning method to improve the prediction accuracy of the dynamic OD matrix. Firstly, multisource data such as smart card data, weather data and mobile phone data are introduced, and after quantitative analysis of the influencing factors, 31 features are chosen as model inputs. Secondly, considering the strength of the Long Short-Term Memory (LSTM) network on time series and the spatio-temporal characteristics of rail transit passenger flow, we improve the structure of LSTM by redesigning the hidden layer and neuron, yielding the spatio-temporal Long Short-Term Memory network (STLSTM). Finally, the Beijing subway network, which has 54,056 OD pairs, is used for verification. Extensive experiments and evaluations on a large-scale dataset demonstrate the superiority of STLSTM over commonly used prediction models and the standard LSTM model for short-term prediction of the dynamic OD matrix. In addition, the way multisource data are applied to OD prediction in this paper can accommodate data from further sources, so that more information about passenger flow patterns can be exploited.


I. INTRODUCTION
Rail transit is an important means of alleviating problems caused by urbanization, such as traffic congestion and environmental pollution. Currently, predictive control strategies are being applied to enhance rail transit operation. The basic logic is first to predict the passenger distribution in the network (normally every 15 minutes) and then to evaluate the distribution of network passenger flow dynamically. If the distribution shows local congestion, countermeasures such as rescheduling [1] and flow control [2] are applied on the basis of the prediction to dynamically improve subway operation management capacity and passenger travel quality. The first step of this strategy is to predict the OD matrix of passenger flow accurately.
The associate editor coordinating the review of this manuscript and approving it for publication was Sabah Mohammed.
However, few methods can significantly improve the accuracy of dynamic OD matrix prediction in rail transit, especially during peak hours [3]. There are two main reasons. The first is the limitation of the data used. Passenger travel is affected by a variety of factors, and data from a single source can hardly capture all passenger travel patterns, which makes it difficult to improve prediction accuracy. Ni et al. [4] and Zhang et al. [5] have made effective improvements in this respect. The second is the question of which prediction method can analyze information in different dimensions. The method needs to retain as much valid information from each source as possible while eliminating random components in order to truly improve prediction accuracy. This paper first introduces as much multisource data as possible, e.g. smart card data, mobile phone data and weather data, and extracts the main features behind these data. Then, in view of the spatio-temporal characteristics of rail passenger flow, we improve the structure of LSTM by redesigning the hidden layer and neuron; that is, the STLSTM model is designed for data with spatio-temporal dimensions. The results show that prediction accuracy can be significantly improved in this way.

A. LITERATURE REVIEW
Accurate passenger demand forecasting is an important basis for releasing rail transit information and implementing passenger control strategies. Previous studies have mainly focused on forecasting station passenger flow in rail transit systems and on static OD prediction. Owing to the large scale of rail transit networks, the complexity of passenger travel behavior and other reasons, short-term dynamic OD passenger flow prediction has seldom been studied. Nevertheless, because the time series of short-term OD passenger flow are similar to those of station passenger flow and cross-section passenger flow, the corresponding research results can also provide valuable insights for short-term OD passenger flow prediction.
Short-term passenger forecasting approaches can generally be divided into two categories: parametric and nonparametric techniques [6]-[8].
The traditional parametric techniques that can be used for short-term OD forecasting are mainly the historical average [9], smoothing techniques [10], and the autoregressive integrated moving average (ARIMA) model [11]. The historical average model is the simplest to operate. Although it is simple to calculate and highly efficient, it is generally less accurate given the uncertainty and nonlinear characteristics of passenger flow [9]. The prediction is not stable enough [10] and can be greatly affected by sudden passenger flow. The ARIMA model can effectively eliminate short-term fluctuations in the OD passenger flow time series, so it better captures the long-term characteristics of OD passenger flow distribution time series data. On the basis of the ARMA model, researchers have proposed various variant models, some of which are well known, including the Kohonen-ARIMA model [12], the Subset ARMA model [13] and the Seasonal ARIMA model [11]. When the distribution of OD passenger flow changes greatly, the ARIMA prediction lags behind the change. This method is therefore more suitable for predicting relatively smooth traffic flow and is difficult to apply to large-scale OD volume forecasting.
For the non-parametric techniques, the methods that can be used for short-term OD traffic forecasting are mainly Kalman filtering models [14], non-parametric regression [7], [15] and artificial neural networks [16]. The Kalman filtering model is an optimal autoregressive data processing algorithm based on linear filtering theory, which updates its equations as new OD passenger flow data arrive in order to achieve real-time OD passenger flow prediction. The model performs well on both stationary and non-stationary data, but it is prone to insufficient filtering. Okutani and Stephanedes [17] were the first to apply the Kalman filtering model to short-term OD passenger prediction. Van Lint [18] achieved online real-time prediction of passenger flow with a Kalman filtering model. Yao [19] established a state-space model for short-term OD matrix estimation based on the Kalman filtering method and realized dynamic OD matrix estimation for the whole Beijing network at time granularities of 15min, 30min and 60min. In addition, they [19] reduced the forecast error of OD volume at the 15min time granularity to 30.8% by minimizing the deviation of the current OD passenger flow at the 60min time scale from the historical OD passenger flow in the same period.
Non-parametric regression such as the k-Nearest Neighbors algorithm (k-NN) [20] and Support Vector Machine (SVM) is widely used in prediction. Castro-Neto [22] used the Online-SVR (support vector regression) model to achieve short-term traffic flow forecasts in typical and atypical environments. Castro-Neto [23] further improved the online-SVR model into a supervised online weighted SVM model (OLWSVR model) with good prediction results. Jeong et al. [23] used several forecasting models such as SVR to investigate the effect of temporal and spatial features as well as external weather influence on passenger flow forecasting, and the validity of this method was verified with actual data from the Shenzhen metro.
An artificial neural network (ANN) is trained on a large amount of historical data to obtain a more accurate mapping between input and output. ANN depends strongly on the data and needs many training samples to give good predictions. It is a powerful tool for describing OD passenger flow because it can describe the characteristics of nonlinear systems and take into account the relationship between OD passenger flow and the spatio-temporal dimensions. However, training a traditional neural network model requires a great deal of labeled data. The BP (back propagation) algorithm commonly used to solve for the weight parameters is prone to vanishing and exploding gradients when the model has many layers, and the result may fall into a local optimum. Smith and Demetsky [25] were early in applying the BP neural network model to short-term traffic flow prediction. Hu et al. [26] first pre-processed the historical data and then fed the differenced data into a BP neural network to predict short-term passenger flow. Zhu et al. [27] predicted passenger flow using a radial basis function (RBF) network and considered the impact of traffic conditions at intersections.
As a major breakthrough of the past ten years, deep learning has achieved good results in many tasks such as speech recognition and image processing [28]. As a machine learning approach that integrates unsupervised, semi-supervised and supervised learning, deep learning offers a variety of models and algorithms for all kinds of scenarios [29]. As a result, it is well suited to complex, high-dimensional short-term OD passenger flow forecasting problems.
The Recurrent Neural Network (RNN) in deep learning introduces the concept of time sequence into neural networks [30]. This model is well suited to learning time-dependent data such as OD passenger flow data, because the output at the preceding moment directly affects the input at the current moment [31]. Long Short-Term Memory (LSTM) is a special RNN designed to solve the problem of vanishing or exploding gradients. Ma et al. [32] constructed a traffic speed prediction model based on an LSTM network and trained it with remote microwave sensor data. Tian and Pan [33] discussed the performance of the LSTM network in short-term traffic forecasting in detail and demonstrated its advantages by comparing it with several other commonly used models. Rui et al. [34] predicted short-term traffic flow with LSTM and GRU models and confirmed that the LSTM model is better than the ARIMA model for short-term traffic flow prediction. Gu et al. [35] combined GRU, ARIMA and RBF to predict short-term traffic volume and confirmed that the combined method is better than other methods in terms of accuracy and stability. Toque et al. [36] used an LSTM model that takes subway and bus passenger flow data as input to predict short-term subway OD passenger flow. The results showed that the LSTM model has obvious advantages in short-term subway OD passenger flow prediction, and that adding bus data as input further improved the prediction accuracy. Table 1 summarizes the recent research.
These methods, which appeared at different stages in the development of short-term OD prediction, have good predictive ability and can reflect the essential characteristics of changes in OD passenger flow time series to a certain extent. However, as data from different sources are increasingly shared, existing methods find it difficult to analyze data from different sources and to exploit the potential temporal and spatial relationships among them. Having compared different methods and considered the advantages of the LSTM network model in processing time series, this paper selects it as the model basis and improves the structure of LSTM by redesigning the hidden layer and neuron to fully exploit the spatio-temporal information hidden in the data.

B. MAIN CONTRIBUTIONS
This paper studies short-term OD passenger flow prediction in rail transit. It constructs a short-term OD prediction model based on an STLSTM network under multisource conditions through the analysis and selection of multisource data, and compares its performance with that of other methods.
There are three main contributions. Firstly, multisource data are introduced into the problem of short-term OD prediction in rail transit, and a prediction model under multisource data is constructed. Compared with the standard LSTM network under a single source, the model can extract more characteristics as more multisource data are input, which significantly improves its prediction accuracy at the 15min, 30min and 60min time granularities. Secondly, to fully capture the characteristics of each OD pair in the OD passenger flow distribution time series and the overall characteristics of each period, an STLSTM model is obtained by improving the hidden layer structure and neuron structure of the standard LSTM network. The results show that, with the improved structure, the model can fully capture the characteristics of each OD pair and the overall characteristics of each time series, so that its prediction performance at different time granularities is clearly improved. Finally, a large-scale rail transit network and massive data are used. The model is verified with 54,056 OD pairs, 199.6 million smart card data records and 10.5 million mobile phone data records from the Beijing rail transit network, so the STLSTM can be fully trained on massive multidimensional data.

II. PROBLEM DESCRIPTION
Passenger demand in the transportation sector refers to specific origins O and destinations D, and passenger demand in rail transit systems can be reflected by smart card data in different time periods. The distribution of passenger flow between ODs is influenced by people's daily activities, and therefore presents inherent regularity and trends in the time dimension. In addition, different ODs influence each other: driven by the traffic trend, there is temporal correlation among ODs, and, affected by station land use, distance, etc., there is also spatial interaction among ODs. Let $x_{ij}^{k}$ be the number of passengers from origin $i$ to destination $j$ in time interval $k$, and let $x^{k} \in \mathbb{R}^{M}$ be the number of passengers of all OD pairs within that time interval. In practical applications, there is a lag between the inbound time and the outbound time collected by the smart card system. This produces a passenger flow lag effect in OD matrix estimation: passengers recorded in an earlier period have not yet left the rail transit network, and this diffusion of passenger flow means that the inbound and outbound information collected in the same period do not match each other. Therefore, estimates should take into account not only the traffic flow information for the current time period but also information from multiple preceding time periods, which greatly increases the complexity of the process [19]. The basic relationship between the inbound and outbound collection periods is shown in Fig. 2.
The goal of this paper is to improve the accuracy of short-term OD passenger flow prediction in rail transit systems. That is, given the known observed values $x^{k}$ of the OD passenger flow distribution time series, the task is to predict $x^{t+l}$, the OD quantity from origin to destination for all OD pairs $l$ time intervals ahead.
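The prediction task can be illustrated by turning the OD time series into supervised samples. The sketch below is a minimal illustration, not the paper's data pipeline: the function name `make_windows`, the parameter names `n_lags` and `horizon` (playing the role of l), and the toy series are assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]],
    n_lags: int,
    horizon: int,
) -> List[Tuple[List[List[float]], List[float]]]:
    """Pair each run of n_lags consecutive OD vectors with the OD vector
    observed `horizon` intervals after the last vector of that run."""
    samples = []
    # t is the index of the last observed interval inside the window.
    for t in range(n_lags - 1, len(series) - horizon):
        window = [list(x) for x in series[t - n_lags + 1 : t + 1]]
        target = list(series[t + horizon])
        samples.append((window, target))
    return samples


# Toy series (hypothetical): 6 time intervals, M = 2 OD pairs per interval.
toy = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
pairs = make_windows(toy, n_lags=4, horizon=1)
```

Each sample holds the observations of the last `n_lags` intervals as input and the OD vector `horizon` intervals later as the value to be predicted.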

III. DATA COLLECTION AND FEATURE SELECTION
In order to improve the accuracy of short-term OD matrix prediction, a large amount of multisource data is collected and processed as input for neural network training. Smart card data are a very important data source for short-term OD prediction in rail transit, because OD matrices show significant similarities to historical values, especially during peak hours [19]. Each complete smart card record contains the times of entry at the inbound station and exit at the outbound station, which are the basis of the model input. In addition, weather information [40] affects passengers' willingness to travel and their choice of transportation mode. Mobile phone data show the condition of users around the station (i.e. crowd gathering hot spots). These three kinds of information can provide more features to explain irregular passenger flow. Multisource data play a vital role in the construction of the STLSTM and in the final result of the network, so the model should be adapted to the characteristics of the data. Therefore, statistical methods such as classification and regression are used to evaluate the correlation between data as far as possible, in order to choose suitable data for data mining and to avoid repeated calculation or missing important information. Accordingly, this paper collects smart card data, mobile phone data and weather data as data sources, and processes and extracts them as input to the STLSTM.

A. DATA COLLECTION
The data used in this paper consist of three main kinds: smart card data, weather data and mobile phone data [41]. Firstly, smart card data are obtained from the smart card system and filtered, and the 199.6 million records are then processed to avoid the impact of holidays on forecasts. The data set contains smart card data and weather data for Beijing from February 15 to April 4, 2014, a total of seven consecutive weeks. Secondly, weather data are obtained from a weather website (weather data source: http://www.tianqihoubao.com). Thirdly, mobile phone data are used to aid short-term OD volume prediction by converting them into a time-varying table of user location information. With the increasing penetration of mobile phones in China, mobile phone users cover most of the population in the city, so it is possible to use mobile phone data to aid OD passenger flow analysis. This paper collects mobile phone signaling data provided by mobile operators in Beijing on September 2, 2010, with a total of 10.5 million records, which can reflect the attractiveness of a station to some extent. Taking the distribution at 8:00-8:15 a.m. on September 2, 2010 as an example, the results are shown in Fig. 3. The black dots are subway stations, and the red dots are the located positions of mobile phone users. The darker the color, the denser the points.

B. FEATURE SELECTION
VOLUME 8, 2020
In order to fully exploit the passenger spatio-temporal information hidden in multisource data, the characteristics of the data need to be processed before they are input [41]. Therefore, this paper deals with the operating day information, preceding passenger flow information, station attribute information and other features contained in the data, and explains in detail the reasons for choosing each feature and the quantitative processing methods, as follows:

1) INFLUENCE FACTOR OF OPERATION DAY
The distribution of OD passenger flow on the rail transit network shows different characteristics on different operating days. Considering the similarity of the distribution characteristics of network OD passenger flow, the passenger flow of different operating days can be analyzed and classified using the collected historical OD passenger flow information, which also spares follow-up studies from re-analyzing passenger flow characteristics within the same characteristic day. The sample OD matrices built from smart card data are clustered with the K-Means algorithm. Because the K-Means algorithm is sensitive to the initially selected centroids, other methods are needed to determine the number of clusters and the initial centroids. To obtain accurate results, we first cluster with the within-group linkage method and the Ward method [43] separately (both use the squared Euclidean distance between samples). The results of the two methods are consistent when the number of clusters is 3. Finally, the centers of gravity of these 3 clusters are used as the initial centroids, and the K-Means algorithm is run to obtain the final classification of operating days.
According to the K-Means results, operating days can be divided into three types of passenger flow characteristic days, and there is a certain correlation between OD passenger flow distribution and characteristic day type. On normal working days from Monday to Thursday, commuter traffic (mainly to work and school) is the main body of rail transit demand, so the distribution structure of passenger flow is affected by the attractiveness of places such as workplaces and schools. The composition of passengers on Friday mornings is basically similar to that from Monday to Thursday, but in the afternoon and evening passengers travel in a manner and for purposes that differ from Monday to Thursday. On normal weekends, the main purposes of passenger travel include shopping, sightseeing and visiting relatives and friends, so the distribution structure of passenger flow is affected by attractions such as recreational facilities, tourist attractions and business centers.
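The final step of this procedure, K-Means started from centroids supplied by a prior hierarchical clustering, can be sketched as follows. This is a plain Lloyd-style iteration written with the standard library only; the function names, the toy day vectors and the seed centroids are hypothetical, and the hierarchical clustering that would produce the seeds is assumed to have been run separately.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_seeded(
    points: Sequence[Vector],
    init_centroids: Sequence[Vector],
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """K-Means that starts from the given centroids instead of random ones."""
    centroids = [list(c) for c in init_centroids]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy daily feature vectors (hypothetical), one row per operating day.
days = [[100, 20], [98, 22], [102, 19], [95, 40], [30, 60], [28, 65]]
# Seeds assumed to be the cluster centers found by hierarchical clustering.
seeds = [[100, 20], [95, 40], [30, 60]]
day_labels, day_centroids = kmeans_seeded(days, seeds)
```

Seeding removes the dependence on random initialization, so repeated runs on the same data return the same classification.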
In summary, the normal operating day of rail transit (excluding special holidays) can be divided into characteristic day 1 (Monday to Thursday), characteristic day 2 (Friday) and characteristic day 3 (Saturday to Sunday).
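As a categorical model input, the characteristic day type can be derived directly from the calendar date. The following sketch assumes that special holidays have already been removed from the data; the function name `characteristic_day` is introduced here for illustration only.

```python
from datetime import date


def characteristic_day(d: date) -> int:
    """Map a normal operating day to characteristic day type 1, 2 or 3.

    Holidays are not handled here and are assumed to be filtered out
    before this function is called.
    """
    weekday = d.weekday()  # Monday = 0, ..., Sunday = 6
    if weekday <= 3:
        return 1  # Monday to Thursday
    if weekday == 4:
        return 2  # Friday
    return 3  # Saturday and Sunday


# First day of the data set used in the paper (a Saturday).
first_day_type = characteristic_day(date(2014, 2, 15))
```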

2) INFLUENCE FACTORS IN OPERATION PERIOD
Within an operating day, passenger flow differs greatly at different times. For example, during the morning and evening rush hours, the overall passenger flow of the network is significantly higher than in off-peak periods because of the large number of commuters. (Due to the limited data, this article does not take holiday passenger flow into consideration.) In the preceding section, normal operating days were divided into three types of characteristic days. In theory, different characteristic days will not produce exactly the same division of operating periods. However, OD passenger flow data are time-ordered, and the hierarchical clustering method and the K-Means method do not apply to the clustering of time series data. This paper therefore uses the optimal sorting method to cluster the operating time of the data set in order. The clustering process is described below using characteristic day 1 as an example. Because the number of classes K is difficult to determine in advance, this paper calculates the classification loss function and plots its trend as the number of classes increases; the results are shown in Fig. 4.
As can be seen from Fig. 4, when K ≤ 5 the classification loss function decreases rapidly as K increases; when K ≥ 10 it decreases slowly as K increases; and at K = 8 there is a turning point at which the decline of the classification loss function slows down. Therefore, this paper sets the number of operating period classes for characteristic day 1 to 8, at which point the classification loss function value is L[p(n, k)] = 2087.637.
Characteristic days 2 and 3 are clustered in the same way. The number of classes is also set to 8 for characteristic day 2 and to 6 for characteristic day 3. The divisions of operating hours for the different characteristic days are shown in Table 2.
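The ordered clustering and its loss curve can be sketched with dynamic programming. The details of the optimal sorting method are not given in this section; the notation L[p(n, k)] suggests Fisher's optimal partition of ordered samples, which is what the sketch below assumes, with the within-group sum of squared deviations as the cost. The function names and the toy flow profile are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_cost(values: Sequence[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean for values[i:j]."""
    seg = values[i:j]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def optimal_partition(values: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Split an ordered sequence into k contiguous groups with minimum
    total within-group cost. Returns (loss, start index of each group)."""
    n = len(values)
    inf = float("inf")
    # loss[g][j]: minimum cost of splitting values[:j] into g groups.
    loss = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    loss[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = loss[g - 1][i] + segment_cost(values, i, j)
                if c < loss[g][j]:
                    loss[g][j] = c
                    cut[g][j] = i
    # Walk back through the stored cut points to recover the groups.
    starts = []
    j = n
    for g in range(k, 0, -1):
        j = cut[g][j]
        starts.append(j)
    starts.reverse()
    return loss[k][n], starts


# Toy network flow per time slot (hypothetical): off-peak, peak, shoulder.
flows = [5, 6, 5, 40, 42, 41, 10, 11]
best_loss, group_starts = optimal_partition(flows, 3)
# Loss for K = 1..5; its flattening point guides the choice of K.
loss_curve = [optimal_partition(flows, k)[0] for k in range(1, 6)]
```

Because groups must be contiguous in time, the method respects the order of operating periods, which K-Means and hierarchical clustering do not.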

3) HISTORICAL PASSENGER FACTORS DURING THE SAME PERIOD
The spatial and temporal distribution of OD passenger flow has a certain stability; that is, historical passenger flow and current-day passenger flow are correlated to some extent. In actual rail transit operation management, last week's OD passenger flow is used to assist current operation management, so the impact of passenger flow in the same historical period on the current OD passenger flow distribution is analyzed. However, because passengers' travel needs and travel times differ, OD volume shows a certain degree of volatility. Analyzing the stability of OD passenger flow makes it possible to judge the correlation between historical same-period passenger flow and current OD passenger flow.
Smart card data for two consecutive weeks of Beijing urban rail transit (2014.3.1-2014.3.14) are selected for statistical analysis, and indicators are established to analyze how the OD passenger flow distribution structure changes. In this paper, the weighted relative error is selected as the quantitative indicator for analyzing the stability of the OD passenger flow distribution. The calculation formula is shown in (1), in which the historical OD flow and the OD flow of the day, $q_{ij}$, are both indexed by OD pair $ij$; $K$ represents the analysis period; and $\gamma_{ij}(t)$ represents the weight coefficient for any OD pair.
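To make the indicator concrete, the sketch below computes one plausible form of a weighted relative error for a single analysis period. It is an illustration under stated assumptions, not a restatement of (1): the weight of each OD pair is assumed to be its share of the total historical flow, and the relative error is taken against the historical value. The function name and the toy flows are hypothetical.

```python
from typing import Dict, Tuple

ODPair = Tuple[str, str]


def weighted_relative_error(
    hist: Dict[ODPair, float],
    today: Dict[ODPair, float],
) -> float:
    """Flow-weighted mean of |today - hist| / hist over all OD pairs.

    Assumption: the weight of an OD pair is its share of total historical
    flow. OD pairs with zero historical flow are skipped.
    """
    total = sum(hist.values())
    error = 0.0
    for od, h in hist.items():
        if h == 0:
            continue
        weight = h / total
        error += weight * abs(today.get(od, 0.0) - h) / h
    return error


# Toy flows for one 15min period (hypothetical values).
last_week = {("A", "B"): 100.0, ("A", "C"): 50.0, ("B", "C"): 50.0}
this_week = {("A", "B"): 110.0, ("A", "C"): 40.0, ("B", "C"): 50.0}
wre = weighted_relative_error(last_week, this_week)
```

With these weights, large OD pairs dominate the indicator, so a few sparsely used pairs with high relative error do not distort the picture of overall stability.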
The analysis results are shown in Fig. 5. As Fig. 5 (a-e) indicates, the stability of OD passenger flow follows a similar trend on all weekdays: the weighted relative error is between 15% and 30% in the morning and evening peaks and between 40% and 68% in off-peak periods, so the OD distributions of working days are strongly similar. As can be seen from Fig. 5 (f) and Fig. 5 (g), the distribution of OD passenger flow at weekends shows no obvious regularity, and the weighted relative error is large throughout the day.
Moreover, the daily statistics show that the weighted relative error decreases as the statistical time granularity increases. With 60min as the statistical time granularity, the weighted relative error of weekday OD passenger flow can be reduced to about 18% in the morning and evening peaks and to about 49% in off-peak periods.
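Changing the statistical time granularity amounts to summing consecutive 15min counts. A minimal sketch, with a hypothetical function name and toy counts, is given below; it only shows the regrouping and does not reproduce the error figures reported above.

```python
from typing import List, Sequence


def aggregate(counts: Sequence[float], factor: int) -> List[float]:
    """Sum each run of `factor` consecutive intervals into one coarser
    interval. An incomplete trailing run is dropped."""
    usable = len(counts) - len(counts) % factor
    return [sum(counts[i : i + factor]) for i in range(0, usable, factor)]


# Toy 15min counts for one OD pair over two hours (hypothetical values).
q15 = [12, 15, 9, 14, 20, 18, 22, 16]
q30 = aggregate(q15, 2)  # 30min granularity
q60 = aggregate(q15, 4)  # 60min granularity
```

Summing over a longer interval averages out part of the random fluctuation of individual 15min counts, which is consistent with the lower weighted relative error observed at coarser granularity.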

4) PASSENGER FLOW FACTORS FOR A NUMBER OF TIME PERIODS BEFORE THE FORECAST POINT
The OD passenger flow at the current moment is affected not only by the historical OD passenger flow of the same period but also by the OD passenger flow of the immediately preceding periods. In order to make full use of the historical OD passenger flow distribution data, the OD passenger flow distribution in the four time periods before the prediction point is also taken as an influencing factor. Taking the 15min time granularity as an example, i.e. 15min is one time period, the effect of the OD passenger flow 15min, 30min, 45min and 60min before the forecast point on the OD passenger flow at the forecast point is studied, and scatter plots are drawn. The horizontal coordinate is the traffic value at the forecast point, and the ordinate is the traffic value in one of the periods before the forecast point. Taking the all-day OD data of the Beijing metro network as an example, the results are shown in Fig. 6.
In this paper, the Pearson correlation coefficient is used to examine the relationship between the actual data in the four preceding periods and the data at the forecast point. The Pearson correlation coefficient is calculated as shown in (3); in its standard form, for $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$,
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{3} $$
The value range of r is [−1, 1]; if r < 0, the two variables are negatively correlated; if r > 0, they are positively correlated; and the larger |r| is, the stronger the correlation between the two variables. The correlation coefficients between the forecast point and the four preceding periods of OD passenger flow distribution are shown in Table 3. The Pearson correlation coefficients are significant at the 99% confidence level.
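The lagged correlation analysis can be sketched for a single OD pair as follows. The code uses the standard definition of the Pearson coefficient; the function names and the toy series are hypothetical, and the paper's own analysis covers all OD pairs of the network rather than one series.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples.

    Raises ZeroDivisionError if either sample has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lag_correlation(series: Sequence[float], lag: int) -> float:
    """Correlation between the flow at each forecast point and the flow
    `lag` periods earlier (lag must be at least 1)."""
    return pearson(series[lag:], series[:-lag])


# Toy 15min flow of one OD pair (hypothetical values).
flow = [8, 12, 20, 35, 50, 46, 30, 22, 15, 12, 10, 9]
# Lags 1..4 correspond to 15min, 30min, 45min and 60min before the point.
lag_r = {lag: lag_correlation(flow, lag) for lag in range(1, 5)}
```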
It can be seen that there is a clear correlation between the forecast point data and the previous period data, and the closer the period of time to the forecast point, the higher the correlation between the actual passenger flow and the prediction.  Table 4.
The average all-day OD passenger flow data from February 15 to February 28, 2014 are selected, and the OD pairs are analyzed statistically by the line type of the origin station and by OD volume. The average OD volume for the different line types and the proportions of the different OD types are shown in Fig. 8. It can be seen that OD passenger flow whose origin station is on an urban line accounts for 91% of the total volume, whereas the OD volume between urban and suburban lines, and that between suburban lines, is significantly smaller.

6) THE TYPE FACTORS OF THE ORIGIN AND DESTINATION STATIONS
In addition to the type of line, the different passenger flow characteristics of rail transit stations and the nature of the surrounding land use also lead to different trends in passenger flow into and out of the station [44]. Therefore, stations are classified according to passenger flow characteristics and land use, and the impact of different origin station types on the OD passenger flow distribution is analyzed.
This paper takes the passenger flow into and out of stations at 15min granularity on Beijing working days from February 15 to March 14, 2014 as the data source, and obtains the average inbound and outbound passenger flow of each station over the 20 working days. The passenger time series are standardized by Z-score on the basis of passenger flow characteristics and then clustered with K-means. According to the time distribution of inbound and outbound passenger flow over the day, the 233 stations can be summarized into the following 7 types: residential station, workplace station, residential-work mixed station, mixed-bias residential station, mixed-bias work station, shopping and sightseeing station, and external hub-type station [45]. The time curves of inbound and outbound passenger flow for the various station types are shown in Fig. 9.
The classification results are shown in Fig. 10. It can be seen that land use types around stations differ significantly: the periphery is mainly residential, and the center is mainly employment-oriented.
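The Z-score step can be sketched as below. Standardizing each station's daily profile puts large and small stations on the same scale, so that the subsequent clustering groups stations by the shape of their flow curves rather than by their size. The function name is hypothetical, and the use of the population standard deviation is an assumption made here.

```python
from math import sqrt
from typing import List, Sequence


def z_score(profile: Sequence[float]) -> List[float]:
    """Standardize a profile to zero mean and unit variance.

    Uses the population standard deviation (an assumption); a constant
    profile is mapped to all zeros.
    """
    n = len(profile)
    mean = sum(profile) / n
    std = sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in profile]


# Two toy inbound profiles (hypothetical) with the same shape but a
# tenfold difference in volume.
small_station = [2, 4, 4, 4, 5, 5, 7, 9]
large_station = [20, 40, 40, 40, 50, 50, 70, 90]
z_small = z_score(small_station)
z_large = z_score(large_station)
```

After standardization the two toy profiles coincide, so a distance-based clustering would place both stations in the same type.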

7) INFLUENCE FACTORS OF THE INBOUND VOLUME OF THE ORIGIN STATION AND OUTBOUND VOLUME OF DESTINATION STATION
The inbound and outbound volumes of stations can effectively reflect the passenger flow in the rail transit network. When network passenger flow increases, OD volume also increases in theory, and vice versa. The inbound volume of a station in a given period is the sum of all OD volumes in that period with this station as the origin O, and the outbound volume is defined likewise with the station as the destination D. Thus, a single OD volume and the inbound and outbound volumes do not mean the same thing. In order to simplify the calculation and avoid the impact of extreme ODs on the correlation, we randomly select 100 ODs from the top 50% of ODs by passenger flow and use the Z-score standardization method to process the data.
It can be seen that, during the morning peak from Monday to Friday, the correlation between OD volume and the outbound volume of the destination station is high, whereas the correlation with the outbound volume at the origin station is relatively low; during the evening peak from Monday to Friday, the correlations between OD volume and both the inbound volume of the origin station and the outbound volume of the destination station are high; at weekends, the correlation is relatively low.
The average full-day OD data from February 15 to February 28, 2014 are selected to analyze the correlation between OD volume and the attraction of the origin and destination stations, as shown in Table 5. All tests satisfy sig < 0.01 and thus pass the significance test. There is a certain correlation between OD volume and the attraction of the origin and destination stations, and the attraction of the destination station has a relatively large influence on OD volume.
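The relationship between the OD matrix and the station volumes can be shown with row and column sums. The sketch below uses a hypothetical three-station matrix for one period and assumes every trip is attributed to that single period, which ignores the lag between entry and exit discussed in Section II.

```python
from typing import List, Sequence, Tuple


def inbound_outbound(
    od: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Station volumes implied by an OD matrix for one period, where
    od[i][j] is the number of passengers from station i to station j."""
    inbound = [sum(row) for row in od]         # station i as origin
    outbound = [sum(col) for col in zip(*od)]  # station j as destination
    return inbound, outbound


# Toy OD matrix for three stations (hypothetical values).
od_matrix = [
    [0, 30, 10],
    [20, 0, 5],
    [15, 25, 0],
]
inbound, outbound = inbound_outbound(od_matrix)
```

Each station volume aggregates many OD pairs, which is why a single OD volume and a station volume carry different information even though they are correlated.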

8) INFLUENCE FACTORS OF AVERAGE TRAVEL TIME BETWEEN ODS
In addition, passenger travel can usually be divided into travel with an explicit destination (e.g. work) and travel with a vague destination (e.g. shopping) [46]. For destination-defined travel, people tend to choose residences with relatively short commute times, resulting in more passenger travel between OD pairs with relatively short travel times. When the destination is vague, passengers often choose a relatively close shopping center, which also results in more passengers on OD pairs with relatively short travel times. Considering both situations, there is a negative correlation between OD volume and travel time.
The passenger travel time of an OD pair is measured from the card swipe at the inbound station to the card swipe at the outbound station. By counting and analyzing the relevant smart card data, the travel time distribution of any OD pair can be obtained, and the average travel time of each OD pair is used to represent the travel time of all its passengers. The number of intervals, transfer time and average travel time are calculated and collected for a total of 54,056 ODs. Because the number of intervals of an OD pair is strongly correlated with its average travel time, only the average travel time between ODs is used as the influencing factor for analysis. The average travel time of all ODs for one continuous week is shown in Fig. 12.
The travel time of passengers taking the Beijing subway is mainly distributed between 20min and 45min, accounting for 41.37% of the total sample. Trips longer than 60min are fewer, only 15.24%, and very few passengers travel for more than 120min. A 45min commute within the subway system means that a one-way commute from home to work takes about 60min. After seven years of observation by Huang et al. [47], a 45min subway commute can be considered the longest commute that Beijing residents can tolerate: if the subway commute is less than 45min, residents tend to extend the commute to get better employment opportunities or better living conditions; if the subway commute is more than 45min, residents prefer to move in order to shorten it.
In order to study the relationship between OD volume and travel time, the average OD volume of each travel time band at 3min intervals is calculated. According to the statistical results of the three characteristic days, there is a clear correlation between OD volume and the average travel time between ODs. Over most of the range, the OD volume decreases as the average OD travel time increases.
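The 3min binning described above can be sketched as follows, assuming each OD pair is given as a pair of its average travel time and its volume. The records are hypothetical toy values and the function name is an assumption.

```python
from collections import defaultdict

def mean_volume_by_travel_time(od_records, bin_minutes=3):
    """od_records: iterable of (avg_travel_time_min, od_volume), one per OD pair.
    Returns {bin_start_minute: mean OD volume of the pairs falling in that bin}."""
    buckets = defaultdict(list)
    for travel_time, volume in od_records:
        bin_start = int(travel_time // bin_minutes) * bin_minutes
        buckets[bin_start].append(volume)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

# Hypothetical OD pairs: (average travel time in minutes, OD volume).
records = [(7.5, 40), (8.9, 60), (22.0, 30), (23.5, 10), (61.0, 4)]
profile = mean_volume_by_travel_time(records)
# profile == {6: 50.0, 21: 20.0, 60: 4.0}
```

Plotting `profile` against the bin start gives the kind of volume versus travel time curve that the paragraph describes.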

9) INFLUENCE FACTORS OF WEATHER INFORMATION
In addition to smart card data, some external factors can have a certain impact on OD passenger flow distribution.
Environmental factors such as weather conditions, average temperature and air quality can have a certain impact on people's travel. Rainy weather, for example, can increase the number of passengers taking the subway, which lengthens the morning and evening rush hours when the rain falls in those periods. Air quality can also affect some OD travel; for instance, when air quality is good, people are more inclined to take part in outdoor activities [40]. Therefore, this paper analyzes the relationship between OD volume and weather conditions and air quality.
In daily travel, residents are most sensitive to three air quality indicators: the quality rating, the AQI index and the PM2.5 index, and these three sets of data are strongly correlated. For the convenience of the subsequent feature construction, this paper selects the AQI index to describe air quality. The highest temperature, the lowest temperature, the weather conditions, the AQI index and the wind are selected for analysis. The time series of the weather data is shown in Fig. 14.
It can be seen that the correlation between the highest temperature and the lowest temperature is very strong, so the daily average temperature is used instead. The correlation between the AQI index and the wind is also strong: when the wind is above level 3, the AQI index drops and the air quality improves, which is in line with daily experience. The wind factor is therefore taken into account through the AQI index, and of the two only the AQI index is selected. In summary, this paper selects three sets of data, namely average temperature, weather conditions and AQI index, for the following analysis of the impact of the external environment on OD passenger flow.
The Pearson correlation is calculated between the processed total OD passenger flow and the processed weather data from February 15 to April 4, and the resulting coefficients are listed in Table 6.
The Pearson coefficient between each influence factor and OD volume is significant at the 95% confidence level, indicating that there is a certain correlation between OD volume and weather factors. However, because the correlation coefficients are low, ODs and dates need to be divided further to analyze on which characteristic days the weather factors mainly affect which types of OD.
Generally speaking, weather factors have a greater impact on the travel of leisure groups such as shopping and sightseeing groups, and less on the travel of commuters. Therefore, the travel purposes of ODs are divided according to the type of the destination station. The station types above are regrouped into commuting, shopping-sightseeing and business as new types 1, 2 and 3.
All data are standardized for comparison, and weather conditions are represented numerically: the smaller the weather value, the better the weather conditions. The relationship between OD flow and weather factors is analyzed by characteristic day and by OD travel purpose, and the results are shown in Fig. 15.
It can be seen that the commuting and business groups are basically not affected by external environmental factors on any of the three characteristic days, because commuter or out-of-office travel is rigid demand and is less affected by external factors. On characteristic day 2, the effect of temperature on the shopping-sightseeing group is not obvious, but the AQI and weather conditions have some effect on it, because leisure activities such as shopping and dinner are generally preferred after work on Fridays. External environmental factors can therefore be considered to have a certain impact on shopping-sightseeing ODs, and the influence of the AQI is the greatest among the three attributes, which is consistent with the results of the Pearson correlation analysis. Therefore, the AQI is chosen as a feature of shopping-sightseeing passenger flow, and temperature and weather conditions are chosen as features of shopping-sightseeing passenger flow on characteristic days 2 and 3. Because the weather data are collected twice a day (8:00-18:00 and 18:00-8:00), which differs from the forecast granularity, the weather data need to be resampled to the forecast granularity.

... ridership and choose other stations at a similar distance, which explains why the correlation between weekend inbound volume and OD volume noted above is not high. Therefore, it is necessary to correct for the passenger flow caused by this behavior with data from other sources. Mobile phone data can depict the density of people outside a station and can be used to express the attraction of the station. The effect of these choices on OD passenger flow can be characterized by analyzing the density of people around the destination.
The mobile phone data are divided into 15min slots, and the number of users within 500m of each station in each slot is counted to obtain the attraction value, which reflects the attractiveness of each subway station at different times, as shown in Fig. 16.
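The counting step can be sketched as follows, assuming the mobile phone records are available as (user, minute of day, latitude, longitude) tuples and that distance is measured as great-circle distance. The record layout, the station coordinates and the function names are hypothetical; the text does not specify them.

```python
from math import radians, sin, cos, asin, sqrt
from collections import Counter

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def attraction(stations, pings, radius_m=500, slot_min=15):
    """stations: {name: (lat, lon)}; pings: (user_id, minute_of_day, lat, lon).
    Returns Counter {(station, slot_index): number of distinct users within radius}."""
    seen = set()
    for user, minute, lat, lon in pings:
        slot = minute // slot_min
        for name, (slat, slon) in stations.items():
            if haversine_m(lat, lon, slat, slon) <= radius_m:
                seen.add((name, slot, user))  # a user counts once per station and slot
    return Counter((name, slot) for name, slot, _ in seen)

stations = {"A": (39.9000, 116.4000)}  # hypothetical station location
pings = [
    ("u1", 480, 39.9010, 116.4000),  # about 111 m away, slot 32 (8:00-8:15)
    ("u1", 485, 39.9011, 116.4000),  # same user, same slot: not counted twice
    ("u2", 481, 39.9100, 116.4000),  # about 1.1 km away: outside the radius
    ("u3", 495, 39.9000, 116.4030),  # about 256 m away, slot 33 (8:15-8:30)
]
heat = attraction(stations, pings)
```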
According to the calculation, the "Heat" level of each station is determined. Taking 8:00-8:15 as an example, the station attraction heat level based on mobile phone data is shown in Fig. 17.
Due to data limitations, only mobile phone data for September 2, 2010 are available, with no matching passenger flow data, so the OD data of September 2, 2013 are analyzed together with these mobile phone data. The relationship between the attractiveness level of the station and the OD correlation is shown in Fig. 18.
Due to data limitations, the correlation did not meet expectations. However, in terms of the spatial distribution of people, the changes in the urban distribution were relatively small within three years [48]. Considering that mobile phone data can serve as a supplement to smart card data and help avoid neglecting the specific characteristics of passengers, this paper adds the mobile phone data to the set of predictor variables.

C. DATA STANDARDIZATION
The data collected above for short-term OD passenger flow prediction have different magnitudes and units, which would affect the performance of the short-term OD prediction model described later. To avoid the impact of the different dimensions, the collected data need to be standardized so that data of different magnitudes become comparable. In addition, standardizing the model input data can improve the accuracy of the short-term OD prediction model and speed up its convergence.
In this paper, the z-score standardization method is applied to process the data. The calculation formula is:

$$x^{*} = \frac{x - \mu}{\sigma}$$

where $x^{*}$ represents the standardized data, $x$ represents the raw data, $\mu$ represents the mean of all sample data, and $\sigma$ represents the standard deviation of all sample data.

D. INPUT FEATURE SORTING
The values of the influencing factors are divided into numeric, categorical, binary and other types, and different types of variables are processed differently. This paper standardizes all numerical variables with Z-score and codes all categorical features with One-hot. All data are then organized at 15min, 30min and 60min time granularity and matched with the passenger data. Variable selection is performed using recursive feature elimination, with logistic regression as the base model. First, different variable combinations are generated and evaluated. Then multiple rounds of training are conducted with the base model. After each round of training, the features with the smallest weight coefficients are eliminated, and the next round of training is conducted on the new variable subset until the final result is selected. The final selection results of the predictive features are shown in Table 7, and 31 features are finally obtained. The spatial information contains the origin station, the passing stations and the destination station of each OD. Since each OD is different, its input should contain the spatial information unique to it. Therefore, features whose values differ between ODs are classified as space-dependent features, and those with the same value for all ODs are classified as time-dependent features.
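Two of the preprocessing steps above, the aggregation of 15min counts to coarser granularity and the One-hot coding of a categorical feature, can be sketched as follows. The weather category names and the sample counts are hypothetical, and the recursive feature elimination step is not shown.

```python
def aggregate(series_15min, factor):
    """Sum consecutive 15min counts: factor=2 gives 30min, factor=4 gives 60min."""
    return [sum(series_15min[i:i + factor])
            for i in range(0, len(series_15min), factor)]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector with one position per category."""
    return [1 if value == c else 0 for c in categories]

WEATHER = ["sunny", "cloudy", "rain"]   # hypothetical category list

flows_15 = [3, 5, 2, 6, 4, 1, 0, 7]     # two hours of 15min OD counts (toy data)
flows_30 = aggregate(flows_15, 2)       # [8, 8, 5, 7]
flows_60 = aggregate(flows_15, 4)       # [16, 12]
rain_code = one_hot("rain", WEATHER)    # [0, 0, 1]
```

Counts are summed rather than averaged when moving to a coarser granularity, so that the 30min and 60min series still represent passenger volumes.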

IV. MODEL
The OD passenger flow distribution of a rail transit system exhibits temporal smoothness, periodicity and trend, so the short-term OD passenger flow forecast needs to make effective use of the OD passenger flow distribution in the preceding periods. LSTM is considered to perform well on time series prediction problems such as OD passenger flow prediction. However, in view of the spatio-temporal attributes of the input features, it is necessary to adjust the structure of the LSTM network to exploit the hidden spatio-temporal information. Therefore, based on the standard LSTM network, this paper constructs the STLSTM model by improving the hidden layer structure and the neuron structure of the standard LSTM network.

A. RECURRENT NEURAL NETWORK
LSTM is a kind of recurrent neural network (RNN) that is specially designed to solve the long-term dependence problem of general RNNs. In other words, the LSTM network is essentially an RNN using LSTM units. A basic RNN model is shown in Fig. 19(a). Compared with other neural network models, the nodes of the RNN hidden layer are connected to each other, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. These self-connected hidden layers that span time steps enable the RNN to model time explicitly. As can be seen from Fig. 19(b), the chain structure reveals that the RNN is essentially related to sequences, and its output is affected by the previous inputs. The RNN is therefore the most natural neural network architecture for OD passenger flow distribution time series data.
However, because backpropagation through the chain of time steps applies the chain rule as a repeated multiplication, the error accumulates during the update, which easily leads to exploding or vanishing gradients [49].
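The effect of this repeated multiplication can be illustrated with a scalar recurrence. The sketch below is only an illustration under simplifying assumptions (a single hidden unit, a constant input and a fixed weight); the function name and all values are hypothetical.

```python
from math import tanh

def gradient_through_time(w, steps, h0=0.5, x=0.1, linear=False):
    """Derivative of h_T with respect to h_0 for h_t = f(w * h_{t-1} + x),
    with f = tanh by default or the identity if linear=True."""
    h, grad = h0, 1.0
    for _ in range(steps):
        pre = w * h + x
        h = pre if linear else tanh(pre)
        # Chain rule: one multiplicative factor per time step.
        grad *= w if linear else w * (1.0 - h * h)
    return grad

vanishing = gradient_through_time(0.5, 50)                # each factor is at most 0.5
exploding = gradient_through_time(1.5, 50, linear=True)   # equals 1.5 ** 50
```

With factors below 1 the product shrinks towards zero over 50 steps, and with factors above 1 it grows without bound, which is the behavior the gated structure of LSTM is designed to mitigate.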

B. NEURON STRUCTURE IMPROVEMENT
To overcome the aforementioned disadvantages of traditional RNNs and fully exploit the spatio-temporal attributes of multisource data, STLSTM is proposed in this paper to predict OD passenger flow. These features are especially desirable for passenger prediction in the transportation domain. An STLSTM is composed of an input layer, a hidden layer and an output layer. STLSTM improves on the RNN by adding a time state $C_{time}$ and a space state $C_{space}$ to save the long-term state. Three control switches are used to control $C_{time}$ and $C_{space}$, and two channels are constructed for independent extraction of the temporal and spatial characteristics of multisource data. The control methods of $C_{time}$ and $C_{space}$ are consistent, so this paper takes $C_{time}$ as an example.
The switches of the STLSTM network are implemented through gates. A gate is a fully connected layer whose input is a vector and whose output is a real vector with entries between 0 and 1. A gate is applied by multiplying its output vector element-wise with the vector to be controlled. Two gates are used in STLSTM to control $C_{time}$. One is the forget gate, which determines how much of the unit state of the previous moment, $C_{time\_t-1}$, is retained in the current state $C_{time\_t}$; the other is the input gate, which determines how much of the network input $X_{time\_t}$ is saved to $C_{time\_t}$ at the current time. The output gate is used to control how much of $C_{time\_t}$ is output to the current output value $H_{time\_t}$.
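The gate mechanism described above, a fully connected layer with sigmoid output followed by an element-wise product, can be sketched as follows. The weights, the inputs and the function names are hypothetical values chosen for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def gate(weights, bias, inputs):
    """Fully connected layer with sigmoid activation:
    one value in (0, 1) per entry of the state to be controlled."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]

def apply_gate(gate_values, state):
    """Element-wise product: values near 0 block an entry, values near 1 pass it."""
    return [g * s for g, s in zip(gate_values, state)]

weights = [[10.0, 0.0], [-10.0, 0.0]]   # hypothetical 2x2 weight matrix
bias = [0.0, 0.0]
g = gate(weights, bias, [1.0, 0.3])     # first entry near 1, second near 0
controlled = apply_gate(g, [2.0, 2.0])  # roughly [2.0, 0.0]
```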

1) FORGET GATE F time
The forget gate identifies the information that should be forgotten by the neuron. $F_{time\_t}$ reads $H_{time\_t-1}$ and $X_{time\_t}$ and outputs a value between 0 and 1 for each entry of $C_{time\_t-1}$, where 0 means 'completely discarded' and 1 means 'completely retained'. Its calculation is shown in (5).
From (5) to (10), $W_f$, $W_i$, $W_o$ represent the corresponding weights, $b_f$, $b_i$, $b_o$ represent the corresponding biases, and sigmoid and ReLU represent the activation functions.

2) INPUT GATE I time
$I_{time}$ determines which new information is saved into $C_{time}$. $I_{time}$ consists of two modules: one uses a sigmoid layer to determine which values are to be updated, and the other uses a ReLU layer to produce a new candidate vector $\tilde{C}_{time\_t}$, which is then added to the new $C_{time\_t}$. The two parts of $I_{time}$ update the state from $C_{time\_t-1}$ to $C_{time\_t}$: the product $F_{time\_t} \cdot C_{time\_t-1}$ removes the information that needs to be discarded from the old state, and $I_{time\_t} \cdot \tilde{C}_{time\_t}$ is then added to it.

3) OUTPUT GATE O time
$O_{time}$ determines the value to be output. The output is based on $C_{time\_t}$ and is produced in three steps: 1) the sigmoid function determines which part of $C_{time\_t}$ needs to be output; 2) $C_{time\_t}$ is processed by the ReLU layer (in the standard LSTM this step uses tanh, which keeps the output value between -1 and 1); 3) the result is multiplied by the output of the sigmoid gate to obtain the final output. The improved neuronal structure is shown in Fig. 20. The improved neurons always maintain two channels to model the OD passenger flow distribution time series: the time-dependent feature channel (blue line channel in Fig. 20) and the space-dependent feature channel (green line channel in Fig. 20).
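The equations referred to as (5) to (10) are not reproduced in this text. A plausible reading of the time channel, assuming the standard LSTM form with ReLU in place of tanh and with the previous output and the current input concatenated, is sketched below. The candidate-state parameters $W_c$ and $b_c$ and the symbol $\tilde{C}_{time\_t}$ are assumptions introduced here for illustration; only $W_f$, $W_i$, $W_o$, $b_f$, $b_i$, $b_o$ are named in the text.

```latex
\begin{align*}
F_{time\_t} &= \mathrm{sigmoid}\left(W_f \cdot [H_{time\_t-1}, X_{time\_t}] + b_f\right) \\
I_{time\_t} &= \mathrm{sigmoid}\left(W_i \cdot [H_{time\_t-1}, X_{time\_t}] + b_i\right) \\
\tilde{C}_{time\_t} &= \mathrm{ReLU}\left(W_c \cdot [H_{time\_t-1}, X_{time\_t}] + b_c\right) \\
C_{time\_t} &= F_{time\_t} \odot C_{time\_t-1} + I_{time\_t} \odot \tilde{C}_{time\_t} \\
O_{time\_t} &= \mathrm{sigmoid}\left(W_o \cdot [H_{time\_t-1}, X_{time\_t}] + b_o\right) \\
H_{time\_t} &= O_{time\_t} \odot \mathrm{ReLU}\left(C_{time\_t}\right)
\end{align*}
```

Here $\odot$ denotes the element-wise product. The order follows the description in the text: forget gate, input gate with its candidate vector, state update, output gate and output.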
Similarly, the control of $C_{space}$ is as follows: $W_{*}$, $b_{*}$ are learnable parameters, and $l$ represents the layer index, where $X^{l}_{time\_t} = H^{l-1}_{time\_t} \in \mathbb{R}^{d \times N}$ and $X^{l}_{space\_t} = H^{l-1}_{space\_t} \in \mathbb{R}^{d \times N}$; at $l = 1$, $S$ is entered into the model as a priori knowledge, and its initialization can be based on the distance between ODs, whether the OD pairs contain the same area, and other information. The input gate, the output gate and the forget gate still use sigmoid as the activation function.

C. FUSION SPACE DEPENDENCE AND TIME DEPENDENCE
The fusion module combines the outputs of the time channel and the space channel of the improved LSTM neurons; its structure is shown in Fig. 21.
As shown in (12), the features of the time dimension and the space dimension are first weighted by learnable parameters and summed. The fused features are then flattened. Finally, the space-time features are further abstracted with a fully connected layer, as shown in (13):

$$x_{seq\_t} = \mathrm{relu}\left(W_{seq} \cdot x_{fusion\_t} + b_{seq}\right) \qquad (13)$$

where $x_{fusion\_t} \in \mathbb{R}^{Nd}$ is obtained by flattening $X_{fusion\_t} \in \mathbb{R}^{d \times N}$, and the two weighting parameters of (12), each in $\mathbb{R}^{d \times 1}$, together with $W_{seq} \in \mathbb{R}^{u \times Nd}$ and $b_{seq} \in \mathbb{R}^{u}$, are learnable parameters. The module uses ReLU as the activation function and outputs $x_{seq\_t}$. In each time slice, $x_{seq\_t}$, which combines time and space features, is passed through another fully connected layer. Finally, through the tanh activation, the output value is mapped to [−1, 1] to get the final prediction $\hat{x}_{t+k}$. The calculation formula is:

$$\hat{x}_{t+k} = \tanh\left(W \cdot x_{seq\_t} + b\right)$$

where $W \in \mathbb{R}^{N \times u}$ and $b \in \mathbb{R}^{N}$ are learnable parameters.
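The fusion steps can be sketched as follows. The exact form of (12) is not recoverable from this text, so the sketch assumes an element-wise weighted sum in which each of the $d$ feature rows has one weight per channel; the names `phi1`, `phi2`, `W_out`, `b_out` and all numeric values are hypothetical.

```python
from math import tanh

def relu(v):
    return [max(0.0, x) for x in v]

def dense(W, b, x):
    """Fully connected layer without activation: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def fuse(H_time, H_space, phi1, phi2, W_seq, b_seq, W_out, b_out):
    """H_time, H_space: d x N matrices as lists of rows; phi1, phi2: length-d weights."""
    d, N = len(H_time), len(H_time[0])
    # Assumed form of (12): row-wise weighted sum of the two channels.
    X_fusion = [[phi1[i] * H_time[i][j] + phi2[i] * H_space[i][j] for j in range(N)]
                for i in range(d)]
    x_fusion = [v for row in X_fusion for v in row]         # flatten to length d*N
    x_seq = relu(dense(W_seq, b_seq, x_fusion))             # (13), length u
    return [tanh(v) for v in dense(W_out, b_out, x_seq)]    # length N, within (-1, 1)

# Toy sizes: d = 2 features, N = 2 ODs, u = 3 hidden units.
H_time = [[1.0, 2.0], [3.0, 4.0]]
H_space = [[0.5, 0.5], [1.0, 1.0]]
W_seq = [[0.1] * 4, [-0.1] * 4, [0.2, 0.0, 0.0, 0.0]]
W_out = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]
pred = fuse(H_time, H_space, [0.5, 0.5], [1.0, 1.0],
            W_seq, [0.0] * 3, W_out, [0.0] * 2)
```

The output has one value per OD, matching $W \in \mathbb{R}^{N \times u}$ in the text, and the tanh keeps every prediction in the standardized range.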

D. MODEL FRAMEWORK BASED ON IMPROVING SPACE-TIME LSTM NETWORK UNDER MULTISOURCE DATA
Combining the improved neuron structure with the spatio-temporal dependence fusion method, this paper proposes the STLSTM model, which explicitly models the spatio-temporal correlation of the demand sequence so as to predict the OD passenger flow distribution more accurately. The STLSTM model is shown in Fig. 22. Compared to the traditional model, the new model redesigns the hidden layer structure: the two-channel STLSTM neuron layer is followed by the fusion layer and the fully connected layer.

V. CASE STUDY

A. CASE DESCRIPTIONS
In this paper, the Beijing rail transit network is selected as the case to validate the proposed model. The network contains 233 stations and 54,056 ODs. To meet the data demand of short-term OD passenger flow forecasting on the rail transit network, this paper collects the original smart card records of the whole network for 7 consecutive weeks from the Beijing rail transit system. Considering that February 14, 2014 is the Lantern Festival and April 5, 2014 is the Qingming Festival, the study period is set to February 15, 2014 to April 4, 2014, thus avoiding the impact of holidays on OD passenger flow in the rail network. The corresponding weather data and mobile signaling data were collected to match the OD demand data. All the data are aggregated at 15min, 30min and 60min time granularity. Among them, the data for six consecutive weeks from February 15, 2014 to March 28, 2014 are used as the training data set for the analysis of the characteristics of the OD passenger flow distribution of the network and the

B. MODEL PARAMETER SETTING
The network connection parameters, the number of layers, the number of neurons per layer, etc. of the LSTM-based prediction model are not learned from data but need to be set in advance. These parameters are called hyper-parameters. Next, some of the hyper-parameters of the model are tested, and the optimal values are selected.
To analyze the effects of the number of neurons and the number of epochs on prediction accuracy, this paper sets the number of neurons to N = {50, 100, 150}, the number of epochs to 30, 50, 70 and 90, and the loss function to MAE or MAPE. Each combination of neuron number, epoch and loss function is trained, and the prediction results are compared to select the optimal parameters. Among all combinations, the minimum values of MAE and MAPE, 1.31161 and 0.42287 respectively, are obtained when the number of neurons is 100 and the number of epochs is 70. Comparing the loss functions, the prediction effect is better when the loss function is MAPE. In the other experiments in this paper, the number of neurons is therefore set to 100 and the loss function to MAPE. To avoid overfitting, dropout [50] is applied. To test the effect of the dropout setting on prediction errors, this paper builds prediction models with dropout values of 0, 0.1, 0.3 and 0.5, with epochs ranging from 30 to 90. From Fig. 23 and Fig. 24, it can be concluded that the prediction model with dropout yields lower MAE and MAPE values than the model without dropout under different epoch settings. This is because, in the training phase, each hidden layer neuron is kept active only with a set probability at each iteration when the weight matrix is adjusted with the input data. Updating the weight matrix in this way no longer depends on the joint action of hidden layer neurons with complex co-adaptation relationships. It enhances the learning ability of the short-term OD prediction model, avoids the problem that some features of the OD passenger flow distribution time series only take effect under certain circumstances, and thus improves the generalization ability of the short-term OD prediction model.
When the dropout value is set to 0.1, 0.3 or 0.5, the differences in prediction error are relatively small, but overall the prediction model performs slightly better with 70 epochs and a dropout value of 0.5. Therefore, the dropout value is set to 0.5 in the model.
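The hyper-parameter comparison above amounts to an exhaustive search over a small grid, which can be sketched as follows. The sketch searches all four settings jointly, whereas the text tests dropout in a second step; `grid_search` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is a stand-in scorer whose minimum is placed at the reported choice only so that the example has a known answer. A real run would train and validate the model for each configuration.

```python
from itertools import product

def grid_search(evaluate, neurons=(50, 100, 150), epochs=(30, 50, 70, 90),
                losses=("mae", "mape"), dropouts=(0.0, 0.1, 0.3, 0.5)):
    """evaluate(config) -> validation error (lower is better), supplied by the caller."""
    best_cfg, best_err = None, float("inf")
    for n, e, loss, p in product(neurons, epochs, losses, dropouts):
        cfg = {"neurons": n, "epochs": e, "loss": loss, "dropout": p}
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

def toy_evaluate(cfg):
    """Stand-in for training and validation; not a real error measurement."""
    return (abs(cfg["neurons"] - 100) / 100 + abs(cfg["epochs"] - 70) / 70
            + abs(cfg["dropout"] - 0.5) + (0.0 if cfg["loss"] == "mape" else 0.1))

best_cfg, best_err = grid_search(toy_evaluate)
```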

1) ANALYSIS OF THE OVERALL PREDICTION PERFORMANCE OF DIFFERENT CHARACTERISTIC DAYS
Predictions at 15min, 30min and 60min time granularity are obtained by constructing a short-term passenger flow forecast model based on STLSTM under multisource data and predicting the OD passenger flow distribution of the full week from March 29, 2014 to April 4, 2014. Taking the period from 8:00 to 8:15 a.m. as an example, the predicted OD passenger flow distributions of the different characteristic days are shown in Fig. 25 (b-d). In the figure, the black dots represent subway stations and the colors of the lines between ODs represent OD passenger flows of different sizes: the greater the passenger flow, the closer the color is to red; the smaller the passenger flow, the closer the color is to blue. ODs with a passenger flow of 0 are not shown.
It can be seen that the passenger flow distributions at 8:00-8:15 in the morning of characteristic days 1 and 2 are very similar, but they differ greatly from that of day 3, whose passenger flow is significantly smaller than on the other two days. Based on the forecast results, the prediction accuracy for the days with different passenger flow characteristics at different time granularities is shown in Fig. 25 (a). The overall prediction accuracy shows that the MAPE values at the 15min interval are all below 45%, those at the 30min interval are below 40%, and those at the 60min interval are below 35%; on the ordinary working day (characteristic day 1), the MAPE at the 60min interval reaches 22.20%. In addition, the MAE at each time granularity is basically below 1.5, which is a relatively good result given the passenger flow level of the Beijing subway network. It can be seen that the short-term passenger flow prediction model based on STLSTM under multisource data has a good overall prediction effect and can be used as a reference when the operator of a rail transit system formulates the transportation organization strategy.

2) ANALYSIS OF PREDICTION EFFECTS IN DIFFERENT TIME PERIODS
The above analysis shows that the overall prediction accuracy of the STLSTM under multisource data is high. A more specific analysis, which compares the prediction results of the different characteristic days in different time periods, is shown in Fig. 26. As can be seen from Fig. 26: (1) On the whole, the error of characteristic day 1 is basically smaller than that of the other two characteristic days, indicating that the passenger flow regularity of characteristic day 1 is relatively strong. (2) The prediction effect of characteristic day 2 is similar to that of characteristic day 1 throughout the morning. However, from about 12:00 noon, the OD passenger flow prediction error of characteristic day 2 is significantly larger than that of characteristic day 1, because it is affected by weekend travel. (3) The MAE and MAPE values of characteristic day 3 are also relatively small. Although the stability of its passenger flow distribution is not high, its passenger flow is small, so its MAE is lower than that of characteristic day 2, and its MAPE is also smaller because a great number of OD pairs have zero passenger flow. (4) In the forecast results of each period, the forecast effect of the peak period is not better than that of the off-peak period. Therefore, the relationship between OD volume and forecast results during peak and off-peak periods is analyzed in Fig. 27. As can be seen from Fig. 27: (1) The MAE at peak time is basically below 20 and the MAPE is basically below 25%. However, because the MAPE value is higher at points of small passenger flow, the overall MAPE value is larger. (2) The median MAE value is basically below 10, but the MAPE is basically below 70%. This is because, when the OD volume is small, a large MAPE value can be obtained even though the MAE value is small.
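The interplay between MAE and MAPE at small OD volumes can be checked with a short sketch. The flows below are made-up toy values, and skipping entries with zero actual flow in the MAPE is an assumption, since the text does not state how zero flows are handled.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error over entries with non-zero actual flow.
    Skipping zero-flow entries is an assumption made for this sketch."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(abs(a - p) / a for a, p in pairs) / len(pairs)

# A large-volume OD pair: errors of several passengers, small relative error.
large_actual, large_pred = [200, 180, 220], [190, 188, 215]
# A small-volume OD pair: errors of one passenger, large relative error.
small_actual, small_pred = [2, 1, 3], [3, 2, 2]

large_mae, large_mape = mae(large_actual, large_pred), mape(large_actual, large_pred)
small_mae, small_mape = mae(small_actual, small_pred), mape(small_actual, small_pred)
```

In this toy case the small-volume pair has the lower MAE (1 passenger against roughly 7.7) but a MAPE of about 61% against roughly 4%, which mirrors the pattern described for Fig. 27.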

3) ANALYSIS OF THE PREDICTION EFFECT OF DIFFERENT PASSENGER DEMAND TYPES
In order to further illustrate the short-term prediction effect of the model for different passenger ODs, the OD pair (XI'ERQI-SHAHE) with relatively large OD volume (top 10% of passenger volume) and the OD pair (ZHICHUNLU-YONG'ANLI) with relatively small OD volume (bottom 50% of passenger volume) are randomly selected from the forecast results for analysis. Fig. 28 shows the OD passenger flow forecasts for the OD pairs (XI'ERQI-SHAHE) and (ZHICHUNLU-YONG'ANLI) at 15min forecast intervals.
Fig. 28 shows the prediction effect for the OD pair (XI'ERQI-SHAHE) with the larger passenger demand and for the OD pair (ZHICHUNLU-YONG'ANLI) with the smaller passenger demand. The MAE of OD pair (XI'ERQI-SHAHE) is 6.902 and the ... The results show that the prediction effect of the model is related to the type of OD passenger flow: when the flow is large and relatively stable, with less stochastic effect, the model learns the data characteristics better and predicts more accurately. In contrast, when the OD flow is small and relatively unstable, the model learns relatively poorly and the prediction effect is only moderate. This further verifies that the model extracts the regular information in the data.

4) COMPARATIVE ANALYSIS OF MULTIPLE MODELS AT DIFFERENT TIME GRANULARITY
In order to study whether the proposed short-term OD passenger flow prediction model improves prediction accuracy, it is compared with other methods, including the historical average method, ARIMA, the standard LSTM under single-source data (smart card data only) and the standard LSTM under multisource data, and the prediction effect of each model is compared at different time granularities.
The predictions of the 4 methods are computed for the OD passenger flow distribution of the full week from March 29, 2014 to April 4, 2014 at the 30min and 60min time granularities. The overall prediction error of each model at the different time granularities is calculated and compared with that of the STLSTM. The results are shown in Fig. 29.
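As an illustration of the simplest baseline, a historical average predictor can be sketched as follows. The text does not specify how the historical average is formed, so this sketch assumes that the flow of a time slot is predicted as the mean of the same slot over past days of the same day type; the function name and the data are hypothetical.

```python
def historical_average(history, day_type, slot):
    """history: list of (day_type, [flow per time slot]) for past days.
    Predicts the flow of `slot` as the mean over past days of the same type."""
    values = [flows[slot] for d, flows in history if d == day_type]
    if not values:
        raise ValueError("no past day of this type in the history")
    return sum(values) / len(values)

# Toy history of one OD pair with three time slots per day.
history = [
    ("workday", [12, 30, 18]),
    ("workday", [10, 34, 20]),
    ("weekend", [3, 8, 9]),
    ("workday", [14, 32, 16]),
]
pred = historical_average(history, "workday", 1)   # (30 + 34 + 32) / 3 = 32.0
```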
The following conclusions can be drawn from the results: (1) Comparing the prediction results of the models, the STLSTM obtains the best prediction effect. The MAE of the STLSTM network is about 3%-13% lower than that of the other methods, and its MAPE is about 5%-13% lower, which shows that the STLSTM under multisource data is better than the other four methods.
(2) Comparing the standard LSTM network with the historical average method and the ARIMA model, all under single-source data, shows that it is difficult for a model to extract enough features to capture the peaks of passenger flow when no other features are collected. Some OD passenger flows therefore cannot be predicted accurately, and the prediction effect at peak times is poor: because only smart card data are used, other information that affects passenger travel, especially during peak hours, is missing.
(3) Comparing the standard LSTM network under single-source data and under multisource data shows that, by adding multisource data, the model can extract more features, so that its prediction effect at different time granularities is significantly improved. MAE is reduced by about 0.2 and MAPE by about 4%, indicating that the introduction of multisource data is very effective for improving the model.
(4) Comparing the standard LSTM network and the STLSTM shows that, after the model structure is improved, the STLSTM can effectively capture the characteristics of each OD in the OD passenger flow distribution time series. The prediction effect of the model is thus significantly improved at different time granularities, and the MAPE is reduced by about 6%. This shows that the improvement of the model structure effectively raises the prediction accuracy.

VI. CONCLUSION
Short term OD matrix prediction of rail transit is a wellknown difficult problem due to stochasticity. In order to improve the accuracy of the prediction, STLSTM is proposed to utilize multisource spatiotemporal data to improve the short-term predictability of the dynamic passengers' origin and destination demand (OD matrix) in a rail transit system. The proposed network extends the standard LSTM by incorporating spatial information available from multisource data. The approach is validated on the dataset obtained from the Beijing subway network. The results show an improved prediction of the OD matrix over existing predictive models and the standard LSTM.
(1) Introducing multisource data as input is very effective in improving prediction accuracy. Comparing the standard LSTM network under single-source data and under multisource data shows that adding multisource data allows more features to be extracted, so that the prediction effect at each time granularity improves significantly.
(2) The results show that the STLSTM model can effectively exploit the potential information in multisource data and improve the prediction effect. Comparing the standard LSTM and the STLSTM under multisource data, the prediction effect of the STLSTM improves significantly at each time granularity and the MAPE is reduced by about 6%, which indicates that the improved model structure effectively raises the prediction accuracy.
(3) The STLSTM has a good prediction effect on a large-scale, complex rail network. The case verification was conducted on the whole Beijing subway network, which contains 233 stations and 54,056 ODs. Compared with commonly used prediction methods, the STLSTM network under multisource data obtained a good prediction effect.
The partially stochastic activity of passengers increases the difficulty of prediction. However, such 'stochasticity' may be captured from social networking data such as Facebook and WeChat. In the future, social networking data will be input into the STLSTM-based model to check whether it improves accuracy. At the same time, complete data with the same time interval and matching time dimension are expected to be obtained to improve prediction accuracy. In addition, to reduce calculation time on large-scale networks, the computationally simpler GRU will be used. Finally, the applicability of the model to abnormal cases, such as sudden bursts of large passenger flow, will be studied.
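The saving expected from a GRU can be estimated by counting trainable parameters per recurrent layer, as in the rough sketch below. It assumes standard cell definitions (four gate blocks for an LSTM, three for a GRU, one bias vector per block) and does not account for the structural changes made in the STLSTM. The input size of 31 matches the number of input features used in this paper, while the hidden size of 64 and the function names are hypothetical.

```python
def gate_block_params(input_size, hidden_size):
    """Weights and bias of one gate block: W (h x d), U (h x h), b (h)."""
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size, hidden_size):
    """Standard LSTM layer: input, forget and output gates plus the cell candidate."""
    return 4 * gate_block_params(input_size, hidden_size)


def gru_params(input_size, hidden_size):
    """Standard GRU layer: update and reset gates plus the state candidate."""
    return 3 * gate_block_params(input_size, hidden_size)


input_size = 31   # number of input features reported in this paper
hidden_size = 64  # hypothetical hidden size, chosen only for illustration

lstm = lstm_params(input_size, hidden_size)
gru = gru_params(input_size, hidden_size)
print("LSTM parameters:", lstm)
print("GRU parameters :", gru)
print("GRU / LSTM     :", gru / lstm)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, whatever the input and hidden sizes. Some library implementations keep two bias vectors per gate block, which changes the absolute counts but not this ratio.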
DEWEI LI received the Ph.D. degree in traffic planning and management from Beijing Jiaotong University, Beijing, China, in 2007. He is currently a Professor with the Department of Traffic and Transportation, Beijing Jiaotong University. He has participated in several major projects on traffic management funded by the Chinese government and rail transit operation undertakings. His research interests include railway timetabling, capacity management, transit assignment, and crowd management.
Her research interests include innovative route planning for suburban railways and urban rail transit timetable optimization. She participates in the national key research and development program on railway passenger and freight efficiency and service level improvement technology.
VOLUME 8, 2020