Research on Sample Selection of Urban Rail Transit Passenger Flow Forecasting Based on SCBP Algorithm



I. INTRODUCTION
The demand for public transportation has grown increasingly with the expansion of cities, and urban rail transit operation methods must adapt to city development. The passenger flow of urban rail transit determines the scale of stations, train operating times, and train formations. A reasonable urban rail transit operation organization can not only increase the safety and punctuality of travelers but also reduce travel time. It is therefore of great significance for urban rail transit to determine the scale of future passenger flow. Urban rail transit passenger flow forecasting can be categorized into near-term, early-stage, and long-term forecasting. Long-term and early-stage forecasting rely primarily on the typical four-stage method for prediction. In contrast, near-term passenger flow forecasting includes time series forecasting, time feature analysis, and regression analysis. Urban rail transit passenger flow is characterized by a long-term linear growth trend, cyclical seasonal variation, and random fluctuation, and thereby exhibits complex nonlinear characteristics [1]. Many methods have been developed to improve passenger flow prediction accuracy, and these methods can be primarily divided into two categories: parametric methods and nonparametric methods.

(The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.)
The Autoregressive Integrated Moving Average (ARIMA) model [2] is widely regarded as the classical parametric method. Milenkovic et al. [3] added seasonal variation in passenger flow to the ARIMA model to overcome the impact of passenger flow fluctuations with regular cycles. Parametric methods can achieve good performance and capture regular changes well, but cannot cope with the irregular changes of urban rail transit passenger flow. Nonparametric methods, such as support vector machines (SVMs) and long short-term memory (LSTM) neural networks [4]-[6], have therefore been utilized to overcome this problem, and have achieved great success. Tang et al. [7] successfully added spatial variables to traditional time series prediction and applied them to LSTM networks. Qin et al. [8] used the Echo State Network (ESN) to predict the passenger flow of each component of a seasonal decomposition. In addition, studies on rail transit passenger flow forecasting for events have made great progress [9], [10]. However, these methods do not take sample selection into account when making predictions, although existing research has proven that sample selection has a great impact on trained network models [11]-[14]. Moreover, many studies have confirmed that appropriate sample selection can increase model accuracy [15], [16].

(VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
Urban rail transit passenger flow forecasting has strong requirements for sample integrity, such as the consideration of special periods like holidays. To investigate the problem of the uneven distribution of training samples, the concept of sample contribution degree is proposed in this paper, the error back propagation (EBP) algorithm is re-derived, and the EBP algorithm based on sample contribution degree (SCBP) is obtained. Based on the optimization strategy of the grey wolf optimizer (GWO) algorithm, the distribution of sample contributions at each station is found under the objective of minimizing prediction error. The sample contributions of each station are analyzed, and the influencing factors of the sample contribution degree are obtained, which provides a reference for sample selection in deep learning prediction methods for urban rail transit passenger flow forecasting.
The remainder of this paper is organized as follows. A review of related work on data selection and sample selection is provided in Section 2. Section 3 introduces the concept of sample contribution, the principle of the EBP algorithm, the EBP algorithm combined with sample contribution, and the framework for determining sample contribution. In Section 4, taking the Xi'an rail transit out-flow forecast as an example, a feasible sample contribution optimization framework is designed, and the complete process is demonstrated. In Section 5, the optimization result is verified, and the factors that affect the sample contribution in terms of the training sample features and the prediction sample features are analyzed. In Section 6, stations are classified by the influencing factors of sample contribution, the central station of each type is analyzed, and specific fitting is conducted. Finally, conclusions are drawn in Section 7.

II. RELATED WORK
The uncertainty of research-object selection directly affects the accuracy of passenger flow prediction. Before training a model, the data need to be organized in terms of, for example, time granularity and time interval. During model training, attention must also be paid to training sample selection. Considerable research effort has therefore been devoted in recent decades to the influence of data selection and sample selection on prediction accuracy. Following these two steps, the related work is divided into data selection and sample selection.

A. THE DATA SELECTION WORK
Many scholars have conducted extensive research on data selection. The lack of selection criteria for passenger flow prediction data can cause large fluctuations in prediction accuracy. Zhang et al. [17] proposed generating multi-day activity-travel data by sampling from readily available single-day household travel survey data. Li et al. [18] indicated that the time granularity of the prior passenger flow data and differences in site classification also lead to differences in prediction accuracy. Ma et al. [19] explored the distribution of passenger spatiotemporal characteristics based on big data from traffic smart cards, laying a foundation for the selection of passenger flow prediction objects. Chiu and Verma [20] used data size to divide the data into different groups, and conducted training and testing of neural network ensembles.

B. THE SAMPLE SELECTION WORK
Existing research has proven that sample selection has a great impact on trained network models. The traditional focus has been on active learning and information compression.
Active learning uses the internal correlation of samples to autonomously select new samples, identifying representative examples from the unlabeled collection and thus reducing the cost of labeling. It generally involves three types of queries: uncertainty-based queries [21]-[24], version space-based queries [25], [26], and expected error-based queries [27], [28]. Tong [31] proposed a formal model for incorporating clustering into active learning.
Information compression is designed to reduce training time by filtering noise and redundant data to compress data sets. Representative algorithms of information compression are the condensed nearest neighbor (CNN) rule [32], [33] and principal component analysis (PCA) [34], [35], among others [36]. Nikolaidis et al. [37] introduced a multistage method for pruning the training set, which greatly improves storage reduction while maintaining competitive execution speed. Yang et al. [38] presented a novel editing algorithm called the adaptive Edited Natural Neighbor algorithm (ENaN), which greatly improves the performance of condensation methods in terms of both accuracy and reduction rate. Tsai et al. [39] presented a novel undersampling approach that combines clustering analysis and instance selection: the clustering analysis component groups similar data samples of the majority class dataset into 'subclasses', while the instance selection component filters out unrepresentative data samples from each of the 'subclasses'.
This paper focuses on sample selection. Compared with active learning and information compression, our approach differs as follows: 1) the number of training samples is not reduced, which improves the utilization rate of the samples; 2) we propose the concept of sample contribution and apply it to the field of deep learning, with the training sample contributions determined by an optimization algorithm during the training process; 3) the proposed SCBP algorithm can be applied to all artificial neural network models based on the EBP algorithm.
The innovations of this paper are as follows: (i) the concept of the sample contribution is put forward and integrated into the process of adjusting weights by the classical EBP algorithm; (ii) a feasible optimization framework based on the intelligent optimization algorithm is proposed; (iii) taking the passenger flow forecasting of urban rail transit as an example, characteristics of passenger flow are analyzed, and a hierarchical method for optimizing sample contribution is proposed; (iv) the proposed model is highly extensible in other networks based on the EBP algorithm.

III. METHODS
Sample selection is a core issue in the process of network training. Depending on the size and attributes of the samples, the effects of neural network training can vary greatly. Existing research has indicated that different training samples affect the training outcome to different degrees. Assuming that the contribution of each training sample to the training effect can be calculated, a better neural network model can be obtained during training by adjusting the network weights according to these contribution degrees. In this section, how to determine the contribution of each training sample from the existing training samples and forecast samples is presented in detail.
First, the details of the concept of sample contribution are discussed. Then, by analyzing the principle of the classical error back propagation algorithm, the contribution of each training sample is added, and the error adjustment strategy based on the sample contribution degree is obtained. Next, the more commonly used GWO intelligent optimization algorithm is introduced. Finally, after integrating the sample contribution into the EBP algorithm, the GWO algorithm is used to optimize each training sample contribution, which can be applied to future training to obtain a better neural network prediction model.

A. SAMPLE CONTRIBUTION
Contribution is an indicator of economic efficiency, and refers to the ratio of the number of effective or useful results to the consumption and occupancy of resources, i.e., the ratio of output to input, or the ratio of the amount obtained to the total amount. Contribution is also used to analyze the extent of the effects of various factors on economic growth. The 80/20 rule (Pareto principle) [40] was put forward by the Italian economist Pareto in the late 19th and early 20th centuries. He believed that in any group of things, the most important part is only a small portion of about 20%, and the remaining 80%, although the majority, is secondary. The 80/20 rule states that there is an unexplained imbalance between cause and effect. In general, inputs and efforts can be categorized into two different types: a majority, which only has a small impact, and a few, which have major and significant impacts. By applying this rule to sample training, it can be concluded that in the sample training process, 80% of the attention should be placed on 20% of the samples, and 20% of the attention on the remaining 80% of the samples. A better prediction model may thus be obtained. Therefore, the sample contribution degree can be defined as the guiding effect of a sample on the error adjustment during the training process: the larger the contribution degree of a sample, the closer the direction of the error adjustment is to that sample. Assuming that the number of training samples is n, the contribution of each sample is c_i, with ∑_{i=1}^{n} c_i = 1.
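As a simple illustration of this constraint, raw importance scores can be normalized so that the contribution degrees c_i sum to one. The following sketch is ours; the helper `normalize_contributions` does not appear in the paper:

```python
def normalize_contributions(raw):
    """Map nonnegative importance scores to contribution degrees c_i with sum 1."""
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)  # degenerate case: uniform contributions
    return [r / total for r in raw]

# Pareto-style example: one sample out of five carries 80% of the attention
c = normalize_contributions([80, 5, 5, 5, 5])
```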

B. EBP ALGORITHM BASED ON SAMPLE CONTRIBUTION
In this section, the EBP algorithm is introduced. By analyzing the characteristics of the EBP algorithm to adjust the weight matrix, the idea of sample contribution degree is integrated into it, and the EBP algorithm based on sample contribution degree (SCBP) is obtained.

1) EBP ALGORITHM
The EBP algorithm has been the training foundation of many current neural network models since it was proposed by Rumelhart et al. [41] in 1986. Its basic idea is that the learning process consists of two phases: the forward propagation of signals and the back propagation of errors. Fig. 1 illustrates a basic three-layer EBP neural network. The information of the entire network is stored in the weight matrix. When a sample value vector is input, the network calculates the actual output according to the weight matrix. Each sample has an expected output, and the error between the expected and actual output is calculated. This error is then propagated in reverse, and the gradient descent algorithm is used to adjust the weights between the layers to achieve the purpose of memory.
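To make the forward/backward process concrete, the following is a minimal pure-Python sketch of one training step of a three-layer sigmoid network with gradient descent; the names `V`, `W`, and `eta` are our own illustrative choices, not the paper's notation:

```python
import math
import random

def sigmoid(x):
    # unipolar sigmoid transfer function f(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, d, V, W, eta=0.5):
    """One forward/backward pass of a three-layer network.
    V: input->hidden weights, W: hidden->output weights, d: expected output."""
    # forward propagation of the signal
    y = [sigmoid(sum(V[j][i] * x[i] for i in range(len(x)))) for j in range(len(V))]
    o = [sigmoid(sum(W[k][j] * y[j] for j in range(len(y)))) for k in range(len(W))]
    # back propagation of the error (gradient descent on the squared error)
    delta_o = [(d[k] - o[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    delta_y = [y[j] * (1 - y[j]) * sum(delta_o[k] * W[k][j] for k in range(len(o)))
               for j in range(len(y))]
    for k in range(len(W)):
        for j in range(len(y)):
            W[k][j] += eta * delta_o[k] * y[j]
    for j in range(len(V)):
        for i in range(len(x)):
            V[j][i] += eta * delta_y[j] * x[i]
    return sum((d[k] - o[k]) ** 2 for k in range(len(o))) / 2  # error E before update

# train on a single sample; the error decreases over iterations
random.seed(0)
V = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 2 inputs -> 3 hidden
W = [[random.uniform(-1, 1) for _ in range(3)]]                    # 3 hidden -> 1 output
err_history = [train_step([0.5, 1.0], [0.2], V, W) for _ in range(200)]
```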

2) SCBP ALGORITHM
The EBP algorithm is primarily adjusted based on the prediction error of a single input sample. The concept of the sample contribution was added to the adjustment strategy of the EBP algorithm, and the following conclusion can be obtained. Each time a sample is input, the prediction errors of all previously trained samples are calculated. With the contribution degrees as weights, the weight adjustment error can place importance on the direction of prediction error adjustment, and a more accurate prediction model can be obtained. Taking the classic three-layer back propagation neural network (BPNN) as an example, the error adjustment strategy proposed in this paper is derived.
On the p-th input, the previous p samples have been input, the contributions of these samples are c(c_1, c_2, ..., c_t, ..., c_p), and the output of the hidden layer is y(y_1, y_2, ..., y_t, ..., y_p).

a: DEFINITIONS OF SYMBOLS
The symbols are summarized in Table 1.

Input layer:
The total virtual input of the p-th input sample is x_p = (x_p1, x_p2, ..., x_pi, ..., x_pn), and x_pi can be calculated by equation (1).

Hidden layer:
The total virtual output of the hidden layer for the p-th input sample is y_p = (y_p1, y_p2, ..., y_pj, ..., y_pm), and y_pj can be calculated by equations (2) and (3).
It is assumed that transfer functions f (·) are unipolar sigmoid functions, and

Output layer:
The total virtual output of the output layer for the p-th input sample is o_p = (o_p1, o_p2, ..., o_pk, ...), and o_pk can be calculated by equations (4) and (5).
The learning process is caused by the forward propagation of signals and the back propagation of errors. The forward process calculates the actual output of the network, and the reverse process adjusts the weight matrix. When the expectation does not match the output, the weighted output error is E, which is defined as equation (6).
The error E is first expanded to the hidden layer, and then further to the input layer. From equation (8), it is evident that the error E is a function of w_jk and v_ij, and the adjustment rule follows the negative gradient of E. By applying equations (7) and (8), and defining the error signals for the hidden layer and output layer respectively, the adjustment terms are obtained. Combining equations (1)-(4) with (10) and (11), and then applying equations (6) and (7), the weight adjustment formulas follow.
Additionally, an auxiliary quantity is defined; by applying equations (1), (2), (12), and (13), the final result is obtained. Compared with the traditional EBP algorithm, SCBP no longer bases each adjustment of the network weights on the prediction error of the current sample alone; instead, it comprehensively refers to all samples trained so far and uses their contribution-weighted prediction error as the adjustment basis. This enhances the memory and utilization of training samples, especially training samples with obvious timing characteristics.
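The core idea, under our reading of the derivation, is that the adjustment basis becomes a contribution-weighted sum of per-sample errors rather than the current sample's error alone. A minimal sketch (the function name `weighted_error` is ours, not from the paper):

```python
def weighted_error(errors, contributions):
    """Contribution-weighted adjustment basis over all samples trained so far.
    The contributions are the c_i of the paper and must sum to 1."""
    assert abs(sum(contributions) - 1.0) < 1e-9
    return sum(c * e for c, e in zip(contributions, errors))

# a high-contribution sample dominates the direction of the error adjustment
E = weighted_error([0.9, 0.1, 0.1], [0.8, 0.1, 0.1])
```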

C. GREY WOLF OPTIMIZER ALGORITHM
The GWO algorithm is a swarm intelligence optimization algorithm proposed by Mirjalili [42] in 2014. Inspired by the predation behavior of grey wolves, it is an optimized search method characterized by strong convergence performance, few parameters, and easy implementation. It has attracted great attention from scholars in recent years, and has been successfully applied to fields such as shop scheduling [43], parameter optimization [44], and image classification [45], among others.
The grey wolf population in nature has a strict hierarchical system, and the wolf that is mainly responsible for making decisions on predation, habitat, and other activities is denoted as the α wolf.
The β wolf assists the α wolf, is the candidate to succeed the α wolf when the latter ages, and dominates the lower-ranked wolves.
The δ wolves obey the α and β wolves, and dominate the remaining rank. They are generally composed of young, sentinel, and hunting wolves.
The ω wolves must usually obey the wolves of other ranks. The GWO optimization process includes the steps of the grey wolf's social hierarchy, tracking, enveloping, and attacking prey. The mathematical model for simulating the swarm behavior is as follows.
In equations (22)-(28), t is the current iteration, A and C are coefficient vectors, X_p(t) is the prey position, X(t) is the position of each wolf in the population, the convergence factor decreases linearly from 2 to 0 as the number of iterations increases, and γ_1 and γ_2 are random vectors in the interval (0, 1). The pseudocode of GWO is as follows.
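The GWO procedure described above can be sketched in Python as follows. This is a minimal reconstruction of the standard algorithm (function and parameter names are ours), not the authors' implementation:

```python
import random

def gwo(fitness, dim, n_wolves=30, iters=50, lb=-10.0, ub=10.0):
    """Minimal Grey Wolf Optimizer sketch (minimization)."""
    wolves = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]
    for t in range(iters):
        wolves.sort(key=fitness)                      # social hierarchy
        alpha, beta, delta = (w[:] for w in wolves[:3])
        if fitness(alpha) < fitness(best):            # keep the best alpha ever seen
            best = alpha[:]
        a = 2.0 * (1 - t / iters)                     # convergence factor: 2 -> 0
        for w in wolves:
            for d in range(dim):
                pos = 0.0
                for leader in (alpha, beta, delta):   # encircling, guided by leaders
                    A = 2 * a * random.random() - a
                    C = 2 * random.random()
                    D = abs(C * leader[d] - w[d])
                    pos += leader[d] - A * D
                w[d] = min(max(pos / 3.0, lb), ub)    # average of the three moves
    return min(wolves + [best], key=fitness)

random.seed(1)
best = gwo(lambda x: sum(v * v for v in x), dim=3)    # minimize the sphere function
```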

D. GWO-SCBP ALGORITHM
By using the optimization strategy of the GWO algorithm, the optimization of the sample contribution is considered.
To obtain the optimal combination of each sample contribution, the EBP algorithm based on the optimal sample contribution algorithm (GWO-SCBP) is finally established. The steps are presented in Fig. 2, and the specific steps are as follows.
Step 1: Input the training samples and random sample contributions;
Step 2: Use the SCBP algorithm to obtain the trained model and predictive value;
Step 3: Use the predictive and actual values to obtain the fitness;
Step 4: Sort the population according to fitness, and obtain the wolf hierarchy;
Step 5: The population searches for prey according to the leadership of the α, β, and δ wolves;
Step 6: Update the positions of the wolves and limit the group positions;
Step 7: Repeat Steps 2-6 until the maximum number of iterations is reached; the α wolf position obtained is the optimal sample contribution.

IV. EXPERIMENT
The effect of the model was evaluated on a realistic data set, and the results are presented in this section. The Xi'an rail transit system and data sources are first introduced, and a specific calculation framework based on the changing rules of rail transit out-flow is designed. In addition, the experimental environment and settings of the models, including the hardware and parameters, are introduced, and the results for a single station are provided as an example.

A. STUDY AREA
Xi'an is the capital city of Shaanxi Province and is an important cultural, economic, and educational city in China. By the middle of 2018, Xi'an had opened 3 subway lines, namely Line 1, Line 2, and Line 3. There are 63 stations, including 3 interchange stations.
The first metro line, Line 2, began operation on September 16, 2011. Xi'an Metro Line 2 runs along the central axis of the city, connecting Xi'an's administrative center, transportation hub, university gathering area, scientific research area, commercial center, and many tourist attractions.
Xi'an Metro Line 1 began operation on September 15, 2013. It was the first rail transit line to connect Xi'an and the Xianyang New Area. The beginning of the operation of the second phase of Xi'an Metro Line 1 is of great significance for accelerating the integration of Xi'an and Xianyang, optimizing the urban layout structure, and promoting the rapid development of the urban social economy.
The last metro line, Line 3, began operation on November 8, 2016. It connects the Xi'an International Port Area and the Xi'an High-tech Industrial Development Zone, and is the skeleton line of the Xi'an rail transit network plan. It is therefore necessary to establish an efficient metro passenger flow prediction system. For the convenience of research, the labels for each station are listed in Fig. 3.

B. DATA ACQUISITION
Due to the popularity of digitalization, automatic fare collection (AFC) systems make it possible to collect urban rail transit data. The out-flow from May 2017 to August 2018 was therefore collected from this dataset. There were more than 560,000 statistical data points, from which, after processing, 28,000 daily out-flow data points were calculated.

C. OPTIMIZATION FRAMEWORK
Due to the large amount of data selected, it was unrealistic to determine the contribution of a single sample in the prediction process. Therefore, station No. 41 (YHZ) was taken as an example, and its out-flow within one year is presented in Fig. 4. Its weekly out-flow exhibited the same fluctuation trends in different months and had the same cyclical variation in the same month. In other words, the distributions of daily sample contributions in the same week of different months were the same. Therefore, a hierarchical daily contribution calculation method was established. The steps were as follows.
Step 1: The training sample data was divided into 12 months to determine the contribution of each sample month to the forecast month;
Step 2: According to the weekly classification, the weekly contribution of each day of the week to the forecast month was examined from Monday to Sunday;
Step 3: Combining the monthly sample contributions and the weekly sample contributions, the contributions of the daily samples were obtained.
The mathematical model is as follows. The sample monthly out-flow contribution for the forecasted monthly out-flow is C_m, composed of vectors c_mi, where i is the station number and j is the sample month number. The sample weekly out-flow contribution for the forecasted weekly out-flow is C_w, composed of matrices c_wi with vectors c_wik, where k is the day of the week. The daily contribution of each sample, C_r(c_r1, c_r2, ..., c_ri), is then obtained from the monthly sample contribution and the weekly sample contribution. The matrix of c_ri is given in equation (34), where t is the number of all days in one week across all sample months, and c_ri is calculated by equation (35).
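Under our reading of the hierarchical scheme, the daily contribution combines the monthly and weekly levels multiplicatively and is renormalized; the exact form of equation (35) is not reproduced here, so this is a hypothetical sketch:

```python
def daily_contributions(c_month, c_week):
    """Combine monthly and weekly contribution degrees into daily ones.
    Our reading: c_r is proportional to c_m[j] * c_w[k], renormalized to sum to 1."""
    raw = [[cm * cw for cw in c_week] for cm in c_month]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

# 12 sample months x 7 weekdays -> one contribution per (month, weekday) pair
c_m = [1.0 / 12] * 12                       # uniform monthly contributions
c_w = [0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]   # hypothetical weekly contributions
c_r = daily_contributions(c_m, c_w)
```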
The specific framework is illustrated in Fig. 5. The hierarchical calculation method can effectively reduce the dimension of the optimal contribution group, making it possible to find the optimal contribution.

D. ALGORITHM COMPARISON AND PARAMETER SETTINGS
To validate the optimization ability of GWO, particle swarm optimization (PSO) was introduced as a benchmark for comparison. The experimental environment included MATLAB 2018b, self-written MATLAB programs, and a computer with an Intel(R) Core(TM) i5 processor and the Windows 10 operating system. The parameters of the PSO algorithm were set as follows: the individual and group learning factors were 2.0, and the inertia weight ranged from 0.4 to 0.8. The number of iterations and the particle swarm size were both 50. The parameters of the GWO algorithm were set as follows: the number of iterations and the wolf pack size were both 50.

E. CALCULATION PROCESS
Taking the contribution of out-flow at station No. 41 as an example, its value is c_w41 × c_m41. In the subsequent section, a detailed description of the determination of the sample contribution for station No. 41 is provided.

1) MONTHLY SAMPLE CONTRIBUTION
Based on the monthly out-flow from May 2017 to April 2018, the total out-flow in May 2018 was predicted on a monthly basis. The fitness of the EBP network is defined as follows: where H_n is the predictive value, c_n is the actual value (monthly), n is set to eliminate random fluctuations in network prediction, and n = 10. The monthly sample contributions and the iterative processes obtained by the GWO-SCBP and PSO-SCBP algorithms are presented in Fig. 6. The results demonstrate that GWO achieves a good optimization effect.

2) WEEKLY SAMPLE CONTRIBUTION
The out-flow of all Mondays in May 2018 was predicted on a weekly basis. The fitness of the EBP network is defined as follows: where H_n4 is the predictive value (the forecast month has four Mondays), c_n4 is the actual value (four Mondays), n is set to eliminate random fluctuations in network prediction, and n = 5. The weekly sample contributions and the iterative processes obtained by the GWO-SCBP and PSO-SCBP algorithms are presented in Fig. 7.

3) DAILY SAMPLE CONTRIBUTION
As presented in Fig. 8, the daily contribution of each sample was obtained by equation (35), and exhibited periodic variations as well as differences between months.

V. RESULTS AND ANALYSIS
In this section, the experimental results are verified. In addition, the factors that affect the sample contribution in terms of the training sample features and the prediction sample features are analyzed.

A. RESULTS VERIFICATION
The contribution of the daily out-flow of urban rail transit depends largely on the contribution of the monthly out-flow; the monthly sample contribution is therefore used to verify the results of the sought sample contribution. The BPNN, Elman, and LSTM neural networks, all based on the EBP algorithm, were selected to verify the optimization results. The average contribution, the optimal contribution, and neural networks without considering contribution were also examined for comparison. The methods are defined in Table 3.

1) ELMAN NEURAL NETWORK
The Elman neural network, also known as the simple recurrent network (SRN), was proposed by Elman in 1990. The SRN considers timing information: its output is related not only to the input at the current time, but also to the inputs at all previous moments. The SRN is the simplest of the RNN architectures; compared to a traditional two-layer fully connected feedforward network, it adds timing feedback connections only to the fully connected layer.

2) LSTM NEURAL NETWORK
As a variant of the recurrent neural network (RNN), the LSTM selectively forgets useless information during the learning process. It includes a forget gate, an input gate, and an output gate. The forget gate decides which information should be retained and which discarded: the previous hidden state and the current input are fed into a sigmoid function whose output lies between 0 and 1, and the closer the output is to 0, the more is forgotten. The input gate is used to update the cell state: the previous hidden state and the current input are likewise fed into a sigmoid function, whose output between 0 and 1 determines which information to update. The output gate determines the value of the next hidden state, which contains information about the previous inputs.
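The gate mechanics described above can be sketched for a single scalar LSTM step as follows; the weight names and the parameter dictionary are our own illustrative assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, p):
    """One LSTM step on scalars. p maps gate names to hypothetical
    (input weight, recurrent weight, bias) triples."""
    def gate(w, act):                      # combine current input and previous hidden state
        wx, wh, b = w
        return act(wx * x + wh * h_prev + b)
    f = gate(p['wf'], sigmoid)             # forget gate: near 0 -> forget, near 1 -> keep
    i = gate(p['wi'], sigmoid)             # input gate: which information to update
    g = gate(p['wc'], math.tanh)           # candidate values for the cell state
    o = gate(p['wo'], sigmoid)             # output gate: forms the next hidden state
    c = f * c_prev + i * g                 # updated cell state
    h = o * math.tanh(c)                   # next hidden state
    return h, c

# with all-zero weights, every sigmoid gate outputs 0.5 and the candidate is 0
p = {k: (0.0, 0.0, 0.0) for k in ('wf', 'wi', 'wc', 'wo')}
h, c = lstm_cell(1.0, 0.5, 2.0, p)
```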

3) VALIDATION RESULTS
These methods were used to predict the out-flow of station No. 41 (YHZ) in May 2018 to verify the optimization results.
To evaluate and compare the forecasting performance of the proposed models and the alternatives, three common forecasting error measures were selected as indices of performance assessment, namely the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The RMSE, MAE, and MAPE can be calculated as follows: where y_i is the forecasted value, ȳ is the average of the forecasted values, ŷ is the actual value, and n is the number of predictions. The prediction errors are shown in Fig. 9 and Fig. 10. It is evident from Fig. 9 and Fig. 10 that the search for the optimal sample contribution achieved good results with these prediction methods, indicating that the optimization result is credible.
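Assuming the three indices follow their standard definitions, they can be computed as in the following sketch (the helper name `forecast_errors` is ours):

```python
import math

def forecast_errors(forecast, actual):
    """RMSE, MAE, and MAPE (in percent) under their standard definitions."""
    n = len(forecast)
    rmse = math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / n)
    mae = sum(abs(f - a) for f, a in zip(forecast, actual)) / n
    mape = 100.0 * sum(abs((f - a) / a) for f, a in zip(forecast, actual)) / n
    return rmse, mae, mape

rmse, mae, mape = forecast_errors([2.0, 4.0], [1.0, 5.0])
```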

B. RESULT ANALYSIS
Studies have shown that the selection of both training samples and prediction samples has an impact on the model. The factors that affect the sample contribution in terms of the training sample features and the prediction sample features are analyzed below.

1) TRAINING SAMPLE
Due to the characteristics of the EBP algorithm, the predicted value is determined by the input and output of the training sample. The distribution of the monthly sample contribution of the search in both the temporal and spatial dimensions is analyzed.

a: TEMPORAL DIMENSION
In the temporal dimension, the Euclidean distance between the sample and predicted input, the absolute distance between the sample month and forecast month out-flow, and the Pearson coefficient of the sample contribution with the distance were calculated. The Pearson coefficients of each station were calculated separately, and the results are presented in Fig. 11.
It can be seen from Fig. 11 that the Pearson coefficients of most stations were in the range of [−0.5, 0.5], indicating a weak correlation between the input/output distances and the sample contribution. The relationship between the input/output distance and the sample contribution for stations No. 5 (HCL) and No. 14/54 (THM) is presented in Fig. 12.
From the figure, it can be found that the larger the distance between the sample input and the predicted input, the higher the contribution of the sample, whereas the larger the distance between the sample output and the predicted output, the smaller the contribution of the sample.

b: SPATIAL DIMENSION
In the spatial dimension, the sample contribution distributions of different stations are different, so it is necessary to analyze the relationship between the sample contribution distribution and the station characteristics. The Pearson coefficient of the monthly out-flow and the sample contribution distribution at each station were calculated, and the results are presented in Fig. 13.
It can be seen from Fig. 13 that, except for stations No. 14/54 (THM) and No. 20 (BKZ), whose Pearson coefficients had smaller absolute values, the Pearson coefficients of all other stations were in the range of [0.5, 1], indicating that the monthly out-flow at each station is strongly correlated with the monthly sample contributions.
Further, taking station No. 41 (YHZ) as an example, the relationship between monthly out-flow and sample contribution is presented in Fig. 14.
It can be found that the smaller the monthly out-flow, the greater the sample contribution of that month. In other words, the out-flow is more sensitive in months with small out-flow; the more important a month is for the forecast month, the greater its sample contribution.

2) PREDICT SAMPLE
To verify the adaptability of the sample contribution to different prediction months, the sample contributions were compared. Taking station No. 41 (YHZ) as an example, the results are presented in Fig. 15.
It can be seen from Fig. 15 that the sample contributions exhibited the same trend for different prediction months. In other words, the sample contribution trend does not change with the prediction month.

VI. DISCUSSION
In the previous section, the qualitative relationship between the out-flow, the input/output distance, and its monthly contribution distribution was given. In this section, stations are classified according to the relationship between the temporal and spatial dimensions and the distribution of the sample contributions. Based on the out-flow distance at each station, the central station of each type of station is calculated, and the sample contribution of each type of the central station is fitted.

A. STATION CLASSIFICATION
According to the results in Section 5, it was found that the influencing factors of sample contribution for each station are very different. The stations were thus divided into four categories based on the different influencing factors, and are defined as follows.
Type I: The sample contribution is related to the distance between the sample and the predicted input. This type includes station No. 14/54 (THM).
Type II: The sample contribution is related to the distance between the sample and the predicted output. This type includes station No. 20 (BKZ).
Type III: The sample contribution is not only related to the monthly out-flow trend, but also to the distance between the sample and the predicted output. It includes stations No. 5 (HCL), No. 10/29 (BDJ), No. 11 (WLK), and No. 34/47 (XZ).
Type IV: The sample contribution is related to the monthly out-flow trend. It includes all other stations.

B. CENTRAL STATION
In practical research, choosing a representative example is a common way to simplify the calculation of complex problems. Similarly, to effectively explain the characteristics of each type of station, the central station of each type must be selected for further research. The central station of each type was determined by measuring the distances between stations.

1) FRECHET DISTANCE ALGORITHM
The concept of distance space was proposed by the French mathematician Frechet in 1906, and the associated measure is known as the Frechet distance. It extends the concept of distance in real space to a general set and provides a theoretical basis for distance measurement between abstract spaces. Because it takes the path-space distance into account, it improves the efficiency of evaluating the similarity of curves with a certain spatial timing. For two curves $A, B : [0,1] \rightarrow S$, the Frechet distance is defined as
$$F(A, B) = \inf_{\alpha, \beta} \max_{t \in [0,1]} d\big(A(\alpha(t)), B(\beta(t))\big),$$
where $\alpha$ and $\beta$ range over the continuous non-decreasing reparametrizations of $[0,1]$ and $d$ is the metric function on $S$. The smaller the Frechet distance, the higher the similarity between the two curves.
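For sampled monthly series, the continuous definition is commonly approximated by the discrete Frechet distance, computed by dynamic programming over point pairs. The paper does not give an implementation; the sketch below is one standard formulation, with the function name `discrete_frechet` being our own.

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Frechet distance between two polylines P and Q,
    each given as a list of 2-D points (tuples)."""
    d = lambda a, b: math.dist(a, b)  # Euclidean metric on the plane

    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j): coupling distance for prefixes P[:i+1], Q[:j+1]
        if i == 0 and j == 0:
            return d(P[0], Q[0])
        if i == 0:
            return max(c(0, j - 1), d(P[0], Q[j]))
        if j == 0:
            return max(c(i - 1, 0), d(P[i], Q[0]))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)),
                   d(P[i], Q[j]))

    return c(len(P) - 1, len(Q) - 1)
```

A smaller value again means more similar curves; two identical curves have distance 0.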

2) CENTRALIZATION PROCESS
The central station of each type of station can be calculated by measuring the distance, and the specific steps are as follows.
Step 1: For each station, Min-Max normalization is used to normalize the historical monthly out-flow;
Step 2: The Frechet distance between the normalized out-flow series of any two stations of the same type is calculated;
Step 3: The station with the smallest sum of distances to the other stations of its type is taken as the central station of that type.
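The three steps above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are our own, the distance measure is passed in as a parameter (any curve-similarity function, such as a Frechet distance routine, can be supplied), and the station data in the usage comment is hypothetical.

```python
import numpy as np

def min_max(x):
    """Step 1: Min-Max normalize a monthly out-flow series to [0, 1]."""
    x = np.asarray(x, float)
    return (x - x.min()) / (x.max() - x.min())  # assumes a non-constant series

def central_station(series_by_station, curve_dist):
    """Steps 2-3: return the station of one type whose summed distance
    to all other stations of that type is smallest."""
    names = list(series_by_station)
    # Represent each normalized series as a curve of (month-index, value) points.
    curves = {s: [(t, v) for t, v in enumerate(min_max(series_by_station[s]))]
              for s in names}
    sums = {s: sum(curve_dist(curves[s], curves[o]) for o in names if o != s)
            for s in names}
    return min(sums, key=sums.get)
```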
The results for the multi-station types (Type III and Type IV) are presented in Fig. 16 and Table 4; Types I and II each contain only one station.

C. TYPE OF STATION ANALYSIS
In terms of the temporal dimension, the first to third types of stations are generally interchange or tourist stations, and their out-flow is greatly affected by fluctuations of the time series. The fourth type of station is generally a commuter station; since the disturbance of major holidays has been eliminated, its out-flow changes are less affected by the time series, and random fluctuations dominate the impact on such stations. Further, specific expressions are given for the various central stations via linear fitting in Fig. 17.
The Type I station, No. 14/54 (THM), is the interchange station of Line 1 and Line 3. Its function is primarily to carry interchange passenger flow, and its passenger flow is small and stable. Its monthly sample contribution is affected by sudden passenger flow caused by the time series. The fitting equation is given in Fig. 17, where x is the distance between the sample and the forecast input, and R² = 0.67.
The Type II station, No. 20 (BKZ), is located in the suburbs. It is the city's largest external transportation hub, and its passenger flow is affected by holidays. Its monthly sample contribution is affected by regular changes of passenger flow. The fitting equation is given in Fig. 17, where x is the reciprocal of the distance between the sample and the forecast output, and R² = 0.95.
Type III includes four stations, namely No. 5 (HCL), No. 10/29 (BDJ), No. 11 (WLK), and No. 34/47 (XZ). The central station fitting equation is given in Fig. 17, where x1 is the distance between the sample and the forecast output, x2 is the monthly out-flow, and R² = 0.65.
Type IV includes the other 57 stations. Their monthly sample contributions are affected by commuter passenger flow. The central station fitting equation is given in Fig. 17, where x is the reciprocal of the monthly out-flow, and R² = 0.81.
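The least-squares fits and R² values reported above can be reproduced with a short routine. This is a generic sketch rather than the paper's code (the coefficients of the actual fitting equations appear only in Fig. 17); the data in the example is hypothetical and the function name `fit_linear` is our own.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares line y = a*x + b, plus the R^2 goodness of fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a, b = np.polyfit(x, y, 1)          # slope and intercept
    resid = y - (a * x + b)             # residuals of the fitted line
    ss_res = resid @ resid
    ss_tot = (y - y.mean()) @ (y - y.mean())
    return a, b, 1.0 - ss_res / ss_tot  # R^2 = 1 - SS_res / SS_tot

# Hypothetical example: x could be the distance between sample and forecast
# output, y the monthly sample contribution of a central station.
a, b, r2 = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])
```

For the Type III fit, which uses two predictors (x1, x2), the same idea extends to multiple linear regression, e.g. via `numpy.linalg.lstsq` on a design matrix with both columns.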

VII. CONCLUSION
In this paper, the concept of sample contribution was proposed and integrated into the classical EBP algorithm. The EBP algorithm with the consideration of sample contribution was deduced, yielding the SCBP algorithm. The GWO algorithm was used to optimize the sample contribution, and an optimization framework for the sample contribution was constructed (GWO-SCBP). Taking the out-flow forecast of each station of the Xi'an urban rail network as an example, the contribution of historical samples in the forecasting process was optimized. The multiple prediction method was used to verify the optimization results, which were demonstrated to be credible; the optimization framework is therefore feasible. Furthermore, the optimization results were analyzed with respect to both the training sample features and the prediction sample features. It was found that the distribution of the historical sample contributions of each station is not only related to the time series, but also strongly connected to the distinctive out-flow changes of each type of station.
Finally, by selecting various types of central sites, specific fitting formulas were provided, and were found to obtain good fitting effects.
Additionally, for urban rail transit passenger flow forecasting, although months with large contributions are more important, months with smaller contributions must also be included to form a complete time series that accurately reflects the characteristics of historical passenger flow.
Although the SCBP algorithm was used to study the sample contribution in urban rail passenger flow prediction, holidays, weather, events, and sudden passenger flow remain major difficulties in prediction. Therefore, research on different types of time series and passenger flows should be carried out in future work.