Forecasting the Short-Term Metro Ridership With Seasonal and Trend Decomposition Using Loess and LSTM Neural Networks

I. INTRODUCTION
The popularity of private cars has led to increasingly congested roads and rising CO2 emissions. From an ecological perspective, it is better to use public transport rather than private cars. Governments are therefore increasing investment in metro construction because of its unique advantages. First, a well-designed metro system can reduce travel cost and improve reliability by relieving road traffic congestion. Second, rail is a safe mode of transport, as the chances of accidents and breakdowns are minimal compared with other modes. In addition, travelling by metro saves energy and reduces air pollution. There is little doubt that increasing investment in the metro can improve traffic conditions.
With the rapid development of metro systems, the quality of metro services has drawn increasing attention from passengers. As the metro becomes the first choice for mass transit, issues such as insufficient capacity and stampedes must be addressed urgently. Ridership is the fundamental basis of metro services. Accurate metro ridership prediction captures the real-time state of ridership and reflects actual travel demand. It provides an important basis for traffic regulation and emergency warning, and can also be used to adjust metro planning and minimize operating costs. (The associate editor coordinating the review of this manuscript and approving it for publication was Zhan Bu.)
However, the prediction accuracy of a single model is limited on some datasets, so attempts must be made to improve it. Combining multiple models can yield good results [13]-[15]. For time series data, it is promising to combine a decomposition model with a prediction model.
The decomposition of ridership data can eliminate the influence of periodic factors and reveal the real laws and trends of metro ridership, which can improve the accuracy of metro ridership prediction. Thus, researchers have proposed hybrid models that combine a decomposition method with a prediction algorithm, such as a hybrid model combining ensemble empirical mode decomposition and a gray support vector machine [16], a prediction model based on wavelet decomposition with a support vector machine [17], an empirical mode decomposition based long short-term memory neural network forecasting model [18], a hybrid model based on ARIMA and wavelet decomposition [19], a model based on the long short-term memory network and a Gaussian mixture model [20], a novel model based on variational mode decomposition, and other outstanding models [21], [22].
Metro ridership is easily influenced by holidays and major events. Taking the annual average ridership as a baseline, daily ridership during the Spring Festival falls below it, while ridership on National Day rises above it. Celebrations attract large numbers of people, which increases metro ridership, whereas ridership drops sharply in extreme weather. Moreover, metro ridership exhibits a typical weekly pattern. These irregular fluctuations pose difficulties for a single model. The decomposition models above can decompose a series and improve prediction accuracy, but they do not make full use of the characteristics of the time series when applied to metro ridership prediction.
The seasonal-trend decomposition based on loess (STL) fully considers the characteristics of ridership data and decomposes it into seasonal, trend and residual components. It performs well in prediction [23], [24] and can be used with exponential smoothing [25]. It was also recently used for long-term traffic prediction [26], but the authors did not discuss the impact of the frequency of decomposition or the time-step of the LSTM neural network.
In this paper, the STL is used to separate the seasonal, trend and residual series from metro ridership data. An LSTM neural network is employed to forecast each of the three decomposed series. Finally, the predicted values of the LSTM models are summed to obtain the predicted ridership. In addition, the prediction results of a single LSTM, STL-LSTM, EMD-LSTM and SVR are compared.
The remainder of this paper is structured as follows. Section II introduces the STL, LSTM and SVR models. In Section III, the hybrid STL-LSTM algorithm, along with the EMD-LSTM and SVR models, is described. A case study is presented and the experimental results are provided in Section IV. Finally, the conclusions are presented and the limitations of the algorithm are discussed.

A. SEASONAL-TREND DECOMPOSITION BASED ON LOESS
STL is a time series decomposition method based on locally weighted scatterplot smoothing (loess) [27]. ĝ(x) denotes a loess regression curve used for smoothing, and it can be computed as follows. Suppose x_i and y_i, for i = 1 to n, are measurements of the independent and dependent variables, respectively.
First, choose an integer q as the number of points closest to x. Then, for q ≤ n, assign a neighbourhood weight to each x_i according to the distance between x_i and x, where W denotes a weight function: the weight of x_i is W(|x_i − x| / λ_q(x)), with λ_q(x) the distance from x to its q-th closest point. The greater the distance, the smaller the weight.
STL consists of two procedures: an inner loop nested inside an outer loop. The main steps of the inner loop are seasonal smoothing and trend smoothing. The detailed steps of the (k+1)-th pass, given the results of the k-th pass, are as follows:
Step 1: Detrending. Obtain a new series by subtracting the trend values T_v^(k) from the original values Y_v.
Step 2: Cycle-subseries smoothing. Each cycle-subseries obtained in Step 1 is smoothed by loess; the result is recorded as C_v^(k+1).
Step 3: Low-pass filtering. The filter applied to C_v^(k+1) consists of three moving averages: two of length n_p (the number of samples per period) followed by one of length 3. Loess is then applied to the filtered result, which is recorded as L_v^(k+1).
Step 4: Detrending of the smoothed cycle-subseries. Obtain the seasonal series S_v^(k+1) = C_v^(k+1) − L_v^(k+1).
Step 5: Deseasonalizing. Obtain the deseasonalized series Y_v − S_v^(k+1).
Step 6: Trend smoothing. The trend series T_v^(k+1) is obtained by applying loess to the deseasonalized series.
The steps of the outer loop are as follows. After the inner loop, the values of T_v and S_v are obtained, and the residual series R_v can be calculated according to Eq. 4: R_v = Y_v − T_v − S_v. A robustness weight ρ is defined to evaluate the robustness of R_v, where ρ_v is the robustness weight at time v; these weights down-weight outlying residuals in the next iteration of the inner loop.
The bisquare weight function B used to compute ρ_v is B(u) = (1 − u²)² for 0 ≤ u < 1 and B(u) = 0 otherwise.
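As an illustration of the neighbourhood weighting described above, the sketch below computes loess weights with the tricube function, the standard choice for W in loess; the paper does not state W explicitly, so the tricube form and the sample values are assumptions:

```python
import numpy as np

def tricube(u):
    """Tricube weight function W(u) = (1 - |u|^3)^3 on [0, 1), else 0."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)

def loess_weights(x_i, x, q):
    """Neighbourhood weights for a loess fit at point x.

    lambda_q(x) is the distance from x to its q-th closest x_i;
    points farther from x receive smaller weights.
    """
    d = np.abs(x_i - x)
    lam = np.sort(d)[q - 1]          # distance to the q-th nearest neighbour
    return tricube(d / lam)

x_i = np.arange(10, dtype=float)     # illustrative sample locations
w = loess_weights(x_i, x=4.0, q=5)   # the point at x itself gets weight 1
```

Points outside the q-nearest neighbourhood receive zero weight, so each local regression uses only nearby observations.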

B. LONG SHORT-TERM MEMORY NEURAL NETWORK
Traditional neural networks struggle to interpret input sequences whose meaning depends on prior information and context. The recurrent neural network (RNN) feeds its output back into the input to provide the context of previous inputs [28]. However, the RNN has a considerable disadvantage: the vanishing gradient problem. The LSTM neural network is an improved architecture whose neurons keep memory in their cells, mitigating the vanishing gradient problem effectively [29]. This has made the LSTM neural network popular in time series prediction [30]-[32].
The key to LSTM neural network is the cell state. The LSTM can remove or add information to the cell state through the three gates.

1) FORGET GATE
The forget gate decides how much information from the previous moment is retained at the current moment:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
where W_f is the weight matrix of the forget gate, x_t is the current input, h_{t−1} is the previous output of the memory block, b_f is the bias term of the forget gate, and σ is the sigmoid function.

2) INPUT GATE
The input gate decides how much information from the current input x_t is saved into the cell state c_t:
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
where W_i and W_c are weight matrices, b_i and b_c are bias terms, and c̃_t is the candidate cell state.

3) OUTPUT GATE
The output gate is closely related to the output value h_t:
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)
The transition from the old cell state c_{t−1} to the new state c_t is
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
where tanh is the activation function and ⊙ denotes element-wise multiplication.
The structure of the three gates is shown in Fig. 1.
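The gate computations above can be sketched as a single numpy time step; the weight shapes, random initialisation and sample input below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step with forget (f), input (i) and output (o) gates.

    Each gate applies its weight matrix to the concatenated [h_prev, x_t];
    g is the candidate cell state squashed by tanh.
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate
    i = sigmoid(W["i"] @ z + b["i"])          # input gate
    o = sigmoid(W["o"] @ z + b["o"])          # output gate
    g = np.tanh(W["g"] @ z + b["g"])          # candidate cell state
    c_t = f * c_prev + i * g                  # new cell state
    h_t = o * np.tanh(c_t)                    # new hidden state / output
    return h_t, c_t

rng = np.random.default_rng(0)
n_h, n_x = 4, 1                               # 4 hidden units, scalar input
W = {k: rng.standard_normal((n_h, n_h + n_x)) * 0.1 for k in "fiog"}
b = {k: np.zeros(n_h) for k in "fiog"}
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(np.array([0.5]), h, c, W, b)
```

Unrolling this step over a sequence, with gradients flowing through the additive cell-state update, is what mitigates the vanishing gradient problem.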

C. SUPPORT VECTOR REGRESSION
The support vector machine (SVM) was proposed for binary classification, and support vector regression (SVR) is an important application branch of SVM.
The regression function is f(x) = ⟨w, φ(x)⟩ + b, where φ maps the input into a high-dimensional feature space.
SVR can be formalized as minimizing (1/2)‖w‖² + C Σ ℓ_ε(f(x_i) − y_i), where C is a constant for regularization and ℓ_ε is the ε-insensitive loss function: a loss is incurred only when the difference between f(x_i) and y_i exceeds ε.
The kernel function used here is the radial basis function (RBF).
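As a hedged illustration of RBF-kernel SVR, the sketch below uses scikit-learn (a tool choice not named in the paper); the data and hyperparameter values are arbitrary:

```python
import numpy as np
from sklearn.svm import SVR

# Fit an RBF-kernel SVR on a noisy sine wave. C controls regularisation;
# epsilon is the half-width of the epsilon-insensitive tube (errors
# inside the tube incur no loss).
rng = np.random.default_rng(42)
X = np.linspace(0, 6, 120).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(120)

model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma="scale")
model.fit(X, y)
pred = model.predict(X)
```

Only the points outside the ε-tube become support vectors, which keeps the regression sparse.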

III. PROCEDURAL FRAMEWORK

A. STL-LSTM PREDICTION MODEL
The STL-LSTM is a hybrid algorithm that combines the additive decomposition of STL with LSTM neural networks to improve prediction performance. The specific steps are as follows:
Step 1: Count the metro station ridership during each statistical period to obtain the original series.
Step 2: Set the decomposition period and use the STL to decompose the original series into three sub-series.
Step 3: Divide the data into a training set and a test set, and set the time-step and predict-step before training. The three sub-series obtained in Step 2 are trained with identical LSTM parameters, and their test sets are predicted separately.
Step 4: Sum the prediction results of the sub-series to obtain the prediction of the original data, then calculate R², the mean absolute error (MAE), and the root mean square error (RMSE).
where Q̂ denotes the predicted value, Q is the real value, Q̄ is the mean value of Q, and n is the number of data points in the test set.
Step 5: Adjust the parameters to find the optimal frequency of decomposition and time-step of the LSTM, until the indexes no longer improve appreciably.
Step 6: The model with the best indexes on the test set can then be used to predict future metro ridership.
The flow chart of STL-LSTM is presented in Fig. 2.
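The indexes in Step 4 can be computed, for example, with scikit-learn (an assumed tool choice; the toy values below are not the paper's data):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Toy real ridership Q and predicted ridership Q_hat (thousands of trips).
Q = np.array([160., 172., 155., 168., 180.])       # real values
Q_hat = np.array([158., 170., 160., 165., 178.])   # predicted values

r2 = r2_score(Q, Q_hat)                            # closer to 1 is better
mae = mean_absolute_error(Q, Q_hat)                # mean absolute error
rmse = np.sqrt(mean_squared_error(Q, Q_hat))       # root mean square error
```

Because RMSE squares the errors before averaging, it penalises large deviations more heavily than MAE, which matters when comparing the models in Section IV.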

B. EMD-LSTM PREDICTION MODEL
EMD-LSTM is a similar model that combines EMD with LSTM neural networks; it has been applied to short-term load forecasting, financial time series forecasting, foreign exchange rate forecasting [33]-[35] and so on. First, the original series is decomposed into several sub-series by EMD. Second, an LSTM prediction model is established for each sub-series to obtain its prediction. Finally, the predictions are reconstructed to give the final result.

IV. CASE STUDY

A. STATISTICS
The original series is the daily metro ridership of Fuzhou Metro Line 1 in 2018, calculated from AFC transaction data. The average daily ridership in 2018 was 167,000. Holiday ridership was lower than the annual average during the Spring Festival, Qingming Festival, Dragon Boat Festival and Mid-Autumn Festival, but higher during the other holidays. During the Spring Festival in particular, many metro passengers had not yet returned to the city, and the cold, rainy weather was not conducive to public travel. A typhoon made the ridership on July 11 the lowest of the year, while the Lantern Festival celebration held at Dongjiekou attracted a large number of passengers, producing the highest metro ridership on March 2, 2018. These factors give the dataset many irregular fluctuations. Fig. 3 shows the original series used in this paper. The periodicity of metro ridership is also obvious: the values of the autocorrelation function (ACF) and partial autocorrelation function (PACF) rise sharply when the lag is a multiple of seven, as shown in Fig. 4.
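The weekly periodicity can be checked by computing the sample autocorrelation; the sketch below uses a synthetic daily series with a period-7 pattern standing in for the real Fuzhou data, which is not reproduced here:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of series x at a given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic daily ridership (thousands) with a weekly pattern plus noise.
rng = np.random.default_rng(1)
days = np.arange(365)
ridership = 167 + 20 * np.sin(2 * np.pi * days / 7) + 5 * rng.standard_normal(365)

acf7 = autocorr(ridership, 7)    # peaks at multiples of the weekly period
acf3 = autocorr(ridership, 3)    # off-period lags correlate far less
```

Peaks at lags 7, 14, 21, ... in the ACF are the signature of the weekly pattern that STL's seasonal component is meant to capture.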

B. PROBLEM STATEMENT
This study tries to establish a model which can mitigate the influences of irregular fluctuation and improve the performance of short-term metro ridership prediction.
In the following sections, we first decompose the original series via STL. Second, a single LSTM prediction model, an SVR model, an STL-LSTM hybrid prediction model and an EMD-LSTM model are used to predict the ridership in the test set. Finally, the experimental results are analysed to select the best model.

C. THE FREQUENCY OF DECOMPOSITION
The frequency of decomposition has an impact on the final results. Fig. 5 shows the decomposition results of STL.
The mean relative error (MRE) is used to select the frequency of decomposition: MRE = (1/n) Σ |Q̂_i − Q_i| / Q_i, where Q̂ denotes the predicted value, Q is the real value and n is the number of data points in the test set. Fig. 6 shows that the improvement is not obvious once the frequency of decomposition exceeds 10, so this paper only discusses frequencies in the range 2-10.
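The MRE described above can be computed as, for instance (toy values, not the paper's data):

```python
import numpy as np

def mre(q_true, q_pred):
    """Mean relative error between real values and predictions."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return float(np.mean(np.abs(q_pred - q_true) / q_true))

Q = np.array([160., 172., 155., 168.])       # real values (thousands)
Q_hat = np.array([152., 176., 155., 160.])   # predicted values
err = mre(Q, Q_hat)                          # roughly a 3% average error here
```

Because MRE is scale-free, it allows the accuracy obtained under different decomposition frequencies to be compared directly.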

D. THE DECOMPOSITION RESULT OF EMD
The difference between EMD-LSTM and STL-LSTM is the decomposition method. Fig. 7 shows the data distribution after the original series is decomposed by EMD: six intrinsic mode functions (IMFs) and one residue are obtained.

E. PARAMETERS OF LSTM
All experiments involving the LSTM use the following parameters. The training set consists of 70% of the data and the test set of the remaining 30%. The LSTM has one hidden layer with four neurons. The activation function is tanh, the loss function is the mean squared error, and the optimizer is adaptive moment estimation (Adam). The output layer has one neuron.
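A sketch of this configuration in tf.keras; the paper does not name a framework, so the framework choice and the time-step value below are assumptions:

```python
import numpy as np
import tensorflow as tf

time_step = 7   # illustrative time-step; the paper tunes this in Step 5

# One LSTM hidden layer with 4 neurons and tanh activation,
# one output neuron, MSE loss, Adam optimizer.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(4, activation="tanh", input_shape=(time_step, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# 70/30 chronological split of a toy series, as in the paper.
series = np.sin(np.arange(100) / 5.0)
split = int(0.7 * len(series))
train, test = series[:split], series[split:]
```

The same architecture is fitted separately to each decomposed sub-series in the STL-LSTM and EMD-LSTM pipelines.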

F. PREDICTION RESULTS
The highest accuracy occurs when the frequency of decomposition is set to 2 (see Fig. 6), so the indexes of STL-LSTM in this section are all obtained with the frequency of decomposition set to 2. We choose R² to compare the fitting precision of the different models; the closer the value is to 1, the better the performance. Fig. 8 shows that the single LSTM does not perform well on this dataset; the SVR performs better but underperforms EMD-LSTM, and STL-LSTM achieves the highest rating at every time-step.
As shown in Table 1, the MAE of STL-LSTM is the lowest among the models; its mean value is slightly less than half that of the single LSTM, whereas EMD-LSTM and SVR show only small improvements.
Table 2 shows that the RMSE of STL-LSTM is also the lowest. Interestingly, the MAE of SVR is smaller than that of EMD-LSTM but its RMSE is larger, which means that the deviations of some SVR predictions are much larger than those of EMD-LSTM.

V. CONCLUSION
In this paper, a hybrid algorithm combining the additive decomposition of STL with the LSTM neural network was proposed to improve the performance of short-term metro ridership prediction. STL decomposed the ridership data into seasonal, trend and residual series; an LSTM prediction model was then established for each of the three sub-series; finally, accurate prediction of metro ridership was achieved by summing the individual prediction results.
The study found that the frequency of decomposition influences the final single-step prediction accuracy: generally speaking, a smaller frequency of decomposition achieves higher prediction accuracy. In addition, the indexes of STL-LSTM are much better than those of the other models, which demonstrates the effectiveness of decomposition. Compared with EMD-LSTM, STL-LSTM also improves prediction accuracy on the test set, which means STL is more suitable for decomposing metro ridership.
In the future, we will try different data sources, different LSTM parameters and different models for each decomposed series to explore the generalization and optimization of the algorithm. Applying optimization methods with adaptive step sizes is also a promising direction to further improve forecasting accuracy.