Short-Term Prediction of Available Parking Space Based on Machine Learning Approaches

Reliable short-term prediction of available parking space (APS) is the theoretical basis of parking guidance information systems (PGIS). Based on the intelligent parking system of the Eastern New Town, Yinzhou District, Ningbo, China, this study collected parking availability data from on-street parking areas. The variation characteristics of APS were investigated and analyzed at different spatial-temporal levels. APS prediction models based on the Gradient Boosting Decision Tree (GBDT) and the Wavelet Neural Network (WNN) were then proposed. Furthermore, an improved WNN algorithm combining Wavelet Analysis (WA) decomposition and Particle Swarm Optimization (PSO) was presented: the original time series was decomposed and reconstructed by wavelet analysis, and the WNN algorithm found the optimal initial weights and thresholds through PSO. The results of GBDT (weekday: MSE = 27.37, SMSE = 0, time = 35 min; weekend: MSE = 9.9, SMSE = 0, time = 35 min) and WA-PSO-WNN (weekday: MSE = 14.93, SMSE = 1.88, time = 160.32 s; weekend: MSE = 12.33, SMSE = 10.23, time = 160.95 s) approximated the true values, but the prediction time of GBDT was too long to be applicable to the short-term prediction of APS in this paper. Compared with GBDT, WNN, and PSO-WNN, the WA-PSO-WNN algorithm performed much better: the average differences in MSE between WA-PSO-WNN and GBDT for weekday and weekend data were 45.45% and 58.76%, respectively, indicating that WA-PSO-WNN can increase the prediction accuracy of weekday and weekend data by those margins on average compared with the GBDT model. Finally, the application prospects of short-term APS forecasting in reducing cruising and illegal parking behavior and in adjusting dynamic parking rates were discussed to verify the importance of short-term APS forecasting.


I. INTRODUCTION
The increase of available parking spaces (APS) cannot catch up with the rapid growth of motor vehicles in the central business districts of cities. When most parking spaces are occupied, drivers cruise to search for an available space with low-speed and frequent lane-change behavior. This phenomenon of cruising for parking worsens traffic congestion and pollution in cities. The inefficiency of cruising for parking is largely attributable to the absence of available parking information. In order to guide the cruising and travel behavior of drivers, many intelligent parking management and guidance systems have emerged in recent years. The provision of available parking information is a prerequisite for advanced traveler information systems and smart parking management and guidance systems. Accurate and reliable short-term APS prediction is the most important component of a parking guidance information system (PGIS). If short-term APS information could be accurately forecasted and released in a timely manner, drivers could reduce the time spent searching for vacant parking facilities and effectively reduce the contribution of cruising vehicles to traffic congestion and pollution. In addition, the managers of parking facilities could foresee the performance of parking facilities and intervene in time to improve the management efficiency of parking lots, so as to increase parking turnover, adjust parking fees dynamically and earn more revenue. Moreover, accurate APS predictions, suitable parking location recommendations and reasonable parking allocation advice will be even more demanded in a future autonomous-vehicle environment.
There are two main methods for APS prediction. One is the traditional statistical theory of prediction, such as the autoregressive integrated moving average (ARIMA) model [1], [2], the Markov model [3], [4], the Kalman filter [5] and the multiple regression model [6]. The advantages of these models are simple calculation and fast solution. However, these methods cannot reflect the uncertainty and nonlinearity of the APS time series. They lack the ability of self-adaptation and self-learning, the robustness of the prediction system is not guaranteed, and the prediction accuracy cannot meet the actual requirements.
The other is the combinatorial model based on machine learning approaches. Machine learning approaches mainly include tree-based algorithms, neural-network-based algorithms, ensemble algorithms and so on. The Gradient Boosting Decision Tree (GBDT) is an iterative decision tree algorithm composed of multiple decision trees, in which the conclusions of all trees are added up to produce the final answer. It is among the best-performing traditional machine learning algorithms for fitting real distributions. Li [7] presented an improved gradient-boosted decision tree algorithm based on the Kalman filter to predict the future traffic of mobile base stations in urban areas. Qiu [8] explored the correlation between the pavement temperature of asphalt pavements and meteorological factors and implemented an accurate trend prediction of asphalt pavement temperature based on GBDT.
The neural network model has become a common method in nonlinear prediction. It has the advantages of nonlinearity, strong self-learning and adaptive ability, and robustness, and is generally applied to short time series with nonlinear and complex characteristics. Many scholars have combined nonlinear system theory and optimization algorithms with neural networks to improve prediction accuracy, such as the fuzzy neural network [9], the combination of chaos theory and neural networks [10], and the wavelet neural network [11]. Among these prediction methods, the plain neural network is prone to falling into local optima and producing low-accuracy predictions. On the basis of phase space reconstruction, Chen [12] used the Elman neural network to predict short-term available parking spaces. Vlahogianni et al. [13] predicted the occupancy of regional parking in the next 30 minutes with a multi-layer perceptron (MLP) optimized by a genetic algorithm. Liu [14] proposed an APS prediction algorithm based on an LSTM model combined with a convolutional neural network (CNN).
Many scholars have improved neural network prediction methods to increase model efficiency. Ebtehaj and Bonakdari [15] proposed a method combining the particle swarm optimization (PSO) algorithm with an adaptive neuro-fuzzy inference system (ANFIS) to estimate the minimum densimetric Froude number required for sediment transport without solid substance deposition in channel pipes. Huang et al. [16] proposed a maximum likelihood estimation method based on particle swarm optimization for the generalized Pareto model to detect outliers of time series, called the Generalized Pareto Model Based on Particle Swarm Optimization (GPMPSO).
In this paper, a combined prediction model of the wavelet neural network (WNN) and wavelet analysis (WA) is proposed: WA has a good ability to extract the local information of a time series, and the global search capability of the particle swarm optimization (PSO) algorithm can greatly improve the prediction stability.
The main structure of the article is organized as follows. The Gradient Boosting Decision Tree (GBDT) model and the wavelet neural network (WNN) model were used to predict the APS in the parking lots respectively. Furthermore, the wavelet neural network algorithm was modified: the initial time series was decomposed and reconstructed by the wavelet analysis (WA) method of nonlinear system theory; then the decomposed sequences were taken as input information, the prediction was carried out by the particle-swarm-optimized wavelet neural network (PSO-WNN), and the predicted results were added linearly to obtain the final prediction. This paper compared the prediction results of GBDT, the wavelet neural network and the improved wavelet neural network to verify the prediction advantages of the combined model of wavelet analysis and the particle-swarm-optimized wavelet neural network (WA-PSO-WNN).
The study aimed to develop short-term dynamic prediction models for the available parking space using the machine learning approaches with large-scale parking data. The information of APS is important because it provides useful information to identify the vacant parking spaces, decrease the cruising behaviors and traffic congestion and rebalance the supply and demand of parking. The paper has two contributions: (a) developed the short-term prediction model for the APS using the machine learning approaches with large-scale parking data; (b) revealed the spatial-temporal characteristics of the available parking spaces in the central city. The results of this study provide useful parking guidance information to develop effective and timely rebalance strategies to increase the operational efficiency of the parking infrastructure and systems.

II. CHARACTERISTICS OF APS
The APS data were taken from the Eastern New Town, Yinzhou District, Ningbo, Zhejiang Province, China, which is the new economic and political center of Ningbo city. Parking demand is clearly growing in the Eastern New Town. In general, the predictive performance of the models increases with the aggregation time interval. Data aggregated at a 5-min interval contain greater data noise and more useless fluctuation information than data at longer intervals, and data aggregated at shorter time intervals are more difficult to predict. Previous studies on short-term traffic flow forecasting have found similar results: the prediction accuracy for traffic flow data aggregated at a shorter time interval is worse than at a longer time interval. Moreover, a very short time interval is of little use for parking guidance and travel planning. On the contrary, a longer time interval (like 30-min) filters out the fluctuation information of the data and is too long to be suitable for parking planning in daily trips. Therefore, a 15-minute APS forecast interval is more appropriate for drivers' travel plans and more flexible for parking management.
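The 15-minute aggregation chosen above can be sketched in a few lines. The observation values below are hypothetical, and the function simply averages consecutive fixed-size bins (three 5-min readings per 15-min bin):

```python
def aggregate_aps(values, per_bin):
    """Average consecutive observations into bins of per_bin readings each.
    Trailing observations that do not fill a whole bin are dropped."""
    return [sum(values[i:i + per_bin]) / per_bin
            for i in range(0, len(values) - per_bin + 1, per_bin)]

# Hypothetical APS counts logged every 5 minutes over half an hour.
raw = [120, 118, 115, 110, 108, 105]
agg = aggregate_aps(raw, 3)   # two 15-minute averages
```

In practice the raw records carry timestamps, so a time-indexed resampling (e.g. with pandas) would replace this fixed-bin average.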

A. LOCATION AND TIME-VARYING CHARACTERISTICS OF PARKING AREAS
As shown in Fig.1, the scope of our study was mainly divided into the following three parts, the area of Dongfangyipin, Financial Silicon Valley and The Central Area. The land types included residential, administrative office, business, entertainment, education and so on.
According to geographical locations, the research area was divided into three zones, and the geographical location was input into the models as a spatial feature. Dongfangyipin, Financial Silicon Valley and The Central Area were numbered NO.1, NO.2 and NO.3. The APS data for all the on-street parking areas were collected from April 1st to April 30th in the smart parking system of the Eastern New Town. The APS data included the parking location, parking duration, parking fee, the start and end time of each parking event, and each parking space ID. Based on the original data, the parking characteristics of the parking areas were examined, taking the situation on April 30th as an example. Fig. 2 to 4 showed the variation of the arrival rate, departure rate, occupancy rate and turnover rate of the parking areas in The Central Area, Dongfangyipin and Financial Silicon Valley respectively. The arrival rate peaked around 8:00, while the departure rate peaked around 18:00. The occupancy rate of the parking areas remained high until 18:00 and then showed a downward trend, and the parking occupancy rate of The Central Area was the lowest among the three areas. The changing characteristics of the turnover rate had a trend similar to the occupancy rate. Determining the degree of stationarity of a given time series is an important task of time series analysis: the mean and variance of a stationary time series remain constant in time. Fig. 5 exhibited the time series' raw values and moving average. It could be seen that the daily change of APS was regular, and the number of APS on weekends was slightly larger than on working days, as shown in Fig. 6 and 7. The parking occupancy on weekends was lower, which was consistent with the parking characteristics of Financial Silicon Valley as a business district.
The APS variations were relatively stable and moderate, as can be seen from the moving average curve. To measure the degree of stationarity of the time series, this paper used the Augmented Dickey-Fuller (ADF) test [17]. The p-value and ADF statistic of the test applied to the time series were given in Table 1. Here, the p-value was 0.00, and the ADF statistic was smaller than all the critical values. Therefore, the null hypothesis (H0) could be rejected and the time series was stationary. The APS time series could thus be further predicted.
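As a rough illustration of what the stationarity requirement means (roughly constant mean and variance over time), the sketch below splits a series into segments and checks that segment statistics stay within a tolerance. This is only an informal proxy for the formal ADF test used in the paper, and the segment count and tolerance are arbitrary choices:

```python
import statistics

def is_roughly_stationary(series, n_splits=3, tol=0.2):
    """Crude stationarity screen: split the series into n_splits segments and
    check that segment means and standard deviations stay within a relative
    tolerance. (The paper uses the formal ADF test; this is only a proxy.)"""
    k = len(series) // n_splits
    segments = [series[i * k:(i + 1) * k] for i in range(n_splits)]
    means = [statistics.mean(s) for s in segments]
    stds = [statistics.pstdev(s) for s in segments]
    mean_ok = (max(means) - min(means)) <= tol * abs(statistics.mean(series))
    std_ok = (max(stds) - min(stds)) <= tol * (statistics.pstdev(series) + 1e-12)
    return mean_ok and std_ok
```

A flat, regularly oscillating APS series passes this screen, while a strongly trending one fails it; a real analysis would use an ADF implementation such as statsmodels' adfuller.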
According to the stationarity test result, the self-learning and adaptive functions of GBDT and the neural network can learn the short-term changing characteristics well, and the time-varying characteristics of APS are obviously nonlinear. The shorter the time interval, the greater the fluctuation. Although the APS time series was generally stable, parts of the data fluctuated. Wavelet decomposition can shield the interference signals contained in the high-frequency components: it is an effective tool for preprocessing local fluctuations in time series and can be used to highlight certain features of the samples. Because of its advantage of local analysis, it can also accurately measure the local information of a time series.

III. METHODOLOGY

A. GRADIENT BOOSTING DECISION TREE (GBDT)
The gradient boosting decision tree (GBDT) algorithm, integrated by boosting, is an iterative decision tree algorithm. It is an additive model (a linear combination of basis functions), and the reduction of the residuals during training is used to complete the data classification or regression [18].
GBDT has strong advantages in large-sample prediction: it is very flexible in dealing with complex nonlinear relations, and it can simultaneously process different types of data. These characteristics make the GBDT method an appropriate means to predict the future APS of parking lots [19].
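The boosting mechanism described here can be illustrated with a deliberately minimal sketch: squared-error loss, one-dimensional inputs, and depth-1 trees (stumps) instead of full CART trees, so every name and parameter below is illustrative rather than the paper's configuration. Each stump fits the current residuals (the negative gradient for squared loss), and the shrunken sum of stumps forms the additive model:

```python
class Stump:
    """Depth-1 regression tree (decision stump) for 1-D inputs."""
    def fit(self, x, residuals):
        best = (float("inf"), None, 0.0, 0.0)
        for t in sorted(set(x)):
            left = [r for xi, r in zip(x, residuals) if xi <= t]
            right = [r for xi, r in zip(x, residuals) if xi > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, t, lm, rm)
        _, self.thr, self.left_val, self.right_val = best
        return self

    def predict(self, xi):
        return self.left_val if xi <= self.thr else self.right_val


def gbdt_fit_predict(x, y, n_trees=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current
    residuals (the negative gradient), and the prediction is the initial
    mean plus the shrunken sum of all stumps."""
    f0 = sum(y) / len(y)
    trees, preds = [], [f0] * len(x)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = Stump().fit(x, residuals)
        trees.append(stump)
        preds = [pi + lr * stump.predict(xi) for pi, xi in zip(preds, x)]
    return lambda xi: f0 + lr * sum(t.predict(xi) for t in trees)
```

On a toy step function the boosted stumps recover the two output levels after a few dozen rounds; production use would rely on a full GBDT library rather than this sketch.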
The boosting tree algorithm adopted in this study belongs to the boosting family of ensemble learning. Its learning mechanism is to construct M different individual decision trees iteratively: h(x, a_1), ..., h(x, a_M), where a_m denotes the parameters of the mth tree. The model after n trees can be expressed as

f_n(x) = f_{n-1}(x) + b_n h(x, a_n)

where f_{n-1}(x) is the additive model composed of the first (n-1) decision trees and b_n represents the node weight of the nth tree. Assuming that the learner obtained by the iteration of the (n-1)th round is f_{n-1}(x) with loss function L(y, f_{n-1}(x)), the goal of the nth iteration is to find the b_n h(x, a_n) that minimizes the loss L(y, f_{n-1}(x) + b_n h(x, a_n)) for this round.
The regression algorithm of GBDT is summarized as follows. Input: the training samples (x_i, y_i), i = 1, 2, ..., m, and the loss function L. For each iteration round t = 1, 2, ..., T:

1) For each sample i = 1, 2, ..., m, the negative gradient is calculated as

r_ti = -[ ∂L(y_i, f(x_i)) / ∂f(x_i) ]_{f(x) = f_{t-1}(x)}

A CART regression tree is fitted to the pairs (x_i, r_ti) to obtain the tth regression tree, whose leaf node regions are R_tj, j = 1, 2, ..., J, where J is the number of leaf nodes of regression tree t.
2) The optimal fitting value for each leaf region j = 1, 2, ..., J is

c_tj = argmin_c Σ_{x_i ∈ R_tj} L(y_i, f_{t-1}(x_i) + c)

3) Update the learner:

f_t(x) = f_{t-1}(x) + Σ_{j=1}^{J} c_tj I(x ∈ R_tj)

4) The expression of the final strong learner f(x) is

f(x) = f_T(x) = f_0(x) + Σ_{t=1}^{T} Σ_{j=1}^{J} c_tj I(x ∈ R_tj)

B. WAVELET NEURAL NETWORK (WNN)

Artificial neural networks have strong learning and mapping ability, can easily fit arbitrarily complex nonlinear relations, and have the characteristics of parallel operation for information processing and reasoning. They have great potential in solving the control of highly nonlinear and seriously uncertain systems [20]. At present, however, there is no definite theory to guide the design of the network structure, such as the determination of the number of hidden layer nodes and the selection of the input layer, and the randomness of the neural network input affects the prediction results. The wavelet neural network model combines the self-learning ability of the neural network with the advantages of the wavelet transform, so it has strong approximation ability and relatively strong fault tolerance. Meanwhile, the wavelet neural network model has good robustness and convergence, which benefits prediction. The topology of the wavelet neural network (WNN) is similar to the BP neural network, but the transfer function of the hidden layer of the WNN is replaced by a wavelet basis function, and the forward signal propagation is accompanied by the back propagation of the error. By introducing expansion and translation factors into the WNN, the blindness problem of the BP neural network is solved; in addition, the self-learning ability is stronger, the prediction accuracy is higher, the approximation ability of the network is enhanced, and the convergence speed is faster.
The fused WNN replaces the node function of the BP neural network with a wavelet basis function, and replaces the weights and thresholds from the input layer to the hidden layer with the scale and translation factors of the wavelet function. The topological structure of the fused WNN is a three-layer network, with one input layer, one hidden layer and one output layer. The training process of the network is also divided into two parts: the forward transmission of the signal and the back propagation of the error. The difference is that the transfer function of the hidden layer is replaced by the wavelet basis function, and the weights and thresholds are replaced by the scaling and shifting factors of the wavelet [21]. The common excitation functions of the output layer include the sigmoid function and the linear purelin function.
The structure of the WNN is as follows.

1) Forward propagation of the signal. Let x_i (i = 1, 2, ..., I) be the input parameters of the WNN and y_k (k = 1, 2, ..., K) the predicted outputs, with ω_ij and ω_jk the weights of the WNN. The hidden layer output is computed as

v_j = ψ( (Σ_{i=1}^{I} ω_ij x_i − a_j) / b_j ),  j = 1, 2, ..., J

where v_j is the output value of the jth node of the hidden layer, ω_ij is the link weight from the input layer to the hidden layer, a_j is the translation factor and b_j the scaling factor of the wavelet basis function, and ψ(x) is the Morlet wavelet basis function ψ(x) = cos(1.75x) exp(−x²/2). The output layer of the WNN is

y_k = f( Σ_{j=1}^{J} ω_jk v_j ),  k = 1, 2, ..., K

where ω_jk is the weight from the hidden layer to the output layer, J is the number of hidden layer nodes, K is the number of nodes in the output layer, and f is a non-decreasing, nonlinear differentiable function, generally taken as the sigmoid function f(x) = 1/(1 + e^{−x}). The purpose of network learning is to minimize, for each sample, the error

E_p = (1/2) Σ_{k=1}^{m} (y_nk − y_k)²

so that the total network error E = Σ_p E_p is minimized.
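A single forward pass of the three-layer WNN described above can be sketched as follows, with the Morlet basis cos(1.75x)exp(−x²/2) as the hidden activation and a linear (purelin) output layer; the weight and factor values passed in are placeholders, not trained parameters:

```python
import math

def morlet(x):
    """Morlet wavelet basis used as the hidden-layer activation."""
    return math.cos(1.75 * x) * math.exp(-x * x / 2)

def wnn_forward(x, w_in, a, b, w_out):
    """One forward pass of a three-layer WNN.
    w_in[j][i]: input->hidden weights; a[j], b[j]: translation/scaling
    factors of hidden node j; w_out[k][j]: hidden->output weights.
    Output layer is linear (purelin) for regression."""
    hidden = [morlet((sum(w_in[j][i] * x[i] for i in range(len(x))) - a[j]) / b[j])
              for j in range(len(a))]
    return [sum(w_out[k][j] * hidden[j] for j in range(len(hidden)))
            for k in range(len(w_out))]
```

Training would back-propagate the error E through ω_jk, ω_ij, a_j and b_j by gradient descent; only the forward pass is shown here.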
where m is the number of output nodes, and y_nk and y_k are the expected output and actual output of the kth node in the output layer respectively.

2) Error back propagation and weight updates based on the gradient descent method. The WNN modifies the weights of the network by the gradient correction method, making the actual output of the network approximate the expected output.
In this process, the parameters are adjusted repeatedly until the training error condition is satisfied.

C. WAVELET TRANSFORM
The essence of using wavelet analysis to transform a time series is to select an appropriate wavelet function and change the amplitude of its stretching and displacement on the time axis, generating a batch of analysis wavelets. Comparing the time series with the translations of these analysis wavelets on the time axis yields the wavelet coefficients that characterize the similarity between the sequence and the wavelet [22]. The Mallat algorithm is used for wavelet decomposition and reconstruction in this paper, expressed as

c_m = H c_{m−1},  d_m = G c_{m−1},  m = 1, 2, ..., M

where H and G are the low-pass filter and high-pass filter respectively, m is the decomposition scale, and c_0 is the original time series. Through the above formula, the original time series can be decomposed into the high-frequency coefficient vectors d_1, d_2, ..., d_M and the low-frequency coefficient vector c_M. As the observation scale of the wavelet decomposition increases, the number of samples decreases geometrically: the samples in the time series are halved after each decomposition, which is bad for forecasting. However, a reconstruction algorithm can restore the sequence decomposed by the Mallat algorithm:

c_{m−1} = h c_m + g d_m,  m = M, M−1, ..., 1

where h and g are the dual operators of H and G respectively. The reconstructed series can then be written as

c_0 = C_M + D_1 + D_2 + · · · + D_M

where C_M and D_m denote the reconstructed low-frequency and high-frequency components. By this method of wavelet decomposition and reconstruction, the time series is divided into low-frequency and high-frequency parts, and the interference signals are confined to the high-frequency components, which can effectively improve prediction accuracy [23].
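The decompose-then-reconstruct mechanics of the Mallat algorithm can be illustrated with the simplest possible filter pair, the Haar wavelet (the paper itself uses db32, whose filters are much longer). One decomposition level halves the sample count into approximation (low-pass) and detail (high-pass) vectors, and the dual filters restore the original sequence exactly:

```python
import math

def haar_decompose(c):
    """One level of Mallat-style decomposition with the Haar filters:
    the low-pass (approximation) and high-pass (detail) outputs each
    contain half as many samples as the input."""
    s = 1 / math.sqrt(2)
    approx = [s * (c[i] + c[i + 1]) for i in range(0, len(c) - 1, 2)]
    detail = [s * (c[i] - c[i + 1]) for i in range(0, len(c) - 1, 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Inverse step: the dual filters restore the original-length sequence."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out += [s * (a + d), s * (a - d)]
    return out
```

A multi-level db32 transform as used in the paper would typically be obtained from a wavelet library (e.g. PyWavelets' wavedec/waverec) rather than hand-written filters.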
The combination of the wavelet transform and the neural network in which the input of the neural network is preliminarily processed by wavelet analysis is called a loose wavelet neural network; the preprocessing makes the input signal more conducive to neural network processing. In this paper, WNN refers to the fused wavelet neural network, which fuses the neural network and the wavelet directly, replaces the neuron activation with a wavelet element, and determines the weights and thresholds from the input layer to the hidden layer by the scale and shift parameters of the wavelet function.

D. PARTICLE SWARM OPTIMIZATION ALGORITHM
The initial weights and thresholds (w_ij and w_jk) of the WNN have an important impact on the predictive performance and the stability of the model, but they are randomly assigned. In order to overcome this drawback of the WNN, PSO uses its global search ability to determine the optimal initial weights and thresholds (w_ij and w_jk) in the WNN algorithm.
Particle swarm optimization (PSO) is an optimization algorithm based on swarm intelligence theory. Each member of the group represents a feasible solution, and the location of the food is considered to be the global optimal solution. Each member learns from the personal best position (p-best) and the global best position (g-best) in the population, and finally approaches the position of the global optimal solution [24]. In the mathematical expression of the PSO algorithm, let X_i = (x_i1, x_i2, ..., x_in)^T represent the position of the ith particle in the n-dimensional optimization space, V_i = (v_i1, v_i2, ..., v_in)^T the velocity of particle i, P_i = (p_i1, p_i2, ..., p_in)^T the p-best, and P_g = (p_g1, p_g2, ..., p_gn)^T the g-best.
The particle updates its own velocity and position through p-best and g-best:

v_id(t+1) = w v_id(t) + c_1 r_1 (p_id − x_id(t)) + c_2 r_2 (p_gd − x_id(t))   (18)

x_id(t+1) = x_id(t) + v_id(t+1)   (19)

with the linearly decreasing inertia weight

w = w_max − (w_max − w_min) t / T

where i = 1, 2, ..., N; w_max = 0.9 and w_min = 0.4; t is the current iteration number of the algorithm and T is the total iteration number; r_1 and r_2 are random numbers between 0 and 1; and c_1, c_2 are the constant coefficients that adjust the learning rate of the algorithm. In addition, in order to prevent the particle's speed from becoming too large, an upper limit V_max is set. The first term on the right in (18) represents the previous velocity state of the particle. The second is the ''cognition'' term, which means that the particle searches based on its own experience. The third is the ''social'' term, which means that particles conduct collaborative searches based on their own and other particles' experiences [25].
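The update rules (18) and (19), together with the linearly decaying inertia weight, can be collected into a minimal PSO minimizer; the population size, iteration count, bounds and velocity cap below are arbitrary illustrative defaults, not the paper's settings:

```python
import random

def pso(fitness, dim, n_particles=30, iters=100,
        w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, v_max=1.0, bounds=(-5.0, 5.0)):
    """Minimal PSO minimizer implementing updates (18)-(19): the inertia
    weight w decays linearly from w_max to w_min over the iterations,
    and velocities are clipped to [-v_max, v_max]."""
    lo, hi = bounds
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    P = [x[:] for x in X]                              # personal bests (p-best)
    p_fit = [fitness(x) for x in X]
    g = P[min(range(n_particles), key=lambda i: p_fit[i])][:]   # global best
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (g[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, V[i][d]))
                X[i][d] += V[i][d]
            f = fitness(X[i])
            if f < p_fit[i]:
                P[i], p_fit[i] = X[i][:], f
                if f < fitness(g):
                    g = X[i][:]
    return g
```

For the WA-PSO-WNN model, the fitness function would be the WNN's sum of squared prediction errors as a function of the flattened initial weights and thresholds.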
The process of PSO is as follows. Step 1: Initialize the learning parameters, the maximum number of iterations, and the limits of the position and speed values. Generate a random particle population X(t) in the feasible domain of the search space, and generate particles within the feasible velocity range to form the initial velocity matrix V(t).
Step 2: Determine the evaluation function of the particles. Each particle in the particle swarm calculates its corresponding fitness value through fitness function.
Step 3: Update p-best and its fitness information. By comparing p-best with the fitness value of the particle, assign the better location information to p-best and take its fitness value as the fitness at p-best. Similarly, the g-best value and its fitness information are updated.
Step 4: According to (18) and (19), the velocities and positions of the particles are updated to form a new generation of the population X(t+1).

E. ALGORITHM DESCRIPTION
The process of WA-PSO-WNN prediction is shown in Fig. 9. Step 1: Collect the APS data and produce the time series C. The data were normalized by the formula x_i' = (x_i − x_min)/(x_max − x_min); the predicted results are finally de-normalized.
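The min-max normalization of Step 1 and its inverse can be written compactly; the closure returned alongside the scaled values is a convenience for de-normalizing predictions later:

```python
def normalize(series):
    """Min-max scaling x' = (x - x_min)/(x_max - x_min), plus the inverse
    function used to de-normalize predictions back to APS counts."""
    x_min, x_max = min(series), max(series)
    scaled = [(x - x_min) / (x_max - x_min) for x in series]
    denorm = lambda v: v * (x_max - x_min) + x_min
    return scaled, denorm
```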
Step 2: Using the ''db32'' function as the wavelet basis, the original effective APS time series is decomposed into N layers, producing N + 1 sub-time series: the high-frequency interference time series D_1, ..., D_N reflecting the uncertainty, and the low-frequency determination time series C_N reflecting the essential change trend.
Step 3: The decomposed time series are taken as the input samples of the wavelet neural network, and the sum of squared prediction errors serves as the fitness function of the improved particle swarm optimization algorithm. The PSO algorithm compares the current value of each particle with its local optimal value, and the local optimal value of each particle with the global optimal value, iteratively updating the global optimal fitness, and finally outputs the optimal initial weights and thresholds. Step 4: The weights and thresholds are fed back to the WNN for prediction, and the N + 1 prediction results are superimposed linearly to obtain the final prediction result of APS.

A. PARAMETER SETTING
The algorithm procedure was written in the Matlab language. The experimental data were the APS in the Eastern New Town from April 1st to April 29th. The inputs of the neural network were the numbers of APS at times t, t−15 min, t−30 min, t−45 min, t−60 min, t−75 min and t−90 min, plus date characteristics and location characteristics. The output node was the APS prediction for the next 15 min. The APS data of the past 27 days (April 1st to 27th) were used as training samples, and the APS data from April 28th to May 10th were the testing samples. The predictive results on a weekday (April 29th) and a weekend day (April 28th) were chosen as the examples.
The ''db32'' function was used as the wavelet basis function to decompose the original effective APS time series into three layers and obtain four APS sub-time series: the high-frequency interference time series D_1, D_2, D_3 reflecting the uncertainty factors, and the low-frequency determination time series C_4 reflecting the essential change trend. The results were shown in Fig. 10. In order to optimize the model structure and increase predictive performance, a sensitivity analysis was conducted to adjust the parameters of GBDT, WNN, PSO-WNN, and WA-PSO-WNN, including the number of training epochs, batch size, number of nodes, dropout rate, the network weights and thresholds, and the number of iterations. The numbers of input nodes of the GBDT and WNN were both 8, including the APS of the past seven time periods and the location number. The number of output nodes was 1, which was the APS of the current time. The learning rate of the GBDT model was 0.1 and the number of iterations was 500. The number of hidden layer nodes of the WNN was 8, the learning rate was 0.01, and the number of iterations was 500. The initial network weights and thresholds were the output results of the PSO. The convergence error of 0.01 was taken as the optimum fitness of the particle swarm. The population size of the PSO was set to 200.

B. ANALYSIS OF PREDICTION RESULTS
According to the above parameters, the prediction algorithms of GBDT, WNN, PSO-WNN and WA-PSO-WNN were each run 5 times, and the average of the 5 predictions was taken as the final result. The predicted results and errors on weekday and weekend were shown in Fig. 11 to 14 respectively.
For illustrative purposes, Tables 2 and 3 were developed to demonstrate the predicted results of the proposed models and the actual values for the peak and off-peak periods on the weekday. Comparing the prediction results on weekday and weekend, we could see from Fig. 11 to 14 that the results for the weekday were better than for the weekend, because the training samples of weekdays were significantly more numerous than those of weekends in April. This was a limitation of the forecasts: the reduction of training samples affected the model's comprehensiveness of feature capture to a certain extent.
Comparing the prediction results in three areas, we could find that the prediction errors of the three areas were basically similar. The generation of prediction error was mainly related to the stability of the time series itself. The more stable the time series is, the better the prediction result will be.
Comparing the prediction results of the different methods, the prediction errors of the GBDT and WA-PSO-WNN models were both within 0.10. The ensemble learning algorithm was superior to the wavelet neural network in the feature capture capability for time series, while the improved wavelet neural network improved this capability. Furthermore, after the WNN algorithm was improved by the PSO algorithm, the prediction result became much more stable and was less likely to fall into local optima. However, this algorithm suffered from unsatisfactory time consumption. PSO-WNN uses PSO to help the WNN find the optimal initial parameters, but the interference caused by the instability of the original sequence was not removed. The WA decomposition algorithm was therefore used to optimize the input information of the network. On the basis of the PSO-WNN algorithm, the predicted result was further improved by extracting the local information of the initial sequence, approximating the expected output better when the original sequence fluctuated severely. The WA decomposition solved the problem that parts of the original sequence were non-stationary, and led to a better approximation of the actual values.
Between 6:00 am and 8:00 am, there was a huge increase in the demand for parking space, while during the off-peak periods the demand changed much more slowly. Comparing the prediction results on the weekday between peak and off-peak periods, the errors in the peak periods were greater than in the off-peak periods: the APS in the morning peak was volatile, which caused lower accuracy. Fig. 15 showed the variation of the training error before and after PSO joined the network, and how the fitness of PSO varied with the evolutionary generation. It can be seen that when the wavelet neural network alone was used for iterative training, the training error no longer decreased once the number of iterations exceeded 350, and the algorithm fell into a local optimum; its convergence error was greater than that of the algorithm with PSO added. Obviously, the global search capability of PSO helped decrease the convergence error of the neural network and accelerate the convergence speed. Meanwhile, PSO could also prevent the network training from falling into local optima to some extent and reduce the training error. Figure 16 showed the fitness of the PSO algorithm. It can be seen that the training error of the whole network reached the preset value of 0.01, and the network had good approximation ability. To better test the relative predictive performance of the proposed methods, the datasets were divided into a training set and a test set. The training dataset was applied for model calibration, and the test dataset for the evaluation of model performance. Finally, the results of the models were evaluated in terms of predicted mean square error (MSE), time consumption and stability, as shown in Tables 4 and 5.
The stability of the model is measured by the standard deviation of the MSE over five repeated predictions:

SMSE = \sqrt{\frac{1}{5}\sum_{i=1}^{5}\left(MSE_i - \overline{MSE}\right)^2}

where \overline{MSE} is the average of the mean square errors of the five predictions, and MSE_i is the mean square error of the i-th prediction. The prediction results showed that: 1) Comparison of prediction accuracy among different models. Tables 4 and 5 compare the predictive performance of GBDT, WNN, PSO-WNN, and WA-PSO-WNN on the validation samples of the weekday and weekend data. The MSE was used to measure predictive performance. The results show that WA-PSO-WNN produces significantly higher prediction accuracy than the other methods. Taking GBDT as an example, the average differences in MSE between WA-PSO-WNN and GBDT for the weekday and weekend data are 45.45% and 58.76%, respectively, indicating that WA-PSO-WNN can increase the prediction accuracy of the weekday and weekend data by an average of 45.45% and 58.76% compared with the GBDT model. The predictive performance of WA-PSO-WNN is also better than that of the other machine learning models: WA-PSO-WNN increases the prediction accuracy of the weekday and weekend data by an average of 80.86% and 71.37% compared with WNN. The predicted result of WNN is thus far inferior to that of GBDT; compared with WNN, GBDT has better APS prediction accuracy, as the GBDT model has a stronger ability to capture features than the single-layer WNN. We further improved the defects of WNN. The PSO-WNN algorithm used global optimization to determine the initial weights and thresholds, which greatly improved the generalization ability of the model. On the basis of PSO-WNN, the WA transform could decompose and reconstruct the high-frequency and low-frequency components of the initial input, giving better approximation ability when the non-stationary series fluctuated greatly.
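The stability metric (the standard deviation of the MSE across the five repeated predictions) is a one-liner to compute; the five MSE values below are hypothetical, chosen only to illustrate the calculation.

```python
import math

def smse(mse_runs):
    """Standard deviation of per-run MSE values over repeated training runs
    (five runs in the paper's setup)."""
    mean_mse = sum(mse_runs) / len(mse_runs)
    return math.sqrt(sum((m - mean_mse) ** 2 for m in mse_runs) / len(mse_runs))

# Hypothetical MSEs from five independent trainings of one model
stability = smse([14.1, 15.0, 14.8, 15.3, 15.45])
```

A model whose five runs scatter widely around their mean gets a large SMSE (unstable), while a deterministic model such as GBDT repeats the same MSE every run and scores SMSE = 0.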
In general, WA-PSO-WNN improves the prediction accuracy of WNN by about 3-5 times, which is comparable to the GBDT algorithm. In summary, the reason for this result is that the parking availability data aggregated at the 30-min interval contain noise and considerable fluctuation, and the WA transform could decompose and reconstruct the high-frequency and low-frequency components of the initial input to reduce the fluctuation of the data.
2) Comparison of predictive stability among different models. The predictive stability of GBDT was excellent. Compared with the WNN model, the standard deviations of the prediction errors of the PSO-WNN and WA-PSO-WNN models were smaller. The PSO algorithm could keep the prediction accuracy stable under data fluctuation, improving the stability and enhancing the reliability of APS prediction.
3) Comparison of prediction time among different models. The training time of the GBDT model was much longer than that of the neural networks; the prediction time of GBDT was too long to be applicable to the short-term prediction of APS in this paper. The time consumption of WNN was the shortest. The PSO-WNN and WA-PSO-WNN algorithms increased the training time of the model: while the global search capability of PSO improved the prediction accuracy and stability of the model, it inevitably led to a longer training time compared with the WNN model.

V. THE APPLICATION OF APS PREDICTION
APS prediction in advance is very important for active real-time parking guidance and management.
1) Reduce the behavior of cruising for parking. When drivers arrive at their destinations, they search for parking spaces near the destinations at a low speed. If they are not clear about the occupancy of the parking area, the drivers may fall into the behavior of cruising for parking, which also leads to traffic congestion, energy consumption, and environmental burden. If we evaluate the parking space in advance and make a short-term prediction of APS near the destinations, it would greatly improve the parking efficiency of the parkers and provide parking guidance for them to choose the optimal parking space.
2) Reduce the occurrence of illegal parking. Illegal parking behavior is one of the major factors affecting traffic operation and congestion, and it is also a serious parking management problem. Short-term prediction of parking space provides the vacant parking spaces to the parkers in a timely manner, so that they can understand the status of the parking area in advance and make a correct parking plan.
3) Dynamic parking rates. Dynamic parking rates could be applied to control the parking demand in each parking area when the short-term APS can be predicted accurately. Previous studies found that the APS time series was more stable at the 15-minute interval, which matches the time-series interval studied in this paper; therefore, 15 minutes is taken as a pricing period. Counting the number of APS, parking rates, and other data in this period provides the basic data for the price adjustment in the next 15 minutes. Managers adjust parking rates by forecasting the future APS: when the parking occupancy rate is high, the parking fee increases, and when the parking occupancy rate is low, the parking fee decreases. The change of parking rates is one of the main factors influencing parkers' parking choices. As a result, this can keep the balance between the supply and demand of parking areas, not only realizing the effective utilization of idle parking spaces but also relieving the tension in highly saturated parking areas.
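The pricing rule described above can be sketched as a simple occupancy-triggered update applied once per 15-minute period. All thresholds, the step size, and the rate bounds below are illustrative assumptions, not values from the paper; a real scheme would be calibrated to local demand elasticity.

```python
def adjust_rate(current_rate, predicted_occupancy,
                target_low=0.60, target_high=0.85, step=0.5,
                min_rate=1.0, max_rate=10.0):
    """Rule-of-thumb dynamic rate update from the predicted occupancy
    of the next 15-minute period (all parameter values are assumptions)."""
    if predicted_occupancy > target_high:    # area nearly saturated: raise the fee
        current_rate += step
    elif predicted_occupancy < target_low:   # ample vacancy: lower the fee
        current_rate -= step
    return max(min_rate, min(max_rate, current_rate))

# Example: the APS forecast implies 92% occupancy for the next period
new_rate = adjust_rate(current_rate=4.0, predicted_occupancy=0.92)
```

Clamping the rate to a band and moving it in small steps keeps prices predictable for parkers while still steering demand away from saturated areas.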

VI. CONCLUSION
The short-term forecasting methods of available parking space were proposed based on the GBDT model and the WA-PSO-WNN combined model in this paper. The prediction time of GBDT was too long to be applicable to the short-term prediction of APS in this paper, so the WNN model was further improved by the wavelet transform and the PSO algorithm. The PSO algorithm can find the globally optimal initial weights and thresholds and greatly improve the robustness of the model. On the basis of wavelet analysis theory, the time series was divided into a low-frequency part and a high-frequency part by wavelet decomposition and reconstruction, and the interference signal in the time series was confined to the high-frequency signal, which provided a powerful condition for the model to achieve high precision. The establishment of the combined wavelet neural network model with particle swarm optimization and wavelet transform (WA-PSO-WNN) finally accomplished the short-term prediction of APS. The results showed that, compared with GBDT, the wavelet neural network (WNN), and the particle swarm optimization wavelet neural network (PSO-WNN), the WA-PSO-WNN algorithm greatly improved the prediction stability and accuracy. However, the PSO-WNN and WA-PSO-WNN algorithms have limitations: both suffered from unsatisfactory time consumption. PSO-WNN used PSO to help WNN find the optimal initial parameters, but the interference caused by the non-stationarity of the original sequence was not addressed. The WA-PSO-WNN algorithm applied WA decomposition to optimize the input information of the network; although the non-stationarity of the original sequence could thus be handled by the WA-PSO-WNN algorithm, the time consumption became longer.
Furthermore, the application prospects of short-term APS forecasting were also discussed in terms of reducing cruising-for-parking behavior, reducing illegal parking behavior, and adjusting dynamic parking rates, to verify the importance of short-term APS forecasting.
JINFEN WANG was born in Shaoxing, Zhejiang, China, in 1995. She received the bachelor's degree in traffic engineering from the Ningbo University of Engineering, in 2018. Since 2018, she has been studying shipping and ocean engineering at Ningbo University. Her research interests include parking lot planning and management, advanced traffic information systems, intelligent traffic systems, and data processing.
TAO WANG received the bachelor's degree in traffic engineering from the Guilin University of Electronic Science and Technology, in 2007, and the master's degree in traffic planning and management and the Ph.D. degree in traffic engineering from Southeast University, in 2010 and 2017, respectively.
From 2010 to 2019, he was a Teacher with the School of Architecture and Transportation Engineering, Guilin University of Electronic Science and Technology. His research interests include traffic behavior and safety, urban traffic planning and design, traffic data analysis, and intelligent transportation.
Dr. Wang received awards and honors, including the Guangxi Science and Technology Progress Award, selection among the 1000 young and middle-aged backbone teachers in Guangxi colleges and universities, and so on.
XINGCHEN YAN received the bachelor's degree in transportation from Nanjing Forestry University, in 2008, the master's degree in transport planning and management from Southeast University, in 2009, and the Ph.D. degree in forestry economic management from Nanjing Forestry University, in 2012.
From 2004 to 2019, he was a Teacher with the College of Mechanical and Electrical Engineering and the College of Automobile and Transportation Engineering, Nanjing Forestry University. His research interests include transportation planning, intelligent transportation, and complex network modeling.
QIMING YE was born in Ningbo, Zhejiang, China, in 1996. He received the bachelor's degree in traffic engineering from the Ningbo Institute of Engineering, in 2019. He is currently pursuing the master's degree with Ningbo University. His research interests include the demand for online car-hailing services, big data technology, and so on.
JUN CHEN received the bachelor's degree in electronics and mechanics from the Xi'an University of Electronic Science and Technology, in 1995, and the master's degree in automatic control theory and application and the Ph.D. degree in transportation planning and management from Southeast University, in 1998 and 2000, respectively.
From 2000 to 2019, he was a Teacher with the School of Communications, Southeast University. His research interests include urban comprehensive transportation planning and management, urban parking facilities planning and management, urban public transport planning, intelligent transportation management and control, and so on.
Dr. Chen received awards and honors, including the Ministry of Education's Supporting Plan for Excellent Talents in the New Century, the Ministry of Transportation's Young Transportation Science and Technology Talents title, and the Jiangsu's 333 Talents Project Supporting Plan.