Traffic State Estimation of Bus Line With Sparse Sampled Data

The traffic state of a bus line is the information basis on which a bus company dispatches buses and predicts travel times. However, bus GPS data cover the traffic state only sparsely in time and space because of the long sampling interval and the low bus departure frequency. Because they ignore this severe sparseness, existing traffic state estimation methods cannot reconstruct the traffic state accurately. To deal with this problem, a new traffic state estimation method for the bus line, named GAN_BS, is proposed. First, an improved generative adversarial network (GAN-I) is used to generate reasonable bus data. GAN-I aims to find the probability space of the data distribution under sparse sampling. To reduce the size of the latent space of the data, traffic knowledge is introduced as prior information layers. Then, a traffic adaptive bilateral smoothing method (BS) is used to map discrete bus data into the continuous traffic state. The BS convolves the data with a bilateral kernel, which multiplies the local action kernel by a mask of traffic state similarity. Therefore, the BS can maintain transitions between different traffic patterns while separating noise from the traffic state. Finally, a set of numerical experiments is performed on a real bus data set from Changchun. The results show that GAN-I can accurately reproduce the traffic state when the missing rate of the data exceeds 50%, and that the BS eliminates noise better than other methods.


I. INTRODUCTION
Reasonable real-time bus dispatch and reliable bus travel time prediction are important means of improving the passenger travel experience, and the traffic state of bus lines is the calculation basis for these tasks. However, due to the low frequency of bus departures and data uploads, the sampling rate of bus GPS data is lower than that of social vehicles. The traffic state sampled from the bus data is therefore missing over large areas, and it is difficult to infer the real traffic state from the incomplete bus data. Estimating the traffic state of bus lines is a challenging task for bus companies.
In general, the bus departure interval is relatively large (15 minutes in normal situations) and the GPS data sampling interval is generally about 30 seconds. As a result, the missing rate of bus data in time and space is more than 50%. Estimating the traffic state from such extremely sparse data is more uncertain than estimating it from data with a low missing rate.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohamad Afendee Mohamed.
As shown in Fig. 1(a), the social vehicles (driving vehicles other than buses, such as private cars, taxis, etc.) on the road are denser than buses. They can provide more samples than buses can. It is easy to reconstruct the traffic state when the traffic state is effectively sampled by a valid number of points. However, as shown in Fig. 1(b), there are only two buses on the road. The buses have sampled two traffic state patterns respectively. The distance between the sampling locations is large. It is difficult to infer the transition position of two traffic patterns. The goal of this paper is to generate reasonable bus data at the missing location as shown by the dotted boxes in Fig. 1(c). Then, the continuous and smooth traffic state can be reconstructed based on all the bus data shown in Fig. 1(c).
Many methods have been presented to estimate the traffic state (e.g., Kriging [1], ASM [2], GAN [3]). One of the popular filtering methods is ASM, which can eliminate the noise in the data and reconstruct a smooth traffic state. However, ASM requires complete data; otherwise, it has to borrow irrelevant data from remote locations. Therefore, the sparse bus data need to be imputed in advance. Several statistical learning methods have shown a good ability to deal with data that are missing at a low rate, such as KNN [4], tensor decomposition [5], and LSTM [6]. But when large areas of data are missing, the possibility space of the unobserved locations is large, and these methods cannot find the optimal solution because of their limited generalization ability. Recently, GAN [7]-[9] has been widely used in the traffic imputation field. Owing to the discriminator, GAN has an outstanding ability to generate data that are as realistic as possible. However, to the best of the authors' knowledge, GAN-based studies that consider the high missing rate of bus data are very limited. Generating complete bus data for traffic state estimation is still a problem to be solved.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To deal with the problem, a new traffic state estimation method for the bus line, named GAN_BS, is proposed in this paper. The model includes two parts: data generation and traffic state reconstruction. In the data generation part, a generative adversarial network is used to generate dense data under severely sparse sampling. This network learns the probability distribution of the traffic state to infer the real data. To reduce the uncertainty of distribution fitting, a conditional reasoning layer packed with traffic knowledge is introduced. In the traffic state reconstruction part, a bilateral smoothing method is proposed to convert discrete bus data points into the traffic state of the bus line. This method employs a traffic-adaptive convolution to eliminate the noise of the bus data via a localized kernel. To prevent incorrect estimation of the edges of traffic patterns, the smoothing method introduces an additional kernel, which limits the action scope of the localized kernel where the traffic state changes.
The main contributions of this paper can be summarized as follows:
1) An improved generative adversarial network is proposed to impute the sparse bus GPS data. This deep learning network can learn the spatial-temporal probability distribution of the traffic state from historical data. Prior knowledge of traffic flow is introduced so that the data generated by the network conform to the actual traffic state. The valid data generated by the model provide a data basis for traffic state estimation.
2) A traffic adaptive bilateral smoothing method is introduced to reconstruct the continuous traffic state from discrete bus data. This method considers the influence of intersections and bus stops on the traffic state and designs a bilateral kernel. This kernel not only eliminates the noise caused by driving characteristics but also retains the edges where the traffic state switches. The model can effectively express the traffic state of the bus line and provide information for bus management.
3) Several popular traffic state estimation methods are evaluated on real-world data with a missing rate of more than 50%. The comparison results provide a comprehensive reference for related research.
4) Although there are many traffic state estimation studies based on multi-source data, data sharing has not been completely realized in actual operations. GAN_BS can directly serve bus companies that have only a single data source.
The rest of this paper is organized as follows. Section II reviews the studies on traffic state estimation. Section III presents a traffic state estimation method for bus lines under the condition that the bus GPS data are pathologically sparsely sampled. Section IV discusses the experimental results. Finally, Section V concludes the paper.

II. LITERATURE REVIEW
As discussed earlier, traffic state estimation is vital for both traffic managers and passengers. To estimate the traffic state of bus lines, we first need to generate dense sampling points from sparse data. The continuous traffic state is then restored from the discrete data points. Therefore, this section discusses related research in two parts: traffic state estimation and missing data imputation.

A. TRAFFIC STATE ESTIMATION
There are many studies to estimate the traffic state of the bus line based on different state metrics (such as speed, travel time, etc.). From the perspective of model types, these methods can be divided into two categories: regression methods and filtering methods.
The regression methods [10], [11] mainly estimate the traffic state of the bus line by fitting the functional relationship between some traffic variables and the state variables. For example, Yu et al. [12] proposed the relevance vector machine to estimate the probabilistic bus headway by considering the travel time of several vehicles ahead and other factors. Yu et al. [13] used the support vector machine to learn the relationship between the estimated bus traffic state and the travel time of the former bus. These methods assume that the traffic state detected by the former bus is similar to the current one. This assumption holds when the data are densely sampled, but it may lead to wrong estimates when the bus departure interval is large.
The filtering methods directly process the sampled traffic state data into continuous traffic state data by considering traffic dynamics. For example, Chen et al. [14] proposed a self-adaptive exponential smoothing method based on the Kalman Filter to predict the link bus travel time. Some studies [15]-[17] improved the accuracy of the Kalman Filter by introducing traffic flow theory. For example, Duert and Yuan [18] derived the traffic control law from the Lighthill-Whitham-Richards model to estimate the traffic state on the road network. These methods estimate the traffic parameters along the trend of the time series of the traffic state. Therefore, they are often used when the series has a single missing point. Some researchers have extended traffic state estimation to consider the spatial-temporal correlation, which improves the robustness of the methods when more data are missing. One of the popular methods is the adaptive smoothing method (ASM) [2]. It is based on the traffic phenomenon that the free flow wave propagates downstream and the congested flow wave propagates upstream. The kernel function of this filtering method [19] is set to capture the spatial-temporal range of traffic patterns. F. Rampe et al. [20] extended the method to traffic state estimation using floating car data and named it ARS. ARS is a local low-pass filter that can eliminate the noise in the data. But when the data in the local computing domain are insufficient, ARS needs to borrow less correlated data from distant locations. Overall, the existing methods for traffic state estimation are based on data with a low missing rate. For highly sparse data (such as bus GPS data), it is inappropriate to use existing methods to estimate the traffic state.

B. MISSING DATA IMPUTATION
Researchers have proposed various methods for missing data imputation and we discuss these methods from two parts: prediction methods and interpolation methods.
The prediction methods usually fill in the missing data according to the relationship in the time series or spatial series of the traffic state. Historical average [21], ARIMA [33], LSTM [6], [23]-[25], and other methods have proved effective in predicting data based on the developing trend of the traffic state, which is analyzed using the observations from the previous steps. However, there are often no observed samples on the bus line because the bus departure interval is large, and the bus data may be missing in large blocks. Although deep learning has shown great performance in dealing with missing data at stationary points for fixed detectors, these methods have poor resistance to the sparseness of bus data.
The interpolation methods regress a model of traffic state relationships by analyzing the spatial-temporal distribution of the traffic. Common interpolation methods for estimating the traffic state include probabilistic principal component analysis (PPCA) [26], [27], tensor decomposition [28]-[31], convolutional neural networks (CNN) [32], [33], auto-encoders [34]-[36], fuzzy neural networks [37], and random forests [38]. Li et al. [26] employed PPCA to estimate the traffic state by extracting the periodic spatial-temporal dependencies in traffic flow. Chen et al. [5] used Bayesian probabilistic matrix factorization to derive missing data based on the similarity of spatial-temporal traffic states. Li et al. [36] used two parallel auto-encoders to capture the spatial-temporal dependencies of the traffic state. These methods can deal with data lost at a low missing rate. The problem can be viewed from a probabilistic perspective: these methods calculate the speed $v_{x,t}$ at time $t$ by calculating the conditional probability $p(v_{x,t} \mid v_m)$, where $v_m$ is a neighboring point in time and space. They select the most likely value in the probability space composed of all $v_n$ and $v_m$. When the number of observations in the neighborhood is too small, the uncertainty of the data increases, and it becomes harder to estimate the missing data accurately.
Recently, generative adversarial networks (GAN) have gained increasing interest in traffic imputation because of their excellent capability in generating data [3], [7], [8], [39]. GAN can calculate the joint distribution $p(v_{x,t}, v_m)$ based on sampling points at surrounding locations without an explicit sequence. The introduction of the discriminator has improved the network's ability to generate data when the data are severely sparse. For example, Xu et al. [9] used the DEEPWALK technique to embed the road network structure in GAN and carried out traffic state estimation based on the correlation between links. Liang et al. [40] used LSTM as the network layer of GAN to capture the correlation of the traffic state in space and time and to estimate the traffic flow and density. However, GAN cannot eliminate the interference of noise in the data and directly give a continuous traffic state.
As mentioned above, the bus data are sparser than social vehicle data. Most of the existing methods struggle to estimate the traffic state when the data are missing in blocks. Some advanced methods, such as GAN, can generate data under sparse sampling, but they have difficulty generating a continuous traffic state at the same time. To solve this problem, a GAN_BS model, which can reconstruct a smooth and accurate traffic state, is proposed in this study.

III. METHOD
This section proposes a traffic state estimation method for the bus line that uses the sparse and noisy bus GPS data. To achieve this task, the generative adversarial network is first introduced to impute the missing bus data. Then, a bilateral smoothing method is proposed to construct the continuous traffic state from the discrete bus data.

A. THE BUS DATA GENERATION BASED ON THE GENERATIVE ADVERSARIAL NETWORK
1) GRID PROCESSING OF BUS LINES
To make the calculation easy, the bus line is gridded, and the corresponding traffic state is recorded as a two-dimensional matrix in this paper. The trajectory of a bus can be written as a function $x_{bus}(t)$, and the bus speed $v(t)$ is the derivative of $x_{bus}(t)$. We assume that the bus starts from point $(x, t)$ and travels through a road section of length $\Delta x$ in the time interval $\Delta t$. The traffic state in the space-time domain $[x + \Delta x, t + \Delta t]$ is then sampled as the bus speed $v(t)$.
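The gridding step described above can be illustrated with a minimal sketch. It assumes GPS samples are already projected onto the line as (position, time, speed) triples; the function name `grid_bus_speeds`, its arguments, and the use of NaN for unsampled cells are illustrative assumptions, and the default cell size (20 m by 5 min) is borrowed from the aggregation scale reported in the experiments. The sketch ignores any correction for the space occupied by the bus.

```python
import numpy as np

def grid_bus_speeds(x_m, t_min, v_kmh, line_length_m, horizon_min,
                    dx_m=20.0, dt_min=5.0):
    """Aggregate GPS speed samples into a (space, time) grid.

    Returns the mean speed per cell (NaN where no bus sampled the cell)
    and a 0/1 indicator of which cells were observed.
    """
    n_x = int(np.ceil(line_length_m / dx_m))
    n_t = int(np.ceil(horizon_min / dt_min))
    # Cell indices of each sample, clipped to the last cell at the boundary.
    ix = np.minimum((np.asarray(x_m) // dx_m).astype(int), n_x - 1)
    it = np.minimum((np.asarray(t_min) // dt_min).astype(int), n_t - 1)

    speed_sum = np.zeros((n_x, n_t))
    count = np.zeros((n_x, n_t))
    np.add.at(speed_sum, (ix, it), v_kmh)   # accumulate repeated cells correctly
    np.add.at(count, (ix, it), 1)

    observed = (count > 0).astype(int)
    mean_speed = np.full((n_x, n_t), np.nan)
    np.divide(speed_sum, count, out=mean_speed, where=count > 0)
    return mean_speed, observed
```

The missing rate of a gridded matrix is then `1 - observed.mean()`, which is the quantity the later experiments vary by changing the aggregation scale.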
Considering that the space domain occupied by a bus includes the length of the bus and the saturation space headway, the sum $x_c$ of these two factors is used to modify the space-time domain of traffic state collection. Therefore, the Mask function $M$ marks whether the traffic state information is observed at point $(x, t)$ and can be expressed as
As shown in Fig. 2, the input of GAN includes the mask matrix, noise matrix, and speed matrix. Accordingly, when $M(x, t)$ is 1, the speed $\tilde{v}(x, t)$ is equal to the average bus speed, and when the

2) GENERATIVE ADVERSARIAL NETWORK WITH TRAFFIC PRIOR KNOWLEDGE
Due to the large intervals of bus departure and GPS data collection, the missing rate of bus data is high. The data imputation algorithm must therefore fit the distribution of the traffic state without dense observations. In this part, an improved generative adversarial network (GAN-I) is proposed to impute the bus data with a high missing rate. Fig. 2 shows the overall network architecture, which includes two parts: a generator G and a discriminator D. The generator G attempts to estimate bus data $\hat{v}$ that match the real traffic state as closely as possible. The discriminator D aims to find the fake entries of $\hat{v}$ by estimating the mask $\hat{M}$. Writing $\hat{M} = D(\hat{v})$, the objective of GAN is the minimax problem given by (3). The process of solving (3) is similar to a game between G and D. D is optimized to distinguish the estimated data by maximizing (3). G accepts feedback from D to improve its generation ability, and the performance of G is measured by (3). When the value of (3) decreases, the generated data are closer to the truth and confuse D. This interaction makes the data generated by GAN better than those of ordinary deep learning.
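For orientation, one plausible form of the minimax objective referred to as (3) is sketched below. It assumes the objective follows the mask-prediction formulation commonly used in GAN-based imputation, where the discriminator output is compared entry-wise with the true mask $M$; this is an assumption, not a transcription of the paper's equation.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}\!\left[\, M^{\top}\log \hat{M} \;+\; (1-M)^{\top}\log\bigl(1-\hat{M}\bigr) \right],
\qquad \hat{M} = D(\hat{v})
```

Under this form, D raises the objective by labelling observed entries as 1 and imputed entries as 0, while G lowers it by making imputed entries indistinguishable from observed ones.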
As shown in Fig. 2, the generator G regards the bus data as the sampling points of the traffic state spatial-temporal distribution $P(v, \tilde{v})$. Our model learns the continuous traffic state distribution from these sparse sampling points to produce the matrix $\hat{v}$. Before learning the patterns of the traffic state, we introduce three pieces of prior knowledge of traffic flow to guide the network in imputing the bus data. In this section, these traffic laws are packaged as a conditional reasoning layer, where $\tilde{v}$ is the input data, $k$ is the prior knowledge set of traffic, $v$ is the reasoning output, and $f$ is the default activation function.
The three pieces of traffic prior information are applied in GAN-I as follows.
Prior I: There are fewer cars on the road in the early morning or late at night, so the traffic state can safely be taken as free flow. In the expression of this condition, $t$ is the moment of the traffic state to be estimated, $t_{mor}$ and $t_{eve}$ are the critical values that trigger Prior I, respectively representing the end time and the start time of the free state from midnight to early morning, $\tilde{v}$ is the speed to be estimated later, and $v_{free}$ is the speed in free flow.
Prior II: The gradually changing traffic state on urban roads may be interrupted at intersections and bus stops due to signal control and passenger boarding and alighting, respectively. We regard the road intersections and the bus stops as the demarcation points of the traffic state and divide the bus line into several links. A step function $S$ is used as a coefficient of the activation function in the network. $S$ controls each neuron as a valve to ensure that each neuron only processes the link data of its corresponding position. In the expression of $S$, $x$ is the position of the traffic state to be estimated, and $x_{pre}$ and $x_{next}$ are the locations of the upstream intersection (or bus stop) and the downstream intersection (or bus stop) closest to $x$.
Prior III: Some areas adopt bus signal priority control to reduce bus delays at intersections. We assume that the traffic of the bus line under absolute signal priority control keeps flowing. Therefore, a binary variable $F$ is used to mark whether the intersection applies absolute signal priority. When $F = 1$ (there is bus priority at the intersection), the traffic state within $l$ meters upstream of the intersection can be imputed in advance. In this estimate, $k$ is the number of sampled points in the space-time domain $[x - l, x]$ meters × $[t - h, t]$ minutes and $k_{cre}$ is the confidence value of $k$.
Data other than the above cases need to be imputed by the generator.
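As a rough illustration of how Prior I and Prior II could act on a gridded speed matrix, the sketch below assumes the simplest forms consistent with the description: Prior I overwrites off-peak columns with the free-flow speed, and Prior II is an indicator that is 1 inside the link bounded by the nearest upstream and downstream demarcation points. The function names, the strict inequalities, and the example thresholds are hypothetical. Prior III is left out because its exact rule is not recoverable from the text.

```python
import numpy as np

def apply_prior_free_flow(v_tilde, t_hours, t_mor, t_eve, v_free):
    """Prior I (assumed form): free-flow speed before t_mor and after t_eve.

    v_tilde has shape (space, time); t_hours gives the hour of each time column.
    """
    v = np.array(v_tilde, dtype=float, copy=True)
    t_hours = np.asarray(t_hours, dtype=float)
    off_peak = (t_hours < t_mor) | (t_hours > t_eve)
    v[:, off_peak] = v_free
    return v

def link_step(x, x_pre, x_next):
    """Prior II (assumed form): 1 inside the link [x_pre, x_next], 0 outside."""
    x = np.asarray(x, dtype=float)
    return ((x >= x_pre) & (x <= x_next)).astype(float)

# Hypothetical thresholds: free flow before 06:00 and after 22:00.
v0 = np.full((2, 4), 20.0)
v1 = apply_prior_free_flow(v0, [5.0, 8.0, 17.0, 23.0], t_mor=6.0, t_eve=22.0, v_free=40.0)
gate = link_step([0.0, 150.0, 400.0], x_pre=100.0, x_next=300.0)
```

Multiplying a neuron's activation by `gate` restricts it to data from its own link, which is the "valve" role the text assigns to S.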
To learn the traffic state spatial-temporal distribution, we reshape the two-dimensional input matrix into a one-dimensional vector as GAN-I's input. A shared multi-layer perceptron (MLP) is used in the generator to find the traffic correlations in space and time. In this hidden layer, $w$ and $b$ are the weights and biases, and $f_{sig}$ is the activation function, which is taken as the logistic sigmoid function in this paper.
Then, a batch norm layer is used to normalize the result of the hidden layer and improve the imputation accuracy of the algorithm. In the normalization, $\sigma_{norm}$ and $\mu$ are the standard deviation and average of the layer output, respectively, $\varepsilon$ is a hyper-parameter, and $\gamma$ and $\beta$ are the parameters to be learned. The generator consists of several hidden layers of the kind described above. The output of each layer is used as the input of the next layer, and the last layer maps the traffic features extracted by the previous layers to the final imputed bus speed $\hat{v}$.
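The hidden layer and normalization described above can be sketched in NumPy as follows. This is a minimal forward pass only, assuming the standard fully connected layer and the standard batch normalization formula; the layer sizes, the random initialization, and the value of `eps` are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer(v_in, w, b):
    """Fully connected layer with the logistic sigmoid activation."""
    return sigmoid(v_in @ w + b)

def batch_norm(v, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale and shift."""
    mu = v.mean(axis=0)
    sigma = v.std(axis=0)
    return gamma * (v - mu) / np.sqrt(sigma ** 2 + eps) + beta

rng = np.random.default_rng(0)
batch = rng.random((8, 12))               # 8 flattened space-time matrices, 12 cells each
w = rng.normal(scale=0.1, size=(12, 6))   # hypothetical layer width of 6 units
b = np.zeros(6)
h = batch_norm(hidden_layer(batch, w, b), gamma=np.ones(6), beta=np.zeros(6))
```

Stacking several such layers, with the last one mapping back to the number of grid cells, gives the overall shape of the generator described in the text.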
As in the GAN framework, the discriminator D is used as an adversary to train the generator. As an independent network, the discriminator is trained separately. As shown in Fig. 2, the discriminator receives the imputation results of the generator and estimates the mask $\hat{M}$. Under the supervision of the real mask $M$, the discriminator learns to distinguish the observed data from the estimated data. The discrimination results in turn prompt the generator to improve its generation performance in the next step. To ensure that the discriminator works well, it first marks the positions imputed by the prior conditions and then uses MLPs to find the other estimated positions.

3) PARAMETER OPTIMIZATION
The objective function in GAN includes the generator loss and the discriminator loss. The generator loss consists of a loss for the reconstructed data and a loss for the imputed data, which serve different goals. We use the mean squared error (MSE) to make the reconstructed data as close to the observed data as possible. The imputation loss receives feedback from the discriminator to ensure that the imputed data can fool the discriminator. Overall, we train the generator end-to-end by minimizing $L_G$ in (11), where $v_i$ and $\hat{v}_i$ represent the observed speed value and the estimated one, $h$ is the total number of training data, $\alpha$ is a hyper-parameter, and $m_i$ and $\hat{m}_i$ represent the observed mask value and the estimated one.
The discriminator treats identifying whether each entry of the matrix output by the generator is true or false as a classification problem. Therefore, the discriminator loss $L_D$ is defined as the cross-entropy loss between the estimated mask and the real mask. In GAN, the generator and discriminator are optimized alternately. Since the learning rate of the commonly used stochastic gradient descent (SGD) method is fixed, SGD may fall into a local sub-optimal solution. Recently, many studies [41]-[43] have used Adam to find optimal solutions and verified its effectiveness through numerical examples. Therefore, we use Adam [44] instead of SGD to optimize the network. The training process is demonstrated in Algorithm 1. During testing, GAN generates bus data covering the entire bus line when the buses upload real-time detection data.
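The two losses can be sketched as plain NumPy functions. The discriminator loss follows the cross-entropy description directly. The generator loss is an assumed form: an adversarial term on the imputed cells plus `alpha` times the MSE on the observed cells; which term the paper's α multiplies, and the value `alpha=10.0`, are assumptions made for illustration.

```python
import numpy as np

def discriminator_loss(m, m_hat, tiny=1e-8):
    """Cross-entropy between the real mask m and the estimated mask m_hat."""
    return float(-np.mean(m * np.log(m_hat + tiny)
                          + (1 - m) * np.log(1 - m_hat + tiny)))

def generator_loss(v, v_hat, m, m_hat, alpha=10.0, tiny=1e-8):
    """Assumed form: adversarial term on imputed cells + alpha * MSE on observed cells."""
    n_imputed = max(float(np.sum(1 - m)), 1.0)
    n_observed = max(float(np.sum(m)), 1.0)
    # Small when the discriminator believes imputed cells are observed (m_hat near 1).
    adversarial = -np.sum((1 - m) * np.log(m_hat + tiny)) / n_imputed
    # Keeps the generator output close to the data where observations exist.
    reconstruction = np.sum(m * (v - v_hat) ** 2) / n_observed
    return float(adversarial + alpha * reconstruction)
```

In an alternating scheme, one step lowers `discriminator_loss` with the generator fixed and the next lowers `generator_loss` with the discriminator fixed; with an automatic differentiation framework both steps would use an Adam optimizer, as the text describes.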

B. THE TRAFFIC STATE RECONSTRUCTION BASED ON THE ADAPTIVE BILATERAL SMOOTHING METHOD
In this part, we want to average the speeds of several sample buses and obtain the smoothed traffic state at location $x_0$. As shown in Fig. 3, a set of buses departs at a specified time interval. These buses collect the speed at location $x_0$ at different times, and the resulting speed curve of the traffic state fluctuates. However, the final estimated traffic state is supposed to vary gradually (like the dotted line in Fig. 3(b)). Because the GAN used above cannot guarantee that the generated traffic state is smooth, a traffic state smoothing method is needed to eliminate noise and reconstruct the traffic state of the bus line accurately.
The traffic adaptive bilateral smoothing method aims to reconstruct the smooth traffic state by performing a spatial-temporal collaborative filter on sparse and noisy traffic data. It employs a convolution to derive a continuous speed field via a localized kernel. In this smoothing process, the kernel $\varphi(x, t)$ determines the correlation of the traffic state in time and space. The kernel function designed in this paper takes into account the locality and similarity of the traffic state.
The locality of the traffic state means that the closer two points are in time and space, the more similar their traffic states. A bivariate Gaussian function is selected as the local smoothing kernel in this paper; its value increases as the distance between the target location $(x, t)$ and the neighboring location $(x_i, t_i)$ decreases. To adapt to the traffic dynamics, the BS skews this isotropic kernel by introducing the characteristic wave speed parameters. The characteristic wave speed is the slope of the line between two traffic state points on the flow-density curve and represents the propagation speed of the traffic state. The inclined Gaussian kernel is formed as the second term on the right side of (14) and shown as the kernel of domain locality in Fig. 4. The action scope of the inclined kernel can approximate the spatial-temporal propagation plane of the traffic state.
It is worth noting that the assumption of slow variations in the traffic state made by local smoothing may fail at the edges of traffic pattern transitions. The results of local smoothing do not separate different traffic states clearly, and some features, such as the boundaries of the neighboring free flow wave, may shrink. Therefore, the traffic state similarity is also considered in the kernel to maintain the dividing line between different traffic state patterns. The idea of similarity-based smoothing is to aggregate the bus speed data with weights that decline with dissimilarity in the traffic state. The similarity kernel is formed as the third term on the right side of (14) and shown as the kernel of traffic similarity in Fig. 4. However, domain locality is still a necessary concept. Using only the state similarity to smooth the traffic state makes no sense because a speed far away from point $(x, t)$ should not affect the value at $(x, t)$. The appropriate solution is to combine traffic pattern similarity and domain locality. In a flat area of the traffic state, where the speed changes little, the similarity weights tend to be equal. In this situation, the domain weight plays the major role, which is equivalent to performing Gaussian filtering in this area. In the edge area of a traffic pattern, the speed changes rapidly, and the differences in the similarity kernel become larger. Therefore, the edge information can be maintained.
The typical switching edges of traffic state patterns on bus lines are the intersections and bus stops that the lines pass. The impact of this road structure on the traffic state can be introduced into the smoothing method as prior knowledge. Therefore, the step function (14) shown in Fig. 4 is introduced so that the aggregated range of bus speed does not exceed the range of the homogeneous state. Finally, the bilateral kernel is formed, in which the constants $\sigma_x$ and $\sigma_t$ decide the width of the spatial-temporal action scope of the kernel and the constant $\sigma_s$ decides the range scope of the speed difference. The traffic state is simply divided into two patterns: the free flow $V_{free}$ and the congested flow $V_{cong}$. The traffic state in any pattern can be seen as a superposition of the two speed fields via a convex combination, in which the weight $w(x, t)$ dynamically controls the superposition ratio of the two speed fields according to the detection information. In this paper, $w(x, t)$ is a smooth sigmoid function that ensures $w(x, t)$ is 1 in the congested state and 0 in the free flow state.
In this function, $V_c$ is the threshold between the free flow and the congested flow, and $\Delta V$ is the transition width. The speed fields of the two traffic patterns are constructed with the corresponding characteristic wave speeds, where $c_{free}$ and $c_{cong}$ are the characteristic wave speeds in free-flow traffic and congested traffic, respectively.
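The smoothing procedure can be sketched as follows, under stated assumptions. The kernel is taken as the product of a link step function, a Gaussian skewed by the characteristic wave speed, and a Gaussian in speed difference, which matches the three-term description but is not a transcription of (14). The blending weight uses the tanh form of the adaptive smoothing method. Units are km, hours, and km/h; the kernel widths and wave speeds are the values reported in the model settings, while `v_c`, `dv`, the function names, and the use of the noisy speed at the target point as the similarity reference are hypothetical.

```python
import numpy as np

def bilateral_field(x, t, v_ref, xs, ts, vs, c, x_pre, x_next,
                    sig_x=0.5, sig_t=10 / 60, sig_s=10.0):
    """Kernel-weighted mean speed at (x, t) for one characteristic wave speed c."""
    xs, ts, vs = (np.asarray(a, dtype=float) for a in (xs, ts, vs))
    dx = x - xs
    dt = (t - ts) - dx / c                      # time offset skewed along the wave
    locality = np.exp(-dx**2 / (2 * sig_x**2) - dt**2 / (2 * sig_t**2))
    similarity = np.exp(-(v_ref - vs)**2 / (2 * sig_s**2))
    step = ((xs >= x_pre) & (xs <= x_next)).astype(float)  # samples on the same link only
    weights = step * locality * similarity
    total = weights.sum()
    return float((weights * vs).sum() / total) if total > 0 else float(v_ref)

def smooth_speed(x, t, v_ref, xs, ts, vs, x_pre, x_next,
                 c_free=30.0, c_cong=-15.0, v_c=20.0, dv=5.0):
    """Blend the free-flow and congested fields with a tanh-shaped weight."""
    v_free = bilateral_field(x, t, v_ref, xs, ts, vs, c_free, x_pre, x_next)
    v_cong = bilateral_field(x, t, v_ref, xs, ts, vs, c_cong, x_pre, x_next)
    # Weight tends to 1 when the slower field is well below v_c, and to 0 when above.
    w = 0.5 * (1.0 + np.tanh((v_c - min(v_free, v_cong)) / dv))
    return float(w * v_cong + (1.0 - w) * v_free)
```

With four samples at speeds 40, 40, 10, 10 km/h spaced 100 m apart, evaluating at a 40 km/h point keeps the estimate near 40 rather than the plain mean of 25, which is the edge-preserving behaviour the similarity kernel is meant to provide.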

IV. EXPERIMENTS AND RESULTS
In this section, we discuss the performance of GAN_BS. We use real-world bus GPS data to compare the accuracy of several estimation models under different traffic state scenarios. This section contains four parts: the dataset introduction, the evaluation indices for the performance of traffic state estimation, the parameter settings of the compared methods, and the performance analysis of the methods.

A. TEST DATA SET
The numerical examples are performed on bus GPS data collected from the Public Transport Corporation in Changchun, China. To verify the reliability of GAN_BS in different road environments, we used two bus routes, Line 6 (with an exclusive bus lane) and Line 13 (without an exclusive bus lane). The data missing rate of the two lines is about 50%, so they can be used to test the model's ability to process sparse data. Moreover, these two lines run along main roads of Changchun City, and their traffic conditions are changeable, so they can be used to verify the adaptability of the model to different traffic patterns. We also use GPS data from other lines that overlap with the selected lines. The experiment uses two sections taken from the two bus lines, and the detailed information of the sections is shown in Table 1.
The geographies of the lines are shown in Fig. 5. These GPS trajectory data were collected from October 6, 2017, to January 16, 2018, with an updating interval of 30 seconds from 06:30 to 19:00. The aggregation scale of the bus GPS data in time and space must be chosen carefully. If the aggregation scale is too large, there is a risk of averaging different traffic states. But if the scale is too small, the computational burden increases. With reference to [2] and [29], these GPS data are aggregated as spatial-temporal matrix data with a 5-minute time interval and a 20-meter space interval. We take the data from the first 70 days as our training set and the rest as the test set. In this case, the missing data rate of the prepared dataset is around 50%. To verify the ability of GAN_BS to process severely sparse data, we constructed a test set by randomly removing 20% of the original data. The performance of the model can be evaluated by comparing the removed data with the estimated data at the corresponding locations.
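The hold-out step can be illustrated with a short sketch. It assumes that "removing 20% of the original data" means hiding 20% of the observed grid cells; the function name, the seed, and the returned pair of masks are illustrative choices.

```python
import numpy as np

def hold_out_observed(mask, fraction=0.2, seed=0):
    """Hide a random fraction of observed cells.

    Returns the reduced mask given to the model and a 0/1 matrix marking
    the held-out cells used for evaluation.
    """
    mask = np.asarray(mask, dtype=int)
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(mask)                 # flat indices of observed cells
    n_hide = int(round(fraction * observed.size))
    hidden = rng.choice(observed, size=n_hide, replace=False)
    held_out = np.zeros(mask.size, dtype=int)
    held_out[hidden] = 1
    held_out = held_out.reshape(mask.shape)
    return mask * (1 - held_out), held_out

# A 10 x 10 grid with half of the cells observed.
mask = (np.arange(100).reshape(10, 10) % 2 == 0).astype(int)
train_mask, held_out = hold_out_observed(mask, fraction=0.2)
```

Errors are then computed only on cells where `held_out` is 1, since those are the only imputed cells with a known ground truth.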

B. MODEL SETTINGS AND INDEX OF PERFORMANCE
The model settings contain two parts: the parameters of the GAN and the parameters of the BS. The two test data sets use the same parameters for both parts.
The parameters of the GAN are set as follows: the generator and the discriminator both have five hidden layers. These hidden layers are fully connected layers and use the sigmoid function as their activation function. The number of hidden units in each layer affects the accuracy of estimation. We calculated the error of the estimation result when the number of units in each layer ranges from 250 to 900 (with a step size of 50). We found that the performance of GAN-I is best when the numbers of units are 750, 500, 300, 750, and space number × time number, respectively.
The parameters of the bilateral smoothing method are also determined by the traversal method. They are set as follows: the characteristic propagation velocity under free traffic flow $c_{free}$ is +30 km/h and the characteristic propagation velocity under congested traffic flow $c_{cong}$ is -15 km/h. These two parameters are set with reference to the literature [2], [19]. The parameters of the action scope of the kernel, $\sigma_x$, $\sigma_t$, and $\sigma_s$, are 500 meters, 10 minutes, and 10 km/h, respectively.
To evaluate the effectiveness of the proposed model, we use three performance measures: the mean absolute error (MAE), the normalized mean square error (NMSE), and the root mean square error (RMSE). MAE and RMSE are commonly used in similar studies to measure the error of the overall traffic state estimation. However, these two indicators cannot reflect the ratio of the error to the observed value. We therefore introduce a relative error indicator, NMSE, which can measure the estimation performance under different traffic patterns. In the definitions of these three indicators, n is the total number of test data points, and v_i and v̂_i denote the observed and estimated speed values, respectively.
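The three indicators can be computed as in the sketch below. MAE and RMSE follow their standard definitions. NMSE has several variants in the literature; the form used here, the sum of squared errors divided by the sum of squared observations, is an assumption and may differ from the paper's exact definition.

```python
import numpy as np

def mae(v, v_hat):
    """Mean absolute error: (1/n) * sum(|v_i - v_hat_i|)."""
    v, v_hat = np.asarray(v, dtype=float), np.asarray(v_hat, dtype=float)
    return float(np.mean(np.abs(v - v_hat)))

def rmse(v, v_hat):
    """Root mean square error: sqrt((1/n) * sum((v_i - v_hat_i)^2))."""
    v, v_hat = np.asarray(v, dtype=float), np.asarray(v_hat, dtype=float)
    return float(np.sqrt(np.mean((v - v_hat) ** 2)))

def nmse(v, v_hat):
    """Normalized MSE (assumed variant): sum((v_i - v_hat_i)^2) / sum(v_i^2)."""
    v, v_hat = np.asarray(v, dtype=float), np.asarray(v_hat, dtype=float)
    return float(np.sum((v - v_hat) ** 2) / np.sum(v ** 2))
```

Because NMSE divides by the magnitude of the observations, the same absolute error counts for more at low speeds than at high speeds, which is why it is informative when comparing congested and free-flow periods.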

C. COMPARATIVE EXPERIMENTS
The comparative experiments in this paper include three parts. First, we conduct an ablation experiment to verify the effectiveness of each component in GAN_BS. Second, the performance of GAN_BS is tested by comparing it with some popular estimation methods. Last, we compare the accuracy of the methods in different periods to assess their adaptability to different traffic patterns.

1) COMPARED UNDER ABLATION STUDY
To evaluate the effect of the two components of the proposed method, we use GAN-I and the bilateral smoothing method separately to estimate the traffic state and compare the results with those of the full pipeline. To verify the performance of the methods on data with a high missing rate, we set the time aggregation scale of the GPS data to 1, 2, 3, 5, and 10 minutes. The corresponding data missing rates of the input matrix are about 80%, 70%, 60%, 50%, and 45%, respectively.

2) COMPARED WITH OTHER METHODS
In this part, the proposed method is compared with the classic and advanced algorithms mentioned above on data with different degrees of sparsity. The compared traffic state estimation algorithms are as follows:
(1) Co-kriging: This method, proposed by Bae et al. [1], uses the uniformity of the traffic state in the time-space domain to regress the traffic state random field. We use the Gaussian model as the variogram of this method and set the minimum estimate of error n = 0.01, the maximum dissimilarity s = 20, and the distance r = 10.
(2) KNN: This algorithm, adopted by Tak et al. [4], searches for the k historical records that are most similar to the divided section to estimate its traffic states. We set k to 10, which gives the best accuracy among values of k from 5 to 20.
(3) TAS-LR: This method has been used for traffic state estimation in [45] and aims to explore the spatial-temporal relationship of the traffic state through low-rank decomposition. In this study, the latent rank is r = 10, the number of neighbors is k = 10, and the parameters λ_1, λ_2, λ_3, λ_4 are set to 0.5, 10, 5, and 5, respectively.
(4) BGCP: This algorithm, proposed in [5], extends the Bayesian probability decomposition model to the imputation problem of the high-order tensor of traffic state. We use the third-order tensor (space number × day × time number) as the input of this method.
(5) PD-GAN: This traffic state estimation method, proposed in [7], uses parallel data as a temporal hint for the GAN. We use three convolutional layers as hidden layers in the generator and discriminator. The filter size of the convolutional layers is 3 × 3, and the number of kernels in each hidden layer is 150.
All the methods use the spatial-temporal matrix of speed as input so that they can consider the time and space relationships of the traffic state at the same time. Each model used for comparison has been carefully tuned.
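As an illustration of the simplest baseline in the list above, the KNN idea can be sketched as follows. This is not the implementation of Tak et al. [4]; it is a minimal version that assumes a complete historical matrix and measures similarity with the root mean square difference over the observed cells. The function name `knn_impute` is hypothetical.

```python
import numpy as np

def knn_impute(target, history, k=10):
    """Fill NaN cells of `target` with the mean of its k most similar historical rows.

    target:  1-D array of speeds with NaN for missing cells.
    history: 2-D array (n_records, n_cells) of complete historical speeds (assumed).
    """
    obs = ~np.isnan(target)
    # Distance is measured only on the cells observed in the target.
    dist = np.sqrt(np.mean((history[:, obs] - target[obs]) ** 2, axis=1))
    nearest = np.argsort(dist)[:k]
    filled = target.copy()
    filled[~obs] = history[nearest][:, ~obs].mean(axis=0)
    return filled
```

Since the filled values are copied from raw historical records, any noise in those records carries over to the estimate, which is consistent with the fragmented KNN output reported in the results.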

3) COMPARED UNDER DIFFERENT TRAFFIC PATTERNS
To evaluate the traffic state estimation performance under different traffic conditions, we selected three time periods with different traffic patterns (06:30, 09:30, and 18:30) to analyze the performance.

1) PERFORMANCE COMPARISON UNDER ABLATION STUDY
The calculated error indexes of the ablation study are shown in Table 2 and Table 3. With the aggregation interval changing from 1 minute to 10 minutes, the combination of GAN-I and the ARS method improves the accuracy of the two separate methods by nearly 20% in most cases. The difference between the results of the three methods can be seen intuitively in Fig. 6 and Fig. 7. The traffic state generated by ARS is messy because it is a local filter and must borrow data from distant, unrelated locations when an entire block of data is missing. GAN-I can restore the distribution characteristics of the traffic state in time and space, but it cannot avoid the interference of noise, and its generated state is fragmented and discontinuous. In contrast, the full pipeline can generate a continuous state from the noisy data because it both fills in reasonable data and eliminates noise. Besides, comparing the error indicators of GAN_BS in Table 2 and Table 3, we found that the traffic state estimates for Line 13 and Line 6 are both accurate. The model can therefore estimate the traffic state of both exclusive and non-exclusive bus lanes well.
Furthermore, to observe the performance of GAN_BS in estimating traffic state and distinguishing the edge of traffic patterns, we have chosen the traffic state of the two bus lines at three different times and compared them with the estimated speed (as shown in Fig. 6 (1) (2) (3) and Fig. 7 (1) (2) (3)). GAN_BS can reconstruct the traffic state from sparse data points. Because of the smoothing method, it tends to ignore the interference of noise and generate a continuous state.

2) COMPARISON RESULTS WITH OTHER METHODS
It can be observed from Table 2 and Table 3 that GAN_BS achieves better performance than the compared methods in terms of all evaluation metrics. More specifically, Fig. 8 and Fig. 9 show the estimated results of each method on January 9th under a data missing rate of 50%. The Co-kriging interpolation method tends to fit a smooth state surface while ignoring the intermittent effects of bottlenecks such as road intersections and bus stops. The result of KNN is similar to the output of GAN. It is fragmented because KNN looks for similar values in the historical data to fill in, and the original data set is noisy. On average, GAN_BS has 1.90 lower RMSE, 2.57 lower MAE, and 0.08 lower NMSE than the two tensor decomposition methods, TAS-LR and BGCP, on the two test data sets. The tensor decomposition methods can eliminate noise while filling in the traffic state. However, they tend to overestimate the congestion range at the temporal-spatial location (16:15, 3000km) of Line 13. Last, we use PD-GAN to test whether a convolutional network can filter noise better than fully connected layers. The result shows that PD-GAN still cannot directly produce a smooth traffic state of the bus lane that conforms to the traffic pattern.
Moreover, as the aggregation scale of the data in time shrinks, the missing rate of the data increases, and the accuracy of GAN_BS generally declines. Nevertheless, the accuracy remains acceptable, which indicates that GAN_BS is effective in estimating the traffic state when the missing rate of the data ranges from 45% to 80%. In contrast, the accuracy of Co-kriging and TAS-LR on Line 6 decreases as the aggregation time scale increases. This may be because the boundaries between different states get closer as the aggregation scale increases, and these two methods are less capable of distinguishing the boundaries between different traffic patterns. On the Line 13 test set, the accuracy of the estimation methods is worse than in other cases when the aggregation scale is 3 minutes or 10 minutes. During the experiment, we found that the noise interference in the original data is most obvious when the aggregation scale is 3 minutes. Therefore, the performance of all tested methods declines, although the methods with noise immunity, such as SVD, BGCP, and GAN_BS, perform relatively better. In addition, the assumption that the traffic state remains stable within 10 minutes is weak for the bus lane of Line 13. Therefore, the tested methods show some distortion in restoring the traffic state patterns at this scale.

3) PERFORMANCE COMPARISON UNDER DIFFERENT TRAFFIC PATTERNS
The observed and estimated values of the corresponding traffic state are shown in Table 4 and Table 5. The accuracy of these methods under free-flow conditions is better than under congested conditions, and GAN_BS outputs accurate estimation results under different traffic conditions. Fig. 10 compares the speed estimated by GAN_BS with the speed observed from the buses as scatter diagrams. Generally speaking, the estimated speed is likely to be higher than the observed speed in the congested state, where the speed is below 20 km/h. In the free-flow state, the estimated speed is lower than the observed speed, and the error distribution is more scattered.

V. CONCLUSION
The present study proposed a method named GAN_BS for estimating the traffic state of the bus line. The method first uses an improved generative adversarial network to generate bus data under sparse sampling. Then, a traffic adaptive bilateral smoothing method reconstructs an accurate traffic state pattern from the discrete speed data. A series of numerical experiments shows that GAN_BS can generate an accurate traffic state under sparse sampling. In particular, it outperformed other traffic state estimation methods in terms of noise elimination and preservation of traffic pattern boundaries. GAN_BS also behaved robustly on sparse data whose missing rate ranges from 45% to 80%. It can not only impute the sparse data but also accurately estimate the traffic state. However, parameters such as the characteristic wave speeds in BS are fixed. Future work should enable the model to adjust these parameters dynamically so that it adapts to different traffic patterns.