Research on Lane Occupancy Rate Forecasting Based on the Capsule Network

This paper proposes a hybrid lane occupancy rate prediction model called 2LayersCapsNet, which combines the improved capsule network and convolutional neural networks (CNNs). The model uses CNNs to mine the spatial-temporal correlation characteristics of the lane occupancy rate and then uses an improved capsule network to mine the interrelationships in traffic data measured by sensors in continuous time intervals to predict the lane occupancy rate. The model can solve the problem of CNNs losing important spatiotemporal information caused by the maximum pooling operation and can obtain better prediction results. To verify the efficiency of the 2LayersCapsNet model, the model is compared with the capsule network model (CapsNet), convolutional neural networks model (CNNs), recurrent neural networks model (RNNs), long short-term memory model (LSTM) and stacked autoencoders model (SAEs) with similar network structures and parameter settings on the PEMS-SF data set. The experimental results indicate that the 2LayersCapsNet model can obtain a prediction model faster than the CapsNet model and that 2LayersCapsNet performs better than the CNNs, RNNs, LSTM and SAEs with respect to three evaluation metrics, namely, MAPE, MAE and RMSE, on four prediction tasks.


I. INTRODUCTION
Due to the growing population and substantially increased automobile usage, traffic jams and accidents caused by illegal occupation of lanes have become a major problem around the world [1]. Intelligent transportation systems (ITS) have been developed to solve this problem [2]. Lane occupancy rate forecasting, a key component of ITS [3], can provide real-time, accurate and fast traffic guidance management and traffic flow control and is also an effective way to alleviate urban traffic congestion, reduce vehicle exhaust emissions, and prevent and address traffic accidents. Therefore, lane occupancy rate forecasting has important research significance [4]. Lane occupancy rate forecasting is different from traffic flow prediction: attention is focused on traffic congestion and traffic accidents caused by lane occupancy that result in personal injury [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Sabah Mohammed.

The urban traffic system is open and complex, which makes lane occupancy rates subject to external factors, including weather conditions, traffic accidents, and traffic control. An analysis of a large quantity of traffic detector data has shown that the lane occupancy rate has periodicity and similarity on long time scales but is time-varying, chaotic and correlated on short time scales [6]. Based on these characteristics, experts and scholars mainly use nonlinear mapping models [7] to predict the short-term lane occupancy rate. Short-term lane occupancy rate forecasting is similar to traffic flow prediction, which has achieved rich research results in recent years. The commonly used methods include mathematical models such as historical average time series models [8] and Kalman state space filtering models [9], and prediction methods based on nonlinear theories such as the ARIMA model [10], the fuzzy logic model [11], the K-nearest neighbor model [12], the Bayesian network model [13], the back propagation neural network model [14], radial basis function neural networks [15], wavelet theory [16], support vector machines [17], and Gaussian process methods [18].
In these forecasting methods, neural networks have become a research hotspot because of their large parallel structure and distributed storage characteristics. Neural network models, such as DBM, RBF, RNN and CNN, are used to solve forecasting problems. Because these deep learning models are characterized by powerful function approximation and pattern classification abilities as well as good self-organization, self-adaptability and fault tolerance, they have achieved many good results. For example, Yang et al. [19] proposed the stacked autoencoder Levenberg-Marquardt model, which is a deep neural network approach that aims to improve the forecasting accuracy. Huang et al. [20] proposed a two-part deep architecture, first using an underlying deep belief network (DBN) for unsupervised feature learning and then using a multitasking regression layer on the DBN for supervised prediction. Lv et al. [21] proposed a deep hierarchical structure model based on large flow data that uses an autoencoder as a building block to represent traffic flow characteristics for prediction. Du et al. [22] proposed a multilayer integrated deep learning architecture that obtains long-term trend features by means of recurrent neural networks (RNNs) and local trend features by means of convolutional neural networks (CNNs). Wu and Tan [23] used a one-dimensional convolutional network to capture the spatial characteristics of traffic flow, used two long short-term memory modules to mine the short-term periodicity of traffic flow and designed a deep architecture with feature-level fusion to predict short-term traffic flow. Wang et al. [24] proposed a traffic flow velocity prediction model with single hidden layer CNNs combined with error feedback. Ma et al. [25] proposed a new method of transforming traffic flow into an image.
In this method, the traffic speed data of each section in each time step are expressed in three dimensions, and the space-time features of images are captured by CNNs. This method has been proven to be superior to other advanced methods. These models have achieved good results in traffic flow forecasting using a deep learning algorithm with the ability to approximate arbitrary nonlinear functions and self-learning. Some other models combine prediction models with other factors such as weather [26] and use deep neural networks to produce good results.
However, in practical applications, the existing deep learning models have two problems. On the one hand, the time characteristics of the lane occupancy rate are difficult to mine. Under special circumstances, weather conditions, traffic accidents, traffic control and other factors will have different impacts on the occupancy rate of different lanes, which makes the lane occupancy rate abnormal. However, most existing deep learning models are based on the principle of empirical risk minimization. Their structure selection lacks theoretical guidance, and they easily fall into local extrema. For example, in RBF neural networks, the selection of a central vector and the number of hidden layer neurons has a considerable impact on the learning and generalization ability. Moreover, the correlation between the spatial-temporal characteristics of the lane occupancy rate is difficult to extract with existing deep learning algorithm models. In terms of time characteristics, there is a relationship between the occupancy rate of a lane and that of its adjacent lanes. Existing deep learning models typically simulate the time dimension by increasing the depth of the model. For example, a deep belief neural network (DBN) can only train on one-dimensional data, not two-dimensional space-time data, and unreasonable hyperparameter settings will negatively affect the performance of the network. In terms of spatial characteristics, the lane occupancy information states of the sections and their upstream and downstream segments will influence each other. However, the existing deep learning models can only identify the existence of features because the input is scalar, and the spatial features between inputs are difficult to analyze. For example, in CNNs, the maximum pooling layer loses valuable information and ignores the correlation between local and global features.
CapsNet is a neural network model proposed by Sabour et al. [27] that uses the concept of a capsule to overcome the characterization limitations of CNNs and RNNs. CapsNet uses a vector output capsule instead of the scalar output feature detector of a CNN, uses dynamic routing instead of maximum pooling, and achieves good results on the MNIST data set. Kim et al. [28] applied CapsNet to traffic flow forecasting and achieved better results than those obtained using CNNs. However, because of its dynamic routing mechanism, CapsNet is 30 times slower than CNNs. Moreover, a prediction model can only output one predicted value per time unit. If we need to predict the lane occupancy rate for a period of time (such as n time units), n models must be trained. Therefore, from the perspective of improving the speed of lane occupancy prediction, this paper proposes a capsule-network-based prediction model via which multiple prediction periods can be obtained simultaneously.
The main contributions of this paper are the following:
1) An improved CapsNet algorithm is proposed.
2) A lane occupancy prediction model called 2LayersCapsNet is proposed. The experimental results show that 2LayersCapsNet can obtain a prediction model with 90% accuracy within 4 minutes, while the traditional single-layer capsule network model (CapsNet) takes approximately 9 minutes: 2LayersCapsNet's convergence speed is 120% faster than that of CapsNet.
3) The experimental results of four prediction tasks show that 2LayersCapsNet is essentially equivalent to CapsNet in terms of three evaluation metrics (MAPE, MAE, RMSE) and is superior to CNN and LSTM models with similar network structures and parameter settings.

II. LANE OCCUPANCY RATE FORECASTING MODEL
In this section, we propose the 2LayersCapsNet model for lane occupancy rate forecasting. We first define the lane occupancy rate input matrix and output vector based on spatial-temporal characteristics (Section A). Then, we review the traditional capsule network and change its routing component to make it more suitable for regression tasks (Section B). Finally, the detailed architecture of the 2LayersCapsNet prediction model is introduced (Section C).

A. INPUT MATRIX AND OUTPUT VECTOR OF THE LANE OCCUPANCY RATE BASED ON SPATIOTEMPORAL CHARACTERISTICS
Because of the spatial-temporal correlation of the lane occupancy rate, we must consider the traffic information status of the forecasting segment and its upstream and downstream segments when constructing the forecasting vector. The lane occupancy rate follows certain change rules over time: the lane occupancy rate in the next period can be seen as a continuation of the previous lane occupancy rate. Moreover, the lane occupancy rate is affected by upstream and downstream traffic conditions in space and shows a certain correlation: the downstream lane occupancy rate can be estimated from the upstream lane occupancy rate. In this paper, the original lane occupancy rate is processed into a lane occupancy rate matrix with temporal and spatial information based on these spatial-temporal characteristics. Let x_{s,t} denote the lane occupancy rate of monitoring point s at time t. For S monitoring points in T time periods, the input matrix X can be expressed as the S × T matrix X = (x_{s,t}), s = 1, ..., S, t = 1, ..., T. The lane occupancy rate in the n periods after time t at the predicted monitoring point s can then be expressed as the output vector y = (x_{s,t+1}, ..., x_{s,t+n}). In this paper, a data monitoring point s is assumed to be related in space only to the flow of its 2 upstream data monitoring points. Suppose we need to predict the lane occupancy rate at point s for one hour, and that this rate is related to the lane occupancy rate of the previous four hours. If the lane occupancy rate is measured every 10 minutes, the input matrix of the prediction is the 3 × 24 matrix formed by rows s − 2, s − 1, s and columns t − 23, ..., t, and the predicted vector is y = (x_{s,t+1}, ..., x_{s,t+6}).
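As a minimal sketch, the construction of one (input matrix, output vector) sample described above can be written as follows. The function and array names (`make_sample`, `occupancy`) are illustrative, not from the paper; the dimensions follow the running example (2 upstream points, 24 ten-minute history steps, 6 predicted steps).

```python
import numpy as np

def make_sample(occupancy, s, t, history=24, horizon=6, upstream=2):
    """Return (X, y): X is the (upstream+1, history) matrix of occupancy
    rates for points s-upstream..s over the `history` periods before t;
    y is the next `horizon` occupancy values at point s."""
    X = occupancy[s - upstream : s + 1, t - history : t]
    y = occupancy[s, t : t + horizon]
    return X, y

# toy data: 5 monitoring points, 48 ten-minute periods, rates in (0, 0.5)
rng = np.random.default_rng(0)
occupancy = rng.uniform(0.0, 0.5, size=(5, 48))
X, y = make_sample(occupancy, s=2, t=24)
print(X.shape, y.shape)   # (3, 24) (6,)
```

In practice a sliding window over t would generate many such samples for training.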

B. CAPSULE NETWORK MODEL
The original goal of CapsNet was to solve the target recognition task, and CapsNet demonstrates excellent performance on the MNIST data set. A capsule is the basic element of the capsule network, similar to the neuron of a traditional neural network. Unlike a traditional neuron, a capsule is composed of a group of neurons and is a feature vector used as the input and output of CapsNet. Traditional neural networks use affine transformations and nonlinear activation functions (such as ReLU, sigmoid, and tanh) to transform low-level neurons into high-level neurons. CapsNet first uses an affine transformation to transform low-level capsules into high-level capsules and then updates the weights by applying a dynamic routing algorithm between two consecutive capsule layers, thereby increasing or reducing the connection strength between multiple input and output vectors, which is more effective than the basic routing strategy in CNNs. In contrast to the maximum pooling layer commonly used in CNNs, CapsNet does not drop information that may be related to the output, such as the correlation between the temporal and spatial characteristics of the lane occupancy rate. Assume that the low-level input capsules are defined as u = (u_1, u_2, ..., u_n), where u_i ∈ R^d is the i-th input capsule, and the high-level output capsules are defined as v = (v_1, v_2, ..., v_m), where v_j ∈ R^p is the j-th output capsule. The mechanism of dynamic routing can be divided into two steps.
Step 1: Affine transformation. Traditional CapsNet obtains the intermediate eigenvector û_{j|i} by multiplying the input vector u_i and the weight matrix W_{ij}; there are n × m such weight matrices W_{ij}, as shown in Fig. 1(a).
To avoid the overfitting caused by this large number of matrix parameters and to improve the calculation speed of the model, this article uses a matrix weight sharing mechanism [29] to reduce the number of weights and accelerate the calculations. In the weight sharing mechanism shown in Fig. 1(b), the weight matrix W_j is shared by all input capsules connected to output capsule j, and the intermediate feature vector û_{j|i} is:

û_{j|i} = W_j u_i

Step 2: Dynamic routing. Traditional CapsNet uses dynamic routing to calculate the output vector. The core of CapsNet is to update the parameters through iterations of dynamic routing according to its own characteristics, so as to obtain the output vector that best reflects the characteristics of the intermediate eigenvectors. However, the traditional routing parameters are usually initialized with a uniform distribution, ignoring the differences among low-level capsules. By contrast, the routing parameters can be initialized via k-means clustering to improve the routing process. In the squashing function, we take a cue from ReLU: we set the weights of short vectors to 0 so that these vectors are not involved in the next dynamic routing calculation, thereby reducing the training time. The calculation process of dynamic routing is shown in Algorithm 1.

Algorithm 1 Dynamic Routing
Define Dynamic_Routing(û_{j|i}, r, l):
    for r iterations do:
        for all capsules i in layer l and capsules j in layer l + 1:
            ...
First, we use k-means clustering to calculate the initial c_{j|i} of the low-level capsules. Before the iteration begins, the following steps are performed:
(1) Calculate the cluster center û_j of the û_{j|i}.
(2) Initialize b_{j|i} as the distance from each û_{j|i} to the cluster center û_j.
The vector distance ⟨û_{j|i}, û_j⟩ is calculated as the cosine similarity. c_{j|i} indicates the degree of association between the low-level prediction û_{j|i} and the cluster center û_j.
Before the first iteration, the traditional routing algorithm in CapsNet initializes b_{j|i} to 0. Because c_{j|i} = Softmax(b_{j|i}), the initial c_{j|i} values are all the same, and the route in the first iteration does not play a substantial role, as shown in Fig. 2(a). Therefore, we use k-means clustering to obtain the cluster center û_j and initialize b_{j|i} based on the distance between û_{j|i} and û_j to obtain the value of c_{j|i}, so that the first iteration plays an important role, as shown in Fig. 2(b). At the same time, we change the softmax function to the Leaky_ReLU function because the goal is to predict rather than classify. Moreover, the Leaky_ReLU function results in faster computing speeds and better results.
Second, we use a nonlinear squash function to ensure that short vectors are compressed to 0 and long vectors are preserved, thereby obtaining the high-level capsules v_j. The threshold α is defined as 1/4 of the maximum length of the output vectors. If a vector is short, it contains insufficient features, and there is no need to retain it. According to the obtained v_j, the similarity ⟨û_{j|i}, v_j⟩ of û_{j|i} and v_j is calculated, and the parameter c_{j|i} for the next iteration is obtained by Leaky_ReLU.
Then, the next iteration is performed based on c_{j|i}. After all iterations are completed, the output v_j is the result of the dynamic routing calculation.
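The routing modifications described above can be sketched in numpy as follows: a shared-weight affine transform (one W_j per output capsule), initialization of the routing logits by cosine similarity to each output capsule's cluster center, Leaky_ReLU in place of softmax, and a squash that zeroes vectors shorter than α = 1/4 of the longest output vector. All dimensions and the ε terms are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def squash(v, alpha_frac=0.25):
    # Standard length-normalising squash, then zero any vector shorter
    # than alpha = alpha_frac * (longest output vector length).
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    out = (norm**2 / (1 + norm**2)) * v / (norm + 1e-9)
    lengths = np.linalg.norm(out, axis=-1)
    out[lengths < alpha_frac * lengths.max()] = 0.0
    return out

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def dynamic_routing(u, W_shared, iters=3):
    """u: (n, d) input capsules; W_shared: (m, p, d), one matrix per
    output capsule. Returns v: (m, p) output capsules."""
    u_hat = np.einsum('mpd,nd->mnp', W_shared, u)   # u_hat[j, i] = W_j u_i
    # k-means-style init: cosine similarity of each prediction to the
    # mean ("cluster centre") of its output capsule, instead of b = 0.
    centre = u_hat.mean(axis=1, keepdims=True)      # (m, 1, p)
    b = np.sum(u_hat * centre, axis=-1) / (
        np.linalg.norm(u_hat, axis=-1) * np.linalg.norm(centre, axis=-1) + 1e-9)
    for _ in range(iters):
        c = leaky_relu(b)                           # Leaky_ReLU, not softmax
        v = squash((c[..., None] * u_hat).sum(axis=1))    # (m, p)
        b = b + np.sum(u_hat * v[:, None, :], axis=-1)    # agreement update
    return v

rng = np.random.default_rng(1)
v = dynamic_routing(rng.standard_normal((8, 6)), rng.standard_normal((4, 10, 6)))
print(v.shape)   # (4, 10)
```

Since short capsules are zeroed inside `squash`, their agreement terms vanish in later iterations, which is the mechanism the text credits with reducing training time.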
In summary, compared with traditional CapsNet, our model introduces clustering into the dynamic routing and uses Leaky_ReLU instead of softmax. Experiments show that these changes improve the performance of CapsNet on regression tasks.

C. PREDICTION MODEL
The capsule network regards a feature as an activity vector, which can be exploited in many time series prediction tasks. In general, the input of many prediction tasks is a time series representing historical values. We can assign each historical value in the sequence to a region and then transform the region into a distributed representation. The distributed representation of a region can be regarded as a vector, so the sequence of such regions can be regarded as a group of capsules. According to the characteristics of the input and output vectors of the capsule network, we can use the capsule network to process the time series prediction task and obtain the output of multiple prediction values in one calculation. Therefore, two prediction models are designed in this paper. The first prediction model, CapsNet, uses a basic model similar to that used for text recognition (Fig. 3). The second prediction model, 2LayersCapsNet, takes into account the spatial-temporal correlation of the lane occupancy rate (Fig. 4) and uses convolutional layers to acquire spatial-temporal characteristics to enhance the capsule layer.

1) CONSTRUCTION OF THE CONVOLUTIONAL LAYER (CONV_LAYER)
The purpose of constructing the convolutional layer Conv_Layer is to obtain more abstract features. The convolutional layer H_1 is calculated by convolution and contains many convolution surfaces. The a-th convolution surface of the convolutional layer H_1 is denoted by h_{1,a}, w_1 is the convolution kernel matrix, and the activation function is ReLU.
x_{s,t} is the input value at node (s, t), and · represents point multiplication. The input matrix has shape [station_nums, train_time_interval], where station_nums is the number of monitoring sites (the prediction site together with the station_nums − 1 sites before it) and train_time_interval is the number of historical time periods before the predicted time node. In the CapsNet model, the convolution kernel size is [3, 3], the stride is 1, and the number of convolution kernels is 150. In the 2LayersCapsNet model, the convolution kernel size of the first layer is [3, train_time_interval], the convolution kernel size of the second layer is [station_nums, 3], all layers have a stride of 1, and the number of convolution kernels is pred_time_interval × 32 (pred_time_interval is the number of time periods to be predicted).
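The kernel settings above can be checked with the standard 'valid' (no padding) convolution output-size formula. The concrete values of station_nums, train_time_interval and pred_time_interval below are taken from the running example in Section A (3 points, 24 history steps, 6 predicted steps) and are illustrative only; the text does not fully specify how the two 2LayersCapsNet layers are chained.

```python
def conv_out(size, kernel, stride=1):
    # Output length of a 'valid' convolution along one dimension.
    return (size - kernel) // stride + 1

station_nums, train_time_interval, pred_time_interval = 3, 24, 6

# CapsNet Conv_Layer: [3, 3] kernel, stride 1, on the input matrix
capsnet_h = (conv_out(station_nums, 3), conv_out(train_time_interval, 3))
print(capsnet_h)                                  # (1, 22)

# 2LayersCapsNet kernel sizes and kernel count as stated in the text
layer1_kernel = (3, train_time_interval)          # spans the full history
layer2_kernel = (station_nums, 3)                 # spans all stations
n_kernels = pred_time_interval * 32
print(layer1_kernel, layer2_kernel, n_kernels)    # (3, 24) (3, 3) 192
```

Note how each 2LayersCapsNet kernel collapses one full dimension (time in layer 1, stations in layer 2), which is how the model separates temporal from spatial feature extraction.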

2) CONSTRUCTION OF PRIMARYCAPSLAYER
The purpose of PrimaryCaps is to reconstruct the input data of the convolutional layer into vector data with a characteristic direction. The construction process is divided into two steps: construct the convolutional capsule layer and calculate the dynamic routing. The process is shown in Fig. 5.
The convolution process of the first step of PrimaryCapsLayer is similar to that of Conv_Layer. The activation function of the CapsNet model is ReLU, the size of the convolution kernel is [3, 3], the stride is 2, and the convolved output is reconstructed into many pred_time_interval-dimensional vectors with 32 channels (each vector contains pred_time_interval features). The second step is the dynamic routing calculation; the detailed calculation process is shown in Algorithm 1. Finally, the output capsule that best reflects the input characteristics is output to the DigitCapsLayer. The CapsNet and 2LayersCapsNet models have the same activation function and output shape, but the convolution kernels of 2LayersCapsNet are [2, train_time_interval] and [station_nums, 2] in its 2 layers (Fig. 4).
Finally, the output vectors are obtained by a weighting calculation after the dynamic routing calculation is completed and are output to the DigitCapsLayer. The dynamic routing output of the first layer of 2LayersCapsNet is v_1, and the output of PrimaryCapsLayer is obtained as the weighted summation of the outputs of the two layers (Formula 13).
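A minimal sketch of the PrimaryCaps reconstruction: a convolved feature map with pred_time_interval × 32 channels is regrouped into capsules of dimension pred_time_interval, as described above. Since Formula 13 is not reproduced here, the weighted summation of the two layers' routing outputs is shown as a simple scalar blend; the actual weighting may be learned, so treat `w` as an assumption.

```python
import numpy as np

pred_time_interval = 6
rng = np.random.default_rng(2)

# convolved feature map: H x W spatial positions, pred_time_interval*32 channels
feat = rng.standard_normal((4, 4, pred_time_interval * 32))
capsules = feat.reshape(-1, pred_time_interval)   # one row per capsule
print(capsules.shape)   # (512, 6): 4*4*32 capsules of dimension 6

# weighted summation of the two layers' routing outputs (assumed form of Formula 13)
v1 = rng.standard_normal((10, pred_time_interval))   # layer-1 routing output
v2 = rng.standard_normal((10, pred_time_interval))   # layer-2 routing output
w = 0.5                                              # assumed blend weight
v_out = w * v1 + (1 - w) * v2
```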

3) CONSTRUCTION OF DIGITCAPSLAYER
The purpose of DigitCapsLayer is to reconstruct the vector data with feature direction into prediction data through three fully connected layers, as shown in Fig. 4: First, the data output from PrimaryCapsLayer are flattened. Then, the data pass through 3 fully connected layers. The outputs of the three fully connected layers are 256,

4) LOSS FUNCTION
To minimize the error loss Loss(y, ŷ) between the true value y and the predicted value ŷ, the following formula is established:
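The loss formula itself is not reproduced above; mean squared error between y and ŷ is the standard choice for this kind of regression head, so the sketch below assumes MSE. The paper's exact formula may differ.

```python
import numpy as np

def mse_loss(y, y_hat):
    # Assumed form of Loss(y, y_hat): mean squared error over the
    # pred_time_interval predicted values.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))

print(mse_loss([0.1, 0.3], [0.1, 0.3]))   # 0.0
```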

5) CALCULATION OF PARAMETERS
In the training process of the network, the parameters W_L1 must be determined in the convolutional layer (L1: Conv_Layer), the two parameters c_L2 and W_L2 must be determined in PrimaryCapsLayer (L2), and the parameters W_L3 and b_L3 must be determined in DigitCapsLayer (L3). c_L2 can be calculated by dynamic routing, and the other parameters can be calculated by the adaptive moment estimation (Adam) gradient descent algorithm. Adam uses the first moment estimate and the second moment estimate of the gradient to dynamically adjust the learning rate of each parameter. After bias correction, the learning rate of each iteration is within a certain range, which makes the parameters more stable. This process is conducive to fully extracting the lane occupancy rate characteristics [30].
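As a sketch of the Adam update just described: first- and second-moment estimates of the gradient are bias-corrected and combined into a per-parameter adaptive step. The hyperparameters shown are Adam's common defaults, assumed here because the paper does not list its settings.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # first moment estimate
    v = b2 * v + (1 - b2) * grad**2         # second moment estimate
    m_hat = m / (1 - b1**t)                 # bias correction
    v_hat = v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
theta, m, v = adam_step(theta, np.array([0.1, -0.2, 0.3]), m, v, t=1)
print(np.round(theta, 4))   # [-0.001  0.001 -0.001]
```

Note that after bias correction the first step has magnitude ≈ lr regardless of gradient scale, which is what keeps the per-iteration learning rate "within a certain range" as stated above.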

III. EXPERIMENTS

A. DATA DESCRIPTION
The experimental data in this paper are from PEMS-SF (http://archive.ics.uci.edu/ml/datasets/PEMS-SF) and are obtained from the PEMS website of the California Department of Transportation. The data represent the occupancy rate, between 0 and 1, of different car lanes of San Francisco Bay area freeways. The data cover 440 days from January 1, 2008 to March 30, 2009, with samples taken every 10 minutes. The data set considers each day as a 963-dimensional single time series (i.e., the number of sensors that work continuously throughout the study period). The training data cover 267 days, and the test data cover 173 days.
As an example, we consider the first sensor and present its lane occupancy rate data in Fig. 7. As shown in Fig. 7, the lane occupancy rate data are concentrated in the interval (0, 0.1), and the maximum value does not exceed 0.5. Moreover, Fig. 8 shows that the lane occupancy rate changes of adjacent sensors have a causal relationship, so the influence of peripheral sensors on the lane occupancy rate prediction of a sensor cannot be ignored.

B. EXPERIMENTAL PLATFORM AND PREDICTION TASK
The experimental platform of this paper is an Intel i7-6700K CPU with 16 GB of RAM. The project is based on TensorFlow 1.13 and Python 3.6 and is programmed in PyCharm. Time series prediction tasks use historical data to predict future data. The four prediction tasks are defined as follows:
TASK 1: 30-min prediction with a 240-min traffic history on 5 road segments;
TASK 2: 60-min prediction with a 240-min traffic history on 5 road segments;
TASK 3: 90-min prediction with a 240-min traffic history on 5 road segments;
TASK 4: 120-min prediction with a 240-min traffic history on 5 road segments.
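The four tasks translate into model dimensions as follows, given the data set's 10-minute sampling interval; the names train_time_interval and pred_time_interval follow Section II.

```python
SAMPLE_MIN = 10                 # sampling interval of PEMS-SF, in minutes
HISTORY_MIN = 240               # traffic history used by every task
horizons = {1: 30, 2: 60, 3: 90, 4: 120}   # prediction horizon per task (min)

train_time_interval = HISTORY_MIN // SAMPLE_MIN            # 24 history steps
pred_steps = {t: h // SAMPLE_MIN for t, h in horizons.items()}
print(train_time_interval, pred_steps)   # 24 {1: 3, 2: 6, 3: 9, 4: 12}
```

So all four tasks share the same 24-step input window and differ only in pred_time_interval (3, 6, 9 or 12 output steps).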

C. EVALUATION METRICS
The mean absolute percentage error (MAPE), mean absolute error (MAE) and root mean square error (RMSE) are used to evaluate the prediction results.
y_i denotes the true value, ŷ_i denotes the predicted value, and ȳ denotes the average value of y_i. For illustration purposes, the accuracy of prediction is defined as:
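The three metrics can be sketched as follows. MAPE is given in percent and accuracy as 100 − MAPE, which matches the usage elsewhere in the paper ("90% accuracy", "MAPE value < 10"); treat that pairing as an assumption, since the exact accuracy formula is not reproduced above.

```python
import numpy as np

def mape(y, y_hat):
    # mean absolute percentage error, in percent (y must be nonzero)
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def mae(y, y_hat):
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))

def accuracy(y, y_hat):
    # assumed definition: 100 - MAPE
    return 100.0 - mape(y, y_hat)

y, y_hat = [0.2, 0.4], [0.1, 0.4]
print(mape(y, y_hat), mae(y, y_hat), rmse(y, y_hat))   # 25.0 0.05 0.0707...
```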

D. EXPERIMENTAL RESULTS AND ANALYSIS 1) PREDICTION RESULTS OF THE MODELS
For an intuitive view of the prediction results, the CapsNet and 2LayersCapsNet models take 60 monitoring points from the PEMS-SF data set and predict the results on Task 2, as shown in Fig. 9 (60 × 6 = 360 predicted values).
To obtain a clearer view, a random point is selected from the test set to illustrate the prediction results of the different models, as shown in Fig. 10. Fig. 10 shows that the predicted values of 2LayersCapsNet are more accurate than those of CapsNet in reflecting the changing trend of the lane occupancy rate on Task 2. The change in RMSE on the training set and test set is shown in Fig. 11. Fig. 11 indicates that the RMSEs of the training set and test set are basically consistent, which shows that 2LayersCapsNet does not overfit the data.

2) COMPARISON OF THE EXPERIMENTAL RESULTS OF THE IMPROVED CAPSULE MODEL
The parameters of the CapsNet model are shown in Table 1.
With the same model settings as in Fig. 2, we compare the traditional method in the CapsNet model with the improved method incorporating shared weights, k-means initialization and the new squashing function, in terms of the MAPE of lane occupancy prediction on Task 2. The comparison result is shown in Fig. 13.

3) INFLUENCE OF PARAMETERS ON THE MODEL RESULTS
The change curves of MAPE for the CapsNet model and the 2LayersCapsNet model during training are shown in Fig. 12. Fig. 12 shows that the first MAPE value below 10 (9.58) for 2LayersCapsNet is achieved when Globalsteps = 135 and time = 212.89 s, whereas the first MAPE value below 10 (9.97) for CapsNet is achieved when Globalsteps = 531 and time = 840.23 s. Therefore, the convergence rate of 2LayersCapsNet is faster than that of CapsNet.
The effect of the dynamic routing parameters on 2LayersCapsNet is shown in Fig. 14 (the smoothing function is Savitzky-Golay, points = 200).
The number of iterations for dynamic routing is varied over iter_routing = 1, 3, 5. In terms of training steps, the RMSE converges fastest when iter_routing = 5; in other words, the greater the number of dynamic routing iterations, the greater the number of output vectors and the number of features reflecting the input data, and the more spatiotemporal information and internal connections can be found in the traffic data. However, from the perspective of wall-clock convergence time, the speed is similar for iter_routing = 1 and iter_routing = 3, while iter_routing = 5 requires more computing time to converge. In the following experiments, we set iter_routing = 3.

4) RESULTS AND COMPARISON
To verify the validity of the proposed model, five deep learning-based models are chosen for comparison. CapsNet performs regression using the traditional CapsNet network structure shown in Fig. 1. CNNs [22] is a neural network consisting of three convolutional layers, three pooling layers and a fully connected layer. RNN takes into account the context of the data during training and can learn these characteristics by unrolling time series, sharing parameters and using the hidden state to capture patterns at each time step. LSTM NN is an extension of RNNs that has become popular because its architecture can handle long-term memories and avoid vanishing gradient issues. The RNN [31] and LSTM [32] models are optimized to contain three hidden layers with 1000 hidden units in each layer. SAEs is a neural network consisting of multiple layers of autoencoders, where the model inputs are encoded into dense or sparse representations before being fed into the next layer; the SAEs model [18] is tuned to form three autoencoder layers with 3000, 2500 and 2000 hidden units. The results for the 4 prediction tasks are shown in Table 2. Table 2 shows that 2LayersCapsNet is superior to CNNs and LSTM. Fig. 15 shows the prediction error rate (PER) for 6 time intervals at 20 traffic monitoring points (6 × 20 = 120 predicted values):

PER = 100% × |ŷ − y| / y    (21)

Fig. 15 shows that the PER of 2LayersCapsNet is usually lower than that of CNNs. In other words, 2LayersCapsNet can obtain more accurate prediction values than those produced by CNNs.
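The per-value error rate of Eq. (21) can be written directly as an element-wise function over the 120 predicted values:

```python
import numpy as np

def per(y, y_hat):
    # prediction error rate, Eq. (21), in percent (y must be nonzero)
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.abs(y_hat - y) / y

print(per([0.2, 0.5], [0.25, 0.5]))   # [25.  0.]
```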
In summary, three conclusions can be drawn from Table 2, Fig. 12, Fig. 14 and Fig. 15.
1) Fig. 12 shows that the MAPE of 2LayersCapsNet converges faster than that of CapsNet, and the other two evaluation metrics (MAE and RMSE) have similar convergence curves; that is, 2LayersCapsNet trains models faster than CapsNet does.
2) Fig. 14 shows that when iter_routing = 1 and iter_routing = 3, the training time of the model is similar. When iter_routing = 5, the training time increases substantially, but the final RMSE convergence interval is not significantly improved. Therefore, iter_routing = 1 or 3 can be used when training 2LayersCapsNet.
3) Table 2 shows that 2LayersCapsNet is superior to CNNs, RNNs, LSTM and SAEs with respect to the three evaluation metrics on the four prediction tasks. Furthermore, Fig. 15 shows that 2LayersCapsNet has better prediction accuracy than CNNs.

IV. CONCLUSION
In this paper, we propose a lane occupancy rate prediction model, called 2LayersCapsNet, based on a capsule network and CNNs. The model uses CNNs to mine the spatial-temporal correlation characteristics of the lane occupancy rate and then uses the capsule network to mine the interrelationships in traffic data measured by sensors in continuous time intervals to predict the lane occupancy rate. The model can solve the problem of CNNs losing important spatiotemporal information caused by the maximum pooling operation and can obtain better prediction results. Experiments on the PEMS-SF data set show that the convergence rate of 2LayersCapsNet is faster than that of CapsNet. Finally, we compare the model with the CNN and LSTM models. The experimental results show that 2LayersCapsNet substantially improves the prediction accuracy, which demonstrates the feasibility of the model for lane occupancy rate prediction. Determining how to combine CapsNet with 2LayersCapsNet to improve the generalization ability and application scope of the model is our next goal.