Driver Lane Change Intention Recognition Based on Attention Enhanced Residual-MBi-LSTM Network

Accurate identification of lane-changing intention can effectively assist intelligent driving vehicles in decision-making and trajectory planning, and plays a significant role in enhancing driving safety by reducing traffic accidents caused by lane changes. Based on trajectory characteristics and vehicle interaction information, this paper proposes an attention-enhanced bidirectional multi-layer residual long short-term memory neural network (Attention Enhanced Residual-MBi-LSTM) model for lane change intention recognition. First, an EWMA filter is employed to smooth the noisy data collected from the vehicle. Then a four-layer bidirectional residual LSTM (Residual-MBi-LSTM) structure is used to extract lane-changing features from the historical driving trajectories of the ego-vehicle and from vehicle interaction information. In addition, an attention mechanism is added to adjust the weight of data in different time frames. After that, the current lane-changing probability is calculated and output by the Softmax function. Finally, the lane-changing intention recognition model is trained and then verified on the HighD dataset. In the hardware-in-the-loop (HIL) experiment, the proposed model identifies driver intention 2.07 seconds in advance on average.


I. INTRODUCTION
According to statistics, human drivers' operational errors are one of the principal causes of most traffic accidents. Among all traffic accidents, 18% occur during the lane changing process [1]-[3]. With the rapid development of artificial intelligence, advanced driver assistance systems (ADAS) have been widely developed, significantly reducing the probability of traffic accidents. Driving intention recognition is regarded as one of the core technologies of next-generation ADAS products. By recognizing driving intention, ADAS can give early warning of vehicle collisions and take collision avoidance actions to avoid accidents [4].
There is a clear difference between lane change behavior perception and intention recognition: lane-changing behavior perception refers to perceiving the process with sensors once the vehicle has started to change lanes [5], while lane-changing intention recognition refers to identifying or predicting possible upcoming behaviors such as lane-changing, cutting in, overtaking or turning before they start. Both, however, are essentially the classification of multidimensional time series (or vectors) constructed from vehicle motion information, driver gaze (or head-turning) information, or traffic context information [6].
Many scholars have carried out research on the recognition of driver lane change intention. Focusing on the relationship between the ego-vehicle and directly surrounding vehicles, Schlechtriemen et al. [7] used a generative model based on an improved naive Bayesian approach, trained and validated on a real-world test drive, to classify the lane-changing intention. Based on the Support Vector Machine (SVM), Kumar et al. [8] proposed a recognition model that can predict lane change intention on average 1.3 s in advance in online recognition. However, the above methods classify only according to the current state parameters and cannot use the context information of the continuous dynamic driving process to mine deep driving intentions.
The Hidden Markov Model (HMM) [9] overcomes the shortcomings of the above simple classifiers and is able to model time series of arbitrary length, but its accuracy is lower than that of current mainstream methods. Furthermore, the above machine learning methods need an a priori model, which may cause bias due to inaccurate prior assumptions [10]. In conclusion, statistics-based machine learning methods have certain defects and limitations for the lane-changing intention recognition problem.
The rapid development of high-performance computing and big data technology in recent years has made data-driven lane change intention recognition possible. However, earlier studies paid less attention to the correlation between time series information [11].
Following the success of LSTM in speech recognition and machine translation, and recognizing its advantages in solving time series problems, some scholars have introduced LSTM into the field of lane changing intention recognition. Jian et al. [12] proposed an LSTM-based driving intention detection approach with better accuracy in driver intention recognition. Xie et al. [13] proposed an LSTM-based lane change implementation (LCI) model; however, the LCI model cannot recognize the lane changing intention until the vehicle is about to cross the lane markings. To improve the recognition accuracy at an early stage, Tang et al. [14] extended the classical single-layer LSTM to a multi-layer LSTM. However, these methods do not take the weight assignment of data in different time frames into account. Due to the weighting scheme of LSTM [15], lane changing intention can only be detected after significant lateral movement. To improve timeliness, the attention mechanism was introduced to reallocate the weights. Guo et al. [17] proposed an AT-LSTM based intention recognition model. In addition to vehicle motion parameters, Guo et al. also introduced human features such as pupil gaze point and head pose captured with an eye tracker, which is not suitable for a real driving environment. Similar to AT-LSTM, Hao et al. [18] proposed an attention-based GRU for intention recognition. However, the use of a single-layer GRU limits the improvement in recognition accuracy.
Based on trajectory characteristics and vehicle interaction information, this paper proposes an attention-enhanced residual MBi-LSTM model for driver lane change intention recognition.
Our contributions in this study are twofold: (1) A novel attention enhanced Residual-MBi-LSTM network is proposed for driver lane change intention recognition. A 4-layer sequential bidirectional LSTM neural network with a shortcut structure is proposed to extract lane change intention from the historical trajectory of the ego-vehicle and the interaction between the ego-vehicle and the preceding vehicle, while a soft attention mechanism is introduced to adjust the weight of different data frames.
(2) The introduction of the residual block enables our model to become deeper, solving the optimization bottleneck problem.
The rest of this paper is organized as follows: part Ⅱ introduces the data preprocessing method; part Ⅲ introduces the proposed model structure, parameter determination method, and model training process; in the following parts, the performance of the model is tested through an ablation experiment together with other experiments, and a hardware-in-loop simulation is conducted to test the model performance under real working conditions; finally, a conclusion is given in part Ⅶ.

A. HIGHD INTRODUCTION
The HighD dataset [16] is used in this paper to extract the trajectory information of the ego-vehicle and surrounding vehicles. HighD is a large-scale naturalistic vehicle trajectory database recorded by an unmanned aerial vehicle (UAV).
The dataset includes trajectory data of 110,000 vehicles recorded at 6 locations around Cologne, Germany (shown in Figure 1) over 16.5 hours, with a total mileage of 45,000 km and 5,600 complete lane change records.
The dataset includes information such as time t, vehicle ID, vehicle coordinates, and lateral and longitudinal speed, which fully meets the requirements of this paper. It also provides complete surrounding vehicle information and vehicle interaction information, for example surrounding vehicle IDs, time-to-collision (TTC), and time headway (THW).

B. DATA PREPROCESSING
As mentioned above, the dataset is collected by a UAV located directly above the road segment. Due to the rotation and movement of the UAV, the dataset contains some measurement noise. To improve the data quality, a symmetrical exponentially weighted moving average (EWMA) filter is adopted to smooth the noisy raw data.
The method is as follows [14]. The weighted moving average (WMA) assigns larger weights to recent data and smaller weights to past data, so that recent observations have a greater impact on the predicted value (PV) and the trend is reflected better. In the EWMA method, the weighting coefficient of each value decreases exponentially with time. After smoothing, a typical vehicle (id=1-13) trajectory is presented in Figure 3(a), while the smoothing effect on vehicle speed (id=1-13) and angular velocity (id=1-13) is presented in Figure 3(b) and (c). As can be seen from Figure 3(a) and (c), both the Savitzky-Golay method and the EWMA method reflect the natural trajectory well. Therefore, considering both the filtering effect and hardware resource consumption, the EWMA method is adopted to smooth the vehicle trajectory, velocity, and angular velocity.
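The smoothing step can be illustrated with a minimal sketch. It assumes that the symmetrical EWMA is obtained by averaging a forward and a backward pass of an ordinary EWMA; the function names and the smoothing factor `alpha = 0.3` are hypothetical and are not taken from the paper.

```python
def ewma(values, alpha):
    """One-sided EWMA: s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = []
    for x in values:
        prev = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1.0 - alpha) * prev)
    return smoothed


def symmetric_ewma(values, alpha=0.3):
    """Average a forward and a backward EWMA pass (assumed symmetric form)."""
    forward = ewma(values, alpha)
    backward = ewma(values[::-1], alpha)[::-1]
    return [(f + b) / 2.0 for f, b in zip(forward, backward)]


# Illustrative noisy lateral positions (made-up values, not HighD data).
noisy_y = [0.0, 0.2, -0.1, 0.3, 1.5, 0.2, 0.1, 0.0]
smooth_y = symmetric_ewma(noisy_y, alpha=0.3)
```

Averaging the two passes makes the weights decay exponentially on both sides of each sample, which reduces the lag that a one-sided filter would introduce.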
For the intention recognition module, the extracted trajectory segments need to be divided into three categories: left lane change, right lane change, and non-lane change, and then corresponding labels are attached respectively [19].
As shown in Figure 2, the intersection of the vehicle trajectory and the lane line is defined as the lane change point.
Once the lane change point has been determined, the start-point and end-point of the lane changing process can be determined by calculating the heading angle $\theta$ of the vehicle. The heading angle $\theta$ is obtained as an arctangent computed from the vehicle coordinates $(x, y)$.
Traverse the heading angles $\theta$ from the lane changing point along both directions of the timeline. If $|\theta| \le \theta_e$ (where $\theta_e$ is the threshold of the end heading angle for lane change) holds three consecutive times, the point that first reaches the threshold is defined as the end-point. If $|\theta| \le \theta_s$ (where $\theta_s$ is the threshold of the initial heading angle for lane change) holds three consecutive times, the point that first reaches the threshold is defined as the start-point.
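The traversal rule can be sketched as follows. The sketch takes precomputed heading angles as input and assumes that the threshold condition means the absolute heading angle falling to or below the threshold; the function name, the threshold value and the sample angles are hypothetical.

```python
def find_boundary(theta, crossing_idx, threshold, step, run_length=3):
    """Walk from the lane change point in direction `step` (+1 forward in time,
    -1 backward) and return the index of the first sample of the first run of
    `run_length` consecutive samples with |theta| <= threshold, or None."""
    run_start = None
    count = 0
    i = crossing_idx
    while 0 <= i < len(theta):
        if abs(theta[i]) <= threshold:
            if count == 0:
                run_start = i  # first point that reaches the threshold
            count += 1
            if count == run_length:
                return run_start
        else:
            count = 0  # run broken, start counting again
        i += step
    return None


# Illustrative heading angles in radians (made-up values).
theta = [0.0, 0.001, 0.002, 0.02, 0.05, 0.08, 0.05, 0.02, 0.003, 0.001, 0.0, 0.0]
crossing_idx = 5  # index of the lane change point
start_idx = find_boundary(theta, crossing_idx, threshold=0.005, step=-1)
end_idx = find_boundary(theta, crossing_idx, threshold=0.005, step=+1)
```

With these sample values the lane change segment spans indices 2 to 8; separate thresholds for the start and end can be passed in the two calls.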
To make full use of the data, a sliding time window segmentation method is employed to cut the vehicle trajectories into segments of equal length [20]. The time window length is $T_p$, the sampling frequency is 25 Hz, and the number of sampling points is $n = 25 \times T_p$. Since the total number of lane change trajectories is far smaller than that of non-lane change trajectories, random downsampling is used to reduce the number of non-lane change trajectories. Finally, the numbers of left lane change, non-lane change, and right lane change data segments are 32828, 26419 and 10614, respectively. The three types of data and their corresponding labels are randomly shuffled into a single dataset, which is then divided into a training set, a validation set and a test set in the proportions 70%, 15% and 15%.
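The segmentation, downsampling and splitting steps can be sketched as below. This is a minimal illustration and not the authors' pipeline: the helper names, the stride of one frame, the random seeds and the window length `T_p = 1.0` s are assumptions.

```python
import random


def sliding_windows(sequence, window_len, stride=1):
    """Cut a trajectory (list of frames) into equal-length overlapping windows."""
    last_start = len(sequence) - window_len
    return [sequence[i:i + window_len] for i in range(0, last_start + 1, stride)]


def downsample(samples, target_size, seed=0):
    """Randomly keep at most target_size samples of an over-represented class."""
    if len(samples) <= target_size:
        return list(samples)
    return random.Random(seed).sample(samples, target_size)


def split_dataset(samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle and split into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(ratios[0] * len(shuffled))
    n_val = round(ratios[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


T_p = 1.0              # hypothetical window length in seconds
n = round(25 * T_p)    # number of sampling points at 25 Hz
windows = sliding_windows(list(range(30)), n)
train, val, test = split_dataset(list(range(100)))
```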

III. VEHICLE LANE CHANGE INTENTION RECOGNITION MODEL
A. RESIDUAL MULTI-BILSTM NETWORK STRUCTURE

1) LSTM NETWORK STRUCTURE
LSTM is a special kind of recurrent neural network (RNN). It overcomes the vanishing and exploding gradient problems in long sequence training [21] and therefore performs better on long time sequences.
The structure of a single LSTM unit is shown in Figure 4. In the calculation of an LSTM unit, $\odot$ denotes the element-wise product and $\sigma(\cdot)$ denotes the sigmoid function.
Here, $f_t$ is the forget gate, which applies the sigmoid function to the current input state $x_t$ and the hidden state $h_{t-1}$ passed in from the previous unit; $i_t$ is the input gate, which likewise applies the sigmoid function to the current input state $x_t$ and the hidden state $h_{t-1}$ of the previous unit, but its weight $W_c$ is different from the forget gate weight $W_f$. The input gate $i_t$ selects the current information $\tilde{c}_t$ as new memory to add to the current memory state $c_t$; $o_t$ denotes the output gate, which outputs the hidden state $h_t$ of the current unit.
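For reference, a standard LSTM formulation consistent with the gate descriptions above is sketched below. It is the common textbook form and not necessarily the exact notation of this paper: the weight names $W_i$ and $W_o$, the bias terms $b$, and the use of $W_c$ for the candidate state $\tilde{c}_t$ are assumptions.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input.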

2) MULTI-BILSTM NETWORK STRUCTURE
As shown in Figure 5, Multi-LSTM refers to a network structure formed by stacking LSTM layers. Compared with the traditional single-layer LSTM, the Multi-LSTM structure can significantly increase the nonlinear components and hence improve generalization ability and robustness [14]. The BiLSTM network consists of two sub-layers: the forward LSTM (FW LSTM) layer and the backward LSTM (BW LSTM) layer. Unlike the traditional unidirectional LSTM, the BiLSTM network processes the sequence in both directions, so that it can effectively use past features through the FW LSTM and future features through the BW LSTM. BiLSTM helps solve long-term dependency problems and improves prediction accuracy by establishing two-way connections in the network [22]-[24]. The BiLSTM structure is shown in Figure 6. The final outputs of the forward and backward LSTM units are concatenated to form the BiLSTM output.

3) RESIDUAL BLOCK
As the network becomes deeper and more complex, the learning efficiency decreases, so that the accuracy cannot be effectively improved [25].
Inspired by ResNet [26], this paper adds a residual block to the MBi-LSTM structure to solve the optimization bottleneck problem. The structure of the l-th residual block and its final output are shown in Figure 7.
Batch Normalization (BN) [27] is an algorithm that speeds up the training process by fixing the distribution of layer inputs and reducing internal covariate shift. The BN layer has trainable weights and normalizes the layer output by setting the mean to 0 and the variance to 1. The standardization process is shown in Figure 8. After standardization, a scale transformation factor $\gamma$ and an offset transformation factor $\beta$ are applied.
The training process after introducing the residual block and the BN algorithm is shown in Figure 9: Figure 9(a) shows the loss and Figure 9(b) shows the accuracy during training. As shown in Figure 9, after introducing the residual block and adopting the BN algorithm, the loss decreases more rapidly during the first several training iterations, so the model converges faster.
For a long time series, the importance of data in different time frames is usually unequal. To allocate the weight of data in different time frames, a soft attention mechanism [28] is introduced.

C. ATTENTION MECHANISM
The structure of the soft attention mechanism is presented in Figure 10.
As shown in the figure, $h'$ is the output of the preceding Residual-MBi-LSTM layer. The importance value of time frame $t$ is its attention weight: the larger this value is, the more important the data of that frame. The final output is therefore weighted by the importance of each data frame rather than by equal weights.
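A minimal sketch of soft attention over time frames is given below. The scoring function is an assumption (a dot product with a hypothetical learned vector `score_weights`), since the exact form used in the paper is not given here; only the normalization by Softmax and the weighted sum are generic.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(hidden_states, score_weights):
    """hidden_states: list of T feature vectors, one per time frame.
    Returns the attention weights and the attention-weighted output vector."""
    scores = [sum(w * h for w, h in zip(score_weights, h_t))
              for h_t in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    output = [sum(a * h_t[d] for a, h_t in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, output


# Three illustrative time frames with two features each (made-up values).
frames = [[0.1, 0.0], [0.5, 0.2], [1.0, 0.4]]
weights, output = soft_attention(frames, score_weights=[1.0, 1.0])
```

Frames with a higher score receive a larger weight, so the output is dominated by the frames judged most important instead of being an equal-weight average.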

D. DRIVER LANE CHANGE INTENTION RECOGNITION BASED ON ATTENTION ENHANCED RESIDUAL-MBI-LSTM NETWORK
This paper proposes an attention-enhanced bidirectional multi-layer residual long short-term memory neural network model for lane-changing intention recognition, whose structure is presented in Figure 11. The model consists of a data preprocessing layer, an input layer, a residual-MBi-LSTM layer, a soft-attention layer, a Softmax layer, and an output layer.
Table Ⅰ shows the main parameters [29], [30]. Besides the motion parameters of the ego-vehicle, surrounding vehicles, especially the vehicle in front of the ego-vehicle, have a significant impact on the driving intention and driving trajectory of the ego-vehicle [31], [32]. Therefore, the relative distance and relative speed to the preceding vehicle are also used. If there is no preceding vehicle, a virtual vehicle is populated, with its relative speed set to 0 and its relative distance set to 420-x (420 m is the length of a recorded road section) [16].
The model output is the probability vector of the three driving intentions, whose three components are the probabilities of the individual intentions.
The data preprocessing layer is used to collect and filter the data. The first Bi-LSTM layer extracts vehicle lane change features, which are then sent to the intention recognition neural network (Bi-LSTM layers 2, 3 and 4). The intention recognition neural network analyzes the temporal connection between the lane change features. After that, these features are sent to the attention layer, which assigns weights to data in different time frames. Finally, through the fully-connected (FC) layer, a Softmax function is used to calculate the probabilities of the three driving intentions, namely left lane change, non-lane change, and right lane change.
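The handling of a missing preceding vehicle and the final Softmax step can be sketched as follows. The function names, the tuple layout and the sign convention of the relative quantities are hypothetical; only the virtual-vehicle rule (relative speed 0, relative distance 420-x) comes from the text.

```python
import math

SECTION_LENGTH = 420.0  # value quoted in the text


def preceding_vehicle_features(ego_x, ego_speed, preceding=None):
    """Return (relative_distance, relative_speed) to the preceding vehicle.
    `preceding` is an (x, speed) tuple, or None when no vehicle is ahead,
    in which case a virtual vehicle is populated."""
    if preceding is None:
        return SECTION_LENGTH - ego_x, 0.0
    lead_x, lead_speed = preceding
    return lead_x - ego_x, lead_speed - ego_speed


def softmax(logits):
    """Map the FC-layer outputs to a probability vector."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative FC outputs for (left lane change, non-lane change, right lane change).
probs = softmax([2.0, 0.5, -1.0])
```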

IV. INFLUENCE OF NETWORK STRUCTURE
A. THE RELATIONSHIP BETWEEN MODEL RECOGNITION ACCURACY AND TIME STEP
Figure 12 shows the relationship between the recognition accuracy and the time step. When the time step is no less than 0.4 s, the proposed model recognizes lane changing intention with good accuracy, and the overall accuracy of driving intent recognition is best when the time step equals 0.4 s. Therefore, the time step of the driving intent recognition model proposed in this paper is set to 0.4 s. When the time step is too small, too few features are extracted by the MBi-LSTM layer, so the recognized lane change intention is easily affected by random interference; accordingly, the accuracy drops significantly when the time step is less than 0.3 s. When the time step is too long, MBi-LSTM tends to give higher weight to the later extracted features and may misidentify a lane change intention as a non-lane change intention.

B. THE RELATIONSHIP BETWEEN MODEL RECOGNITION ACCURACY AND THE NUMBER OF BI-LSTM LAYERS
The performance of Bi-LSTM networks with different numbers of layers is compared to determine the optimal number of layers. The relationship between the accuracy and the number of Bi-LSTM layers is shown in Figure 13. As can be seen from Figure 13, when the number of layers is no more than three, the prediction accuracy of the MBi-LSTM model increases with the number of layers; with more than three layers, the MBi-LSTM model hits an optimization bottleneck, and the recognition accuracy first decreases slightly and then remains unchanged.
Since Residual MBi-LSTM can solve the optimization bottleneck and vanishing gradient problems, the recognition accuracy keeps increasing after the residual block structure is introduced, even with more than three layers. If the number of layers is too small, the lane-changing intention recognition model cannot fully extract the intention, so the recognized lane-changing intention is easily affected by random interference and the accuracy is relatively low. Considering both recognition accuracy and prediction time, a four-layer network structure is adopted in this paper.
In Equation (18), $y_{ji}$ denotes the true distribution, $\hat{y}_{ji}$ denotes the predicted distribution, and $n$ denotes the total number of label categories. The SGD algorithm is used to train the model. After 12 epochs, the loss and accuracy gradually converge. The training process curves are shown in Figure 14.
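The definitions of the true and predicted distributions suggest a categorical cross-entropy loss. The sketch below assumes that form, averaged over samples; it is an illustration and not a reproduction of Equation (18), and the function name and the clipping constant `eps` are hypothetical.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy.
    y_true: one-hot rows, y_pred: predicted probability rows (same shape)."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        # Clip probabilities at eps to avoid log(0).
        total -= sum(t * math.log(max(p, eps))
                     for t, p in zip(true_row, pred_row))
    return total / len(y_true)


# One sample with three label categories and a uniform prediction.
uniform_loss = cross_entropy([[1, 0, 0]], [[1 / 3, 1 / 3, 1 / 3]])
```

A uniform prediction over the three intention classes gives a loss of ln 3, and the loss approaches 0 as the predicted probability of the true class approaches 1.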

VI. EXPERIMENTAL VERIFICATION
To verify the performance of the proposed driver lane change intention recognition method, ablation experiments testing the influence of the attention mechanism, bidirectional characteristics, and residual block were conducted together with other performance tests. Finally, hardware-in-loop simulation experiments were carried out to test the performance of our model.

A. TEST AND ABLATION EXPERIMENT
Because of the imbalance of the dataset, the actual performance of the classifier cannot be accurately reflected by the accuracy rate. Therefore, two metrics widely used for imbalanced multi-classification problems [35], Balanced Accuracy and F1-score, are introduced. Moreover, recall rate is used in this paper as another important evaluation criterion [36].
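The metrics named above can be computed per class from a confusion matrix in a one-vs-rest fashion. The following is a minimal sketch; the function name and the confusion matrix values are illustrative and are not results from this paper.

```python
def per_class_metrics(confusion, cls):
    """confusion[i][j] = number of samples of true class i predicted as class j.
    Returns recall, balanced accuracy and F1-score for class `cls`."""
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    tpr = tp / (tp + fn)            # true positive rate (recall)
    tnr = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"recall": tpr, "balanced_accuracy": (tpr + tnr) / 2, "f1": f1}


# Made-up confusion matrix: rows and columns are (left, non-lane change, right).
confusion = [[90, 10, 0],
             [5, 90, 5],
             [0, 10, 90]]
left_metrics = per_class_metrics(confusion, cls=0)
```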

$$\text{Balanced Accuracy} = \frac{TPR + TNR}{2}$$
where $TPR$ is the true positive rate and $TNR$ is the true negative rate.
To demonstrate the advantages of the proposed model, we compared it with four state-of-the-art models. It is worth noting that all of these models are evaluated on the HighD dataset. To ensure a fair comparison, the AT-GRU based model proposed by Guo et al. [17], the Multi-LSTM based model proposed by Tang et al. [14] and our model were tested on the HighD test set, and the results were compared with those of DCIE [33] and basic LSTM [13] as reported in the cited literature. The model performance comparison is shown in Table II.
Liu et al. [33] proposed the driver characteristic and intention (DCIE) model based on the dynamic Bayesian network (DBN). In DCIE, the driver characteristic recognition module and the lane changing intention identification module are cascaded. In other words, the accuracy of the lane changing intention recognition is affected by the accuracy of the driver characteristic recognition, resulting in lower recognition accuracy.
Similar to our method, Tang et al. [14] also employed LSTM to model lane changing behaviors. However, their method ignores the weight allocation of data in different time frames. Consequently, lane changing intention can only be identified after significant lateral movement. Suzuki et al. [15] explained this limitation from the basic mechanism of the LSTM: the weight allocation mechanism of an LSTM network tends to allocate higher weights to the variables close to the event, while variables far from the event receive a heavy penalty, which prevents the model from predicting long-term lane changing intentions in advance.
The same limitation applies to the study of Xie et al. Similar to Tang et al., Xie et al. [13] adopted an LSTM network for intention recognition, and the accuracy of lane changing intention recognition was higher than 99%. However, their experimental setup is questionable: 5 seconds of data before the lane changing time is used as the model input. That is to say, their model cannot recognize the lane changing intention until the vehicle is about to cross the lane markings. Despite achieving the highest accuracy, their model fails to give a timely warning.
Hao et al. [34] solve the above problems by introducing an attention mechanism. However, the single-layer GRU still faces optimization bottlenecks, which limits the improvement of recognition accuracy.
In summary, the proposed model has good identification accuracy (98.01%), which is comparable with the four state-of-the-art models mentioned above. The recognition accuracy is further improved by introducing the residual mechanism. By taking advantage of the soft attention mechanism, the proposed model can identify lane changing intentions 2.196 seconds before the lane changing time on average, indicating the timeliness of the proposed model.
To visualize the data, the recognition results are plotted as a confusion matrix, as shown in Figure 15. Combining Table Ⅱ and Figure 15, it can be seen that our method has good lane-changing intention recognition accuracy. In this paper, ablation experiments and the Receiver Operating Characteristic (ROC) curve are used to test the influence of the attention mechanism, residual block and bidirectional characteristics. The area under the ROC curve is denoted as AUC. The closer the AUC is to 1, the better the performance of the classifier.
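For a single class treated one-vs-rest, the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch based on this pairwise interpretation is shown below; the function name and the sample scores are illustrative.

```python
def auc_score(labels, scores):
    """AUC for binary labels (1 = positive, 0 = negative) via pairwise
    comparison of positive and negative scores; ties count as one half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for the "left lane change" class against the rest.
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of samples, which is acceptable for illustration; for large test sets a rank-based computation would be preferable.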
As can be seen from Figure 16(a), (b) and (c), the MBi-LSTM classifier performs considerably better than the single-layer LSTM. In the three cases, the AUC increased by 6.6%, 9.17%, and 6.93%, respectively. That is, both the multi-layer structure and the bidirectional structure greatly improve the recognition accuracy of the model. This is mainly because many dynamic time-varying factors affect the vehicle lane change process, and latent factors such as the driver's driving intent or incorrect operation are difficult to analyze [37]; compared with the single-layer LSTM, MBi-LSTM is better able to infer driver intention from context information.
At the same time, the residual block structure, which can solve the vanishing gradient and optimization bottleneck problems of deep neural networks, improved the AUC by 1.04%, 0.03%, and 0.98%, respectively, compared with MBi-LSTM.

B. HARDWARE-IN-LOOP SIMULATION EXPERIMENTS AND CASE ANALYSIS
The hardware-in-the-loop (HIL) platform is shown in Figures 17 and 18. The system consists of a force feedback steering wheel, brake and throttle pedals, a screen, Matlab Simulink software, Carsim software and an industrial personal computer (IPC). In the HIL system, the Carsim RT vehicle dynamics software generates virtual animation scenes that are projected onto a large screen. The IPC collects the states of the brake and throttle pedals and the force feedback steering wheel in real time at a sampling rate of 25 Hz and feeds them to the intention recognition model running on the IPC.
The HIL experiment was performed on a 5 km highway test road. The test road was a two-way six-lane road with a lane width of 3.75 m and an isolation fence in the middle of the road. The minimum and maximum speed limits were 60 km/h and 120 km/h, respectively. Before the experiment, each participant was asked to take a test drive to become fully familiar with the HIL system. In the experiment, participants were asked to drive freely in the middle lane. When participants saw an obstacle vehicle, they were free to choose between changing lanes and keeping the lane. In each experiment, the driver was required to perform at least five lane changes at any position. The test was repeated four times. The learning effect was minimized by adjusting the position of the obstacle vehicles.
As shown in Figure 19, the time to change lane (TTCL) is defined as the time interval from the current location of the lane-changing vehicle to the arrival of the vehicle center at the lane line. The earlier the vehicle cut-in intention is identified, the better the vehicle's decision-making [38]. The lane-changing process of vehicle id11 is shown in Figure 20, in which the ego-vehicle passes the slow-moving truck ahead via the left lane. The lane change intention during the overtaking process is shown in Figure 21. Across all simulation experiments, our model identifies driver intention 2.07 seconds before the lane changing time on average, demonstrating the practical value of our method.
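The average advance time can be understood as the mean TTCL at the moment the intention is first recognized. The sketch below assumes that each lane change is described by two timestamps, the first recognition time and the time the vehicle center reaches the lane line; the function name and the timestamps are illustrative and are not measured results.

```python
def mean_advance_time(recognition_times, crossing_times):
    """Mean lead time in seconds between first recognition of the intention
    and the arrival of the vehicle center at the lane line."""
    leads = [cross - rec
             for rec, cross in zip(recognition_times, crossing_times)]
    return sum(leads) / len(leads)


# Two made-up lane changes: recognized at 10.0 s and 25.5 s,
# lane line reached at 12.0 s and 27.5 s.
lead = mean_advance_time([10.0, 25.5], [12.0, 27.5])
```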

VII. CONCLUSION
1) A driver lane change intention recognition method based on attention enhanced residual-MBi-LSTM is proposed in this paper. The introduction of the bidirectional multi-layer structure, attention mechanism and residual block helps solve the long-term dependence problem and the optimization bottleneck and adjusts the information weight distribution, thereby improving the model's recognition accuracy.
2) The model was trained and tested on the HighD dataset. The results show that our method has a balanced accuracy higher than 98.01% for the recognition of the three types of driving intentions.
3) Based on the simulation experiments conducted on the hardware-in-loop platform, our method can identify driver intention 2.07 seconds in advance on average.