Probabilistic Risk Metric for Highway Driving Leveraging Multi-Modal Trajectory Predictions

Road traffic safety has attracted increasing research attention, in particular in the current transition from human-driven vehicles to autonomous vehicles. Surrogate measures of safety are widely used to assess traffic safety, but they typically ignore motion uncertainties and are inflexible in dealing with two-dimensional motion. Meanwhile, learning-based lane-change and trajectory prediction models have shown potential to provide accurate prediction results. We therefore propose a prediction-based driving risk metric for two-dimensional motion on multi-lane highways, expressed by the maximum risk value over different time instants within a prediction horizon. At each time instant, the risk of the vehicle is estimated as the sum of weighted risks over each mode in a finite set of lane-change maneuver possibilities. Under each maneuver mode, the risk is calculated as the product of three factors: lane-change maneuver mode probability, collision probability and expected crash severity. The three factors are estimated leveraging two-stage multi-modal trajectory predictions for surrounding vehicles: first, a lane-change intention prediction module is invoked to provide lane-change maneuver mode probabilities, and then the mode probabilities are used as partial input for a multi-modal trajectory prediction module. Working with the empirical trajectory dataset highD and simulated highway scenarios, the proposed two-stage model achieves superior performance compared to a state-of-the-art prediction model. The proposed risk metric is computationally efficient for real-time applications, and effective in identifying potential crashes earlier thanks to the employed prediction model.


I. INTRODUCTION
Although the total number of road fatalities has dropped by 23% from 2010 to 2019 across the EU, over 20,000 people lost their lives and over one million people were seriously injured each year [1]. Meanwhile, innovations in automated vehicles (AVs) are boosting future transport and mobility. Ensuring safety of automated driving is one of the prerequisites for introducing AVs to consumers and communities [2]. As a result, road traffic safety has attracted continuously increasing research attention, in particular in the current transition from human-driven vehicles to AVs [3].
Systematic traffic safety analysis is crucial to identify the risk faced by road users and to refine the design of automated driving systems. One of the key components in safety analysis is the safety/risk metric that quantifies the risk level. While the frequency and severity of crashes, injuries and fatalities are direct measures of safety, they are rare events and crash records are difficult to access [4]. In line with the assumption that crashes result from temporal sequential events in which conflict events occur prior to a crash event, the frequency of the conflict events can be used to predict crashes [5]. More specifically, the initial conditions of a regular non-crash event are used to calculate a "surrogate" that represents the likelihood of possible future crash events. Thus, this type of analysis approach is characterised as Surrogate Measures of Safety (SMoS), and is typically calculated in a time-series manner. For instance, Time To Collision (TTC) [6], [7], the time that remains until a collision between two vehicles would occur if they keep their current speeds, has been widely utilized to measure the driving risk for vehicle collision warning or avoidance. The definition of TTC is further extended in [8], [9] to the case where the vehicle relative velocity is negative. Other SMoS, e.g., Time Headway [10] and Time to Lane Crossing [11], have also been broadly adopted for traffic evaluations. This paper focuses on the development of a new SMoS.
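To make the TTC definition above concrete, the following minimal sketch computes it for a simple car-following case. The function name, the one-dimensional setting and the choice to return infinity when the gap is not closing are assumptions made for illustration; they correspond to the classic definition, not to the extensions in [8], [9].

```python
import math


def time_to_collision(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Classic TTC (s) for a follower approaching a leader at constant speeds.

    gap_m    : bumper-to-bumper distance in metres (hypothetical input)
    v_follow : follower speed in m/s
    v_lead   : leader speed in m/s
    Returns math.inf when the follower is not closing the gap.
    """
    closing_speed = v_follow - v_lead
    if closing_speed <= 0.0:
        # No collision would occur if both vehicles keep their current speeds.
        return math.inf
    return gap_m / closing_speed
```

For example, a 30 m gap closed at 5 m/s gives a TTC of 6 s. Because the speeds are held fixed, the value says nothing about how likely the leader is to brake or change lane, which is the kind of uncertainty the proposed metric targets.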

A. Related Work
One major shortcoming of the current SMoS is that they are mostly deterministic, which neglects motion uncertainties that are inherent in the behaviors of surrounding road users or in the perception and actuation of driving support/automation systems. Moreover, many SMoS are based on the assumption that interacting vehicles move with unchanged velocity/acceleration [3]. Consequently, such SMoS may fail to provide proactive and timely risk assessment, since they cannot anticipate the uncertain motion dynamics [12]. To address the motion uncertainties, several probabilistic approaches have been integrated to calculate SMoS. Based on causal analysis, Davis et al. [13] addressed the motion uncertainties with different initial conditions. The crash probability was expressed as a mixture of probabilities over different sets of initial vehicle conditions and braking decelerations, and the mixing probabilities are governed by the evasive action of the subject vehicle. Following [13], Kuang et al. [14] further developed an Aggregated Crash Index by adding disturbances into the initial conditions. The motion uncertainties have also been considered with future dynamic motions. For instance, Saunier and Sayed [15] defined a set of motion patterns for both subject and surrounding vehicles and obtained the corresponding likelihood in terms of the collision time using a learning-based model. Similarly, Jansson [16] generated a set of motion predictions of the subject vehicle as a tree of possible trajectories using collision avoidance theory. However, this method suffers from computational complexity, especially for long-term horizon predictions. Besides, the majority of SMoS do not consider the crash severity (notable exceptions are, e.g., [3], [17]), which could significantly impact the driving risk assessment [18].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, the artificial potential field, where a surrounding object to the vehicle is modelled as a potential field, has been applied to model driving risk considering the influence of driver, vehicle and road characteristics [17]. Inspired by the paradigm of artificial field theory, Mullakkal-Babu et al. [3] further developed a probabilistic driving risk field (PDRF) metric to assess the driving risk for on-road vehicles, where a normal distribution of acceleration is pre-defined to approximate the motion uncertainty of surrounding vehicles; the pre-defined parameters may impact the adaptability of the proposed risk metric. Moreover, the interactions between subject and surrounding vehicles are not considered in PDRF [3], which could impact the collision probability calculation. How to provide more accurate prediction of surrounding vehicle motion and integrate them into a driving risk metric still remains unexplored.
The motion uncertainties are mainly formed by the stochastic behaviors of surrounding objects, corresponding to trajectory and lane-change prediction problems [12]. Traditionally, trajectory predictions are categorized into three types: physics-based [19], [20], maneuver-based [21], [22], and interaction-aware ones [23], [24]. A comprehensive survey of trajectory predictions is provided in [25]. Vehicle lane-change intention prediction also plays an important role in anticipating uncertain behaviors of surrounding vehicles, especially on highways with a structured environment. Various lane-change intention prediction models have been proposed [26], and can be divided into four categories, i.e., generative models [27], [28], cognitive models [29], [30], discriminative models [31], [32] and deep learning [33]-[35]. Recently, due to the development of deep learning theories and parallel computation hardware, tremendous achievements for lane-change intention predictions have been made through deep learning approaches, especially Long Short-Term Memory (LSTM) networks, which are capable of handling time series forecasting problems. For instance, Scheel et al. [33] developed an attention-based LSTM model, where the heading angle of the surrounding vehicle is used as input. A driver intention inference model based on an LSTM ensemble was designed for highway lane-change maneuvers, while a facial features detection system was developed to obtain drivers' head gesture and eye gaze dynamics [34]. However, these features typically are not provided in naturalistic driving datasets (NDD), e.g., NGSIM [36] and highD [37]. Huang et al. [35] proposed an LSTM-based model for lane-change intention prediction on highways, and further integrated it into a risk assessment framework, while the crash severity is not considered.
As more NDD, e.g. NGSIM [36] and highD [37], have been collected to underpin data-driven approaches, a number of deep learning studies for multi-modal trajectory prediction have been conducted [38]-[40]. One advantage of the multi-modal trajectory prediction models is that they can not only provide multiple predicted trajectories, but also output probabilities for each pre-defined maneuver mode, e.g., the lane-change maneuver categories. In particular, Deo and Trivedi [38] defined three lane-change modes to categorize the lane-keeping, turning-left and turning-right maneuvers, and the trajectory was then predicted for each mode. In doing so, the vehicle lane-change intentions are predicted simultaneously. However, to realize superior trajectory prediction performance, the location information of surrounding vehicles in [38] is aggregated by convolutional and maxpool layers in the designed deep neural network, which in turn could have negative impacts on distinguishing the lane locations (we will further discuss this issue later). This motivates us to develop a more accurate two-stage multi-modal prediction model based on [38], to better address the motion uncertainties for risk assessment.

B. Objective and Contributions
In this work, we aim to develop a probabilistic driving risk metric on highways leveraging two-stage multi-modal trajectory predictions for both online and offline applications. Here the multi-modes refer to different lane-change maneuvers, e.g., lane-keeping and turning-left/-right on highways. The proposed driving risk metric is measured by the maximum risk value over different time instants within a prediction horizon. At each time instant, the risk of the subject vehicle is estimated as the sum of weighted risks over each maneuver mode in a finite set of maneuver possibilities. Under each mode, the risk is calculated as the product of three factors: lane-change maneuver mode probability, collision probability and expected crash severity. To achieve accurate estimation of these three factors, we establish an LSTM-based two-stage multi-modal prediction model that consists of a lane-change intention prediction module and a trajectory prediction module. The predicted trajectory results are represented as bivariate normal distributions under each maneuver mode to address two-dimensional motion uncertainties.
The contributions in this work are:
• Leveraging the multi-modal trajectory prediction model, the proposed driving risk metric can anticipate risk in the future, and relies neither on a system dynamics model nor on a known normal distribution of surrounding vehicles. Based on the empirical trajectory dataset highD [37] and simulated highway scenarios, the proposed safety risk metric is validated to be capable of real-time and offline applications, and able to correctly classify crash and non-crash events. Moreover, it is verified to be effective in identifying potential crashes earlier thanks to the employed prediction model.
• Two prediction models with different input features are employed for the lane-change prediction module and the trajectory prediction module, respectively. Unlike existing multi-modal trajectory prediction models [38]-[40], we first specifically design an LSTM-based lane-change intention model considering not only historic subject and surrounding vehicle trajectories, but also additional subject features, e.g., lane related information and lateral deviation that we will define later. We then adopt the trajectory prediction model in [38]. The two-stage structure including two different prediction modules provides more accurate prediction results compared to a state-of-the-art prediction model. The proposed prediction model is trained and tested with highD, and achieves superior performance in terms of both lane-change and trajectory predictions.
The remainder of the paper is organized as follows: Section II introduces an existing driving risk assessment metric, PDRF, identifies its shortcomings, and then proposes a driving risk metric on highways, which leverages two-stage multi-modal trajectory predictions to address vehicle motion uncertainties; Section III details the proposed LSTM-based multi-modal prediction model, consisting of a lane-change and a trajectory prediction module; simulations are conducted in Section IV to verify the superior performance of the proposed multi-modal prediction model; the prediction accuracy, timeliness, and computational efficiency of the proposed driving risk metric are validated in Section V; finally, conclusions are drawn in Section VI to highlight the important contributions of our work and potential future directions.

II. DRIVING RISK METRIC ON HIGHWAYS
In this section, we first introduce an existing probabilistic driving risk field metric, i.e., PDRF in [3] and identify its shortcomings. We then propose a new risk metric P-PDRF leveraging a two-stage multi-modal trajectory prediction model, which we will introduce later.

A. Probabilistic Driving Risk Field
Assuming that crashes result from a temporal sequence of events including conflict events prior to a crash [5], the driving risk is commonly described by the SMoS. This is because SMoS can characterize initial conditions of a regular (non-crash) event as a surrogate for the likelihood of crash events. However, uncertainties are inherent components of driving risk assessment, while SMoS do not typically account for uncertainties, assuming deterministic motion of interacting vehicles with unchanged velocity/acceleration. To address the motion uncertainties, Mullakkal-Babu et al. [3] follow the vehicle functional safety standard ISO 26262-1 [41] and define the driving risk as the consequence of the subject vehicle maintaining its planned trajectory, despite the unknown motion of the surrounding vehicle. The general idea of PDRF can thus be summarised as follows: the driving risk is estimated with a risk field at the subject vehicle's future location, and then, at the end of the prediction horizon, the driving risk is formulated as the product of two factors: a collision probability considering the unknown future motion of the surrounding vehicle, and an expected crash severity.
Since the calculation of collision probability in the proposed risk metric is different from that in PDRF, here we only provide a brief description of the collision probability calculation in PDRF, and point out its shortcomings. To calculate the collision probability in PDRF, the motion predictions of subject and surrounding vehicles are modelled separately. For the subject vehicle, we assume that its future motion is known in advance: at each time step, the subject initially has an originally planned trajectory. The risk metric can be employed as one of the indicators to decide whether the previously planned subject trajectory is still suitable at the current time. If yes, the subject continues the planned trajectory; otherwise, an updated future subject trajectory should be generated via the motion planning module, which is out of the scope of this work.
The dynamic state of a surrounding vehicle is denoted by the position $(x, y)$ of its center of mass and the velocity $(v_x, v_y)$ along the longitudinal and lateral directions, resulting in Eq. (1), where $a_x$ and $a_y$ are the accelerations along the longitudinal and lateral directions, respectively. The dynamic state is to be propagated from the current time instant $t$ to a future time $t + t_f$. The surrounding vehicle motion is also subject to several physical constraints, including non-holonomic behavior, backward motion prohibition, and acceleration range limitation [3]. Given the predicted future motions of the surrounding vehicles, PDRF further assumes their accelerations have a normal distribution with a specific mean (typically set as 0 since we do not have trajectory planning information of the surrounding vehicle) and standard deviation. For a single future time instant, the collision probability can now be calculated as the double integral of the probability density function (PDF) of the surrounding vehicle within the intersection area.
The expected crash severity $s$ considering the vehicle mass and velocity is constructed as in Eq. (2), where $M$ is the mass of the subject vehicle, $\beta = \frac{M_{sur}}{M_{sur} + M}$ is the mass ratio with $M_{sur}$ denoting the mass of the surrounding vehicle, and $V$ is the relative velocity between the subject and surrounding vehicles. The crash severity in Eq. (2) is established under the assumption that the collision is inelastic, indicating that both vehicles would move together after the crash. Besides, the relative velocity between the vehicles is calculated using the current velocities at time instant $t$, rather than the velocities at a future time $t + t_f$.
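As a hedged illustration of how a severity term of this kind can follow from the inelastic-collision assumption, the short derivation below uses momentum conservation to obtain the velocity change of the subject vehicle and takes the associated kinetic energy as the severity. The final expression is an assumed form that is consistent with the definitions of $M$, $\beta$ and $V$ above; it is not quoted from the source or from [3].

```latex
% Assumed sketch: perfectly inelastic collision, subject mass M with velocity v_1,
% surrounding-vehicle mass M_{sur} with velocity v_2, relative velocity V = v_2 - v_1.
\begin{align}
  v_c &= \frac{M v_1 + M_{sur} v_2}{M + M_{sur}}
      && \text{(common post-crash velocity)} \\
  v_c - v_1 &= \frac{M_{sur}}{M_{sur} + M}\,(v_2 - v_1) = \beta V
      && \text{(velocity change of the subject vehicle)} \\
  s &= \tfrac{1}{2}\, M \,(\beta V)^2 = \tfrac{1}{2}\, M \beta^2 |V|^2
      && \text{(assumed severity: energy tied to that change)}
\end{align}
```

Under this reading, a heavier surrounding vehicle (larger $\beta$) or a higher relative speed increases the severity borne by the subject vehicle.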
The PDRF metric is then calculated as the product of the collision probability and the expected crash severity, $c^{(t+t_f)} \cdot s$, where $c^{(t+t_f)}$ is the collision probability obtained at time $t + t_f$ [3].
Remark 1: Although PDRF is effective in identifying the factors influencing driving safety, several shortcomings exist. First, the crash severity is estimated using the velocity at the current time, as the future velocity of the surrounding vehicle is not predicted. Second, the accelerations are assumed to have a normal distribution, and the corresponding parameters need to be pre-defined; this may impact the adaptability of the risk metric. Third, the interactions between the subject and surrounding vehicles are not considered in PDRF, which could also impact the collision probability calculation.

B. Prediction Based Driving Risk Metric
Several shortcomings identified in PDRF above motivate us to develop a prediction-based risk metric, P-PDRF, for highways in this section. The proposed driving risk metric is given by the maximum risk value over different time instants within a prediction horizon. At each time instant, the risk is estimated as the sum of weighted risks over each maneuver mode, and each weighted risk is calculated as the product of three factors, i.e., lane-change maneuver mode probability, collision probability and expected crash severity. The three factors are accurately estimated leveraging the proposed two-stage multi-modal trajectory prediction model in Section III. An overview of the P-PDRF calculation leveraging the proposed prediction model is illustrated in Fig. 1. Given a current time $t$ and a future time instant $t + t_f$ to be predicted, the risk metric P-PDRF is expressed as

$$\text{P-PDRF} = \max_{\tau \in \{\delta t,\, 2\delta t,\, \ldots,\, t_f\}} \; \sum_{i=1}^{|m|} p_i^{(t+\tau)} \, c_i^{(t+\tau)} \, s_i^{(t+\tau)}, \qquad (4)$$

where $p_i^{(t+\tau)}$, $c_i^{(t+\tau)}$ and $s_i^{(t+\tau)}$ denote the lane-change maneuver mode probability, collision probability and expected crash severity at a future time $t + \tau$ under discrete mode $i$. Here $t + \tau$ is a time instant within the prediction horizon $(t, t + t_f]$ with an increment $\delta t$, and $|m|$ denotes the number of maneuver modes. In highway driving, there are typically three lane-change maneuver modes: lane-keeping, turning-left and turning-right. The maneuver mode probability is obtained from the lane-change intention prediction module, as shown at the top of Fig. 2. For each maneuver mode at each future time instant, the prediction model outputs a bivariate normal distribution, i.e., the means, standard deviations and correlation of the surrounding vehicle's position (in total five parameters in a two-dimensional plane), and the vehicle velocity can be predicted by interpolating the predicted mean positions. The predicted subject vehicle velocity is directly obtained under the assumption that the future motion of the subject vehicle is known in advance.
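The aggregation described above (a weighted sum over maneuver modes at each time instant, followed by a maximum over the horizon) can be sketched as follows. The function name and the nested-list layout, one inner list of per-mode values for each time instant, are assumptions made for illustration.

```python
from typing import Sequence


def p_pdrf(mode_prob: Sequence[Sequence[float]],
           collision_prob: Sequence[Sequence[float]],
           severity: Sequence[Sequence[float]]) -> float:
    """Maximum over time instants of the mode-weighted risk.

    Each argument is indexed as [time_instant][mode]; all three are
    assumed to have the same shape (hypothetical data layout).
    """
    risk_per_instant = [
        # Weighted risk at one instant: sum over modes of p * c * s.
        sum(p * c * s for p, c, s in zip(p_t, c_t, s_t))
        for p_t, c_t, s_t in zip(mode_prob, collision_prob, severity)
    ]
    return max(risk_per_instant)
```

Taking the maximum rather than the value at the end of the horizon means that a short-lived but severe conflict in the middle of the horizon is not masked by a safe final state.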
The collision probability between vehicles is calculated under each maneuver mode. Given a predicted maneuver mode $i$, the PDF of the surrounding vehicle position is assumed to be a bivariate normal distribution:

$$f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\!\left( -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho (x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2} \right] \right), \qquad (5)$$

where the five parameters $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\rho$ denote the predicted means and standard deviations along the longitudinal and lateral directions, and the correlation, at future time instant $t + \tau$ for each maneuver mode $i$, respectively. These parameters are obtained from the multi-modal trajectory prediction model in Section III-B. The superscript $(t + \tau)$ and subscript $i$ of the five parameters are omitted for the sake of brevity.
The above PDF of the surrounding vehicle position is adopted to address objective motion uncertainties. When the subject vehicle is driven by a human driver, the driver perceives driving risk not only through objective motion uncertainties, but also through a subjective risk caused by surrounding vehicles [42], [43]. Thus, when the proposed risk metric is applied to understand human-driver behaviors, an additional bivariate normal distribution is added as a subjective risk factor. Assuming that the two bivariate normal distributions are independent and have the same values of mean and correlation, the PDF of the surrounding vehicle position for human drivers is given by Eq. (6), where $\sigma_x^h$ and $\sigma_y^h$ are the standard deviations of the additional bivariate normal distribution along the two directions, respectively.
As the future motion of the subject vehicle is assumed known in advance, the occupancy of the subject vehicle at future time instant $t + \tau$ can be expressed as a rectangle, whose center is determined by its future position $(x_s, y_s)$ and whose size is determined by the vehicle length and width. Therefore, the collision probability at future time $t + \tau$ is constructed as a double integral

$$c_i^{(t+\tau)} = \int_{x_s - (L_{sub}+L_{sur})/2}^{x_s + (L_{sub}+L_{sur})/2} \int_{y_s - (W_{sub}+W_{sur})/2}^{y_s + (W_{sub}+W_{sur})/2} f(x, y) \, dy \, dx, \qquad (7)$$

where $f(x, y)$ is the PDF of the surrounding vehicle position, the integration limits are $x_s \pm (L_{sub} + L_{sur})/2$ and $y_s \pm (W_{sub} + W_{sur})/2$, and $L_{sub}$, $L_{sur}$, $W_{sub}$, $W_{sur}$ are the lengths and widths of the subject and surrounding vehicles, respectively. When estimating the collision probability for a human driver, the PDF in Eq. (5) is replaced by that in Eq. (6). Compared to the calculation of collision probability with pre-defined distribution parameters in PDRF [3], the surrounding vehicle position in the proposed risk metric is represented using specific output values from a predictor. Specifically, the predicted position of the surrounding vehicle follows a bivariate normal distribution, i.e., Eq. (5) or Eq. (6), which leads to an efficient double integral as in Eq. (7).
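A minimal numerical sketch of this collision probability is given below: it integrates a bivariate normal density over the rectangle spanned by the combined vehicle lengths and widths around the subject position, using a plain midpoint rule. The function names, the argument order and the grid resolution `n` are assumptions for illustration; a production implementation would more likely use a dedicated bivariate normal CDF routine.

```python
import math


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Density of a bivariate normal with correlation rho at (x, y)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2)
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_rho2
    return math.exp(-0.5 * q) / norm


def collision_probability(x_s, y_s, len_sub, wid_sub, len_sur, wid_sur,
                          mu_x, mu_y, sigma_x, sigma_y, rho, n=200):
    """Midpoint-rule integral of the density over the overlap rectangle.

    (x_s, y_s) is the subject position; the rectangle half-sizes are
    (len_sub + len_sur) / 2 and (wid_sub + wid_sur) / 2.
    """
    half_l = 0.5 * (len_sub + len_sur)
    half_w = 0.5 * (wid_sub + wid_sur)
    dx = 2.0 * half_l / n
    dy = 2.0 * half_w / n
    total = 0.0
    for i in range(n):
        x = x_s - half_l + (i + 0.5) * dx
        for j in range(n):
            y = y_s - half_w + (j + 0.5) * dy
            total += bivariate_normal_pdf(x, y, mu_x, mu_y,
                                          sigma_x, sigma_y, rho)
    return total * dx * dy
```

With zero correlation the integral factorizes into two one-dimensional normal probabilities, which gives a convenient sanity check against `math.erf`.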
The crash severity at each future time instant can now be more accurately estimated as in Eq. (8), where $V_i^{(t+\tau)}$ denotes the predicted relative velocity at time $t + \tau$ under maneuver mode $i$, which is obtained from the prediction model using trajectory interpolation.
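One simple way to recover velocities from a sequence of predicted mean positions is finite differencing, sketched below. The source only states that trajectory interpolation is used, so the central-difference scheme, the function name and the uniform time step `dt` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def velocities_from_means(means: Sequence[Tuple[float, float]],
                          dt: float) -> List[Tuple[float, float]]:
    """Approximate (v_x, v_y) at each predicted instant.

    means : predicted mean positions (x, y), uniformly spaced by dt seconds
    Uses central differences in the interior and one-sided differences
    at the first and last instants.
    """
    n = len(means)
    if n < 2:
        raise ValueError("need at least two predicted positions")
    velocities = []
    for k in range(n):
        lo = max(k - 1, 0)
        hi = min(k + 1, n - 1)
        span = (hi - lo) * dt
        v_x = (means[hi][0] - means[lo][0]) / span
        v_y = (means[hi][1] - means[lo][1]) / span
        velocities.append((v_x, v_y))
    return velocities
```

The relative velocity at each instant would then be the difference between this estimate and the known planned velocity of the subject vehicle.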
Through constructing the proposed prediction based risk metric P-PDRF, the first two shortcomings of PDRF have been addressed. The third one is considered in the proposed prediction model, which simultaneously employs subject and surrounding vehicle information to address the vehicle interactions.

III. TWO-STAGE MULTI-MODAL TRAJECTORY PREDICTION
The computation of P-PDRF requires future trajectory predictions of surrounding vehicles, which are challenging in interactive scenarios involving maneuver decisions. In this section, to underpin the P-PDRF calculation, we propose a two-stage multi-modal prediction model as illustrated in Fig. 2, which consists of a lane-change intention prediction module and a trajectory prediction module. Here the multi-modes refer to the different lane-change maneuver modes predicted by the lane-change intention prediction module. Unlike existing multi-modal trajectory prediction models [38]-[40], we first predict the lane-change maneuver mode and then, at the second stage, predict the corresponding trajectories for each maneuver mode. The output of the lane-change intention prediction model is the probability for each pre-defined lane-change maneuver mode, and serves as additional input for the trajectory prediction model. The output of the trajectory prediction model is a time series of probabilistic trajectories, where the motion uncertainties are captured by the predicted bivariate normal distribution parameters. We will show in the experimental section that such a separate prediction architecture gives better prediction results. The details of the two-stage prediction model and its training process are introduced as follows.

A. Lane-Change Maneuver Prediction Model
Generally, the lane-change maneuver prediction model is formulated to estimate the probability distribution of the future lane-change maneuver mode of a vehicle $k$ at each time instant from $t + 1$ to $t + t_f$, based on historic information of vehicle $k$ and its neighbors $N_k := \{1, 2, \ldots, |N_k|\}$ from time $t - t_h$ to $t$. Note that the vehicle $k$ in the prediction model corresponds to a surrounding vehicle when estimating the driving risk in Section II. As shown in Fig. 2, we use vehicle $k$ to denote the vehicle being predicted, and name its surrounding vehicles as neighbors $N_k$ to avoid confusion in this section.
The input of the lane-change intention prediction model contains several components. First, the historic positions of vehicle $k$ are considered. The positions are recorded from time $t - t_h$ to $t$, along the longitudinal and lateral directions respectively, using a stationary coordinate frame with the origin fixed at the mass center of vehicle $k$ at time $t$ [38]. Second, two binary values, which indicate whether vehicle $k$ can turn left or turn right, have been added as additional input features [35]. This is because when the vehicle is in the leftmost/rightmost lane, it clearly cannot turn further left/right, and this additional lane related information could improve the prediction accuracy. Third, the lateral deviation between vehicle $k$ and its current lane center is also an important indicator for lane-change intention prediction [33], [44]. The deviation value is normalized between 0 and 1, where 0/1 represents that the vehicle is at the leftmost/rightmost edge of the current lane, and 0.5 that it is at the lane center. Fourth, the historic positions of the neighbors $N_k$ are used as well to represent the vehicle interactions. The location relations between vehicle $k$ and its neighbors can be classified into eight categories: preceding, following, left/right preceding, left/right alongside and left/right following, corresponding to the eight coloured areas shown in Fig. 2. Thus eight LC-LSTM encoders are employed to process each trajectory from the eight categories. A masking layer is added before sending the historic trajectories of the neighbors to the LC-LSTM encoders, in case there is no neighbor vehicle for certain categories. The input of the model, denoted $X$, then collects, at each historic time instant, the longitudinal and lateral positions of vehicle $k$ and its neighbors from vehicle 1 to $|N_k|$, the two binary values $b$ that indicate whether vehicle $k$ can turn left/right, and the deviation value $d$ from the current lane center.
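The two lane related features described above can be sketched as follows. The lane indexing (0 for the leftmost lane), the convention that the lateral coordinate grows towards the right, and the function and parameter names are all assumptions made for illustration; the source does not specify how these quantities are extracted from the data.

```python
from typing import Tuple


def lane_features(lane_index: int, num_lanes: int, y: float,
                  lane_left_edge: float, lane_width: float) -> Tuple[int, int, float]:
    """Return (can_turn_left, can_turn_right, normalized_deviation).

    lane_index     : 0 for the leftmost lane (assumed convention)
    y              : lateral position of the vehicle centre
    lane_left_edge : lateral position of the current lane's left marking
    The deviation is 0 at the left edge, 0.5 at the lane centre and
    1 at the right edge, clipped to [0, 1].
    """
    can_turn_left = 1 if lane_index > 0 else 0
    can_turn_right = 1 if lane_index < num_lanes - 1 else 0
    deviation = (y - lane_left_edge) / lane_width
    deviation = min(max(deviation, 0.0), 1.0)
    return can_turn_left, can_turn_right, deviation
```

A vehicle centred in the leftmost of three lanes would thus be encoded as `(0, 1, 0.5)`: it cannot turn left, can turn right, and sits at the lane centre.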
The output of the model is a probability distribution over the lane-change maneuver modes $m_i$ from time $t + 1$ to $t + t_f$, where $m_i$ is the $i$-th maneuver mode. Notice that at each time instant, the probabilities over all maneuver modes sum to one. In our work, to enhance the trajectory prediction accuracy at the second stage, three lane-change maneuver modes, i.e., lane-keeping and turning left/right, are defined as follows. Given a time instant $t$ and a future time instant $t + t_f$ to be predicted, we check the lateral locations of vehicle $k$ at the two time instants: if the vehicle is in the same lane, then the maneuver mode is labelled as lane-keeping; if vehicle $k$ at $t + t_f$ has crossed the left/right lane marking, then the maneuver mode is labelled as turning left/right. The same lane-change definitions are adopted in [33], [45]. One advantage of the above maneuver definition is that the labelling process is simple and straightforward. Besides, the future vehicle locations can be more accurately predicted, since the defined lane-change maneuver modes indicate whether vehicle $k$ has moved to another lane within the prediction horizon.
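The labelling rule above can be sketched as a comparison of the lane occupied at the current instant and at the end of the horizon. The per-frame lane identifiers, the convention that they increase towards the right, and the label strings are assumptions made for illustration.

```python
from typing import Sequence


def label_maneuver(lane_ids: Sequence[int], t_idx: int, horizon_steps: int) -> str:
    """Label one sample as 'keep', 'left' or 'right'.

    lane_ids      : lane identifier of the vehicle at each frame,
                    assumed to increase towards the right
    t_idx         : index of the current time instant t
    horizon_steps : number of frames between t and t + t_f
    """
    lane_now = lane_ids[t_idx]
    lane_future = lane_ids[t_idx + horizon_steps]
    if lane_future == lane_now:
        return "keep"
    return "left" if lane_future < lane_now else "right"
```

Because the rule only compares the two end points, every frame of the same trajectory receives its own label, which matches the per-instant sampling used for training.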
The LC encoder illustrated in Fig. 2 is designed to learn the dynamics and interactions of vehicle motions. At each time instant $t$, the historic information $X$ from time $t - t_h$ to $t$ is provided as input. The LC-LSTM encoders have the same parameters after training. The obtained encodings of vehicle $k$ and its neighbors are then passed to the fully connected layers and concatenated together. Finally, in the LC decoder component, an LSTM decoder (i.e., LC-LSTM decoder) combined with a Softmax output layer [46] is designed to generate maneuver probabilities, which serve as part of the input of the trajectory prediction model.

B. Trajectory Prediction Model
The designed multi-modal trajectory prediction model is shown at the bottom of Fig. 2, including an LSTM-based encoder (i.e., T Encoder), convolutional social pooling layers and a maneuver-based LSTM decoder. Compared to the work in [38], the output of the decoder in our work is generated using the maneuver probabilities from the first-stage lane-change intention prediction model. Besides, we only consider three lane-change maneuver modes, while two longitudinal modes are classified in [38] as well.
The input of the trajectory prediction encoder consists of the historic trajectories of the subject and the surrounding vehicles, denoted $X_T$. Here we have $X_T \subset X$, since besides the historic trajectories, the input of lane-change intention prediction also contains other features. To simplify the expression, $X$ is used as the input for both modules hereafter.
The output $P(Y|X)$ is a conditional distribution over the future trajectory $Y$, which collects the predicted positions of vehicle $k$ at each time instant within the prediction horizon.
Given the three defined lane-change maneuvers $m_i$ $(i = 1, 2, 3)$, the probabilistic multi-modal distribution is calculated as

$$P(Y|X) = \sum_{i=1}^{3} P(Y \,|\, m_i, X)\, P(m_i \,|\, X),$$

where each $P(Y \,|\, m_i, X)$ is a time series of bivariate normal distributions, one at each future time instant from $t + 1$ to $t + t_f$, parameterized by the means and variances of the future locations.
The input of the trajectory prediction model is processed differently from that of the lane-change intention prediction model. In line with [38], the area around vehicle $k$ is divided into a spatial 13 × 3 grid, where each column corresponds to a single lane, and the rows are separated by a distance of 15 feet (≈4.57 meters), which approximately equals the length of one car. The social tensor is formed by populating this grid with the locations of the neighbors, and is then processed by two convolutional layers and one maxpool layer. For vehicle $k$, a fully connected layer is directly applied to represent the vehicle dynamics, and its output is then concatenated with the convolutional social pooling results. Here the convolutional social pooling procedure is mainly designed to address the following issue: in a fully connected layer, grid cells adjacent to each other become equivalent to cells far away from each other. This can lead to problems in generalization to a test set, especially if the vehicles can be in various different spatial configurations. Finally, the maneuver-based decoder outputs a multi-modal predictive distribution for the future motion of vehicle $k$.
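The mapping from a neighbor's relative location to a cell of the 13 × 3 grid can be sketched as below. Placing vehicle $k$ in the centre cell, letting the row index grow in the driving direction, and the function and constant names are assumptions made for illustration; the exact indexing used in [38] may differ.

```python
import math
from typing import Optional, Tuple

GRID_ROWS, GRID_COLS = 13, 3     # rows along the road, one column per lane
CELL_LENGTH_FT = 15.0            # row spacing stated in the text


def grid_cell(dx_ft: float, lane_offset: int) -> Optional[Tuple[int, int]]:
    """Return the (row, col) cell of a neighbor, or None if outside the grid.

    dx_ft       : longitudinal offset of the neighbor relative to vehicle k,
                  positive when the neighbor is ahead (assumed convention)
    lane_offset : -1 for the left lane, 0 for the same lane, +1 for the right lane
    """
    col = lane_offset + 1
    if not 0 <= col < GRID_COLS:
        return None
    centre_row = GRID_ROWS // 2
    # Round the offset to the nearest cell around the centre row.
    row = centre_row + math.floor(dx_ft / CELL_LENGTH_FT + 0.5)
    if not 0 <= row < GRID_ROWS:
        return None
    return row, col
```

Neighbors that fall outside the grid are simply dropped, so the social tensor always has a fixed size regardless of traffic density.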
We note that the proposed prediction model could be modified for conflict prediction through developing a series of conflict identification criteria and infusing additional relevant input features. However, traffic conflict identification itself is not trivial [47]. A traffic conflict could be defined when a risk metric (e.g., TTC) crosses a pre-defined threshold while it largely ignores other factors that influence traffic conflicts such as speed variance, traffic density, speed and weather conditions. Besides, our aim is to accurately estimate the collision probability based on trajectory predictions of surrounding vehicles; modifying the model for conflict prediction is beyond the research scope of this work.

C. Model Training
In the existing literature [38], [40], when a single neural network is established for multi-modal trajectory prediction, the network is typically trained to minimize the negative log likelihood (NLL) of its conditional distributions as where $m_{true}$ denotes the actual lane-change maneuver mode for each sample in the training dataset. Note that we do not directly employ (15) to train our prediction model. Instead, given the two independent neural networks established in our work, we can separate the minimization objective as where $\theta_T$ and $\theta_L$ are the independent network parameters of the trajectory module and the lane-change intention module, respectively. Therefore, the two prediction modules can be trained separately. For the trajectory prediction module, it is trained as min
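The separation of the training objective can be illustrated numerically. The sketch below assumes the common form in which the joint likelihood of the true mode is the product of a trajectory term (a bivariate normal density per future step) and an intention term (the predicted probability of the true maneuver), so that its negative logarithm splits into two sums that each depend on only one network. The function names and the parameter ordering `(mu_x, mu_y, sigma_x, sigma_y, rho)` are hypothetical and are not the notation of the equations in this section.

```python
import math


def bivariate_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log likelihood of (x, y) under a bivariate normal."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_rho2)
    log_norm = math.log(
        2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2)
    )
    return quad + log_norm


def trajectory_loss(targets, predictions):
    """Sum of per-step NLLs for the trajectory predicted under the true mode."""
    return sum(
        bivariate_nll(x, y, *params)
        for (x, y), params in zip(targets, predictions)
    )


def intention_loss(mode_probs, true_mode):
    """Negative log probability assigned to the true maneuver mode."""
    return -math.log(mode_probs[true_mode])


def joint_loss(targets, predictions, mode_probs, true_mode):
    """Joint NLL: the two terms add, so each can be minimized on its own."""
    return trajectory_loss(targets, predictions) + intention_loss(
        mode_probs, true_mode
    )
```

Because the first term involves only the trajectory network and the second only the intention network, minimizing each term independently also minimizes their sum under this assumed factorization.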

IV. EXPERIMENTS FOR PREDICTION MODELS
In this section, we first introduce the highD dataset and experimental setups for prediction model training and testing. Then the multi-modal trajectory prediction model (denoted as M-C-LSTM) is tested through ablation studies and comparisons with a state-of-the-art prediction model.

A. Dataset and Experimental Setup
We utilize the highD dataset [37], which contains bird's-eye-view naturalistic driving data on German highways, to train and test the prediction model. Compared to the previously widely used NGSIM dataset [36], the more recent highD dataset was collected by drones with a high-resolution camera and contains smoother vehicle trajectories. We take the first 20 sub-datasets and randomly split them into training and testing sets. The prediction is conducted at each time instant along a trajectory, which means that one trajectory yields a different data sample at each time instant. Moreover, the collected data is naturally imbalanced, as in most cases the vehicle maneuver is lane keeping. To deal with this, in the default setting, we randomly select an equal number of cases for each of the three lane-change categories. In the end, 168,390 samples (56,130 for each maneuver mode) and 25,113 samples (8,371 for each mode) are selected as training and testing data, respectively. The original dataset sampling rate is 25 Hz, and we downsample the trajectories by a factor of 2 before feeding them to the LSTMs to reduce the model complexity. By default, we use 2 seconds of historic trajectory to predict the maneuver mode within a 3-second horizon.
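The two preprocessing steps described above, downsampling by a factor of 2 and balancing the maneuver classes, can be sketched as follows. The helper names, the `(features, mode)` sample layout and the rule of using the rarest mode to set the per-class count are assumptions for illustration; the text only states that an equal number of cases is selected at random for each category.

```python
import random
from collections import defaultdict


def downsample(track, factor=2):
    """Keep every `factor`-th frame of a trajectory (25 Hz -> 12.5 Hz for 2)."""
    return track[::factor]


def balance_by_mode(samples, seed=0):
    """Randomly keep the same number of samples for every maneuver mode.

    `samples` is a list of (features, mode) pairs; the per-mode count is set
    by the rarest mode (an assumption of this sketch). The seed only makes
    the example repeatable.
    """
    by_mode = defaultdict(list)
    for sample in samples:
        by_mode[sample[1]].append(sample)
    n_per_mode = min(len(group) for group in by_mode.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_mode.values():
        balanced.extend(rng.sample(group, n_per_mode))
    rng.shuffle(balanced)
    return balanced
```

With a 25 Hz recording, a 2-second history of 50 frames becomes 25 frames after `downsample`.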
The prediction models are trained using Adam with a learning rate of 0.001. The sizes of the encoder and decoder LSTMs are 64 and 128, respectively. A fully connected layer of size 32 is employed to obtain the vehicle information encoding. The convolutional social pooling layers consist of a social tensor layer, two convolutional layers, and a max pooling layer. Specifically, the social tensor layer is first formed by populating the spatial grid configuration with the locations of the vehicle being predicted. To learn locally useful features within the spatial grid of the social tensor, a 3 × 3 convolutional layer with 64 filters (each 3 × 3 filter slides over the input and performs an element-wise multiplication) and a 3 × 1 convolutional layer with 16 filters are then applied. Finally, a max pooling layer, which slides a 2 × 1 filter over the input to take the maximum value of each 2 × 1 grid region, is applied. More detailed descriptions of the convolutional social pooling layers can be found in [38]. The leaky-ReLU activation with α = 0.1 is applied for all layers, and the batch size is set to 128. The model is implemented using PyTorch, and the proposed driving risk metric is coded in MATLAB.
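To make the layer sizes concrete, the following sketch traces the tensor shape through the three social pooling layers for a 13 × 3 input grid. The padding and stride settings are not stated in the text, so the defaults used here (no padding in the convolutions, pooling stride equal to the window size) are assumptions, and `pool_padding` is exposed only to show how the result depends on that choice.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def social_pooling_shape(rows=13, cols=3, pool_padding=0):
    """Trace (rows, cols, channels) through the three layers described above."""
    # 3x3 convolution with 64 filters, no padding (assumed).
    rows, cols, channels = conv_out(rows, 3), conv_out(cols, 3), 64
    # 3x1 convolution with 16 filters, no padding (assumed).
    rows, cols, channels = conv_out(rows, 3), conv_out(cols, 1), 16
    # 2x1 max pooling with stride 2 along the rows (assumed).
    rows = conv_out(rows, 2, stride=2, padding=pool_padding)
    return rows, cols, channels
```

Under these assumptions the grid shrinks from 13 × 3 to 11 × 1 and then 9 × 1 before pooling, so the size of the flattened social encoding depends on whether the pooling layer pads its input.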

B. Lane-Change Intention Prediction
As illustrated in Fig. 2, the proposed multi-modal trajectory prediction model includes a lane-change intention prediction module. The prediction accuracy of the lane-change intention module is crucial, since it provides maneuver mode probabilities as input to the trajectory prediction module.
To validate the performance of the proposed lane-change intention model, an ablation study is conducted as follows. First, an LSTM-based encoder-decoder network, which only utilizes the historic track positions of the vehicle being predicted without considering surrounding information, is tested as a baseline method. Given its LSTM-based encoder-decoder structure, this prediction model is regarded as a typical Vanilla LSTM (V-LSTM) [48]. Then, in line with [35], two binary values, which indicate whether the vehicle being predicted can turn left or turn right, are added as additional input features. The additional lane-related information could potentially improve the prediction accuracy. The LSTM with additional lane information is denoted as L-LSTM. Alternatively, based on V-LSTM, the surrounding vehicle information can be considered using an LSTM encoder and a fully connected layer, in the same way as the dynamics of the vehicle being predicted are represented (denoted as S-LSTM). We also consider combining the lane information and the surrounding information, and denote the resulting model as LS-LSTM. Moreover, the lateral deviation between the vehicle being predicted and its current lane center is also an important indicator for lane-change intention prediction [33]. The proposed lane-change intention prediction model, referred to as C-LSTM, therefore considers the binary lane information of the vehicle being predicted, the surrounding information using the fully connected layer, and the lateral deviation from the current lane center. Besides, a state-of-the-art trajectory prediction model, SC-LSTM [38], which also provides maneuver mode probabilities, is implemented for comparison. A summary of the LSTM-based lane-change intention prediction models is provided in Table I. The comparative results for lane-change intention prediction on the testing data among SC-LSTM, V-LSTM, L-LSTM, S-LSTM, LS-LSTM and C-LSTM are reported in Table II.
It is interesting to observe that SC-LSTM, which utilizes the surrounding information in a convolutional manner, achieves the worst performance for lane-change intention prediction; even V-LSTM without surrounding information performs slightly better than SC-LSTM. This indicates that the social surrounding information, after the convolutional pooling process, has a negative impact on lane-change intention prediction. A possible reason is that the convolutional and maxpool layers applied in SC-LSTM aggregate geometric information, so that the lane location can no longer be distinguished. In contrast, S-LSTM, which utilizes surrounding vehicle information with a fully connected layer, achieves 93.96% prediction accuracy on average. Compared to V-LSTM, L-LSTM with additional binary lane information shows improved prediction results. When combining the binary lane information with the surrounding information processed by fully connected layers, LS-LSTM performs better than L-LSTM, especially for the lane-keeping cases. This indicates that a suitable representation of surrounding vehicle information can effectively underpin lane-change intention prediction. Finally, the proposed C-LSTM with complete information provides the best overall prediction results. As discussed before, the naturalistic driving trajectory data is imbalanced: around 97.5% of the highD data belongs to the lane-keeping mode. When the imbalanced data is directly employed for training and testing, the prediction results are as reported in Table III. Although the model achieves almost 100% prediction accuracy for the lane-keeping mode, the prediction results for the lane-change modes become worse, especially in terms of recall. However, the trajectories belonging to the turning-left/-right modes normally correspond to safety-critical scenarios, where higher prediction accuracy should be achieved. Thus the lane-change intention prediction model trained with imbalanced data is not desirable.
The prediction horizon could also have a large impact on lane-change intention prediction. Based on C-LSTM, we test the lane-change intention prediction model with different prediction horizons (i.e., 1, 2 and 3 seconds), and the results are listed in Table IV. The results are as expected: the shorter the prediction horizon, the higher the prediction accuracy.
We further evaluate the performance of the proposed lane-change intention prediction model using the default balanced training data and imbalanced testing data. Over 700,000 samples are randomly selected from highD, of which around 97.5% belong to the lane-keeping mode. The prediction results with different prediction horizons on the imbalanced testing data are listed in Table V. As expected, the recall for the different lane-change modes is similar to that in Table IV using the balanced testing data, as recall values do not change significantly with the ratio of samples per mode. On the other hand, the precision of lane-keeping samples is close to 100%, as around 97.5% of the imbalanced testing data are lane-keeping. For the same reason, the precision of turning-left/-right samples decreases. We sacrifice precision to maintain a more accurate recall for the turning-left/-right samples in the imbalanced testing data, since cut-in scenarios frequently occur and are potentially risky on highways [49].

TABLE II: COMPARISON RESULTS AMONG LANE-CHANGE INTENTION PREDICTION MODELS USING 3-SECOND PREDICTION HORIZON AND BALANCED TRAINING DATA. COLUMNS PR, RE, F1 AND WEIAVE STAND FOR THE PRECISION, RECALL, F1 SCORE AND OVERALL WEIGHTED AVERAGE VALUES RESPECTIVELY. BOLD NUMBERS INDICATE THE BEST PERFORMANCE IN TERMS OF THE CORRESPONDING METRICS
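The different behavior of recall and precision under class imbalance can be reproduced with a small worked example. The confusion-matrix counts below are hypothetical and chosen only to illustrate the effect; they are not results from Tables II to V. Classes are ordered as turning left, lane keeping, turning right, and the lane-keeping class is scaled by a factor of 78 so that it makes up 97.5% of the samples.

```python
def precision_recall(confusion):
    """Per-class precision and recall from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    precision, recall = [], []
    for c in range(n):
        true_pos = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(n))
        actual_c = sum(confusion[c])
        precision.append(true_pos / predicted_c if predicted_c else 0.0)
        recall.append(true_pos / actual_c if actual_c else 0.0)
    return precision, recall


def rescale_class(confusion, cls, factor):
    """Multiply the sample count of one true class, keeping its error rates."""
    return [
        [v * factor for v in row] if i == cls else list(row)
        for i, row in enumerate(confusion)
    ]


# Hypothetical balanced test set: 100 samples per class.
balanced = [[95, 5, 0], [3, 94, 3], [0, 5, 95]]
# Same classifier, but lane keeping now dominates the test set.
imbalanced = rescale_class(balanced, cls=1, factor=78)

p_bal, r_bal = precision_recall(balanced)
p_imb, r_imb = precision_recall(imbalanced)
```

Recall is computed row-wise, so it is unaffected by how many samples each class contributes, whereas the many lane-keeping samples misclassified as lane changes pull down the precision of the lane-change classes.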

C. Trajectory Prediction
Based on the lane-change intention prediction module C-LSTM, we test the performance of the proposed two-stage multi-modal model M-C-LSTM for trajectory prediction. Again, the multi-modal prediction model SC-LSTM with social pooling layers [38] is introduced for comparison. The experimental results are reported in Table VI, Table IV for lane-change intention prediction.
Since M-C-LSTM predicts multi-modal trajectories using two separate neural networks, its suitability for online applications should also be evaluated. According to the experiments, the average run time of M-C-LSTM is 7.0 milliseconds, while the run time per prediction of SC-LSTM is 2.7 milliseconds. Although the run time increases by over 150% compared to SC-LSTM, M-C-LSTM can still be applied in real time.

V. EXPERIMENTS FOR DRIVING RISK METRIC
The proposed multi-modal trajectory prediction model has been verified in Section IV. Based on the prediction model, we further test the proposed driving risk metric P-PDRF with simulated safety-critical scenarios as well as the empirical trajectory dataset highD.

TABLE V: LANE-CHANGE INTENTION PREDICTION RESULTS OF C-LSTM USING DIFFERENT PREDICTION HORIZONS, BALANCED TRAINING DATA AND IMBALANCED TESTING DATA. COLUMNS PR, RE, F1 AND WEIAVE STAND FOR THE PRECISION, RECALL, F1 SCORE AND OVERALL AVERAGE WEIGHTED VALUES RESPECTIVELY

A. Risk Metric on Simulated Safety-Critical Scenarios
To evaluate the proposed risk metric based on the prediction model M-C-LSTM, we consider simulated safety-critical cut-in scenarios, since cut-in scenarios frequently occur and are potentially risky on highways [49]. Here we assume that the subject vehicle is autonomous; thus the additional bivariate normal distribution, which represents human-perceived risk, is not considered.
As shown in Fig. 3 leading to 20 × 20 = 400 simulations in total. In line with the above settings, 85 crash cases are identified in total when $1 \le V_{sub} - V_{sur} \le 5$ m/s. Both PDRF and P-PDRF are recalculated every 0.08 seconds to evaluate driving safety. At each current time instant $t$, the risk metrics are calculated over a prediction horizon $t_f = 3$ seconds with $\delta t = 0.2$ seconds. The risk metric TTC is also introduced as a baseline for comparison. As shown in Table VII, the thresholds of the three metrics are each calibrated for prediction accuracy. With risk thresholds of 50 and 100 Joules for PDRF and P-PDRF respectively, all simulated cases are correctly identified as crash or non-crash cases by both PDRF and P-PDRF. Note that PDRF and P-PDRF do not share the same risk threshold, as their collision probabilities are calculated differently. PDRF considers the maximum vehicle acceleration/deceleration to obtain a reachable space, leading to a relatively lower collision probability before a crash; P-PDRF instead utilizes a compact predicted PDF, leading to a higher collision probability. Using a 3-second threshold, TTC achieves its best prediction accuracy, which still includes 48 false negative cases out of 400. This is because TTC is not flexible in dealing with two-dimensional motion. Statistically, for the crash cases, the average difference between the time when TTC reaches the threshold and the time when the crash occurs is 1.31 seconds (i.e., the timeliness of TTC in Table VII). TTC has the worst performance in terms of timeliness, as TTC is invalid when the vehicles are not in the same lane. The average timeliness increases to 2.49 seconds using PDRF. P-PDRF has a sharper increasing curve and reaches 50 and 100 Joules at 2.40 and 2.89 seconds, respectively. When P-PDRF reaches the 100-Joule threshold, it provides an alert 3.43 seconds before crashes on average.
Even though the threshold of P-PDRF is greater than that of PDRF, P-PDRF is demonstrated to be more agile and effective in identifying potential crashes thanks to the integrated prediction model.
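The limitation of TTC noted above follows directly from how it is computed. The sketch below uses the standard longitudinal definition (gap divided by closing speed); the function name, the `same_lane` flag and the 9-meter gap in the usage note are hypothetical and serve only to show why the metric yields no warning while a cutting-in vehicle is still in the adjacent lane.

```python
import math


def time_to_collision(gap_m, v_follower, v_leader, same_lane=True):
    """Longitudinal TTC in seconds, or math.inf when it is undefined.

    gap_m is the bumper-to-bumper distance in meters; speeds are in m/s.
    TTC is only defined for a follower closing in on a leader in the
    same lane.
    """
    closing_speed = v_follower - v_leader
    if not same_lane or closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed
```

For a follower at 31 m/s and a leader at 28 m/s separated by an assumed 9 meters, the TTC is 3 seconds once both vehicles share a lane, but it stays infinite for as long as the leader has not yet crossed the lane marker.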
The effect of the prediction model on the risk assessment can also be observed from Table VII. When the prediction model proposed in Section III is integrated for risk assessment (i.e., using P-PDRF), the risk assessment performs best across the evaluation indicators. When the prediction model is turned off, P-PDRF degrades to PDRF, which uses a set of pre-defined bivariate normal distribution parameters in place of trajectory predictions. The performance of PDRF is then worse than that of P-PDRF. Furthermore, if no prediction approach or model is considered for risk assessment (i.e., using TTC), the performance is the worst.
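The way the prediction outputs enter the metric can be sketched as follows: at each instant within the horizon the risk is the sum over maneuver modes of mode probability × collision probability × expected crash severity, and the metric is the maximum of these values over the horizon. The array layout, the function names and the choice to hold the mode probabilities fixed over the horizon are assumptions for illustration; the collision probabilities and severities are taken as given inputs rather than computed.

```python
def p_pdrf(mode_probs, collision_probs, severities):
    """Maximum over the horizon of the mode-weighted risk.

    mode_probs[i]          probability of maneuver mode i
    collision_probs[t][i]  collision probability at step t under mode i
    severities[t][i]       expected crash severity (Joules) at step t, mode i
    """
    per_step = [
        sum(p_m * p_c * s for p_m, p_c, s in zip(mode_probs, p_col_t, sev_t))
        for p_col_t, sev_t in zip(collision_probs, severities)
    ]
    return max(per_step)


def horizon_steps(t_f=3.0, dt=0.2):
    """Number of evaluation instants within the prediction horizon."""
    return round(t_f / dt)
```

With the 3-second horizon and 0.2-second step used in the simulations, each evaluation of the metric aggregates 15 future instants.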
To take a closer look at the risk assessment results, the simulated cut-in scenario with $V_{sub} = 31$ m/s and $V_{sur} = 28$ m/s is analysed in Fig. 4. As shown in Fig. 4(b), in the first 4 seconds, the proposed risk metric P-PDRF with the trajectory prediction model M-C-LSTM provides the highest collision probability under the turning-right maneuver mode, followed by the collision probability for PDRF. From 4.0 to 7.4 seconds, all probabilities are close to one, since the vehicles are too close to avoid a crash, until the subject vehicle is in front of the surrounding vehicle at 7.4 seconds. Besides, as seen in Fig. 4(c), the lane-change intention prediction module C-LSTM achieves an overall prediction accuracy of 87.8%, which is lower than the average prediction accuracy of 95% on the testing set from highD. A possible reason is that C-LSTM was trained on NDD, i.e., highD; the prediction performance could be affected because the simulated trajectory may represent different driving behaviors from those in NDD. Nevertheless, the lane-change intention prediction module C-LSTM successfully identifies that the surrounding vehicle in this simulated cut-in scenario is turning right.
Given the collision and maneuver probabilities, P-PDRF has a sharper increase and a higher peak value compared to PDRF, as observed in Fig. 4(f). P-PDRF has a generally decreasing trend from 2 to 7.4 seconds. This is due to the predicted velocity of the surrounding vehicle in Fig. 4(d) and Fig. 4(e): while the predicted longitudinal velocity of the surrounding vehicle fluctuates around 28 m/s, the predicted lateral velocity decreases from 1 to 0 m/s. As for PDRF, it reaches its peak at around 4.5 seconds and then slightly decreases (due to crash severity changes from lateral velocities) until the subject vehicle is in front of the surrounding vehicle at 7.4 seconds. In fact, the crash would already have occurred at 4.7 seconds if the vehicles followed the simulated trajectories.
When a 100-Joule threshold on P-PDRF is used to trigger an emergency brake for the subject vehicle, the brake is activated at 1.5 seconds to avoid a potential crash at 4.7 seconds. When driving safety is instead estimated using PDRF with a threshold of 50 Joules, the vehicle brakes at 1.8 seconds, which is later than the reaction time obtained via the P-PDRF threshold.
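The alert logic described above amounts to finding the first time the risk value reaches the threshold and measuring how far that lies ahead of the crash. The sketch below shows this with hypothetical function names and a hypothetical risk trace in the usage note; only the threshold of 100 Joules, the activation time of 1.5 seconds and the crash time of 4.7 seconds are taken from the scenario above.

```python
def first_crossing(times, risks, threshold):
    """Earliest time at which the risk reaches the threshold, else None."""
    for t, r in zip(times, risks):
        if r >= threshold:
            return t
    return None


def lead_time(times, risks, threshold, crash_time):
    """Seconds between the first alert and the crash (None if no alert)."""
    alert = first_crossing(times, risks, threshold)
    return None if alert is None else crash_time - alert
```

For an illustrative trace with risk values of 40, 120 and 300 Joules at 1.0, 1.5 and 2.0 seconds, a 100-Joule threshold triggers at 1.5 seconds, which leaves about 3.2 seconds before a crash at 4.7 seconds.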
We then analyse the computational efficiency of the risk metrics. Recall that the risk metrics P-PDRF and PDRF are calculated every 0.08 seconds. To enable real-time calculation, each risk metric value should therefore be obtained in less than 0.08 seconds. As summarized in Table VII, the run time for P-PDRF per calculation is 0.018 seconds on average, including 0.007 seconds for the prediction model. As for PDRF, the average run time is 0.02 seconds. TTC is far more efficient, since no effort is needed to calculate collision probabilities. In short, P-PDRF has been verified to be capable of real-time application.

B. Risk Metric on NDD
Furthermore, we apply the risk metrics to highD, which contains naturalistic human-driving trajectories. One cut-in case in the highD dataset is selected to compare the risk metrics P-PDRF and PDRF. Detailed results are illustrated in Fig. 5. As shown in Fig. 5(e), PDRF without the prediction model has a tiny value in the beginning and then goes down to zero. This is because, without considering the lane-change intention of the surrounding vehicle, the collision probability between the subject and surrounding vehicles is low, as they were in different lanes in the first 4 seconds. When the surrounding vehicle crossed the lane marker afterwards, the longitudinal gap between the vehicles was relatively large, leading to zero collision probability. Thus PDRF remains at zero. However, as shown in Fig. 5(f), the human driver of the subject vehicle decelerated throughout, and the deceleration reached its peak when the surrounding vehicle was about to cross the lane marker. The PDRF value without the prediction model fails to explain the driver's deceleration behavior.
When the prediction model is integrated in the proposed risk metric, we can first observe that the lane-change intention prediction module C-LSTM works well in this cut-in scenario and achieves a prediction accuracy of over 95%, as shown in Fig. 5(c). The velocity of the surrounding vehicle is also accurately predicted by the proposed trajectory prediction model, as shown in Fig. 5(d). However, the collision probability under each maneuver mode is close to zero. This is because the two vehicles kept a sufficiently safe distance, in line with Fig. 5(b); even if the subject vehicle had not decelerated, no crash would have occurred. In such a driving scenario, the prediction model M-C-LSTM outputs positions of the surrounding vehicle, aiming to minimize the NLL of the conditional multi-modal distributions. The human driver's deceleration behavior was mainly due to a subjectively perceived risk. The position distribution of the surrounding vehicle from M-C-LSTM therefore cannot fully represent the human-perceived motion uncertainties; an additional bivariate normal distribution is needed for a human driver. Accordingly, the PDF of the surrounding vehicle position in Eq. (5), utilized to calculate the collision probability, is replaced by Eq. (6). The P-PDRF calculated with Eq. (6) is illustrated in Fig. 6. The P-PDRF values indeed change with the additional distribution parameters, where the lateral standard deviation is fixed at 1 meter and the longitudinal standard deviations are set to 5, 10 and 20 meters, respectively. As expected, the collision probability increases when additional standard deviations are added, although it always stays below 0.1. Besides, with a higher additional deviation, the collision probability and P-PDRF decrease to zero later.
Specifically, when the additional longitudinal deviation is set to 20 meters, P-PDRF is always greater than zero, which can be used to explain the deceleration behavior of the human driver. One may also set additional acceleration deviations for PDRF, but oscillations are then observed in the updated PDRF values, which still do not correspond to the driver's behavior. Nevertheless, the selection of the additional standard deviation for P-PDRF depends on the driving behavior of the human driver [43]. Tuning suitable additional standard deviations is left for future research.
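The qualitative effect of the additional standard deviations can be reproduced with a simplified calculation. The sketch below is not Eq. (5) or Eq. (6): it assumes uncorrelated longitudinal and lateral errors, an axis-aligned rectangular footprint for the subject vehicle, and that the additional distribution is combined with the predicted one by adding variances. The predicted mean, predicted standard deviations and box dimensions in the example are hypothetical; only the additional deviations (1 meter lateral; 5, 10 and 20 meters longitudinal) come from the text.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a univariate normal."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(lo, hi, mu, sigma):
    """Probability that a normal variable falls inside [lo, hi]."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


def collision_probability(mu, sigma, box, extra_sigma=(0.0, 0.0)):
    """Probability mass of the predicted position inside the subject's box.

    mu, sigma    predicted (longitudinal, lateral) mean and std in meters
    box          (x_min, x_max, y_min, y_max) occupied by the subject vehicle
    extra_sigma  additional perceived std, combined by adding variances
                 (an assumption of this sketch)
    """
    sx = math.hypot(sigma[0], extra_sigma[0])
    sy = math.hypot(sigma[1], extra_sigma[1])
    x_min, x_max, y_min, y_max = box
    return interval_prob(x_min, x_max, mu[0], sx) * interval_prob(
        y_min, y_max, mu[1], sy
    )


# Hypothetical case: surrounding vehicle predicted 30 m ahead in the same lane.
mu, sigma, box = (30.0, 0.0), (2.0, 0.5), (-2.5, 2.5, -1.0, 1.0)
probs = [
    collision_probability(mu, sigma, box, extra_sigma=(s, 1.0))
    for s in (5.0, 10.0, 20.0)
]
```

When the subject vehicle's footprint lies far in the tail of the predicted distribution, as in this example, widening the longitudinal spread moves probability mass onto the footprint, so the collision probability grows with the additional deviation while remaining small.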

VI. CONCLUSION
A prediction-based probabilistic driving risk field metric on highways (P-PDRF) has been proposed in this work. The proposed P-PDRF can anticipate potential future risk by leveraging a two-stage multi-modal trajectory prediction model, and thus does not rely on a system dynamics model or on a known normal distribution of surrounding vehicles. Meanwhile, the two-stage structure of the proposed prediction model can first anticipate the lane-change behavior of the surrounding vehicle, and then provide more accurate trajectory prediction results. The prediction model is replaceable as long as it can provide multi-modal probabilistic trajectory prediction results with bivariate normal distributions. The proposed prediction model is trained and tested with the empirical driving dataset highD. The lane-change intention prediction module is first tested through ablation studies as well as comparative experiments, and achieves a prediction accuracy of 95%. Compared to a state-of-the-art approach, the overall trajectory prediction performance is further validated in terms of four evaluation indicators. Under a 3-second prediction horizon, the average and final displacement errors of the predictions are less than 0.5 and 1.4 meters, respectively. P-PDRF identifies potential crashes 3.43 seconds earlier on average, compared to 2.49 seconds with PDRF. The risk metric TTC performs even worse in terms of prediction accuracy and timeliness, since TTC is not flexible in dealing with two-dimensional motion. We also showed that the average computational time for P-PDRF is less than 0.02 seconds. Moreover, P-PDRF is adopted to analyse cut-in scenarios in the highD dataset, and the additional standard deviations representing human-perceived risk are demonstrated to be effective in explaining human driver behavior. Both the effectiveness and the efficiency of P-PDRF have been verified.
Future research on driving risk assessment may be oriented to two directions. First, the lane location information, e.g., whether vehicles are near ramp areas, can affect the lane-change intention prediction. How to integrate the lane location information for better lane-change intention and trajectory predictions has not been addressed. Second, the proposed driving risk metric can only separately evaluate risk between a subject vehicle and each surrounding vehicle. Whether and how multiple surrounding vehicles can jointly impact the safety deserves further investigation.