
Teaching Vehicles to Anticipate: A Systematic Study on Probabilistic Behavior Prediction Using Large Data Sets



Abstract:

By observing their environment as well as other traffic participants, humans are able to drive road vehicles safely. Vehicle passengers, however, perceive a notable difference between non-experienced and experienced drivers. In particular, they may get the impression that the latter anticipate what will happen in the next few moments and consider these foresights in their driving behavior. To make the driving style of automated vehicles comparable to that of human drivers with respect to comfort and perceived safety, the aforementioned anticipation skills need to become a built-in feature of self-driving vehicles. This article provides a systematic comparison of methods and strategies to generate this intention for self-driving cars using machine learning techniques. To implement and test these algorithms we use a large data set collected over more than 30 000 km of highway driving and containing approximately 40 000 real-world driving situations. We further show that it is possible to classify driving maneuvers upcoming within the next 5 s with an Area Under the ROC Curve (AUC) above 0.92 for all defined maneuver classes. This enables us to predict the lateral position with a prediction horizon of 5 s with a median lateral error of less than 0.21 m.
Published in: IEEE Transactions on Intelligent Transportation Systems ( Volume: 22, Issue: 11, November 2021)
Page(s): 7129 - 7144
Date of Publication: 25 June 2020


CC BY: This article is licensed under the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/). IEEE is not the copyright holder of this material.
SECTION I.

Introduction

Automated driving has the potential to radically change our mobility habits as well as the way goods are transported. To enable driving automation, several processing steps have to be executed. Fig. 1 illustrates this thought: In the first step, the current traffic scene has to be sensed and a proper representation of the environment needs to be generated. Using this information, the given traffic situation needs to be interpreted and the behavior of others has to be anticipated. Subsequently, a plan, i.e. a trajectory, is derived based on this knowledge. Finally, this plan is executed in the last step of this process. How long the trajectory stays viable, before it has to be re-planned, is strongly influenced by the capability of the prediction component.

Fig. 1. Long-term driving behavior predictions in the context of trajectory planning for automated driving (equal symbols denote simultaneity).

As opposed to other research works dealing with techniques to interconnect vehicles through so-called car-to-car communication, we aim to solve this anticipation task locally. On the one hand, it is not foreseeable when an adequate market penetration of vehicles with such techniques will be reached. On the other hand, a local prediction component remains necessary in any case, as there are many traffic participants without communication abilities, such as bicyclists. In addition, local predictions might become necessary to bypass transmission times in certain cases, as emphasized by [1]. Moreover, it is reasonable to approach the topic from the perspective of highway driving, as this use case is easier to realize than others due to its clear constraints (e.g. structured setting, absence of pedestrians). For the prediction task, however, this implies the challenge of creating precise long-term predictions (2 to 5 s) rather than short forecasts (up to 2 s), as higher velocities can be expected in highway scenarios than in urban or rural areas.

A. Problem Statement

We tackle the challenge of anticipating the behavior of other traffic participants in highway scenarios. In particular, we aim to generate information that can be processed by trajectory planning algorithms to implement an anticipatory driving style. In this context, our objective is to model future vehicle positions at a time t in longitudinal x_{t} and lateral y_{t} direction as spatial distributions x_{t} \sim p_{x}, y_{t} \sim p_{y} rather than estimating single shot predictions \hat{x}_{t} and \hat{y}_{t}, respectively. Note that these distributions are more useful for downstream criticality assessments, as they enable us to represent several alternative hypotheses at a time together with their particular frequencies. Despite the focus on highway driving, the presented methods shall be general enough to be appropriate in other environments as well.

B. Problem Resolution Strategy

This article presents a systematic workflow for the design and evaluation of a lightweight maneuver-based model [2], which uses standard sensor inputs to perform long-term driving behavior predictions. Methodically, we build on [3] and use a two-step Mixture of Experts (MOE) approach, consisting of a maneuver classification and a downstream behavior prediction. The maneuver probabilities \{P_{m}\}_{\forall m \in M} determined by the classifier serve as gating nodes in the Mixture of Experts approach. Specifically, the probabilities control the weighting w_{m} of the respective expert distributions p_{y, m} when calculating the overall distribution of future vehicle positions p_{y}. Eq. 1 summarizes this procedure for the lateral direction (equivalent for x):\begin{align*} y_{t} \sim p_{y}(\Theta_{y}, I, t) = \sum_{m \in M}{p_{y, m}(\theta_{y, m} \mid I, t) \cdot w_{m}(I)} \tag{1}\end{align*}


The set of maneuvers M is defined as follows:\begin{equation*} M = \{LCL, FLW, LCR\} \tag{2}\end{equation*}


Different weighting approaches based on the maneuver probabilities are presented in Sec. VII. The expert distributions p_{y, m} are modeled as Gaussian Mixture Models (GMMs) in the combined input and output space with K components according to Eq. 3, and are used in a Gaussian Mixture Regression manner. Hence, they are conditioned on the input features I and the prediction time t (cf. Eq. 1):\begin{equation*} p_{y, m}(\theta_{y, m}) = \sum_{i=1}^{K}{\phi_{y, m, i} \cdot \mathcal{N}(\mu_{y, m, i}, \Sigma_{y, m, i})} \tag{3}\end{equation*}


The parameters of the GMMs are subsumed in \Theta _{y} :\begin{equation*} \Theta _{y} = \{\theta _{y, m}\}_{\forall m \in M} = \{\phi _{y, m}, \mu _{y, m}, \Sigma _{y, m}\}_{\forall m \in M}\tag{4}\end{equation*}


In addition, we introduce an alternative methodology to the Mixture of Experts approach, integrating the outputs of the gating nodes into one single model. This simplifies Eq. 1 as follows:\begin{equation*} y_{t} \sim p_{y}(\theta _{y, IGMM}| I, t, P_{LCL}(I), P_{LCR}(I)) \tag{5}\end{equation*}


For implementing the models, we use out-of-the-box modules from the widely used frameworks Apache Spark MLlib [4] (classifiers) and Scikit-learn [5] (GMMs).
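To make Eq. 1 concrete, the following minimal sketch shows how the maneuver probabilities issued by a classifier can gate the maneuver-specific expert densities. It assumes a scikit-learn-style classifier exposing predict_proba with columns ordered as LCL, FLW, LCR, and a dictionary of conditional expert densities (e.g. obtained via Gaussian Mixture Regression from the joint GMMs); all names are illustrative and not taken from the article's implementation.

```python
import numpy as np

MANEUVERS = ["LCL", "FLW", "LCR"]  # maneuver set M from Eq. 2

def predict_lateral_density(y_grid, features, expert_pdfs, classifier):
    """Evaluate p_y(y | I, t) on y_grid according to Eq. 1: each expert
    density p_{y,m} is weighted by the gating probability w_m(I) taken
    from the maneuver classifier."""
    gating = classifier.predict_proba(features.reshape(1, -1))[0]  # w_m(I)
    density = np.zeros_like(y_grid, dtype=float)
    for w_m, maneuver in zip(gating, MANEUVERS):
        # expert_pdfs[m](features, y_grid) returns the conditional expert
        # density, e.g. from Gaussian Mixture Regression on the joint GMM.
        density += w_m * expert_pdfs[maneuver](features, y_grid)
    return density
```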

Altogether, we contribute a systematic workflow for designing and evaluating the prediction models as well as methodical extensions to known approaches. Moreover, we assess the performance of the developed modules for the two tasks of predicting (1) driving maneuvers and (2) probability distributions of future positions both separately and in combination. To evaluate the modules, we utilize a large data set comprising real-world measurements. As will be shown, our prediction models outperform established state-of-the-art approaches.

The remainder of this article is organized as follows: Sec. II discusses related work on object motion prediction, emphasizing the value added by our approach. Sec. III introduces the data set and describes the preprocessing steps applied to it. Sec. IV outlines the training of the considered maneuver classifiers, whereas Sec. V deals with the experimental evaluation and the performance of the classifiers. Based on these findings, Sec. VI develops different approaches for estimating probability distributions of future vehicle positions, which are then assessed in Sec. VII. Finally, Sec. VIII summarizes the article and gives an outlook on future work.

SECTION II.

Related Work

Regarding the understanding and prediction of the behavior of other traffic participants in highway scenarios, various aspects have been investigated in the literature. Accordingly, this section is sub-divided as follows: Sec. II-A presents approaches inferring the kind of maneuver that will be executed by a vehicle. Note that applications like collision checkers or trajectory planning algorithms cannot directly process this kind of information. Instead, probabilities of future vehicle positions or trajectories need to be predicted. Related research on this topic is presented in Sec. II-B. Bringing together the aspects of maneuver classification and position prediction, Sec. II-C gives an overview of hybrid prediction approaches. Finally, Sec. II-D closes the section with a brief literature discussion, leading to the contributions of this article in Sec. II-E.

A. Classification Approaches

Classification approaches for maneuver recognition are described in [1], [6]–[8]. In [1], a system is introduced which is capable of detecting lane changes with high accuracies (>99%) approximately 1 s before their occurrence. For this purpose, dynamic Bayesian networks are used. Another approach, which is capable of detecting lane changes approximately 1.5 s before their occurrence, is presented in [6]. To achieve this, the lane change probability is decomposed into a situation-based and a movement-based component, resulting in an F_{1}-score better than 98%. The approach presented in [7], in turn, shows that it is possible to detect lane changes up to time horizons of 2 s when using feature selection for scene understanding, with an Area Under the Curve (AUC) better than 0.96. Moreover, [8] combines interaction-aware heuristic models with an interaction-unaware learned model. The interaction-aware component relies on a multi-agent simulation based on game theory, in which each agent simultaneously tries to minimize a different cost function. These cost functions are designed using expert knowledge and consider traffic rules. In a second step, the output of the interaction model is used to condition an interaction-unaware classifier based on Bayesian networks. The approach is able to detect lane changes on average 1.8 s in advance, with an AUC better than 0.93.

B. Trajectory and Position Prediction Approaches

Approaches dealing with the prediction of trajectories and positions are presented in [9]–[13]: [9] uses a fully-connected Deep Neural Network to learn the parameters of a two-dimensional GMM. For each situation, an adapted Gaussian Mixture distribution models the probability density in the output dimensions a_{x} and v_{y} (cf. Tab. XII). This distribution is then sampled to estimate trajectories. The authors evaluate their approach with the widely used NGSIM data set [14] and show that a root weighted square error (comparable to RMSE) of approximately 0.5 m in lateral direction at a prediction horizon of 5 s can be achieved.

TABLE I Data Set Identifiers and Sizes
TABLE II Summary of Examined Feature Selection Techniques
TABLE III Optimized Hyperparameters Per Classifier
TABLE IV Definition of the Detection Time Metrics
TABLE V Summary of Examined Classifiers With Preferred Hyperparameters
TABLE VI Contextual Features Solely Impacting Special Situations
TABLE VII AUC Values in Comparison to Reference Works
TABLE VIII Selected Feature Sets and Hyperparameters Per Classifier
TABLE IX Per-Sample Log-Likelihoods With Different Classifiers and MOE Strategies
TABLE X Comparing Lateral Prediction Performance With Related Works
TABLE XI Prediction Errors Per Class and Direction
TABLE XII Description of the Evaluated Features f of an Observed Vehicle o and Usage of the Features in the Constructed Feature Sets (A–D)

Another approach, also evaluated with the NGSIM data set, is presented in [10]. The authors propose the use of a Long Short-Term Memory network for predicting trajectories. In particular, the approach is able to compute single shot predictions with an RMSE of approximately 0.42 m at a prediction horizon of 5 s. Reference [11] deals with the prediction of spatial probability density functions, especially at road intersections. More precisely, a conditional probability density function, which models the relationship between past and future motions, is inferred from training data. Finally, standard GMMs and variational approaches are compared. In [12], this approach is extended by a hierarchical Mixture of Experts that allows categorical information to be incorporated. The latter includes, for example, the topology of a road intersection.

In [13], a Gaussian Mixture Regression approach for predicting future longitudinal positions as well as a procedure for estimating the prediction confidence are introduced.

C. Hybrid Approaches

Approaches that combine strategies for both maneuver detection and trajectory or position prediction, similar to the approach presented in this article, are described in [15]–[20]. In the following, we denote such approaches as hybrid.

Reference [15] presents a two-stage approach: In the first step, a Multilayer Perceptron (MLP) is used to estimate the future lane of a vehicle. In a second step, a concrete trajectory realization is estimated with an additional MLP. As a result, the lane estimation module is able to detect lane changes 2 s in advance with an AUC better than 0.90. The evaluation of the trajectory prediction module shows a median lateral error of approximately 0.23 m at a prediction horizon of 5 s.

Reference [16] proposes another hybrid approach that uses the prediction of future trajectories to forecast lane change maneuvers. Moreover, the intention of drivers is modeled using a Support Vector Machine. Subsequently, the resulting action is checked for collisions. This enables the approach to model interrupted lane changes. During the evaluation, an F_{1}-score of 98.1% with a detection time of up to 1.74 s is achieved.

In turn, [17] does not follow such a hybrid approach, but contains an intermediate step before predicting trajectories. Instead of learning maneuver probabilities, the authors present a regression technique, relying on Random Forests, for estimating the time span to the next lane change. In [18], this approach is extended and combined with findings from [6]. The estimated times until the next lane change to the left and to the right are used as input for a cubic polynomial which is intended to predict future trajectories. Finally, the approach is evaluated with the mentioned NGSIM data set, showing a median lateral error of approximately 0.5 m at a prediction horizon of 3 s for lane changing scenarios, assuming a perfect maneuver classification.

Reference [19] proposes the use of a maneuver recognition based on a Hidden Markov Model, distinguishing between ten maneuver classes. Based on this model, a position prediction module is implemented, which combines several maneuver-specific variational GMMs (according to [11]) with an Interacting Multiple Model that weights different physical models against each other. As the approach uses ten maneuver classes and as the errors are only measured in terms of Euclidean distance, the results are difficult to compare with those of other approaches. Additionally, the approach is evaluated on a rather small data set. Finally, in [20] these findings are pursued further by the use of a Long Short-Term Memory network. The authors demonstrate certain improvements compared to their previous work, while using the NGSIM data set for evaluation purposes.

Reference [3] presents an approach predicting future lateral vehicle positions based on Gaussian Mixture Regression and a Mixture of Experts with a Random Forest as gating network. The approach is evaluated on a small data set, leading to noisy results, especially in case of lane changes. The evaluation shows that the approach is able to perform maneuver classifications with an AUC better than 0.84 and lateral position predictions with a median error of less than 0.2 m at a prediction horizon of 5 s.

D. Discussion

The findings of our literature survey can be summarized as follows: Many works provide meaningful algorithmic contributions. However, in numerous cases a structured problem resolution strategy is missing. Often, it does not become clear how the approaches compare to any baseline (e.g. [19]). Moreover, parameters (e.g. [16]) and feature sets (e.g. [10]) are selected manually, and are thus difficult to retrace. In addition, most approaches focus on short or medium prediction horizons (e.g. [1]), or lack a good prediction performance for larger time horizons (e.g. [18]). When analyzing the approaches that aim to resolve the long-term prediction problem, it becomes clear that the latter is challenging, as the prediction models become significantly more complex, as pointed out, e.g., by [7], [8] and [21].

Moreover, many approaches (e.g. [10]) aim to predict single trajectories or single shot predictions rather than probabilistic distributions of future vehicle positions. Therefore, the objective to be optimized is mostly the root-mean-square error (RMSE). As opposed to these works, we consider the objective of the learning problem to be generating an estimator that models a probability distribution of positions reflecting the frequencies of all observed positions, e.g., for different drivers in the same situation. Thus, we aim to maximize the likelihood of truly occupied positions given the model. The reasoning behind this design choice is that such distributions contain significantly more information than single shot predictions. Thus, they are more useful for applications that need to consider risks, like, for example, the maneuver planning approaches presented in [11], [22], [23].

E. Contributions

The contribution of this article is threefold:

  1. We apply a heuristic-free machine learning workflow to generate a model capable of predicting maneuvers and precise distributions of future vehicle positions for time horizons up to 5 s (a horizon chosen to ensure comparability). This is achieved with a machine learning workflow that omits any human-tuned (hyper-)parameters when constructing the classifiers. Note that this includes all aspects involving feature engineering, labeling, feature selection, and hyperparameter optimization for different classification algorithms. Regarding feature engineering and selection, this means that we first construct a data set with a large superset of all features that are potentially relevant for the problem solution. Afterwards, we select a comparatively small feature set that still ensures maximum predictive power through an automated feature selection process.

  2. We evaluate the modules for maneuver classification and position prediction, where both parts are not only evaluated separately, as in other works (e.g. [18]), but also as a combined prediction system. This concerns the lateral as well as the longitudinal behavior. In this context, we show that directly feeding the results of the classifier into the regression problem produces results comparable to a Mixture of Experts approach. Additionally, we show that relying on the Markov assumption and not modeling the interactions between the traffic participants explicitly allows us to produce superior results compared to existing approaches. As opposed to these works, we integrate the different aspects of behavior prediction, which comprise the prediction of driving maneuvers and positions in both lateral and longitudinal direction. In addition, we introduce new methodologies and conduct a large-scale evaluation.

  3. We demonstrate that the presented methods not only have the potential to outperform state-of-the-art approaches when fed with a sufficient amount of data. Additionally, we show that our approach is able to provide a meaningful estimate of the prediction uncertainty to the consumer of the information, which is beneficial for collision risk calculation and trajectory planning (e.g. [22]).

SECTION III.

Data Preparation & Experimental Setup

Sec. III-A introduces the considered data set and the experimental setup. Sec. III-B then gives a detailed overview of the features used to train our models. Afterwards, Sec. III-C introduces the labeling process. Finally, Sec. III-D deals with the data set split for training, validating and testing the constructed models as well as further preprocessing steps. Fig. 2 summarizes the overall preprocessing workflow.

Fig. 2. Preprocessing steps used in the proposed workflow (the respective sections are referenced in the boxes).

A. Data Collection

For modelling and evaluating our modules, we use measurement data from a fleet of testing vehicles [24] equipped with standard series-production sensors. The sensor setup includes a front-facing camera detecting lane markings as well as two radars observing the traffic situation to the rear. In addition, the vehicles have a front-facing automotive radar to sense the distances and velocities of surrounding vehicles. The data were collected with different vehicles and drivers at varying times of the day during all seasons. The data collection campaign spanned more than a year and was mainly restricted to the area around Stuttgart in Germany. Due to this wide variety, we expect our models to achieve good generalization characteristics.

Unlike other contributions (e.g. [3]), we do not use actual object-vehicles as the prediction target o in this work, but rather the ego- (or measurement-) vehicle itself. However, as our work focuses on the prediction of surrounding vehicles, we solely use features that are observable from an external point of view, as postulated in other works (e.g. [1] or [16]). Note that this constraint excludes features like driver status or steering wheel angle. Thus, the models remain applicable to actual object-vehicles, assuming good sensing of their surroundings. Working with the ego-vehicle data offers several advantages concerning the modeling of situations: First, each situation can be described in a similar way, as situations in which relevant vehicles neighboring the target-vehicle are hidden from the measurement-vehicle cannot occur. In addition, all measurements span longer time periods, as the target-vehicle can never disappear from the field of view. This way of data handling is widespread in the literature (e.g. [6]). In addition, one can expect that future sensor setups will minimize the measurement uncertainty for perceived objects and will get closer to the data quality that is nowadays available for the ego-vehicle.

Basically, our investigations rely on an environment model similar to the one presented in [7], modeling the surroundings with a fixed grid of eight relation partners. In contrast to [7], however, we use the ego-vehicle as prediction target. For this purpose, we slightly adapt the environment model: As the sensors facing the rear traffic in the testing vehicles are less capable than the ones facing the front, our environment model (cf. Fig. 3) distinguishes between relation partners behind (index rb) and in front of (index rf) the prediction target o. Thus, the relation vectors of the rear objects R_{rb} are shortened compared to the ones of the front objects R_{rf}. The relation vectors describe the relation between the respective object and the prediction target. Object-vehicles on the same lane as o and driving behind o are left out, as the current sensor setup is not able to sense them. Consequently, a traffic situation can be described by the feature vector F_{sit}, which contains the relations of o and its seven relation partners, its own status F_{o}, and the infrastructure description F_{infra} (cf. Eq. 6):\begin{align*} F_{sit}=&[R_{rf}(r=fl), R_{rf}(r=f), R_{rf}(r=fr), \\&R_{rf}(r=l), R_{rf}(r=r), \\&R_{rb}(r=rl), R_{rb}(r=rr), \\&F_{o}, F_{infra}]^{T} \tag{6}\end{align*}


Fig. 3. Environment model used for our investigations.

A detailed listing of the particular elements of the relation vectors R_{rf} and R_{rb} as well as F_{o} and F_{infra} can be found in Tab. XII.

B. Feature Engineering

To test and develop our system and to fill the described environment model, we use fused data originating from three different sources:

  1. The basis for our investigations are measurement data produced by the testing fleet (cf. Sec. III-A).

  2. As we identified additional features of interest as inputs beforehand, we fuse the data with information from a navigation map (e.g. bridges, tunnels, and distances to highway approaches).

  3. In addition, we calculate some higher-order features from the measurements, e.g., through a conversion to a curvilinear coordinate system along the road [25].

C. Labeling

Like previous works [3], we divide all samples into the three maneuver classes LCL (lane change left), FLW (lane following), and LCR (lane change right) and apply a labeling process that works as follows: First, for each measurement, the times up to the next lane change to the left neighboring lane (TTLCL) and to the right one (TTLCR), respectively, are calculated. This is accomplished by forecasting the distances to the lane markings over time. As the moment of the lane change, we define the point in time when the vehicle center has just crossed the lane marking. Subsequently, we determine the maneuver label of each sample based on a defined prediction horizon T_{h} according to Eq. 7:\begin{align*} L = \begin{cases} LCL,& \text {if } (TTLCL \leq T_{h})~\land ~(TTLCL < TTLCR) \\ LCR,& \text {if } (TTLCR \leq T_{h})~\land ~(TTLCR < TTLCL) \\ FLW,& \text {otherwise}\\ \end{cases} \tag{7}\end{align*}


We decided to use a horizon of 5 s, as the duration of lane change maneuvers usually ranges from 3 s to 5 s (see [16]). Consequently, it is reasonable to label samples as potential lane change samples only up to an upper boundary of 5 s. Additionally, this value is widely used in the literature as the longest prediction time (e.g. [8], [15] or [16]) and, therefore, it allows for comparability. However, note that this style of labeling might result in decreased performance values, as detections occurring slightly more than 5 s ahead of a lane change count as false positives in the evaluation.
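As a minimal illustration of the labeling rule in Eq. 7, the following sketch assigns a maneuver label from the two times to the next lane change; samples without an upcoming lane change are assumed to carry TTLC values of infinity (an assumption for this sketch, not stated in the article).

```python
import math

T_H = 5.0  # prediction horizon T_h in seconds (Sec. III-C)

def maneuver_label(ttlcl: float, ttlcr: float, t_h: float = T_H) -> str:
    """Assign the maneuver label of a sample from the times to the next
    lane change to the left (TTLCL) and right (TTLCR), following Eq. 7."""
    if ttlcl <= t_h and ttlcl < ttlcr:
        return "LCL"
    if ttlcr <= t_h and ttlcr < ttlcl:
        return "LCR"
    return "FLW"

assert maneuver_label(3.2, math.inf) == "LCL"
assert maneuver_label(math.inf, math.inf) == "FLW"
```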

D. Data Set Split

As shown in Fig. 2, we split our data into several parts after executing the mentioned preprocessing steps. The first split divides our data into one part for the maneuver classification D^{Ma} and another one for the position prediction D^{Po} . This allows us to produce models based on independent data sets. An overview of the splits as well as the respective data set sizes and identifiers is given in Tab. I.

The first part D^{Ma} is then used as follows: To prepare the training, parametrization and evaluation of the developed classifiers, as well as to remain methodically rigorous, we split data set D^{Ma} once more into six folds.1 Thereof, we use five folds D^{Ma}_{TV} in Sec. IV for the design and parametrization. The remaining fold D_{6}^{Ma}=D^{Ma}_{Te} is only used for the performance examinations presented in Sec. V. The split is performed based on entire situations as described in [3]. This means that the measurements of each situation solely occur in one of the folds. Note that this prevents unrealistic results, which might otherwise occur due to similar samples from the same time series appearing in both the evaluation and the training data. To achieve an even proportion of the three maneuver classes, we balance the number of samples within each fold by a random undersampling strategy. As the prediction problem is extremely unbalanced, as outlined in [10], classifiers would otherwise focus on the most frequent maneuver class FLW. In our case, approximately 94% of the data points belong to that class.

In addition, we only take situations into account that were collected continuously up to the prediction horizon of 5 s. This ensures that the folds are also balanced over time, which constitutes a prerequisite for performing fair evaluations. This is necessary, as the prediction task is obviously much more demanding when predicting a lane change 4 s in advance instead of 1 s in advance. Due to this strategy, the numbers of samples in the six folds differ slightly, but we consider this uncritical. Overall, D^{Ma} contains approximately 8 hours of highway driving, of which two thirds were collected during lane changes.
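A sketch of the situation-wise fold assignment and the per-fold undersampling described above is given below. It assumes the samples are stored in a pandas DataFrame with hypothetical columns situation_id and label; the concrete data layout used in the article is not specified.

```python
import numpy as np
import pandas as pd

def assign_folds_by_situation(df: pd.DataFrame, n_folds: int = 6,
                              seed: int = 0) -> pd.Series:
    """Assign all samples of a situation to exactly one fold, so that no
    time series is shared between training and test folds (Sec. III-D)."""
    rng = np.random.default_rng(seed)
    situations = df["situation_id"].unique()
    fold_of = dict(zip(situations, rng.integers(0, n_folds, len(situations))))
    return df["situation_id"].map(fold_of)

def undersample_classes(fold: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Randomly undersample within a fold so that the three maneuver
    classes LCL, FLW, and LCR are represented equally often."""
    n_min = fold["label"].value_counts().min()
    return (fold.groupby("label", group_keys=False)
                .apply(lambda g: g.sample(n=n_min, random_state=seed)))
```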

The second data set D^{Po} , which serves for the training and evaluation of the position prediction, is processed as follows: Initially, we add the lane change probabilities as estimated by the different classifiers to each sample. Furthermore, we only consider measurements that were collected when the vehicle was manually driven. Note that this restriction is essential as all vehicles of our testing fleet are equipped with an Adaptive Cruise Control (ACC) system. Thus, driving in a semi-automated mode is over-represented in our data set compared to reality.2

We further split data set D^{Po} into the subsets D^{Po}_{T} for training and D^{Po}_{Te} for evaluating the position predictions (cf. Sec. VI and Sec. VII). Afterwards, we expand each data point in D_{T}^{Po} with the desired prediction outputs, i.e., the true positions in x and y direction for all times t \in T_{T} = \{-1.0\,s, -0.9\,s, \ldots, 6.0\,s\}. Note that the samples with negative times and the ones with times >5 s are needed to train the distributions correctly. Strictly limiting the times to a certain range would generate areas in the data space which are difficult to represent with GMMs due to discontinuities, similar to the ones in the probability dimension (cf. Sec. VI-B). To overcome these problems, we integrated a mechanism performing a subsampling between −1 s and 0 s as well as between 5 s and 6 s according to a Gaussian distribution (percentiles: P_{50}=0.0\,s; P_{-3\sigma}=-1.0\,s; equivalent between 5 and 6 s).

Another mechanism performing a time interpolation ensures that the training data points are distributed continuously along the time dimension. Accordingly, we also have access to prediction times in between our sampling times during the training process. Moreover, the data points in the position test data set D^{Po}_{Te} are expanded with x and y positions as well as corresponding times t \in T_{Te} = \{0.0\,s, 0.1\,s, \ldots, 5.0\,s\}.

Finally, we 'coil' the two data sets D^{Po}_{T} and D^{Po}_{Te} such that each of the newly constructed data points contains the features at the start point of the prediction, one corresponding prediction time, and the actual x and y positions at that point in time (in Fig. 2 this step is called 'Explode Data'). Hence, our data sets are multiplied by a factor of |T_{T}|=71 and |T_{Te}|=51, respectively, and are structured as described in Sec. VII-A. Note that D^{Po}_{T} is re-split along the maneuver labels and undersampled in Sec. VI-A to train maneuver-specific position prediction experts.

SECTION IV.

Maneuver Classifier Training

This section gives an overview of the different techniques used for feature selection (cf. Sec. IV-A), classification algorithms (cf. Sec. IV-B), and techniques to tune the respective hyperparameters (cf. Sec. IV-C) for the maneuver classification. The corresponding activities are illustrated by Fig. 4.

Fig. 4. Process of training and evaluating maneuver classifiers.

A. Feature Selection

This section deals with the task of selecting a meaningful subset of features from the available superset. Such a selection makes sense for two reasons: First, it can improve the prediction performance of the maneuver classifiers. Second, it can help to reduce the computational effort, enabling predictions on devices with limited computational power as well. Our main goal here is to improve the overall prediction performance. Note that this slightly contrasts with an overall ranking of the available features, as some of them are highly redundant. Consequently, the most predictive variables shall be selected, while redundant ones are excluded. In the literature, one can find numerous works dealing with feature selection in machine learning applications. In our implementation, we rely on the findings from [26]. As we claim to solve the underlying classification problem through a systematic machine learning workflow, we start with simple techniques and move towards more sophisticated and computationally expensive ones. To demonstrate the performance of the used techniques, we additionally test the classification with the entire superset as a baseline. The superset that contains all features is denoted as A in the following.

The first investigated technique is a simple correlation-based feature selection technique, which evaluates the correlation of all features and then applies a threshold (set to 0.15) to remove features showing a very low correlation with the maneuver class from the superset. More precisely, we compute Spearman’s Correlation (see [27, p. 133 ff]) between each feature and the time up to the next lane change (TTLC ). We selected this quantity instead of the maneuver label, as it enables a smooth fade-out. The resulting feature set is denoted as B in the following. Tab. II summarizes the examined variants and their abbreviations. Finally, the elements of the resulting feature sets can be found in Tab. XII.
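A minimal sketch of this correlation filter is shown below, assuming the candidate features are columns of a pandas DataFrame and that TTLC is available per sample; the 0.15 threshold is the one stated above.

```python
import pandas as pd
from scipy.stats import spearmanr

def correlation_filter(features: pd.DataFrame, ttlc: pd.Series,
                       threshold: float = 0.15) -> list:
    """Keep only features whose absolute Spearman correlation with the
    time to the next lane change (TTLC) reaches the threshold (set B)."""
    selected = []
    for name in features.columns:
        rho, _ = spearmanr(features[name], ttlc)
        if abs(rho) >= threshold:
            selected.append(name)
    return selected
```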

The second technique uses the Correlation-based Feature Selection (CFS; cf. [28]) and is referred to as C in the following. For this technique, the correlation of entire feature sets instead of single features is calculated. More precisely, for all feature sets S , the ’merit’ M_{S} , as a measure of the predictive performance, is computed according to Eq. 8:\begin{equation*} {M}_{S} = \frac {n\,\overline {\rho _{cf}}}{\sqrt {n+n(n-1)\overline {\rho _{ff}}}} \tag{8}\end{equation*}


n describes the number of features and \overline{\rho_{cf}} corresponds to the mean correlation of all features with the class label or, in our case, TTLC. Variable \overline{\rho_{ff}}, in turn, describes the mean feature-feature inter-correlation of all features within S. As can be seen from Eq. 8, strong feature-feature correlations within a feature set S decrease M_{S}, whereas a stronger correlation with the class label \overline{\rho_{cf}} increases the value of M_{S}. All these computations rely on the assumption that no strong feature inter-correlations are present in the data set, but that instead every relevant feature itself is at least weakly correlated with the class label (see also [28]). To meet the conditions of our data set and to be consistent with variant B, we use Spearman's correlation coefficient. As the computation of M_{S} is not feasible for all possible feature combinations, we use a backward selection strategy that, according to Guyon and Elisseeff [26], typically provides superior results compared to forward selection. When applying it in our research, we try to minimize the possible shortcomings of the CFS by applying cross-validation with the five data folds for training and validation (D^{Ma}_{TV}), as described in Sec. III-D.
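The merit of Eq. 8 can be computed directly from pairwise Spearman correlations, as in the following sketch (absolute correlations are used here, which is an assumption on our side; the article does not spell out this detail).

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def merit(feature_matrix: np.ndarray, ttlc: np.ndarray) -> float:
    """Merit M_S of a feature subset according to Eq. 8, using Spearman's
    correlation coefficient as in variant C."""
    n = feature_matrix.shape[1]
    rho_cf = np.mean([abs(spearmanr(feature_matrix[:, i], ttlc)[0])
                      for i in range(n)])
    rho_ff = 0.0 if n == 1 else np.mean(
        [abs(spearmanr(feature_matrix[:, i], feature_matrix[:, j])[0])
         for i, j in combinations(range(n), 2)])
    return n * rho_cf / np.sqrt(n + n * (n - 1) * rho_ff)
```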

The feature selection techniques described so far are limited in two respects: Firstly, they do not properly incorporate the properties of the used classification algorithm. Secondly, features that are only meaningful in combination with others are not considered in feature sets B and C. Therefore, when generating feature set D, we apply a wrapper feature selection technique as described in [29]. As the training of Random Forests already includes an implicit feature selection, we solely focus on wrapper techniques involving the other classifiers presented in Sec. IV-B. The main idea of wrapper techniques is to incorporate the classifier itself as a black box into the feature selection process. Within this process, the prediction performance on a validation data set is used to determine the best feature set for the respective classifier. We build our investigations on a hyperparameter set that was optimized as described in Sec. IV-C, with the feature set of variant C being used for the optimization. Analogous to the process for deriving C, we perform the search for the most descriptive feature set with backward elimination. As a classifier needs to be trained and evaluated for each of the approximately 5000 possible subsets, the wrapper technique is computationally expensive. To accelerate the computation, we do not perform the validation using cross-validation. Instead, we use one of the data folds constructed in Sec. III-D for training (D^{Ma}_{1}) and one for validation (D^{Ma}_{2}).
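The following sketch outlines a greedy backward-elimination wrapper of the kind described above. The fit and score callables stand in for training the respective classifier on one fold and evaluating it on the validation fold; they are placeholders, not functions from the article.

```python
def backward_elimination(train, validate, all_features, fit, score):
    """Greedy backward wrapper selection (feature set D): starting from the
    full feature set, repeatedly drop a feature whose removal does not hurt
    the validation score, until no further removal helps."""
    selected = list(all_features)
    best = score(fit(train, selected), validate, selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for feature in list(selected):
            candidate = [f for f in selected if f != feature]
            cand = score(fit(train, candidate), validate, candidate)
            if cand >= best:
                best, selected, improved = cand, candidate, True
                break  # restart the scan with the reduced feature set
    return selected
```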

B. Examined Classification Algorithms

For the task of maneuver classification, we consider three different algorithms for evaluation purposes, which have been successfully applied in reference works:

  1. The first algorithm is based on a Gaussian Naïve Bayes (GNB) approach using GMMs instead of only using one Gaussian kernel per class and was presented in [7].

  2. The second algorithm is based on a Random Forest (RF) and was presented in [3].

  3. The third algorithm is based on a Multilayer Perceptron (MLP) approach and was presented similarly in [15]. As opposed to GNB and RF, this approach uses scaled features, as suggested by [30, p. 398 ff]. In contrast to [15], we use a modified labeling and a partly automated strategy to identify an optimal model structure, where we restrict the model to one hidden layer in order to keep the parameter optimization solvable in finite time.

C. Hyperparameter Optimization

To achieve the best possible performance and to enable a fair comparison of the examined classifiers, we optimize their respective hyperparameters. For the GNB, this means finding the optimal number of Gaussian kernels K used for each feature and class. A Variational Bayesian Gaussian Mixture Model (VBGMM; see [31]) is used in this context. This technique was already successfully applied in [11]. The principle behind VBGMMs is to fit a distribution over the possible Gaussian Mixture distributions using a Dirichlet process. Hence, this technique ensures that the optimal value for K is determined automatically.
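With scikit-learn, the variational fit can be sketched as follows: a BayesianGaussianMixture with a Dirichlet-process prior prunes superfluous components on its own, so only an upper bound on K has to be supplied. The data below is a random placeholder standing in for the values of one feature within one class.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

feature_values = np.random.default_rng(0).normal(size=(1000, 1))  # placeholder

vbgmm = BayesianGaussianMixture(
    n_components=10,                       # upper bound on K
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(feature_values)

# Components whose weights were not driven towards zero by the prior.
effective_k = int(np.sum(vbgmm.weights_ > 1e-2))
```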

Regarding the RF and MLP approaches, the parameter optimization is executed for each feature set using a grid search. This means that we vary the parameters and calculate a performance value for each parameter set. For the latter, we calculate the average balanced accuracy (see Sec. V-A) in a leave-one-out cross-validation manner. Thereby, we use the data of the five folds for training and validation (D^{Ma}_{TV}). The parameters to be optimized are summarized in Tab. III.

So far, we have constructed different feature sets (cf. Sec. IV-A) and optimized the hyperparameters for the different classification algorithms (cf. Sec. IV-B & Sec. IV-C). Subsequently, we execute a second training step with a larger amount of data for all algorithms, using the optimized feature sets and hyperparameters. The enlargement of the data set is achieved by using all five folds that we previously used in the cross-validation D^{Ma}_{TV}. Note that through this step we derive the final models for the classifier evaluation (cf. Sec. V).

SECTION V.

Maneuver Classifier Evaluation

This section presents the experimental results obtained with the trained classification models (cf. Sec. IV). Sec. V-A introduces the used performance measures, whereas Sec. V-B presents and discusses the results measured with the constructed test data set (cf. Sec. III-A).

A. Performance Measures

To be able to assess the performance of the developed classifiers, several metrics are needed, as we are simultaneously focusing on different objectives. Particularly, we are interested in predicting lane changes not only with high accuracies, but also as early as possible in advance of their execution.

To reflect that, we use the balanced accuracy (BACC ), which enables us to perform an even weighting of the classification performance for the three maneuver classes. Basically, we use the definition presented in [32], but in a generalized form for multiclass problems (cf. Eq. 9):\begin{equation*} BACC = \frac {1}{|M|} \cdot \sum _{m \in M} \frac {TP_{m}}{P_{m}} \tag{9}\end{equation*}


M is defined according to Eq. 2. Moreover, TP_{m} corresponds to the number of true positives for class m and P_{m} to the number of samples truly belonging to class m (positives). Thereby, the classifiers assign each sample to the class with the highest probability value.

Additionally, we use the Receiver Operating Characteristic (ROC) and the Area Under the ROC Curve (AUC), both of which are widely used metrics in this domain (e.g. [33, p. 180 ff]). As opposed to the BACC, the ROC curve is originally intended to assess binary classifiers. Accordingly, we transform our three-class problem into three binary classification problems. In contrast to the BACC, the ROC curves constructed this way enable us to show the classification performance at different working points (WP). For example, this property allows us to assess the performance for the maneuver classes LCL and LCR with more conservative classifier parametrizations and, thus, fewer false positives. Additionally, the AUC helps to analyze the performance at all possible working points at once.
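Both measures can be computed with standard scikit-learn metrics, as in the sketch below; scikit-learn's balanced_accuracy_score is the macro-averaged recall and thus matches Eq. 9, and the one-vs-rest AUC is evaluated per maneuver class. The column order of the probability matrix is assumed to be LCL, FLW, LCR.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

MANEUVERS = ["LCL", "FLW", "LCR"]

def evaluate_classifier(y_true, proba):
    """Return the balanced accuracy (Eq. 9) and the one-vs-rest AUC per
    maneuver class; proba holds the class probabilities per sample."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    y_pred = np.array(MANEUVERS)[np.argmax(proba, axis=1)]
    bacc = balanced_accuracy_score(y_true, y_pred)
    auc = {m: roc_auc_score(y_true == m, proba[:, i])
           for i, m in enumerate(MANEUVERS)}
    return bacc, auc
```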

Besides, metrics which enable us to analyze the technically possible prediction time horizon are needed. As the point in time being referenced in this context is essential and most sources (e.g. [1], [15] and [16]) are not very exact in this respect, we introduce the two metrics \tau _{f} and \tau _{c} (cf. Tab. IV).

As opposed to the BACC evaluation, for which an unambiguous class assignment is necessary, the class assignment is at this point conducted in a way that matches the binary evaluation in the ROC curve: For the classes LCL and LCR, respectively, we select a binary decision threshold that keeps the false positive rate below 1%. The resulting working points are presented later on in Fig. 5 along with the ROC curves. The detection times calculated this way reflect an evaluation with a limited false positive rate and, hence, at a similar working point for the different classifiers. Note that this ensures a fair evaluation. We opt for a very low false positive rate, as the system should not produce too many false lane change detections; remember that in practice, lane changes occur very rarely compared to lane following.
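Selecting such a working point can be sketched as follows: from the ROC curve of one lane change class, take the most sensitive threshold that still keeps the false positive rate below 1%; the detection times tau_f and tau_c are then measured with this threshold. This is an illustrative reading of the procedure, not the article's exact implementation.

```python
import numpy as np
from sklearn.metrics import roc_curve

def threshold_at_fpr(y_true_binary, scores, max_fpr: float = 0.01) -> float:
    """Return the decision threshold for one lane change class such that
    the false positive rate stays below max_fpr (working points in Fig. 5)."""
    fpr, tpr, thresholds = roc_curve(y_true_binary, scores)
    admissible = np.where(fpr <= max_fpr)[0]
    # The last admissible index has the highest true positive rate.
    return thresholds[admissible[-1]]
```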

Fig. 5. ROC curves for the developed maneuver classifiers with their respective best parameter sets and hyperparameters.

B. Results & Discussion

Tab. V shows the results (BACC, AUC, \tau) for the different classifiers and feature sets, measured on the maneuver test data set D^{Ma}_{Te}. Presumably due to the large number of samples, a favorable classifier selection and parametrization seems to have a significantly higher impact on the classification performance than a clever feature selection. Note that this can be concluded because the classifiers working with feature sets B and C perform only slightly worse regarding BACC and AUC than the other classifiers. However, applying a feature selection still remains reasonable, as it ensures shorter computation times. In addition, the results indicate that the feature selection contributes to an increase of the prediction times in most cases. Note that this does not apply to the RF, as this classifier performs an implicit feature selection.

Fig. 5 additionally shows the ROC curves for the respective best combination of classifier and feature set regarding BACC and AUC for each of the three classifiers. As another result of our investigations, the classification performance for the lane following maneuver (FLW), which is neglected by most researchers in the literature, is notably worse than for the lane changing maneuvers for all considered algorithms. This can be explained by the fact that nearly every sample which cannot be assigned with certainty to one of the lane change maneuvers is classified as lane following. This happens because confusions between a lane change to the right and one to the left are very rare. Thus, a significantly larger number of false positives arises for maneuver class FLW. In addition, we could reproduce the findings of [8], which showed that lane changes to the left are easier to predict than the ones to the right. One may explain this phenomenon with the observation that lane changes to the right are often motivated by the intention to leave the highway. The latter can hardly be predicted, in contrast to lane changes to the left, which are often performed to overtake slower leading vehicles. Besides, it can be observed that the classification problem remains solvable even with a significantly decreased number of features, as shown by the MLP classifier with feature set D_{MLP}, which only includes 24 features. This illustrates that a decreased number of features sometimes leads to an improved performance due to the lower dimension of the input space. This can be explained by the fact that numerous features, which we expected to provide insights into specific lane changing situations, seem to have nearly no effect on the general behavior in highway situations. Exemplary features showing this behavior are summarized in Tab. VI.

An explanation for this behavior is that situations which are affected by these features occur even more rarely than lane changes. However, as automated driving is extremely demanding exactly in these situations, additional investigations are needed in these cases (cf. Sec. VIII).

It is noteworthy that the detection times \tau_{f} and \tau_{c} are limited to a maximum of 5 s due to our evaluation methodology. Therefore, the average values \overline{\tau_{f}} and \overline{\tau_{c}} presented in Tab. V will even be exceeded in practice. To substantiate this assumption, Fig. 6 shows a histogram of the detection times for the RF. The distribution shows numerous situations that are detected 5 or more seconds in advance.

Fig. 6. Histogram of detection times \tau_{f} (a) and \tau_{c} (b) for the RF for maneuver class LCL with feature set A.

Altogether, our investigations show that a systematic machine learning workflow, combined with a large amount of data, is able to outperform current state-of-the-art approaches significantly. This becomes obvious when looking at the AUC in comparison to other approaches. Tab. VII shows that our approach outperforms the others, although we are working with a significantly larger prediction horizon, which makes the classification problem more demanding as aforementioned. Finally, note that the mentioned state-of-the-art approaches were designed and evaluated on considerably smaller data sets.

Our investigations show that the GNB classifier performs significantly worse than the two other approaches (i.e. MLP and RF). Thus, we only use these two classifiers in our further studies. Additionally, we are restricting ourselves to those feature sets and hyperparameter sets showing the best performance (cf. Tab. VIII).

SECTION VI.

Position Predictor Training

This section deals with the training of the models for position prediction. In particular, we show how to determine the GMM parameters \Theta . Sec. VI-A relies on the Mixture of Experts (MOE) approach, which was introduced in [3] for lateral predictions and which uses Gaussian Mixture Regression (cf. Eq. 1). An alternative approach is presented in Sec. VI-B. As opposed to the MOE approach, it solves the problem in one processing step (cf. Eq. 5). The entire procedure, including the evaluation process (cf. Sec. VII), is depicted in Fig. 7.

Fig. 7. Steps to train and evaluate the position predictors.

A. Mixture of Experts Approach

To train the experts for the three maneuver classes, we divide the data set (cf. Sec. III-D) along the maneuver labels (cf. Fig. 7). Subsequently, we perform a random undersampling of the data points for the FLW maneuver class to obtain approximately the same number of samples as for the other two classes. The basic idea behind this step is that the regression problem for the FLW class is less complex than for the two other classes. Thus, it should be solvable with the same amount of data. Among other benefits, this data reduction helps to speed up training. As a consequence, the number of FLW samples is decreased by approximately 95% and the data sets D^{Po}_{T,LCL}, D^{Po}_{T,FLW}, and D^{Po}_{T,LCR} are constructed (cf. Tab. I). Afterwards, we train an expert GMM with each of these data sets. These experts are later used in the MOE approach (cf. Fig. 8). We choose a maximum number of K=50 mixture components as well as full covariance matrices,4 and fit the GMMs in a variational manner again. Besides, we use the following input feature set F^{I}_{y} and the true position y at a defined prediction time t to train the experts in lateral direction (cf. Eq. 10):\begin{equation*} F^{I}_{y} = \{v_{y},~d_{y}^{cl}\} \tag{10}\end{equation*}


Fig. 8. Illustration of the Mixture of Experts (MOE) approach.

Regarding the prediction in longitudinal direction, we need to distinguish whether or not a preceding vehicle is present. If no vehicle is in sensor range, both the relative speed and the distance for that vehicle are set to default values. As involving the latter in the training of the models would lead to poor fits, the input feature sets F_{x, Obj}^{I} and F_{x, \overline{Obj}}^{I} are defined as follows (cf. Eq. 11 & Eq. 12):\begin{align*} F_{x, Obj}^{I}=&\{v_{x},~a_{x},~d_{v}^{rel, f},~v_{v}^{rel, f}\} \tag{11}\\ F_{x, \overline {Obj}}^{I}=&\{v_{x},~a_{x}\} \tag{12}\end{align*}


As shown in [13], the prediction performance for the longitudinal direction can be significantly increased by learning the deviation from the constant velocity prediction \hat {x}_{CV} instead of the true target position x . Consequently, we use the output dimensions F^{O}_{x} (cf. Eq. 13):\begin{equation*} F^{O}_{x} = \{x-\hat {x}_{CV},~t\} \tag{13}\end{equation*}

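A sketch of the expert training and of the output transformation in Eq. 13 is given below. It assumes the exploded training data of one maneuver class is available as a pandas DataFrame with hypothetical column names (v_y, d_y_cl, t, y); the joint GMM is again fitted variationally with at most K=50 components and full covariances.

```python
import pandas as pd
from sklearn.mixture import BayesianGaussianMixture

def fit_lateral_expert(df: pd.DataFrame) -> BayesianGaussianMixture:
    """Fit one lateral expert GMM in the joint input/output space of
    Sec. VI-A: inputs v_y and d_y^cl, prediction time t, and output y."""
    joint = df[["v_y", "d_y_cl", "t", "y"]].to_numpy()
    return BayesianGaussianMixture(
        n_components=50, covariance_type="full",
        max_iter=500, random_state=0,
    ).fit(joint)

def longitudinal_target(x_true, x0, v_x, t):
    """Output dimension of Eq. 13: deviation of the true longitudinal
    position from the constant-velocity prediction x_CV = x0 + v_x * t."""
    return x_true - (x0 + v_x * t)
```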

B. Integrated Approach

As an alternative to the MOE approach, this section presents an integrated approach, which uses the unsplit data set D^{Po}_{T} (cf. Tab. I) and expands the feature sets (F_{x, Obj}^{I}, F_{x, \overline{Obj}}^{I}, F_{y}^{I}) with the maneuver probabilities P_{LCL} and P_{LCR} (cf. Fig. 9). P_{FLW} is left out here, as this information would be redundant with the one provided by P_{LCL} and P_{LCR}, and we want to keep the models' dimensionality as low as possible. Consequently, the task of considering the maneuver probabilities is directly integrated into the model. The resulting one-block solution is both easier to implement and easier to use. In this context, we discovered that GMMs are not well suited to fit probabilities bounded to values between 0 and 1. This is especially the case if most of the probabilities tend toward the extreme values (cf. Fig. 10 (a)). Hence, we expand our data set with a duplicate of each data point, whose probability values are mirrored at 0 if the original probability is lower than 0.5 and at 1 otherwise. This way, we are able to generate the density shown in Fig. 10 (b), which we identified as easier to fit with GMMs. Note that before our adjustment, the density contained an abrupt jump, especially at P_{LCL}=0. As such discontinuities can only be represented by numerous Gaussian components, which are symmetrical and smooth by definition, many components needed in other areas of the data space would be wasted for this purpose.
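The mirroring augmentation can be written compactly as below: every probability value is duplicated, reflected at 0 if it lies below 0.5 and at 1 otherwise, which removes the abrupt jump at the interval borders before fitting the integrated GMM.

```python
import numpy as np

def mirror_probabilities(p: np.ndarray) -> np.ndarray:
    """Augmentation from Sec. VI-B: append a mirrored copy of the
    probability values (reflected at 0 below 0.5, at 1 otherwise), so the
    resulting density is smooth at the borders and easier to fit (Fig. 10)."""
    mirrored = np.where(p < 0.5, -p, 2.0 - p)
    return np.concatenate([p, mirrored])
```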

Fig. 9. Illustration of the integrated approach.

Fig. 10. Density of P_{LCL} before (a) and after (b) adjustment.

The actual training of the integrated GMM is performed similarly to the training of the experts, in a variational fashion with K=50 components and full covariance matrices, but with the entire training data set. Thus, no undersampling procedures are applied, and the unbalanced nature of the maneuver classes and their actual frequencies are preserved.

SECTION VII.

Position Estimation Evaluation

In order to evaluate the position predictions, one first has to decide which of the considered classifiers fits best as gating network in the Mixture of Experts (MOE) and in the integrated approach, respectively. Hence, we calculate the average log-likelihoods \overline{\mathcal{L}} on the entire position test data set D^{Po}_{Te} (cf. Sec. III-D). Note that this data set is not balanced according to the maneuver labels, as also suggested in [20]. In particular, the unbalanced nature of the data allows us to draw general conclusions about the performance, independent of the respective driving maneuver. In this context, the use of the average log-likelihood as quality criterion for comparing different approaches is beneficial, as it rates the quality of the predicted probability density distribution instead of assessing only the ability to predict one single position with maximized accuracy. Moreover, the log-likelihood is exactly the value that is maximized when fitting a GMM. However, as \overline{\mathcal{L}} cannot be interpreted as a physical quantity, it is solely useful for comparison purposes. As we are also interested in assessing the performance in terms of the spatial error and to achieve comparability, we additionally investigate this quantity for the best-performing approach in the following subsections.

Tab. IX shows the per-sample log-likelihood of the different approaches for the longitudinal (\overline{\mathcal{L}_{x}}) as well as the lateral (\overline{\mathcal{L}_{y}}) direction. In this context, we use the already introduced RF and MLP classifiers in combination with four different strategies for the weighting function w_{m}(I) introduced in Eq. 1, which combines the experts' position estimates:

  1. Raw probabilities (Raw): This strategy directly uses the raw probabilities P_{m}^{clf}(I) issued by the classifiers as gating probabilities. This means that we concatenate the three GMMs and multiply the mixture weights with the probabilities issued by the respective classifier: w_{m}^{Raw}(I) = P_{m}^{clf}(I) .

  2. Winner Takes it All (WTA): This strategy uses the outputs of the GMM for the maneuver class with the largest probability according to the respective classifier (cf. Eq. 14):
\begin{align*} w_{m}^{WTA}(I) = \begin{cases} 1, & \text{if } P_{m}^{clf}(I) = \max\limits_{\{q \in M\}} P_{q}^{clf}(I)\\ 0, & \text{else} \end{cases} \tag{14}\end{align*}


  3. Prior Weighted Raw probabilities (PW-Raw): This strategy considers that the classifiers were trained on a balanced data set. Thus, it multiplies the raw probabilities with the prior probabilities for each maneuver class: w_{m}^{PWRaw}(I) = norm(P_{m}^{clf}(I) \cdot \pi _{m}) .

  4. Integrated GMM (I-GMM): This strategy directly uses the integrated approach presented in Sec. VI-B to predict the probability distributions and follows Eq. 5.
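The first three weighting strategies, together with the prior-only reference introduced below, can be summarized in a few lines. The following sketch assumes classifier probabilities ordered as (LCL, LCR, FLW); the function and variable names are illustrative and not taken from our implementation:

```python
import numpy as np

def gating_weights(p_clf: np.ndarray, priors: np.ndarray, strategy: str) -> np.ndarray:
    """Compute the expert weights w_m(I) for one sample.

    p_clf  -- classifier probabilities per maneuver class (LCL, LCR, FLW)
    priors -- class prior probabilities, e.g. (0.03, 0.03, 0.94)
    """
    if strategy == "Raw":
        # Use the classifier output directly as gating probabilities.
        return p_clf
    if strategy == "WTA":
        # Winner takes it all: only the most probable maneuver's expert is used.
        w = np.zeros_like(p_clf)
        w[np.argmax(p_clf)] = 1.0
        return w
    if strategy == "PW-Raw":
        # Re-weight the balanced-training probabilities with the class priors.
        w = p_clf * priors
        return w / w.sum()
    if strategy == "Priors":
        # Naive reference: ignore the classifier and use the priors only.
        return priors / priors.sum()
    raise ValueError(f"unknown strategy: {strategy}")

# Example usage with the priors reported in the text.
w = gating_weights(np.array([0.6, 0.1, 0.3]), np.array([0.03, 0.03, 0.94]), "PW-Raw")
```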

To demonstrate the benefits of our approach, which combines maneuver classification and position prediction, we additionally analyze its performance compared to reference strategies. First, we use the labels as a perfect classifier according to Eq. 15:
\begin{align*} w_{m}^{Labels} = \begin{cases} 1, & \text{if } m=L\\ 0, & \text{else} \end{cases} \tag{15}\end{align*}


Moreover, we use the pure prior probabilities (\pi _{LCL}=\pi _{LCR}=0.03; \pi _{FLW}=0.94 ) as the most naive classifier (w_{m}^{Priors}=\pi _{m} ), and a strategy without any classifier, referred to as NOCLF in the following.

For the longitudinal direction, Tab. IX shows that the reference solution without any previous maneuver classification (NOCLF) produces slightly better results than the other combinations. Although it may seem trivial that lane changes do not need to be taken into account when predicting longitudinal behavior, this is noteworthy, as our expectation beforehand was that lane changes to the left mostly go along with an acceleration, whereas braking actions are extremely rare.

By contrast, the benefits of the Mixture of Experts (MOE) approach come into effect for the lateral direction. As shown in Tab. IX, the combination of prior weighting and MLP probabilities performs best. Furthermore, all combinations involving the integrated approach perform only slightly worse or, in the case of the RF, even better than the combinations using prior-weighted probabilities. As a benefit, these models are easier to use and are more robust against poor or uncalibrated maneuver probabilities without needing an additional calibration step. This can be explained by the fact that these models perform an implicit probability calibration during the training of the GMM.

Moreover, we learned that the WTA strategy has no practical relevance, as it does not necessarily produce continuous position predictions over consecutive time steps, which the other strategies accomplish by definition. Besides, in case of a misclassification, the WTA strategy queries only one specific expert model, which might not be applicable in that area of the data space, and this clearly decreases the overall performance.

In the following, we investigate the spatial errors of the best combinations (lateral: MLP classifier with PW-Raw strategy; longitudinal: NOCLF), as previously introduced. For this purpose, we present the applied performance measures in Sec. VII-A and then show the obtained results in Sec. VII-B.

A. Performance Measures

To measure the spatial performance of our predictions, we rely on the unbalanced position evaluation data set D^{Po}_{Te} . The latter contains the needed inputs for the maneuver classifiers and position predictors (I ) as well as the true trajectories TR according to Eq. 16.\begin{equation*} D^{Po}_{Te} = \begin{bmatrix} I & \textit {TR} \end{bmatrix} \tag{16}\end{equation*}


TR contains N = 20 000 trajectories of 5 s length, sampled at 10 Hz (hence 1 000 000 samples), according to Eq. 17:
\begin{equation*} \textit {TR} = \begin{bmatrix} tr^{0} & tr^{1} & \cdots & tr^{N} \end{bmatrix} \tag{17}\end{equation*}


Each trajectory tr^{i} consists of 51 corresponding x and y positions, according to Eq. 18:\begin{align*} tr^{i} = \begin{bmatrix} x^{i}_{0.0} &\quad y^{i}_{0.0} \\ x^{i}_{0.1} &\quad y^{i}_{0.1} \\ \vdots &\quad \vdots \\ x^{i}_{5.0} &\quad y^{i}_{5.0} \\ \end{bmatrix} \tag{18}\end{align*}


The predicted trajectories \hat {\textit {TR}} are then calculated with the described classifiers and position predictors in the same format as TR. However, as the Gaussian Mixture Regression originally produces probability densities instead of point estimates, the latter have to be calculated first. This is accomplished by computing the center of gravity of the density, as described in [3]. Accordingly, the prediction error e^{i}_{t} at a specific prediction time t for trajectory i is calculated separately for the two dimensions x and y as follows (Eq. 19):
\begin{align*} e^{i}_{t} = \begin{bmatrix} e^{i}_{x, t} & e^{i}_{y, t} \end{bmatrix} = \begin{bmatrix} |x^{i}_{t} - \hat {x}^{i}_{t}| & |y^{i}_{t} - \hat {y}^{i}_{t}| \end{bmatrix} \tag{19}\end{align*}


Variables \hat {x} and \hat {y} describe the estimated positions, whereas x and y correspond to the actual ones. The individual errors e^{i}_{t} of all trajectories i are concatenated to E_{t} (cf. Eq. 20):\begin{align*} E_{t} = \begin{bmatrix} E_{x, t} & E_{y, t} \\ \end{bmatrix} = \begin{bmatrix} e^{i}_{x, t} & e^{i}_{y, t} \\ \end{bmatrix}_{\forall i} \tag{20}\end{align*}
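Once point estimates have been extracted from the predicted densities, Eqs. 19 and 20 reduce to element-wise absolute differences. The following sketch illustrates both steps; the array shapes and function names are assumptions for illustration only:

```python
import numpy as np

def point_estimate(weights: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Center of gravity of a predicted mixture density.

    weights -- (K,) mixture weights; means -- (K, 2) component means (x, y).
    """
    return weights @ means

def prediction_errors(TR: np.ndarray, TR_hat: np.ndarray) -> np.ndarray:
    """Absolute errors per trajectory, time step and dimension (Eqs. 19/20).

    TR, TR_hat -- arrays of shape (N, 51, 2): N trajectories, 51 samples
    from 0.0 s to 5.0 s at 10 Hz, columns x and y.
    """
    return np.abs(TR - TR_hat)
```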


At this point, we want to re-emphasize that, although this way of evaluating the performance produces easy-to-interpret results, it disregards the fact that our original outputs (i.e. spatial probability densities) contain much more information than a single point estimate.

B. Results & Discussion

Fig. 11 shows, on the left side, the performance of the selected combinations of classifiers and mixing strategies (highlighted in Tab. IX) at a prediction horizon of 5 s for the longitudinal (E_{x, 5} ) and the lateral (E_{y, 5} ) direction. For comparison, a constant velocity (CV) prediction and a Mixture of Experts (MOE) using the maneuver labels (cf. Eq. 15) are shown. The right-hand side of Fig. 11 shows the development of the median lateral error \tilde {E}_{y,t} as a function of the prediction time t .

Fig. 11. Visualization of the error distribution (left) in longitudinal and lateral direction and the median lateral error as a function of the prediction time (right).

As the plots indicate, our position prediction system produces results comparable to the ones obtained with a perfect maneuver classification, in both the lateral and the longitudinal direction. Additionally, the plots show that we clearly outperform simple models such as CV and reach a very small median lateral prediction error of less than 0.21 m at a prediction horizon of 5 s. As shown in Tab. X, this is remarkable compared to other approaches. Note that we did not include studies in this compilation that report the root-mean-square error (RMSE), which we quantify for our approach with a value of 0.64 m. On the one hand, we follow [34], which points out that RMSE measures do not allow for a comparison across different data sets, as the values depend on the size of the data set. On the other hand, the challenge tackled by us (cf. Sec. I-A) is to predict the probability distribution of future vehicle positions rather than single-shot estimates. Consequently, we did not optimize the predictions to minimize the RMSE. Therefore, it is not surprising that other works which explicitly minimize this value, but ignore distribution estimates, perform better with respect to the RMSE.
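For reference, the two error statistics differ only in their aggregation; a minimal sketch computing both from the lateral errors at the 5 s horizon (the input array is assumed to hold the values of E_{y,5}):

```python
import numpy as np

def error_statistics(e_y_5: np.ndarray) -> tuple[float, float]:
    """Median and RMSE of the lateral errors at the 5 s horizon.

    The median (reported as < 0.21 m) is robust against the heavy error tail
    caused by rare lane changes, whereas the RMSE (reported as 0.64 m) is
    dominated by exactly those large errors.
    """
    return float(np.median(e_y_5)), float(np.sqrt(np.mean(e_y_5 ** 2)))
```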

As shown in [3], these results are dominated by the most frequent maneuver class (FLW ). Hence, Tab. XI complementarily shows the errors for 20 000 maneuvers of each type.

As can be seen, the errors for the lane change maneuvers are considerably larger than the ones for lane following. On the one hand, this can be explained by the more complex regression task. On the other hand, the predictions are subject to higher uncertainties in case of a lane change, as shown by the predicted distributions (cf. Fig. 12). These uncertainties are ignored by the single point estimates. Note that the increased uncertainties are caused by the lack of knowledge about the exact point in time at which the maneuver will be completed. This holds true even if the classifier informs the position predictor about an upcoming lane change.

Fig. 12. Predicted probability distribution of future vehicle positions for an illustrative situation.

Complementary to these quantitative evaluations, we performed qualitative testing and visualized single situations along with our predictions. To illustrate this, we attached a short video and present a single frame in Fig. 12. More precisely, Fig. 12 shows the predictions during an upcoming lane change, along with the described uncertainties. In addition, we show the confidence of our predictions (Conf_{x} , Conf_{y} ), which gives the consumer of the information an important hint concerning the reliability of the predictions. This value is calculated similarly to [13] through additional GMMs fitted in the input dimensions. To demonstrate its general usability, we visualize the confidence value divided by the standard deviation against the lateral prediction errors at T_{h}=5\text{s} in Fig. 13. As can be seen, and as expected, the prediction errors decrease with increasing confidence values.
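Such a confidence measure can be sketched as the density of the current input under an additional GMM fitted in the input dimensions, similar in spirit to [13]; the component count and function names below are assumptions for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_confidence_model(X_inputs: np.ndarray, k: int = 20) -> GaussianMixture:
    """Fit an additional GMM on the input dimensions only."""
    return GaussianMixture(n_components=k, covariance_type="full").fit(X_inputs)

def confidence(model: GaussianMixture, x: np.ndarray) -> float:
    """Likelihood of the current input under the input-density model.

    High values indicate that the situation is well covered by the training
    data, so the corresponding position prediction is considered more reliable.
    """
    return float(np.exp(model.score_samples(x.reshape(1, -1))[0]))
```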

Fig. 13. Prediction confidence against lateral prediction errors.

SECTION VIII.

Summary and Outlook

This work introduces a machine learning workflow that enables calculations of long-term behavior predictions for surrounding vehicles in highway scenarios. For the first time, a combined compilation of prediction techniques for driving maneuvers and positions as well as lateral and longitudinal behavior is presented. The developed modules are evaluated in detail based on a large amount of real-world data, challenging established state-of-the-art approaches.

To further improve the quality of the presented behavior predictions, especially in complex situations, we are working on various enhancements and conducting additional studies. Currently, we are migrating the prediction strategies to an experimental vehicle to enable detailed investigations of run time as well as resource usage. Meanwhile, we are about to apply our models to predict the movements of surrounding vehicles instead of ego-vehicle movements. Besides, we plan to apply our predictor to a publicly available data set such as highD [35] or NGSIM to improve comparability. In addition, we want to investigate up to which maximum prediction horizon (beyond 5 s) the maneuver detection produces useful insights.

Moreover, we see high potential in identifying demanding scenarios and explicitly integrating contextual knowledge (e.g. weather, traffic, time of day or local particularities) into our models. First experiments in this direction have shown that contextual properties can have a considerable impact on driving behavior.

ACKNOWLEDGMENT

The authors would like to thank Mercedes-Benz AG Research and Development for providing real-world measurement data, which enabled us to perform our experiments. Furthermore, they would like to thank the Institute of Databases and Information Systems at Ulm University as well as Prof. Dr. Klaus-Dieter Kuhnert from the Institute of Realtime Learning Systems at the University of Siegen for supporting our studies.

