Introduction
Automated driving has the potential to radically change our mobility habits as well as the way goods are transported. To enable driving automation, several processing steps have to be executed. Fig. 1 illustrates this thought: In the first step, the current traffic scene has to be sensed and a proper representation of the environment needs to be generated. Using this information, the given traffic situation needs to be interpreted and the behavior of others has to be anticipated. Subsequently, a plan, i.e. a trajectory, is derived based on this knowledge. Finally, this plan is executed in the last step of this process. How long the trajectory stays viable, before it has to be re-planned, is strongly influenced by the capability of the prediction component.
Long-term driving behavior predictions in the context of trajectory planning for automated driving (equal symbols denote simultaneity).
As opposed to other research works dealing with techniques to interconnect vehicles through so-called car-to-car communication, we aim to solve this anticipation task locally. On the one hand, it is not foreseeable when an adequate market penetration of vehicles with such techniques will be reached. On the other hand, a local prediction component remains necessary in any case, as there are several traffic participants without communication abilities, such as bicyclists. In addition, local predictions might become necessary to bypass transmission times in certain cases, as emphasized by [1]. Moreover, it is reasonable to approach the topic from the perspective of highway driving, as this use case is easier to realize than others due to its clear constraints (e.g. structured setting, absence of pedestrians). However, for the prediction task this implies the challenge of creating precise long-term predictions (2 to 5 s) rather than short forecasts (up to 2 s), as higher velocities can be expected in highway scenarios than in urban or rural areas.
A. Problem Statement
We tackle the challenge of anticipating the behavior of other traffic participants in highway scenarios. In particular, we aim to generate information that can be processed by trajectory planning algorithms to implement an anticipatory driving style. In this context, our objective is to model future vehicle positions within a time
B. Problem Resolution Strategy
This article presents a systematic workflow for the design and evaluation of a lightweight maneuver-based model [2], which uses standard sensor inputs to perform long-term driving behavior predictions. Methodically, we build on [3] and use a two-step Mixture of Experts (MOE) approach. This includes a maneuver classification and a down-streamed behavior prediction. The maneuver probabilities \begin{align*} y_{t}\sim&p_{y}(\Theta _{y}, I, t) \\=&\sum _{m \in M}{p_{y, m}(\theta _{y, m}| I, t) \cdot w_{m}(I)} \tag{1}\end{align*}
The set of maneuvers \begin{equation*} M = \{LCL, FLW, LCR\} \tag{2}\end{equation*}
Different weighting approaches based on the maneuver probabilities are presented in Sec. VII. The expert distributions \begin{equation*} p_{y, m}(\theta _{y, m}) = \sum _{i=1}^{K}{\phi _{y, m, i} \cdot \mathcal {N}(\mu _{y, m, i}, \Sigma _{y, m, i})} \tag{3}\end{equation*}
The parameters of the GMMs are subsumed in \begin{equation*} \Theta _{y} = \{\theta _{y, m}\}_{\forall m \in M} = \{\phi _{y, m}, \mu _{y, m}, \Sigma _{y, m}\}_{\forall m \in M}\tag{4}\end{equation*}
In addition, we introduce an alternative methodology to the Mixture of Experts approach, integrating the outputs of the gating nodes into one single model. This simplifies Eq. 1 as follows:\begin{equation*} y_{t} \sim p_{y}(\theta _{y, IGMM}| I, t, P_{LCL}(I), P_{LCR}(I)) \tag{5}\end{equation*}
For implementing the models, we use out-of-the-box modules from the widely used frameworks Apache Spark MLlib [4] (classifiers) and Scikit-learn [5] (GMMs).
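To make Eqs. 1 and 3 concrete, the following minimal Python sketch evaluates a Mixture of Experts density for a single scalar output. It is an illustration under simplifying assumptions, not the implementation used in this article: the experts are univariate GMMs that are taken to be already conditioned on the inputs I and the prediction time t, all parameter values are hypothetical, and plain Python is used instead of Spark MLlib or Scikit-learn.

```python
import math

def gaussian_pdf(y, mean, var):
    """Density of a univariate normal distribution N(mean, var) at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def expert_density(y, components):
    """Eq. 3 for one maneuver: sum_i phi_i * N(y; mu_i, sigma_i^2).

    components is a list of (phi, mean, var) tuples whose phi sum to 1.
    """
    return sum(phi * gaussian_pdf(y, mean, var) for phi, mean, var in components)

def moe_density(y, experts, weights):
    """Eq. 1: gating-weighted sum of the expert densities over all maneuvers."""
    return sum(weights[m] * expert_density(y, experts[m]) for m in experts)

# Hypothetical expert parameters for a lateral position y (in metres)
# at one fixed prediction time; not taken from the article.
experts = {
    "LCL": [(1.0, 3.5, 0.5)],
    "FLW": [(0.7, 0.0, 0.1), (0.3, 0.2, 0.3)],
    "LCR": [(1.0, -3.5, 0.5)],
}
# Hypothetical gating weights w_m(I), assumed to sum to 1.
weights = {"LCL": 0.1, "FLW": 0.85, "LCR": 0.05}

density_at_zero = moe_density(0.0, experts, weights)
```

Because the gating weights and the mixture weights of each expert sum to one, the result is again a valid probability density.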
Altogether, we contribute a systematic workflow for designing and evaluating the prediction models as well as methodical extensions to known approaches. Moreover, we assess the performance of the developed modules for the two tasks of predicting (1) driving maneuvers and (2) probability distributions of future positions both separately and in combination. To evaluate the modules, we utilize a large data set comprising real-world measurements. As will be shown, our prediction models outperform established state-of-the-art approaches.
The remainder of this article is organized as follows: Sec. II discusses related work on object motion prediction, emphasizing the value added by our approach. Sec. III introduces the data set and describes the preprocessing steps applied to it. Sec. IV outlines the training of the considered maneuver classifiers, whereas Sec. V deals with the experimental evaluation and the performance of the classifiers. Based on these findings, Sec. VI develops different approaches for estimating probability distributions of future vehicle positions, which are then assessed in Sec. VII. Finally, Sec. VIII summarizes the article and gives an outlook on future work.
Related Work
Regarding the understanding and prediction of the behavior of other traffic participants in highway scenarios, various aspects have been investigated in the literature. Accordingly, this section is sub-divided into three parts: Sec. II-A presents approaches inferring the kind of maneuver that will be executed by a vehicle. Note that applications like collision checkers or trajectory planning algorithms cannot directly process this kind of information. Instead, probabilities of future vehicle positions or trajectories need to be predicted. Related research on this topic is presented in Sec. II-B. Bringing together the aspects of maneuver classification and position prediction, Sec. II-C gives an overview of hybrid prediction approaches. Finally, Sec. II-D closes the section with a brief literature discussion, leading to the contributions of this article in Sec. II-E.
A. Classification Approaches
Classification approaches for maneuver recognition are described in [1], [6]–[8]. In [1], a system is introduced that is capable of detecting lane changes with high accuracies (>99%), approximately 1 s before their occurrence. For this purpose, dynamic Bayesian networks are used. Another approach, which is capable of detecting lane changes approximately 1.5 s before their occurrence, is presented in [6]. To achieve this, the lane change probability is decomposed into a situation- and a movement-based component, resulting in an
B. Trajectory and Position Prediction Approaches
Approaches dealing with the prediction of trajectories and positions are presented in [9]–[13]: [9] uses a fully-connected Deep Neural Network to learn the parameters of a two-dimensional GMM. For each situation, an adapted Gaussian Mixture distribution models the probability density in the output dimensions
Another approach, also evaluated with the NGSIM data set, is presented in [10]. The authors propose the use of a Long Short Term Memory network for predicting trajectories. In particular, the approach is able to compute single shot predictions with an RMSE of approximately 0.42 m at a prediction horizon of 5 s. Reference [11] deals with the prediction of spatial probability density functions, especially at road intersections. More precisely, a conditional probability density function, which models the relationship between past and future motions, is inferred from training data. Finally, standard GMMs and variational approaches are compared. In [12], this approach is extended by a hierarchical Mixture of Experts that allows categorical information to be incorporated. The latter includes, for example, the topology of a road intersection.
In [13], a Gaussian Mixture Regression approach for predicting future longitudinal positions as well as a procedure for estimating the prediction confidence are introduced.
C. Hybrid Approaches
Approaches that combine strategies for both maneuver detection and trajectory or position prediction, similar to the approach presented in this article, are described in [15]–[20]. In the following, we denote such approaches as hybrid.
Reference [15] presents a two-staged approach: In the first step, a Multilayer Perceptron (MLP) is used to estimate the future lane of a vehicle. In a second step, a concrete trajectory realization is estimated with an additional MLP. As a result, the lane estimation module is able to detect lane changes 2 s in advance with an AUC better than 0.90. The evaluation of the trajectory prediction module shows a median lateral error of approximately 0.23 m at a prediction horizon of 5 s.
Reference [16] proposes another hybrid approach that uses the prediction of future trajectories to forecast lane change maneuvers. Moreover, the intention of drivers is modeled using a Support Vector Machine. Subsequently, the resulting action is checked for collisions. This enables the approach to model interrupted lane changes. During the evaluation, an
In turn, [17] does not follow such a hybrid approach, but contains an intermediate step before predicting trajectories. Instead of learning maneuver probabilities, the authors present a regression technique for estimating the time span to the next lane change relying on Random Forests. In [18], this approach is extended and combined with findings from [6]. The estimated times up to the next lane change to the left and to the right are used as input for a cubic polynomial, which is intended to predict future trajectories. Finally, the approach is evaluated with the mentioned NGSIM data set, showing a median lateral error of approximately 0.5 m at a prediction horizon of 3 s for lane changing scenarios, assuming a perfect maneuver classification.
Reference [19] proposes the use of a maneuver recognition based on a Hidden Markov Model, distinguishing between ten maneuver classes. Based on this model, a position prediction module, which combines several maneuver specific variational GMMs (according to [11]) and an Interacting Multiple Model, which weights different physical models against each other, are implemented. As the approach uses ten maneuver classes and as the errors are only measured in terms of Euclidean distance, the results are difficult to compare with the ones of other approaches. Additionally, the approach is evaluated on a rather small data set. Finally, in [20] these findings are pursued by the use of a Long Short Term Memory network. The authors demonstrate certain improvements compared to their previous work, while using the NGSIM data set for evaluation purposes.
Reference [3] presents an approach predicting future lateral vehicle positions based on Gaussian Mixture Regression and a Mixture of Experts with a Random Forest as gating network. The approach is evaluated based on a small data set, leading to noisy results, especially in case of lane changes. The evaluation shows that the approach is able to perform maneuver classifications with an AUC better than 0.84 and lateral position predictions with a median error of less than 0.2 m at a prediction horizon of 5 s.
D. Discussion
The findings of our literature survey can be summarized as follows: Many works provide meaningful algorithmic contributions. However, in numerous cases we miss structure regarding the problem resolution strategy. Often, it does not become clear how the approaches compare to any baseline (e.g. [19]). Moreover, parameters (e.g. [16]) and feature sets (e.g. [10]) are selected manually, and are thus difficult to retrace. In addition, most approaches focus on short or medium prediction horizons (e.g. [1]), or lack a good prediction performance for larger time-horizons (e.g. [18]). When analyzing the approaches that aim to resolve the long-term prediction problem, it becomes clear that the latter is challenging as the prediction models become significantly more complex as, e.g., pointed out by [7], [8] and [21].
Moreover, many approaches (e.g. [10]) aim to predict single trajectories or single shot predictions rather than probabilistic distributions of future vehicle positions. Therefore, the objective to be optimized is mostly the root-mean-square error (
E. Contributions
The contribution of this article is threefold:
We apply a heuristic-free machine learning workflow to generate a model capable of predicting maneuvers and precise distributions of future vehicle positions for time horizons up to 5 s (reasonable in terms of comparability). This is achieved with a machine learning workflow that omits any human-tuned (hyper-)parameters when constructing the classifiers. Note that this includes all aspects involving feature engineering, labeling, feature selection, and hyperparameter optimization for different classification algorithms. Regarding feature engineering and selection, this means that we first construct a data set with a large superset of all features that are potentially relevant for the problem solution. Afterwards, we select a comparatively small feature set that still ensures maximum predictive power through an automated feature selection process.
We evaluate the modules for maneuver classification and position prediction, where both parts are not only evaluated separately, as in other works (e.g. [18]), but as a combined prediction system as well. This concerns the lateral as well as the longitudinal behavior. In this context, we show that directly feeding the results of the classifier into the regression problem produces results comparable to a Mixture of Experts approach. Additionally, we show that relying on the Markov assumption and not modeling the interactions between the traffic participants explicitly allows us to produce superior results compared to existing approaches. As opposed to these works, we integrate the different aspects of behavior prediction, which comprise the prediction of driving maneuvers and positions both in lateral and longitudinal direction. In addition, we introduce new methodologies and conduct a large-scale evaluation.
We demonstrate that the presented methods have the potential to outperform state-of-the-art approaches when fed with a sufficient amount of data. Additionally, we show that our approach is able to provide a meaningful estimate of the prediction uncertainty to the consumer of the information, which is beneficial for collision risk calculation and trajectory planning (e.g. [22]).
Data Preparation & Experimental Setup
Sec. III-A introduces the considered data set and the experimental setup. Sec. III-B then gives a detailed overview of the features used to train our models. Afterwards, Sec. III-C introduces the labeling process. Finally, Sec. III-D deals with the data set split for training, validating and testing the constructed models as well as further preprocessing steps. Fig. 2 summarizes the overall preprocessing workflow.
Preprocessing steps used in the proposed workflow (respective sections are referred in the boxes).
A. Data Collection
For modeling and evaluating our modules, we use measurement data from a fleet of testing vehicles [24] equipped with common series sensors. The sensor setup includes a front-facing camera detecting lane markings as well as two radars observing the traffic situation in the back. In addition, the vehicles have a front-facing automotive radar to sense the distances and velocities of surrounding vehicles. The data has been collected with different vehicles and drivers at varying times of the day during all seasons. The data collection campaign spanned more than a year and was mainly restricted to the area around Stuttgart in Germany. Owing to this wide variance, we expect our models to generalize well.
Unlike other contributions (e.g. [3]), we are not using the actual object-vehicles as prediction target
Our investigations rely on an environment model similar to the one presented in [7], modeling the surroundings with a fixed grid of eight relation partners. In contrast to [7], however, we use the ego-vehicle as prediction target. For this purpose, we slightly adapt the environment model: As the sensors facing the rear traffic in the testing vehicles are less capable than the ones facing the front, our environment model (cf. Fig. 3) distinguishes between relation partners behind (index \begin{align*} F_{sit}=&[R_{rf}(r=fl), R_{rf}(r=f), R_{rf}(r=fr), \\&R_{rf}(r=l), R_{rf}(r=r), \\&R_{rb}(r=rl), R_{rb}(r=rr), \\&F_{o}, F_{infra}]^{T} \tag{6}\end{align*}
A detailed listing of the particular elements of the relation vectors
B. Feature Engineering
To test and develop our system and to fill the described environment model, we use fused data originating from three different sources:
The basis for our investigations are measurement data produced by the testing fleet (cf. Sec. III-A).
As we identified additional features of interest as inputs beforehand, we fuse the data with information from a navigation map (e.g. bridges, tunnels, and distances to highway approaches).
In addition, we calculate some higher-order features from the measurements, e.g. through a conversion to a curvilinear coordinate system along the road [25].
C. Labeling
Like previous works [3], we divide all samples into the three maneuver classes \begin{align*} L = \begin{cases} LCL,& \text {if } (TTLCL \leq T_{h})~\land ~\\ & \; \; \; \;(TTLCL < TTLCR) \\ LCR,& \text {if } (TTLCR \leq T_{h})~\land ~\\ & \; \; \; \; (TTLCR < TTLCL) \\ FLW,& \text {otherwise}\\ \end{cases} \tag{7}\end{align*}
We decided to use a horizon of 5 s, as the duration of lane change maneuvers usually ranges from 3 s to 5 s (see [16]). Consequently, it is reasonable to label samples as potential lane change samples only up to an upper boundary of 5 s. Additionally, this value is widely used in the literature as the longest prediction time (e.g. [8], [15] or [16]) and therefore allows for comparability. However, note that this style of labeling might result in decreased performance values, as detections being slightly more than 5 s ahead of a lane change count as false positives in the evaluation.
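The labeling rule of Eq. 7 can be written as a short Python function. This is a minimal sketch: the function name is hypothetical, and representing the absence of an upcoming lane change by an infinite time value is an assumption made for illustration.

```python
import math

def label_sample(ttlcl, ttlcr, horizon=5.0):
    """Assign a maneuver label according to Eq. 7.

    ttlcl / ttlcr: time (s) until the next lane change to the left / right.
    horizon: labeling horizon T_h in seconds.
    """
    if ttlcl <= horizon and ttlcl < ttlcr:
        return "LCL"
    if ttlcr <= horizon and ttlcr < ttlcl:
        return "LCR"
    return "FLW"

# A lane change to the left in 3 s, none to the right (assumed encoding: inf).
left_soon = label_sample(3.0, math.inf)
# A lane change to the left in 5.2 s lies just outside the horizon.
just_outside = label_sample(5.2, math.inf)
```

The second call returns FLW, which illustrates the effect mentioned above: a classifier that already detects this lane change 5.2 s ahead is counted as producing a false positive.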
D. Data Set Split
As shown in Fig. 2, we split our data into several parts after executing the mentioned preprocessing steps. The first split divides our data into one part for the maneuver classification
The first part
In addition, we only take situations into account that were collected continuously up to the prediction horizon of 5 s. This ensures that the folds are also balanced over time, which constitutes a prerequisite for performing fair evaluations. This is necessary, as the prediction task is obviously much more demanding when predicting a lane change 4 s in advance instead of 1 s in advance. Due to this strategy, the numbers of samples in the six folds are slightly different, but we consider this as uncritical. Overall,
The second data set
We further split data set
Another mechanism performing a time interpolation ensures that the training data points are distributed continuously along the time dimension. Accordingly, we also have access to prediction times in between our sampling times during the training process. Moreover, the data points in the position test data set
Finally, we ’coil’ the two data sets
Maneuver Classifier Training
This section gives an overview of the different techniques used for feature selection (cf. Sec. IV-A), classification algorithms (cf. Sec. IV-B), and techniques to tune the respective hyperparameters (cf. Sec. IV-C) for the maneuver classification. The corresponding activities are illustrated by Fig. 4.
A. Feature Selection
This section deals with the task of selecting a meaningful subset of features from the available superset. Such selection makes sense for two reasons: First, it can improve the prediction performance of the maneuver classifiers. Second, it can help to reduce calculation efforts, enabling predictions on devices with limited computational power as well. Our main goal here is to improve the overall prediction performance. Note that this slightly contrasts with an overall ranking of the available features, as some of them are highly redundant. Consequently, the most predictive variables shall be selected, while excluding redundant ones. In literature, one can find numerous works dealing with feature selection in machine learning applications. In our implementation, we rely on the findings from [26]. As we claim to solve the underlying classification problem through a systematic machine learning workflow, we start with simple techniques and move towards more sophisticated and computationally expensive ones. To demonstrate the performance of the used techniques, additionally, we test the classification with the entire superset as a baseline. The superset that contains all features is denoted as
The first investigated technique is a simple correlation-based feature selection technique, which evaluates the correlation of all features and then applies a threshold (set to 0.15) to remove features showing a very low correlation with the maneuver class from the superset. More precisely, we compute Spearman’s Correlation (see [27, p. 133 ff]) between each feature and the time up to the next lane change (
The second technique uses the Correlation-based Feature Selection (CFS; cf. [28]) and is referred to as \begin{equation*} {M}_{S} = \frac {n\,\overline {\rho _{cf}}}{\sqrt {n+n(n-1)\overline {\rho _{ff}}}} \tag{8}\end{equation*}
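As a small numerical illustration of Eq. 8, the following Python sketch computes the merit of a feature subset. It follows the usual reading of the CFS merit, in which n is the number of features in the subset, the first correlation term is the mean feature-class correlation, and the second is the mean feature-feature correlation. The function name and the example values are hypothetical.

```python
import math

def cfs_merit(n, mean_rho_cf, mean_rho_ff):
    """Merit M_S of a feature subset according to Eq. 8.

    n: number of features in the subset
    mean_rho_cf: mean correlation between the features and the class
    mean_rho_ff: mean correlation among the features themselves
    """
    return n * mean_rho_cf / math.sqrt(n + n * (n - 1) * mean_rho_ff)

# Four features with equal class correlation (hypothetical values):
merit_uncorrelated = cfs_merit(4, 0.5, 0.0)  # features carry independent information
merit_redundant = cfs_merit(4, 0.5, 1.0)     # features are fully redundant
merit_single = cfs_merit(1, 0.5, 0.0)        # one of these features on its own
```

With fully redundant features, the merit of the four-feature subset equals that of a single feature, so the criterion gives no incentive to add redundant variables.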
The feature selection techniques described so far are limited in two aspects: Firstly, a proper incorporation of the properties of the used classification algorithm is missing. Secondly, features only being meaningful in combination with others are not considered in feature sets
B. Examined Classification Algorithms
For the task of maneuver classification, we consider three different algorithms for evaluation purposes, which have been successfully applied in reference works:
The first algorithm is based on a Gaussian Naïve Bayes (GNB) approach using GMMs instead of only using one Gaussian kernel per class and was presented in [7].
The second algorithm is based on a Random Forest (RF) and was presented in [3].
The third algorithm is based on a Multilayer Perceptron (MLP) approach and was presented in a similar form in [15]. As opposed to GNB and RF, this approach uses scaled features, as suggested by [30, p. 398 ff]. In contrast to [15], we use a modified labeling and a partly automated strategy to identify an optimal model structure, where we restrict the model to one hidden layer in order to keep the parameter optimization solvable in finite time.
C. Hyperparameter Optimization
To achieve the best possible performance and to enable a fair comparison of the examined classifiers, we optimize their respective hyperparameters. For the GNB, this means to find the optimal number of Gaussian kernels
Regarding the RF and MLP approaches, the parameter optimization is executed for each feature set using a grid search. This means that we vary the parameters and calculate a performance value for each parameter set. For the latter, we calculate the average balanced accuracy (see Sec. V-A) in a leave-one-out cross-validation manner. Thereby, we use the data of the five data folds for training and validation (
So far, we constructed different feature sets (cf. Sec. IV-A) and optimized the hyperparameters for the different classification algorithms (cf. Sec. IV-B & Sec. IV-C). Subsequently, we now execute a second training step with a larger amount of data for all algorithms, using the optimized feature sets and hyperparameters. The enlargement of the data set is achieved using all five folds that we previously used in the cross-validation
Maneuver Classifier Evaluation
This section presents the experimental results obtained with the trained classification models (cf. Sec. IV). Sec. V-A introduces the used performance measures, whereas Sec. V-B presents and discusses the results measured with the constructed test data set (cf. Sec. III-A).
A. Performance Measures
To be able to assess the performance of the developed classifiers, several metrics are needed, as we are simultaneously focusing on different objectives. Particularly, we are interested in predicting lane changes not only with high accuracies, but also as early as possible in advance of their execution.
To reflect that, we use the balanced accuracy (\begin{equation*} BACC = \frac {1}{|M|} \cdot \sum _{m \in M} \frac {TP_{m}}{P_{m}} \tag{9}\end{equation*}
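The following Python sketch computes the balanced accuracy of Eq. 9 as the mean per-class recall. The function name and the toy labels are hypothetical and only serve to show why this metric is preferable to the plain accuracy on imbalanced maneuver data.

```python
def balanced_accuracy(y_true, y_pred, classes=("LCL", "FLW", "LCR")):
    """Eq. 9: average over all maneuver classes m of TP_m / P_m."""
    recalls = []
    for m in classes:
        positives = sum(1 for t in y_true if t == m)
        if positives == 0:
            raise ValueError(f"class {m} does not occur in y_true")
        true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == m and p == m)
        recalls.append(true_positives / positives)
    return sum(recalls) / len(classes)

# Toy example: lane changes are rare and the classifier always predicts FLW.
y_true = ["FLW"] * 8 + ["LCL", "LCR"]
y_pred = ["FLW"] * 10

bacc = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy example the plain accuracy is 0.8, whereas the balanced accuracy is only 1/3, which reflects that no lane change was detected.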
Additionally, we use the Receiver Operating Characteristic (ROC) and the Area Under the ROC Curve (AUC), both of which are widely used metrics in this domain (e.g. [33, p. 180 ff]). As opposed to the
Besides, metrics which enable us to analyze the technically possible prediction time horizon are needed. As the point in time being referenced in this context is essential and most sources (e.g. [1], [15] and [16]) are not very exact in this respect, we introduce the two metrics
As opposed to the
ROC curves for the developed maneuver classifiers with their respective best parameter sets and hyperparameters.
B. Results & Discussion
Tab. V shows the results (
Fig. 5 additionally shows the ROC curves for the respective best combination of classifier and feature set regarding
An explanation of this behavior is that situations affected by these features occur even more rarely than lane changes. However, as automated driving is extremely demanding exactly in these situations, additional investigations are needed in these cases (cf. Sec. VIII).
It is noteworthy that the detection times
Histogram of detection times
Altogether, our investigations show that a systematic machine learning workflow, combined with a large amount of data, is able to outperform current state-of-the-art approaches significantly. This becomes obvious when looking at the AUC in comparison to other approaches. Tab. VII shows that our approach outperforms the others, although we are working with a significantly larger prediction horizon, which makes the classification problem more demanding, as mentioned above. Finally, note that the mentioned state-of-the-art approaches were designed and evaluated on considerably smaller data sets.
Our investigations show that the GNB classifier performs significantly worse than the two other approaches (i.e. MLP and RF). Thus, we only use these two classifiers in our further studies. Additionally, we are restricting ourselves to those feature sets and hyperparameter sets showing the best performance (cf. Tab. VIII).
Position Predictor Training
This section deals with the training of the models for position prediction. In particular, we show how to determine the GMM parameters
A. Mixture of Experts Approach
To train the experts for the three maneuver classes, we divide the data set (cf. Sec. III-D) along the maneuver labels (cf. Fig. 7). Subsequently, we perform a random undersampling of the data points for the \begin{equation*} F^{I}_{y} = \{v_{y},~d_{y}^{cl}\} \tag{10}\end{equation*}
Regarding the prediction in longitudinal direction, we need to distinguish whether or not a preceding vehicle is present. If no vehicle is in sensor range, both the relative speed and distance for that vehicle are set to default values. As involving the latter in the training of the models would lead to bad fits, the input feature sets \begin{align*} F_{x, Obj}^{I}=&\{v_{x},~a_{x},~d_{v}^{rel, f},~v_{v}^{rel, f}\} \tag{11}\\ F_{x, \overline {Obj}}^{I}=&\{v_{x},~a_{x}\} \tag{12}\end{align*}
As shown in [13], the prediction performance for the longitudinal direction can be significantly increased by learning the deviation from the constant velocity prediction \begin{equation*} F^{O}_{x} = \{x-\hat {x}_{CV},~t\} \tag{13}\end{equation*}
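The idea behind the regression target in Eq. 13 can be sketched in a few lines of Python. The sketch assumes that the constant velocity prediction is obtained by extrapolating the current longitudinal position with the current longitudinal velocity; the function names and numbers are hypothetical.

```python
def cv_prediction(x0, v_x, t):
    """Constant velocity (CV) prediction of the longitudinal position after t seconds.

    Assumption: x_hat_CV = x0 + v_x * t, with x0 the current position (m)
    and v_x the current longitudinal velocity (m/s).
    """
    return x0 + v_x * t

def cv_residual(x_true, x0, v_x, t):
    """Regression target of Eq. 13: deviation of the true position from the CV prediction."""
    return x_true - cv_prediction(x0, v_x, t)

# Hypothetical sample: 30 m/s now, true position after 5 s is 155 m.
x_hat_cv = cv_prediction(0.0, 30.0, 5.0)
target = cv_residual(155.0, 0.0, 30.0, 5.0)

# At prediction time, a predicted deviation is added back to the CV prediction.
x_reconstructed = x_hat_cv + target
```

Learning the deviation instead of the absolute position keeps the target values small and centered around zero, since the CV model already explains most of the longitudinal motion.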
B. Integrated Approach
As an alternative to the MOE approach, this section presents an integrated approach, which uses the unsplit data set
The actual training of the integrated GMM is performed similarly to the experts training in a variational fashion, with
Position Estimation Evaluation
In order to evaluate the position predictions, first of all, one has to decide which of the considered classifiers fits best as gating network in the Mixture of Experts (MOE) and in the integrated approach respectively. Hence, we calculate the average log-likelihoods
Tab. IX shows the per sample log-likelihood of different approaches for the longitudinal (
Raw probabilities (Raw): This strategy directly uses the raw probabilities issued by the classifiers as gating probabilities. This means that we concatenate the three GMMs and multiply the mixture weights with the probabilities issued by the respective classifier:\begin{equation*} w_{m}^{Raw}(I) = P_{m}^{clf}(I) \end{equation*}
Winner Takes it All (WTA): This strategy uses the outputs of the GMM for the maneuver class with the largest probability according to the respective classifier (cf. Eq. 14).\begin{align*} w_{m}^{WTA}(I) = \begin{cases} 1,& \text {if }P_{m}^{clf}(I)=\max \limits _{\{q \in M\}} P_{q}^{clf}(I)\\ 0,& \text {else} \end{cases} \tag{14}\end{align*}
Prior Weighted Raw probabilities (PW-Raw): This strategy considers that the classifiers were trained on a balanced data set. Thus, it multiplies the raw probabilities with the prior probabilities for each maneuver class:\begin{equation*} w_{m}^{PWRaw}(I) = \mathrm {norm}(P_{m}^{clf}(I) \cdot \pi _{m}) \end{equation*}
Integrated GMM (I-GMM): This strategy directly uses the integrated approach presented in Sec. VI-B to predict the probability distributions and follows Eq. 5.
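The three gating strategies based on classifier outputs can be sketched in Python as follows. The sketch assumes that the norm operator rescales the weights so that they sum to one; all function names as well as the probability and prior values are hypothetical.

```python
def raw_weights(probs):
    """Raw: use the classifier probabilities P_m^clf(I) directly as gating weights."""
    return dict(probs)

def wta_weights(probs):
    """WTA (Eq. 14): weight 1 for the most probable maneuver, 0 for all others.

    Note: in case of exact ties, this sketch picks only the first maximum.
    """
    winner = max(probs, key=probs.get)
    return {m: 1.0 if m == winner else 0.0 for m in probs}

def pw_raw_weights(probs, priors):
    """PW-Raw: multiply the probabilities with the class priors pi_m and renormalize.

    Assumption: 'norm' rescales the products so that they sum to 1.
    """
    unnormalized = {m: probs[m] * priors[m] for m in probs}
    total = sum(unnormalized.values())
    return {m: value / total for m, value in unnormalized.items()}

# Hypothetical classifier output (trained on balanced data) and class priors.
probs = {"LCL": 0.5, "FLW": 0.4, "LCR": 0.1}
priors = {"LCL": 0.05, "FLW": 0.9, "LCR": 0.05}

w_raw = raw_weights(probs)
w_wta = wta_weights(probs)
w_pw = pw_raw_weights(probs, priors)
```

In this example, the raw and WTA strategies favor a lane change to the left, whereas the prior weighting shifts most of the weight to lane following, because lane changes are rare in the unbalanced data.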
To demonstrate the benefits of our approach, which combines maneuver classification and position prediction, we additionally analyze its performance compared to reference strategies. First, we use the labels as a perfect classifier according to Eq. 15:\begin{align*} w_{m}^{Labels} = \begin{cases} 1,& \text {if } m=L\\ 0,& \text {else} \end{cases} \tag{15}\end{align*}
Moreover, we use the pure prior probabilities (
For the longitudinal direction, Tab. IX shows that the reference solution without any previous maneuver classification (NOCLF) is able to produce slightly better results than the other combinations. Although it may seem trivial that lane changes do not have to be taken into account when predicting the longitudinal behavior, this is noteworthy, as our expectation beforehand was that lane changes to the left mostly go along with an acceleration, whereas braking actions are extremely rare.
By contrast, the benefits of the Mixture of Experts (MOE) approach come into effect for the lateral direction. As shown in Tab. IX, the combination of prior weighting and MLP probabilities performs best. Furthermore, all combinations involving the integrated approach perform only slightly worse or even better (RF) than the combinations using prior weighted probabilities. As a benefit, these models are easier to use and are more robust against poor or uncalibrated maneuver probabilities without needing an additional calibration step. This can be explained by the fact that these models perform an implicit probability calibration during the training of the GMM.
Moreover, we learned that the WTA strategy has no practical relevance, as it does not necessarily produce continuous position predictions over consecutive time steps, as the other strategies do by definition. Besides, in case of a misclassification, the WTA strategy solely queries one specific expert model, which might not be applicable in that area of the data space, and this clearly decreases the overall performance.
In the following, we investigate the spatial errors of the best combinations (lateral: MLP classifier with PW-Raw strategy; longitudinal: NOCLF), as previously introduced. For this purpose, we present the applied performance measures in Sec. VII-A and then show the obtained results in Sec. VII-B.
A. Performance Measures
To measure the spatial performance of our predictions, we rely on the unbalanced position evaluation data set \begin{equation*} D^{Po}_{Te} = \begin{bmatrix} I & \textit {TR} \end{bmatrix} \tag{16}\end{equation*}
TR contains \begin{equation*} \textit {TR} = \begin{bmatrix} tr^{0} & tr^{1} & \cdots & tr^{N} \end{bmatrix} \tag{17}\end{equation*}
Each trajectory \begin{align*} tr^{i} = \begin{bmatrix} x^{i}_{0.0} &\quad y^{i}_{0.0} \\ x^{i}_{0.1} &\quad y^{i}_{0.1} \\ \vdots &\quad \vdots \\ x^{i}_{5.0} &\quad y^{i}_{5.0} \\ \end{bmatrix} \tag{18}\end{align*}
The predicted trajectories \begin{align*} e^{i}_{t} = \begin{bmatrix} e^{i}_{x, t} & e^{i}_{y, t} \\ \end{bmatrix} = \begin{bmatrix} |x^{i}_{t} - \hat {x}^{i}_{t}| & |y^{i}_{t} - \hat {y}^{i}_{t}| \\ \end{bmatrix} \tag{19}\end{align*}
Variables \begin{align*} E_{t} = \begin{bmatrix} E_{x, t} & E_{y, t} \\ \end{bmatrix} = \begin{bmatrix} e^{i}_{x, t} & e^{i}_{y, t} \\ \end{bmatrix}_{\forall i} \tag{20}\end{align*}
At this point, we want to re-emphasize that, although this way of evaluating the performance produces easy-to-interpret results, it disregards that our original outputs (i.e. spatial probability densities) contain much more information than a single point estimate.
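The error measures of Eq. 19 and Eq. 20 can be sketched as follows. This is a minimal illustration under stated assumptions, not our evaluation code: each trajectory is assumed to be a list of (x, y) rows as in Eq. 18, the function names are hypothetical, and the toy trajectories contain only two time steps with made-up values.

```python
from statistics import median

def abs_errors(actual, predicted):
    """Eq. 19: absolute errors (e_x, e_y) per time step for one trajectory."""
    return [(abs(x - xh), abs(y - yh))
            for (x, y), (xh, yh) in zip(actual, predicted)]

def median_errors_at(step, actual_trs, predicted_trs):
    """Eq. 20 followed by a median: collect the errors of all trajectories
    at one time step and summarize them per direction."""
    errs = [abs_errors(a, p)[step] for a, p in zip(actual_trs, predicted_trs)]
    e_x = [e[0] for e in errs]
    e_y = [e[1] for e in errs]
    return median(e_x), median(e_y)

# Toy data: three trajectories with two time steps each (made-up values).
actual_trs = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(0.0, 0.0), (12.0, 1.0)],
    [(0.0, 0.0), (8.0, -1.0)],
]
predicted_trs = [
    [(0.0, 0.0), (9.0, 0.5)],
    [(0.0, 0.0), (12.5, 0.75)],
    [(0.0, 0.0), (10.0, -1.0)],
]

med_x, med_y = median_errors_at(1, actual_trs, predicted_trs)
print(f"median longitudinal error: {med_x} m, median lateral error: {med_y} m")
```

With trajectories sampled every 0.1 s up to 5.0 s as in Eq. 18, the row index of a prediction time t would be round(t / 0.1) under this assumed layout.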
B. Results & Discussion
Fig. 11 shows the performance of the selected combinations of classifiers and mixing strategies (highlighted in Tab. IX) at a prediction horizon of 5 s for the longitudinal (
Visualization of the error distribution (left) in longitudinal and lateral direction and the median lateral error as function of the prediction time (right).
As the plots indicate, our position prediction system produces results comparable to those obtained with a perfect maneuver classification, in both the lateral and the longitudinal direction. Additionally, the plots show that we clearly outperform simple models such as CV and reach a median lateral prediction error of less than 0.21 m at a prediction horizon of 5 s. As shown in Tab. X, this compares favorably with other approaches. Note that this compilation does not include studies that report the root-mean-square error (RMSE), which amounts to 0.64 m for our approach. On one hand, we follow [34], which points out that RMSE measures do not allow for a comparison across different data sets, as the values depend on the size of the data set. On the other, the challenge tackled by us (cf. Sec. I-A) is to predict the probability distribution of future vehicle positions rather than single-shot estimates. Consequently, we did not optimize the predictions to minimize
As shown in [3], these results are dominated by the most frequent maneuver class (
As can be seen, the errors for the lane change maneuvers are considerably larger than the ones for lane-following. On one hand, this can be explained by the more complex regression task. On the other, the predictions are subject to higher uncertainties in case of a lane change, as shown by the predicted distributions (cf. Fig. 12). By contrast, this uncertainty is ignored in the single point estimates. Note that the increased uncertainties are caused by the lack of knowledge of the exact point in time at which the maneuver will be completed. This holds true even if the classifier has informed the position prediction about an upcoming lane change.
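Why an overall median hides the larger lane change errors can be illustrated with a small sketch that groups the lateral errors by maneuver class. The class names, the function name, and all error values are assumptions made up for illustration; they are not results of our evaluation.

```python
from collections import defaultdict
from statistics import median

def median_error_by_maneuver(samples):
    """Group (maneuver_label, lateral_abs_error) pairs and take the median per class."""
    groups = defaultdict(list)
    for label, err in samples:
        groups[label].append(err)
    return {label: median(errs) for label, errs in groups.items()}

# Unbalanced toy sample: lane following (FLW) dominates, lane changes are rare.
samples = (
    [("FLW", e) for e in (0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3)]
    + [("LCL", 1.0), ("LCR", 1.2)]
)

overall = median(err for _, err in samples)
per_class = median_error_by_maneuver(samples)

print(f"overall median: {overall} m")
print(f"per-class medians: {per_class}")
```

In this toy sample, the overall median equals the median of the dominant lane-following class, although the two lane change samples show errors that are several times larger.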
Predicted probability distribution of future vehicle positions for an illustrative situation.
Complementary to these quantitative evaluations, we performed qualitative testing and visualized single situations along with our predictions. To illustrate this, we attached a short video and present a single frame in Fig. 12. More precisely, Fig. 12 shows the predictions during an upcoming lane change, along with the described uncertainties. In addition, we show the confidence of our predictions (
Summary and Outlook
This work introduces a machine learning workflow that enables calculations of long-term behavior predictions for surrounding vehicles in highway scenarios. For the first time, a combined compilation of prediction techniques for driving maneuvers and positions as well as lateral and longitudinal behavior is presented. The developed modules are evaluated in detail based on a large amount of real-world data, challenging established state-of-the-art approaches.
To further improve the quality of the presented behavior predictions, especially in complex situations, we are working on various enhancements and conducting additional studies. Currently, we are migrating the prediction strategies to an experimental vehicle to enable detailed investigations regarding run time as well as resource usage. Meanwhile, we are about to apply our models to predict the movements of surrounding vehicles rather than ego-vehicle movements. Besides, we plan to apply our predictor to a publicly available data set such as highD [35] or NGSIM to improve comparability. In addition, we want to investigate up to which maximum prediction horizon (beyond 5 s) the maneuver detection produces useful insights.
Moreover, we see high potential in identifying demanding scenarios and explicitly integrating contextual knowledge (e.g. weather, traffic, time of day or local peculiarities) into our models. First experiments in this direction have shown that contextual properties can have a considerable impact on driving behavior.
ACKNOWLEDGMENT
The authors would like to thank Mercedes-Benz AG Research and Development for providing real-world measurement data, which enabled us to perform our experiments. Furthermore, they would like to thank the Institute of Databases and Information Systems at Ulm University as well as Prof. Dr. Klaus-Dieter Kuhnert from the Institute of Realtime Learning Systems at the University of Siegen for supporting our studies.