Uncertainty estimation of pedestrian future trajectory using Bayesian approximation

Past research on pedestrian trajectory forecasting mainly focused on deterministic predictions which provide only point estimates of future states. These future estimates can help an autonomous vehicle plan its trajectory and avoid collision. However, under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy. Rather, estimating the uncertainty associated with the predicted states with a certain level of confidence can lead to robust path planning. Hence, the authors propose to quantify this uncertainty during forecasting using stochastic approximation which deterministic approaches fail to capture. The current method is simple and applies Bayesian approximation during inference to standard neural network architectures for estimating uncertainty. The authors compared the predictions between the probabilistic neural network (NN) models with the standard deterministic models. The results indicate that the mean predicted path of probabilistic models was closer to the ground truth when compared with the deterministic prediction. Further, the effect of stochastic dropout of weights and long-term prediction on future state uncertainty has been studied. It was found that the probabilistic models produced better performance metrics like average displacement error (ADE) and final displacement error (FDE). Finally, the study has been extended to multiple datasets providing a comprehensive comparison for each model.

Anshul Nayak, Azim Eskandarian, Zachary Doerzaph
arXiv:2205.01887v1 [cs.LG], 4 May 2022
Index Terms: Uncertainty quantification, Bayesian neural network, Monte Carlo dropout, long short-term memory, convolutional neural network

NOMENCLATURE
x, y: x and y position of pedestrian from dataset
u, v: pedestrian velocity along x and y direction
x̂, ŷ: predicted x and y position
X, Y: training data and training label, respectively
x*, y*: test data sample and predicted outcome
θ: weight parameter of the neural network
Σ_xx, Σ_yy: covariance along x and y direction
σ_x, σ_y: standard deviation along x and y direction
µ_x, µ_y: mean along x and y direction
ȳ*: mean predicted path of trajectory distribution
Σ_y*: variance of trajectory distribution

A. Motivation
For a self-driving car, awareness of the surrounding environment is crucial for correct and safe maneuvering [1] [2]. In particular, complex maneuvers require a trustworthy estimate of the future states of vulnerable road users (VRUs) like pedestrians and bicyclists [3]. Continuous progress has been made towards predicting the motion of vulnerable road users with a certain degree of effectiveness [4] [5]. However, most prediction models are deterministic and provide only point estimates of the future states [6]. Such predictions may be adequate for specific scenarios, but in a dynamic environment with multiple interactions, deterministic predictions can be inaccurate. Since humans tend to change direction swiftly, deterministic predictions may fail to capture this randomness in pedestrian trajectories and thus ignore the uncertainty associated with the motion. A more robust approach is to provide a probability distribution over the pedestrian's likely location for each predicted state rather than a single point estimate. The uncertainty associated with the predicted states can enable autonomous vehicles to achieve uncertainty-aware motion planning that is more robust and trustworthy than planning based on deterministic prediction.
For instance, a deterministic prediction outputs point estimates of future states (Figure 1a). However, during long-term forecasts or multiple actor interactions, the deterministic predictions deviate from the ground truth. In such a scenario, the planning algorithm may be uncertain about the future states and the risk of collision will be significantly high. Therefore, a robust and trustworthy planning algorithm requires improved confidence in the predicted states. This can be achieved through probabilistic prediction of states with associated uncertainty (Figure 1b). The risk-aware region around the future state captures all the probable locations where the pedestrian can be present at a future time with a certain confidence. It enables planning algorithms to either avoid the region completely or plan intelligently to execute uncertainty-aware motion planning that is more robust and less prone to collision [7]. Hence, the current study proposes a risk-aware motion prediction model for pedestrians which can be further combined with intelligent planning algorithms. The current model can perform long-term motion prediction with high positional accuracy. The uncertainty in motion is estimated in the form of a distribution of trajectories with mean and variance such that the ground truth mostly lies within the 95% confidence interval of the predictions.

B. Related Work
In recent years, deep neural networks (DNNs) have been used extensively for pedestrian trajectory forecasting. Most of the networks are based on the recurrent neural network (RNN) architecture, which captures the temporal dynamics of sequential data [8]. Although RNNs should in principle retain complete information about a temporal sequence, in practice they fail to propagate long-term dependencies. Hence, long short-term memory (LSTM) networks [9] [10] have been used for sequential prediction due to their improved capability in back-propagating long-term error. Seminal works like scene-LSTM [11] and social-LSTM [12] have used the LSTM architecture to incorporate either scene information or social pooling between multiple pedestrians for enhanced trajectory forecasting. However, most of the prior work on LSTMs focused on improving prediction accuracy and did not emphasize quantifying uncertainty. Recently, convolutional neural networks (CNNs) have been used for trajectory prediction. In particular, a fast CNN-based model compared a 1D convolutional model with LSTM and showed an improvement in the temporal representation of trajectories [13]. Further, Simone et al. [14] elaborated upon the previous work by introducing novel preprocessing and data augmentation techniques to outperform other complex models. However, both models output deterministic estimates of the future state and do not quantify uncertainty. Deterministic predictions may not be trustworthy in complex traffic scenarios. Hence, probabilistic inference of predictions can be useful in safety-critical tasks like collision avoidance [15] and uncertainty-aware motion planning.
Traditionally, the Kalman filter has been used for uncertainty estimation [16], but it fails to capture non-linearities during long-term forecasts. Learning-based methods like Gaussian processes (GPs) [17] and Gaussian mixture models [18] have been used for probabilistic forecasting too. For instance, a GP model with an unscented Kalman filter (UKF) was used for long-term prediction of obstacles for collision avoidance [19]. However, Gaussian process kernels are infinite-dimensional and require extensive parameter tuning for accurate prediction. Further, methods like stochastic reachability analysis have been applied by assigning probabilities to future states in the reachable set [20]. However, reachability analysis [21] often estimates all possible future states, resulting in a large infeasible set. Recently, non-linear architectures like neural networks have been used for probabilistic trajectory forecasting. A CNN-based architecture was used to capture the uncertainty, but for short-term forecasts only [22]. Further, an LSTM architecture could predict long-term probabilistic estimates of future states using an occupancy grid [23]. The problem was formulated as a multi-class classification problem using softmax to quantify the probability distribution of future states over the occupancy grid. However, softmax often leads to overconfident predictions, and the model can be uncertain even with a high softmax output [24]. More sophisticated models like Bayesian neural networks (BNNs) [25] have been used recently to capture the uncertainty in time series.
Typically, a BNN places a prior distribution over its weights and forms a posterior distribution, which is used to quantify uncertainty during prediction. It can capture three types of uncertainty: model uncertainty, termed epistemic uncertainty; inherent noise in the data, also known as aleatoric uncertainty; and model misspecification, which occurs when the testing data is different from the training dataset. In this paper, our main focus is to predict the epistemic uncertainty associated with future trajectories. Although a Bayesian network accurately captures uncertainty, inference becomes challenging due to the large number of model parameters. This often requires the incorporation of variational methods for Bayesian inference that can reduce computational costs. Recently, Monte Carlo (MC) dropout has been used as a variational method for uncertainty estimation in time series forecasting [26] without any significant change to the network architecture.
In the current work, we build upon this Bayesian approximation using MC dropout for pedestrian uncertainty estimation. Our work primarily focuses on comparing the prediction performance of deterministic and probabilistic models. We have quantified and compared the uncertainty during trajectory forecasting using three popular neural network architectures for time-series forecasting, namely LSTM, 1D CNN and CNN-LSTM. Our novelty lies in showing the importance of probabilistic forecasting of future states over deterministic predictions and in providing a detailed performance comparison between each probabilistic model and its deterministic counterpart. To our knowledge, our work is also novel in showing the effect of long-term forecasts as well as stochastic dropout of weights on the uncertainty of the future trajectory. Moreover, this work provides a comprehensive performance comparison of both probabilistic and deterministic models on popular pedestrian datasets.
The remainder of the paper is organized as follows. In Section II, we introduce the methods for probabilistic forecasting with a Bayesian neural network, followed by a description of each neural network architecture (the encoder-decoder model, the convolutional model and the CNN-LSTM model) along with Monte Carlo dropout. In Section III, we describe the data preprocessing, implementation details and performance metrics. In Section IV, we discuss the results of uncertainty quantification and provide a comprehensive study on the effect of future prediction horizon and stochastic dropout on performance metrics. Section V concludes our study.

Deep learning architectures have been used extensively for prediction tasks. However, most networks are deterministic, generating point estimates without any confidence interval (Figure 2a). Conversely, a Bayesian neural network (BNN) produces uncertainty-aware predictions based on a stochastic network. Although stochastic networks might provide better point estimates than standard neural networks, their main aim is to provide trustworthy uncertainty estimates for predictions. Typically, a stochastic neural network introduces stochastic weights or activation functions (Figure 2b) into the model and is trained using the Bayesian approach as:
$$ P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)} \qquad (1) $$
Here, P(H|D) is the posterior, which implies that the hypothesis, H, is statistically updated using inference based on the data, D. It is the main aspect of a stochastic network as it captures the model uncertainty known as epistemic uncertainty. The probability P(D|H) is called the likelihood, which captures the inherent noise in the data, also known as aleatoric uncertainty. Meanwhile, P(H) is the prior while P(D) is the evidence. The weights are sampled from a prior distribution with probability P(H), unlike standard NNs, which have deterministic weights (Figure 2).
As a result, the posterior distribution also becomes stochastic, and the metrics can be estimated by computing the mean of the predicted distribution along with its associated variance. In particular, stochastic networks such as Bayesian neural networks can be used for time-series forecasting, with applications extending to risk-aware prediction of vulnerable road users. A BNN has several advantages over traditional neural networks, such as estimating uncertainty, separating it into epistemic and aleatoric components, and integrating prior knowledge into the model. For instance, given a set of training input states, X = {x_1, x_2, ..., x_T}, and outputs, Y = {y_1, y_2, ..., y_T}, for a pedestrian, the distribution of its future predicted states, y*, for some new input data point, x*, can be computed by marginalizing over the posterior, p(θ|X, Y):
$$ p(y^{*} \mid x^{*}, X, Y) = \int p(y^{*} \mid x^{*}, \theta)\, p(\theta \mid X, Y)\, d\theta \qquad (2) $$
where θ represents the weights and p(θ) is the prior distribution over the weights. During Bayesian prediction, computing the posterior, p(θ|X, Y), can be quite challenging, and many approximate inference methods like Markov chain Monte Carlo (MCMC) and variational inference [27] have been used to approximate it. However, most of these methods are either computationally expensive or introduce many additional parameters into the model, which makes the problem more complex. Further, a BNN incurs additional computational cost due to its non-linearity. Hence, we use the Monte Carlo (MC) dropout method [24], which has been used to accurately approximate a Bayesian neural network without significantly changing the model. The MC dropout method assumes that weights are stochastically dropped during inference. This process is repeated for N forward passes to generate a distribution of predicted values. Therefore, we have applied Monte Carlo dropout to different neural network architectures for uncertainty quantification during predictions.
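The connection between the marginal distribution over the posterior and the N stochastic forward passes can be written as a Monte Carlo estimate. The following is a sketch under the assumption that q(θ) denotes the dropout-induced approximation to the posterior; the symbol q and the sample index n are introduced here for illustration and do not appear in the original notation.

```latex
p(y^{*} \mid x^{*}, X, Y)
  \;\approx\; \int p(y^{*} \mid x^{*}, \theta)\, q(\theta)\, d\theta
  \;\approx\; \frac{1}{N} \sum_{n=1}^{N} p(y^{*} \mid x^{*}, \theta_{n}),
  \qquad \theta_{n} \sim q(\theta)
```

Each forward pass with a fresh dropout mask corresponds to one draw θ_n, so the spread of the N outputs serves as an estimate of the model uncertainty.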

A. Encoder Decoder Model
We developed a simple encoder-decoder LSTM architecture to predict the future trajectory of pedestrians over multiple time horizons. An encoder creates an embedding of essential features as a series of encoded space vectors which the decoder uses for estimating outputs. Suppose {x_t}_T represents the x-position up to T time steps as {x_1, x_2, ..., x_T}; then the encoder embeds the data into an encoded space using a non-linear function as e = g(x). The decoder then uses the encoded features to construct F forward time steps, {x_{T+1}, x_{T+2}, ..., x_{T+F}}. The current architecture has two LSTM cells for the encoder and one for the decoder (Figure 3). Through an ablation study, we found that encoding both the position and velocity of the pedestrian was beneficial for accurate prediction compared with encoding only position information. Hence, the input data for the LSTM layers is a multivariate time series with four features {x, y, u, v} corresponding to the x and y position and velocity, respectively. The output of the neural network has similar features, and the predicted x̂, ŷ give the pedestrian's future position. For regularization, dropout of weights with a certain probability, p, followed by tanh activation has been implemented. The steps are repeated for each LSTM layer to create the encoded vector space which carries the essential features of the training data. The decoder then uses the latent encoded vector to predict the future motion. The decoder architecture has a single LSTM layer that takes the encoded vector space as input. The LSTM layer is then followed by a dropout layer with linear activation to estimate the output.
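As an illustration of the four-feature input {x, y, u, v}, the sketch below derives velocities from positions by finite differences. This is an assumption: the text does not state how u and v were obtained. The 0.4 s step follows from 8 steps spanning 3.2 seconds, and `add_velocity` is a hypothetical helper name.

```python
from typing import List, Tuple

DT = 0.4  # seconds per step, implied by 8 steps = 3.2 s in the text


def add_velocity(
    positions: List[Tuple[float, float]], dt: float = DT
) -> List[Tuple[float, float, float, float]]:
    """Return rows of (x, y, u, v) using backward differences (an assumption)."""
    if len(positions) < 2:
        raise ValueError("need at least two positions to estimate velocity")
    rows = []
    for i, (x, y) in enumerate(positions):
        if i == 0:
            # No earlier sample exists, so reuse the first forward difference.
            nx, ny = positions[1]
            u, v = (nx - x) / dt, (ny - y) / dt
        else:
            px, py = positions[i - 1]
            u, v = (x - px) / dt, (y - py) / dt
        rows.append((x, y, u, v))
    return rows


if __name__ == "__main__":
    for row in add_velocity([(0.0, 0.0), (0.4, 0.0), (0.8, 0.2)]):
        print(row)
```

Each returned row is one time step of the multivariate series that would be fed to the encoder.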

B. Convolutional Model
Recently, convolutional networks like 1D CNNs have been used for time series analysis. The CNN model is a simple sequence-to-sequence architecture that uses convolutional layers to handle temporal representations. In the current 1D CNN model, we represent the past trajectory as a one-dimensional channel with four features, {x, y, u, v}. We pad the input at each convolutional layer so that the number of features in both input and output remains the same. The layer configuration is shown in Figure 4. We use a kernel size of 5, which showed a better root-mean-squared error (RMSE) in an ablation study against the other odd kernel sizes {3, 7}. A single dropout layer with probability p is applied after the first two convolutional layers to prevent overfitting. Further, the ReLU activation function has been applied to all the convolutional layers. We observed that a 1D max-pooling layer followed by upsampling performed better than global average pooling. Finally, for many-to-many predictions, a time-distributed dense layer has been used to predict multiple time steps simultaneously, generating a single trajectory. Further, MC dropout can be applied during inference to generate a distribution of trajectories.
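The effect of padding on sequence length can be illustrated with a minimal sketch. It assumes zero-valued constant padding and a uniform averaging kernel, neither of which is stated in the text; `conv1d_same` is a hypothetical helper and not the authors' layer.

```python
from typing import List


def conv1d_same(
    signal: List[float], kernel: List[float], pad_value: float = 0.0
) -> List[float]:
    """1D cross-correlation with constant padding so the output length equals the input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size so the padding is symmetric")
    half = k // 2
    padded = [pad_value] * half + list(signal) + [pad_value] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


if __name__ == "__main__":
    history = [1.0] * 8          # 8 observed steps of one feature
    kernel = [0.2] * 5           # kernel size 5, as used in the text
    out = conv1d_same(history, kernel)
    print(len(history), len(out))
```

With a kernel size of 5, two padded values are added on each side, so an 8-step input yields an 8-step output; border outputs are attenuated because part of the kernel overlaps the padding.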

C. CNN-LSTM
A CNN-LSTM architecture is a hybrid network that uses both convolution and LSTM layers as an end-to-end model for time-series forecasting. Unlike the LSTM encoder-decoder model, one-dimensional convolutional layers are used for feature extraction to create an embedding rather than LSTM layers. The current model has two one-dimensional convolution layers with filters 128 and 64 respectively. Each convolution layer is followed by a dropout layer where weights are dropped with a certain probability, p to prevent overfitting. In order to process the data into the format required by the LSTM, a Flatten layer is connected after convolution. Further, the LSTM layer acts as decoder and utilises the features for prediction. Similar to previous models, a time-distributed layer at the end enables multi-step forecasting.

D. MC Dropout
Epistemic uncertainty during forecasting problems can be estimated from the mean and variance of the marginal distribution, p(y*|x*, X, Y). Recall that X and Y represent the training input and output samples, while y* represents the predicted states for some new test sample, x*. The marginal distribution is obtained by sampling from the posterior, p(θ|X, Y), for the test input, x*. The sampling process can be challenging and computationally expensive due to the non-linearity in BNNs. Therefore, Monte Carlo dropout can be used as a variational approximation to such Bayesian neural networks [24] with minimal modification to the existing network architecture. Hence, the model uncertainty, known as epistemic uncertainty, can be estimated with ease and with little extra cost beyond the repeated forward passes, unlike other inference methods.
As discussed, Monte Carlo dropout (MC dropout) has been applied to each network architecture during inference to quantify uncertainty (Figures 3 and 4). The MC dropout model applies stochastic dropout to the neural network at test time. Thus, for any new input, x*, we compute the inference with random dropout at each layer of the model. The probability of dropout is set as p and the inference model is run N times to obtain a set of outputs {y*_1, y*_2, ..., y*_N}. We can then estimate the mean, ȳ*, and variance, Σ_y*, of the marginal distribution, where the variance indicates the model uncertainty (Equation 3):
$$ \bar{y}^{*} = \frac{1}{N} \sum_{n=1}^{N} y^{*}_{n}, \qquad \Sigma_{y^{*}} = \frac{1}{N} \sum_{n=1}^{N} \left( y^{*}_{n} - \bar{y}^{*} \right) \left( y^{*}_{n} - \bar{y}^{*} \right)^{\top} \qquad (3) $$
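The inference loop described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_forward` is a hypothetical 1-4-1 network with made-up fixed weights, dropout is applied to hidden activations (as standard dropout layers do), and the variance uses the 1/N normalization.

```python
import random
from statistics import fmean, pvariance
from typing import List, Tuple


def dropout(values: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability p and rescale the survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def toy_forward(x: float, p: float, rng: random.Random) -> float:
    """Hypothetical 1-4-1 network; dropout stays active at test time."""
    w_in = [0.5, -0.3, 0.8, 0.1]    # illustrative weights, not trained values
    w_out = [1.0, 0.5, -0.2, 0.7]
    hidden = [max(0.0, w * x) for w in w_in]  # ReLU activation
    hidden = dropout(hidden, p, rng)
    return sum(w * h for w, h in zip(w_out, hidden))


def mc_dropout_predict(
    x: float, n_passes: int = 30, p: float = 0.2, seed: int = 0
) -> Tuple[float, float]:
    """Run N stochastic forward passes and return (mean, population variance)."""
    rng = random.Random(seed)
    samples = [toy_forward(x, p, rng) for _ in range(n_passes)]
    return fmean(samples), pvariance(samples)


if __name__ == "__main__":
    mean, var = mc_dropout_predict(1.0, n_passes=30, p=0.2)
    print(f"mean={mean:.3f} variance={var:.4f}")
```

Setting p = 0 disables the masks, so every pass returns the same value and the variance collapses to zero, which corresponds to the deterministic prediction.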

III. EXPERIMENTS
In this section, we discuss the datasets, data augmentation, implementation details for each network and the performance metrics. Following common practice in the literature [12], we trained our models on publicly available pedestrian datasets. Two of the most popular datasets are the ETH dataset [28], which contains the ETH and HOTEL scenes, and the UCY dataset [29], which contains the UNIV, ZARA1 and ZARA2 scenes. To allow comparison with past work [13], we used 8 historical steps (3.2 seconds) to predict 12 steps (4.8 seconds) into the future. Further, we extended our study to multiple time horizons, with long-term forecasts up to 8 seconds into the future.

A. Data Augmentation
Initially, we trained our model only on the ETH dataset, which contains approximately 420 pedestrian trajectories under varied crowd settings. However, such a small number of trajectories is insufficient for training. Therefore, we performed data augmentation using Takens' embedding theorem [30]. We used a sliding window with a stride of T = 1 step to generate multiple small trajectories out of a single large trajectory. For instance, a pedestrian trajectory of 29 steps results in 10 small {x, y} trajectory pairs of 20 steps each if 8 steps of past trajectory information are used for predicting 12 steps into the future. In total, we constructed 1597 multivariate time series sequences, which we split into 1260 training and 337 testing sequences for the ETH hotel dataset. Further, in Section IV-D, we also train on the ZARA1 and ZARA2 datasets to provide a comprehensive comparison of uncertainty quantification across all the models.
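The sliding-window augmentation can be sketched as follows. It is a minimal illustration of the 29-step example in the text; `sliding_windows` is a hypothetical helper name, and integer step indices stand in for actual feature rows.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    track: Sequence, n_obs: int = 8, n_pred: int = 12, stride: int = 1
) -> List[Tuple[list, list]]:
    """Split one long track into (observed, future) pairs of n_obs and n_pred steps."""
    length = n_obs + n_pred
    pairs = []
    for start in range(0, len(track) - length + 1, stride):
        window = track[start:start + length]
        pairs.append((list(window[:n_obs]), list(window[n_obs:])))
    return pairs


if __name__ == "__main__":
    track = list(range(29))          # stand-in for a 29-step trajectory
    pairs = sliding_windows(track)
    print(len(pairs))                # number of 20-step windows
    print(pairs[0][0], pairs[0][1])  # first observed / future split
```

A 29-step track admits 29 - 20 + 1 = 10 start positions, which matches the count quoted in the text; tracks shorter than 20 steps produce no pairs.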

B. Implementation details
All the neural networks are trained end-to-end using TensorFlow. The Adam optimizer with a learning rate of 1e-3 was used to minimize the mean-squared error (MSE) loss. Each model was trained for 100 epochs with a batch size of 32. The LSTM encoder-decoder model was trained with tanh activation while the other two models used ReLU activation. 10% of the training data was used for validation. The validation MSE was monitored with callback functions like EarlyStopping and ReduceLROnPlateau. The model was compiled and fit using train and test data.

C. Performance Metrics
The trained model is then used to predict the future position of the pedestrian. By default, the model predicts deterministic future states. However, probabilistic predictions can be inferred using Monte Carlo dropout. We can run the stochastic inference using MC dropout repeatedly to generate a distribution of trajectories (Figure 5). The mean of the distribution represents the predicted path while the associated variance quantifies the uncertainty. We adopt the widely used performance metrics [12], namely average displacement error (ADE) and final displacement error (FDE), for comparing predictions between the deterministic and probabilistic models:
$$ \mathrm{ADE} = \frac{1}{F} \sum_{t=1}^{F} \left\lVert \hat{Y}_{t} - Y_{t} \right\rVert_{2}, \qquad \mathrm{FDE} = \left\lVert \hat{Y}_{F} - Y_{F} \right\rVert_{2} $$
where Ŷ_t is the predicted location at timestamp t, Y_t is the ground truth position, and F is the number of predicted steps.
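For a single trajectory, the two metrics can be computed as in the sketch below. It assumes 2D positions and Euclidean distance per step; `displacement_errors` is a hypothetical helper name, and averaging over many test trajectories would be done on top of it.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def displacement_errors(pred: List[Point], truth: List[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against its ground truth."""
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and of equal length")
    dists = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    ade = sum(dists) / len(dists)  # mean error over all predicted steps
    fde = dists[-1]                # error at the final predicted step
    return ade, fde


if __name__ == "__main__":
    pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(displacement_errors(pred, truth))
```

For the probabilistic models, `pred` would be the mean path of the MC dropout trajectory distribution rather than a single forward pass.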

IV. RESULTS AND DISCUSSION
In this section, we discuss the uncertainty estimation and performance metrics of pedestrian trajectory prediction. Initially, we quantify the uncertainty in prediction using the probabilistic models based on the ETH dataset. We also provide a confidence estimation of our predictions with respect to the ground truth. Later, we show the effect of varying the forecast horizon and dropout on performance metrics for all the deterministic and probabilistic models. Finally, we provide a comprehensive comparison of performance metrics among all the models across multiple datasets.

A. Uncertainty Estimation
For predicting uncertainty and evaluating its trustworthiness, we generated a distribution of trajectories using MC dropout. We show the uncertainty estimation of a single pedestrian trajectory from the ETH dataset using the 1D CNN network (Figure 5). For the current scenario, we predict 12 steps or 4.8 seconds into the future based on 3.2 seconds of past trajectory data. A single trajectory (red) is generated by deterministic prediction and provides point estimates of future states. In contrast, we generate probabilistic predictions by applying MC dropout to each neural network model. For instance, the 1D CNN with MC dropout was sampled N = 30 times during inference with a stochastic dropout of p = 0.2 (20% of the weights are randomly dropped) to generate a distribution of N different trajectories. The mean and variance of the distribution quantify the uncertainty during trajectory prediction. The predicted trajectory distribution (Figure 5) shows the pedestrian's motion along both the x and y directions, with predominant motion along x. Therefore, we need to quantify the mean and variance along both directions and treat the predicted trajectory cluster as a bivariate distribution. At each prediction step, the trajectories can be represented as a cluster of N points distributed on the x-y domain. Assuming a Gaussian distribution for each point cluster, we can then estimate the mean and covariance showing the associated uncertainty at each step (Figure 6). The Gaussian representation of the predicted states thus shows the future states with their associated uncertainty. Further, we can compute the mean and covariance of this bivariate Gaussian distribution as:
$$ \mu = \begin{bmatrix} \mu_{x} \\ \mu_{y} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{xy} & \Sigma_{yy} \end{bmatrix} $$
Here, {µ_x, µ_y} and {Σ_xx, Σ_xy, Σ_yy} represent the mean and covariance of the pedestrian's movement along the x-y domain, respectively. The diagonal terms Σ_xx and Σ_yy give the variance along each direction, while Σ_xy captures the correlation between motion along x and y.
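The per-step mean and covariance of a cluster of N predicted points can be estimated as in the following sketch. It assumes the 1/N (population) normalization, which the text does not specify, and `bivariate_stats` is a hypothetical helper name.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cov = Tuple[Tuple[float, float], Tuple[float, float]]


def bivariate_stats(points: List[Point]) -> Tuple[Point, Cov]:
    """Return ((mu_x, mu_y), ((S_xx, S_xy), (S_xy, S_yy))) for one prediction step."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one sampled point")
    mu_x = sum(x for x, _ in points) / n
    mu_y = sum(y for _, y in points) / n
    s_xx = sum((x - mu_x) ** 2 for x, _ in points) / n
    s_yy = sum((y - mu_y) ** 2 for _, y in points) / n
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in points) / n
    return (mu_x, mu_y), ((s_xx, s_xy), (s_xy, s_yy))


if __name__ == "__main__":
    # Four hypothetical samples of the pedestrian position at one future step.
    cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    mean, cov = bivariate_stats(cluster)
    print(mean, cov)
```

Applying this at every predicted step gives one Gaussian per step; taking the square root of the diagonal terms yields the standard deviations along x and y.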
Our results indicate that the covariance Σ_xx is significantly higher than the covariance along the other directions (Figure 7). A similar observation was made across all the trajectories of the ETH dataset. This shows that the uncertainty in motion is largest along x, followed by y and xy. As the covariance Σ_xy is negligible, only the standard deviations, σ = √Σ, along x and y were considered to quantify the uncertainty during trajectory prediction. We compared the uncertainty estimated by each probabilistic model with the ground truth (Figure 8). We show the mean predicted path with two standard deviations (2σ) along the x and y directions to quantify uncertainty. The model takes 8 input states (•, green dot) to predict 12 states into the future. One can visualise the mean of the predicted distribution (•, blue dot) along with the standard deviations (+) along x and y for a single trajectory chosen from the ETH dataset (Figure 8). On inspection, the uncertainty predicted by the CNN-LSTM model appears to grow with the prediction horizon, while both the LSTM and 1D-CNN models provide conservative probabilistic estimates that neither decrease nor increase with time. Further, it appears that the final displacement error (FDE) is minimum for 1D CNN while it is maximum for LSTM. However, to obtain conclusive evidence on the predictive accuracy of each model, we need to consider the performance metrics over all trajectories as well as estimate whether the ground truth lies within the 95% (2σ) confidence interval of our predictions.

Confidence Interval
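We define a parameter known as the confidence score (CS) to check whether the ground truth {x_true, y_true} lies within two standard deviations (2σ) of our predicted distribution:
$$ \mathrm{CS}_{x} = \frac{1}{F} \sum_{t=1}^{F} \mathbb{1}\!\left[\, \lvert x_{\mathrm{true},t} - \mu_{x,t} \rvert \le 2\sigma_{x,t} \,\right], \qquad \mathrm{CS}_{y} = \frac{1}{F} \sum_{t=1}^{F} \mathbb{1}\!\left[\, \lvert y_{\mathrm{true},t} - \mu_{y,t} \rvert \le 2\sigma_{y,t} \,\right] $$
Here, F represents the number of predicted states into the future, and the indicator equals 1 when the condition holds and 0 otherwise. We compute the confidence score along x and y for each test trajectory and then take the mean across all trajectories to obtain a single confidence score for that prediction horizon. Further, we also show the variation of the confidence score with the prediction time horizon, T_F = 3.2, 4.8, 6.4 and 8 seconds into the future.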
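The confidence score along one axis can be sketched as the fraction of predicted steps whose ground truth falls inside the 2σ band. This reading is an assumption based on the description in the text; `confidence_score` is a hypothetical helper name and the numbers in the example are made up.

```python
from typing import Sequence


def confidence_score(
    truth: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    k: float = 2.0,
) -> float:
    """Fraction of steps where |truth - mean| <= k * std along a single axis."""
    if not truth or not (len(truth) == len(mean) == len(std)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, m, s in zip(truth, mean, std) if abs(t - m) <= k * s)
    return hits / len(truth)


if __name__ == "__main__":
    x_true = [1.0, 2.0, 3.0, 4.0]
    mu_x = [1.1, 2.0, 3.5, 4.0]
    sigma_x = [0.1, 0.1, 0.1, 0.5]
    print(confidence_score(x_true, mu_x, sigma_x))
```

The same function applies to y; averaging the per-trajectory scores over the test set gives one value per model and prediction horizon.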
The LSTM model with MC dropout (gray) provides a higher mean confidence score than the other models for forecasts up to 4.8 seconds. For instance, at T_F = 3.2 seconds, the LSTM model has a mean CS_x ≈ 90%, which signifies the percentage of ground truth that lies within the 2σ confidence interval of the predictions (Figure 9a). However, for long-term forecasts beyond 4.8 seconds, the CNN-LSTM model (light maroon) outperforms the other two models with a mean CS_x ≈ 80%. A similar trend can be observed for CS_y for both short-term and long-term forecasts (Figure 9b). However, the confidence score along y is lower than along x for each model. This shows that the model is more confident and captures uncertainty effectively along the direction of predominant motion, making it suitable for long-term linear motions. However, when the motion along a direction is small, the model is less confident and the uncertainty is not captured accurately. Among all models, the 1D-CNN probabilistic model (blue) has the lowest confidence score along both x and y across all time horizons. Our results thus indicate that the LSTM model (gray) performs better for short-term probabilistic predictions while the CNN-LSTM model (light maroon) has better long-term probabilistic accuracy. Further, the confidence score along y gradually decreased with the prediction horizon for each model. This shows that the positional accuracy along y for the predicted states becomes more uncertain as the prediction horizon increases. To understand how the prediction horizon affects uncertainty estimates, we studied its effect on performance metrics like ADE and FDE. Further, we compared each probabilistic model against its deterministic prediction. We also studied the effect of the stochastic dropout probability, p, on performance metrics.

B. Dropout
Stochastic dropout during inference is critical for inducing uncertainty into the model. The network weights are randomly dropped with a certain probability, p, at every inference, generating a new trajectory. The whole inference process was repeated multiple times, generating a distribution of trajectories (Figure 5). To implement probabilistic inference, we included dropout layers in each neural network architecture. Meanwhile, the deterministic predictions are obtained using the same neural network architecture with dropout layers active only during training and not during inference. For the experiments, each probabilistic model was studied with dropout probabilities p = {0.2, 0.3, 0.4, 0.5}. Our main aim is to quantify uncertainty, understand the effect of the dropout probability, p, on performance metrics for each probabilistic model, and compare that to the deterministic prediction. In Figure 10, we compare the uncertainty estimation for dropout probabilities p = 0.2 and 0.4 using the probabilistic CNN-LSTM model for a particular trajectory. The results indicate that with an increase in dropout probability, the uncertainty in the pedestrian state along x increased significantly. Since more weights are randomly dropped, the variance associated with the prediction also increases and thus the model becomes less certain during prediction. However, no significant difference in uncertainty was observed along y. Further, the mean predicted path is farther away from the ground truth at p = 0.4 than at p = 0.2 for the current trajectory. A more detailed analysis of the variation of performance metrics with stochastic dropout is provided below.
In Figure 11, the minimum average displacement error, ADE = 0.543 (lower is better), was obtained at p = 0.2. Further, the ADE increased with dropout probability up to p = 0.4. This is expected, as a higher dropout probability implies that more weights are randomly dropped from the architecture, so the variance in trajectories increases. This increase in randomness across the predicted distribution yields a mean predicted path with a larger ADE relative to the ground truth. Yet, a further increase in dropout probability to p = 0.5 resulted in a smaller ADE. Meanwhile, no significant variation was observed in the final displacement error with changes in dropout probability for the LSTM model.
However, the 1D-CNN with dropout has an ADE = 0.819, an improvement of 13.8% over the vanilla 1D-CNN model with ADE = 0.95 (Figure 12). The performance improvement was even higher for the probabilistic CNN-LSTM model, with at least a 20% improvement in ADE over the vanilla CNN-LSTM model (Figure 13) across all dropout values. A similar trend was seen for the final displacement error: across all models, the FDE increased with dropout probability up to p = 0.4 and then decreased. Overall, the probabilistic 1D-CNN and CNN-LSTM models showed significant improvement in both ADE and FDE over deterministic predictions across all dropout probability values. Further, the ADE and FDE increased up to p = 0.4 across all models, which shows that dropout induces uncertainty into the model, as higher dropout rates should lead to more variance in the trajectory distribution. However, a further increase in dropout to p = 0.5 resulted in smaller ADE and FDE (lower is better), which seems counter-intuitive.
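For reference, the two metrics and the percentage improvement quoted above can be computed as in the following minimal sketch. It uses the standard definitions of ADE (mean Euclidean distance over all predicted steps) and FDE (distance at the final step). The helper names and the three-step trajectories are hypothetical; only the ADE values 0.95 and 0.819 come from the text.

```python
import math


def displacement_errors(pred, truth):
    """Per-step Euclidean distance between predicted and true (x, y) points."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def ade(pred, truth):
    """Average displacement error: mean distance over all predicted steps."""
    errors = displacement_errors(pred, truth)
    return sum(errors) / len(errors)


def fde(pred, truth):
    """Final displacement error: distance at the last predicted step."""
    return math.dist(pred[-1], truth[-1])


def relative_improvement(baseline, value):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - value) / baseline


# Hypothetical 3-step trajectories, for illustration only.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.6)]
print(ade(pred, truth), fde(pred, truth))

# ADE values quoted in the text: 0.95 (vanilla) versus 0.819 (with dropout).
print(round(relative_improvement(0.95, 0.819), 1))  # 13.8
```

For a probabilistic model, `pred` would be the mean of the sampled trajectories, which is how the text compares probabilistic and deterministic predictions.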

C. Time Horizon
We compare the probabilistic and deterministic models for multiple time horizons into the future. Based on a past trajectory of T = 3.2 seconds, we predicted the future states along with their associated uncertainty at T_f = 3.2, 4.8, 6.4 and 8 seconds. A constant dropout probability, p = 0.2, was used for all experiments as it gave the minimum average displacement error and final displacement error (Section IV-B). Our results indicate that the minimum ADE and FDE occur for the smallest time horizon, T_f = 3.2 seconds. Further, both ADE and FDE increased with prediction horizon across all models. This shows that, irrespective of the model, prediction error increases with the prediction horizon.
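One way to tabulate error against horizon is sketched below. It assumes a sampling interval of 0.4 seconds, which is common for the ETH/UCY benchmarks but is not stated in this section, and it evaluates a single long prediction truncated at each horizon, whereas the experiments may instead have used a separate model per horizon. The helper names and the drifting prediction are hypothetical.

```python
import math

DT = 0.4  # assumed sampling interval in seconds (not stated in this section)


def horizon_steps(horizon_s, dt=DT):
    """Number of predicted time steps covering a horizon given in seconds."""
    return round(horizon_s / dt)


def errors_by_horizon(pred, truth, horizons_s, dt=DT):
    """ADE and FDE over the first n steps for each horizon."""
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    table = {}
    for h in horizons_s:
        n = horizon_steps(h, dt)
        table[h] = (sum(dists[:n]) / n, dists[n - 1])
    return table


# Hypothetical case: the pedestrian walks straight along x, while the
# prediction drifts sideways by a further 0.02 at every step.
steps = horizon_steps(8.0)
truth = [(0.5 * k, 0.0) for k in range(1, steps + 1)]
pred = [(0.5 * k, 0.02 * k) for k in range(1, steps + 1)]

for h, (ade, fde) in errors_by_horizon(pred, truth, [3.2, 4.8, 6.4, 8.0]).items():
    print(f"T_f={h}s ({horizon_steps(h)} steps): ADE={ade:.3f}, FDE={fde:.3f}")
```

Under the assumed interval, the four horizons correspond to 8, 12, 16 and 20 predicted steps. Any error that accumulates per step, as in this toy drift, makes both metrics grow with horizon.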
In Figure 14, we compare the uncertainty in predicted future states at T_f = 3.2 and 4.8 seconds. Both plots indicate that the uncertainty grows with prediction horizon. In particular, at T_f = 4.8 seconds the variance along both x and y grows significantly with each predicted step compared with the predicted trajectory at T_f = 3.2 seconds. Further, we compared the performance metrics of the mean of the predicted trajectories from each probabilistic model with those of its deterministic forecast. For both probabilistic and deterministic models, ADE and FDE increased with prediction horizon.
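Given a set of trajectories sampled by repeated stochastic inference, the per-step uncertainty along x and y can be summarised as in the sketch below. The function `sample_trajectory` is a hypothetical stand-in for one stochastic forward pass of a model (a random walk around a straight path), chosen only so that the example runs on its own. The helper `per_step_stats` and all numeric values are likewise illustrative assumptions.

```python
import random
import statistics


def per_step_stats(samples):
    """Mean and standard deviation of x and y at every predicted step.

    samples: list of trajectories, each a list of (x, y) points.
    """
    stats = []
    for points in zip(*samples):           # all samples at one time step
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        stats.append({
            "mean": (statistics.fmean(xs), statistics.fmean(ys)),
            "std": (statistics.pstdev(xs), statistics.pstdev(ys)),
        })
    return stats


def sample_trajectory(n_steps, rng, step_noise=0.05):
    """Hypothetical stand-in for one stochastic forward pass: a random walk
    around a straight path, so that noise accumulates step by step."""
    x = y = 0.0
    path = []
    for _ in range(n_steps):
        x += 0.5 + rng.gauss(0.0, step_noise)
        y += rng.gauss(0.0, step_noise)
        path.append((x, y))
    return path


rng = random.Random(0)
samples = [sample_trajectory(12, rng) for _ in range(500)]
stats = per_step_stats(samples)
for k in (0, 7, 11):
    sx, sy = stats[k]["std"]
    print(f"step {k + 1}: std_x={sx:.3f}, std_y={sy:.3f}")
```

The per-step means form the mean predicted path used for ADE and FDE, while the per-step standard deviations give the width of the uncertainty band at each step.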
In Figure 15, the LSTM with MC dropout shows no improvement in performance metrics over the vanilla LSTM. However, both the 1D-CNN and CNN-LSTM with MC dropout produce mean predictions that have lower ADE and FDE when [...] in uncertainty along lateral motion, i.e. y. Further, both dropout and prediction horizon play an important role in the performance metrics. The mean predicted path from the probabilistic models gave a better estimate of the predicted trajectory and is more trustworthy than a deterministic prediction.

Fig. 17. Performance comparison between CNN-LSTM and CNN-LSTM with MC dropout with varying prediction horizon.

D. Quantitative Evaluation
In Table I, we compare the performance metrics, ADE/FDE, of our predictions for all the models. There are three probabilistic models trained with MC dropout and three standard neural network architectures that generate deterministic predictions. All the probabilistic models are inferred with a dropout probability p = 0.2. We observe that the novel CNN-LSTM model with MC dropout outperforms every other model on the ETH and ZARA1 datasets. Its mean predicted path has the minimum ADE/FDE (0.48/0.82) in the ETH scene. We speculate that the CNN captures features more efficiently than a standard LSTM encoder, while the LSTM decoder uses this feature information efficiently to generate predictions without any contextual cues (social pooling or scene information). However, the LSTM still performs better in the ZARA2 scene. Unlike for the other probabilistic models, we did not find any significant effect of MC dropout on the LSTM predictions relative to the standard deterministic model.

V. CONCLUSIONS AND FUTURE RESEARCH
In this paper, we presented a Bayesian approach using Monte Carlo dropout to quantify the uncertainty in pedestrian trajectory prediction. The method was evaluated on real-world pedestrian datasets to generate a distribution of trajectories instead of a single trajectory. The current results indicate that the mean predicted path of the probabilistic models is closer to the ground truth than the predictions from deterministic models. Further, both ADE and FDE increased with the prediction horizon and, up to p = 0.4, with the dropout probability. This implies that the probabilistic models become less certain in their predictions as either the prediction horizon or the dropout probability increases. Nevertheless, the performance metrics of the probabilistic models were better than those of the deterministic models.
In future work, we plan to improve the probabilistic predictions so that the ground truth always lies within the predicted trajectory distribution. We will explore other Bayesian methods for uncertainty quantification or modify the current neural network architecture [31] for accurate long-term prediction.
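The stated goal that the ground truth should lie within the predicted distribution can be made measurable in several ways. The sketch below shows one simple possibility, assuming an axis-aligned band of mean +/- k standard deviations per step with k = 2. The function name `coverage`, the choice of band and the sample data are illustrative assumptions, not a criterion taken from the paper.

```python
import statistics


def coverage(samples, truth, k=2.0):
    """Fraction of ground-truth steps inside mean +/- k*std on both axes."""
    inside = 0
    for points, (tx, ty) in zip(zip(*samples), truth):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
        if abs(tx - mx) <= k * sx and abs(ty - my) <= k * sy:
            inside += 1
    return inside / len(truth)


# Hypothetical data: three sampled trajectories of two steps each.
samples = [
    [(0.9, 0.0), (1.8, -0.1)],
    [(1.0, 0.1), (2.0, 0.0)],
    [(1.1, -0.1), (2.2, 0.1)],
]
truth_inside = [(1.0, 0.0), (2.0, 0.0)]
truth_partly_outside = [(1.0, 0.0), (3.0, 0.0)]

print(coverage(samples, truth_inside))          # 1.0
print(coverage(samples, truth_partly_outside))  # 0.5
```

A coverage of 1.0 over a test set would correspond to the ground truth always falling inside the chosen band.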