A Comparative Evaluation of Probabilistic and Deep Learning Approaches for Vehicular Trajectory Prediction

This work compares two methodologies for predicting the future locations of moving vehicles when their current and previous locations are known. The two methodologies are based on: (a) a Bayesian network model used to infer the statistics of prior vehicle trajectory data, which are then adopted in the estimation process; (b) a deep learning approach based on recurrent neural networks (RNNs). We present experimental results obtained with both prediction methodologies. The results indicate that the prediction accuracy of both methods improves as more information about prior vehicle mobility becomes available. The Bayesian network-based method is advantageous because the statistical inference can be updated in real time as more trajectory data becomes known. In contrast, the RNN-based method requires a time-consuming learning task every time new data is added to the inference dataset, although it achieves a higher prediction accuracy (3% to 5% higher). Additionally, we show that the computational cost of predicting the next position a vehicle will move to can be substantially reduced when the Bayesian network is adopted, a scenario where the RNN method requires more computational time. However, when the quantity of prior data used in the prediction increases, the computational time required by the RNN-based method can be up to two orders of magnitude lower, showing that the RNN method is then advantageous in both accuracy and computational time. Both methods achieve a next-position successful prediction rate higher than 90%, confirming the applicability and validity of the proposed methods.


I. INTRODUCTION
The massive adoption of mobile devices is supporting a plethora of location-based services and has generated a very high volume of spatio-temporal data, motivating applications ranging from data traffic offloading [1] and data delivery [2] to urban planning [3], transportation optimization [4], the recommendation of points of interest [5], and analytical facts about the distribution of the population in a specific area [6]. Several works have studied human mobility [7] and the mobility of vehicular networks [8], [9]. Mobility is useful in several applications, such as the prediction of public transportation passenger flow [10], the improvement of vehicular safety [11], and the estimation of the spread of contagious diseases such as COVID-19 [12], [13]. Most geolocated datasets available to the community are based on GPS traces sampled over time [14].
The prediction of vehicular trajectories is usually based on prior vehicular data, particularly the latest visited positions, as considered in [1], [15]-[19]. The prediction is useful for several purposes, ranging from the computation of the estimated travel time of a given route and the support for route planning to the computation of the vehicular staying time at particular locations [20], [21]. The use of prior vehicular data was also considered in [16] to forecast the most likely potential passenger for taxi drivers.
In the literature we find several probabilistic methods to represent and predict vehicular mobility [4], [5], [22]-[25]. Regarding the forecasting of mobility, the models can be divided into two main categories: short-term and long-term trajectory estimation models. Short-term forecasting models are usually based on prior information [26], consisting of one or two previously visited locations plus the current location [27]. Long-term trajectory models predict the trajectory positions over a longer time horizon [28]. The majority of the works published so far focus on short-term forecasting of vehicular trajectories in urban areas, where the vehicular motion is constrained by traffic lights, complex road segments, pedestrians, intersections, and traffic density. Due to the random nature of the vehicles' location over time, it is more difficult for long-term prediction models to achieve high prediction accuracy for urban vehicular mobility [29]. More regular motion patterns lead to more accurate predictions [30], as is the case of deterministic journeys such as the ones traveled by buses. However, in some mobility scenarios regular motion cannot be assumed, as is the case for taxi trajectories, because passenger pick-ups, drop-offs, and journeys exhibit a higher level of randomness. In these cases, the forecasting of future locations is more challenging due to the higher level of uncertainty.

A. MOTIVATION
This work is motivated by the lack of comparative performance results for different vehicular trajectory prediction techniques. More specifically, we aim at answering the following research questions:
• How do Bayesian-based and neural network-based prediction schemes behave in terms of prediction performance and computational cost?
• What is the influence of the quantity of prior data used in the trajectory prediction?
• How different are the prediction accuracy and the computational performance for short-term and long-term estimates?
We start with a detailed description of the two prediction methodologies compared throughout the paper, based on a Bayesian network model and an LSTM neural network. In a first step, we compare the performance of the two proposed methods for short-term predictions, considering the forecast of the next location. Then, we compare both methods for long-term predictions by analyzing the prediction performance for the next five future locations, assuming periodic sampling. An additional goal is to decrease the computation time by reducing the size of the observation state space through a preprocessing algorithm that divides the travel region into multiple cells. The cells are used to define the vehicle's trajectory, represented by a set of sub-trajectories (sequences) composed of the consecutively traveled cells. Regarding the Bayesian-based method, we adopt a hidden Markov inference model to capture the statistical properties of the taxis' mobility. The prediction approach is based on an improved version of the Viterbi algorithm that computes the most likely sequence of future locations given a sequence of prior locations. Regarding the neural network-based approach, we propose an LSTM neural network where the initial locations of the sub-trajectories are used in the learning process to estimate the latest positions of the sequence.

B. NOVELTY AND CONTRIBUTIONS
The main novelty of this work is the comparison of two techniques for vehicle mobility prediction, allowing us to answer the research questions that motivated the paper. The comparative analysis is carried out for short-term and long-term predictions and characterizes the influence of the number of prior locations on the prediction performance (longer/shorter sequences). The contributions of this work are as follows:
• The trajectories represented by sampled GPS coordinates are converted into geographic cells so that the spatial data can be downsampled for increased performance. For a fairer comparison, the same downsampled data is adopted in the performance evaluation of both techniques;
• The first technique relies on an innovative Bayesian network represented by a hidden Markov model (HMM), where each hidden state represents a single sequence of locations, thus embodying the Markov relation between prior visited locations to capture the sequential nature of each trajectory. The prediction relies on an improved version of the Viterbi algorithm, OPTVIT, which achieves a lower computational time while maintaining the optimal prediction performance;
• The second technique compared in this work is based on recurrent neural networks, more precisely an LSTM network, to attenuate the vanishing gradient problem that can have a significant impact on long-term prediction of sequential data;
• The prediction techniques are assessed using a dataset of real traces, comparing the OPTVIT algorithm and the LSTM approach for a variable quantity of prior data used in the prediction, as well as for short-term and long-term predictions. The computation times and prediction performance are reported based on the experimental results;
• A final contribution concerns the experimental results. They show that: (i) a higher quantity of prior data always improves the prediction accuracy of both techniques; (ii) the Bayesian approach can achieve a lower computational cost for short-term prediction with a similar prediction performance; (iii) for the long-term horizon the prediction accuracy decreases approximately linearly with time; (iv) for long-term predictions, the proposed neural network is effectively the better solution, as it jointly achieves higher prediction accuracy and lower computational time.

C. PAPER STRUCTURE
Regarding the paper's organization, Section II presents an overview of selected works in the field. Section III introduces the definitions adopted in the prediction techniques, states the problem to solve, and describes the adopted spatio-temporal model and its analytics. Section IV presents the Bayesian-based technique to model and predict the locations of the trajectory. Section V presents the deep learning approach and the details of the LSTM neural network. Section VI compares the performance achieved by the different techniques, and Section VII concludes the paper.

II. RELATED WORK
Several works have addressed the problem of mobility prediction, and particularly vehicular mobility estimation [31]. Multiple methodologies have been proposed so far, including but not limited to fundamental concepts of information theory [20], [32] and traditional Markovian predictors [15]. The categorization of existing mobility prediction approaches was addressed in [33]. The majority of the works proposing mobility prediction schemes can be divided into Bayesian network-based methods, neural network-based methods, and Markov-based methods.
The Markov-based methods rely on the Markov property, which states that the probability of traveling to a future position depends only on the current one. The majority of vehicular prediction schemes are based on Markov-based methods [2], [4], [5], [15], [22]-[24]. The work in [4] adopts the Markov property in a hidden Markov model to characterize the prediction performance of a fixed number of trajectory segments. More recently, the problem of sparse trajectory data was addressed in [24], which proposed a prediction scheme that makes use of group mobility statistics to increase the prediction accuracy. The sequences of the different trajectory locations are used in [24] to run spatial clustering algorithms capable of classifying them into different groups, which are employed in a variable-order Markov model that estimates the trajectory. However, the use of group trajectory data can lead to poor results because it effectively relies on crowd behavior only. Crowd mobility was also used in [23] to improve individual trajectory estimation by dividing the spatial region into a set of points of interest. Other works have considered Markov-based models with data enrichment, as in [5], where the travelers' living habits are taken into account in the prediction process.
In contrast to the Markov-based methods, the Bayesian network-based methods rely not only on the current position but also on the sequence of positions that preceded it. Bayesian inference is employed to characterize the likelihood of a given vehicle trajectory given one or more prior observed locations, thus using the statistics of the historical trajectory data. The adoption of a Bayesian inference model for location prediction is described in [34], which takes multiple predictive factors into account to enhance the prediction performance, such as road topology and motion information. The work in [35] identifies patterns containing frequently visited regions to build a Bayesian network that estimates future locations. Differently, [36] proposed a dynamic Bayesian scheme that represents the consecutive occurrence of observable random variables to estimate the user's future location. Although Bayesian network-based methods are easily implemented, the sparsity of trajectories represents an additional challenge that usually results in high computational costs. Consequently, methods based on Bayesian inference are usually combined with other techniques for enhanced performance [27].
Deep learning has also been used to predict vehicular trajectories. Although neural networks need a long time to learn from data, the resulting inference model benefits from low computational complexity and runs in deterministic time. Neural networks are gaining popularity in mobility prediction when mobility data is available in a centralized way [17]. Neural networks were adopted in [37] to predict vehicular trajectories using a multi-layer perceptron. The work in [38] tackled the estimation of taxi trajectory destinations based on convolutional neural networks. Innovative neural network models have been developed in recent years and have already been used for mobility prediction, including long short-term memory (LSTM) recurrent networks and generative adversarial networks (GANs). LSTMs were proposed in [39] to predict vehicular trajectories on highways. GANs were used in [19] to predict vehicular trajectory position and speed in urban scenarios. The combination of different neural network models in a single architecture has also been used to address mobility prediction, as in [16], where an LSTM model is combined with a convolutional one to forecast the most likely potential passenger for taxi drivers.

III. PROBLEM DEFINITION AND DATA MODEL

A. PROBLEM DEFINITION
The work considers multiple vehicles moving in a spatial region delimited by a grid map. Fig. 1 illustrates a grid map with 16 cells. The grid map is divided into two-dimensional geographical sub-regions designated as cells, denoted by c_η. The location of a vehicle is sampled periodically and linked to a cell. Spatio-temporal trajectories are generated as the vehicles move from the starting point to the endpoint of a journey. Since each journey has a variable duration, the trajectories are represented by fixed-length sequences of cells to guarantee a coherent granular temporal basis. The notation and a few definitions adopted in this work are introduced next to provide practical insights into the proposed approach. Within a trajectory, c_η^k denotes its k-th cell. Finally, we highlight that a trajectory is formed by a non-constant number j of cells, with j > 1.
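The periodic mapping of a GPS fix to a grid cell can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bounding-box coordinates, the 4×4 layout, and the row-major cell numbering are all assumptions of this sketch.

```python
# Sketch: map a GPS fix to a grid-map cell index (1..16), assuming a 4x4
# grid over a hypothetical bounding box; numbering scheme is an assumption.
def gps_to_cell(lat, lon, lat_min=41.10, lat_max=41.20,
                lon_min=-8.70, lon_max=-8.55, rows=4, cols=4):
    """Return the cell index eta in {1, ..., rows*cols} for a coordinate."""
    # Clamp to the grid so samples on the border fall in the outermost cells.
    r = min(int((lat - lat_min) / (lat_max - lat_min) * rows), rows - 1)
    c = min(int((lon - lon_min) / (lon_max - lon_min) * cols), cols - 1)
    return r * cols + c + 1

# A fix near the lower-left corner maps to cell 1,
# one near the upper-right corner to cell 16.
print(gps_to_cell(41.101, -8.699))  # -> 1
print(gps_to_cell(41.199, -8.551))  # -> 16
```

Sampling this function every 15 seconds along a journey yields the cell lists from which the trajectories and sequences below are built.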
While a trajectory represents the total set of locations traveled by a vehicle, in this work we introduce a subset of trajectory locations denoted as a sequence. The sequences are the subsets adopted in the prediction algorithms, obtained by partitioning each trajectory into multiple sequences. The resulting set of sequences contains a significant number of identical sequences; we additionally keep track of the number of unique sequences it contains.
Definition 5: The prediction problem uses the knowledge of the cells observed so far (all but the last β cells of a given sequence S_κ) to predict the remaining β cells of the sequence.
The symbols previously defined and adopted in this work are represented in Table I.

B. MOBILITY DATASET PREPROCESSING
This subsection describes the method used to convert the raw GPS data representing a trajectory into sequences. The sequences are characterized in Section III-C. The offline preprocessing of raw trajectory data plays an essential role in prediction performance. The data preprocessing algorithm that transforms the data is introduced next; its primary purpose is to define the set of sequences.
We consider that the trajectories are no longer defined by a set of GPS coordinates but by a set of cells, simplifying the process of describing trajectories. Since each cell contains several GPS locations, this assumption can be seen as a downsampling of the spatial positions. Each pair of GPS coordinates is mapped into a cell c_η of the map, with 1 ≤ η ≤ 16 for the cells depicted in Fig. 2. Each trajectory T_j = {c_η^1, c_η^2, ..., c_η^j} has a variable number of locations represented by its j cells. Each trajectory is divided into one or more sequences. A sequence S_κ = {c_η^1, c_η^2, ...} represents a set of consecutive cells. The number of cells of each sequence (the sequence length) is kept fixed for all sequences in the dataset.
After representing each trajectory as a set of cells, Algorithm 1 is run to define the set of sequences from the raw data. Line 2 evaluates whether the number of cells forming the sequence S_κ is greater than 1. In line 7, the algorithm identifies the j cells that form each trajectory, which are added in line 8 to each sequence S_κ until the fixed number of sequence cells is reached. The sequence S_κ is then copied to the set of sequences (line 11). The procedure is repeated for all trajectories T_j.
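The partitioning step above can be sketched in a few lines of Python. This is a simplified reading of Algorithm 1, not the authors' pseudocode: in particular, the decision to discard a trailing remainder shorter than the sequence length is an assumption of this sketch.

```python
def trajectory_to_sequences(trajectory, seq_len):
    """Split a trajectory (a list of cell indices) into consecutive
    fixed-length sequences; a trailing remainder shorter than seq_len
    is discarded (an assumption of this sketch)."""
    if len(trajectory) <= 1:          # skip degenerate trajectories (line 2)
        return []
    return [trajectory[i:i + seq_len]
            for i in range(0, len(trajectory) - seq_len + 1, seq_len)]

# A 9-cell trajectory with a sequence length of 4 yields two sequences.
print(trajectory_to_sequences([1, 2, 6, 7, 11, 12, 16, 15, 14], 4))
# -> [[1, 2, 6, 7], [11, 12, 16, 15]]
```

Running this over all trajectories produces the set of sequences characterized in the next subsection.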

C. CHARACTERIZATION OF THE MOBILITY DATA
This subsection assesses the set of sequences computed by Algorithm 1 for the mobility dataset. Table II indicates the number of unique and non-unique sequences for various sequence lengths, i.e., lengths of 4, 8, 12, 16, and 20 cells. As indicated in Table II, the total number of sequences increases for shorter sequences (when the sequence length decreases). Additionally, it is observed that the number of unique sequences increases for longer sequences.
To characterize the occurrence of each unique sequence S_κ in the dataset, we plot the cumulative distribution function (CDF) of the unique sequences in Fig. 3. In the figure, the sequences are ordered by decreasing probability on the x-axis, and the y-axis represents the cumulative probability of each sequence S_κ. As illustrated in Fig. 3, different CDFs are obtained for the different sequence lengths, confirming that lower sequence occurrence probabilities are observed for longer sequences. Increasing the sequence length is beneficial for the prediction because the vehicles' trajectories are then described by sequences with lower occurrence probabilities, increasing the diversity of unique sequences.
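The empirical CDF of sequence occurrences described above can be computed as follows; a minimal sketch with toy data, not the dataset used in the paper.

```python
from collections import Counter

def sequence_cdf(sequences):
    """Empirical CDF over unique sequences, most frequent first,
    as plotted in the figure described above."""
    counts = Counter(map(tuple, sequences))       # occurrences per unique sequence
    total = sum(counts.values())
    probs = sorted((n / total for n in counts.values()), reverse=True)
    cdf, acc = [], 0.0
    for p in probs:
        acc += p
        cdf.append(acc)
    return cdf

# Toy set: one sequence occurs three times, another once.
print(sequence_cdf([[1, 2], [1, 2], [1, 2], [2, 3]]))  # -> [0.75, 1.0]
```

A steep initial rise of the CDF, as in the toy output, indicates that a few dominant sequences account for most of the occurrence probability.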

IV. BAYESIAN NETWORK MODEL
This section describes the Bayesian network model. The model contains two different stages: 1) the statistical inference; and 2) the mobility prediction. The inference stage is supported by an HMM model, and it is detailed in Section IV-A. Taking into account the inferred information, Section IV-B presents the mobility prediction algorithm OPTVIT.

A. INFERENCE STAGE
Each trajectory is modeled through a Markov chain that describes the transitions between states. The set of states is formed by assigning each hidden state to a unique sequence S_κ. The transition probability between two adjacent hidden states represents the probability of traveling from S_κ to S_κ+1 and is defined as

a_κ,κ+1 = #(S_κ, S_κ+1) / #(S_κ),

where #(S_κ, S_κ+1) represents the total number of transitions between the two adjacent sequences {S_κ, S_κ+1} that occur in all trajectories and #(S_κ) is the number of times that S_κ occurs in all trajectories. The transition probabilities a_κ,κ+1 are stored in matrix A, a square matrix whose dimension equals the number of hidden states. The HMM describes the relation between hidden events represented by different random variables and the conditional relation of all observable variables with each hidden event. In real time, the sequences are not fully observed; only the current cell where a vehicle is located is known. Thus, the visited cells represent the observed events, while the possible sequences that characterize the vehicles' mobility represent the hidden events. In this section, the cells already visited by a vehicle are considered observable variables (prior information) used to predict a sequence of locations represented by a hidden event.
Defining #(q_κ, o_n) as the number of occurrences of the observation o_n at the hidden state q_κ, each element of the emission matrix B is computed as

b_κ(o_n) = #(q_κ, o_n) / #(q_κ).

The HMM's initial distribution is denoted by π, a row vector with one element per hidden state, where each element is computed as the relative frequency with which the corresponding hidden state starts a trajectory. After the inference stage, we estimate the most likely hidden state through the proposed prediction algorithm described in the next section.
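The count-ratio estimates of A, B, and π above can be sketched as follows. This is an illustrative toy implementation, assuming aligned hidden-state and observation sequences are available; the state and symbol names are hypothetical.

```python
from collections import Counter

def estimate_hmm(state_seqs, obs_seqs, states, symbols):
    """Frequency estimates of A, B, and pi, mirroring the count ratios
    in the text: transitions and emissions divided by state occurrences,
    and pi as the fraction of trajectories starting in each state."""
    trans, emit = Counter(), Counter()
    state_cnt, start = Counter(), Counter()
    for ss, os in zip(state_seqs, obs_seqs):
        start[ss[0]] += 1
        for s, o in zip(ss, os):
            state_cnt[s] += 1
            emit[(s, o)] += 1
        for a, b in zip(ss, ss[1:]):          # adjacent state pairs
            trans[(a, b)] += 1
    A = {(i, j): trans[(i, j)] / max(state_cnt[i], 1)
         for i in states for j in states}
    B = {(i, o): emit[(i, o)] / max(state_cnt[i], 1)
         for i in states for o in symbols}
    pi = {i: start[i] / len(state_seqs) for i in states}
    return A, B, pi

# One toy trajectory: state s1 occurs twice and transitions once to s2.
A, B, pi = estimate_hmm([['s1', 's1', 's2']], [['a', 'a', 'b']],
                        ['s1', 's2'], ['a', 'b'])
print(A[('s1', 's2')])  # -> 0.5
```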

B. PREDICTION STAGE
In the prediction stage, we identify the most probable sequence of hidden states given a set of consecutive observations. The prediction can be seen as a decoding problem whose solution relies on the identification of the hidden states that maximize the probability P(O|λ). An optimal solution for the decoding problem is the traditional Viterbi algorithm (TDVIT). The algorithm can be divided into three main phases:
1) First, the initial probability π_i is multiplied by the corresponding emission probability, initializing the Viterbi variable for each hidden state q_i: δ_1(i) = π_i b_i(o_1).
2) The forward variable is then obtained recursively for each hidden state q_j through δ_t(j) = max_i {δ_t−1(i) a_i,j b_j(o_t)}, which considers the transition probability from state q_i to state q_j and the emission probability of state q_j for observation o_t.
3) Lastly, over all final Viterbi variables δ_T(i), the algorithm finds the Viterbi path with the maximum probability, i.e., P* = max_i {δ_T(i)}.
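The three phases above can be sketched in NumPy as a minimal reference implementation of the standard Viterbi recursion (not the OPTVIT variant described next); the toy model parameters are hypothetical.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Standard Viterbi decoding: returns the most likely hidden-state
    path and its probability.  pi: (N,), A: (N, N), B: (N, M);
    obs is a list of observation indices."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                        # 1) initialization
    for t in range(1, T):                               # 2) recursion
        scores = delta[t - 1][:, None] * A * B[:, obs[t]][None, :]
        psi[t] = scores.argmax(axis=0)                  # best predecessor
        delta[t] = scores.max(axis=0)
    path = [int(delta[T - 1].argmax())]                 # 3) termination
    for t in range(T - 1, 0, -1):                       # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[T - 1].max())

path, p = viterbi([0, 0, 1],
                  np.array([0.6, 0.4]),
                  np.array([[0.7, 0.3], [0.4, 0.6]]),
                  np.array([[0.9, 0.1], [0.2, 0.8]]))
print(path)  # -> [0, 0, 1]
```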

The algorithm TDVIT, denoted as Viterbi(O, λ), predicts the last β cells of a sequence, given that the preceding visited cells are already known. Its computational complexity is quadratic in the number of hidden states. The computational performance of TDVIT is enhanced in Algorithm 2, referred to as OPTVIT.
The rationale behind Algorithm 2 is to compute the Viterbi variables associated with the observed cells, δ_1:(ℓ−β)(j) (where ℓ denotes the sequence length), only for the first sequence S_j stored in ζ. First, the algorithm copies the ℓ − β observed cells to S_test (line 1). Then, the unique sequences in T_test beginning with the sequence in χ are copied to ζ (line 2). Thus, ζ contains all sequences whose latest β cells are hypothetical candidates for the prediction. Each S_j ∈ ζ (line 3) is checked (line 6) to verify whether the variables δ_1:(ℓ−β)(j) were already computed. Because all S_j ∈ ζ share the first ℓ − β observable states, i.e., {o_1, o_2, ..., o_ℓ−β}, the variables δ_1:(ℓ−β)(j) are only computed for S_1 (line 7). For S_j with j > 1, only the final Viterbi variable δ_ℓ(j) is computed (line 10). In lines 11 and 12 the pair [P*, S_j] is stored in ϒ and R; R stores all probabilities P* associated with the different sequences, so we can avoid computing them again. The probabilities already computed in P* are reused in line 14. Finally, in line 16 the algorithm identifies the pair [P*, S_j] exhibiting the highest P* value, which corresponds to the most likely sequence S_pred.
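The reuse of the shared-prefix Viterbi variables can be sketched as a cache keyed by the common observation prefix. This is a deliberately simplified illustration of the caching idea, not the OPTVIT algorithm itself; `viterbi_prefix` is a hypothetical stand-in for the δ recursion.

```python
# Sketch of the caching idea behind OPTVIT: the Viterbi variables for a
# shared observation prefix are computed once and reused across all
# candidate sequences that share it.
from functools import lru_cache

@lru_cache(maxsize=None)
def viterbi_prefix(prefix):
    """Hypothetical stand-in for delta_{1:(l-beta)}; it counts how many
    times the (expensive) prefix computation actually runs."""
    viterbi_prefix.calls += 1
    return hash(prefix)          # placeholder for the cached delta variables

viterbi_prefix.calls = 0
# Three candidate sequences sharing the same l - beta observed cells.
candidates = [("c1", "c2", "c5"), ("c1", "c2", "c9"), ("c1", "c2", "c13")]
for cand in candidates:
    shared = cand[:-1]           # the common prefix of observed cells
    _ = viterbi_prefix(shared)   # computed once, then served from cache
print(viterbi_prefix.calls)      # -> 1
```

The same mechanism explains the sub-linear growth of OPTVIT's cumulative computation time reported in Section VI.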

V. DEEP LEARNING APPROACH
This section describes the structure of the LSTM neural network and specifies the enhancements involved in the training phase to obtain accurate vehicle trajectory predictions.

A. LSTM STRUCTURE
We use an LSTM recurrent neural network to solve the vanishing gradient problem that can have a significant impact on dealing with long-term sequential data [41]. The LSTM is well-known for sharing the cell states across each forward step, making the architecture ideal to deal with trajectories, where it is possible to reinforce important patterns or discard the redundant ones.
Fig. 4 shows the structure of the adopted LSTM neural network. Denoting the sequence length by ℓ, the LSTM layer is composed of ℓ − β discrete time steps and admits an input vector X_i (i ∈ {1, ..., ℓ − β}) for each step, containing the information of the visited cell. We use one-hot encoding to generate X_i: X_i is a 1 × N one-hot vector with the value 1 assigned to the index η that identifies the cell of the grid map, and zeros in the remaining positions. Given an input sequence {X_1, ..., X_ℓ−β}, the proposed LSTM network computes the output for the remaining β steps. The output is a one-hot-encoded matrix (β × N) where each row j ∈ {1, ..., β} contains the information of one of the β predicted cells. The index η of the single 1 in each row corresponds to the predicted cell, or to the labeled output during the learning stage. In the LSTM layer, we adopt 16 LSTM units for each step. Since the LSTM unit may follow different models, we adopt the one proposed by Hochreiter & Schmidhuber [41], whose structure is composed of the operations involving the input, output, and forget gates.
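The one-hot input encoding described above can be sketched as follows; a minimal NumPy illustration, assuming 1-based cell indices and N = 16.

```python
import numpy as np

def one_hot_sequence(cells, n_cells=16):
    """Encode a list of 1-based cell indices as a (len(cells), n_cells)
    one-hot matrix, matching the X_i input vectors described above."""
    X = np.zeros((len(cells), n_cells))
    X[np.arange(len(cells)), np.array(cells) - 1] = 1.0
    return X

# Three observed cells become three one-hot input vectors.
X = one_hot_sequence([1, 2, 6])
print(X.shape)       # -> (3, 16)
print(int(X[2, 5]))  # -> 1  (cell 6 maps to index 5)
```

The β output rows of the network are decoded the same way in reverse: the index of the largest activation in each row gives the predicted cell.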
To finalize the LSTM structure, we adopt a sigmoid activation function in the dense layer. The sigmoid is a logistic function that is useful for predicting one-hot vectors [41], as is the case of the desired output.

B. LSTM TRAINING
To learn the vehicles' mobility patterns, during the training process we use the dataset containing the sequences described in Section III-A, of which 70% is selected for training. To optimize the training phase, we adopt the Adam optimizer and categorical cross-entropy as the loss function. We start the training phase with a learning rate of 0.00001 and a fixed number of 100 epochs. Furthermore, we added an early stopping rule with a patience of two, i.e., the training phase stops when two negative oscillations in the loss function occur. The stopping rule was added to avoid overfitting the model.
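The stopping rule can be sketched as a simple check over the loss history. This is one possible reading of the "two negative oscillations" criterion, counting every epoch-to-epoch loss increase; the authors' exact criterion may differ.

```python
def should_stop(loss_history, patience=2):
    """Stop when the loss has increased (a 'negative oscillation')
    `patience` times over the training history -- one interpretation
    of the early stopping rule described above."""
    bumps = sum(1 for prev, cur in zip(loss_history, loss_history[1:])
                if cur > prev)
    return bumps >= patience

# The loss rises twice (0.70 -> 0.75 and 0.72 -> 0.74), so training stops.
print(should_stop([1.0, 0.8, 0.7, 0.75, 0.72, 0.74]))  # -> True
print(should_stop([1.0, 0.9, 0.8]))                    # -> False
```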
In Table III we present the LSTM structure and model configurations.

VI. PERFORMANCE COMPARISON
This section evaluates the performance of the prediction algorithms proposed in Sections IV and V. The evaluation methodology is presented in Section VI-A and the accuracy of the estimation process is discussed in Sections VI-B and VI-C.

A. EVALUATION METHODOLOGY
The evaluation is based on the dataset described in Section III-B. We choose a grid map representation with N = 16 cells (Fig. 2), where each cell has a lateral size of 738 m and a longitudinal size of 980 m. The raw data is then filtered to retain the trajectories starting and ending within the defined area of the grid map. Each trajectory is then characterized as a list of cells, and the set of sequences is computed to be used as input to the Bayesian network model and the LSTM network.
The method used to evaluate the prediction process is based on the outputs of Algorithm 2 and the LSTM recurrent neural network. The prediction performance is evaluated by comparing each predicted cell c^i_pred, over the last β positions of the sequence, with the corresponding cell c^i_test of the trajectory sequence under test, S_test. The prediction performance is defined, taking into account the correctly predicted cells, as

P_acc = (1 / (β |T_test|)) Σ_j Σ_i 1(c^i_pred_j, c^i_corr_j),

where the indicator 1(c^i_pred_j, c^i_corr_j) holds 1 when the cell c^i_pred_j is correctly predicted, i.e., is equal to c^i_corr_j, and holds 0 otherwise, and |T_test| denotes the number of sequences in the set T_test.
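The metric above can be sketched as follows; a minimal illustration with hypothetical toy sequences, not the paper's evaluation harness.

```python
def prediction_accuracy(pred_seqs, true_seqs, beta):
    """Fraction of correctly predicted cells over the last `beta`
    positions of each test sequence, per the metric described above."""
    correct = total = 0
    for pred, true in zip(pred_seqs, true_seqs):
        for p, t in zip(pred[-beta:], true[-beta:]):
            correct += int(p == t)
            total += 1
    return correct / total

# Toy example: the second sequence's last cell is mispredicted.
preds = [[1, 2, 6, 7], [1, 5, 9, 13]]
truth = [[1, 2, 6, 7], [1, 5, 9, 10]]
print(prediction_accuracy(preds, truth, beta=1))  # -> 0.5
```

With β = 1 this is the next-cell success rate reported in Section VI-B; with β = 5 it becomes the long-term metric of Section VI-C.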
The estimation assessment is performed for five sequence lengths (4, 8, 12, 16, and 20 cells) and two values of β (β = 1 and β = 5). The prediction performance was characterized for a test set T_test formed by 10^5 sequences randomly selected from the dataset of unique sequences. The cumulative computation time was obtained for a smaller set formed by 10^3 sequences randomly selected from the dataset of unique sequences.
The experiments were deployed using the NumPy package in Python. Regarding the setup, we have run the prediction approaches in an Intel 8-core i7-9800X @ 3.8 GHz computer with 128 GB of memory.

B. SHORT-TERM PERFORMANCE
First, we evaluate the prediction performance for a single predicted cell (β = 1), i.e., a prediction for a 15-second time horizon. Fig. 5 presents the prediction performance obtained with the OPTVIT and LSTM approaches for the different sequence lengths, confirming that increasing the sequence length improves the prediction results. The prediction process achieves higher short-term prediction performance as the amount of prior mobility information (cells) increases, and this is observed for both the OPTVIT and LSTM approaches. From Fig. 3, considering a sequence length of 4, we know that the 8 most probable sequences in the dataset of unique sequences account for more than 60% of the occurrence probability. This fact partially justifies the high prediction performance values in Fig. 5 even for shorter sequences. Increasing the sequence length from 4 to 20 improves the prediction performance further. This is because the number of unique sequences increases with the sequence length (see Table II) and less dominant sequences (in terms of probability) are obtained for longer sequences (Fig. 5). Consequently, the diversity of unique sequences increases with the sequence length, and the mobility is more accurately described because the occurrence probabilities of the sequences are not so dissimilar. Finally, the proposed methods (OPTVIT and LSTM) achieve approximately the same prediction performance for the next cell. Although the prediction performance of the TDVIT method is not represented in Fig. 5, its performance is equal to that of the OPTVIT approach.
As mentioned before, the TDVIT, OPTVIT, and LSTM approaches achieve almost the same prediction performance for β = 1. However, we observe a significant difference in the computation time of the three methods, as shown in Fig. 6, where the prediction time (τ) is evaluated for each sequence, considering a test set with 1000 sequences randomly drawn from the set of unique sequences. The computation time plotted in the figure is the cumulative time to predict the number of sequences indicated on the x-axis. The average prediction time over the 1000 sequences is reported in Table IV. The results in Table IV and Fig. 6 show that the computation time achieves the lowest values for the OPTVIT algorithm. Moreover, the computation time increases with the sequence length, which is explained by the increasing number of cells that compose the observable state set, i.e., a higher amount of prior data is used in the prediction. In Fig. 6 we also plot the computation time for the Viterbi algorithm (TDVIT) without the enhancements proposed in OPTVIT. The results for TDVIT are only plotted for a sequence length of 4 because the computation times for the longer sequence lengths are significantly higher and not comparable with those of the other two prediction methods.
Regarding the LSTM approach, since the duration of each computation of the neural network outputs is constant, the total computation time increases linearly with the number of sequences as shown in Fig. 6. However, the OPTVIT approach achieves lower computational times because it grows sub-linearly with the number of sequences. This is mainly due to the reuse of prior computations in the OPTVIT algorithm, which effectively leads to higher computational times for the first sequences but can be used afterward to avoid unnecessary computations associated with similar sequences.

C. LONG-TERM PERFORMANCE
Instead of predicting only the next cell, we now analyze the prediction performance of the OPTVIT and LSTM methods for the next five predicted cells. In the time domain, the prediction of a single cell occurs every 15 seconds; consequently, the prediction of the next five cells covers the next 75 seconds. The results in Figs. 7 and 8 indicate the prediction performance for long-term predictions, considering the next five predicted cells (β = 5). Because different sequence lengths were adopted, for a length of 8 in Fig. 7 the predicted sequence cells are c_4, c_5, c_6, c_7, c_8, while for a length of 20 in Fig. 8 the predicted sequence cells are c_16, c_17, c_18, c_19, c_20. We recall that more prior information (15 cells) is used in the prediction scenario considered in Fig. 8, compared with only 3 cells in the scenario considered in Fig. 7.
The results in Figs. 7 and 8 show that the prediction performance decreases for a longer time horizon. In Fig. 7 we observe that a successful prediction rate of 87.04% is achieved for the next cell (estimated for a 15-second time horizon) when the OPTVIT algorithm is adopted.

TABLE V. Average Prediction Time Per Sequence

However, the prediction probability decreases to 54.88% for the fifth cell, i.e., for a 75-second time horizon. The decrease of the prediction performance with the increase of the time horizon is due to the increased uncertainty associated with the longer time duration.
By comparing the results in Figs. 7 and 8, we conclude that the LSTM method's performance improves as the sequence length increases, whereas for the OPTVIT method this trend only occurs from the 3rd predicted cell onward (c6 and c18 in Figs. 7 and 8, respectively). Contrary to the results achieved for β = 1, for which both methods present similar performance, for long-term predictions (β = 5) the LSTM method achieves higher performance.
Additionally, we analyze the time performance of the proposed prediction methods for long-term predictions (β = 5). The cumulative computation times for the 1000 unique sequences are plotted in Fig. 9, and the average computation time per sequence is given in Table V. The results in Fig. 9 and Table V show that both the cumulative computation time and the average prediction time favor the LSTM approach. Regarding the OPTVIT approach, the curves in Fig. 9 indicate a higher computation time for the first sequences (approximately the first 100) and a decrease for the remaining ones. This is due to the high computational effort of evaluating the Viterbi variables, which are computed once, after which the enhancements proposed in OPTVIT avoid their recomputation. The cumulative computation time of the LSTM predictions has a linear trend since no advantage is taken from previously computed sequences. For long-term prediction, the average computation times in Table V indicate that LSTM achieves an order-of-magnitude speedup for the adopted sequence length and β values. Consequently, both prediction performance and computation time benefit from the adoption of the LSTM approach for long-term vehicular trajectory prediction.

VII. CONCLUSION
This paper has proposed two efficient methodologies to predict vehicles' future locations: (a) a Bayesian network model that captures the interaction between sequences (sub-trajectories) and between the cells and the sequences; (b) an LSTM RNN, which is well suited to sequential data. We have compared the performance of both prediction methodologies for short-term and long-term predictions. The experimental results indicate that the prediction accuracy of both methods improves as more prior information is used in the prediction process. The two proposed methods have achieved almost the same performance for short-term predictions. However, we have shown that the LSTM RNN is more suitable for long-term predictions. Additionally, we have compared the time performance of the two methods, showing that the computation time of (a) is shorter for short-term predictions, while that of (b) is shorter for long-term predictions.
Although methodology (a) can predict the next location in a shorter time (as shown in Fig. 6) and does not require a time-consuming learning task every time new data is added to the inference dataset, the results reported in this work show that it is not recommended for predicting more than a single location, because its computation time can be several orders of magnitude higher than that of methodology (b). Regarding (b), the main limitation is the learning task, which hampers the adoption of fresh prior data in the prediction stage.
As future work, it would be interesting to evaluate the computation performance and accuracy of different spatial sampling strategies and of other deep learning techniques, with the main purpose of achieving lower computation times without compromising the mobility prediction accuracy.

ACKNOWLEDGMENT
We thank André Ip for his support with the results reported in Figures 7 and 8.