Interaction-Aware Short-Term Marine Vessel Trajectory Prediction With Deep Generative Models

Navigation safety is of paramount importance in areas with heavy and complex maritime traffic. Any ship navigating such a scenario should be able to foresee the future positions of other ships and adjust its path accordingly to avoid collisions. However, predicting future trajectories is a very challenging problem due to many possible future trajectories from the inherent uncertainty and the complex interaction dynamics between different ships. In this article, we propose a deep generative model based on the conditional variational autoencoder framework to learn marine vessel movement and predict future trajectories. The model is able to produce a multimodal probability distribution over future trajectories and model the complex interactions between vessels. Experiments are performed in two-vessel encounter scenarios from real-world automatic identification system data. The proposed model outperforms the baseline methods, including both kinematics-based and data-driven methods. The trajectories predicted by the proposed model are also analyzed to demonstrate the effectiveness of the model.

Over the past few decades, intelligent marine transportation systems have received increasing attention from the maritime industry. The development of digital twins [1] and of remotely operated and autonomous ships falls within this context. They are expected to increase the efficiency of maritime transport while reducing fuel consumption and extending the operating window [2]. Understanding vessel motion is a key skill of intelligent systems, which includes predicting the future trajectories of other traffic vessels. This prediction enables a range of downstream tasks, such as predictive planning, model predictive control, and collision avoidance.
The challenge of accurate trajectory prediction for marine vessels arises from the complexity of human behavior and the diversity of its internal and external stimuli [3]. The future behaviors of marine vessels may be driven by their intent, the interaction with surrounding vessels, the environment, and traffic rules. Most of these factors cannot be directly observed and need to be inferred from noisy perceptual cues. Traditionally, the constant velocity model is used for vessel trajectory prediction, and the future position is simply extrapolated from the vessel's velocity and course. More advanced model-based approaches involve a Bayesian filter to estimate the acceleration or turning rate [4] and then assume that these parameters remain constant. These methods have difficulty modeling the intent of the vessels as well as other stimuli and, thus, often lead to high prediction errors in the real world.
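As a point of reference, the constant velocity extrapolation described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: the function name, the local east/north frame in meters, and the 10-s step are assumptions made for the example.

```python
import math

def constant_velocity_predict(x, y, speed, course_deg, horizon_s, step_s=10.0):
    """Extrapolate future positions assuming constant speed and course.

    x, y are in meters in a local (east, north) frame; speed is in m/s;
    course_deg is measured clockwise from north, as for course over ground.
    """
    course = math.radians(course_deg)
    vx = speed * math.sin(course)  # east component of velocity
    vy = speed * math.cos(course)  # north component of velocity
    steps = int(horizon_s // step_s)
    return [(x + vx * k * step_s, y + vy * k * step_s) for k in range(1, steps + 1)]

# A vessel heading due east at 5 m/s, predicted 30 s ahead in 10-s steps.
track = constant_velocity_predict(0.0, 0.0, speed=5.0, course_deg=90.0, horizon_s=30.0)
```

Because speed and course are frozen at their last observed values, such a predictor cannot anticipate a give-way maneuver, which is the limitation noted above.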
Pattern-based methods, especially machine-learning methods, are able to address the aforementioned complexities involved in trajectory prediction by learning from historical data. These methods learn motion behavior by fitting different function approximators to the data, ranging from hidden Markov models to, more recently, deep neural networks. Learning such models requires recorded historical traffic data. For marine vessels, the automatic identification system (AIS) is used, which is an automatic tracking system that relies on transceivers on the vessel and is used by vessel traffic services. Information provided by AIS includes unique identification, position, course, speed, etc. Its main purpose is to allow ships to view marine traffic in their area and to be seen by that traffic. Historical AIS data are also saved and are usually publicly accessible from different organizations, such as coastal administrations. Thus, AIS data form a rich dataset for analyzing the behavior or traffic of marine vessels. They can also be used for learning trajectory prediction models, and different machine-learning models have been applied to AIS data [5], [6].
However, unlike human motion prediction [3], these models for marine vessels focus on long-term predictions (up to several hours) and, thus, do not take into account the inherent uncertainties of the predictions and the interactions between ships. This may be due to the fact that long-term trajectory prediction mainly depends on the ship's destination and route. It is largely deterministic when the destination and the route are known. However, this is not the case for short-term prediction (up to several minutes), as it is heavily influenced by these two factors. For example, in the case of an encountering ship, as shown in Fig. 1, whether the ship gives way or passes directly depends on the behavior of the encountering ship. Also, an aggressive captain might alter the course only slightly in a give-way situation, while a conservative captain would alter the course by a much larger amount, which results in the inherent uncertainties (aleatoric uncertainty) of the predictions. Therefore, it is important to consider ship interactions and prediction uncertainty for short-term ship trajectory prediction. Note that the uncertainty can be divided into aleatoric and epistemic uncertainty. Uncertainty in the predicted trajectory comes from unknown stimuli, such as the captain's intentions, and cannot be removed by collecting more data. Therefore, it is regarded as aleatoric uncertainty. To model this uncertainty, predicted trajectories are treated as distributions and modeled by a variational autoencoder.
In this article, we propose a novel model based on deep neural networks for short-term trajectory prediction of marine vessels. In particular, we approximate the predicted aleatoric uncertainty with a deep generative model: the conditional variational autoencoder (CVAE). It includes a latent random vector to represent the uncertainty. Multiple future trajectories can be generated by sampling from this latent vector. The model is implemented in a sequence-to-sequence (seq2seq) manner with the use of a recurrent neural network (RNN) to better handle the sequential data. The interaction between the vessels is encoded as context for the CVAE model. The performance of the model is demonstrated with two-vessel encounter scenarios from AIS data. Although the CVAE has been applied to the trajectory prediction of people and vehicles, this is the first time that it has been applied to marine vessels. The main contributions of this article can be highlighted as follows.
1) A novel model is developed for short-term trajectory prediction of marine vessels, which includes the prediction uncertainty and the interaction between vessels.
2) Extensive experiments are performed to validate the model, and detailed analyses of the future trajectory patterns generated by the model are conducted.
CVAE is a deep generative model based on autoencoders. Other generative models, such as generative adversarial networks [7], lead to min-max optimization problems that are known to be unstable to train. The main focus of vanilla autoencoders is usually to compress data for downstream tasks, as shown in [8]. Although the CVAE uses an autoencoder architecture, its focus is on modeling the distribution.
The rest of this article is organized as follows. Section II reviews the literature on trajectory prediction. The proposed prediction model is described in Section III. In Section IV, experiments are conducted with AIS data to validate the model, and the results are shown and discussed. Finally, Section V concludes this article.

A. Human and Vehicle Trajectory Prediction
Trajectory prediction for humans and vehicles has been extensively studied in recent years, especially in the application domains of autonomous vehicles and service robots. Learning-based methods are one of the modeling approaches that have made promising progress recently. In particular, RNNs for sequence learning have become a widely popular modeling approach in such a context [9], [10], [11]. Altche and de la Fortelle [10] use a long short-term memory (LSTM) network for highway trajectory prediction. Similarly, Park et al. [11] use an LSTM network but with an encoder-decoder structure to generate the future trajectory sequence in highway traffic scenarios. This kind of model is also called the seq2seq model. However, these methods only produce a single deterministic trajectory output, thus failing to capture the uncertainty inherent in the prediction process. Predicting multiple future trajectories or the distribution of possible future outcomes is critical for safety-critical systems, as it requires reasoning over many possible future outcomes to guard against worst-case scenarios. In [9], the future position is assumed to follow a Gaussian distribution to represent the distribution of possible future outcomes. This method is simple but is not able to account for multimodal distributions of the future. In such a context, a popular approach is to use deep generative models that model the future trajectories implicitly as latent variables. Gupta et al. [12] leverage generative adversarial networks to capture the future distribution. The model consists of a generator and a discriminator network, and the generator outputs trajectory samples, which are then evaluated by the discriminator. Rhinehart et al. [13] use a flow-based generative model, while Ivanovic et al. [14] use the CVAE framework. These generative models show promising results in generating multimodal distributions of future trajectories.
The interaction between different agents also has a nonnegligible effect on future trajectories. Many approaches attempt to aggregate information from neighboring agents. Alahi et al. [9] model the interaction of pedestrians by sharing the hidden state of each individual RNN using a pooling mechanism. Salzmann et al. [15] represent the scene as a graph and aggregate the information from neighboring agents by an elementwise sum. In addition to the above two factors, more stimuli for the trajectory prediction, such as the target destination [16], can also be included.

B. Trajectory Prediction for Marine Vessels
In the maritime domain, the term trajectory prediction is used not only for the traffic vessel but also for the controlled vessel, as in [17]. This is due to the inaccurate dynamic model and uncertain environmental effects [18] on the controlled vessel. The major difference is whether the control command or future plan is available. In this article, we focus on the traffic vessel since neither the control commands nor the future plan is accessible. Unlike human and vehicle trajectory prediction, the prediction horizon of marine vessels is usually much longer (on the order of several minutes). Also, lanes are not designated as they are for vehicles.
Learning-based approaches have received increasing attention in recent years. Gao et al. [19] apply a similarity-based method to determine the destination point of the vessel from historical data and use an LSTM network to generate multiple support points to the destination point. The predicted trajectory is the cubic interpolation of the support and destination points. Capobianco et al. [6] developed a seq2seq model, where the encoder is a bidirectional LSTM network and the decoder is a unidirectional LSTM network. An attention mechanism between the encoder and decoder is utilized. Murray and Perera [5] developed a two-step approach. The historical AIS trajectories are first clustered using a clustering method, and a local prediction model is built for each cluster. The local prediction model is similar to [6], which is a seq2seq LSTM network with an attention mechanism. Nguyen and Fablet [20] learn a prediction model based on the transformer architecture. But instead of learning a regression model, they discretize position into bins and learn a classification model. Multistep prediction is made by applying this model recursively. The majority of the research on ship trajectory prediction focuses on finding a suitable network architecture, either for long-term or short-term prediction. Liu et al. [21] apply a graph convolutional neural network to aggregate the information from surrounding vessels. The future position of the vessel is assumed to follow a Gaussian distribution.
In summary, all of these methods except Liu et al. [21] use a deterministic approach that does not consider the inherent uncertainty of the prediction. However, multimodal distributions of future trajectories are not considered in [21]. Besides, most of them do not try to capture the interaction between different vessels, which could be nonnegligible in short-term prediction. In this article, we propose a model that includes the prediction uncertainty as well as vessel interaction and emphasize their importance.

III. GENERATIVE MODEL FOR VESSEL TRAJECTORY PREDICTION
In this section, a general CVAE model and the gated recurrent unit (GRU) are described, and we apply them in the context of vessel trajectory prediction. Then, the core characteristics of the proposed CVAE trajectory prediction model are illustrated.

A. Conditional Variational Autoencoder
Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, conditional generative modeling tries to fit a model of the conditional probability distribution $p(y|x)$. Once fit, the model can be used to generate samples $y$ given $x$, which can be used to represent uncertainty for trajectory prediction. In other words, the aleatoric uncertainty is modeled as a conditional probability distribution $p(y|x)$. In this article, we consider $p(y|x)$ to be defined by a fixed set of parameters, which we fit to the dataset with the objective of maximizing the likelihood of the observed data. In such a context, neural networks are often used due to their strong expressivity. Commonly used models include the CVAE [22] and the conditional generative adversarial network (CGAN) [23]. We choose the CVAE because the CGAN is harder to train and may suffer from mode collapse.
A CVAE is a latent conditional generative model. The CVAE consists of an encoder $q_\phi(z|y, x)$ parameterized by $\phi$ and a decoder $p_\theta(y|z, x)$ parameterized by $\theta$. The encoder takes the inputs $y$ and $x$ and produces a distribution over the latent vector $z$. The decoder uses $x$ and samples from the prior $p(z|x)$ to produce $y$. The model can be described by

$$p(y|x) = \int p_\theta(y|z, x)\, p(z|x)\, dz. \tag{1}$$

To efficiently perform the marginalization in (1), a proposal distribution $q_\phi(z|y, x)$ is used. The marginal likelihood in (1) becomes

$$p(y|x) = \mathbb{E}_{q_\phi(z|y, x)}\left[\frac{p_\theta(y|z, x)\, p(z|x)}{q_\phi(z|y, x)}\right]. \tag{2}$$

By taking the log of both sides in (2) and using Jensen's inequality, the evidence lower bound (ELBO) is derived as follows:

$$\log p(y|x) \geq \mathbb{E}_{q_\phi(z|y, x)}\left[\log p_\theta(y|z, x)\right] - D_{KL}\left(q_\phi(z|y, x)\,\|\,p(z|x)\right) \tag{3}$$

where $D_{KL}$ is the Kullback-Leibler (KL) divergence. Therefore, instead of maximizing the log-likelihood directly, the ELBO is maximized. By using the reparameterization trick [24], the ELBO is tractable and can be optimized via stochastic gradient descent. The negative ELBO is, therefore, minimized, and the loss for a single training example $(x, y)$ is

$$\mathcal{L}(x, y; \theta, \phi) = -\mathbb{E}_{q_\phi(z|y, x)}\left[\log p_\theta(y|z, x)\right] + D_{KL}\left(q_\phi(z|y, x)\,\|\,p(z|x)\right). \tag{4}$$

During training, the negative log likelihood (first term) is modeled as the mean square error. The distribution $p(z|x)$ is modeled as a standard multivariate Gaussian distribution, $p(z|x) \sim \mathcal{N}(0, I)$. The loss is minimized to find the neural network parameters $\phi$ and $\theta$.
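To make the loss concrete, the sketch below evaluates it for a single example with a diagonal Gaussian encoder and a standard Gaussian prior, for which the KL term has a closed form. It is only an illustration under stated assumptions: the log-variance parameterization, the function names, and the `kl_weight` factor are hypothetical choices and are not taken from the model described here.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def mse(y_pred, y_true):
    """Mean square error, used here in place of the negative log likelihood."""
    return sum((a - b) ** 2 for a, b in zip(y_pred, y_true)) / len(y_true)

def cvae_loss(y_pred, y_true, mu, log_var, kl_weight=1.0):
    """Reconstruction term plus KL term for one training example."""
    return mse(y_pred, y_true) + kl_weight * kl_to_standard_normal(mu, log_var)

# Example: a perfect reconstruction with a posterior that is offset from the prior.
z = reparameterize([1.0, 0.0], [0.0, 0.0], random.Random(0))
loss = cvae_loss([1.0, 2.0], [1.0, 2.0], mu=[1.0, 0.0], log_var=[0.0, 0.0])
```

When the encoder output matches the prior exactly (zero mean, unit variance), the KL term vanishes and only the reconstruction error remains.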

B. Gated Recurrent Unit
The GRU [25] is an RNN with a gating mechanism that mitigates the vanishing-gradient problem of RNNs. It is similar to the LSTM but has fewer parameters. Given sequential data $x_{1:T}$, the GRU processes the sequence by repeating the following function:

$$\begin{aligned} r_t &= \sigma\left(W_{ir} x_t + W_{hr} h_{t-1}\right) \\ z_t &= \sigma\left(W_{iz} x_t + W_{hz} h_{t-1}\right) \\ n_t &= \tanh\left(W_{in} x_t + r_t \odot \left(W_{hn} h_{t-1}\right)\right) \\ h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1} \end{aligned} \tag{5}$$

where $x_t$ and $h_t$ are the input and hidden state at time $t$, respectively; $r_t$, $z_t$, and $n_t$ are the reset, update, and new gates, respectively; $\sigma$ is the sigmoid function; $\odot$ is the elementwise product; and $W$ denotes the weights of the network (bias terms are omitted for brevity). Note that LSTMs can be used instead of GRUs in this article. The GRU was chosen because it can capture long-term dependencies while having fewer parameters than the LSTM.
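The recursion can be illustrated with a small pure-Python sketch of one GRU update and of encoding a whole sequence into its final hidden state. It assumes the standard GRU formulation with reset, update, and new gates and omits bias terms; the dictionary keys and helper names are hypothetical and chosen only for this example.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h, W):
    """One GRU update; W maps gate names such as 'ir' to weight matrices."""
    r = [sigmoid(a + b) for a, b in zip(matvec(W["ir"], x), matvec(W["hr"], h))]
    z = [sigmoid(a + b) for a, b in zip(matvec(W["iz"], x), matvec(W["hz"], h))]
    n = [math.tanh(a + ri * b)
         for a, ri, b in zip(matvec(W["in"], x), r, matvec(W["hn"], h))]
    # Blend the candidate state n with the previous state h using the update gate.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def gru_encode(xs, h0, W):
    """Apply gru_step over a sequence and return the final hidden state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, W)
    return h

# Toy usage: 2-D positions, hidden size 2, small constant weights.
small = [[0.1, 0.0], [0.0, 0.1]]
weights = {k: small for k in ("ir", "hr", "iz", "hz", "in", "hn")}
h_final = gru_encode([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]], [0.0, 0.0], weights)
```

The final hidden state returned by `gru_encode` plays the role of the fixed-length summary of a trajectory segment.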

C. Interaction-Aware Trajectory Prediction
To model the complex trajectory prediction process of marine vessels, we seek a model for future trajectory prediction that satisfies the following requirements.
1) The model is history dependent, so that the intent or future trajectory can be predicted from the past trajectory.
2) The interaction between the two encountering vessels is captured.
3) The model is able to generate multiple future trajectories to account for uncertainty in the forecasting process.
The proposed seq2seq CVAE is illustrated in Fig. 2. Here, we refer to the ship whose trajectory we want to predict as the target ship, while the other one is called the own ship. Three modules are included in the model: an interaction history encoder, a future trajectory encoder, and a future trajectory decoder. All three modules are parameterized with GRUs.
1) Interaction History Encoder: The interaction history encoder is designed to encode the past information from the target ship and own ship into vectors. In particular, the trajectories of the target ship and own ship over the past 3 min are encoded using different GRUs. The interaction history encoder can be represented by the following function:

$$h_{os} = \mathrm{GRU}\left(p_{os}^{t-\delta t_1:t}\right), \quad h_{ts} = \mathrm{GRU}\left(p_{ts}^{t-\delta t_1:t}\right) \tag{6}$$

where $p_{os}^{t-\delta t_1:t}$ and $p_{ts}^{t-\delta t_1:t}$ are the positions of the own ship and target ship from time $t - \delta t_1$ to $t$, respectively. GRU() denotes the GRU network obtained by applying (5) recursively and, thus, $h_{os}$ and $h_{ts}$ are the final hidden states.
2) Future Trajectory Encoder: The future trajectory encoder outputs the mean and variance of the latent vector $z$ by encoding the future trajectory conditioned on the outputs of the interaction history encoder, which can be represented as follows:

$$\mu, \sigma = \mathrm{Linear}\left(h_{ts}^{f}, h_{ts}, h_{os}\right) \tag{7}$$

where Linear() is a linear mapping and $h_{ts}^{f}$ is the encoding of the future trajectory of the target ship.
3) Future Trajectory Decoder: The future trajectory decoder recursively generates the future position by taking the hidden state and the predicted position from the last time step. The initial hidden state and position are obtained as follows: The future trajectory, therefore, can be generated by recursively applying the following function: where gru() is the same function as (5).
In summary, the interaction history encoder is designed to address 1) and 2). That is, the future trajectory depends on the past trajectories of both the own ship and the target ship. To address 3), we are interested in learning a conditional probability distribution $p\left(p_{ts}^{t:t+\delta t_2} \mid p_{ts}^{t-\delta t_1:t}, p_{os}^{t-\delta t_1:t}\right)$ based on the CVAE framework. In particular, the future trajectory encoder is only used in the training phase and not in the inference phase. In the inference phase, we instead randomly sample the latent vector $z$ from the multivariate standard Gaussian distribution $z \sim \mathcal{N}(0, I)$. Taking the random vector $z$ and the conditional vectors from the target ship and the own ship, the future trajectory decoder is able to generate the future trajectory. Therefore, by randomly generating the latent vector $z$, we are able to generate multiple future trajectories that form a future trajectory distribution. Algorithm 1 and Algorithm 2 present the pseudocode for training the model and generating trajectories from the model, respectively.
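The inference procedure described above, in which the latent vector is drawn from the prior and decoded together with the conditioning vectors, can be sketched as follows. This is an illustration only: `toy_decoder` is a hypothetical stand-in for the trained GRU decoder, and appending $z$ to the concatenated context vectors is an assumption about how the latent and conditioning vectors are combined.

```python
import random

def sample_futures(decode, h_ts, h_os, latent_dim, num_samples, rng):
    """Draw z ~ N(0, I) and decode one future trajectory per sample."""
    futures = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        context = z + h_ts + h_os  # concatenate latent and conditioning vectors
        futures.append(decode(context))
    return futures

def toy_decoder(context, steps=6):
    """Hypothetical decoder: a straight track whose lateral drift is set by
    the first latent component. It only mimics the interface of a real decoder."""
    return [(10.0 * k, context[0] * k) for k in range(1, steps + 1)]

rng = random.Random(0)
futures = sample_futures(toy_decoder, h_ts=[0.1, 0.2], h_os=[0.3, 0.4],
                         latent_dim=2, num_samples=50, rng=rng)

# Pointwise average of the sampled trajectories.
mean_track = [
    (sum(f[k][0] for f in futures) / len(futures),
     sum(f[k][1] for f in futures) / len(futures))
    for k in range(len(futures[0]))
]
```

Each call with a different random draw yields a different trajectory, so the spread of `futures` is a sample-based picture of the predictive distribution, while `mean_track` gives a single summary trajectory.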

IV. EXPERIMENTS
In this section, we present experiments on AIS data in two-vessel encounter scenarios to show the effectiveness of the proposed model.

A. Dataset
The vessel trajectory data are collected from AIS data. The raw AIS data are retrieved from the database of the Norwegian Coastal Administration (Kystverket). The raw data contain essential information on longitude and latitude coordinates, speed over ground, course over ground, true heading, and static information, such as maritime mobile service identity (MMSI) and ship length. Since the raw AIS data are broadcast at varying frequencies and contain anomalous data and stationary ship data, the anomalous and stationary data are filtered out, and resampling and downsampling are performed to bring the AIS data to 0.1 Hz.
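Bringing irregularly timed AIS reports onto a regular 0.1-Hz grid (one sample every 10 s) can be done, for instance, by linear interpolation. The sketch below is one possible approach and not necessarily the procedure used for this dataset; the function name is hypothetical, timestamps are assumed to be in seconds and strictly increasing, and the latitude values are made up for the example.

```python
def resample_track(times, values, period_s=10.0):
    """Linearly interpolate an irregular time series onto a regular grid.

    times must be strictly increasing (seconds); values is one coordinate
    (e.g., latitude) reported at those times.
    """
    if len(times) < 2:
        raise ValueError("need at least two samples")
    out_t, out_v = [], []
    t = times[0]
    i = 0
    while t <= times[-1]:
        # Advance to the segment [times[i], times[i + 1]] that contains t.
        while times[i + 1] < t:
            i += 1
        t0, t1 = times[i], times[i + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[i] + w * (values[i + 1] - values[i]))
        t += period_s
    return out_t, out_v

# Reports received at 0 s, 4 s, 13 s, and 20 s are mapped to 0 s, 10 s, and 20 s.
grid_t, grid_lat = resample_track([0.0, 4.0, 13.0, 20.0], [59.0, 59.4, 60.3, 61.0])
```

The same function would be applied separately to longitude and latitude after anomalous and stationary reports have been removed.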
In this study, we focus on the encounters between ferries crossing from Horten to Moss and merchant vessels navigating from the North Sea toward Oslo or Svelvik. The trajectories of the ferries and the encountered ships are extracted. The data for the whole month of January 2019 are extracted as the training set, while the first ten days of February 2019 are used as the test set. This results in 173 encounter cases in the training set and 63 encounter cases in the test set. Each encounter case is approximately 35 min. The trajectories of the ferry and the merchant ship are shown in Fig. 3. In this article, only longitude and latitude coordinates are used. The goal is to make future trajectory predictions for ferries only since the dataset only contains ferries from Horten to Moss and merchant ships traveling from south to north (see Fig. 3). In this case, the ferry is responsible for deciding whether to give way or go straight through according to the Convention on the International Regulations for Preventing Collisions at Sea.

Fig. 3. Encountered trajectory data. The ferry travels from Horten to Moss, while the merchant ship travels from south to north. In such scenarios, the ferry is responsible to give way.

TABLE I DESCRIPTION OF THE ENCOUNTER TYPES IN THE DATASET
Although more ships can be involved in an encounter and relatively large datasets can be easily collected since the AIS data are publicly available, we only focus on this small dataset with two-ship interactions because it is labeled and reviewed by experts. This makes it more convenient to analyze whether the model can predict the trajectory under different encounter situations. Future work will include more ships and larger datasets. Note that there are three types of encounters, as defined in [26], in the dataset. Table I lists a detailed description of these three encounter types. While Type 2 is similar to Type 3 in trajectory, they differ significantly in how the ships pass.

B. Implementation Details
For the interaction history encoder, future trajectory encoder, and future trajectory decoder, a two-layer GRU with a hidden size of 256 is used. The position of the vessel is normalized with z-score normalization. Adam with decoupled weight decay regularization (AdamW) is used as the optimizer. A cosine annealing schedule with an initial learning rate of $1 \times 10^{-3}$ is used. The model is trained for 1000 epochs.
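Two of these settings are easy to state explicitly: z-score normalization and the cosine annealing schedule. The sketch below shows both in plain Python. The zero minimum learning rate, the absence of warm restarts, and the function names are assumptions made for the illustration, since these details are not specified above.

```python
import math

def zscore_fit(values):
    """Return the mean and (population) standard deviation of the training data."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

def zscore_apply(values, mean, std):
    """Normalize values with statistics estimated on the training set."""
    return [(v - mean) / std for v in values]

def cosine_annealing_lr(epoch, total_epochs=1000, lr_init=1e-3, lr_min=0.0):
    """Learning rate decaying from lr_init to lr_min along half a cosine period."""
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * epoch / total_epochs))

mean, std = zscore_fit([1.0, 2.0, 3.0])
normalized = zscore_apply([1.0, 2.0, 3.0], mean, std)
schedule = [cosine_annealing_lr(e) for e in (0, 250, 500, 750, 1000)]
```

The statistics returned by `zscore_fit` would be computed on the training set only and then reused for the test set, so that the predicted positions can be mapped back to meters with the same mean and standard deviation.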

C. Quantitative Performance
In this part, the performance of the proposed model is evaluated quantitatively. The longitude and latitude coordinates in the dataset are converted to meters for training and testing the proposed model.
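The conversion from longitude and latitude to meters can be approximated, over an area as small as a single fjord crossing, by an equirectangular projection about a reference point. The sketch below shows this approximation; it is an assumption for illustration, since the projection actually used is not stated, and the reference coordinates are illustrative values rather than values taken from the dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def lonlat_to_local_xy(lon_deg, lat_deg, lon0_deg, lat0_deg):
    """Equirectangular approximation: east/north offsets in meters from a
    reference point (lon0_deg, lat0_deg). Adequate only over short distances."""
    x = math.radians(lon_deg - lon0_deg) * EARTH_RADIUS_M * math.cos(math.radians(lat0_deg))
    y = math.radians(lat_deg - lat0_deg) * EARTH_RADIUS_M
    return x, y

# Offsets of a position relative to an illustrative reference point.
x, y = lonlat_to_local_xy(10.6, 59.45, lon0_deg=10.5, lat0_deg=59.4)
```

Working in a local metric frame makes the distance-based error metrics directly interpretable in meters.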
1) Baselines: The performance of our model is compared with several baselines.
1) Kinematic model (KM): The model simply extrapolates trajectories under the assumption of constant speed and course.
2) seq2seq model: The seq2seq model follows an encoder-decoder structure. The encoder is used to encode the past trajectory, while the decoder is used to generate the future trajectory. The encoder and decoder are parameterized by RNNs. Note that even though this model does not consider the interaction between agents, it has been used for trajectory prediction on vehicles [11] and vessels [27]. We parameterize the model using a GRU with the same hidden size and number of layers as our proposed model. In addition, Monte-Carlo dropout [28] is used to generate multiple future trajectories from the model.
3) Social LSTM (S-LSTM): S-LSTM [9] models each agent with an individual LSTM whose hidden states are shared at each time step. Since only two vessels are considered, we do not use the pooling mechanism but directly share the hidden states. Besides, the LSTM is changed to a GRU. This model is trained and run at inference time in an autoregressive manner. The future positions at each time step are assumed to follow a Gaussian distribution, and the model is trained with the negative log likelihood.
2) Evaluation Metrics: Two different metrics that are widely used for trajectory prediction are considered:
1) Average distance error (ADE): The average Euclidean distance between all estimated and true points of the trajectory.
2) Final distance error (FDE): The Euclidean distance between the predicted final position and the true final position at the end of the forecast period.
Since the proposed model can predict the future trajectory distribution, we sample 50 trajectories from our model and compute the average trajectory to evaluate ADE and FDE. Furthermore, we evaluate the predictive distributions using the Best-of-N (BoN) ADE and FDE, which we denote as BoN-ADE and BoN-FDE, respectively. BoN-ADE and BoN-FDE were proposed in [12]. This is a way to evaluate whether multiple predicted trajectories cover the true one. Here, we sample 50 future trajectories and compute their errors from the true trajectories to obtain the best five trajectories. The errors for these five trajectories indicate how well our multiple predicted trajectories match the true trajectories.
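The metrics above can be written down compactly. The following sketch computes ADE and FDE for a single trajectory, the pointwise average of a set of sampled trajectories, and a best-of-N score. Averaging the metric over the five best samples is an assumption about how the errors of the best five trajectories are summarized, and all function names are hypothetical.

```python
import math

def ade(pred, truth):
    """Average Euclidean distance over all points of the trajectory."""
    errs = [math.dist(p, q) for p, q in zip(pred, truth)]
    return sum(errs) / len(errs)

def fde(pred, truth):
    """Euclidean distance between the final predicted and true positions."""
    return math.dist(pred[-1], truth[-1])

def mean_trajectory(samples):
    """Pointwise average of equally long sampled trajectories."""
    n = len(samples)
    return [tuple(sum(s[k][d] for s in samples) / n for d in range(2))
            for k in range(len(samples[0]))]

def best_of_n(samples, truth, metric, n_best=5):
    """Average the metric over the n_best samples closest to the truth."""
    errs = sorted(metric(s, truth) for s in samples)
    return sum(errs[:n_best]) / min(n_best, len(errs))

# Toy data: ten sampled tracks offset laterally by 0 m, 1 m, ..., 9 m from the truth.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [[(px, py + float(k)) for px, py in truth] for k in range(10)]
avg_ade = ade(mean_trajectory(samples), truth)
bon_ade = best_of_n(samples, truth, ade)
bon_fde = best_of_n(samples, truth, fde)
```

The error of the averaged trajectory rewards a well-centered predictive distribution, whereas the best-of-N scores reward distributions in which at least some samples lie close to the true trajectory.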
3) Results: Fig. 4 shows the Euclidean distance error at different prediction time horizons. The error of the KM increases dramatically with the prediction time horizon. When the prediction time exceeds 1 min, the proposed model provides the smallest error among all models.
In Table II, we compare the performance of our model with the baseline methods. The naive KM produces high prediction errors. The pattern-based methods clearly outperform the KM. While seq2seq has similar errors on ADE and FDE, it does not perform well on the BoN metrics, suggesting that Monte-Carlo dropout has difficulty modeling predictive distributions. This may be due to the fact that the method is often used to approximate epistemic rather than aleatoric uncertainty. The S-LSTM and our proposed model outperform the seq2seq model, demonstrating the importance of including vessel interaction. The proposed model provides the smallest error on all metrics, which shows the superiority of the proposed model in modeling the behavior of marine vessels.

D. Trajectory Prediction Analysis
The quantitative evaluation shows that the proposed model outperforms the other baseline methods. In this part, the actual behavior of the proposed model in different settings is analyzed.
1) Trajectory Prediction With Different Baselines: In Fig. 5, the prediction results of the proposed model and the baseline methods are shown in several random samples from the test set.
The KM produces high prediction errors, especially around nonlinear regions. The other learning-based methods are able to predict nonlinear behaviors. The sampled trajectories from the proposed model are able to match the real trajectory, and the average of these sampled trajectories provides the smallest error.
2) Trajectory Prediction on Different Time Steps: Fig. 6 presents the predictions on random scenarios at different time steps from type 1, type 2, and type 3 encounters. The results show that across all encounter types, the predicted uncertainty is lower after the ferry passes the merchant ship, which fits our intuition that the ferry only needs to focus on its destination after passing. In addition, the predictions show that Type 2 and Type 3 have less uncertainty before passing than Type 1, possibly due to the ferry's ability to quickly recognize that it can safely pass the merchant ship without changing course.

3) Trajectory Prediction With the Change of Encounter Type:
To analyze the change of encounter, we sample three encounter scenarios from type 1, type 2, and type 3, respectively. The type 3 scenario is linearly interpolated to the type 1 scenario, as shown in Fig. 7(a). The type 1 scenario is linearly interpolated to the type 2 scenario, as shown in Fig. 7(b). Since the ferry has almost the same trajectory for Type 2 and Type 3 encounters, the transition between these two encounters is not considered. In Fig. 7(a), it can be seen that the merchant ship barely changes, and when the ferry approaches the merchant ship, the predicted trajectory changes from passing directly to giving way. Similarly, in Fig. 7(b), the ferry changes little, and when the merchant ship moves further, the predicted trajectory changes from giving way to passing. These qualitative results demonstrate that our model is able to capture the interaction between two vessels.

E. Ablation Study
To validate the design choice of how we aggregate the interaction information, an ablation study is performed. In particular, we performed experiments with the following variants.
1) No interaction is considered.
2) The vectors from two vessels are summed.
3) The vectors from two vessels are max pooled.
Table III shows that including the interaction improves the prediction performance. In addition, simply concatenating the vectors from the two vessels, as done in the proposed model, provides the best performance compared with the other aggregation methods.
To evaluate the lookback windows in Fig. 2, four different lookback windows were evaluated in addition to 3 min. As shown in Table IV, as the lookback window increases, the prediction error tends to decrease. This shows that the model successfully extracts temporal information rather than fitting temporal noise. However, the improvement in forecasts is not significant when the lookback window is longer than 3 min.

V. CONCLUSION
In this article, an interaction-aware short-term trajectory prediction model for marine vessels was proposed. The model followed the CVAE framework and, thus, was able to model the inherent uncertainty of the forecasting process. By sampling from the latent space, it can quickly generate multiple future trajectories. The interaction was encoded into a context vector for the model. The model was implemented in a seq2seq manner to account for time-series data. Experiments were performed on real-world two-ship encounter AIS data. The proposed method outperformed the baseline methods. In addition, we qualitatively showed that our model successfully models the uncertainty as well as the interactions.
Future work will extend our model to multivessel encounter scenarios over larger areas. More stimuli can be considered, such as ship types, semantic maps, and weather conditions. More broadly, there are also architectural considerations when large datasets are involved and integration with downstream planning tasks when predictive models are available.

Fig. 1. Illustration of the possible trajectories in an encounter scenario. The red lines denote the possible trajectories, which depend heavily on the captain and the interactive ship.

Fig. 2. Schematic illustration of the neural network architecture of a CVAE for vessel trajectory prediction. The solid lines denote all the processes for the inference, while the dashed lines represent the processes only used in training.

Fig. 4. Prediction performance versus prediction time horizon. Since our model produced samples on the predicted trajectories, we computed the average trajectory to calculate the Euclidean distance error.

Fig. 5. Illustration of the predicted trajectories based on different methods. We randomly draw nine samples from the test dataset. The black line indicates the ground truth trajectory of the vessel.

Fig. 7. Ferry trajectory predictions when the encounter type changes. (a) From type 3 to type 1 encounter. (b) From type 1 to type 2 encounter.

TABLE II QUANTITATIVE RESULTS OF ALL THE METHODS ON THE DATASET (UNIT: M)

TABLE III EFFECT OF DIFFERENT AGGREGATION METHODS (UNITS: M)

TABLE IV EFFECT OF DIFFERENT LOOKBACK WINDOWS (UNITS: M)