Unfolding AIS Transmission Behavior for Vessel Movement Modeling on Noisy Data Leveraging Machine Learning

The oceans are a source of an impressive mixture of complex data that could be used to uncover relationships yet to be discovered. Such data comes from the oceans and their surface, such as Automatic Identification System (AIS) messages used for tracking vessels' trajectories. AIS messages are transmitted over radio or satellite at ideally periodic time intervals, but in practice the intervals vary irregularly over time. As such, this paper aims to model the AIS message transmission behavior through neural networks for forecasting the content of upcoming AIS messages from multiple vessels simultaneously, despite the messages' temporal irregularities acting as outliers. We present a set of experiments comprising multiple algorithms for forecasting tasks with horizon sizes of varying lengths. Deep learning models (e.g., neural networks) proved able to adequately preserve vessels' spatial awareness regardless of temporal irregularity. We show how convolutional layers, feed-forward networks, and recurrent neural networks can improve such tasks by working together. Experimenting with short, medium, and large-sized sequences of messages, our model achieved $36/37/38\%$ of the Relative Percentage Difference (the lower, the better), whereas we observed $92/45/96\%$ on the Elman's RNN, $51/52/40\%$ on the GRU, and $129/98/61\%$ on the LSTM. These results support our model as a driver for improving the prediction of vessel routes when analyzing multiple vessels of diverging types simultaneously under temporally noisy data.


I. INTRODUCTION
Over the years, we have been experiencing a massive expansion of the maritime vessel trajectory network (referred to in this work as a complex network, network, or graph), powered by globalization and the evolution of transportation [1]. Maritime navigation is essential in passenger transportation, tourism, and fishing [2]-[5]. In addition, it has been historically used for trading between territories and countries worldwide [6]-[8]. Over the centuries, many efforts have been focused on forecasting wind, waves, and weather to be prepared for non-ideal navigation conditions [9]-[11]. However, ocean activities are far from controllable. In addition to climate-related risks, there exist significant concerns of piracy (e.g., armed robbery and hijackings), equipment defects, and ship collisions, among others [12]-[16].
A convenient way of preventing or responding to adverse events found in the sea is tracking vessels' trajectories through Automatic Identification System (AIS) messages [17], which are part of a more extensive system that monitors maritime navigation activity [18]. These messages are transmitted over radio or satellite at ideally periodic time intervals [19], containing information on vessel identification and current status. Details such as geographical coordinates, course, and speed over the ground are also included [20], turning AIS into a supporting technology for vessel tracking with acknowledged relevance to ocean monitoring [21], [22].
The literature on transportation systems has been leveraging the volume of AIS data and its inherently sequential nature to develop a range of vessel trajectory forecasting techniques [23]-[26]. The interest in forecasting trajectories comes from the capability to estimate vessel routes, which increases the safety and reliability of marine transportation [27] and enhances oceans' situational awareness [28].
Popular techniques employed to address those tasks are based on Recurrent Neural Networks (RNNs) [29]-[31], Auto-Encoders (AEs) [32]-[34], and Convolutional Neural Networks (CNNs) [35]. More recent techniques have focused on Graph Neural Networks (GNNs) [36] and Network Embeddings [37]. Several techniques have focused on the impact of multi-directional [38] and multi-layer [39], [40] RNNs for enhancing forecasting tasks in a regression fashion. Others have opened the discussion on leveraging multiple trajectories to improve trajectory modeling and mobility pattern understanding [41]. However, the related literature still lacks investigation when these scenarios merge and are composed of streaming data. These conditions are relevant to time-sensitive tasks requiring near real-time inference abilities, which are challenging because there is little room for extensive data preprocessing to deal with outliers.
Trajectories' irregular timing is typically caused by transmission delays, lack of signal coverage, equipment defects, and interference, despite vessels sending AIS messages periodically (i.e., every few seconds or minutes). The irregular timing is a preprocessing drawback to overcome when working with AIS data because it can bring inconsistency in picturing a clear vessel route, jeopardizing the maritime domain awareness [42]. Moreover, in some cases, such behavior might be deliberate and related to irregular maritime activities [43], but those are usually exceptions among a population of AIS messages. Notice that such irregularity is tied to the vessel's AIS transceiver technology and whether the message will be captured by low-range radio or long-range satellite receivers. Vessels near the shore are usually captured by radio, and those far away by satellites. Working in a large geographical area, one would be subject to data from multiple sources, including different transmission behaviors. These are tied to the type and the location of the vessel transmitting and the receiver capturing the AIS message.
Previous works in the literature consistently adopted a trajectory interpolation approach to address this issue, which has been actively used as a resource for better trajectory planning and forecasting. Such an approach inserts virtual messages in the vessel trajectory to smooth the timing irregularity, allowing the trajectory to be strictly periodic [44], [45]. Therefore, the authors transform the AIS data into a well-behaved discrete-contained time series (i.e., an ordered sequence). However, this approach can introduce uncertainty in vessel routes when the gap between two consecutive AIS messages is too large, which would alter the trajectory's data distribution and picture an inaccurate trajectory. This would be the case for vessels with mobility patterns different from in-line sailing, in which the geometry of the trajectory matters (e.g., fishing and military vessels). Such a disadvantage might yield modeling solutions that are not robust to outliers.
Our assumption lies in accounting for multiple vessels of varied types and the multiple numerical variables within the AIS message to overcome the timing irregularity and achieve better performance on the foreseen non-preprocessed AIS data, covering larger geographical areas regardless of the AIS message type (i.e., either radio- or satellite-based). Using this approach, we intend to leverage information that is usually overlooked to increase the ability of the model to learn the intricacies of space (i.e., from where the vessel is transmitting) and time (i.e., the time elapsed since the last received message), and thereby increase the model's generalization capability over different trajectories and mobility patterns.
Unlike traditional trajectory forecasting, smoothing, and compressing algorithms (i.e., series reduction), our focus is on the entire continuous-defined content (e.g., latitude, longitude, Course over Ground - COG, and Speed over Ground - SOG) of the subsequent AIS message in the transmission sequence rather than being concerned only with the next coordinates of the vessel. Therefore, we do not intend to replace traditional series reduction techniques such as the Douglas-Peucker algorithm [46], and the same holds for Ornstein-Uhlenbeck processes [47] for trajectory approximation or clustering for mobility pattern analysis. Our proposal is to be used in cases where the AIS messages are unavailable and can be reconstructed simultaneously with other vessels in the trajectory network. As part of the AIS forecasting task, the vessel's positioning is included, and the trajectory is preserved but not at the same level of granularity as traditional trajectory forecasting techniques. The same holds for smoothing-based techniques because our model intends to foresee AIS transmissions. Thus, the number of expected messages is the same as the real-world AIS transmission system ideally receives.
In this sense, this paper focuses on accurately representing the transmission system for maximizing generalization over mixed-typed vessels indistinctly. Our goal consists of minimizing the shared error between the predicted and observed AIS messages coming from heterogeneous vessel tracking sources. To the best of our knowledge, this approach has not yet been studied from the perspective of maritime vessel trajectories due to its inherent timing complexity and volume of data in the form of AIS messages. It could offer a unique milestone for future research with similar patterns. Hence, we seek a sufficiently robust model for different data distributions and outliers arising from the delta time between consecutive AIS messages. Therefore, we propose using an artificial neural network model mixing single-dimension convolution layers, recurrent neural networks, and feed-forward neural networks into a single architecture for multi-task and multivariate AIS transmission forecasting that achieves increased performance in predicting the intermediate states of the vessel trajectory network as upcoming AIS messages.
Our results are based on extensive experiments contrasting the capability of several machine and deep learning models, which are bounded to univariate or multivariate samples. However, the problem we are tackling requires considering multiple variables across multiple instants of time for multiple samples due to different data distributions and mobility patterns arising from different vessels. These models were tested multiple times for different sets of samples and variables. Our results comprehensively compare the forecasting of AIS messages for single and multiple vessels, considering one or more variables. We cover a range of baselines driven to (A) single trajectories with multivariate estimators, (B) single trajectories with multiple univariate estimators, and (C) multiple trajectories with multivariate estimators.
The results show that our model improves the prediction of vessel routes when simultaneously analyzing multiple vessels of diverging types. This translates into a model that, on average, provides more accurate forecasting results over multiple trajectories rather than a model tailored for a single class of vessels or trained on long historical sequences of AIS messages of a single vessel. Moreover, the results point out that traditional machine learning models struggle to generalize over different vessels, while deep learning models can better capture the temporal irregularity and spatial features while simultaneously describing multiple vessels' trajectories. In such a case, deep learning models achieve improved results over competing algorithms, mainly when working with convolutional layers. In experiments with short, medium, and large-sized AIS message sequences, the proposed model achieved 36/37/38% of the Relative Percentage Difference (RPD; the lower, the better), whereas we observed 92/45/96% on the Elman's RNN, 51/52/40% on the Gated Recurrent Unit (GRU), and 129/98/61% on the Long Short-Term Memory (LSTM) network. In addition to the performance improvement derived from our alternative network architecture, we also observed that our model was more numerically stable over the various experiments using different window and horizon sizes, showing better performance in forecasting short and long AIS message sequences for multiple vessels. In contrast, the performance of the other models varied across AIS message sequences of different sizes.
In conclusion, our contributions can be summarized as:
• A new perspective for AIS transmission behavior modeling accounting for the full continuous-valued content of the AIS message under temporal noise effect;
• A comprehensive benchmark with several machine and deep learning models submitted to the same forecasting task on horizon sizes of comprehensive lengths;
• A methodological pipeline that describes how to capture the multiple data distributions on the temporal data for different vessel trajectories in a single model; and,
• A proposed model based on recurrent neural networks, convolution, and feed-forward layers to achieve increased performance regardless of the vessel type.
This article is organized into three sections apart from the Introduction in Section I. Section II states the problem, describes the dataset, and presents the methodology. Section III reviews the main results and discusses our findings. Section IV addresses the conclusions and future work. The supplementary material includes details on the baseline experiments.

A. PROBLEM FORMALIZATION
AIS messages contain different static and dynamic information describing vessel trajectories that vary according to the different ocean and traffic monitoring applications in which they are used. In this paper, we define an AIS message of a vessel as an event $v = \langle \rho, \omega, \psi, \theta, \mu \rangle$, having latitude $\rho$, longitude $\omega$, time $\psi$, Course Over Ground (COG) $\theta$, and Speed Over Ground (SOG) $\mu$ as attributes.
The sequence of AIS messages of a vessel shapes its trajectory, which has a non-standard (i.e., varying) length. Thus, we define the trajectory of a vessel as $\tau_i = \{V_{\tau_i}, E_{\tau_i}\}$, being a sequence of ordered events $v \in V_{\tau_i}$ connected by edges $e \in E_{\tau_i}$. The edges are unweighted in our formulation, but they could represent the distance $D$ between the source $v_n$ and target $v_{n+1}$ AIS messages in a sequence.
Through such data, it is possible to derive a disconnected graph $T$ by modeling the dataset's vessel trajectories as components. $T = \{\tau_0, \tau_1, \ldots, \tau_c\}$ is a network of multiple connected components, in which $\tau_i \in T\ \forall\ 0 \leq i \leq c$ and $c$ is the total number of different vessels. The trajectories are not segmented, so each vessel has only one sequence of AIS messages that varies according to the number of messages transmitted by the vessel and received by radio or satellite receivers. Knowing that different vessels cannot occupy the same space at the same time, $T$ is under the condition that no event is shared between two trajectories.
In terms of sequences and series, each trajectory $\tau \in T$ is composed of a sequence of ordered events $V_{\tau} = \langle v_0, v_1, \ldots, v_p \rangle$, where $p \in \mathbb{N}^+$ is the total number of events, which varies for each vessel. The events are sets of spatiotemporal features describing the vessel trajectory information at different instants of time, such as given by $V_{\tau} = \langle \langle \rho, \omega, \psi, \theta, \mu \rangle_0, \langle \rho, \omega, \psi, \theta, \mu \rangle_1, \ldots, \langle \rho, \omega, \psi, \theta, \mu \rangle_p \rangle$, where $\theta$ denotes the COG and $\mu$ the SOG.
In this case, the problem for a single vessel can be defined as $f : x \subset V_{\tau},\ x \in \mathbb{R}^+ \to \hat{y} \in \mathbb{R}$ and reduced to $f(x) \approx \hat{y}$, where $f$ is the network reconstruction model that, given a set $x$ of observations, will yield the $\hat{y}$ that most resembles $y$, which refers to the future states of the trajectory. Accordingly, given an arbitrary optimization function $g : \mathbb{R}^2 \to \mathbb{R}^+$ computed between sets $y$ and $\hat{y}$, in which $g(\hat{y}, y) \in \mathbb{R}^+$ and $\hat{y} \approx y$, we seek a model $f$ that minimizes $g$ for any $x \subset V_{\tau}$. Notice that $x$ and $y$ are contiguously contained in the series, but that does not mean that time between AIS messages is monotonically defined. That is because of the different types of noise faced by transmitters and receivers (see Section I).
For network modeling purposes, forecasting upcoming AIS messages based on historical AIS data for an arbitrary trajectory is unfeasible when using timestamps $\psi$, as they follow a discrete probability distribution while the other features are continuously defined. When including $\Delta T \in \mathbb{R}^+$, i.e., the elapsed time since the last message, instead of the timestamp $\psi$, the problem becomes feasible because the elapsed time has a continuous probability distribution. Thus, we have $V_{\tau} = \langle \langle \rho, \omega, \Delta T, \theta, \mu \rangle_0, \ldots, \langle \rho, \omega, \Delta T, \theta, \mu \rangle_p \rangle$, $p \in \mathbb{N}^+$, where $\theta$ denotes the COG and $\mu$ the SOG.
In such a scenario, the relationship between time events and delta time of a trajectory $V_{\tau}$ is given by $\psi_i - \psi_j = \Delta T_{ij},\ \forall\ i, j \leq |T|,\ i \neq j$ and $\psi_j + \Delta T_{ij} = \psi_i,\ \forall\ i, j \leq |T|,\ i \neq j$, which means a timestamp can be safely inferred when at least one prior timestamp in the sequence and the delta time separating the two are known.
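As an illustration only, the conversion between timestamps and delta times can be sketched in Python as follows. The function names `to_deltas` and `to_timestamps` are hypothetical, and assigning a delta of zero to the first message of a trajectory is an assumption of this sketch rather than a convention stated in the paper.

```python
from itertools import accumulate


def to_deltas(timestamps):
    """Elapsed time since the previous message.

    Assumption of this sketch: the first message has no predecessor,
    so its delta is set to 0.0.
    """
    return [0.0] + [float(b - a) for a, b in zip(timestamps, timestamps[1:])]


def to_timestamps(first_timestamp, deltas):
    """Rebuild absolute timestamps from the first timestamp and the deltas."""
    # deltas[0] is the placeholder for the first message, so it is skipped.
    return list(accumulate(deltas[1:], initial=first_timestamp))


if __name__ == "__main__":
    stamps = [100, 110, 170, 175]          # seconds, irregularly spaced
    deltas = to_deltas(stamps)
    print(deltas)
    print(to_timestamps(stamps[0], deltas))
```

The round trip shows why one known timestamp plus the deltas is enough to recover the full timeline of a trajectory.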
Motivated by the sequential nature of vessel trajectories, we aim to go further with the trajectory modeling problem by reconstructing the graph's topological structure and the features underneath it. In the case of vessel trajectories, the topology and features are deeply interconnected due to the spatiotemporal nature of the AIS messages. In such a scenario, the problem behaves non-stochastically, where the state of a network node $v_t$ depends on a sequence of $w \in \mathbb{N}^+$ past events, $v_t = \alpha_0 v_{t-1} + \alpha_1 v_{t-2} + \ldots + \alpha_{w-1} v_{t-w}$, subject to a set of scaling parameters $\alpha$. We can define the previous relationship in terms of subsets $x = \{\langle \rho, \omega, \Delta T, \theta, \mu \rangle_0, \ldots, \langle \rho, \omega, \Delta T, \theta, \mu \rangle_w\}$, $x \subset \tau$ and $\bar{y} = \{\langle \rho, \omega, \Delta T, \theta, \mu \rangle_{w+1}, \ldots, \langle \rho, \omega, \Delta T, \theta, \mu \rangle_{w+s}\}$, $\bar{y} \subset \tau$, in which $\theta$ denotes the COG, $\mu$ the SOG, $w \in \mathbb{N}^+$ is the window of past observations, and $s \in \mathbb{N}^+$ is the horizon to be predicted, subject to $w + s \leq |\tau|$.
We now seek a function $h$ that, given $x$, will approximate $\bar{y}$, which can be written as $h : \mathbb{R}^{|x|} \to \mathbb{R}^{|\bar{y}|}$. In such a case, $h$ represents a function that better describes a trajectory network for any vessel or subset of vessels in the dataset, capable of picturing the inner states of the vessel trajectory network in the form of foreseen AIS message transmissions.
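The split of a trajectory into a window of past observations and a horizon of future ones can be sketched as below. This is a minimal illustration, not the authors' implementation: the name `sliding_windows` is hypothetical, and the sketch assumes that the input holds exactly `w` messages and the target the `s` messages that immediately follow.

```python
def sliding_windows(trajectory, w, s):
    """Yield (x, y) pairs: w past messages and the s messages that follow.

    `trajectory` is any ordered sequence of messages (e.g., tuples of
    latitude, longitude, delta time, COG, and SOG). Trajectories shorter
    than w + s yield nothing.
    """
    if w < 1 or s < 1:
        raise ValueError("w and s must be positive")
    for start in range(len(trajectory) - (w + s) + 1):
        x = trajectory[start:start + w]              # window of past events
        y = trajectory[start + w:start + w + s]      # horizon to be predicted
        yield x, y


if __name__ == "__main__":
    toy_trajectory = list(range(6))  # stand-in for six ordered AIS messages
    for x, y in sliding_windows(toy_trajectory, w=3, s=2):
        print(x, "->", y)
```

A model playing the role of $h$ would then be fitted on the `x` parts to reproduce the `y` parts.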

B. SPATIAL COVERAGE
The dataset used in this article comprises a portion of the Atlantic Ocean from Iceland to the south of the United States and from the west of Europe to the north of Africa (see Figure 1). It consists of a private dataset provided by Spire (formerly exactEarth) that contains raw AIS messages of over 20,000 vessels of different types (e.g., cargo, tanker, fishing, and other vessels) collected from March to July 2020, resulting in about 60,000,000 AIS messages. It is worth noting that the vessels navigate independently and are not limited to navigating inside the bounding box containing the dataset. In this sense, Figure 2 simultaneously pictures each unique vessel's first and last appearance by an Edge Bundling visualization technique [48]. Although colors are used to contrast vessels' flow, the trajectories' thickness is proportional to the recurrence of the route, indicating the intensity of the marine flow in the studied region. These trajectories have different lengths as well as starting and ending locations, and they contain noise in the form of inaccurate information within AIS messages. The analysis of the inaccuracy behind the AIS messages in this dataset is beyond this work's scope.
Figure 2: Edge Bundling [48] applied over the dataset's first and last message of each unique trajectory. The colors are arbitrarily used to contrast the flows and ease the visualization, while the thickness of the edges represents the flow intensity.
Figure 3 illustrates the probability distribution of AIS messages per trajectory. It shows the shape of a long-tail (i.e., Pareto) distribution, meaning that the dataset has most of its AIS messages concentrated on a small number of trajectories, and a few vessels dominate the trajectory dataset. An unbalanced dataset such as this has a trade-off between performance and generalization. Due to that, different data modeling approaches are required to reduce the bias of the heavily populated trajectories. The dataset has another conspicuous feature among the trajectories, which is the irregular timing between consecutive transmitted AIS messages. For example, Figure 4 illustrates the phenomenon in the form of outliers observed between consecutive messages. The image provides the Interquartile Range (IQR) analysis for fifteen randomly selected vessels, in which it is possible to note the extreme variance between consecutive transmissions. Most messages are received within seconds or minutes, but there are recurrent cases where, due to transmission delays, the gap spiked to a few days and even a couple of months.
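To make the IQR analysis concrete, the sketch below flags outlying gaps between consecutive messages of one vessel. It is illustrative only: the helper name `iqr_outliers` and the conventional 1.5 multiplier for the fences are assumptions, and the toy values are not taken from the dataset.

```python
from statistics import quantiles


def iqr_outliers(deltas, k=1.5):
    """Return the deltas lying outside [Q1 - k*IQR, Q3 + k*IQR].

    The multiplier k = 1.5 is the usual box-plot convention, assumed here.
    """
    q1, _, q3 = quantiles(deltas, n=4)   # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [d for d in deltas if d < lower or d > upper]


if __name__ == "__main__":
    # Toy gaps in seconds: mostly around ten seconds, plus one day-long gap.
    gaps = [10, 12, 9, 11, 10, 13, 86400]
    print(iqr_outliers(gaps))
```

Applied per vessel, such a rule highlights the day- or month-long gaps that the text describes as recurrent.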

C. WINDOW SAMPLING AND SCALING
AIS data are notoriously known for their long historical sequences. Although its volume is considered an asset in many applications, its overabundance can also be detrimental, particularly in unbalanced trajectories (see Figure 3). To increase the model's mobility pattern variability and geospatial coverage while it decreases the training time, we had to design a training technique based on temporal sampling. However, regular AIS message sampling affects the trajectory data distribution similarly to using trajectory interpolation based on virtual AIS messages (see Section I), altering the behavior underneath the transmission system that we seek to model. To preserve the data distribution, expand the model's capacity, and still reduce training time, we transformed each trajectory into predefined temporal segments known as windows, and then we sampled the temporal windows instead of the messages contained in them (see Figure 5). This approach preserves the course of time within the windowed AIS messages without increasing temporal irregularities inherent in message sampling. The idea is to sample sequences from all vessels indistinctly and feed them randomly to the learning model such that the model sees segments of trajectories from multiple vessels at varying timespans and/or locations.
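The window-sampling idea can be sketched as follows. This is a simplified illustration under stated assumptions: the name `sample_windows` is hypothetical, windows are taken as contiguous non-overlapping segments, and sampling is uniform without replacement; the paper does not specify these details in this section.

```python
import random


def sample_windows(trajectories, window_len, n_windows, seed=0):
    """Sample whole windows (not single messages) from every trajectory.

    `trajectories` maps a vessel identifier to its ordered list of messages.
    Each sampled window keeps the original order and timing of its messages.
    """
    rng = random.Random(seed)
    sampled = []
    for vessel_id, messages in trajectories.items():
        # Assumption: contiguous, non-overlapping segments of fixed length.
        windows = [messages[i:i + window_len]
                   for i in range(0, len(messages) - window_len + 1, window_len)]
        k = min(n_windows, len(windows))
        sampled.extend((vessel_id, win) for win in rng.sample(windows, k))
    # Mix vessels so the learner sees segments in random order.
    rng.shuffle(sampled)
    return sampled


if __name__ == "__main__":
    toy = {"a": list(range(10)), "b": list(range(100, 104))}
    for vessel, window in sample_windows(toy, window_len=2, n_windows=3):
        print(vessel, window)
```

Because whole windows are drawn, the gaps between messages inside each window are left untouched, which is the property the paragraph above relies on.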
The data among the different sampled sequences are standardized using the z-score normalization, which enforces a zero mean and unit variance for all the records. Next, the standardized samples undergo a min-max normalization to set all values on a zero-one scale. All data transformation is applied on the variable axis shared among all the windowed samples of the dataset. The parameters for each transformation are computed from the training set samples only and then applied to the multiple samples of the testing data. Due to transforming the entire dataset, the models' outputs will follow an ideally similar scale. Therefore, the output must be inversely transformed before assessing the scoring metrics.
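A minimal sketch of the two-stage scaling for a single variable is given below. The class name `TwoStageScaler` is hypothetical, the population standard deviation is an assumption, and constant columns are guarded against division by zero, a detail the paper does not discuss.

```python
from statistics import fmean, pstdev


class TwoStageScaler:
    """Z-score followed by min-max scaling, fitted on training values only."""

    def fit(self, train_values):
        self.mean = fmean(train_values)
        self.std = pstdev(train_values) or 1.0       # guard: constant column
        z = [(v - self.mean) / self.std for v in train_values]
        self.z_min = min(z)
        self.z_range = (max(z) - self.z_min) or 1.0  # guard: constant column
        return self

    def transform(self, values):
        return [((v - self.mean) / self.std - self.z_min) / self.z_range
                for v in values]

    def inverse_transform(self, scaled):
        # Undo the min-max step first, then the z-score step.
        return [(u * self.z_range + self.z_min) * self.std + self.mean
                for u in scaled]


if __name__ == "__main__":
    scaler = TwoStageScaler().fit([10.0, 12.0, 14.0])
    test_scaled = scaler.transform([11.0, 16.0])   # 16.0 lies outside the training range
    print(test_scaled)
    print(scaler.inverse_transform(test_scaled))
```

Since the parameters come from the training samples alone, test values can fall slightly outside the zero-one scale, and predictions are mapped back with `inverse_transform` before scoring.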
We have set 25 windows as the default value for the window-trajectory sampling for the experiments. We refrain from sampling a higher number of windows because the higher the number of samples, the longer the training sessions will be. Notice that the size of the input and output sequences scales cubically due to working on a multi-task and multivariate forecasting problem, meaning that minor variations in the number of sampling windows have the potential to quickly increase the dataset size to a point where hardware limitations will not allow moving forward with the training. Nevertheless, aiming to increase the sampling variability, the experiments are repeated five times using different random seeds, and we report the average of the runs followed by their standard deviation. This approach allows us to work with a high number of vessels and preserves irregular timing while increasing the reliability of the experimentation.
For training the model based on time-windowed data, the sliding window technique is a straightforward approach commonly used with sequence- and series-like data [49]. It works by setting a fixed-size window that slides over the temporal axis of the dataset, predicting a pre-specified number of future steps, referred to as the horizon. Moreover, the fixed window size is known for being a highly sensitive hyperparameter [50], [51], which leads us to set it before the experiments by considering the domain of the data and the dataset itself [52], besides hardware limitations that come with working on large datasets made of long sequences from multiple vessel trajectories. The window $w$ and horizon $s$ sizes used for experimentation throughout the paper are grouped into three complexity categories: low, medium, and high.
For two out of three categories, the window sizes were set to be smaller than the horizon to increase the difficulty of the forecasting task, which will look to fewer past events for forecasting a larger horizon. However, forecasting sequences larger than the ones we used might increase the uncertainty of the forecasting process by stacking the error of the sequentially forecasted messages, possibly generating an output that no longer represents the target network. For example, assume a dataset has 1,000 different trajectories, the window size is 30, and the horizon size is 50. The model will digest (30 × 25) × 1,000 AIS messages in a single iteration over the entire dataset and provide as output (50 × 25) × 1,000 AIS messages. Knowing that our dataset has around 20,000 different trajectories (see Section II-B), our input/output is 20 times larger. Therefore, in the low complexity experiment, throughout training and testing the model outputs 2.5M AIS messages, 12.5M in the medium complexity, and 25M in the high complexity case. These messages are processed in mini-batches, meaning they are not processed all at once but in groups of hundreds of vessels instead.
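The arithmetic of the example above can be checked with a few lines of Python. The helper name `messages_per_pass` is hypothetical; the numbers are the ones quoted in the text (25 sampled windows per trajectory, a window of 30, and a horizon of 50).

```python
def messages_per_pass(n_trajectories, windows_per_trajectory, window_size, horizon_size):
    """Number of input and output AIS messages in one pass over the dataset."""
    n_in = window_size * windows_per_trajectory * n_trajectories
    n_out = horizon_size * windows_per_trajectory * n_trajectories
    return n_in, n_out


if __name__ == "__main__":
    # Worked example from the text: 1,000 trajectories.
    print(messages_per_pass(1_000, 25, 30, 50))
    # Same setting scaled to roughly 20,000 trajectories.
    print(messages_per_pass(20_000, 25, 30, 50))
```

With 20,000 trajectories and a horizon of 50, the output count reaches 25 million messages, which agrees with the figure reported for the high complexity case.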

D. OPTIMIZATION STRATEGY
The proposed model is trained using a mini-batch-based optimization strategy. In such a strategy, the algorithm iterates over the different samples of the dataset, feeding the network with mini-batches of different windowed data, repeating the process for all samples in random order. Feeding the neural network model with randomly ordered windowed data is imperative to achieve maximum generalization. Otherwise, the model could walk towards a local optimum due to recurrently focusing on samples of the same data distribution at the beginning of the training. The network parameters are shared among the dataset and optimized towards the minima of the loss function. We used AdamW [53] as the optimizer, a gradient descent-based algorithm. AdamW is a standard optimizer for sequence and series forecasting tasks, a variant of Adam [54] with decoupled weight decay regularization. As the optimization criterion, we used the Hyperbolic Tangent Error (HTE), in which $\Omega$ are the network parameters, $N$ is the number of mini-batches, $y$ is the ground truth, and $\hat{y}$ is the prediction. The HTE behaves similarly to the traditional Mean Absolute Error (MAE), and both are less sensitive to outliers, but HTE allows for a more refined generalization of results in the face of the problem constraints observed in the trajectory network. The significant difference between them is that the derivative used to compute the gradients and update the weights is a step function for the MAE and a non-linear function for the HTE. The optimization criterion is calculated from the full content of the AIS messages and not only the trajectory itself. In such a way, the overall error is a compound function of the individual errors of each variable in the message, which are all on the same scale (see Section II-C). We aim to find a model that near-optimally minimizes the error of simultaneously forecasting the continuous variables in AIS messages of multiple trajectories of different vessel types.
Due to working with noisy and non-preprocessed AIS data, we inserted a clipping function that enforces the boundaries of the AIS message information (i.e., latitude $\rho$, longitude $\omega$, $\Delta T$, COG $\theta$, and SOG $\mu$) after the model output computation and before computing the loss function. The clipping function first undoes the min-max and z-score normalization and then constrains each variable to its valid range. We have used the same clipping function on the entire dataset before computing the evaluation metrics for off-the-shelf algorithms not trained using our network training pipeline.
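A clipping step of this kind could look like the sketch below. The bounds are assumptions of this illustration: latitude and longitude use their geographic limits, the course is kept within 0 to 360 degrees, and the elapsed time and speed are only forced to be non-negative. The exact constraints adopted in the paper are not reproduced here, and the field names are hypothetical.

```python
# Assumed valid ranges per variable (not the paper's exact constraints).
BOUNDS = {
    "lat": (-90.0, 90.0),             # degrees
    "lon": (-180.0, 180.0),           # degrees
    "delta_t": (0.0, float("inf")),   # elapsed time cannot be negative
    "cog": (0.0, 360.0),              # degrees
    "sog": (0.0, float("inf")),       # speed cannot be negative
}


def clip_message(message):
    """Clamp every variable of a de-normalized message to its assumed range."""
    return {name: min(max(value, BOUNDS[name][0]), BOUNDS[name][1])
            for name, value in message.items()}


if __name__ == "__main__":
    raw = {"lat": 95.2, "lon": -181.0, "delta_t": -3.0, "cog": 361.5, "sog": 12.3}
    print(clip_message(raw))
```

The function is meant to run on values that were already mapped back to their physical units, mirroring the order described above.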

E. EVALUATION METRICS
In addition to the network optimization criterion, the results are presented with the aid of the Relative Percentage Difference (RPD) and Root Mean Squared Error (RMSE), in which $y$ is the ground truth, $\hat{y}$ is the model prediction, and $N$ is the number of mini-batches. The RMSE, based on the square root of the mean squared difference, is used to evaluate the model in the face of larger values, which in the case of the vessel trajectory network dataset are known to be outliers. Alternatively, RPD is a signed expression that compares the difference between the values and their average magnitude.
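For reference, both metrics can be sketched in plain Python. The RMSE follows its standard definition. The RPD below uses a common form, the signed difference divided by the average magnitude of the two values and expressed as a percentage; the sign convention (prediction minus ground truth) and the handling of a zero denominator are assumptions of this sketch, and the paper's exact expression may differ.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root Mean Squared Error over paired values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true, y_pred):
    """Mean signed Relative Percentage Difference (assumed common form)."""
    terms = []
    for t, p in zip(y_true, y_pred):
        scale = (abs(t) + abs(p)) / 2            # average magnitude of the pair
        terms.append(0.0 if scale == 0 else (p - t) / scale)
    return 100 * sum(terms) / len(terms)


if __name__ == "__main__":
    truth = [100.0, 200.0, 300.0]
    guess = [110.0, 190.0, 300.0]
    print(rmse(truth, guess))
    print(rpd(truth, guess))
```

Being squared, the RMSE reacts strongly to a few large errors, whereas the RPD is scale-free and comparable across variables.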

F. NETWORK ARCHITECTURE
The neural network proposed for modeling the vessel trajectory network under irregular timing constraints and in the face of different data distributions consists of two sequential single-directed and single-layered long short-term memory (i.e., LSTM [55]) cells that operate with the aid of a one-dimensional convolution (i.e., Conv1D [56]) feature-extraction layer before each LSTM, while simultaneously leveraging a linear feed-forward shortcut connecting the network input to the output in a residual-like connection [57] with trainable parameters. Each triplet of convolution, recurrent encoding, and sequential decoding is referred to as a block, having independent weights but being trained together, whereby the first is labeled as $\alpha$ and the second as $\omega$.
In such a case, after the windowing and window-sampling preparation processes (see Section II-C), the data from the multiple trajectories is fed to a convolutional layer. In this layer, the multiple features existing within the windowed trajectories in a mini-batch (i.e., input planes) will be combined into an intermediate tensor representation containing the hidden features that arise from the cross-correlation between the weights and the input planes. As a result, the hidden features will have the temporal axis dilated (or contracted) to match the number of output channels of the convolutional layer, initially set to be the window size $w$. Due to leveraging a single-dimension convolution, the variables will be convolved only with themselves and never with the other variables within the message. This means that a contracted sequence of messages is a smaller representation of the trajectory, similarly to the output of a series reduction algorithm. On the other hand, having an expanded output of the input sequence can be understood as an interpolated segment of the input trajectory. These messages, however, arise from the hidden weights of the network and have no straightforward meaning as the original messages; therefore, we refrain from further comparing the original trajectory with the one arising from the hidden weights of the proposed neural network.
In the one-dimensional convolution, $W \in \mathbb{R}^{O \times I \times k}$ are the weights, $b \in \mathbb{R}^{O}$ is the bias, $\star$ is the cross-correlation operator, $t$ is the time instant indicator, $k$ is the kernel size, $O$ is the number of output channels, and $I$ is the number of input channels, bounded to a sequence of size $w$, the sliding window's size. The output of the convolutional layer will be the hidden features with a temporal dimension matching the number of output channels.
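To illustrate the operation, the sketch below implements a generic one-dimensional convolution as a cross-correlation in plain Python, with weights of shape O × I × k and one bias per output channel. It assumes unit stride and no padding, and it is a didactic stand-in rather than the layer used in the experiments; the function name `conv1d` and the toy values are hypothetical.

```python
def conv1d(x, weight, bias):
    """Cross-correlate input planes with learned kernels.

    x:      I input channels, each a list of length L
    weight: O output channels x I input channels x k kernel taps
    bias:   O values, one per output channel
    Returns O output channels, each of length L - k + 1
    (unit stride and no padding are assumed).
    """
    n_in, length = len(x), len(x[0])
    k = len(weight[0][0])
    out = []
    for o, kernels in enumerate(weight):
        row = []
        for t in range(length - k + 1):
            acc = bias[o]
            for i in range(n_in):
                for j in range(k):
                    # Cross-correlation: the kernel is not flipped.
                    acc += kernels[i][j] * x[i][t + j]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    x = [[1, 2, 3, 4]]                       # I = 1 channel, L = 4
    weight = [[[1, 0]], [[0.5, 0.5]]]        # O = 2, I = 1, k = 2
    bias = [0, 1]
    print(conv1d(x, weight, bias))
```

In the setting described above, the channel axis would carry the window positions, so that changing the number of output channels stretches or shrinks the temporal axis.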
Next, the hidden features extracted by the convolutional layer go through the first LSTM of the network, in which $W_i, W_h \in \mathbb{R}^{O \times O}$ are the weights and $b \in \mathbb{R}^{O}$ the bias to be learned, $i^{\alpha}_t$ is the input and update gate's activation vector, $f^{\alpha}_t$ the forget gate's activation vector, $g^{\alpha}_t$ the cell gate, $o^{\alpha}_t$ the output gate's activation vector, $c^{\alpha}_t$ the cell state vector, $h^{\alpha}_t$ the hidden state vector, $\sigma$ the sigmoid activation function, and $\odot$ the Hadamard product. The last hidden state vector of the first LSTM cell, i.e., $h^{\alpha}_t$, is then fed to a nonlinear feed-forward decoder that converts the hidden-size dimension of the data into the expected output size regarding only the temporal dimension, in which $W_m \in \mathbb{R}^{O \times m}$ are the weights, $b_m \in \mathbb{R}^{m}$ the bias, $m$ is the number of variables, and $\delta$ the dropout operation.
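The gates listed above match the standard LSTM formulation, which is reproduced below for convenience. This is the textbook form and only an assumed reading of the first block: the symbol $x_t$ for the hidden features coming from the convolutional layer and the per-gate subscripts on the weights and biases are notational assumptions of this sketch.

```latex
\begin{aligned}
i^{\alpha}_t &= \sigma\left(W_{ii}\, x_t + W_{hi}\, h^{\alpha}_{t-1} + b_i\right) \\
f^{\alpha}_t &= \sigma\left(W_{if}\, x_t + W_{hf}\, h^{\alpha}_{t-1} + b_f\right) \\
g^{\alpha}_t &= \tanh\left(W_{ig}\, x_t + W_{hg}\, h^{\alpha}_{t-1} + b_g\right) \\
o^{\alpha}_t &= \sigma\left(W_{io}\, x_t + W_{ho}\, h^{\alpha}_{t-1} + b_o\right) \\
c^{\alpha}_t &= f^{\alpha}_t \odot c^{\alpha}_{t-1} + i^{\alpha}_t \odot g^{\alpha}_t \\
h^{\alpha}_t &= o^{\alpha}_t \odot \tanh\left(c^{\alpha}_t\right)
\end{aligned}
```

The forget gate scales the previous cell state, the input gate scales the candidate update, and the output gate decides how much of the squashed cell state is exposed as the hidden state.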
The previous network layer's block uses the set of gates and memory of the LSTM cell to unfold the sequences in the hidden features created from the cross-correlation operation. It incorporates traces of the multiple data distributions in the internal weights, yielding an intermediate result. Due to the increased complexity of a multi-task multivariate forecasting task, a single network block proved insufficient. Therefore, we permuted the tensor, exposing the variable axis to a different block for re-coding the temporal axis while learning intricacies from the variables instead. Using this approach, the first block learns how the variables of the AIS sequence change through time, while the second learns how time changes through the intermediate hidden weights representing the variables. As a result, the output of the previous block, i.e., $x^{\alpha}_t$, is then stacked in sequence onto a second block composed of a Conv1D, an LSTM encoder, and a linear decoder. The weights and bias of the Conv1D and the LSTM encoder follow the exact dimensions of the first block, but not the last linear layer, where $W_n \in \mathbb{R}^{O \times n}$ is the weights, $b_n \in \mathbb{R}^{n}$ the bias, and $n$ is the number of variables. There is no dropout nor activation function applied to this block's output.
As previously mentioned, because the neural network consistently loses the scale of the output compared to the dataset's input, we leveraged an additional Linear layer that works in parallel with the rest of the architecture. Such a linear layer is comparable to an Autoregressive component [58], in which no non-linearity is applied to either the input or output of the layer. The component restores the scale of the data, which subsequent operations and non-linearities would otherwise drive toward zero. The final output of the proposed neural network model combines the output of the stacked blocks with that of this parallel linear layer.

Baselines
We considered over 60 different traditional and state-of-the-art algorithms as baselines. This experimental set includes machine and deep learning models adapted for the trajectory AIS transmission task, using the training preparation steps described in Sections II-B and II-C. The machine learning algorithms (see supplemental material for a complete list) come from open-source libraries, e.g., scikit-learn [59], scikit-multiflow [60], scikit-extra 4, lightning [61], and polylearn 5. Other estimators, such as CatBoost [62], XGBoost [63], and LGBM [64], have their dedicated open-source implementation, which was preferred over others. Notably, most of these off-the-shelf algorithms operate on a single- or multi-output sample space. However, even the more adaptable algorithms lack straightforward support for multi-output and multi-task forecasting problems.
Therefore, we adapt the single-output algorithms into multi-output ones using a Regression Chain mechanism 6. This technique combines multiple single-output estimators of the same algorithm in the order specified by the chain, with a different estimator for each inferred horizon unit, in which the previous estimator feeds the following estimator [65]. However, even in a chained pipeline, these estimators cannot simultaneously focus on the multiple samples and variables. Therefore, the problem was split into smaller parts, allowing the chained single-output and multi-output algorithms to focus on a single variable shared among all trajectories simultaneously, repeating the process for each variable in the dataset and then averaging the final results.
This approach simplifies the inference process, as the algorithms are now centered on a single variable at a time instead of being required to forecast all of them simultaneously. However, it is essential to note that, although the problem is more straightforward in terms of the number of variables simultaneously predicted, there is less interaction between multivariate samples, which might mean these estimators learn a limited amount of inter-variable features when compared to multi-output and multi-task ones.
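The chaining described above can be sketched in a few lines of plain Python for a single variable. This is a minimal illustration, not the baselines' actual implementation: `RegressionChain` and the toy estimator `LastValueDrift` are hypothetical names, and the toy estimator merely stands in for any single-output regressor exposing `fit` and `predict`.

```python
class LastValueDrift:
    """Toy single-output estimator (hypothetical stand-in for a real regressor).

    It predicts the last feature of each row plus the mean drift seen in training.
    """

    def fit(self, X, y):
        self.drift = sum(t - row[-1] for row, t in zip(X, y)) / len(X)
        return self

    def predict(self, X):
        return [row[-1] + self.drift for row in X]


class RegressionChain:
    """One estimator per horizon step; each step sees the window plus earlier steps."""

    def __init__(self, make_estimator, horizon):
        self.estimators = [make_estimator() for _ in range(horizon)]

    def fit(self, X, Y):
        for j, est in enumerate(self.estimators):
            # Training uses the true values of the previous horizon steps.
            X_j = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            est.fit(X_j, [y[j] for y in Y])
        return self

    def predict(self, X):
        width = len(X[0])
        rows = [list(x) for x in X]
        for est in self.estimators:
            # Inference feeds each estimator's prediction to the next one.
            preds = est.predict(rows)
            rows = [r + [p] for r, p in zip(rows, preds)]
        return [r[width:] for r in rows]


# Two windows of one variable (e.g., a coordinate) and their 2-step horizons.
windows = [[0.0, 1.0, 2.0], [5.0, 6.0, 7.0]]
targets = [[3.0, 4.0], [8.0, 9.0]]
chain = RegressionChain(LastValueDrift, horizon=2).fit(windows, targets)
forecast = chain.predict([[10.0, 11.0, 12.0]])
```

Repeating this per variable, as the text describes, would amount to fitting one such chain for each variable in the dataset and averaging the resulting scores.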
In order to ease the understanding of the inference limitation of the baseline algorithms, along with the experiments, we have symbol-encoded them using the subsequent scale: ◎ Represents single-output algorithms; Indicates multi-output algorithms; and, Consists of multi-output and multi-task algorithms.
Specifically, among the deep learning baselines, we have used a different set of network architectures adapted and re-implemented specifically for handling the data from the vessel trajectory network. Related to Recurrent Neural Networks, we have conducted experiments with Elman's RNN [66], GRU [67], and LSTM [55]. For Auto-Encoders, we have simplified ReGENN [52] for a bi-dimensional input, in which the Transformer Encoder [68] is used to extract an encoded representation from the input features, and an LSTM is used to decode such a representation into the horizon. Regarding Convolutional Neural Networks [69], we experimented on a temporal CNN with a single-dimension convolutional layer followed by a feed-forward layer that translates the output channels resulting from the cross-correlation operation into the horizon. We experimented with a feed-forward network for assessing the results on a linear multi-output and multi-task estimator and included an additional set of deep learning baselines, which are the highway networks [70], [71]. Note that these estimators might lose the scale of the output predictions compared to the input when the information is propagated throughout the network repeatedly.

Hyperparameter Tuning
Throughout the experimentation, we use the default hyperparameters for all algorithms. More specifically, for the machine-learning baselines, the hyperparameters come from the open-source library where they are included (see the supplementary material for details), and for the deep-learning ones, we use PyTorch's defaults unless specified. We used a gradient norm-clipping of 1.0, a learning rate of 1e-3, a dropout probability of 10%, and a learning rate scheduler to reduce the learning rate by a fifth every three stalled epochs. For the CNNs, specifically, we have used a fixed kernel size of 3 with a stride of 1 and a padded input, so the output has the same shape as the input but with an increased number of output channels (i.e., 128) when compared to the input channels, which match the size of the window. For the recurrent networks, including our model, we set a fixed hidden size of 128 for all the experiments.
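The two training safeguards named above, gradient norm-clipping and a plateau-based learning rate schedule, can be illustrated with a dependency-free sketch. The function and class names are hypothetical, and `factor=0.2` is only one reading of "by a fifth" (multiplying by 0.8 is the other); PyTorch ships `torch.nn.utils.clip_grad_norm_` and `torch.optim.lr_scheduler.ReduceLROnPlateau` for the same purposes, whose exact stalling rule may differ from the simplified one used here.

```python
import math


def clip_grad_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradients so that their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


class PlateauScheduler:
    """Multiply the learning rate by `factor` after `patience` stalled epochs."""

    def __init__(self, lr=1e-3, factor=0.2, patience=3):
        # factor=0.2 is an assumption about what "by a fifth" means.
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = math.inf
        self.stalled = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.stalled = 0
        else:
            self.stalled += 1
            if self.stalled >= self.patience:
                self.lr *= self.factor
                self.stalled = 0
        return self.lr


clipped = clip_grad_norm([3.0, 4.0])  # norm 5.0 is rescaled down to norm 1.0
scheduler = PlateauScheduler()
lrs = [scheduler.step(loss) for loss in (1.0, 0.9, 0.9, 0.9, 0.9)]
```

In this run the validation loss improves once and then stalls for three epochs, so the learning rate is reduced only at the last step.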
As part of the results, we show how our network behaves when we change the number of output channels of the convolutional layer (between 8, 16, 32, 64, and 128 channels) and also when we vary the recurrent layer (between Elman's RNN, GRU, and LSTM) in addition to their number of stacked layers ranging between 1 and 3. All the experiments were repeated five times with different random seeds (i.e., 2021, 2121, 2221, 2321, and 2421) to increase the variability of the sampled data during the experimentation and the order in which the networks see the samples (see Section II-C).

Computer Environment
The experiments related to machine-learning algorithms were conducted on a Linux-based system with 80 CPUs and 504 GB of RAM. The ones related to deep learning were carried out on another Linux-based system with 48 CPUs, 126 GB of RAM, and an NVIDIA A100 40 GB (Ampere) GPU.

Reproducibility
The dataset used in this paper is not available to the general public for download due to being a private dataset owned by Spire. However, aiming at the reproducibility of the results, we provide the source code and a snapshot of the proposed network on GitHub 7, guiding the user on how the inference process should be carried out on a sample dataset.

PERFORMANCE OVER COMPLEXITY SCENARIO
This section describes the results considering the different experimental complexity setups as highlighted in Section II-C. Due to the gradual increase in problem complexity across the different window and horizon sizes, many tested algorithms presented divergent behavior. In these cases, the algorithms could not complete all the experimental settings given the resources and computing time they require at the scale of our dataset (see Section II-B). The supplementary material presents a comprehensive list of all the algorithms while highlighting those removed from the pipeline. Among all the machine learning baselines, we included a Control Model, which, like the other machine learning models, uses the Regression Chain mechanism to infer over the data. Its inference is based on the average of the window that feeds the algorithm. Such a model divides the first set of estimators into two further pieces, as denoted by the colored dashed lines in Figures 6, 8, and 10. This division means that estimators above the dashed line performed worse than the average, while those below performed better. The average of the input AIS messages describes vessels nearly stalled, i.e., in a back-and-forth moving pattern, during the horizon duration despite the other features among the AIS messages. Performing worse than the average is evidence that these estimators cannot represent the multiple patterns arising from different trajectories of different vessel types.
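A control model of this kind can be sketched as follows, assuming it predicts the mean of its input window for one variable and chains that prediction across the horizon. The function name `control_forecast` and the sample values are hypothetical.

```python
def control_forecast(window, horizon):
    """Forecast `horizon` steps of one variable by chaining a mean predictor."""
    history = list(window)
    forecast = []
    for _ in range(horizon):
        step = sum(history) / len(history)  # average of everything seen so far
        forecast.append(step)
        history.append(step)  # chained: the prediction feeds the next step
    return forecast


# Hypothetical speed-over-ground readings from a five-message window.
speeds = [10.0, 12.0, 11.0, 13.0, 14.0]
prediction = control_forecast(speeds, horizon=3)
```

Appending the mean to a sequence leaves its mean unchanged, so the chained forecast stays constant over the horizon, which matches the description of the average as a vessel that is nearly stalled.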
The models below the dashed lines concentrate on some high-scoring machine learning algorithms and the neural network models used for experimentation. The neural networks occupy the top positions, while the lowest-scoring among the high-scoring models are chained off-the-shelf machine-learning models. This is because neural networks can cope with the multi-task multivariate nature of our problem (see Section II-A).
7 Available at https://github.com/gabrielspadon/ais-transmissions.

Low complexity case
The experiments start with the low complexity case, where for a fixed input of 15 messages, we are looking to predict the subsequent 5 messages for multiple vessels simultaneously. The low complexity of such an experiment comes from the fact that most of the AIS messages among the data, as shown in Figure 4, have a low delta time between consecutive transmitted messages. As a consequence, the frequency of consecutive messages is higher, which is usually related to terrestrial-based AIS messages. In this sense, for a short period, the trajectory, speed, and positioning of vessels tend to change very little, if not remain nearly constant in the case of COG and SOG. In this case, simpler models, such as a Multi-Layer Perceptron (MLP), i.e., Feed-Forward, proved more effective than our solution, as well as the bidirectional double-layered LSTM and its Temporal CNN version.
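The pairing of a 15-message input with a 5-message horizon can be illustrated with a small sliding-window helper. This is only a sketch: the function name, the `stride` parameter, and the use of integers as stand-ins for AIS messages are assumptions, and the windowing and sampling procedure of Section II-C may differ.

```python
def sliding_windows(series, window=15, horizon=5, stride=1):
    """Split one trajectory into (input window, forecast horizon) pairs."""
    pairs = []
    last_start = len(series) - window - horizon
    for start in range(0, last_start + 1, stride):
        split = start + window
        pairs.append((series[start:split], series[split:split + horizon]))
    return pairs


# Integers stand in for the 30 messages of a hypothetical trajectory.
messages = list(range(30))
pairs = sliding_windows(messages)
first_input, first_target = pairs[0]
```

A trajectory shorter than the window plus the horizon yields no pairs, which is one reason vessels with few records contribute little to training.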
In Figure 7, we further stress our model, showing how it behaves when leveraging different Recurrent Neural Networks over one or more stacked recurrent cells. The image reveals that stacking LSTMs can increase the performance of the model, as it will be able to capture more nuanced relationships arising from the trajectories. However, that would mean the RNN unit of the model would have up to six times more parameters than it initially had, implying longer training sessions and potential scalability issues. Conversely, in the lower half of the image, we show that, by using an LSTM as the RNN architecture, decreasing the hidden size and channels of the LSTMs and CNNs simultaneously on our proposed blocks can reduce the number of parameters and achieve increased performance. In such a way, the proposed solution lies within the standard deviation of the top performers.
FIGURE 7. Impact analysis of different Recurrent Neural Networks (RNN) working in different directions and with a varying number of stacked layers compared to our proposed model for modeling the vessel trajectory network where algorithms look for the last 15 messages to predict the subsequent 5 messages. It also analyzes the impact of the output channels from the Convolutional Neural Network (CNN) in our proposed modeling approach. The performance assessment is based on the Hyperbolic Tangent Error (HTE) and the Root Mean Squared Error (RMSE).

Medium complexity case
Subsequently, in Figure 8, we analyze the medium complexity case, where we are looking to forecast the following 25 messages using the 15 previous messages transmitted in sequence by the vessels. In contrast with the low complexity case, one further model performed worse than the control model here; the same holds for the high-complexity case. The reason is the increased complexity of handling longer sequences and more data. Such behavior was expected because, for small sequences, such models could not capture the interactions from the chained regression forecasting pipeline. Further fine-tuning the hyperparameters of each estimator in the chain could improve the forecasting process and yield better results. However, as the number of estimators per model ensemble increases with the model's complexity, such a modeling perspective would turn into an extensively laborious task not covered in this work.
In the lower half of Figure 8, the MLP behaves differently from what was previously seen because, as the sequences grow larger, temporal gaps between consecutive AIS messages become more likely. In such a case, the recurrence within the RNNs is better leveraged, supporting that our proposed model achieves better performance than other models. This can be seen when analyzing the RMSE values. Although the variation is small, our model has a lower RMSE value, achieving slightly better results when larger temporal gaps are present in the sequence of messages.
Figure 10 further supports that adapting the hidden size of the LSTM and the number of input channels of the CNN can improve the performance of the proposed blocks and network architecture. In this case, the improvement is more significant in the medium values among the AIS messages, given by the lower HTE, and also in the larger values, indicated by a lower RMSE. In contrast to Figure 8, the LSTM-based variations of our model achieve nearly comparable performance. This indicates LSTMs are more suitable for handling both long and short temporal dependencies of the AIS transmission sequence. This is related to better forecasting sequential AIS message transmission regardless of the presence of outliers in the form of messages too far apart in time and/or space, which can be related to transmission failures or irregular maritime activities (see Section I).

High complexity case
Figure 10 presents the performance benchmarks on the prediction of the 50 subsequent AIS messages given the last 30 AIS messages observed. As depicted earlier, machine learning algorithms are concentrated among the models at the top of the image, and the results presented at the top performed worse than those at the bottom. As described in Section II-F, these models rely on multiple estimators to infer the problem's multiple samples, instants of time, and variables. These models have a number of estimators ranging from 5 to 250. In this case, 5 estimators refer to a different estimator trained per variable of the dataset, while for 250, we have an estimator per variable and another for each different horizon in the output sequence (i.e., 5 variables times 50 horizon steps); the same scheme holds for Figures 6 and 8.
In particular, Huber is among those consisting of 250 different estimators located below the control line. This is related to the fact that it uses the Huber loss, a smoothed version of the Mean Absolute Error (MAE) that is quadratic for small residuals and linear for large ones, making it less sensitive to outliers. The same can be observed with the Linear SVR, which is an ensemble of 5 estimators and has the MAE with the soft-margin criterion as the loss function to be less sensitive to outliers. Other algorithms worth mentioning are based on linear regressors with different stochastic solvers or optimization mechanisms, such as AdaGrad, SAG, and SAGA. The reasonable performance of linear-based algorithms comes from the linear nature of consecutive AIS messages, as seen in the low complexity case, which does not incur many variations in the vessel coordinates besides their course and speed over the ground. A linear estimator can sufficiently model the problem for these particular cases, as an MLP does. However, when the sequences start to increase, such as in the medium and high complexity cases, the behavior shifts in favor of our approach, showing that the hidden features extracted by the convolutional layer and later processed through the long short-term memory network can improve the solution.
Through this set of experiments, we observed that the behavior of neural networks diverges significantly according to the predicted sequence's complexity. Thus, models with fewer non-linearities tend to demonstrate better results for shorter sequences than other more intricate models. This observation comes from the case of a feed-forward neural network (i.e., MLP) being among the top performers for the case of lesser complexity (see Figure 6), showing better performance than some recurrent neural networks submitted to the same task. For the high-complexity case only, GRUs are shown in Figure 11 to be an alternative for the recurrent unit over large sequences. That is something to be considered, as the GRU has a simpler formulation than the LSTM and is more efficient and easier to train. Therefore, the GRU is a feasible alternative for scaling the proposed architecture and blocks to even larger sequences than those used in this work.
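The outlier robustness attributed to the Huber loss earlier in this discussion can be illustrated by comparing it with the squared error on a few residuals. The sketch below uses the standard definition of the Huber loss; the threshold `delta=1.0` and the sample residuals are assumptions chosen for illustration.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear beyond `delta`."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual * residual
    return delta * (magnitude - 0.5 * delta)


def half_squared_error(residual):
    """Half squared error, which matches the Huber loss for small residuals."""
    return 0.5 * residual * residual


# The last residual plays the role of an outlier message.
residuals = [0.5, 1.0, 10.0]
huber_losses = [huber(r) for r in residuals]
squared_losses = [half_squared_error(r) for r in residuals]
```

Both losses agree on the two small residuals, while the outlier contributes far less under the Huber loss than under the squared error, so a single message far apart in time or space pulls the fitted estimator less.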

RESULTS INTERPRETABILITY
Due to the narrow interpretation of the HTE and RMSE, Table 1 shows the Relative Percentage Difference (RPD) results. Such a metric evaluates how far the forecasted message is from the expected message. As the results of the RPD can be both positive and negative, we can understand if the predictions are lower or higher than the expected value. For the RPD formulation, the results can be higher than 100%, meaning that the error can be multiple times larger than the expected value. In this sense, reasonable results are below 50%, and the closer to 0% (i.e., a perfect model), the better.
TABLE 1. Analysis of the Relative Percentage Difference (RPD) over the three different complexity cases. The results in bold indicate the best-performing ones. Among the algorithms, we included those that consistently performed better than the Control Model across the three complexity case studies.

The RPD results show a different behavior from the previous metrics, where our proposal consistently shows greater stability in the shared error of forecasting the AIS messages. The models that previously showed great efficiency now show slightly worse results. For experiments with short, medium, and large-sized AIS message sequences, our model achieved 36/37/38% of the RPD, while Elman's RNN scored 92/45/96%, GRU scored 51/52/40%, and LSTM scored 129/98/61%. This means that the proposed solution showed better performance in forecasting the content of the AIS message, including the vessel positioning and other dynamic variables such as COG, SOG, and delta time of consecutive messages. This is important not only for controlling and increasing awareness about the AIS transmission system but also for its potential use in detecting misleading transmission patterns, such as on-off AIS transceiver behavior modeling and AIS spoofing activity detection. The discrepancy between the HTE and RPD results is due to the non-linear nature of the Hyperbolic Tangent, under which models might not rank as they do in a linear space. That leads us to conclude that our modeling solution outperforms the competing models in all three complexity cases, being more robust to irregular timing.
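The properties ascribed to the RPD, a sign that tells over- from under-prediction and values that may exceed 100%, can be illustrated with a signed relative difference. The formulation below is an assumption chosen to match that description and is not necessarily the exact definition used for Table 1; the function name and sample values are hypothetical. A symmetric variant, 2(p - e) / (|p| + |e|), is also commonly called RPD but is bounded by 200%.

```python
def relative_percentage_difference(predicted, expected):
    """Signed relative difference in percent (assumed formulation).

    Positive values mean the prediction is above the expected value,
    negative values mean it is below. `expected` must be non-zero.
    """
    return 100.0 * (predicted - expected) / abs(expected)


# Hypothetical speed-over-ground values in knots.
over = relative_percentage_difference(25.0, 10.0)
under = relative_percentage_difference(5.0, 10.0)
exact = relative_percentage_difference(10.0, 10.0)
```

Here `over` exceeds 100% because the prediction is more than twice the expected value, `under` is negative, and `exact` is zero, the perfect-model case.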

MODEL ABLATION
Lastly, Table 2 presents the ablation results, highlighting how the traditional LSTM, the FC-CNN, the LSTM-CNN single block, and the LSTM-CNN-AR in a double-block structure behave according to the RPD. Through these results, we observe that our single-block architecture shows suitable performance in all three cases. However, it benefits further from an additional block in the low complexity case, which relates to the improved performance of stacked RNNs observed when describing the low-complexity case. The fully connected convolutional layer alone has not shown a favorable result compared to the others, but it outperformed the traditional LSTM in all three scenarios. Overall, the experiments support the proposed modeling approach, demonstrating effectiveness on different horizon sizes.

LIMITATIONS
Due to working with multiple trajectories simultaneously, we provide additional information concerning the transmission behavior of AIS messages.However, it also turns the problem into a more significant challenge for the models due to the increased uncertainty related to the irregular timing of messages.The temporal irregularity between consecutive transmitted AIS messages is considered to be noise, and it turns the AIS messages into outliers when the gap between two messages is too large.By working with multiple trajectories, their presence is even more significant.This issue would be reduced if working with smoothed trajectories because they include virtual AIS messages to fill the temporal gaps and interpolate the trajectory.However, that does not mean that the trajectories will be equally accurately pictured once interpolated due to not being free of uncertainty when the temporal gap is too large.Also, interpolating every trajectory of the dataset might not be straightforward in near real-time conditions such as observed in AIS data streams.
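The idea of a gap that is too large can be made concrete with the interquartile-range rule behind the analysis of Figure 4. The sketch below computes the delta time between consecutive messages and flags gaps outside the conventional 1.5 x IQR fences; the fence multiplier, the function names, and the timestamps (in seconds) are assumptions for illustration.

```python
import statistics


def delta_times(timestamps):
    """Elapsed time between each pair of consecutive messages."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [value < low or value > high for value in values]


# Hypothetical timestamps: regular reporting followed by a one-hour silence.
timestamps = [0, 10, 20, 32, 43, 53, 62, 3662]
gaps = delta_times(timestamps)
flags = iqr_outliers(gaps)
```

Only the one-hour gap is flagged, while the small jitter around ten seconds stays inside the fences.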
Our proposal shows a different perspective on dealing with this problem. The significant difference is that interpolation techniques are applied as a preprocessing step prior to the analysis, whereas our approach works on cases where that does not hold, i.e., on the raw data. As such, we transfer the responsibility of smoothing the trajectories and reducing the irregularities to the network by randomly feeding it increased amounts of temporal data and guiding the algorithm to avoid pitfalls related to the outlier messages. While this may not be the most straightforward approach due to the complexity of training the network, it has been shown to perform better according to the experiments.
It is evident that generalization and specialization are opposite qualities of a learning model. That being said, our model behaves and generalizes better over multiple trajectories simultaneously. However, when a trajectory of a single vessel is of interest and the historical AIS data from the vessel of interest is available, a model focused on the specific vessel might yield better forecasting results over its trajectory. That is because models trained for forecasting the trajectory of a single vessel on the observed data of the vessel of interest will capture the particular behavior of that vessel. Regardless, our modeling approach showed better performance and robustness than other modeling possibilities on the task of simultaneously predicting multiple trajectories on the raw AIS data transmitted along with the vessel's trajectory.
Lastly, we use delta time to include a notion of temporality in the data, but more features are needed to achieve superior performance in this task. Information related to the period of the day and the year's season might allow for a more refined understanding of the transmission patterns, which are closely tied to vessels' mobility patterns correlated to these variables. The same holds for geophysical data, such as information about the winds, the waves, tidal patterns, and the weather, which have the potential to refine this process further because the mobility pattern is also expected to change under harsh navigation conditions. Such data fusion, not covered in our study, shows potential for further studies in this area.

IV. CONCLUSIONS
This paper addresses modeling the AIS message transmission behavior through neural networks under noisy and temporally irregular data. We presented a comprehensive set of experiments comprising multiple machine and deep learning algorithms submitted to forecasting tasks with horizon sizes of varying lengths. The results show that traditional machine learning models struggle to generalize over many vessels. Deep learning models proved able to capture the temporal irregularity while preserving the spatial awareness when forecasting the trajectories of different vessels, given the lower Relative Percentage Difference (RPD) assessed on three different complexity cases. These models were more robust to the AIS messages' temporal irregularity and delivered better results than machine learning algorithms, mainly when combined with convolutional layers.
More specifically, joining long-short-term memory neural networks with single-dimension convolutional neural networks enhances the feature extraction process, increasing the neural network's performance under different circumstances.The results show that our model improves the prediction of vessel routes when analyzing multiple vessels of diverging types simultaneously.This translates into a model that, on average, provides more accurate forecasting results over multiple trajectories rather than a model tailored for a single class of vessels or trained on long historical sequences of AIS messages of a single vessel.In such a case, deep learning models achieve better results than competing algorithms, mainly when joining convolutional and recurrent networks.
Experimenting with short, medium, and large-sized AIS message sequences, the proposed model achieved 36/37/38% of the RPD, whereas we observed 92/45/96% on the Elman's RNN, 51/52/40% on the GRU, and 129/98/61% on the LSTM network. Besides the performance improvement derived from our alternative network architecture, we also observed that our model was more numerically stable over the experiments using different window and horizon sizes, showing better performance in forecasting both short and long AIS message sequences simultaneously for multiple vessels of different types. Through such a multifaceted analysis of estimators' performance, we concluded that our modeling approach performs better on different sizes of AIS sequences. It also allows further improvement by adapting the number of output channels of the convolutional feature-extraction layer, which can increase or decrease the number of temporal samples the model will use for training.
Nevertheless, much improvement can be achieved under similar study premises. One direction is increasing the geographical boundary of AIS messages to a global scale, which would require greater computational power and processing time. Another is using different modeling approaches for the AIS message data, such as motif analysis on gridded AIS data. Additionally, different neural network techniques could enhance the interaction between trajectories. This is the case of Graph Neural Networks (GNNs), which might shape the relationship of the variables within the AIS messages, and network embeddings that can be used to bring further knowledge about the mobility of the vessel to the forecasting pipeline.
further funded by the Canada First Research Excellence Fund (CFREF), the Canadian Foundation for Innovation MERIDIAN cyberinfrastructure 8, and the Natural Sciences and Engineering Research Council of Canada (NSERC).

FIGURE 2. A kernel-based Edge Bundling visualization technique [48] applied over the dataset's first and last message of each unique trajectory. The colors are arbitrarily used to contrast the flows and ease the visualization, while the thickness of the edges represents the flow intensity.

FIGURE 3. Probability distribution of Automatic Identification System (AIS) messages per trajectory (i.e., vessel) in the dataset. It shows that most vessels have few records, and a few vessels concentrate most of the records within the dataset, a behavior comparable to a long-tail (i.e., Pareto) data distribution.

FIGURE 4. Interquartile Range (IQR) analysis of delta time for fifteen different vessels ordered from the one with the most AIS messages to the one with the least. The analysis reveals that all vessels present a severe presence of outliers. An outlier indicates an irregularity related to the time elapsed between two consecutive messages, varying from a couple of seconds to a few months.

FIGURE 5. Temporal sampling technique designed to increase the variability of trajectories seen by the models. It decreases the computational training time on the entire trajectories without increasing the timing irregularity within the windows, providing segments of different trajectories and timespans.

FIGURE 6. Performance estimation and comparison among different algorithms used for modeling the vessel trajectory network considering the low complexity case where algorithms look for the last 15 messages to predict the subsequent 5 messages. The performance assessment is based on the Hyperbolic Tangent Error (HTE) and the Root Mean Squared Error (RMSE). The experiments were conducted with algorithms on their out-of-the-box version with no hyperparameter optimization. Specifically, among the neural networks, we use U as a superscript to indicate a single-directed model, B for double-directed, and the subscript numbers as the number of stacked recurrent cells.

FIGURE 8. Performance estimation and comparison among different algorithms used for modeling the vessel trajectory network considering the medium complexity case where algorithms look for the last 15 messages to predict the subsequent 25 messages. The performance assessment is based on the Hyperbolic Tangent Error (HTE) and the Root Mean Squared Error (RMSE). The experiments were conducted with algorithms on their out-of-the-box version. Specifically, among the neural networks, we use U as a superscript to indicate a single-directed model, B for double-directed, and the subscript numbers as the number of stacked recurrent cells. The estimators used the same dataset, but the deep learning baselines leveraged our proposed model's HTE loss function and further training adaptation.

FIGURE 9. Performance estimation and comparison among different algorithms used for modeling the vessel trajectory network considering the medium complexity case where algorithms look for the last 15 messages to predict the subsequent 25 messages. The performance assessment is based on the Hyperbolic Tangent Error (HTE) and the Root Mean Squared Error (RMSE). The experiments were conducted with algorithms on their out-of-the-box version. Specifically, among the neural networks, we use U as a superscript to indicate a single-directed model, B for double-directed, and the subscript numbers as the number of stacked recurrent cells. The estimators used the same dataset, but the deep learning baselines leveraged our proposed model's HTE loss function and further training adaptation.

FIGURE 10. Performance estimation and comparison among different algorithms used for modeling the AIS transmission behavior. We have machine and deep learning algorithms clustered in two different segments according to their performance. The performance assessment is based on the Hyperbolic Tangent Error (HTE), Mean Absolute Error (MAE), Huber Error (HE), and the Root Mean Squared Error (RMSE). The experiments were conducted with algorithms on their out-of-the-box version with no hyperparameter optimization. Specifically, Elman's RNN, GRU, and LSTM are bidirectional. The estimators used the same dataset, but the deep learning baselines leveraged the HTE loss function and further training adaptation such as the ones used by our proposed model.

FIGURE 11. Impact analysis of different Recurrent Neural Networks (RNN) working in different directions and with a varying number of stacked layers compared to our proposed model for modeling the vessel trajectory network. It also analyzes the impact of the output channels from the Convolutional Neural Network (CNN) in our proposed modeling approach. The performance assessment is based on the Hyperbolic Tangent Error (HTE), Mean Absolute Error (MAE), Huber Error (HE), and the Root Mean Squared Error (RMSE). Besides the ones indicated in the image, no other hyperparameter was changed.

TABLE 2. Detailed results for the proposed modeling approach and further network components describing the Relative Percentage Difference (RPD) and the observed standard deviation.