AIS-Based Intelligent Vessel Trajectory Prediction using Bi-LSTM

Accurate vessel trajectory prediction is essential for maritime traffic control and management. In addition to collision avoidance, accurate vessel trajectory prediction can help in planning navigation routes, shortening the sailing distance, and increasing navigation efficiency. Vessel trajectory prediction with automatic identification system (AIS) data has thus attracted considerable attention in the maritime industry. Original AIS data may contain noise, which limits their application in real-world maritime traffic management. To overcome this problem, this study proposes a vessel trajectory prediction method that combines data denoising and a deep learning prediction model. In this method, data denoising is realized in three steps: trajectory separation, data denoising, and standardization. First, outliers from the original AIS data samples are removed, after which the moving average model is employed to further clean up the data; finally the denoised data are standardized into uniformly distributed time-series data. Bidirectional long short-term memory (Bi-LSTM) is then applied for vessel trajectory prediction. The performance of the proposed prediction model was verified using data on the trajectories of ten vessels and comparing the results obtained with those obtained using other prediction models (exponential smoothing, autoregressive integrated moving average, support vector regression, recurrent neural network, and LSTM models); the trajectory data were downloaded from a public AIS database. The experimental results revealed that model prediction accuracy increased after the data denoising process. 
Specifically, the Bi-LSTM model had the lowest mean absolute error, mean absolute percentage error, and root-mean-square error, demonstrating that the proposed method is highly efficient for trajectory prediction and can help vessel traffic controllers predict accurate vessel tracks; this would enable them to take early preventive measures to avoid collisions and thus improve the efficiency and safety of maritime traffic.


I. INTRODUCTION
Approximately 71% of the Earth's surface is covered by water, and only approximately 29% is land. Taiwan is an island country located at the intersection of Northeast Asia and Southeast Asia. The eastern half of Taiwan is close to the largest ocean in the world. Oil tankers, freighters, merchant vessels, and other vessels often travel across the surrounding Pacific Ocean. Because vessels frequently traverse the international trade routes of the western Pacific Ocean and because fishing activity is frequent and prosperous, the marine traffic flow around Taiwan is of medium to high complexity. According to data from the Lloyd's Register Casualty Returns, the Taiwan sea area is classified as a moderate-risk environment [1]. Moreover, according to relevant statistical data from the Ministry of Transportation and Communications R.O.C. and the Coast Guard Administration, Ocean Affairs Council, in recent years, the average number of marine vessel disasters that occur in the waters around Taiwan has reached 100 [2]. Maritime transportation constitutes more than 90% of the global cargo trade. Because maritime traffic accidents are likely to cause considerable loss of life and environmental damage, improving maritime traffic safety has become a priority. The safety of ships sailing at sea is a key problem in maritime areas or ports with high traffic density and complicated conditions. A vessel traffic service (VTS), whose purpose is to accurately and effectively monitor and predict vessel trajectories (including in real time), provides valuable technical support for the early warning of marine traffic accidents [3]. To improve the safety of ships sailing in an environment with complex and ever-changing sea conditions, it is necessary to provide trajectory prediction and danger warning functions to a ship's intelligent navigation system.
However, the maritime navigation environment is prone to many accidents, especially in crowded port waters, and it is not easy to predict moving targets.
The safety and integrity of marine vessels against hazards should be maintained when planning the ship's course. Considering this, the International Maritime Organization (IMO) proposes the application of various navigation systems, such as the bridge navigational watch alarm system (BNWAS), automatic identification system (AIS), and electronic chart display and information system (ECDIS), as tools to support the officers on watch (OOWs). Information collected by an AIS can help prevent maritime accidents and improve maritime situational awareness. An AIS collects all types of maritime surveillance data to provide accurate early warning information, including maritime traffic spatial information, to maritime traffic operators and to support various navigation operation decisions. Thus, an AIS can obtain information about a vessel in a timely manner along its route. Studies have demonstrated that AIS data have a major impact on maritime traffic safety analysis. Therefore, improving the quality of AIS data has become an active topic in maritime informatics research [4], [5], [6]. The popularity of AISs has led to the recording and storage of a large quantity of vessel navigation data; nearly a trillion pieces of information are stored every day. However, this large quantity of data includes abnormal data caused by factors such as signal interference and mechanical failure [7]. Methods for processing abnormal data from AISs involve the use of statistics [8], unsupervised machine learning, and artificial neural networks [9], [10]. In addition, many methods, theories, and technologies involving the use of massive data, data exploration, and machine learning have been proposed. Developing an approach for efficiently using the information obtained to increase maritime transportation intelligence has become critical [11], [12].
However, the AIS onboard a vessel usually adds one piece of information every 2 s to 3 min, which results in an extremely large AIS dataset; this also engenders the data redundancy problem, which makes it difficult to use the data for research and practical applications. Data sparsity and erroneous information automatically extracted from unstructured or semistructured sources may result in the creation of the same product two or more times (i.e., duplicates). Product duplicates are not only one of the main sources of bad product experiences but also make product matching harder as multiple entities of the same product exist with overlapping information. The duplicates can have many fatal effects, including preventing machine learning algorithms from discovering important consistencies in product representations [13].
If duplicated data are generated only two or three times, the fitted model may appear more significant because the variance is reduced. Ideally, however, duplicated data should be removed: duplicates are an extreme case of nonrandom sampling, and they introduce bias in the fitted model. Therefore, data compression technology is highly crucial in an AIS [14], [15], [16]. Automation is more challenging at sea than on land. Unlike cars on the road, sea vessels can travel in all directions. In addition, compared with car traffic rules, the rules stipulated by the Convention on the International Regulations for Preventing Collisions at Sea are less quantifiable, and their implementation relies more on experience; this increases the difficulty of predicting the navigation behavior of vessels. Numerous studies have presented models and procedures for track prediction using AIS data; examples of such models and procedures include a Kalman filter model [17], Markov model [18], optimal route estimation based on clustering, and an ant colony algorithm [19]. Some researchers have also used neural networks to predict vessel trajectories without prior knowledge [20]. The multihypothesis approach to ship navigation addresses situations in which a ship may follow one of two or more shipping lanes, each of which may lead to a different prediction. Ship track prediction may also first classify ships by clustering and then predict similar navigation trajectories to achieve more accurate trajectory prediction [21]. Recent developments in deep learning methods, such as deep neural networks [22] (including convolutional neural networks [23] and recurrent neural networks (RNNs) [24]), have had a considerable impact in the fields of computer vision, natural language processing, and speech recognition [25]. Long short-term memory (LSTM) can be used to model a complex function and extract various features from a large dataset [26].
Bidirectional LSTM (Bi-LSTM) has been used to predict DNA-protein binding [27]. However, studies on the use of deep learning methods for vessel trajectory prediction are scarce; only a few recent studies have explored this application [28], [29], [30], [31].
Inspired by the successful use of deep learning methods in sequence prediction, we investigated whether an RNN model can be employed for vessel track prediction. In this study, a method for vessel trajectory denoising and vessel trajectory prediction was proposed for determining future vessel trajectories according to the given AIS observation sequence. The proposed method is based on the bidirectional LSTM (Bi-LSTM) structure, which has become an effective and scalable structure for sequence prediction. Considering the lag between the input sample and the output to be predicted, the ability of an LSTM network to learn data with long-term time dependence renders it ideal for use in the trajectory prediction task [26]. The main contributions of this study are as follows: (1) cleaning of the original AIS data through trajectory separation, outlier deletion, and data standardization; (2) development of a Bi-LSTM model for predicting the trajectory of a vessel using the denoised AIS data; (3) verification of the performance of the proposed model using the trajectory of ten vessels. This study can help vessel traffic controllers accurately predict the trajectory of vessels, which would allow them to take preventive measures to avoid collisions and improve the efficiency and safety of maritime traffic.

II. RELATED WORK
Trajectory prediction methods based on statistical methods are commonly used in the maritime industry, among which methods based on Gaussian process regression are the most common. Anderson [32] used time as the independent variable, obtained the measured value in discrete time, and regarded the trajectory as a one-dimensional Gaussian process.
Neural networks are computing systems with interconnected nodes that operate akin to neurons in the human brain. By using algorithms, they can recognize hidden patterns and correlations in raw data, cluster and classify them, and continually learn and improve over time [33]. Neural networks are ideally suited to solving complex problems in real-life situations. They can learn and model the relationships between inputs and outputs that are nonlinear and complex; make generalizations and inferences; reveal hidden relationships, patterns, and predictions; and model highly volatile data (such as time-series data) [34]. Many types of neural networks, such as the artificial neural network (ANN) [35], [36], [37], back-propagation (BP) network [38], multiple layer neural network (MLNN) [30], and convolutional auto-encoder neural network (CAENN) [31], have been considered for the prediction task. With the popularization of artificial intelligence, neural networks have also been gradually applied to the field of maritime navigation [39], [40]. Historical vessel trajectory data and trajectory characteristics are used as the input of the neural network, which outputs the predicted vessel trajectory data [40]. In another study, clusters obtained in a first step were used to train an ANN to predict the vessels' trajectories [41]; the results showed a 70% prediction accuracy. Tang et al. constructed a neural network with two LSTM layers, which observes the first 10 min of the vessel's state to predict the location of the vessel after 20 min [28]. Zhang et al. proposed a deep learning method that integrates multiple ship movements and that can be adapted to predict various categories of vessel trajectories after the neural network is trained appropriately [42]. Overall, the resulting accuracy varies as a function of ship category, which entails a need to improve the modeling approach.
A so-called sequence-to-sequence recurrent neural network model has been developed to mesh and serialize a vessel trajectory into a neural network model to predict the main trajectory and arrival time [43]. An LSTM model was introduced to predict the ship's position by evaluating the probability distribution, and it provides relatively accurate results [28]. To improve the accuracy of the prediction mechanisms, a multiple azimuth autonomous device sensor has been used as an additional data input; however, this approach requires a large amount of AIS data and is thus computationally expensive [44]. Although large historical AIS datasets can be used as a reference to predict maritime trajectories, good data quality is often not guaranteed, and data redundancy is a major problem. At present, there are approximately 1600 AIS receivers on the coastlines of more than 150 countries and 65,000 ships sailing at a time [45]. Furthermore, abnormal data due to either environmental conditions or technical problems are likely to generate significant trajectory prediction errors [46], [9]. Finding the most appropriate balance between the use of large datasets (to train prediction mechanisms well) and the minimization of redundant and noisy data is still a crucial issue in practice [47].
The advantages of common statistical methods are that they use data that occupy less storage space during the calculation process, they can be used to realize short-term trajectory prediction, and their calculation method is relatively lightweight; these allow statistical methods to perform comparably with their deep learning counterparts. Their disadvantage is that the initial state of the model and violations of the assumptions of ideal conditions greatly affect the prediction results. Moreover, unlike deep learning, statistical methods cannot learn the effects of shallow reefs, islands, and other spatial factors on the trajectory of ships. This ability makes deep learning more practical in trajectory prediction. It also motivates our search for a trajectory prediction approach that accounts for the impact of redundant and noisy data on neural network training and that optimizes the input trajectory dataset to improve the final quality of the modeling approach. This led us to combine a neural network with a Bi-LSTM framework, which is described as follows.
Murray and Perera proposed a novel dual linear autoencoder approach for predicting the trajectory of a selected vessel [29]. Forti et al. explored neural sequence-to-sequence models based on the LSTM encoder-decoder architecture to effectively capture long-term temporal dependencies of sequential AIS data and increase the overall predictive power [48]. Although scholars have focused on long-term track prediction (within 5-30 min) to completely prevent the occurrence of short-range collisions, vessel trajectories are affected by factors such as terrain features, ocean currents, and wind direction; therefore, making accurate predictions is difficult. Advances in artificial intelligence have facilitated the development of intelligent transportation systems; the main aim of these systems is to improve the safety and efficiency of maritime traffic. Vessel collision avoidance is one of the most important issues in marine safety. Vessel collision avoidance involves controlling the direction in which the vessel moves and obtaining reliable track prediction. Therefore, track prediction should be accurate and instantaneous.
AIS data are used to understand the historical behavior of a vessel and thus predict its trajectory. An AIS stores numerous vessel parameters in a database. The sailing mode of a vessel is easier to determine through the AIS; therefore, a prediction model using AIS data can be optimized [19], [49]. In this study, by denoising and analyzing original AIS data, we could reduce the complexity of the input prediction model data and the calculation time, thereby increasing the prediction accuracy; thus, fast and accurate vessel trajectory prediction could be achieved and the efficiency and safety of maritime traffic could be increased. The prediction performance of the proposed model was compared with that of the ETS, ARIMA, SVR, RNN, and LSTM models; the advantages and disadvantages of each model were analyzed as well.
ETS is a data-averaging method that considers three factors: the error, trend, and season [50]. Moreover, the weight of ETS-weighted data decays exponentially. The weight of the latest data is the highest, with the weights decreasing with the age of the data. However, because its calculation is so simple, a considerable gap exists between the predicted value and the observation value in ETS. The ARIMA model predicts values by examining the differences between time-series values. The ARIMA model comprises three components: AR, integrated (I), and MA components. It also includes a total of three parameters: p, d, and q. To achieve accurate prediction results, the ARIMA model must refer to a large quantity of historical data to determine its optimal parameter combination and must determine the AR (p) and MA (q) parameters through the Akaike information criterion and Bayesian information criterion. In general, a statistical model cannot be used to solve nonlinear problems easily. As displayed in Fig. 9, the ETS and ARIMA models always exhibited the highest error values in this study.
SVR is a classic machine learning method that has been successfully used for bus passenger flow prediction [51], Covid-19 case prediction in India [52], and vessel trajectory analysis [53]. SVR has three hyperparameters: the regularization parameter (C), kernel function bandwidth (σ), and ε-insensitive loss function (ε). Any changes to these parameters would considerably affect the SVR prediction accuracy. However, the automatic adjustment of the three hyperparameters in SVR remains a challenge [54], [55]. The experimental results reveal that the MAE, RMSE, and MAPE of the SVR model were lower than those of the ETS and ARIMA models when the default hyperparameters were used.
Owing to the transient memory, an RNN is suitable for modeling time-series data. An RNN maintains an excitation parameter vector for each time step, especially when short-term correlations are included in the input data. However, if gradient descent is used to train an RNN, it becomes difficult for the RNN to learn the long-term dependence in the input sequence because of the gradient vanishing problem [51], [52]. Hochreiter and Schmidhuber developed the LSTM architecture [26]; in an LSTM network, a special structure and memory unit are adopted to maintain the forward and backward transmission between layers within a stable signal range in order to solve the problem of gradient vanishing and compensate for the inadequate long-term dependence of the RNN on vessel trajectory data feature extraction [24], [53].
An RNN requires fewer computations, neurons, and hidden layers than an LSTM network does; however, an RNN has a higher error rate. Although LSTM has an outstanding feature extraction ability, its unidirectional memory propagation is not sufficient to achieve the accuracy required for trajectory prediction tasks; therefore, LSTM was not the most ideal method for this study, and its error rate was similar to that of an RNN.
The relationship between past and future time points is crucial for solving the time-series problem. The Bi-LSTM model uses its special valve structure (gate) to control memory access; this gate allows the network to remember the characteristics of long series data and obtain a model for the relationship between future datapoints and past datapoints through the bidirectional design, strengthen the original time series, and make similar predictions for continuous data. Therefore, the Bi-LSTM model can outperform an LSTM network in solving nonlinear problems and can more effectively fit a dataset; hence, it can be used in sequence analysis and provide more accurate predictions.

III. METHOD
In this study, we applied data denoising along with a Bi-LSTM model to predict vessel trajectories. The flowchart of the study method is illustrated in Fig. 1. First, data on the vessel trajectory, speed, heading, and other features were collected. Subsequently, the data were cleaned using trajectory separation, outlier deletion, and data standardization, after which the Bi-LSTM model was employed to predict vessel trajectories from the denoised AIS data. The vessel trajectory predicted by the Bi-LSTM model was then evaluated, and the results were compared with those of other algorithms. Finally, the predicted trajectory was compared with the original trajectory data to verify the prediction efficiency of the algorithm.

A. COLLECTION OF DATA ON VESSEL TRAJECTORY, SPEED, COURSE, AND OTHER FEATURES
Vessel track prediction was the main objective of this study. The AIS dataset was downloaded from a public database; the data included the Maritime Mobile Service Identity (MMSI), speed over ground (SOG), course over ground (COG), record time, and boat length. Finally, the difference in vessel trajectory prediction after data denoising is discussed.

B. DATA CLEANING THROUGH TRACK SEPARATION, OUTLIER DELETION, AND DATA STANDARDIZATION
Vessel trajectory data (AIS data) are typically stored in a database through data transmission and reception. Therefore, abnormal AIS data must be removed before vessel track analysis. This study enhanced the quality of the collected vessel trajectory data through data cleaning, data standardization, and deduplication.

1) TRACK SEPARATION
The raw AIS dataset contained numerous ship features along with the data of hundreds of ships. An MMSI can uniquely identify ships; therefore, it can be used to separate the AIS data samples of different ships. This helps us to improve the accuracy of neural network prediction and to establish different prediction models for different vessel trajectories. We can predict the trajectories more accurately under similar navigation modes [56].
The trajectory data of the same vessel are separated. In navigable waters, owing to the large number of ships and the constraints of the AIS working mechanism, network communication is blocked. For example, the data that should have been received 1 s ago is received after a delay of 1 s by the network. This 1 s is the time interval. Thus, the AIS cannot reserve or listen to idle time slots, causing the AIS information to be delayed and the trajectory data of a given ship to appear at larger intervals. Overall, a continuous vessel trajectory is separated based on the timestamp information of the AIS data.
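The separation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the record layout (dicts with "mmsi" and "t" keys), the function name `separate_tracks`, and the gap threshold of 1200 s used to start a new segment are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed gap threshold (seconds): a new sub-trajectory starts when two
# consecutive reports from the same vessel are further apart than this.
MAX_GAP_S = 1200


def separate_tracks(records, max_gap_s=MAX_GAP_S):
    """Group AIS records by MMSI, then split each group at large time gaps.

    `records` is an iterable of dicts with at least the keys "mmsi" and
    "t" (timestamp in seconds). Returns {mmsi: [segment, segment, ...]},
    where each segment is a time-ordered list of records.
    """
    by_vessel = defaultdict(list)
    for rec in records:
        by_vessel[rec["mmsi"]].append(rec)

    tracks = {}
    for mmsi, recs in by_vessel.items():
        recs.sort(key=lambda r: r["t"])
        segments = [[recs[0]]]
        for prev, cur in zip(recs, recs[1:]):
            if cur["t"] - prev["t"] > max_gap_s:
                segments.append([])  # gap too large: start a new segment
            segments[-1].append(cur)
        tracks[mmsi] = segments
    return tracks


sample = [
    {"mmsi": 111, "t": 0}, {"mmsi": 222, "t": 10},
    {"mmsi": 111, "t": 300}, {"mmsi": 111, "t": 5000},
]
tracks = separate_tracks(sample)
print({m: len(segs) for m, segs in tracks.items()})
```

In this toy input, vessel 111 is split into two segments because its last report arrives 4700 s after the previous one, whereas vessel 222 keeps a single segment.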
2) DATA CLEANING
This study was conducted off the island of Taiwan, at latitudes between 20° and 25° N and longitudes between 120° and 123° E. Raw AIS data collected from a single vessel were cleaned through the deletion of duplicate tracks, deletion of abnormal COG and SOG data, deletion of data with abnormal MMSI numbers, and standardization of the remaining data. As presented in Fig. 2, several factors can engender erroneous or noisy data. Changes in direction (latitude) are common among vessels; hence, to ensure the accuracy of captured data, the moving average (MA) method should be used for location data standardization during the cleaning process. Moreover, manual inspection of the original AIS data revealed abnormalities in the data regarding the speed of the vessels, with some of the speeds exceeding 30 knots. Such pieces of abnormal data were also deleted. To ensure vessel safety, vessel speeds should be standardized for specific navigation conditions. On the basis of the average navigation direction of vessels, the average navigation direction of vessels traveling in a region can be predicted. During data preprocessing, wandering or anchoring trajectories in the original dataset were eliminated. We set the minimum time interval of the trajectory to 1200 s because the AIS information receiving interval is generally specified to be 5-10 min and because an information interval higher than 20 min is used in the next stage of the navigation status [28]. Many noisy data points in the original data were eliminated. The original ship dataset contained data on many floating and anchored ships. The speed of these ships is affected by wind and ocean currents and is often less than 1 knot, which is an abnormal navigation state. The data required for the experiment were for ships that were sailing; accordingly, a smooth route was obtained along the motion trajectory, making each trajectory easier to analyze.
Because the time interval of the ship track point acquisition depends on the ship, the ship track can be regarded as a continuous time series. When outliers were encountered, the MA model was used to standardize the data series. Taking Fig. 2(a) as an example, when an abnormal SOG value suddenly appears, MA processing is performed on the two adjacent data points, and the obtained value replaces the original abnormal value [47].
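As a rough illustration of the speed-related cleaning steps, the sketch below replaces an abnormal SOG value with the mean of its two neighbours and discards near-stationary records. The 30-knot and 1-knot thresholds come from the text; the function names, the record layout, and the choice to replace rather than delete a spike are assumptions for this example.

```python
MAX_SOG_KN = 30.0  # speeds above this are treated as abnormal (from the text)
MIN_SOG_KN = 1.0   # below this the vessel is treated as drifting or anchored


def smooth_sog_outliers(sog, max_sog=MAX_SOG_KN):
    """Replace each abnormal SOG value by the mean of its valid neighbours."""
    cleaned = list(sog)
    for i, v in enumerate(sog):
        if v > max_sog:
            neighbours = [sog[j] for j in (i - 1, i + 1)
                          if 0 <= j < len(sog) and sog[j] <= max_sog]
            if neighbours:  # leave the value untouched if no valid neighbour
                cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


def drop_stationary(points, min_sog=MIN_SOG_KN):
    """Discard drifting or anchored records (SOG below min_sog)."""
    return [p for p in points if p["sog"] >= min_sog]


sog = [10.0, 12.0, 95.0, 14.0, 0.3]
print(smooth_sog_outliers(sog))  # [10.0, 12.0, 13.0, 14.0, 0.3]
print(drop_stationary([{"sog": 0.3}, {"sog": 8.0}]))  # [{'sog': 8.0}]
```

The 95-knot spike is replaced by 13.0, the average of the adjacent values 12.0 and 14.0, which mirrors the two-point MA replacement described for Fig. 2(a).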

1) LSTM NETWORK
The Bi-LSTM model was applied to model individual navigation features by using the denoised dataset. Consequently, the accuracy of vessel trajectory prediction could be increased. An LSTM network is a special RNN that is suitable for analyzing sequential data [57], [58]. In an LSTM network, a memory cell replaces the hidden layer function that is present in a traditional RNN. An LSTM unit usually comprises a memory cell, forget gate, input gate, and output gate. The LSTM unit is used to enhance the long-term memory ability of an LSTM model and to resolve the long-term dependence problem. Generally, an LSTM network consists of at least one unidirectional LSTM layer and one LSTM unit. This study combines two bidirectional LSTM layers and thus has four LSTM units. The formula for an LSTM network can be expressed as follows. At time $t$, $x_t$ is the input data of the LSTM unit, $h_t$ is the output of the LSTM unit, $h_{t-1}$ is the output of the LSTM unit at $t-1$, and $C_t$ is the value of the memory unit. The operating procedures of an LSTM network are outlined as follows:
1) The value of the forget gate $f_t$ is calculated. The forget gate controls the update of the historical data to the state value of the memory unit, where $W_f$ represents the weight matrix and $b_f$ represents the bias.
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \tag{1}$$
2) The value of the candidate memory unit $\tilde{C}_t$ is calculated, where $W_c$ represents the weight matrix and $b_c$ represents the bias.
$$\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \tag{2}$$
3) The value of the input gate $i_t$ is then calculated. The input gate controls the update of the current input data to the state value of the memory unit, where $\sigma$ represents the sigmoid function, $W_i$ represents the weight matrix, and $b_i$ represents the bias.
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \tag{3}$$
4) The value of the current memory unit $C_t$ is calculated; $C_{t-1}$ represents the state value of the previous LSTM unit.
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$
5) The value of the output gate $o_t$ is calculated. The output gate controls the output of the state value of the memory unit, where $W_o$ represents the weight matrix and $b_o$ represents the bias.
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \tag{5}$$
6) The output of the LSTM unit $h_t$ is calculated, where tanh is a nonlinear activation function. It converges the permissible amplitude range of the output signal to a finite value.
$$h_t = o_t \odot \tanh(C_t) \tag{6}$$
The tanh function is expressed as follows:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
The three control gates and memory cell of an LSTM unit facilitate the process of maintaining, resetting, and updating long-term information. Because of the weight-sharing mechanism in LSTM, the number of dimensions can be controlled by setting the weight matrix. In an LSTM unit, a long delay exists between forward and back propagation because the internal state of the memory cell in the LSTM structure maintains a constant data size, reducing the probability of gradient explosion and gradient vanishing.
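The six operations of a single LSTM unit can be traced in a small pure-Python sketch. It assumes the standard LSTM formulation in which each gate applies one weight matrix to the concatenation of the previous output and the current input; the helper names (`affine`, `lstm_step`), the parameter layout, and the all-zero toy weights are illustrative assumptions, not the trained model from this study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b for a list-of-rows matrix W and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the six operations listed above.

    `params` maps each of "f", "c", "i", "o" to a (W, b) pair, where W acts
    on the concatenation [h_prev, x_t].
    """
    z = h_prev + x_t                                         # [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]        # 1) forget gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # 2) candidate
    i = [sigmoid(a) for a in affine(*params["i"], z)]        # 3) input gate
    c = [ft * cp + it * ch                                   # 4) memory unit
         for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    o = [sigmoid(a) for a in affine(*params["o"], z)]        # 5) output gate
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]         # 6) unit output
    return h, c


# Toy setup: 4 input features (lat, lon, s, c), 2 hidden units, zero weights.
hidden, n_in = 2, 4
zero = ([[0.0] * (hidden + n_in) for _ in range(hidden)], [0.0] * hidden)
params = {k: zero for k in "fcio"}
h, c = lstm_step([0.1, 0.2, 0.3, 0.4], [0.0, 0.0], [1.0, -1.0], params)
print(h, c)
```

With zero weights every gate evaluates to 0.5 and the candidate to 0, so the memory unit is simply halved; this makes the data flow easy to check by hand before real weights are substituted.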

2) BI-LSTM NETWORK
In this study, we designed a Bi-LSTM model to predict the trajectory of vessels; this network model computes input vectors containing information about the past and future within a specific time range (Fig. 3). In the proposed method, a regular LSTM state neuron is divided into two parts: one part is responsible for the positive time direction (forward state) and the other is responsible for the opposite time direction (backward state), and the combined result is the output vector. The Bi-LSTM model includes three layers: an input layer, a hidden layer, and an output layer. Trajectory data (lat, lon, s, c) are input to the input layer, and the output layer outputs the vector (x, y). The hidden layer consists of an LSTM layer and a fully connected layer. The model training process includes forward propagation and backpropagation. In forward propagation, the prediction model between the input vector (lat, lon, s, c) and output vector (x, y) is established. First, the training samples (lat, lon, s, c) are divided into several groups according to the batch size and sent to the input layer. Subsequently, they are propagated through the Bi-LSTM layer according to (1)-(6) to obtain the output $h_t$. Finally, the prediction offset (xp, yp) is derived using the fully connected layer. During backpropagation, the prediction model is optimized by adjusting the weight parameters. The error between the actual vectors (x and y) and the predicted offsets (xp and yp) is calculated using the loss function (7). The weight parameters are adjusted by minimizing the loss value to improve the prediction accuracy of the model. In the loss function (7), t denotes the time step and n denotes the length of the test set; the loss is computed from the observation value and the predicted value at each time step. Overfitting occurs when the model learns the statistical noise in the training data, which results in poor performance of the model on new data (the test set).
Owing to overfitting, the generalization error also increases. Dropout is a regularization method that is similar to training a large number of neural networks with different structures in parallel. In the training process, some neuron outputs are ignored or discarded randomly, making the hidden layer appear similar to a new network structure with a different number of neurons to reduce overfitting.
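The bidirectional design in this subsection amounts to running one recurrent pass in time order, a second pass in reverse order, and joining the two states at every time step. The sketch below shows only that wiring. The cell is a deliberately trivial stand-in (a running sum, not an LSTM), and the names `run_direction` and `bidirectional` are hypothetical.

```python
def run_direction(step, xs, h0):
    """Apply `step(x, h) -> h` along the sequence and collect every state."""
    states, h = [], h0
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(step_fwd, step_bwd, xs, h0):
    """Concatenate forward and backward states at each time step."""
    fwd = run_direction(step_fwd, xs, h0)
    bwd = run_direction(step_bwd, xs[::-1], h0)[::-1]  # re-align to time order
    return [f + b for f, b in zip(fwd, bwd)]


# Toy stand-in cell (NOT an LSTM): a running sum of the first input feature.
def running_sum(x, h):
    return [h[0] + x[0]]


seq = [[1.0], [2.0], [3.0]]
out = bidirectional(running_sum, running_sum, seq, [0.0])
print(out)  # [[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]
```

At each time step the first component summarizes the past up to that point and the second summarizes the future from that point on; in a Bi-LSTM the same concatenated state would be passed to the fully connected layer.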

3) DROPOUT
During neural network training, the prevention of overfitting is crucial. Srivastava et al. proposed the dropout method to prevent overfitting in neural network training [59]. During the training process between layers in a neural network, the dropout method randomly drops some neurons with a certain probability, as displayed in Fig. 4.
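A minimal sketch of the dropout operation follows. It uses the "inverted dropout" variant that is common in practice, in which surviving activations are rescaled during training; the study does not state which variant or drop probability was used, so both the variant and the value 0.5 are assumptions.

```python
import random


def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    the survivors by 1 / (1 - p_drop) so the expected activation is unchanged.
    At inference time (training=False) the input is returned unchanged."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
hidden = [1.0] * 10
dropped = dropout(hidden, p_drop=0.5, rng=rng)
print(dropped)
print(dropout(hidden, p_drop=0.5, training=False))
```

Each unit in `dropped` is either 0.0 (discarded) or 2.0 (kept and rescaled), whereas the inference call leaves all ten activations at 1.0.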

D. EVALUATION OF THE ALGORITHM AND COMPARISON WITH OTHER MODELS
To increase the accuracy of vessel trajectory prediction and verify the performance of the data denoising process and the Bi-LSTM model, we compared the error rate of our vessel trajectory prediction algorithm with those of common time-series prediction methods in the literature, including exponential smoothing (ETS) [50], autoregressive (AR) integrated MA (ARIMA) [60], support vector regression (SVR) [61], RNN [62], and LSTM [58] algorithms.
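The comparison relies on the mean absolute error (MAE), mean absolute percentage error (MAPE), and root-mean-square error (RMSE). The sketch below gives their usual textbook definitions for one-dimensional series; the function names and the sample values are illustrative, and the MAPE helper assumes that no observed value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observations must be nonzero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


obs = [10.0, 20.0, 40.0]
pred = [12.0, 18.0, 40.0]
print(mae(obs, pred), mape(obs, pred), rmse(obs, pred))
```

For these sample values the MAE is 4/3, the MAPE is about 10%, and the RMSE is the square root of 8/3; lower values indicate a better fit for all three metrics.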

1) ETS ALGORITHM
The ETS algorithm proposed by Brown in 1956 is a classic time-series prediction method [50]. ETS predicts the future value of the time series by using the weighted average of past observations of the time series. This method gives decreasing weights to past observations and higher weights to more recent observations. This framework enables reliable estimates to be produced quickly in most applications. The simple exponential smoothing model was first extended to seasonality components by Winters in 1960 [63], after which Holt used it to determine trends [64]. In these models, the trend components can be multiplicative (M), multiplicative damped (MD), additive (A), additive damped (AD), or absent (N), and the seasonality components can be multiplicative (M), additive (A), or absent (N). The simple ETS formula, written here in its standard form, is as follows:
$$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\hat{y}_t$$
where $y_t$ is the observation at time $t$, $\hat{y}_t$ is the forecast for time $t$, and $\alpha$ ($0 \le \alpha \le 1$) is the smoothing parameter.
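Simple exponential smoothing, the base case of the ETS family, can be written in a few lines: each new forecast is a weighted average of the latest observation and the previous forecast. The function name, the initialization of the first forecast to the first observation, and the smoothing value 0.5 are assumptions for this example.

```python
def simple_exp_smoothing(series, alpha):
    """Return one-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the forecast for series[t]; the final element is the
    forecast for the first unseen time step. The first forecast is
    initialised to the first observation.
    """
    forecasts = [series[0]]
    for y in series:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


print(simple_exp_smoothing([10.0, 12.0, 14.0], alpha=0.5))
# [10.0, 10.0, 11.0, 12.5]
```

The forecasts lag behind the rising series, which illustrates why a smoothing model without a trend component can leave a gap between predicted and observed values.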

2) ARIMA MODEL
Autoregression is a statistical method used to analyze the relationship between a single variable and a group of independent variables, which in this case are the variable's own past values. It is usually employed for time-series forecasting. However, autoregression can only be used when a model can be fitted to the extracted time-series data. Therefore, to obtain favorable experimental results, a large quantity of data must be collected. The ARIMA model was proposed by Box and Jenkins in 1976; this model is also known as the Box-Jenkins model [60]. The ARIMA model can be fitted to a time series to better understand the series and predict its future values; it is a simple linear method. The ARIMA model combines AR and MA components, which are applied to a differenced (integrated) series; the orders of the AR, differencing, and MA parts are denoted by p, d, and q, respectively.
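
To make the AR and differencing steps concrete, the sketch below fits the simplest nontrivial case, ARIMA(1,1,0) without a constant, in plain Python. It is a deliberately reduced illustration: the coefficient is estimated by least squares on the differenced series rather than by the maximum-likelihood procedures that statistical packages typically use, no MA term is included, and the function names are hypothetical.

```python
def fit_arima_110(series):
    """Estimate the AR(1) coefficient phi on the first-differenced series."""
    diffs = [b - a for a, b in zip(series, series[1:])]  # d = 1: difference once
    if len(diffs) < 2:
        raise ValueError("need at least three observations")
    # Least-squares fit of diffs[t] = phi * diffs[t - 1] (no intercept).
    numerator = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    if denominator == 0:
        raise ValueError("differenced series has no variation to fit")
    return numerator / denominator


def forecast_next(series, phi):
    """One-step forecast: last value plus the predicted next difference."""
    last_diff = series[-1] - series[-2]
    return series[-1] + phi * last_diff


# Hypothetical positions whose increments halve at every step.
positions = [0.0, 8.0, 12.0, 14.0, 15.0]
phi = fit_arima_110(positions)
prediction = forecast_next(positions, phi)
```

Differencing removes the trend so that the AR part is fitted on a series that is closer to stationary, which is the role of the d order.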

3) SVR MODEL
On the basis of the structural risk minimization principle proposed by Vapnik, the ε-insensitive loss function was derived; subsequently, an SVR model was developed to solve nonlinear problems, especially time-series prediction problems. SVR was proposed by Vapnik et al. in 1997 [61] and has been used in many forecasting tasks such as short-term load forecasting [65] and monthly rainfall forecasting [66]. To obtain good forecasting performance, all three hyperparameters of the SVR model (C, ε, and the kernel parameter σ) must be determined. These hyperparameters are usually determined through data resampling, which is computationally time consuming. Thus, an efficient approach to simultaneously determine all parameters is necessary.
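
The roles of the three hyperparameters can be illustrated with a short Python sketch. The Gaussian (RBF) form of the kernel is an assumption here, because the text only states that σ is a kernel parameter; the function names are also hypothetical.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, sigma):
    """Gaussian kernel (assumed form): similarity decays with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def svr_objective(weight_norm_squared, losses, C):
    """Regularised risk: C trades model flatness against the training error."""
    return 0.5 * weight_norm_squared + C * sum(losses)


# An error of 0.05 lies inside the tube of width 0.1, so it is not penalised.
inside_tube = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)
outside_tube = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)
```

Because C, ε, and σ interact, they are normally tuned jointly, which is why the search is computationally expensive.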

4) RNN MODEL
The RNN is based on Rumelhart's work in 1986, the aim of which was to extract long-term dependency in sequential data [67], [68]. An RNN has a unique memory unit that allows it to be used in short-sequence prediction; however, in practical applications, the length of the problem sequence is not known in advance, which may lead to vanishing or exploding gradients during the learning process. Therefore, the practicability of a simple RNN is limited. However, variants of RNNs exist, such as LSTM [26] and gated recurrent unit (GRU) [69] models. A simple RNN has only one internal memory unit $h_t$, which is represented by (16):
$$h_t = f(U x_t + W h_{t-1} + b) \tag{16}$$
where $f$ is the activation function, $U$ and $W$ are the weight matrices of the hidden layer, $b$ is the bias, and $x_t$ is the input vector at time $t$ [70].
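
The recurrence of a simple RNN can be written out directly in plain Python. This sketch assumes that U multiplies the input vector, W multiplies the previous hidden state, tanh is the activation function, and the initial hidden state is zero; none of these choices are specified in the text, and the helper names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, U, W, b, f=math.tanh):
    """Compute the new hidden state from the input and the previous state."""
    input_part = matvec(U, x_t)
    memory_part = matvec(W, h_prev)
    return [f(u + w + bias) for u, w, bias in zip(input_part, memory_part, b)]


def rnn_forward(inputs, U, W, b, f=math.tanh):
    """Run the recurrence over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, U, W, b, f)
        states.append(h)
    return states


# Hypothetical two-feature inputs (e.g. scaled longitude and latitude), one hidden unit.
U = [[1.0, 0.0]]
W = [[0.5]]
b = [0.0]
states = rnn_forward([[1.0, 2.0], [0.0, 0.0]], U, W, b)
```

The same weights are reused at every time step, so the gradient passes through W repeatedly during training, which is the source of the vanishing and exploding gradients mentioned above.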

E. CRITERIA
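The mean absolute error (MAE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE) are typically used to quantify the deviations of predicted values from observed values when evaluating the prediction performance of models. These metrics were used in this study to evaluate the performance of the proposed method. Specifically, assuming $y_t$ is the ground truth and $\hat{y}_t$ is the predicted value at time $t$, and $n$ is the number of predicted points, they are defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$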
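
The three criteria follow their standard definitions and can be computed with a few lines of plain Python. The function names and the sample values below are illustrative assumptions and are not taken from the experiments.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error; penalises large deviations more than MAE."""
    _check(y_true, y_pred)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any true value is 0."""
    _check(y_true, y_pred)
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted values.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
scores = (mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

MAE and RMSE are expressed in the units of the data, whereas MAPE is scale free, which makes it convenient for comparing vessels whose coordinates differ in magnitude.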

A. DATA COLLECTION
To verify the efficiency and effectiveness of the Bi-LSTM model, we recorded data from more than 300 vessels in the waters around Taiwan over a specific period (Fig. 5). The vessel trajectories were randomly selected from the AIS dataset between 00:00:00 on July 6, 2019, and 03:00:00 on July 6, 2019. The MMSI numbers of the vessels were 24955800, 229069000, 357402000, 305097000, 249290000, 271045019, 351296000, 671115100, 667001698, and 636014592. Because of equipment errors and signal drift, the longitude, latitude, direction, and vessel speed information in the AIS dataset fluctuates considerably.
To ensure data availability, the data had to be preprocessed in this study. The transcoded and preprocessed data are presented in Fig. 6. For trajectory training, the data of 1,364 vessels with high activity frequency were selected from the dataset, and 1-month navigation track data were obtained; a total of 1,048,575 track points were obtained. In Fig. 6, Lon represents longitude, Lat represents latitude, and Record_time represents the track point recording time. Table I lists the azimuths corresponding to the COG value in the AIS data, and Table II lists the speed status corresponding to the SOG value in the AIS data.

B. NETWORK PARAMETER SETTINGS
In this study, Keras was used to develop the Bi-LSTM model, and the adaptive moment estimation (Adam) algorithm, which optimizes stochastic objective functions, was used to update the parameters in the training process. Adam dynamically adjusts the learning rate of each parameter according to the loss function (MAE, RMSE, and (7)) and the first-order and second-order moment estimates of the gradient of each parameter. This algorithm is based on the gradient descent algorithm [71], but the learning step of each parameter is kept within a specific range; a larger gradient therefore does not lead to an excessively large learning step, which ensures relatively stable parameter values. The learning rate was set to 0.0001, and the decay_rate, which specifies the decay applied to the learning rate after each update of the parameters, was set to 0.5. In addition, to prevent model overfitting, the dropout algorithm was employed. During the training process, the weights of randomly selected neurons were reset to 0, rendering those neurons ineffective. Setting the dropout rate to 0.1 would signify a 10% dropout probability for each neuron. Because the training dataset was relatively small, we employed a batch size of 1000 in this experiment. We used 80% of the dataset as the training data and the remaining 20% as the test data.
Table III presents a comparison of the vessel trajectory prediction results of the proposed Bi-LSTM model and those of five other models: the ETS, ARIMA, SVR, RNN, and LSTM models. In this study, the trajectories of ten vessels (vessels 1-10) randomly selected from the AIS data were predicted. The MAE, MAPE, and RMSE were used as the criteria for model validation. The experimental results revealed that among the models, the Bi-LSTM model had the lowest prediction error for vessels 1-10.
The MAPE of the Bi-LSTM model was lower than those of the ETS, ARIMA, SVR, RNN, and LSTM models by 75.8%, 74.5%, 66.9%, 46.0%, and 44.8%, respectively. The Wilcoxon signed-rank test was used to test for statistical significance, and the results indicated that the prediction performance of the Bi-LSTM model was significantly better than that of the other models.
To better understand the effectiveness of track prediction, we can consider the results for vessel 1 as an example. As illustrated in Fig. 7, the predicted values (red dots) of ETS and ARIMA deviated considerably from the observed values (blue dots); this can be attributed to the choice of the smoothing coefficient of ETS and the difficulty of optimizing the ARIMA hyperparameters p, d, and q. Moreover, statistical models require considerable preprocessing before the data are input into the model to ensure that the input sequence is stationary, which is necessary for accurate prediction. The prediction performance of the SVR model was superior to that of the statistical models; however, the hyperparameter optimization process remains a challenge. LSTM is a network structure that optimizes the RNN, and the prediction accuracy of the LSTM network was therefore expected to be higher than that of the RNN. However, as presented in Table III, we did not observe a considerable difference between the prediction accuracies of the LSTM and RNN models. The proposed Bi-LSTM model exhibited a considerable improvement in prediction performance. This superior performance can be attributed to the bidirectional structure, which strengthens the feature extraction process in a sequence.
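
The Adam update described in the network parameter settings can be sketched for a single scalar parameter in plain Python. Only the learning rate of 0.0001 comes from the text; the values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumptions here, and the learning-rate decay is omitted because the text does not state exactly how decay_rate is applied.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter and moment estimates.

    t is the 1-based index of the update and is used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-order moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad * grad   # second-order moment (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# The first step has nearly the same size for a small and a very large gradient.
small, _, _ = adam_step(0.0, grad=10.0, m=0.0, v=0.0, t=1)
large, _, _ = adam_step(0.0, grad=1000.0, m=0.0, v=0.0, t=1)
```

Dividing by the square root of the second-moment estimate normalises the step, which is why a larger gradient does not produce an excessively large learning step.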

C. RESULTS OF THE PREDICTION OF MULTIPLE VESSEL TRAJECTORIES
In the present study, Vessels 1 to 6 were randomly selected for trajectory prediction. Because of the inertia of heavy vessels, complex trajectories are difficult to find among heavy vessels sailing in the ocean. Therefore, to verify the effectiveness of the proposed method on such trajectories, Vessels 7 to 10, whose trajectories included turns, were added. The experimental results revealed that accurate trajectory predictions were more difficult to achieve with the other methods (MAPE > 10). The proposed Bi-LSTM yielded favorable prediction results for the ten vessels (including those with complex trajectories), and a significance test revealed a significant difference between the results of the proposed method and those of the other methods (Fig. 8).

D. STATISTICAL DESCRIPTION
To verify the advantages of the proposed Bi-LSTM model, a Wilcoxon signed-rank test was used to compare the MAPE and RMSE values obtained for the ETS, ARIMA, SVR, RNN, and LSTM models with those obtained for the Bi-LSTM model. The Wilcoxon signed-rank test is a nonparametric statistical hypothesis test used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e., it is a paired difference test). All statistical analyses were conducted using Python. As presented in Table III, the average MAPE of the Bi-LSTM model was lower than those of the ETS, ARIMA, SVR, RNN, and LSTM models by 75.8%, 74.5%, 66.9%, 46.0%, and 44.8%, respectively, and the differences were statistically significant (p < 0.05). Fig. 9 shows the predicted track values generated by the Bi-LSTM, LSTM, RNN, SVR, ARIMA, and ETS models during the prediction period. The average RMSE of the Bi-LSTM model was lower than those of the LSTM, RNN, SVR, ARIMA, and ETS models by 32.4%, 35.2%, 51.2%, 68.8%, and 70.4%, respectively, and the average MAE of the Bi-LSTM model was lower than those of the LSTM, RNN, SVR, ARIMA, and ETS models by 31%, 34%, 57%, 62%, and 69%, respectively. Accordingly, the Bi-LSTM model was superior to the ETS, ARIMA, SVR, RNN, and LSTM models on all three criteria; the experimental results indicate that the Bi-LSTM model can provide more robust predictions with lower error rates than the other models can.
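
For a sample as small as ten vessels, the signed-rank test can be computed exactly by enumerating every possible sign assignment. The following plain-Python sketch does this for paired error values; it is not the authors' analysis code, the function names are hypothetical, and the MAPE values are made-up placeholders rather than entries of Table III. In practice, a statistics library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(a, b):
    """Return (statistic, exact two-sided p-value) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, sum(ranks) - w_plus)
    total = sum(ranks)
    extreme = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** len(ranks)


# Placeholder MAPE values for ten vessels (illustrative only).
baseline_mape = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
proposed_mape = [1.0] * 10
statistic, p_value = wilcoxon_signed_rank(baseline_mape, proposed_mape)
```

When the proposed model has the lower error for every one of the ten vessels, only 2 of the 1,024 sign patterns are as extreme as the observed one, so the exact two-sided p-value falls below the 0.05 threshold.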

E. STUDY OBJECTIVE
Accurate vessel route prediction is crucial for maritime traffic control and management. In addition to collision avoidance, vessel route prediction can help in planning routes in advance, shortening the sailing distance, and increasing navigation efficiency. In this study, we proposed a new AIS data denoising and prediction method. The main contributions of this study are outlined as follows: (1) cleaning of the original AIS data through track separation, outlier deletion, and data standardization; (2) development of a Bi-LSTM model for accurately predicting vessel trajectories by using denoised AIS data; and (3) verification of the performance of the Bi-LSTM model by using the trajectory data of ten vessels and by comparing its performance with those of other common prediction models in the literature. Moreover, the study findings can help vessel traffic controllers to predict the accurate track of vessels, which can enable them to take preventive measures to avoid collisions and improve the efficiency and safety of maritime traffic.
Notably, the method proposed in this study focuses on the trajectories of large ships moving across the ocean. Because of a large ship's weight and inertia, its trajectory is relatively straightforward and easy to predict. When the method is applied to small- and medium-sized ships, the high mobility and high speed of small ships must be considered, and during the training of the model, more attention must be paid to the effects of COG and SOG changes on trajectory prediction. The deep learning prediction model is also a key method for achieving improvements. It improves not only the prediction accuracy but also the operation speed, both of which contribute considerably to reducing maritime collisions. Moreover, the proposed models exhibit applicability not just for vessel trajectory prediction but also for other applications (e.g., prediction of hurricane, human, and animal movement trajectories). Future studies should include contextual information regarding similarity measures and location predictions of trajectories. Environmental factors such as wind, wave, and weather conditions have considerable effects on the movement of vessels and may cause them to deviate from their normal paths.
In this study, a method for predicting vessel trajectory using AIS data denoising and a Bi-LSTM model was proposed. The model can accurately predict the trajectory sequence and be used for short-term prediction. We used data on the trajectories of ten vessels to evaluate our model. Our model was found to outperform the ETS, ARIMA, SVR, RNN, and LSTM models, demonstrating its suitability for vessel trajectory prediction. Through data denoising, the proposed method encodes the data, deletes outliers, standardizes the data, and effectively extracts features of historical vessel tracks. Furthermore, the proposed model can apply the extracted features to vessel track point prediction, which helps considerably reduce errors in vessel trajectory prediction. Thus, the study findings can help maritime traffic users predict the accurate trajectory of vessels, which can enable them to take preventive measures to avoid collisions and improve the efficiency and safety of maritime traffic.