Long Short-Term Memory Recurrent Neural Network for Tidal Level Forecasting

Tide is a phenomenon of water level change caused by gravity. Tidal level forecasting is not only a key theoretical topic but also crucial in coastal and ocean engineering applications. The waiting time before a cargo ship enters a port affects the efficiency of cargo transportation, the tidal range affects the installation of turbine generators, and an excessively high tidal water level reduces vessel safety. With the proliferation of information technology, deep learning models have become increasingly common in the analysis of hydrological problems. This study proposes a deep learning model to predict the tidal water level. A forecasting model based on the long short-term memory (LSTM) recurrent neural network was developed to predict the water levels of 17 harbors in Taiwan, using 21 years of tidal water level data collected from different observation stations. To evaluate model performance objectively, the developed model was compared with six other forecasting models in terms of the mean absolute percentage error (MAPE) and root mean square error (RMSE) of the forecasting results. The results indicated that the LSTM model had the lowest forecasting error for tidal water level forecasts of up to 30 days. The average MAPE and RMSE values of the developed model were 6.97% and 0.049 m, respectively; thus, the model can effectively reduce the error superposition (overlapping) problem that affects machine learning methods in continuous forecasting.


I. INTRODUCTION
Tide is mainly affected by celestial gravity, climate, and air pressure. Therefore, tide is a relatively regular phenomenon in which the water level rises and falls. Tidal changes are closely related to human activities, such as marine economic activities, port development, research plans for coastal and port construction projects, and budget control, which are crucial for economic development [1]. Therefore, tide forecasting requires high-precision tools and methods. In shallow waterways, tidal changes limit the times at which large ships can enter and exit a port; during ebb tide, ships cannot enter certain ports with large changes in tidal level. According to statistics from Taiwan International Ports Corporation, Ltd. [2], from January to November 2019, 71 841 ships entered or exited Taiwanese ports, with a total tonnage of 1 492 676 527 tons. Moreover, cargo throughput reached 174 437 692 metric tons, with a trade value of approximately NT$286 778 285. Cargo transported by ships significantly influences Taiwan's economy. Tidal changes should also be considered when a ship is at a dock: the tightness of the mooring lines must be adjusted to avoid collisions or grounding caused by line breakage under excessive tidal changes. Accurate tide forecasts are critical to the safety of ships entering and leaving ports as well as to the efficiency of port transportation.
The associate editor coordinating the review of this manuscript and approving it for publication was Qiang Lai.
Although tidal water levels change periodically, nonlinear fluctuations in water level occur at a port according to changes in terrain, pressure, time, and the position of the moon. Therefore, accurately predicting water levels is difficult. Tidal level data depend not only on the current time but also on previous observations. Harmonic analysis, proposed by Godin in 1972, is a crucial method for predicting tidal water level [3], and it has been improved over time [4]. The response method [5] and the continuous wavelet transform method [6] have also been applied in tidal water level prediction. After 2005, with the rise of deep learning, artificial neural networks (ANNs) have also been applied in tidal water level prediction [7], [8].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Because ANNs have excellent capabilities for processing nonlinear problems, they have gradually replaced other methods of tidal water level prediction. Neural networks, which originated with the perceptron neuron model proposed in the 1950s, mimic the structure and operation of the human brain [9]. Prior to the 1980s, the expert system was the most popular artificial intelligence method, and neural network theory was still immature. In the 1980s, expert systems encountered bottlenecks, and neural networks have received increasing attention since the work of Hopfield et al. [10]. New neural network structures and theories are still being developed, and computation speeds are constantly increasing; thus, the accuracy and use of these networks are increasing. Deep neural networks (DNNs) have been widely used over the past 10 years [11] in fields such as speech recognition [12], image recognition and classification [13], overall survival prediction [14], and time series prediction [15]. To address time series problems, DNNs were extended into recurrent neural networks (RNNs) and long short-term memory (LSTM) networks [16], [17]. In addition to reducing the prediction error, these networks addressed the overfitting-related problems of DNNs.
In tide analysis, a tide can be regarded as the superposition of the astronomical tide, driven by gravity, and nonlinear water level variations driven by environmental factors. Harmonic analysis was previously the main method used to predict tidal level [18]; however, its disadvantage is that it requires long-term data to achieve satisfactory accuracy. Long-term tidal prediction is influenced by phenomena such as noise, seasonal effects, missing data, and typhoon-induced surge [19], [20]. Although many neural network structures have been applied to this problem, most share a common limitation [21]-[23]: the powerful learning ability of neural networks makes them prone to overfitting during training. Consequently, the trained models are suitable only for the current data. Moreover, they require numerous hyperparameter adjustments, which considerably increase the time required for training.
To solve the aforementioned problems, this study proposes a network structure that combines LSTM layers with fully connected layers. An LSTM recurrent neural network contains one control unit (gate) each for forgetting, updating, and output [24]; thus, its structure differs from that of a general neural network. Information can be stored in the LSTM recurrent neural network over the long term, whereas meaningless information is forgotten, which mitigates the vanishing or exploding gradients that arise during gradient descent training. The LSTM recurrent neural network thereby solves the problem of error superposition that arises when a neural network is applied to long-term tide prediction.
Error superposition leads to large misalignments in the predicted results. Given the broad application of DNNs and the satisfactory results they have achieved in time series prediction, the tidal water level prediction performance of the LSTM was compared with that of six other methods commonly used in time series prediction.

II. METHODOLOGY
A. NORMALIZATION
Because the transfer function of the LSTM recurrent neural network is the hyperbolic tangent function, $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, whose value ranges from -1 to 1, the training and testing set data are normalized. The normalization equation is as follows:
$$z = \frac{x - \min(x)}{\max(x) - \min(x)}$$
where max(x) and min(x) are the maximum and minimum values of the data in the training and testing sets, x is the input value, and z is the converted value of x.
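As a minimal NumPy sketch (the function names are hypothetical), the min-max scaling z = (x - min(x)) / (max(x) - min(x)) and its inverse, which is needed to turn normalized forecasts back into metres, could look like this. The sample levels are made up.

```python
import numpy as np

def min_max_normalize(x, x_min=None, x_max=None):
    """Scale x with z = (x - min) / (max - min).

    If x_min/x_max are not given they are taken from x itself; pass the
    bounds explicitly to apply the same scaling to another data split.
    """
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return (x - x_min) / (x_max - x_min), x_min, x_max

def min_max_denormalize(z, x_min, x_max):
    """Map normalized forecasts back to water levels in metres."""
    return np.asarray(z, dtype=float) * (x_max - x_min) + x_min

levels = np.array([-1.2, 0.0, 0.8, 2.3])   # hypothetical tidal levels (m)
z, x_lo, x_hi = min_max_normalize(levels)
print(z)            # every value lies in [0, 1]
print(np.tanh(z))   # tanh keeps them inside (-1, 1)
```

Passing the bounds explicitly lets the same min and max be applied to both the training and testing sets, as the text describes.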

B. LSTM
LSTM networks are a variant of RNNs that have been applied in fields such as biomedical science [25], speech recognition [26], sentiment analysis [27], and image classification [28]. However, LSTM recurrent neural networks have not yet been applied in tidal water level forecasting.
The LSTM recurrent neural network provides a solution to the time series problem. Although the backpropagation neural network can establish correlations between data points through feedback, LSTM is more complete because of the structure of its units and the forget gates added to the system. These gates retain crucial past features during training and select unimportant features to forget according to the learned weights.
One-step-ahead prediction of the tidal water level time series requires not only current tidal data but also previous data. The RNN model has a self-feedback mechanism in the hidden layer, which in principle gives it an advantage in managing long-term dependence problems; however, difficulties remain in practical application [29]. To solve the vanishing gradient problem of the RNN, Hochreiter and Schmidhuber proposed the LSTM model in 1997 [24], and the model was later improved by Graves [30]. An LSTM unit consists of a memory cell that stores information and is updated by three specialized gates: the input, forget, and output gates. The structure of an LSTM unit is displayed in Fig. 1.
At time t, $x_t$ is the input data of the LSTM cell, $h_{t-1}$ is the output of the LSTM cell at time t - 1, $c_t$ is the value of the memory cell, and $h_t$ is the output of the LSTM cell. The calculation process of the LSTM unit comprises the following steps:
(1) First, the value of the candidate memory cell $\tilde{c}_t$ is calculated using (1), in which $[h_{t-1}, x_t]$ denotes the concatenation of the previous output and the current input, $W_c$ is the weight matrix, and $b_c$ is the bias.
$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \qquad (1)$$
(2) The value of the input gate $i_t$ is then calculated using (2). The input gate controls how the current input data update the state value of the memory cell. In (2), $\sigma$ is the sigmoid function, $W_i$ is the weight matrix, and $b_i$ is the bias.
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \qquad (2)$$
(3) The value of the forget gate $f_t$ is calculated using (3). The forget gate controls how the historical data update the state value of the memory cell. In (3), $W_f$ is the weight matrix and $b_f$ is the bias.
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \qquad (3)$$
(4) The value of the current memory cell $c_t$ is calculated using (4), in which $c_{t-1}$ is the state value of the previous LSTM unit and $\odot$ denotes element-wise multiplication.
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (4)$$
(5) The value of the output gate $o_t$ is calculated using (5). The output gate controls the output of the state value of the memory cell. In (5), $W_o$ is the weight matrix and $b_o$ is the bias.
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \qquad (5)$$
(6) Finally, the output of the LSTM unit $h_t$ is calculated using (6).
$$h_t = o_t \odot \tanh(c_t) \qquad (6)$$
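The six calculation steps above can be sketched in NumPy for a single time step. This assumes the common formulation in which each gate's weight matrix multiplies the concatenation [h_{t-1}, x_t] and gate-state products are element-wise; the layer sizes, random weights, and input levels are illustrative only, not the trained model described in this study.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W and b are dicts keyed by 'c', 'i', 'f', 'o'; each W[k] has shape
    (hidden, hidden + input) and acts on the concatenation [h_{t-1}, x_t].
    """
    v = np.concatenate([h_prev, x_t])
    c_tilde = np.tanh(W["c"] @ v + b["c"])   # step 1: candidate memory cell
    i_t = sigmoid(W["i"] @ v + b["i"])       # step 2: input gate
    f_t = sigmoid(W["f"] @ v + b["f"])       # step 3: forget gate
    c_t = f_t * c_prev + i_t * c_tilde       # step 4: memory cell update
    o_t = sigmoid(W["o"] @ v + b["o"])       # step 5: output gate
    h_t = o_t * np.tanh(c_t)                 # step 6: cell output
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "cifo"}
b = {k: np.zeros(n_hidden) for k in "cifo"}

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for level in [0.31, 0.45, 0.62, 0.58]:  # hypothetical normalized tidal levels
    h, c = lstm_step(np.array([level]), h, c, W, b)
print(h)
```

Because the output gate lies in (0, 1) and tanh in (-1, 1), every component of h stays strictly inside (-1, 1).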
The three control gates and memory cell of the LSTM unit allow it to easily maintain, read, reset, and update long-term information. Because the LSTM internal parameters are shared across time steps, the dimensions of the output can be controlled by setting the dimensions of the weight matrices. Even when a long delay occurs between an input and its error feedback, the gradient neither explodes nor vanishes, because the internal state of the memory cell in the LSTM architecture maintains a constant error flow.

C. DROPOUT
During neural network training, preventing overfitting is crucial. Hinton proposed the dropout method for this purpose [31]: during training, some neurons in each layer are randomly dropped from the network with a certain probability, as displayed in Fig. 2.
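A minimal sketch of dropout applied to a layer's activations follows. It uses the "inverted dropout" variant found in common frameworks, which rescales the surviving units by 1/(1 - p) during training instead of scaling the weights at test time as in the original formulation; the two are equivalent in expectation. The drop probability of 0.2 is an arbitrary example, not a value reported in this study.

```python
import numpy as np

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training
    and rescale the survivors by 1 / (1 - p_drop) so the expected activation
    is unchanged. At inference time the activations pass through untouched."""
    if not training or p_drop == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= p_drop
    return activations * keep_mask / (1.0 - p_drop)

rng = np.random.default_rng(42)
a = np.ones(10_000)
out = dropout(a, p_drop=0.2, rng=rng)
print((out == 0).mean())  # fraction of dropped units, roughly 0.2
print(out.mean())         # close to the original mean of 1.0
```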

D. PROPOSED METHOD
To establish a tidal water level forecasting model, this study used two layers of LSTM units. Because the output of the LSTM unit is a multidimensional vector, three fully connected layers are used in each unit. The predicted value is output at the sixth layer. The complete network model structure is displayed in Fig. 3. Neural network hyperparameters govern how the network functions and thus determine its accuracy and validity. To perform well in each problem domain, the LSTM hyperparameters must be tuned. The hyperparameters of the LSTM include the number of hidden layers, number of neurons, learning rate, activation function, batch size, number of epochs, and loss function. In this study, experts tuned the LSTM hyperparameters manually according to how each value affected model performance. The following hyperparameter settings were obtained: the learning rate is 0.0001, and there are [...] To extract features (Fig. 4), layer 3 has 50 neurons, and the output layer has only one neuron, whose output is the predicted value $y_t$. All activation functions are rectified linear units (ReLUs) because, in addition to being simple to compute, ReLUs allow gradient descent and backpropagation to proceed efficiently, thereby helping to prevent exploding and vanishing gradients.

E. STATISTICAL MODELS
The autoregressive integrated moving average (ARIMA) model was proposed by Box and Jenkins in 1976 [32] and is also known as the Box-Jenkins model. In the ARIMA method, past observations of the time series (differenced if the series is nonstationary) are used as input, and regression analysis is performed to establish a mathematical forecasting model; the method is often used to predict short-term economic trends. The trigonometric seasonality, Box-Cox transformation, ARMA errors, trend, and seasonal components (TBATS) model, proposed by De Livera et al. in 2011 [33], is a newer method that combines these five components. The TBATS model is based on exponential smoothing; it can determine whether seasonality exists in the data and model it. Although a combination of multiple models can provide highly accurate results, such a combination requires considerable training time, which results in slow calculations.
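To make the regression step concrete, the sketch below fits an ARIMA(p, 1, 0) model (one differencing step plus an autoregressive model on the differences) by ordinary least squares. It is a simplified illustration: it omits the moving-average terms, maximum-likelihood estimation, and order selection of the full Box-Jenkins procedure, for which a library such as statsmodels would normally be used. The function names and the synthetic series are hypothetical.

```python
import numpy as np

def fit_arima_p10(y, p):
    """Fit ARIMA(p, 1, 0) by conditional least squares.

    The series is differenced once (the 'integrated' part), then dy[t] is
    regressed on an intercept and its p previous values dy[t-1], ..., dy[t-p].
    Returns (intercept, ar_coefficients) with ar_coefficients[k-1] for lag k.
    """
    dy = np.diff(np.asarray(y, dtype=float))
    n = len(dy)
    lag_columns = [dy[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_columns)
    coef, *_ = np.linalg.lstsq(X, dy[p:], rcond=None)
    return coef[0], coef[1:]

def forecast_arima_p10(y, intercept, phi, steps):
    """Recursive multi-step forecast: predict the next difference, then integrate."""
    history = [float(v) for v in y]
    dy = list(np.diff(history))
    p = len(phi)
    for _ in range(steps):
        lags = dy[-1:-p - 1:-1]               # [dy[t-1], ..., dy[t-p]]
        d_next = intercept + float(np.dot(phi, lags))
        dy.append(d_next)
        history.append(history[-1] + d_next)
    return np.array(history[-steps:])

# Hypothetical series whose first differences follow an AR(1) with coefficient 0.5.
rng = np.random.default_rng(1)
d = np.zeros(2000)
for t in range(1, 2000):
    d[t] = 0.5 * d[t - 1] + rng.normal(scale=0.1)
y = np.cumsum(d)

c, phi = fit_arima_p10(y, p=1)
print(phi)                                    # estimate near 0.5
print(forecast_arima_p10(y, c, phi, steps=5))
```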

F. MACHINE LEARNING
1) SUPPORT VECTOR REGRESSION
Support vector regression (SVR) was proposed by Vapnik et al. in 1997 [34]. The SVR algorithm incorporates the ε-insensitive loss function and a penalty factor; thus, it is more robust than the support vector machine algorithm [35], [36]. SVR maps the data into a high-dimensional feature space and seeks the hyperplane for which the total deviation of the points lying outside an ε-wide tube around it is minimized; this hyperplane is the solution. Prior to the emergence of deep learning, SVR was the most common machine learning method for time series prediction.
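The ε-insensitive loss and the penalty factor can be written in a few lines. For simplicity this sketch shows a linear SVR in its primal form; the kernelized SVR with bandwidth σ would replace X @ w with kernel evaluations, which is omitted here. Function names and numbers are illustrative.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero loss inside the tube |error| <= epsilon, linear (|error| - epsilon) outside."""
    err = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return np.maximum(0.0, err - epsilon)

def linear_svr_objective(w, b, X, y, C, epsilon):
    """Primal objective of a linear SVR: the flatness term 0.5*||w||^2 plus the
    penalty factor C times the total epsilon-insensitive loss."""
    w = np.asarray(w, float)
    y_pred = np.asarray(X, float) @ w + b
    return 0.5 * float(w @ w) + C * float(epsilon_insensitive_loss(y, y_pred, epsilon).sum())

losses = epsilon_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.3, 0.6], epsilon=0.1)
print(losses)  # the first error lies inside the tube and costs nothing
```

A larger C penalizes points outside the tube more heavily (risking overfitting), while a larger ε widens the tube and tolerates more error.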

2) PARTICLE SWARM OPTIMIZATION-BASED SUPPORT VECTOR REGRESSION
Particle swarm optimization (PSO), proposed by Kennedy and Eberhart in 1995, originated from the analysis of bird flocks foraging for food: birds constantly share the locations of insect food sources with the entire group, and the group converges on the optimal feeding ground, which is akin to solving an optimization problem [37]. In SVR modeling, the parameter settings affect forecasting performance. The crucial parameters are the regularization parameter (C), the bandwidth of the kernel function (σ), and the tube size of the ε-insensitive loss function (ε). Inappropriately selected parameter values can result in either overfitting or underfitting [38]. Consequently, selecting the optimal parameters is crucial when employing SVR to forecast a time series, and PSO can be used to search for them. Liu et al. used PSO-based SVR to forecast tourist arrivals [38].
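A minimal global-best PSO with an inertia weight is sketched below. In PSOSVR the objective would be the validation error of an SVR trained with candidate values of (C, σ, ε); to keep the sketch self-contained, a hypothetical quadratic surface over (log C, log σ, log ε) stands in for that error. The coefficients w = 0.7298 and c1 = c2 = 1.49618 are commonly used defaults, not settings reported in this study.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=30, n_iter=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimization with an inertia weight."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, float)
    upper = np.asarray(upper, float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                  # each particle's best position
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()              # swarm's best position
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity = inertia + pull toward own best + pull toward swarm best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())

target = np.array([1.0, -0.5, -2.0])  # hypothetical optimum of (log C, log sigma, log eps)

def toy_validation_error(p):
    return float(np.sum((p - target) ** 2))

best, best_val = pso_minimize(toy_validation_error, lower=[-3, -3, -3], upper=[3, 3, 3])
print(best, best_val)
```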

3) ARTIFICIAL NEURAL NETWORK
The backpropagation network (BPN) proposed by Hinton is the most commonly used supervised-learning ANN model [39]. Its training algorithm combines the backward pass, gradient descent [40], and the chain rule of calculus. Gradient descent starts from the initial parameter values and repeatedly moves the parameters in the direction of steepest descent of the cost function. The slope at each step is obtained from the derivative of the cost function with respect to each parameter, and gradient descent uses this information to minimize the cost function.
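The interplay of the forward pass, the chain rule, and gradient descent can be illustrated with a one-hidden-layer network trained on a toy sine wave. This is a sketch, not the ANN configuration used in this study: the layer size, learning rate, tanh hidden activation, and mean-squared-error cost are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
y = np.sin(x)  # toy periodic target standing in for a water-level signal

n_hidden, lr = 16, 0.05
W1 = rng.normal(scale=0.5, size=(1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

losses = []
for epoch in range(2000):
    # Forward pass
    a1 = np.tanh(x @ W1 + b1)
    pred = a1 @ W2 + b2
    losses.append(float(np.mean((pred - y) ** 2)))
    # Backward pass: chain rule from the cost back to every parameter
    d_pred = 2.0 * (pred - y) / len(x)        # dCost/dPred for the mean squared error
    dW2 = a1.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_z1 = (d_pred @ W2.T) * (1.0 - a1 ** 2)  # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0)
    # Gradient descent: step each parameter against its gradient
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(losses[0], losses[-1])
```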

4) CONVOLUTIONAL NEURAL NETWORK
The convolutional neural network (CNN) was proposed by LeCun in 1989 [41]. A CNN is a multilayer neural network structure that simulates the operating mechanism of the biological visual system; it is composed of multiple convolutional layers and downsampling (pooling) layers. A CNN can obtain useful feature descriptions from the original data, which makes it an effective method for extracting features from data [42].
A CNN is a special type of neural network that introduces convolution and pooling operations to generate deep features, thus improving the network's ability to recognize patterns. Many studies have shown that CNNs are effective tools for complex tasks such as image recognition [43], text recognition [44], and video recognition [45]. Because of the properties of its convolutional layers, a CNN is plausibly appropriate for seasonal time series with trends. A CNN framework containing convolutional, pooling, and fully connected layers was designed for time series classification in [46]; this framework performs well in discovering and extracting the internal structure of data.
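As an illustration of how a convolutional layer and a pooling layer extract features from a time series, the following sketch applies a single hand-chosen kernel and non-overlapping max pooling. In a trained CNN the kernels are learned and there are many of them; the kernel [-1, 1] and the input values here are hypothetical.

```python
import numpy as np

def conv1d_valid(signal, kernel, bias=0.0):
    """'Valid' 1-D cross-correlation (the operation CNN layers compute) followed by ReLU."""
    signal = np.asarray(signal, float)
    kernel = np.asarray(kernel, float)
    k = kernel.size
    out = np.array([np.dot(signal[i:i + k], kernel) for i in range(signal.size - k + 1)])
    return np.maximum(out + bias, 0.0)

def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing remainder shorter than size is dropped."""
    n = (len(x) // size) * size
    return np.asarray(x[:n]).reshape(-1, size).max(axis=1)

levels = np.array([0.1, 0.3, 0.6, 0.5, 0.2, 0.4])  # hypothetical normalized levels
rising = np.array([-1.0, 1.0])                     # responds to increases between steps
feat = conv1d_valid(levels, rising)
pooled = max_pool1d(feat, size=2)
print(feat, pooled)
```

The kernel responds only where the level rises, and pooling keeps the strongest response in each window, which is the kind of local pattern extraction the text attributes to CNNs.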

G. PERFORMANCE CRITERIA
The root mean square error (RMSE) and mean absolute percentage error (MAPE), which are common statistical measures, are used to quantify the deviation of the forecasted values from the real values and thus to evaluate forecasting performance. The RMSE, the MAPE, and $R^2$ are expressed in (7), (8), and (9), respectively:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\eta_i^{o} - \eta_i^{M}\right)^2} \qquad (7)$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\eta_i^{o} - \eta_i^{M}}{\eta_i^{o}}\right| \qquad (8)$$
$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(\eta_i^{o} - \bar{\eta}^{o}\right)\left(\eta_i^{M} - \bar{\eta}^{M}\right)\right]^2}{\sum_{i=1}^{n}\left(\eta_i^{o} - \bar{\eta}^{o}\right)^2 \sum_{i=1}^{n}\left(\eta_i^{M} - \bar{\eta}^{M}\right)^2} \qquad (9)$$
where $\eta_i^{o}$ is the observed tidal water level at the ith time step, $\eta_i^{M}$ is the corresponding simulated tidal water level, n is the number of time steps, $\bar{\eta}^{o}$ is the mean of the observed values, and $\bar{\eta}^{M}$ is the mean of the simulated values.
Table 1 indicates that the tidal data of most ports had low skewness and were close to a Gaussian distribution. However, the absolute coefficients of variation (COV) of ports 1116, 1246, 1366, 1396, and 1436 were considerably larger than those of the other ports; the higher the absolute COV, the greater the dispersion of the data. The average difference between the highest and lowest observed tidal water levels reached 7.36 m. Such extreme values can be interpreted as water level rises caused by changes in terrain, natural disasters such as typhoons and tsunamis, and sudden storms.
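The three performance criteria can be computed with NumPy as follows. $R^2$ is implemented as the squared Pearson correlation between observations and simulations, which is our reading of (9) given that both means appear in its definitions; the sample values are hypothetical.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error, in the units of the data (m)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mape(obs, sim):
    """Mean absolute percentage error in percent; undefined if any observation is 0."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(100.0 * np.mean(np.abs((obs - sim) / obs)))

def r_squared(obs, sim):
    """Squared Pearson correlation between observations and simulations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    do, ds = obs - obs.mean(), sim - sim.mean()
    return float(np.sum(do * ds) ** 2 / (np.sum(do ** 2) * np.sum(ds ** 2)))

obs = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical observed levels (m)
sim = np.array([1.1, 1.9, 3.2, 3.8])   # hypothetical simulated levels (m)
print(rmse(obs, sim), mape(obs, sim), r_squared(obs, sim))
```

Because the MAPE divides by the observed level, it can become unstable when observed levels are close to zero, so its value depends on the datum to which the levels are referenced.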

III. RESULTS
A. DATASETS
In this study, the data of 17 tide stations in Taiwan were analyzed. The names of the 17 stations are Keelung (1516), [...] (1356). Tidal water level data for 21 years were obtained after removing the influence of air pressure. In this study, the LSTM model was used to predict the tidal water level. Study data for 1998-2018 were obtained from the Central Weather Bureau of Taiwan (Fig. 5). The tide stations were divided into two categories according to the instrument used to detect tidal level: stations using sonic tide gauges and stations using pressure tide gauges. Sonic tide gauges use microprocessor-based technologies to collect sea level data, whereas older gauges used mechanical floats and recorders; modern monitoring stations use advanced acoustics and electronics. Currently used recorders send an acoustic signal down a half-inch-wide "sounding tube" and measure the time the reflected signal requires to travel back from the water's surface. Pressure tide gauges are installed on the seabed, and the instantaneous wave height is calculated from the change in water pressure caused by the wave. However, a pressure tide gauge is suitable only for shallow sea areas (approximately 20 m deep); in deeper water, the change in water pressure caused by surface fluctuations is too small to yield accurate data.
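The measurement principles of the two gauge types reduce to simple arithmetic: an acoustic gauge converts the round-trip travel time of the pulse into a distance, and a pressure gauge converts the measured gauge pressure into a water column height through the hydrostatic relation p = ρgh. The constants below (speed of sound in air at about 20 °C and a typical seawater density) are assumptions, and the function names are hypothetical; operational stations also correct for temperature and atmospheric pressure, which this sketch omits.

```python
SPEED_OF_SOUND_AIR = 343.0   # m/s, approx. dry air at 20 C (assumed)
SEAWATER_DENSITY = 1025.0    # kg/m^3 (assumed typical value)
GRAVITY = 9.81               # m/s^2

def acoustic_range(round_trip_s, c=SPEED_OF_SOUND_AIR):
    """Distance from the transducer to the water surface: the pulse travels down and back."""
    return c * round_trip_s / 2.0

def pressure_head(gauge_pressure_pa, rho=SEAWATER_DENSITY, g=GRAVITY):
    """Water depth above a seabed pressure sensor from hydrostatics, p = rho * g * h."""
    return gauge_pressure_pa / (rho * g)

print(acoustic_range(0.035))      # distance (m) for a 35 ms echo
print(pressure_head(100_000.0))   # water column (m) for 100 kPa of gauge pressure
```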

B. DATA PREPROCESSING
In this study, 21 years of data were divided into training and testing sets. The training set comprised data from January 1, 1998 to November 30, 2018, and the testing set comprised data from December 1 to December 30, 2018. To reduce the range of the sample values without considerably affecting prediction accuracy, the training and testing sets were normalized such that all data points were within the (0, 1) interval. In addition to standardizing the data, normalization can increase the speed of neural network training: convergence is achieved relatively rapidly, and the possibility of falling into a local optimum is relatively low [47]. Moreover, to eliminate the effect of missing values on the prediction, missing values were deleted so that incorrect features were not learned during training.
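The preprocessing steps above (deleting missing values, splitting at December 1, 2018, and framing the series for one-step-ahead prediction) can be sketched as follows. The hourly sampling, the helper names, and the window length are hypothetical, because the text does not state the input sequence length; normalization would be applied to both splits before windowing.

```python
import math
from datetime import datetime, timedelta

import numpy as np

def chronological_split(timestamps, values, test_start):
    """Drop missing (NaN) values, then split at a cut-off date: no shuffling."""
    pairs = [(t, v) for t, v in zip(timestamps, values) if not math.isnan(v)]
    train = np.array([v for t, v in pairs if t < test_start])
    test = np.array([v for t, v in pairs if t >= test_start])
    return train, test

def make_windows(series, window):
    """Pair each run of `window` past values with the next value (one step ahead)."""
    series = np.asarray(series, float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

start = datetime(2018, 11, 30, 20)
stamps = [start + timedelta(hours=h) for h in range(8)]
vals = [0.5, float("nan"), 0.7, 0.9, 1.0, 0.8, 0.6, 0.4]  # hypothetical hourly levels
train, test = chronological_split(stamps, vals, test_start=datetime(2018, 12, 1))
print(train, test)
```

Deleting missing values joins the samples on either side of a gap, so a window may span a time jump; this sketch does not guard against that.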

C. ANALYSIS OF FORECASTING RESULTS
The evaluation of the seven methods indicated that the LSTM and ANN models had similar accuracy and outperformed the conventional statistical models. Fig. 6 displays the observed tidal values and the tidal levels forecasted by the optimal LSTM, CNN, ANN, PSOSVR, SVR, TBATS, and ARIMA models. The figure indicates that the LSTM and ANN models had similar accuracies, whereas the statistical models had more dispersed predictions than the other models. The prediction results indicated that increasing the prediction interval reduced model accuracy. The parameter R² represents how well the forecasting model explains the observed values; the higher the R² value, the greater the explanatory power and the higher the model accuracy. As the forecast time increases, the accuracy of the time series forecast decreases; however, there are exceptions to this trend. As the amount of forecast data increases, a small number of accurate predictions occasionally cause R² to increase (e.g., the performance of the ARIMA models at 576 and 720 h; Table 2).
This study used LSTM recurrent neural networks for the long-term prediction of tidal water levels. To demonstrate the forecasting performance of our LSTM recurrent neural networks, we compared it with that of two statistical methods (ARIMA and TBATS) and four machine learning methods (SVR, PSOSVR, ANN, and CNN). For all methods, the MAPE and RMSE were used as indicators of forecasting performance; their averages are presented in Table 3. The LSTM method's MAPE was 86% and 62% lower than those of ARIMA and TBATS, respectively, and its RMSE was 90% and 79% lower, respectively. The LSTM predicted tidal water level significantly better than ARIMA and TBATS did (Fig. 7). As for the machine learning methods, the LSTM method's MAPE was 84%, 42%, 48%, and 40% lower than those of the SVR, PSOSVR, ANN, and CNN approaches, respectively, and its RMSE was 75%, 55%, 65%, and 51% lower, respectively. These results indicated that the LSTM model significantly outperformed the machine learning methods commonly used in time series prediction of tidal levels. Fig. 8 illustrates the changes in the MAPE and RMSE of the seven methods; the red solid line represents the RMSE, and the blue solid line represents the MAPE. The error value increased with the number of forecast days. Figs. 8(b), (c), (d), and (f) indicate a significant increase in error when the forecast reached 40% to 80% of the total forecasting length (approximately 12-24 days), with MAPE growth rates of 158%, 20%, 22%, and 26%, respectively. In contrast, the error values of the ARIMA, ANN, and LSTM models increased slowly [Fig. 8(a), (e), and (g)]; their curves were the gentlest, with MAPE increasing by only 10%, 9%, and 5%, respectively, relative to the initial value. The LSTM model had the lowest rate of increase; its long-term prediction performance was thus excellent and stable.
Figs. 9 and 10 illustrate the MAPE and RMSE of the seven methods at each port. The LSTM model consistently had the lowest RMSE and MAPE; therefore, the LSTM method was more stable and accurate than the other six methods. To verify the advantages of the proposed LSTM forecasting model, a Wilcoxon signed-rank test was conducted to compare its MAPE and RMSE values with those of the ARIMA, SVR, TBATS, PSOSVR, ANN, and CNN methods. The null hypothesis (H0) posited that the LSTM method's prediction results were not significantly different from those of the other six methods, and the alternative hypothesis (H1) posited that they differed significantly. The prediction results were considered significantly different if p < .05. As detailed in Table 3, the LSTM method had a significantly lower RMSE and MAPE than the other six methods; thus, the LSTM method outperformed the other methods in tidal level prediction. According to the comparison test results, the LSTM method was more likely than the other methods to provide a robust forecasting model with a small error rate.
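A self-contained sketch of the paired Wilcoxon signed-rank test on per-port errors is shown below. It uses the large-sample normal approximation without tie or continuity correction, so its p-values are approximate; for 17 ports an exact test (e.g., scipy.stats.wilcoxon) would normally be preferred. The function name and the per-port MAPE values are hypothetical.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences are discarded; tied |differences| receive average ranks.
    Returns (W_plus, p_value). No tie or continuity correction is applied.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w_plus, p

baseline = [10.0 + i for i in range(17)]    # hypothetical per-port MAPE (%) of a baseline
lstm = [5.0 + 0.1 * i for i in range(17)]   # hypothetical per-port MAPE (%) of the LSTM
w_plus, p = wilcoxon_signed_rank(baseline, lstm)
print(w_plus, p)
```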

IV. DISCUSSION
A. COMPARISON OF ARIMA AND TBATS
The ARIMA model, which assumes autoregressive and moving-average structure, requires a large quantity of historical data to determine the optimal parameter combination of the model. TBATS was proposed in 2011. Such statistical hybrid models were designed to compensate for the inaccurate predictions of a single model by pooling the advantages of multiple models; however, they have the disadvantage of being computationally expensive. Moreover, statistical models generally still struggle with nonlinear problems. In Figs. 9 and 10, ARIMA and TBATS consistently had the highest error values.

B. COMPARISON OF SVR AND PSOSVR
Machine learning has become popular and widely applied. Prior to the rise of deep learning pioneered by Hinton, SVR was the dominant machine learning method for time series prediction; therefore, we selected SVR as a representative method for comparison.
The use of a single machine learning method is unlikely to yield the best accuracy, and hybrid methods are likely to perform better. For example, PSOSVR uses the PSO algorithm to remedy the shortcomings of the original SVR. PSO, developed by Eberhart and Kennedy, is a population-based iterative optimization algorithm inspired by the social behavior of bird flocking. SVR has three hyperparameters (the regularization parameter C, the bandwidth of the kernel function σ, and the tube size of the ε-insensitive loss function ε), and differences in their values greatly affect SVR's forecasting accuracy. Automatically adjusting these three hyperparameters remains a prominent challenge for improving SVR's forecasting accuracy. Through optimization, PSO can help set the hyperparameters to appropriate values to avoid either overfitting or underfitting [38]. The results of this study confirm that hyperparameter selection is crucial for SVR: compared with SVR, PSOSVR had a lower MAPE and RMSE (Figs. 9 and 10).

C. COMPARISON OF ANN, CNN AND LSTM
ANNs are a class of neural network models that simulate how the human brain learns. An ANN is constructed from interconnected neurons that transmit signals, and it has good feature extraction capabilities. After the backpropagation algorithm was proposed, shallow ANNs made distinctive contributions to machine learning. However, conventional ANNs with shallow architectures are difficult to train if they become too complex (for example, when the network includes many layers and, consequently, many parameters). It has been widely demonstrated that deep ANN architectures, known as deep neural networks (DNNs), outperform conventional shallow ANN architectures in several applications [48]. Recently, deep learning has gained substantial popularity in the machine learning community because it is considered a general framework that facilitates the training of deep neural networks with many hidden layers [49]. Many neural network models have been used to solve various types of time series forecasting problems; of these, recurrent neural networks (RNNs) have received much attention [48], [50].
This attention arises because RNNs are a class of ANN models that possess an internal state, or short-term memory, through recurrent feedback connections, which makes them suitable for modeling sequential or time series data. In such modeling, the RNN maintains a vector of activations for each time step, which works well when the input data contain short-term dependencies. However, if an RNN is trained using stochastic gradient descent, it has difficulty learning the long-term dependencies encoded in the input sequences because of the vanishing gradient problem [51], [52]. To allow the network to learn long-term dependencies, the LSTM network employs a specialized neuron (cell) structure that maintains a constant backward flow of the error signal [24], [53].
Recently, deep learning in general and CNNs in particular have become the methodology of choice for image analysis, following their tremendous success in routine computer vision applications [54], [55]. CNNs are more data efficient than fully connected networks because their convolutional layers share weights across positions. Layers in a CNN are translation equivariant (i.e., when the network input is shifted, internal representations are shifted accordingly), which makes translational weight sharing effective in each layer. Time series with trends are the most common data sets used in forecasting. Both the convolutional and pooling layers of a CNN can be used to extract crucial features and patterns that reflect the seasonality, trends, and time-lag correlations in the data. Therefore, in addition to image classification [56], semantic segmentation [57], and object detection [58], CNNs are applied in time series forecasting.
Compared with CNNs and LSTMs, ANNs have the advantages of lower computational cost and fewer neurons and hidden layers, but the disadvantage of a higher error rate. Although CNNs have strong feature extraction capabilities, they are mostly used in tasks such as image classification or image recognition. Because the CNN could not leverage its advantages in this study, its error rate was similar to that of the ANN. Because the intervals between past and future time points are crucial in time series problems, the LSTM, by virtue of its special gate structure, can determine the relationship between past and future time points and thus furnish accurate predictions over a long series.

V. CONCLUSION
In this study, tidal water level prediction indicators and related time series forecasting models were examined to increase the transportation efficiency and entry safety of large vessels in commercial ports. The LSTM method was proposed for long-term tidal water level prediction and was tested using 16 datasets. Its performance was compared with that of six time series forecasting models, namely the ARIMA, SVR, TBATS, PSOSVR, ANN, and CNN models. The experimental results indicated that the LSTM model had a lower MAPE and RMSE than the other six forecasting models did, as well as the highest R² value among all the compared models in all the prediction intervals. Thus, the LSTM forecasting model provided more robust results than the other six models did. In addition to lower error values, the LSTM model exhibited higher stability and more rapid convergence than the other six models during training. The prediction results of the LSTM model were smooth and interpretable, and the model fit the collected data better than the other models did. Achieving higher forecasting accuracy than that obtained in this study may considerably increase training time. Therefore, future studies should develop a method for increasing accuracy without increasing the training dimensions and time.