Application of the LSTM Models for Baltic Sea Wave Spectra Estimation

This article proposes to apply long-short-term memory (LSTM) deep learning models to transform Sentinel-1 A/B interferometric wide (IW) swath image data into the wave density spectrum. Although spectral wave estimation methods for synthetic aperture radar data have been developed, similar approaches for coastal areas have not received enough attention. Partially, this is caused by the lack of high-resolution wave-mode data, as well as the nature of wind waves that have more complicated backscattering mechanisms compared to the swell waves for which the aforementioned methods were developed. The application of the LSTM model has allowed the transformation of the Sentinel-1 A/B IW one-dimensional image spectrum into wave density spectra. The best results in the test dataset led to the mean Pearson's correlation coefficient 0.85 for the comparison of spectra and spectra. The result was achieved with the LSTM model using <inline-formula><tex-math notation="LaTeX">$VV$</tex-math></inline-formula> and <inline-formula><tex-math notation="LaTeX">$VH$</tex-math></inline-formula> polarization spectra fed into the model independently. Experiments with LSTM neural networks that classify images into wave spectra with the Baltic Sea dataset demonstrated promising results in cases where empirical methods were previously considered.


I. INTRODUCTION
E STIMATION of the wave spectrum in the Baltic Sea based on the Sentinel-1 image spectrum constitutes the scope of the present research. Synthetic aperture radar (SAR) data has been used in numerous studies to estimate the parameters of the surface gravity wave [1], [2], [3]. The commonly used Wave Mode data and sophisticated inversion methods are well suited to estimate swell parameters over the open ocean [4], [5], [6], [7]. Inversion methods allow one to estimate the wave propagation direction without ambiguity and provide much more information on wave conditions. For coastal areas, where ScanSAR (TOPS mode for Sentinel-1) mode data are generally acquired, such inversion methods have less success, which has motivated the development of empirical methods [8], [9], [10], [11]  estimate bulk wave parameters, e.g., significant wave height (H S ) and peak period (T P ). The sea state in the Baltic Sea is dominated by wind waves from the sea, although there is a swell component within wave systems more than half the time [12], [13]. H S is mainly in the range of 0-3 m and the peak period (T P ) between 3 and 7 s [14]. However, extreme cases of H S up to 8.2 m and a maximum T P greater than 12 s are captured [13], [15].
Recent trends in automated estimations of oceanic and maritime characteristics such as wave (windsea or swell) or wind parameters have evolved toward utilizing SAR imagery with machine learning methods, namely deep learning convolutional neural networks (CNN). The result of a paper published by Quach [16] demonstrated that the use of deep learning in Sentinel-1 SAR data for the prediction of wave height reduced the root mean square error by 50%, from 0.6 to 0.3 m. Similar trends are observed even if older and simpler machine learning techniques are used [11]. Although, in general, the swell-dominated wave field is less complex than the coastal wind-dominated wave field on the SAR data, machine learning methods show the connection between radar backscattering and the underlying wave parameters.
There are other examples in which CNN models on SAR data prove useful for the estimation of ocean geophysical parameters. The application of the Inception V3 architecture [17] allowed the classification of 10 different geophysical phenomena with an overall accuracy score above 0.93. As many of the waveform estimators rely on analytical approaches with some form of statistical tuning, one might question the need for deep learning methods. However, in a study led by Bazzett [18], multiple shallow machine learning methods were compared with deep learning CNNs, producing results that show that neural networks outperform or are comparable to results with kriging estimators. A research article published in the Journal of Oceanology and Limnology [19] reported faster recurrent CNNs that recognize different modes of internal ocean waves, with an accuracy rate of 94.78%. This was significantly superior to conventional methods, such as the Hough/wavelet transformation and intrinsic mode functions. However, the team found that the model became sensitive to morphologically similar wave shapes [19].
The working hypothesis of this present article is that the sequential nature of the wave density spectrum allows the application of long-short-term memory (LSTM) [20] deep learning models to transform the Sentinel-1 A / B interferometric wide-width (IW) image spectrum into wave density spectra. Second, our objective is to describe the different effects of The remainder of this article is organized as follows. Data preparation is described in Section II. The deep learning model to estimate the wave parameters is presented in Section III. Section IV presents the model training and validation activities with a brief discussion of their meaning and the results are drawn in Section V. Finally, Section VI concludes this article.

A. Buoy Measurements
For the current article, surface gravity waves were measured and wave density spectra were calculated at four different locations in the Baltic Sea (see Table I) using two different devices: Lainepoiss (LP [21]) and Datawell Waverider Mk-III (DWR [22]). The spectra were derived from 30 min of raw signal (50 Hz sampling rate for LP and 1.28 Hz for DWR). The time period when the spectra were calculated was consequently shifted forward with 1 min time increments centered around the SAR overflight time; this gave a total of 31 buoy spectra at each SAR image time.

B. SAR Image Processing
Sentinel-1 A/B IW subimages were first calibrated and then speckle filtered using the Lee-Sigma method (filter size 5 × 5). Subimages are left in radar projection to keep the pixels as close to the original form as possible. Image spectra are calculated from both polarizations (V V and V H) using the fast Fourier transform (FFT) method applied to the subimage. The 2-D image spectrum is then integrated over all directions for each wavenumber to get a 1-D spectrum. Two different subimage sizes are extracted over the buoy locations: 256 by 256 pixels and 512 by 512 pixels. Second, three upscaling factors of 1, 2, and 4 (spline interpolation of the order of 3) are also applied for the subimages, resulting in an effective pixel size of 10 × 10 m (e.g., original pixels), 5 × 5 m, and 2.5 × 2.5 m, respectively. This would allow us to test different wave-number limits extractable from the SAR data.

C. Match-up Dataset Creation
Using SAR overflight time, wave spectra are generated between −30 min ≤ datetime SAR ≤ +30 minutes with a step of 1 min, resulting in 31 separate spectra. Each of the 31 measured  spectra was matched with different SAR subimages 31. All SAR subimages are selected around the buoy location so that they consist of different pixels and incorporate the wave measurement location (Fig. 1).
The resulting H S distribution over propagation directions is shown in Fig. 2. The significant wave height is mostly under 2 m and the waves are propagating mostly from the south-west direction. In total, 5509 collocation pairs were formed for the study.
Since Sentinel-1 operates in dual-polarization mode (VV and VH) in IW mode, it enables one to use spectra calculated from both polarizations. We have generated following six different combinations.
3) CO and CRO: V V and V H polarization information is given as independent datasets into the model. 4) CO div CRO: V V divided by CRO. 5) CO multi CRO: the absolute value of the product of V V and V H. 6) CO diff CRO: the absolute value of V H subtracted from V V . Before training, a logarithmic operation at the base e is applied for each spectrum.
The measured wave and SAR image spectra are limited with the minimum wave number of 0.023 1/m or 0.05 Hz and depending on the upscaling factor (orders 1, 2, and 4), the maximum limit of 0.304, 0.612 and 1.240 1/m, which corresponds to 0.27, 0.39, and 0.55 Hz. Furthermore, the SAR image spectra values are interpolated to fixed wavenumber values that were retrieved from the measured spectra (resolution in frequency is 0.005 Hz). Finally, the dataset was divided into training, validation, and testing subsets (65%-25%-10%, respectively). Training and validation subsets were used for neural network training and hyperparameter tuning, whereas the testing subset (not seen by the model) was used for final model testing.

III. LSTM MODEL
Estimation of the wave spectrum as a function of the sentinel spectrum may seem like a simple regression problem; nevertheless, initial attempts to use generalized regression models did not lead generic models to the desired quality. LSTM-type neural networks [20] were designed to capture long-term dependencies in sequential data. Also, LSTM models allow one to avoid the vanishing gradient problem common in simpler families of recurrent DNN models. Finally, LSTM-s include feedback connections, allowing one to treat the entire spectrum as an observation point. The core element of deep LSTM, the LSTM cell, is shown in Fig. 3. In Fig. 3 "x," "+" and "tanh" on the pink background represent element-wise operations and σ and tanh on the green background represent the sigmoid and tanh NN layers. LSTM deep or stacked assumes multiple stacked LSTM layers [23]. For the present investigation, a deep model with two LSTM layers and one dense layer was chosen. See Fig. 4 for details.
The structure was chosen experimentally in an optimization process to maintain the balance between avoiding overfitting and properly capturing spectrum transformations.

IV. COMPUTATIONAL EXPERIMENTS
In Section II-C, six different ways to create a matching data set are described. The differences come from different combinations  of available polarisations. This led to six different representations of the same data set. For each representation, the model was trained with a vanilla LSTM architecture for the 10 000 epoch in each combination of data sets to determine the most promising channel merging method. From these preliminary runs, the combination of CO and CRO, where the polarizations V V and V H were independent, extracted from the subimages of the pixels 512 × 512, produced the best results (see Table II).
After these tests, the hyperparameters of the architecture of the LSTM model were tuned with a Bayesian optimizer and the architecture represented in Fig. 4 was concluded to be the most suitable for the fit of the model. The variation in batch size was tested from 32 to 2048 in multiples of 32, which is recommended [24]. As the training was computationally exhaustive, a strategy was developed to evaluate the convergence of batch values of 2048 to 256 with iterations of 10000. As the results plateaued before the batch size 512, the rest of the tests from 128 to 32 were carried out with 1000 iterations. If the smaller batch sizes were to have a strong impact, it was hypothesized that the improvement would have an effect early on. For the final model, an optimal batch size of 512 was chosen because no significant improvement was observed with smaller batch sizes and the training speed allowed the tuning of other hyperparameters as well within reasonable computational effort. Further optimization was carried out with a Keras tuner Bayesian optimizer, and the results were further enhanced by prior knowledge of the research team. The analysis of the optimization results is described as follows, though the optimizer's samples were not linear through the hyperparameter dimensions due to computational and time constraints. Experiments with the dropout layer with an effect of 0.1, . . . , 0.5 did not have positive effects on the results and were not used in the final model. This could indicate that the dataset has enough variability to Fig. 4. Structure of LSTM model for order 1 data. Fig. 5. Examples of (a) good, (b) mediocre, and (c) bad matches between the measured buoy spectra and predicted spectra; order 2 data is displayed. avoid overfitting. The tangent hyperbolic function was shown to give better results more consistently compared to the sigmoid function. Both are generally recommended to be suitable for LSTM DNN models.
Multiple runs of model fit in the four-layer LSTM echoed similar overfit patterns, as underlined by [19]. Rebalancing and cleaning the raw dataset before flattening the two channels alleviated the issue to a satisfactory level. The four-layer LSTM structure represented in Fig. 4 demonstrated the usefulness of using these general sequence-to-sequence methods to translate the SAR 1-D image spectra into the variance density spectra in the coastal wind wave dominant regions.

V. RESULTS
As can be seen from the results of the final model shown in Table II, the prediction of the wave spectrum yields promising results when using the CO and CRO spectra as independent variables for the model. The correlations between the measured and predicted spectra in the test dataset show a higher than 0.80 for all different upscaling factors (i.e., effective pixel sizes). The mean squared errors are lower in the training dataset than in the test dataset, which is expected.
Taking into account the mostly low wave conditions of the dataset (see Fig. 2), the conversion of the image spectra into wave spectra is generally accurate. The fact that most of the propagation directions approximate align with the SAR range direction, which has higher resolution and, therefore, higher information density, could also have an impact. However, not all image spectra conversions give equally good results (see Fig. 5).
The wave spectra measured in Fig. 5(c) indicate low swell and low wind sea conditions and therefore very low backscattering, which helps explain the bad spectrum-to-spectrum correlation. Although the energy levels of predicted spectra at higher frequencies have large errors, it is promising to see that the peak locations match well. From Fig. 5(a) and (b), it is clear that the model prefers to estimate the peak energy around 0.23 Hz, which produces errors in peak periods. Overall, the predicted spectra have less detail compared to the measured spectra.
From Fig. 6(a), it is clear that at around 0.1 Hz the correlations are the highest (the same can be seen in Fig. 5, where the lower frequency levels match in all cases) for all data of different order. Even though the correlations remain quite high for higher frequencies, it is clear that the estimations are done mostly from noisy data (especially for the order 4 data, which have most of the data "generated" by the interpolation). Average bin-to-bin correlations are 0.91, 0.85, 0.82, for orders 1, 2, and 4, respectively. Correlations are consistently higher than 0.90 when integrated H S is used (see Fig. 6(b), while the extraction of the T P values from the spectra shows lower correlations (around 0.50). This is probably due to the higher degree of information comparison in the case of prediction spectra with measured spectra evaluation. Due to a limited dataset, we cannot accurately tell under what wave conditions the model fails since there are bad predictions for every H S range. Possible considerations of different wind directions with respect to SAR overflight and wave propagation directions, as well as surface current fields, have not revealed any strong evidence for conditions when the model fails. Increasing the dataset to have homogeneous H S with respect to the wave propagation direction could help to describe the model behavior better.

VI. CONCLUSION
Main result of the present investigation demonstrates the applicability of LSTM-type deep neural network models to transform 1-D SAR image spectra into variance density spectra in coastal regions, where wind waves dominate. Despite the high values of the metrics that describe the goodness of the model, there is still room for improvement, which may be achieved by exploring different techniques for data preparation or tuning the model structure. Another option is to explore the applicability of other deep neural network architectures oriented to sequential data processing.

AUTHOR CONTRIBUTION STATEMENT
Martin Simon: Deep learning computational experiments, visualization, and original draft writing. Sander Rikka: Methodology of data preparation, model validation, writing, review, and editing. Sven Nõmm: Methodology of computational experiments, validation, writing, review, and editing. Initial data analysis and preparation by Victor Alari.

DECLARATION OF COMPETING INTEREST
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the results reported in this document.
Martin Simon is currently a Ph.D. student with the Tallinn University of Technology, Tallinn Estonia. He holds a Junior Research Scientist Position with the Department of Software Science, Tallinn University of Technology (TalTech). His academic interests are in the areas of remote sensing, disruptive sensors and applications of AI for earth observation. Sander Rikka received the Ph.D degree in earth sciences (in oceanography) from the Tallinn University of Technology, Tallinn, Estonia, in 2019.
He is currently a Senior Research Scientist position with the Department of Marine Systems, Tallinn University of Technology (Taltech). His research interests lay in AI applications and method developments in geoscience and remote sensing.
Sven Nõmm received the Ph.D. degree in automatics and applied informatics jointly from the Tallinn University of Technology, Tallinn, Estonia and École centrale de Nantes et L'Université de Nantes, France, in 2004.
He currently holds Senior Research Scientist Position with the Department of Software Science, Tallinn University of Technology (TalTech). He has authored or coauthored more than 100 articles in scientific. His research interests include human-machine interaction, analysis of human motions, applications of AI to the problems of cybersecurity and geoscience.
Victor Alari received the Ph.D. degree in earth sciences (in oceanography) from the Tallinn University of Technology, Tallinn.
He is currently a Physical Oceanographer, focusing on the measurements and modelling of wind waves. He has lead the development of a new wave measuring buoy LainePoiss (www.lainepoiss.eu).