Wavelet-Based ResNet: A Deep-Learning Model for Prediction of Significant Wave Height

Predicting significant wave height (SWH) is important for coastal energy evaluation and utilization, port construction, and shipping planning. It has been reported that SWH is difficult to forecast because marine conditions are complex and chaotic in nature. Current methods either require reliable prior information or have reached the upper limit of their prediction accuracy. To this end, this paper proposes a wavelet-based residual network to predict SWH with high accuracy. First, the time-series data of wave-related factors collected by an ocean buoy station are decomposed using the wavelet transform. Then, the transformation results are used as inputs to train the residual neural network. Finally, data obtained from NOAA's National Data Buoy Center are used to evaluate the prediction accuracy of the proposed method. The results suggest that the wavelet transform can improve the prediction performance of the neural network, and that the proposed model achieves better performance than several other deep neural network schemes.


I. INTRODUCTION
Marine meteorological forecasting, especially wave elements forecasting, significantly impacts human marine activity. Accurate estimation of wave parameters not only serves as a major reference for coastal energy evaluation and utilization [1], [2] but also provides essential guidelines for port construction and shipping planning [3]. The significant parameters for wave characterization include significant wave height (SWH), wave period, and wave speed. Among them, SWH is one of the most relevant and necessary parameters to evaluate the wave energy source and ocean meteorological condition, and thus, the estimation of SWH is a core question of wave characterization [4].
The measurement of SWH usually relies on remote sensing methods [5], [6] or on wave-buoy-type in-situ sensors [7]. Remote sensing methods often require additional data reprocessing and show significant deviations, which still need to be corrected using buoy station data. Currently, the prediction of SWH mainly relies on wave buoy observation data, which provide time series of wave state information at fixed points [8]. Thus, it is necessary to use large and reliable buoy datasets to estimate SWH accurately. The largest marine meteorological database is currently maintained by the National Oceanic and Atmospheric Administration (NOAA); it contains a great deal of real-time and long-term marine buoy observation data and is an important dataset for the estimation and evaluation of ocean SWH [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhipeng Cai.
The SWH estimation problem has been addressed with various methods, including numerical wave models [10], [11], time series models [12], and artificial neural networks (ANNs) [13]. Numerical models, such as WAVEWATCH III for the open ocean and Simulating Waves Nearshore (SWAN) for nearshore waters, are based on the approximation and simulation of the wave spectrum. Through numerical calculation, SWH and other parameters can be derived from the wave action equation to describe future wave states [14]. This approach can achieve comparatively good results given sufficient external prior knowledge and has long been the main approach for wave forecasting. Rogers [15] used SWAN to predict the SWH trend in the Southern California Bight and compared the results with measured data to investigate the effect of island topographical resolution on prediction. Zheng [16] applied the WAVEWATCH III framework to simulate the seasonal and long-term trends of SWH in the South China Sea and further analyzed the statistical distribution of SWH at different times and in different areas. The shortcoming of numerical models is that they require reliable prior information, such as the geography and meteorology of the predicted location. Slight deviations in this prior information are gradually amplified and greatly affect the accuracy of the final prediction. Therefore, data-driven methods such as time series models and machine learning algorithms have recently attracted great attention for SWH estimation, because they improve prediction performance by exploiting the inherent characteristics of historical data and reduce the reliance on prior knowledge. Experimental results have shown that machine learning methods perform better than statistical predictive models in SWH prediction [17].
In the classical time series and machine learning approaches for SWH, empirical mode decomposition (EMD) and wavelet transform (WT) have been used to extract features from wave time-series data [18], [19], [20], and the prediction is then conducted through models such as autoregressive (AR) moving average, support vector regression (SVR), and symbiotic organism search (SOS) [21]. For example, Salcedo-Sanz [4] applied SVR to SWH prediction using the shadowing effect of radar images and achieved excellent performance. Meanwhile, Duan [22] performed EMD ahead of SVR prediction for feature extraction and reached a higher accuracy on NOAA buoy datasets. However, with the rapid increase of marine data dimensions, the complex relationships between unstructured or semi-structured data limit the upper bound of the prediction accuracy of traditional machine learning approaches. These shortcomings have been gradually addressed by developing neural-network-based models. With the increasing computing speed of graphics computing devices, ANNs have been widely applied to wave forecasting.
EMD and wavelet methods have also been combined with ANNs [23]. Shahabi [24] proposed a GMDH network to predict the SWH of the north Atlantic coast based on buoy data and achieved better results for 6-step to 12-step prediction than time-series and machine learning models. Pushpam [25] used the long short-term memory (LSTM) network to reconstruct and predict the wave height of the Bay of Bengal, which achieved better performance than traditional feedforward ANNs and recurrent neural networks (RNNs). Kaloop [26] proposed a wavelet-particle-swarm-optimization extreme learning machine (ELM) to estimate the ocean wave height, and the experimental results on buoy data from the US south-east coast outperformed SOS, LSTM, and SVR. Wang [27] applied a deep neural network to the calibration of HY2B SWH by using parameters provided by the altimeter as input and greatly improved the performance of HY2B. The limitations of the existing ANN-based models are that RNNs and their variants have a high computation cost because of their sequential calculation, and are thus less efficient. On the other hand, ELM is limited by its single-hidden-layer structure and thus suffers from insufficient nonlinear fitting ability. The residual network (ResNet) is a powerful framework that can learn a wide range of complex relationships from data, and its efficiency and robustness have been demonstrated in a variety of applications [28]. The effectiveness of this model suggests a potential application in marine element forecasting.
In this research, a wavelet-based residual network is proposed as an effective deep-learning method for predicting SWH. First, after min-max normalization, the ocean wave time-series data are wavelet-transformed for feature extraction and noise elimination. Then, the processed data are used as the input of the ResNet for prediction. The output of the ResNet is flattened into a one-dimensional vector and passed through a two-layer linear block to give the final prediction of ocean SWH.
The rest of the paper is organized as follows: In the second part, the overall framework and detailed settings of the proposed model are illustrated. Performance validation and results analysis are given in the third part. The conclusions are given in the last part. Figure 1 illustrates the overall framework, in which the proposed model takes the buoy data as input and predicts the SWH a certain number of time steps ahead.

A. PREPROCESSING
The features of buoy data include SWH, gust wind speed (GS), average wind speed (WS), dominant wave period (DPD), average wave period (APD), and air temperature (AT). The features are preprocessed to keep the network weights and biases stable and to avoid extremely large parameters in the trained networks. First, all the features are normalized into the interval [0, 1] by
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ denotes the feature measurement, $x'$ is the corresponding normalized feature value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that feature. It is worth noting that the minimum and maximum values used for normalization are selected from the training set. The WT is then applied to the normalized data to further denoise them and to identify short-term changes and long-term trends. Using the time and frequency characteristics of wavelet functions, WT can adaptively track the frequency changes of a signal over time. For this reason, WT is suitable for analyzing the frequency content of signals with no redundant components in the time domain [29]. By using WT, the data are decomposed as
$$f(t) = \sum_{k} c_{0,k}\,\varphi_{0,k}(t) + \sum_{j \ge 0} \sum_{k} d_{j,k}\,\psi_{j,k}(t)$$
where $f(t)$ is the normalized signal, $\psi_{j,k}(t)$ are the wavelet signals, $\varphi_{0,k}(t)$ are the scale signals of the wavelet basis function, and $c_{0,k}$ and $d_{j,k}$ denote the corresponding scale and wavelet coefficients. Figure 2 shows the preprocessing results.
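The two preprocessing steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the text does not name the wavelet family or the number of decomposition levels, so the orthonormal Haar wavelet and the function names below (`min_max_fit`, `haar_decompose`, and so on) are assumptions chosen for brevity.

```python
import math


def min_max_fit(train_values):
    """Return (min, max) computed on the training split only."""
    return min(train_values), max(train_values)


def min_max_apply(values, lo, hi):
    """Scale values with training-set statistics.

    Values outside the training range map outside [0, 1].
    """
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return [approx_L, detail_L, ..., detail_1] (coarse to fine)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def haar_reconstruct(coeffs):
    """Invert haar_decompose; used here only to check the decomposition."""
    s = math.sqrt(2.0)
    approx = coeffs[0]
    for detail in coeffs[1:]:
        merged = []
        for a, d in zip(approx, detail):
            merged.extend([(a + d) / s, (a - d) / s])
        approx = merged
    return approx


if __name__ == "__main__":
    train = [1.0, 3.0, 2.0, 5.0, 4.0, 2.0, 1.0, 3.0]  # made-up SWH values
    lo, hi = min_max_fit(train)
    scaled = min_max_apply(train, lo, hi)
    coeffs = haar_decompose(scaled, levels=2)
    print([len(c) for c in coeffs])
```

The approximation coefficients carry the long-term trend and the detail coefficients carry the short-term changes, which mirrors the roles of the scale and wavelet terms in the decomposition above.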

B. ARCHITECTURE OF NETWORK
The major characteristics of a convolutional neural network (CNN) are weight sharing, local connection, and down-sampling. Compared with RNN and its variant LSTM, CNN can better utilize the multi-core parallel computing performance of computing devices and thereby has a faster computation speed [30]. Let $\omega$ be the convolution kernel, $x$ be the input, and $b$ be the bias; then the convolution operation of CNN can be expressed as
$$y = f(\omega * x + b)$$
where $y$ is the output feature map, $*$ denotes the convolution operator, and $f(\cdot)$ denotes the activation function.
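As a concrete illustration of the convolution operation, the sketch below applies one kernel to a single-channel two-dimensional input with stride 1 and zero padding. It follows the deep-learning convention of sliding the kernel without flipping it. The ReLU activation and the function names are assumptions made for the example; the text does not state which activation function is used.

```python
def relu(z):
    """Rectified linear unit, used here as an example activation f."""
    return max(0.0, z)


def conv2d_same(x, kernel, bias=0.0, activation=None):
    """Single-channel 2-D convolution with stride 1 and zero padding.

    The padding is half the kernel size, so the output has the same
    height and width as the input (padding 1 for a 3 x 3 kernel).
    """
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(x), len(x[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = bias
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - ph, j + v - pw
                    # Positions outside the input contribute zero.
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += kernel[u][v] * x[r][c]
            out_row.append(activation(acc) if activation else acc)
        out.append(out_row)
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    box = [[1.0] * 3 for _ in range(3)]
    print(conv2d_same(x, box, bias=-10.0, activation=relu))
```

Because the same kernel weights are reused at every position, the number of parameters depends only on the kernel size and not on the input size, which is the weight-sharing property mentioned above.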
CNNs are usually applied to time series prediction in one of two ways: one-dimensional convolution or two-dimensional convolution [31], [32]. In this article, the wavelet decomposition results of the same order from different features are assigned to different channels of the same input group. The one-dimensional sub-sequences produced by WT are split equally and reshaped into two-dimensional matrices, each of which forms one input channel.
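The reshaping step can be pictured with a short sketch. It assumes that each sub-sequence has 72 values and is split into 8 equal rows of 9 values, which is one reading of the 8 × 9 × 24 input size given for the first convolutional layer; the helper names are hypothetical.

```python
def reshape_2d(sequence, rows, cols):
    """Split a 1-D sequence into `rows` equal pieces of length `cols`."""
    sequence = list(sequence)
    if len(sequence) != rows * cols:
        raise ValueError("sequence length must equal rows * cols")
    return [sequence[r * cols:(r + 1) * cols] for r in range(rows)]


def stack_channels(sub_sequences, rows, cols):
    """Turn a list of 1-D sub-sequences into a channels x rows x cols input."""
    return [reshape_2d(s, rows, cols) for s in sub_sequences]


if __name__ == "__main__":
    # 24 made-up sub-sequences, e.g. several wavelet sub-bands per feature.
    subs = [[float(c + t) for t in range(72)] for c in range(24)]
    tensor = stack_channels(subs, rows=8, cols=9)
    print(len(tensor), len(tensor[0]), len(tensor[0][0]))
```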
As depicted in Figure 3, the overall framework of the proposed model contains eight forward layers and one residual shortcut connection between the outputs of layer 1 and layer 5 to avoid accuracy degradation. The first six layers are convolutional and the remaining two are linear. The convolutional layers have 3 × 3 filters and follow two rules: (i) each layer uses zero padding of size 1 to keep the feature map size unchanged; and (ii) the stride of all filters is set to 1. For the residual connection, a 1 × 1 convolution with 80 channels is used for dimension matching. Specifically, the first convolutional layer filters the 8 × 9 × 24 input with 80 kernels, and its output is taken as the input of the second layer, which applies 160 kernels. The third convolutional layer has 320 kernels, the fourth has 160, and the fifth has 80. The output of the fifth convolutional layer is filtered with 24 kernels of size 3 × 3, so that the output of the convolutional block has the same size as its input. The output of the convolutional block is flattened into a one-dimensional vector and fed to a linear network with two layers (Figure 4), which produces a distribution of the SWH.
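The layer sizes can be checked with a small bookkeeping sketch that traces the feature-map shape and counts the weights of each convolutional layer. It assumes that the 8 × 9 × 24 input means 24 channels of 8 × 9 maps and that every layer has a bias term; neither detail is stated explicitly, and the helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of a kernel x kernel convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out


# Input channels followed by the kernel counts of the six conv layers.
CHANNELS = [24, 80, 160, 320, 160, 80, 24]


def trace(height=8, width=9):
    """Return (name, c_in, c_out, height, width, params) per conv layer."""
    rows = []
    h, w = height, width
    for idx in range(1, len(CHANNELS)):
        c_in, c_out = CHANNELS[idx - 1], CHANNELS[idx]
        h, w = conv_out(h), conv_out(w)
        rows.append((f"conv{idx}", c_in, c_out, h, w, conv_params(c_in, c_out)))
    return rows


if __name__ == "__main__":
    layers = trace()
    for name, c_in, c_out, h, w, params in layers:
        print(f"{name}: {c_in:>3} -> {c_out:>3} channels, {h} x {w} map, {params} params")
    _, _, c_last, h, w, _ = layers[-1]
    print("flattened vector length:", c_last * h * w)
    print("1 x 1 shortcut params:", conv_params(80, 80, kernel=1))
```

With 3 × 3 kernels, padding 1, and stride 1, every feature map stays 8 × 9, so the outputs of layer 1 and layer 5 (both 80 channels) can be added through the shortcut without resizing.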

C. TRAINING STRATEGY
In the training stage, dropout is used in the last two linear layers to zero the outputs of hidden neurons with a probability of 0.3. The dropped neurons do not participate in the forward pass or back-propagation during training. At test time, all neurons are used and their outputs are multiplied by the keep probability of 0.7 (that is, 1 − 0.3), so that the expected activations match those seen during training. The stochastic gradient descent (SGD) method is used to optimize the parameters. The learning rate is decayed and training is terminated according to the following strategy: (i) whenever the training loss fails to decrease for 5 consecutive rounds, the learning rate is reduced by 50%; and (ii) if the loss of the model on the validation set does not decrease for 17 consecutive training rounds, the training process is stopped. The mean squared error (MSE) and the correlation coefficient R are used to evaluate the performance of the deep learning model.
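The learning-rate and stopping rules can be written as a small controller that is updated once per training round. This is a sketch of one reasonable reading of the two rules, not the authors' code: it assumes that a "round" is an epoch, that "unreduced" means no new minimum of the loss, and that the stall counter restarts after each learning-rate reduction. The class name `PlateauController` is hypothetical.

```python
class PlateauController:
    """Halve the learning rate on training plateaus and stop early."""

    def __init__(self, lr, lr_patience=5, factor=0.5, stop_patience=17):
        self.lr = lr
        self.lr_patience = lr_patience
        self.factor = factor
        self.stop_patience = stop_patience
        self.best_train = float("inf")
        self.best_val = float("inf")
        self.train_stall = 0
        self.val_stall = 0

    def step(self, train_loss, val_loss):
        """Update after one round; return True when training should stop."""
        # Rule (i): reduce the learning rate after lr_patience stalled rounds.
        if train_loss < self.best_train:
            self.best_train = train_loss
            self.train_stall = 0
        else:
            self.train_stall += 1
            if self.train_stall >= self.lr_patience:
                self.lr *= self.factor
                self.train_stall = 0
        # Rule (ii): stop after stop_patience rounds without validation gain.
        if val_loss < self.best_val:
            self.best_val = val_loss
            self.val_stall = 0
        else:
            self.val_stall += 1
        return self.val_stall >= self.stop_patience


if __name__ == "__main__":
    ctrl = PlateauController(lr=0.1)
    losses = [1.0, 0.8, 0.7] + [0.7] * 30  # made-up loss curve
    for epoch, loss in enumerate(losses, start=1):
        if ctrl.step(train_loss=loss, val_loss=loss):
            print(f"stopped at round {epoch} with lr={ctrl.lr}")
            break
```

In a real training loop, `ctrl.lr` would be copied into the optimizer after each call to `step`.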

A. DATASET
The proposed model is validated by using a buoy dataset from the NOAA database (https://www.ndbc.noaa.gov) [3]. Buoy station 46087 is located at 48.49° north latitude and 124.73° west longitude, and the sea depth in this area is 260 meters. The raw buoy data contain the features SWH, gust wind speed, average wind speed, dominant wave period, average wave period, and air temperature. The buoy data from January 2016 to December 2018 are used for training the network, and the data from January 2019 to August 2019 are used for testing. The statistical properties of all features are presented in Table 1.
The buoy data from the 72 hours preceding time t are used as inputs. The SWH values 0.5 hours, 1.5 hours, 3 hours, 6 hours, and 12 hours ahead are used as the outputs.
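The pairing of input windows with future SWH targets can be sketched as a sliding-window routine. The sampling interval of the buoy records is not stated here, so the window length and horizons are expressed in time steps rather than hours; with half-hourly records, for example, the five horizons would correspond to 1, 3, 6, 12, and 24 steps. The function name and the toy data are assumptions.

```python
def make_samples(features, swh, window, horizons):
    """Build (input window, targets) pairs from aligned time series.

    features: list of per-time-step feature rows
    swh:      list of SWH values aligned with `features`
    window:   number of past steps (ending at time t) used as input
    horizons: look-ahead distances in steps, e.g. (1, 3, 6, 12, 24)
    """
    if len(features) != len(swh):
        raise ValueError("features and swh must have the same length")
    samples = []
    last_t = len(swh) - max(horizons) - 1
    for t in range(window - 1, last_t + 1):
        inputs = features[t - window + 1:t + 1]
        targets = [swh[t + h] for h in horizons]
        samples.append((inputs, targets))
    return samples


if __name__ == "__main__":
    # Toy series: one feature per step, SWH equal to the step index.
    feats = [[float(i)] for i in range(10)]
    swh = [float(i) for i in range(10)]
    for inputs, targets in make_samples(feats, swh, window=3, horizons=(1, 2)):
        print(inputs, "->", targets)
```

Each window ends at time t, so no target value leaks into the input that is used to predict it.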

B. PREDICTION PERFORMANCE
To evaluate the performance of the proposed method on SWH prediction, the model is compared with several standalone models, including CNN [3] and LSTM [33], using the same settings for learning rate, optimization algorithm, and batch size. Moreover, two hybrid models that add the series decomposition technique, Wavelet-CNN and Wavelet-LSTM, are also compared. Among them, CNN adopts the same convolutional and linear blocks as the proposed model. The LSTM models with and without wavelet preprocessing both use the same two-layer bidirectional RNN and differ only in the parameter tuning process, owing to the different dimensions of their hidden and cell layers.
Experimental results of the proposed model and other methods on SWH prediction are given in Table 2. The comparison suggests that wavelet decomposition of the original time series can better extract the inherent features of ocean wave height variation and consequently improve the prediction performance of the model in the training stage. The CNN and LSTM models with wavelet preprocessing generally achieve better performance than those without it.
VOLUME 10, 2022
The results of multi-step-ahead predictions suggest that all four models can achieve good performance on short-term prediction, especially for 6 or fewer steps. As shown in Table 2, the values of MSE and R of the four models are all around 0.07 and 0.94. Meanwhile, Figure 5 also shows the good performance at 6-step-ahead prediction. Furthermore, Figure 5 indicates that the LSTM models with and without wavelet-based preprocessing achieve better performance at the early stage of prediction, while the performance of both CNN models improves at the later stage. For 12-step-ahead prediction, however, the wavelet decomposition improves the accuracy of both CNN and LSTM: the values of MSE and R reach 0.12 and 0.89 for the former and 0.13 and 0.88 for the latter. Figure 6 shows that the performance of the CNN model without wavelet-based preprocessing becomes worse at the late stage.
The overall training time of the hybrid models is generally longer than that of the single models. Specifically, Wavelet-LSTM has the longest training time, followed by Wavelet-CNN and LSTM, and CNN has the shortest. Moreover, the training time of all four models decreases as the forecasting horizon increases. However, the prediction time of the four models is essentially the same, about 0.4 s, which suggests that the efficiency of the four models is roughly the same in an actual prediction scenario.
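For reference, the two evaluation metrics can be computed as in the sketch below. It assumes that the correlation coefficient R is the Pearson correlation between measured and predicted SWH, which is the usual meaning but is not spelled out in the text; the sample values are made up.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between two equal-length series."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("R is undefined for a constant series")
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    measured = [1.2, 1.5, 2.1, 2.8, 2.4]   # made-up SWH values in meters
    predicted = [1.1, 1.6, 2.0, 2.5, 2.6]
    print("MSE:", mse(measured, predicted))
    print("R:", pearson_r(measured, predicted))
```

A lower MSE and an R closer to 1 both indicate better agreement, but R is insensitive to a constant offset or scale error, so the two metrics are best read together.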

IV. CONCLUSION
In this paper, a wavelet-based ResNet model is proposed for the prediction of SWH. This proposed model consists of two parts, in which the wavelet decomposition extracts the learnable features from buoy data, and the ResNet provides SWH prediction in terms of the features decomposed by WT.
Overfitting is often an inevitable problem when training models for complex tasks like time series prediction because of the overly complicated model structure, especially when there is a limited amount of training data. Several common techniques, such as data augmentation, regularization, dropout, and early stopping, can be adopted to avoid this situation [34], [35]. In this study, a combination of these training techniques is adopted to avoid model overfitting. Specifically, the datasets used for model training and testing are large enough to ensure better coverage of the real data distribution. In the training stage, dropout and adaptive learning rate techniques are adopted to reduce the complexity of the model parameters. As a result, the proposed model achieves better performance than other approaches in the testing stage and has a prediction accuracy similar to that in the training stage, indicating that the model is not overfitting.
To validate the performance of the proposed model, comparative investigations among the LSTM, Wavelet-LSTM, Convolutional ResNet, and Wavelet-ResNet models have been conducted. The experimental results indicate that the wavelet transformation can improve the prediction performance of the neural network. Through the time and frequency domain characteristics of the wavelet basis function, the wavelet transformation can adaptively decompose the signals into trends at different scales and provide effective features for the subsequent deep-learning stage. Meanwhile, the wavelet-based ResNet can achieve better performance on long-term prediction. In the experiments on 24-step-ahead estimation, the proposed model showed a relatively smaller time delay and higher accuracy in the prediction of high wave heights, indicating that better performance was achieved. Nevertheless, the model still shows some limitations in long-term prediction. In particular, for extreme SWH values there is still a certain error gap between the model predictions and the measured data. Further work is needed to address this problem.
In summary, this paper verifies the practicability of the wavelet transform and the residual convolutional neural network in SWH prediction. This not only provides a reference for the selection of signal decomposition methods in time series prediction tasks such as SWH forecasting, but also contributes a new method for SWH prediction. In follow-up studies, other similar time series decomposition or transformation methods and other neural network structures can be considered on the basis of this study, and their optimal combination can be explored to further improve prediction accuracy.

ACKNOWLEDGMENT
(Xiangjun Yu and Yarong Liu contributed equally to this work.)