Computer Prediction of Seawater Sensor Parameters in the Central Arctic Region Based on Hybrid Machine Learning Algorithms

In recent years, with the large-scale reduction of Arctic sea ice, supplementing chlorophyll sensor data in seawater has become an essential part of environmental assessment. Accurately predicting chlorophyll sensor data in seawater is of great significance for protecting the Arctic marine ecological environment. A machine learning prediction method combined with wavelet transform is proposed, which uses data from upper-ocean observation buoys deployed in the Arctic Ocean (A.O.) to predict the sensor analogue of chlorophyll-a (C.A.) in the upper A.O. To choose the best wavelet transform method and to prevent the LSTM gradient from vanishing, a model combining SAE (stacked autoencoder), Bi (bidirectional) LSTM (long short-term memory) and wavelet transform is proposed. Experiments compared the predictive performance on univariate buoy data from two different times and locations in the Arctic Ocean. The results show that, compared with other models (such as LSTM), the SAE Bi LSTM model achieves the highest prediction accuracy for the data of both sites. The best wavelet transform methods are fourth-order four-layer and first-order four-layer. The observational data of the Chukchi Sea from 2018 to 2019 yielded the best prediction results: the root mean square error (RMSE) is 0.02003 volts and the mean absolute error (MAE) is 0.0841 volts. This research provides a new method for predicting chlorophyll sensor parameters in the upper ocean from sea-ice buoy observations at a given point, which helps to improve the accuracy of ocean sensor parameter prediction on Arctic ice buoys.


I. INTRODUCTION
The Arctic is one of the earth's cold sources, and its shape changes markedly with the seasons. Besides, as the A.O. is increasingly altered by anthropogenic climate change, it is critical that we accurately assess ongoing changes in its capacity to support marine life [1]. The past decade has seen substantial advances in understanding Arctic amplification: trends and variability in surface air temperature tend to be more significant in the Arctic region than for the northern hemisphere or the globe as a whole [2]. Other environmental parameters also play an essential role in Arctic change, such as C.A. and dissolved oxygen in seawater [3]. Besides, monitoring ocean C.A. content also provides a tool for a deeper understanding of the contribution of CO2 to the climate [4], [5].
In the observation of ice buoys in the A.O., it is challenging to obtain interannual data from large ice buoys (especially ice-buoy data including ocean profile observations) because of equipment damage caused by sea-ice compression or by polar bears. Besides, the environmental parameters of the Arctic region are volatile, and accurate prediction of the corresponding ecological parameters is an urgent problem in international ocean research. The models and methods introduced by experts in atmospheric sciences, marine sciences and other fields for observing parameters that fluctuate with time fall into two categories: (1) conventional prediction models; (2) machine learning prediction models. Machine learning models in turn use three input/output forms: single input single output, multiple input single output, and multiple input multiple output.
It can be seen from Table 1 that standard conventional prediction models have been applied to the ocean, the atmosphere, and Arctic sea ice. Table 2 describes applications of machine learning prediction models, which belong to the same category as this research. At present, data-driven deep learning on the internal parameters of ocean optical measurement sensors has not been applied in the Arctic Ocean.
Compared with traditional prediction models, machine learning prediction models achieve significantly better accuracy and have been applied to problems such as stock price prediction, electricity price prediction, and weather forecasting. The rise of artificial intelligence provides powerful methods and theoretical support for other disciplines. For time series collected by sensors (the data acquisition method used in this article), neural network prediction suffers from local extrema. Therefore, the Fourier transform over the full time domain and the wavelet transform, suited to short-time analysis, have become the primary solutions to this problem. In studies predicting Arctic seawater sensor parameters, few researchers have used machine learning methods based on the wavelet transform. For forecasting time series, long short-term memory networks solve, to some extent, the vanishing-gradient problem of recurrent neural networks. The research in this article mainly achieves the following two goals: (1) the stacked autoencoder is combined with neural network algorithms such as long short-term memory to solve the vanishing-gradient problem; (2) the wavelet basis function, its order and the number of decomposition levels are selected, and the optimal choices improve the prediction accuracy of the neural network.

II. STUDY AREA AND DATA
A. RESEARCH AREA OVERVIEW
The A.O. is the northernmost ocean on the earth, and changes in its sea ice have an essential impact on the earth's climate. The Arctic region refers to the area within the Arctic Circle, at 66°30′ north latitude, near the North Pole. The A.O. is a vast frozen ocean surrounded by numerous islands and the coastal regions of North America and northern Asia. The climate in the Arctic is cold all year round. Under the rapid changes in the Arctic, the factors affecting the nutrient budget of the northwest Arctic Ocean water mainly include the transport of the northward Pacific current, freshwater input, vertical mixing, sea-ice melting, and utilization by photosynthetic organisms. Among them, the central agent of photosynthesis is the C.A. in seawater. The experimental data collection platform of this research is the sensor mounted on the B-type buoy of the Arctic Sea-Ice-Air Unmanned Monitoring System. The sensor was deployed in the northern Chukchi Sea in August 2018 (China's ninth Arctic scientific expedition) and in the central Arctic in November 2019 with the MOSAIC international sea-ice drift programme (https://mosaicexpedition.org).
As shown in Figure 1(a), the site was set up by the 2018 Chinese expedition team. The unmanned ice station system consists of four parts; the data we obtained are provided by the upper-ocean sensors of the Sub-B ice station. The sensors sit at two depths in the seawater: the first layer is 5 m deep, and the second is 20 m deep. As shown in Figure 1(b), the layout of the unmanned ice station of the MOSAIC international drift project consists of two parts. The data used in this article come from the upper-ocean sensor of the Sub-B ice station, at the same depths as before.
The ice station rests on a single piece of floating ice, which drifted with the transpolar current, as shown in Figure 2. The photo of the buoy was taken by an Iridium camera installed on site; sea-ice drift and bottom melting caused the camera position to shift slightly.

B. DATA
The data source used in this study is the data obtained by the upper-ocean sensor of the Sub-B buoy of the Arctic Sea-Ice-Air Unmanned Monitoring System. The sensor used is a fluorometer from the TURNER company. Its measurement principle is that C.A. naturally absorbs blue light and emits red light: the fluorometer transmits a beam of blue light (440 nm) and then catches the red light (680 nm) emitted by the excited phytoplankton particles, in an amount proportional to the phytoplankton concentration. The data we use are all sensor-state voltage analogue quantities; converting them to the corresponding ocean parameters requires contacting the calibration and measurement departments.
The data interval between adjacent time points is 1 hour. The total time of the ninth Arctic scientific expedition buoy, from deployment to the end of the ice period (August 18, 2018, to June 30, 2019), is 7662 hours. The 7662 sets of collected data include parameters such as C.A., conductivity, temperature, depth, dissolved oxygen, air saturation, geographic coordinates and voltage. The total time of the period at position 2 (November 6, 2019, to June 30, 2020) is 5820 hours. The data period used by this model (June 30, 2019, to October 1, 2020) totals 2007 hours. In this article, we only study the sensor voltage and the other parameters related to the algorithm. The division ratio of the training set and the test set shown in Figure 3 is the same as for the 2018-2019 data set and will not be repeated.
The upper-ocean data were normalized before processing and analysis: the features of each sample are reduced by the average of all training samples (for the same features) and then divided by the standard deviation of the training samples, so that all data cluster around 0 with a variance of 1. The specific calculation is

X = (x − µ) / σ

where x is the feature value of a single sample in the training set or the test set, µ is the average of the training sample data, σ is the standard deviation of the training sample data, and X is the normalized feature value [18].
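The normalization above can be sketched in a few lines. The function names here are illustrative; the key point is that the test set must reuse the statistics computed from the training set alone:

```python
import numpy as np

def zscore_fit(train):
    """Compute mean and standard deviation from the training samples only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    return mu, sigma

def zscore_apply(x, mu, sigma):
    """Normalize any sample (training or test) with the training statistics."""
    return (x - mu) / sigma

train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mu, sigma = zscore_fit(train)
x_norm = zscore_apply(train, mu, sigma)
# The normalized training data are centred at 0 with unit variance.
```

The inverse transform (x = X * σ + µ) is applied to the network output in the denormalization step of the forecasting framework.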
One of the goals of machine learning is to regularize parameters while minimizing the prediction error. Minimizing the error makes the model fit the training data, and the regularization term prevents the model from overfitting it. With too many parameters, model complexity increases and overfitting becomes easy; the training error becomes small, but a small training error is not the ultimate goal. The goal is a small test error, that is, accurate prediction of new samples. Therefore, we need to keep the model ''simple'' while minimizing the training error, so that the learned parameters generalize well (that is, the test error is also small). Keeping the model ''simple'' is achieved through the regularization function.
To prevent overfitting in the predictions of the neural network model, this article adopts the L2 regularization method. The common form is Formula 2, which limits the complexity of the trained model:

J = J_0 + α ||w||_2^2     (2)

where J_0 is the original loss function, the latter term is the L_2 regularization term over the weights w, and α is the regularization coefficient.
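A minimal sketch of the regularized objective, assuming J_0 is a mean-squared-error loss and the penalty is the squared L2 norm of the weights (some formulations add a factor of 1/2, which only rescales α):

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, alpha):
    """J = J0 + alpha * ||w||^2, where J0 is the mean squared error."""
    j0 = np.mean((y_true - y_pred) ** 2)   # original loss J0
    l2 = alpha * np.sum(weights ** 2)      # L2 regularization term
    return j0 + l2

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
w = np.array([0.5, -0.5])
loss = l2_regularized_loss(y_true, y_pred, w, alpha=0.01)
```

Larger α pushes the optimizer toward smaller weights, trading a slightly larger training error for better generalization.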

III. METHODOLOGY
This research is based on two experiments. Both experiments use wavelet transform and basic neural network models such as BPNN (backpropagation neural network) [19], ELM (extreme learning machine) [20], SAE LSTM [19], Bi LSTM [21] and LSTM [20]. On this basis, some model variants such as SAE ELM and SAE Bi LSTM have been built. Furthermore, the statistical evaluation indexes and the forecasting framework are given in detail.

A. WAVELET TRANSFORM
The development of W.T. (wavelet transform) was inspired by the Fourier transform [22]; it successfully overcomes the limitation that the analysis window size does not change with frequency.
There are many kinds of wavelet functions. Common ones are the Mexican hat wavelet, Meyer wavelet, Morlet wavelet, Daubechies wavelet, Gaussian wavelet, Coiflets wavelet, etc. Because this article uses the autoencoder stacking method for prediction, and following the work of Altunkaynak et al. on water-level prediction at the northern and southern boundaries of the Bosphorus, the discrete wavelet transform (DWT) was found to be more appropriate for natural-water time-series parameters [23].
Multi-resolution analysis (MRA), also called multi-scale analysis, was first proposed by S. Mallat in 1989. Its role in wavelet analysis is equivalent to that of the fast Fourier transform in Fourier analysis. The Mallat algorithm uses wavelet filters H_1, L_1, H_2 and L_2 to decompose and reconstruct the measured signal. The decomposition algorithm can be expressed as

AC_i = L_1(AC_{i-1}),  DC_i = H_1(AC_{i-1})     (3)

where L_1 and H_1 are wavelet decomposition filters in the time domain, representing low-pass and high-pass filters, respectively; AC_i and DC_i are the approximation (low-frequency) and detail (high-frequency) components after the i-th level of decomposition, with AC_0 the original signal. The corresponding reconstruction is

AC_{i-1} = L_2(AC_i) + H_2(DC_i)     (4)

where L_2 and H_2 are the inverse operations of L_1 and H_1. The process based on the wavelet transform is shown in Figure 4, in which the symbol OS represents the original time series.
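As an illustration of one level of Mallat decomposition and reconstruction, here is a minimal sketch using the Haar (db1) filters; the db4 and db7 bases used later follow the same filter-bank principle, and in practice a library such as PyWavelets would handle the multi-level case:

```python
import numpy as np

# One level of Mallat decomposition/reconstruction with the Haar (db1)
# filters: L1 averages adjacent samples, H1 differences them.
def haar_decompose(signal):
    s = np.asarray(signal, dtype=float)
    even, odd = s[0::2], s[1::2]
    ac = (even + odd) / np.sqrt(2)   # approximation (low-frequency) part
    dc = (even - odd) / np.sqrt(2)   # detail (high-frequency) part
    return ac, dc

def haar_reconstruct(ac, dc):
    even = (ac + dc) / np.sqrt(2)
    odd = (ac - dc) / np.sqrt(2)
    out = np.empty(2 * len(ac))
    out[0::2], out[1::2] = even, odd
    return out

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
ac1, dc1 = haar_decompose(x)          # AC_1 and DC_1
x_rec = haar_reconstruct(ac1, dc1)    # perfect reconstruction of AC_0
```

Applying the decomposition recursively to the approximation part yields the multi-layer structure of Figure 4.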

B. NEURAL NETWORK MODEL
1) STACKED AUTOENCODER
The autoencoder is an unsupervised neural network model. It learns hidden features of the input data, which is called encoding, and then uses the learned features to reconstruct the original input data, which is called decoding. An AE (autoencoder) has only one hidden layer; more specifically, the input layer and output layer of the A.E. have the same dimension.
The sigmoid function is used as s_f1, and similarly the sigmoid function can be used as s_f2:

h = s_f1(W_1 x_1 + b_1)     (5)
x_2 = s_f2(W_2 h + b_2)     (6)

where h is the hidden vector connecting x_1 (the input) and x_2 (the reconstruction); W_1 and W_2 are the encoder and decoder weights; b_1 and b_2 are the bias vectors.
SAE is the superposition of multiple AEs: after the first AE is trained, subsequent AEs are trained in order until the N-th, and the output is the stacked result. Equation (7) gives the parameter that each AE propagates to the next layer. The program flow chart is as follows:
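A minimal sketch of a two-level stacked autoencoder forward pass, following equations (5) and (6); training by reconstruction-error backpropagation is omitted, and the class and layer sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class AutoEncoder:
    """Single AE: h = s_f1(W1 x + b1) encodes, x_hat = s_f2(W2 h + b2) decodes."""
    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(self.W1 @ x + self.b1)

    def decode(self, h):
        return sigmoid(self.W2 @ h + self.b2)

rng = np.random.default_rng(0)
# Stack two AEs: the hidden code of the first feeds the second.
ae1 = AutoEncoder(n_in=8, n_hidden=4, rng=rng)
ae2 = AutoEncoder(n_in=4, n_hidden=2, rng=rng)

x = rng.normal(size=8)
code = ae2.encode(ae1.encode(x))  # deep feature passed on to the next stage
```

In the full model, each AE is first trained to reconstruct its own input, and the stacked codes are then passed to the Bi LSTM.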

2) LONG SHORT-TERM MEMORY
LSTM (Long Short-Term Memory) is a kind of RNN (recurrent neural network) [20]. Due to its design, LSTM is well suited to modelling time-series data, such as the daily air temperature, humidity, air pressure and seawater salinity obtained by buoys. The three core gating structures contained in LSTM allow it to achieve long- and short-term memory on top of the RNN; the algorithm structure is shown in Figure 5. The forget gate selects what to discard and can be represented as

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

where f_t is the result of the forget gate; W_f is the weight matrix applied to the concatenation [h_{t-1}, x_t]; b_f is the bias vector of the forget gate; h_{t-1} is the hidden state at the previous moment; x_t is the current input; and σ is the activation function. The input gate selects the information to be memorized:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

where i_t is the value of the input gate; c̃_t is the candidate cell state, c_t is the cell state, and c_{t-1} is the cell state at the previous moment; W_i and W_c are the weights of the input gate and the candidate state; b_i and b_c are the corresponding bias vectors.
The output gate can be represented as

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)

where o_t is the vector of the output gate, h_t is the resulting hidden state, W_o is the weight of the output gate, and b_o is its bias vector.
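The gate equations above can be collected into a single step function. This is a minimal sketch with all four gate weights stacked into one matrix; the dimensions are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenated [h_prev, x_t] to the four
    gate pre-activations; b holds the four bias vectors stacked."""
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:n])               # forget gate
    i_t = sigmoid(z[n:2*n])             # input gate
    c_tilde = np.tanh(z[2*n:3*n])       # candidate cell state
    o_t = sigmoid(z[3*n:4*n])           # output gate
    c_t = f_t * c_prev + i_t * c_tilde  # new cell state
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t

rng = np.random.default_rng(1)
n_hidden, n_in = 3, 2
W = rng.normal(0, 0.1, (4 * n_hidden, n_hidden + n_in))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):  # run over a short sequence
    h, c = lstm_step(x_t, h, c, W, b)
```

The additive update of c_t is what lets gradients flow across many time steps and mitigates the vanishing-gradient problem.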

3) Bi-DIRECTIONAL LONG SHORT-TERM MEMORY
LSTM can only predict the output at the next time step from the timing information of the previous steps. Still, in some problems the output at the current time is related not only to the previous state but possibly also to the future state [19]. Generally speaking, information in an LSTM network is transmitted one way: LSTM can use past information but not future information.
Bi LSTM can consider both past and future data information by connecting two LSTM networks with the same principle: the forward LSTM obtains the past information of the input sequence, and the backward LSTM obtains its future information [24], [25]. This can be expressed as

h_tf = LSTM(W_1 x_t, h_{(t-1)f})
h_tb = LSTM(W_2 x_t, h_{(t+1)b})
H_t = W_3 h_tf + W_4 h_tb

The algorithm structure is shown in Figure 6. The hidden state H_t of Bi LSTM at time t combines the forward state h_tf and the backward state h_tb; W_1, W_2, W_3 and W_4 are the corresponding weight coefficients; x_t is the input at time t.
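A minimal sketch of the bidirectional scheme, substituting a plain tanh recurrent cell for the full LSTM to keep the example short; the W_1...W_4 weights mirror the where-clause above, and all sizes are illustrative:

```python
import numpy as np

def rnn_pass(xs, W_in, W_h):
    """Run a simple recurrent cell over the sequence; return all hidden states."""
    h = np.zeros(W_h.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W_in @ x + W_h @ h)
        out.append(h)
    return np.array(out)

rng = np.random.default_rng(2)
xs = rng.normal(size=(6, 2))               # a short input sequence
W1, W2 = rng.normal(0, 0.1, (2, 3, 2))     # input weights, forward/backward
Wh_f, Wh_b = rng.normal(0, 0.1, (2, 3, 3)) # recurrent weights
W3, W4 = rng.normal(0, 0.1, (2, 3, 3))     # combination weights

h_f = rnn_pass(xs, W1, Wh_f)               # past-to-future pass
h_b = rnn_pass(xs[::-1], W2, Wh_b)[::-1]   # future-to-past pass, re-aligned
H = h_f @ W3.T + h_b @ W4.T                # H_t = W3 h_tf + W4 h_tb
```

Note the backward pass consumes the reversed sequence and its outputs are reversed again so that H_t at each index combines states conditioned on both directions.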

4) THE COMBINATION PROCESS OF SAE AND Bi LSTM
To fuse the two algorithms, the data-transmission processes are merged. The network is fine-tuned in a partially supervised manner by introducing the evaluation index E_o and adjusting the weights through backpropagation. Specifically, the SAE performs unsupervised learning followed by supervised fine-tuning. During training, the input data are mapped to the hidden layer through the first AE according to formulas (5), (6) and (7); subsequent AEs are then stacked, and the entire network is trained until the last AE. Finally, the whole algorithm is fine-tuned by backpropagation using formula (13), in order to obtain improved weights:

E_o = sqrt((1/N) Σ_{i=1}^{N} (A_i − F_i)²)     (13)
In the formula, N represents the number of samples, A_i is the actual value, and F_i is the predicted value. The Bi LSTM network is trained on the output of the SAE; the trained Bi LSTM network then makes predictions for the training, test and prediction groups, and the result is obtained after comparison against the evaluation criteria.

5) IMPLEMENTATION FRAMEWORK AND EVALUATION INDICATORS
The algorithm flow during the experiment is shown in Figure 7. The main experimental flow is divided into the following stages:
Step 1: After normalization, the C.A. time series is divided into training samples, verification samples and test samples;
Step 2: Configure the computer, build the model, and select the SAE and Bi LSTM parameters;
Step 3: After normalizing the time series, decompose it with the multi-layer, multi-order wavelet function and input it into the neural network model;
Step 4: Train the SAE network;
Step 5: Use the trained SAE network to predict the training samples, and use the prediction results as the input of the Bi LSTM;
Step 6: After the prediction of the Bi LSTM neural network is completed, denormalize the result to obtain the prediction of the hybrid model.
To comprehensively evaluate the characteristics of the different forecasting models, RMSE and MAE are selected as the evaluation criteria, and the final result is assessed with the R² evaluation index. RMSE is defined in formula (13), MAE in formula (14), U_1 in formula (15), U_2 in formula (16), and R² in formula (17).
where n is the number of training, test and prediction samples; F_t is the predicted value; A_t is the actual value. U_1 and U_2 are used as auxiliary judgment conditions; these evaluation standards were proposed by Theil in 1967 [26]. U_1 is a composite parameter with the RMSE as numerator and the sum of the RMS (root mean square) of the predicted values and the RMS of the actual values as denominator. U_2 is a composite parameter with the RMSE as numerator and the RMS of the actual values as denominator. In the model training process, preference is given to the model with the smallest RMSE compared to four decimal places; the smaller the RMSE, the better the model prediction (excluding cases of overfitting). If the RMSEs are equal, MAE, U_1 and U_2 are compared in turn. If the above conditions cannot discriminate, the coefficient of determination R² is compared.
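The five criteria can be sketched directly from the verbal definitions above (the exact formulas appear in the paper's formulas (13)-(17); these implementations follow the descriptions given here):

```python
import numpy as np

def rmse(a, f):
    return np.sqrt(np.mean((f - a) ** 2))

def mae(a, f):
    return np.mean(np.abs(f - a))

def theil_u1(a, f):
    # RMSE over the sum of the RMS of the predictions and the RMS of the actuals
    return rmse(a, f) / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(a ** 2)))

def theil_u2(a, f):
    # RMSE over the RMS of the actual values
    return rmse(a, f) / np.sqrt(np.mean(a ** 2))

def r_squared(a, f):
    ss_res = np.sum((a - f) ** 2)
    ss_tot = np.sum((a - np.mean(a)) ** 2)
    return 1.0 - ss_res / ss_tot

actual = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])
scores = (rmse(actual, pred), mae(actual, pred),
          theil_u1(actual, pred), theil_u2(actual, pred),
          r_squared(actual, pred))
```

Model selection then compares RMSE first (to four decimal places), falling back to MAE, U1, U2 and finally R² to break ties.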

IV. EXPERIMENTS AND RESULTS

A. EXPERIMENTAL DESIGN AND PARAMETER SETTINGS
In this study, to guarantee fairness, all experiments, including Experiment I and Experiment II, were executed on the same computer configuration, shown in Table 3. After completing the environment configuration and introducing the neural network algorithms, the experiment is divided into two stages to solve the related research problems (solving the vanishing-gradient problem; determining the best wavelet basis function and its decomposition and reconstruction method).
1. Experiment I: The five neural networks BPNN, ELM, LSTM, SAE LSTM and SAE Bi LSTM are compared horizontally, and the algorithm with the lowest prediction error is selected. Long short-term memory is combined with the autoencoder model to address the common vanishing-gradient problem in time series prediction.
2. Experiment II: To process discrete signals, a discrete wavelet transform method is adopted: wavelet decomposition highlights the characteristics of the signal before the neural network is trained [20]. A total of 2 sets of 80 experiments were conducted. Finally, the optimal wavelet basis function and the optimal decomposition and reconstruction layers and orders are obtained.
In addition to keeping the hardware and software platforms unchanged during the experiment, the parameters of the model code must also be considered; the main parameters of a model are the key factors that determine its performance [27], [28]. In commonly used time series forecasting models, the data are often divided into a training set, a validation set and a test set at 65%, 15% and 20% [28].
Since this experiment involves multiple horizontal comparisons (that is, the single experimental variable is the neural network type), no parameter adjustment is performed during the investigation. In the absence of over-fitting, the data in Experiment I were divided into a training set and a test set at 80% and 20%. The length of the sliding time window is 20. The specific parameter settings of the model are shown in Table 4.

B. EXPERIMENT I: HORIZONTAL COMPARISON OF NEURAL NETWORK MODELS

The evaluation indicators for the training set and the test set are shown in Table 5, and Figure 9 shows the values of the five training models on the training and test sets under different statistical error criteria. In Table 5, bold black text marks the group with the smallest prediction error on the training set, and bold red text the group with the smallest prediction error on the test set. In the 9th CHINARE voyage data, the BPNN network has the smallest training-set error: MAE = 0.06109 volt, RMSE = 0.2135 volt, U1 = 0.01965, U2 = 0.03931. The SAE Bi LSTM prediction error indexes on the test set are MAE = 0.0261 volt, RMSE = 0.1322 volt, U1 = 0.02029, U2 = 0.04037. In the MOSAIC voyage data, the LSTM network has the smallest training-set error: MAE = 0.2516 volt, RMSE = 0.4089 volt, U1 = 0.1018, U2 = 0.2020. The SAE Bi LSTM prediction error indexes on the test set are MAE = 0.2434 volt, RMSE = 0.3901 volt, U1 = 0.3157, U2 = 0.5954. Comparing the test sets in Figure 9, it is obvious that SAE Bi LSTM performs excellently under the various evaluation standards.
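The 80/20 chronological split and the sliding time window of length 20 used in Experiment I can be sketched as follows; the sine series stands in for the real voltage data:

```python
import numpy as np

def make_windows(series, window=20):
    """Turn a 1-D series into (input window, next value) supervised pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def split_80_20(X, y):
    """Chronological 80/20 split into training and test sets (no shuffling)."""
    cut = int(0.8 * len(X))
    return X[:cut], y[:cut], X[cut:], y[cut:]

series = np.sin(np.linspace(0, 20, 500))  # stand-in for the voltage series
X, y = make_windows(series, window=20)
X_tr, y_tr, X_te, y_te = split_80_20(X, y)
```

Keeping the split chronological (rather than shuffled) prevents future values from leaking into the training set of a time-series forecaster.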

C. EXPERIMENT II: BUILD A NEURAL NETWORK MODEL WITH WAVELET TRANSFORM
In Experiment II, the selection of the optimal wavelet basis function and of the optimal number of wavelet decomposition and reconstruction layers and orders is presented [29]. The basis functions used in the experiment are all discrete wavelet transform bases [30], [31]. Experiment II can be divided into two groups.
The experimental data is displayed in tabular form. Due to a large number of experimental groups, we chose to compare their prediction error indicators as the basis for judgment. The discriminant method is the same as when selecting the model in Experiment I.
All experiments were trained with the same software on the same computer, with identical CPU resources. The purpose of investigating four sets of data is to test the generalization ability of the model and make it suitable for analysing similar situations in this field.
In the tables, blue data mark the prediction error of the basis function with the smallest prediction error for a given number of decomposition layers; red data mark the experiment with the smallest prediction error among the 64 groups of experiments on the same input data.
As shown in Table 6, the comparison proceeds as follows: first compare the RMSE; if the four digits after the decimal point are equal, compare the MAE; if those are equal, compare U1 and then U2. The evaluation indicators of one-layer db5, two-layer db1, three-layer db2, four-layer db4 and five-layer db7 in the table are smaller than those of the other wavelet basis functions at the same decomposition level. After comparing the evaluation criteria within each level, the optimal wavelet decomposition method for the input sequence is selected: among the five methods (one-layer db5, two-layer db1, three-layer db2, four-layer db4 and five-layer db7), four-layer db4 has the smallest prediction error. Combining this with Table 5 from Experiment I, the SAE Bi LSTM error indexes there (MAE = 0.0261 volt, RMSE = 0.1322 volt, U1 = 0.02029, U2 = 0.04037) are higher than the prediction results of Experiment II (1) by 0.00389 volt, 0.0486947 volt, 0.002791 and 0.00519, respectively.
As shown in Table 7, the same comparison applies: first compare the RMSE; if the four digits after the decimal point are equal, compare the MAE; if those are equal, compare U1 and then U2. The evaluation indicators of one-layer db5, two-layer db1, three-layer db2, four-layer db4 and five-layer db7 in the table are smaller than those of the other wavelet basis functions at the same decomposition level (except that one-layer db4 and one-layer db5 are inconsistent with the other evaluation standards in the comparison of the U_2 parameter, and two-layer db1 and two-layer db2 are inconsistent with the other evaluation standards in the comparison of the U_1 parameter; these cases are judged mainly by the RMSE standard). After comparing the evaluation criteria within each level, the optimal wavelet decomposition method for the input sequence is selected: among the five methods, four-layer db4 again has the smallest prediction error. Combining this with Table 5 from Experiment I, the SAE Bi LSTM error indexes there (MAE = 0.0261, RMSE = 0.1322, U1 = 0.02029, U2 = 0.04037) are higher than the prediction results of Experiment II (2) by 0.006066, 0.048103, 0.0046193 and 0.00611104, respectively. Tables 8 and 9 report the data collected by the buoys deployed on the MOSAIC voyage: Table 8 records the prediction errors of the wavelet transform contrast experiment with C.A._1 (MOSAIC) as input, that is, Experiment II (3), and Table 9 those with C.A._2 (MOSAIC) as input, that is, Experiment II (4). After comparing the C.A. data of the MOSAIC segment, the best prediction accuracy is obtained after decomposing and reconstructing in db4 mode.
The analysis process is the same as for Tables 6 and 7 and will not be repeated here.

D. DISCUSSIONS
The accurate prediction of environmental parameters is of great significance to many observation activities on the A.O.
Climate prediction in the Arctic region can provide strong support for local scientific research activities. Since year-round parameters from the sensor body are rarely obtained, this work is also of great significance for equipment evaluation. The method of combining new machine learning models to predict the parameters of the chlorophyll sensor body is likewise proposed for the first time in this field, providing a new avenue for the establishment of climate models. In this article, several advanced prediction models (such as Bi LSTM and SAE LSTM) are introduced into the experimental process, and the best prediction model is selected through the established evaluation indicators (such as RMSE).
We offer the following discussion. (1) In Experiment I, we found that among the predictions of the five neural network models, the data collected by the buoys deployed by the 9th CHINARE and MOSAIC are predicted best by the SAE Bi LSTM model. However, the running times of the five models were not compared: Experiment I only screened for the model with the smallest prediction error. This discussion therefore compares the training time of each algorithm in Experiment I to reflect the complexity of each model. Across the different sensor time series, the prediction accuracy differs greatly between voyages owing to differences in the distribution of regional seawater components. The coefficient of determination (R-squared) is introduced as a statistical test parameter to evaluate the final results: an R-squared greater than 0.85 meets the requirements, and the closer it is to 1, the better the prediction. Figure 10 shows the statistical test of the model prediction results; the coefficient of determination and the slope of the fitted line in the prediction scatter plot are commonly used parameters, and the closer both are to 1, the better the prediction. From Figure 10(a)-(j) we observe: (1) because the 9th CHINARE voyage data are more complete (the period includes both the freezing and the melting of sea ice), the R-squared values in Fig. 10(a)-(e) are closer to 1 than those in Fig. 10(f)-(j), and the slopes of the fitted lines are closer to 1; this shows that the more complete the data and the larger the input data volume, the better the prediction. (2) In the horizontal comparison within each voyage, the determination coefficient R² of the SAE Bi LSTM model is closer to 1 than those of the other four models.
The slope of its fitting function is also closer to 1, showing that the prediction error of SAE Bi LSTM is lower than that of the other four models. Looking at the complexity of a single iteration, the complexity of the forward/backward propagation of a neural network is proportional to the number of parameters, that is, O(N) under Big-O analysis, where N is the input data size. Usually this kind of efficiency analysis publishes the code operating environment and the corresponding hardware and uses the actual running time of each model algorithm as the criterion. The SAE Bi LSTM model runs longer than the other models, but its prediction accuracy is significantly better than LSTM's. Therefore, improving the prediction accuracy of the algorithm inevitably increases its complexity and training time.
Besides, this research also needs to consider the data size suitable for the algorithm. Since the time interval of the analysed series is the standard collection frequency (once an hour), the data volume does not vary much between sites. If a much broader data set were used, the neural network model might exhibit higher computational redundancy and longer running times. The analogue voltage values of the chlorophyll sensor differ between areas, so the empirical values obtained on the training set apply only to the data points of the training set. The main reasons for not training on a large-scale database are: 1. there are insufficient samples in the data set; 2. the running time of the model would be too long, which can lead to phenomena such as overfitting. Among existing methods, remote sensing is an effective way to predict the concentration of chlorophyll in seawater; this study cannot achieve the same breadth, but it can serve as a calibration point providing data services. Thus, prediction accuracy is critical for ocean buoy sensor observations.

V. CONCLUSION
The purpose of this research is to obtain the best sensor parameter prediction model by combining neural networks with a wavelet transform model. The following conclusions are drawn from the experimental process and results in Section IV: (1) In the algorithm training experiments, across all error indicators, the prediction accuracy of SAE Bi LSTM is higher than that of the other four neural networks. Combining SAE with the machine learning model alleviates the vanishing-gradient problem in the time series prediction process.
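The bidirectional part of the model can be illustrated with a minimal NumPy forward pass. This is a sketch only: the weights are random and the sizes (one input channel, 8 hidden units, 24 time steps) are illustrative, not the configuration used in the paper. It shows how forward-in-time and backward-in-time hidden states are concatenated at each time step:

```python
import numpy as np

def lstm_forward(x, Wx, Wh, b):
    """Run one LSTM over x of shape (T, d_in); return hidden states (T, h)."""
    T, _ = x.shape
    h_dim = Wh.shape[0]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    out = np.zeros((T, h_dim))
    for t in range(T):
        gates = x[t] @ Wx + h @ Wh + b        # all four gates at once, (4h,)
        i, f, g, o = np.split(gates, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)  # cell state update
        h = sig(o) * np.tanh(c)               # hidden state
        out[t] = h
    return out

def bilstm_forward(x, params_fwd, params_bwd):
    """Concatenate a forward pass and a time-reversed backward pass."""
    fwd = lstm_forward(x, *params_fwd)
    bwd = lstm_forward(x[::-1], *params_bwd)[::-1]
    return np.concatenate([fwd, bwd], axis=1)  # (T, 2h)

rng = np.random.default_rng(0)
d_in, h_units, T = 1, 8, 24                    # illustrative sizes
make = lambda: (rng.normal(0, 0.1, (d_in, 4 * h_units)),
                rng.normal(0, 0.1, (h_units, 4 * h_units)),
                np.zeros(4 * h_units))
y = bilstm_forward(rng.normal(size=(T, d_in)), make(), make())
print(y.shape)  # (24, 16)
```

Because each output step sees both past and future context, the bidirectional model can exploit temporal structure that a one-directional LSTM misses, at roughly twice the computational cost.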
(2) When predicting the interannual chlorophyll sensor parameters at different depths in the same area, the discrete wavelet transform method is used. The optimal number of wavelet decomposition and reconstruction layers, and the optimal order of wavelet basis within a five-layer decomposition, are obtained. In this way, the machine learning model captures the characteristics of the discrete time series and produces more accurate prediction results.
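The decompose-then-reconstruct step can be sketched as follows. This uses the simple Haar basis on synthetic data, not the fourth-order basis or buoy series from the paper; it shows the four-level decomposition into one approximation plus detail sequences, and the exact inverse reconstruction:

```python
import numpy as np

SQ2 = np.sqrt(2.0)

def haar_step(x):
    """One DWT level: pairwise averages (approximation) and
    pairwise differences (detail), each half the input length."""
    return (x[0::2] + x[1::2]) / SQ2, (x[0::2] - x[1::2]) / SQ2

def haar_inverse(a, d):
    """Invert one level by interleaving the reconstructed samples."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / SQ2
    x[1::2] = (a - d) / SQ2
    return x

def decompose(x, levels):
    details = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details

def reconstruct(a, details):
    for d in reversed(details):
        a = haar_inverse(a, d)
    return a

# Synthetic series of length 2**5 so that four levels divide evenly.
x = np.sin(np.linspace(0, 4 * np.pi, 32)) + 0.1
approx, details = decompose(x, levels=4)
x_hat = reconstruct(approx, details)
print(np.allclose(x, x_hat))  # True
```

In the prediction pipeline, the model would be trained on the smoother approximation and detail sequences rather than the raw signal; the paper's choice of basis order and layer count is exactly a search over which such split best exposes the series' structure.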
The introduction of artificial intelligence algorithms for ocean sensors can provide more solutions in this field. In future research, if more sample parameters (such as wind speed and air pressure) are introduced to form a multiple-input single-output neural network, more accurate predictions and evaluations can be obtained.

He is currently a Senior Engineer of polar geophysical observation and research on cryosphere changes with the Polar Research Institute of China (PRIC). His research interests include applying a variety of geophysical techniques to observe the characteristics of the polar cryosphere, including polar oceans, sea ice, ice sheets, ice shelves, and subglacial geological structures.

ZHE YANG was born in Linfen, Shanxi, China, in 1995. He received the B.S. degree in computer science from the Taiyuan University of Technology (TUT), Shanxi, in 2017. He is currently pursuing the M.Eng. degree with the School of Computer Science, The University of Manchester, Manchester, U.K. He is currently working on swarm intelligence algorithms and machine learning.

He studied with the School of Electrical and Power Engineering, Taiyuan University of Technology, in 2019, and engaged in scientific research and work during that period. He is committed to the processing and simulation of high-frequency signals of ice-detection radars at the north and south poles, and to the hardware design of FMCW ice-detection radars.

RUINA SUN is currently an Associate Professor.

YANZHAO HAO was born in Renqiu, Hebei, in 1995. He received the B.S. degree in computer science from the Taiyuan University of Technology (TUT), Shanxi, China, in 2018, where he is currently pursuing the master's degree in electrical engineering.

JIANLONG LIU was born in Harbin, Heilongjiang, China, in 1996. He received the Bachelor of Science degree in electrical engineering from Northeast Agricultural University (NEAU), Harbin, in 2018. He is currently pursuing the master's degree in electrical engineering with the Taiyuan University of Technology (TUT).