Soil Temperature Prediction Using Convolutional Neural Network Based on Ensemble Empirical Mode Decomposition

Soil temperature plays an important role in agriculture, industry and other fields. Accurate soil temperature prediction can help improve productivity and avoid risks in many fields. At present, many machine learning methods have been applied to soil temperature prediction, such as support vector regression (SVR), artificial neural networks (ANN) and long short-term memory neural networks (LSTM). In this article, we propose a machine learning model called convolutional neural network based on ensemble empirical mode decomposition (EEMD-CNN) to predict soil temperature. In this model, ensemble empirical mode decomposition (EEMD) is used to decompose the original soil temperature series into several intrinsic mode functions (IMFs). After decomposition, the original series is combined with the IMFs to form two-dimensional input data for the convolutional neural network (CNN). We compare the predictions of the trained model with the observed soil temperature series and with four other models: persistence forecast (PF), backpropagation neural network (BPNN), LSTM and EEMD-LSTM. The results show that EEMD-CNN performs better than the other four models. EEMD-CNN performs well not only in predicting the next day's soil temperature but also in predicting the temperature several days ahead. We conclude that the proposed EEMD-CNN model is a suitable tool for soil temperature prediction.


I. INTRODUCTION
Soil temperature plays an important role in physical, chemical, and biological processes. It is relevant to soil science, agriculture, hydrology, meteorology, environmental science and many other research fields [1], [2]. Soil temperature affects the balance of heat energy between the atmosphere and the land surface [3]. It also influences several pivotal processes in soil, such as soil ventilation, evaporation and transpiration, root development and plant growth, and the activity of microorganisms. Soil temperature plays an essential role in agriculture because seeds germinate only within a certain temperature range; within that range, higher soil temperature increases the growth rate of crops [4]. Soil temperature is also an important factor in other fields such as water resources and hydrologic engineering. Moreover, in atmospheric science, changes in soil temperature have obvious effects on the decomposition of organic matter, which increases the release of carbon dioxide (CO2) to the atmosphere [5]. Therefore, a model that can forecast soil temperature accurately is widely needed. Soil temperature forecasts at different depths serve different purposes. For example, Zeynoddin et al. [6] pointed out that the soil temperature at 5 cm depth is significant for seed germination, the soil temperature at 10 cm depth influences the activities of most vital ecosystems, and the soil temperature at 20 cm depth has a more important impact on root absorption activities.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongming Li.
There are two common ways to obtain soil temperature: measuring it directly, or estimating it indirectly with a prediction model. Direct measurement is achieved by inserting high-precision temperature sensors into the soil. Indirect estimation can use many kinds of prediction models trained on historical values at any depth and time. Compared with direct measurement, a predictive model can provide future soil temperature in advance, even though its predictions deviate somewhat from the real values. If the model's error is small enough, the predicted temperature can be used as a guide. In recent years, many prediction models aimed at accurate soil temperature forecasting have been proposed. These approaches can be divided into two groups: statistical methods and machine learning models. Different models use different environmental factors as inputs, such as air temperature, solar radiation, relative humidity, atmospheric pressure, wind speed and other surface characteristics [7], [8]. Statistical models, such as the Box-Jenkins models [9], are widely used to model time series and predict their subsequent values; they use historical temperature series to predict future soil temperature. The most commonly used statistical models are the Auto-Regressive Moving Average (ARMA) [10] model and the Auto-Regressive Integrated Moving Average (ARIMA) model. However, ARMA assumes a stationary series, and temperature series are generally not stationary. Even after the data has been differenced, as in ARIMA, these two models are not suitable for long-term soil temperature forecasting.
In recent years, the most widely used methods for forecasting soil temperature have been machine learning models, including support vector regression (SVR) [11]- [13], random forest (RF) [14], [15], artificial neural networks (ANN) [16]- [20] and others [21]- [24]. SVR is the regression version of the support vector machine (SVM); it is one of the most important function approximation methods and is widely used for temperature prediction. RF is a non-linear statistical ensemble method proposed by Breiman. It uses "bagging" to combine a collection of decision trees with controlled variance into a prediction model. Gradient boosting decision tree (GBDT) [25] is similar to RF, but each GBDT tree is fit to the residuals of the preceding trees to reduce bias rather than variance. eXtreme gradient boosting (XGBoost) [26] is another tree boosting system that often performs better than GBDT and has succeeded in many prediction tasks. Among these models, different ANN structures have been widely used to predict soil temperature, such as backpropagation neural networks (BPNN) [27], multilayer perceptrons (MLP), radial basis neural networks (RBNN), extreme learning machines (ELM) and other neural networks. Feng et al. [28] used a generalized regression neural network (GRNN), BPNN, RF and ELM to predict half-hourly soil temperature at four depths in China. The recurrent neural network (RNN) [29] is a neural sequence model that performs well on time series data [30], [31]. However, an RNN struggles to retain information over long sequences. As an improvement of the RNN, the long short-term memory neural network (LSTM) [32]- [36] was proposed.
Because LSTM can learn long-term dependencies while largely avoiding the vanishing gradient problem, it is widely used for time series forecasting and is especially suitable for longer series. Beyond these models, several ensemble approaches have been proposed and have succeeded in time series prediction [37]- [39]. Zhang et al. [40] proposed a model, called EEMD-LSTM, that adds ensemble empirical mode decomposition (EEMD) to LSTM. With the help of EEMD, EEMD-LSTM performed better than LSTM in forecasting soil temperature.
In this article, we propose a model that combines EEMD with convolutional neural networks (CNN), called EEMD-CNN, to forecast soil temperature from temperature series alone. This addresses the problem of accurately predicting soil temperature when other environmental factors are incomplete or missing. CNNs are usually used for image classification, speech recognition, natural language processing and other tasks, and their inputs are usually two-dimensional or three-dimensional. A one-dimensional series is therefore not directly suitable as CNN input. EEMD-CNN solves this input-dimension problem, and its convolution kernels can extract more subtle changes in the data. To evaluate EEMD-CNN, we apply BPNN, LSTM and EEMD-LSTM, machine learning models that are also appropriate for temperature series, and compare them with our model. Compared with BPNN and LSTM, the input of EEMD-CNN includes not only the original soil temperature series but also the IMFs obtained by EEMD. EEMD-CNN and EEMD-LSTM differ in the structure of the neural network.
Besides these models, we also use persistence forecast (PF) [41], the simplest possible forecasting method, in which today's soil temperature is used as tomorrow's forecast. For a more comprehensive comparison, we use data from three areas, each with soil temperature series at three different depths. For each depth, the models forecast the soil temperature with one-day, three-day and five-day delays. Mean squared error (MSE), root mean square error (RMSE), mean absolute error (MAE) and r-squared (r2-score) are used as evaluation metrics. The objectives of our study are: (1) to use the evaluation metrics to comprehensively evaluate the prediction results of the four machine learning models, (2) to compare the performance of these models, and (3) to demonstrate the superiority of EEMD-CNN when using soil temperature series to predict future soil temperature. The rest of the paper is organized as follows. Section II describes the study area and introduces EEMD-CNN, the models used for comparison, and the performance criteria. Section III presents the results and discussion. Section IV gives the conclusion.

A. STUDY AREA
The study areas of this article are Lägern (47° …), Oensingen and Fluhli. The experimental data set was downloaded from FLUXNET (https://fluxnet.fluxdata.org/) at a daily time scale to test the models on predicting soil temperature in these three areas. Details of the data are given in Table 1.

B. CONVOLUTIONAL NEURAL NETWORK BASED ON ENSEMBLE EMPIRICAL MODE DECOMPOSITION (EEMD-CNN)
1) ENSEMBLE EMPIRICAL MODE DECOMPOSITION (EEMD)
Empirical mode decomposition (EMD) was proposed by Huang et al. [42], [43] to adaptively decompose a complex signal into a series of intrinsic mode functions (IMFs) according to the signal's characteristics. The decomposed IMFs contain local characteristic signals of the original signal at different time scales. The essence of EMD is to identify all the intrinsic oscillatory modes contained in the signal through its characteristic time scales. EMD is self-adaptive because it is based on the local time-scale characteristics of the signal.
To extract the IMFs from the original signal, EMD proceeds as follows:
1. Find all the local extreme points (maxima and minima) of the signal $x(t)$.
2. Use cubic spline interpolation to fit the upper envelope $e_{\max}(t)$ through the maxima and the lower envelope $e_{\min}(t)$ through the minima, compute their mean $m(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2}$, and subtract it from the signal: $h(t) = x(t) - m(t)$.
3. Judge whether $h(t)$ is an IMF according to the preset criteria.
4. If not, replace $x(t)$ with $h(t)$ and repeat the above steps until $h(t)$ satisfies the criteria; $h(t)$ is then the IMF $C_k(t)$.
5. Each time an IMF is obtained, subtract it from the signal being decomposed and repeat the above steps on the remainder until the final remainder $r_n(t)$ is a monotone or constant sequence.
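The sifting procedure above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this study: it assumes NumPy and SciPy, the helper names `envelope_mean`, `sift` and `emd` are hypothetical, the envelopes are pinned to the two endpoint values (one simple, assumed way of handling endpoint effects), and the "preset criterion" is assumed to be a variant of Huang's standard-deviation test with a threshold of 0.2.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def envelope_mean(x):
    """Return m(t), the mean of the cubic-spline envelopes e_max(t) and e_min(t).

    Returns None when x has fewer than two interior maxima or minima.
    """
    t = np.arange(len(x))
    # Local extrema; the two endpoints are added as knots (an assumed boundary rule).
    max_idx = np.r_[0, argrelextrema(x, np.greater)[0], len(x) - 1]
    min_idx = np.r_[0, argrelextrema(x, np.less)[0], len(x) - 1]
    if len(max_idx) < 4 or len(min_idx) < 4:
        return None
    e_max = CubicSpline(max_idx, x[max_idx])(t)
    e_min = CubicSpline(min_idx, x[min_idx])(t)
    return (e_max + e_min) / 2.0


def sift(x, sd_threshold=0.2, max_iter=50):
    """Steps 1-4: subtract m(t) repeatedly until h(t) passes the stopping test."""
    h = x.copy()
    for _ in range(max_iter):
        m = envelope_mean(h)
        if m is None:
            break
        h_next = h - m                      # h(t) = x(t) - m(t)
        # Assumed criterion: relative change between successive sifting results.
        sd = np.sum((h - h_next) ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h_next
        if sd < sd_threshold:
            break
    return h


def emd(x, max_imfs=10):
    """Step 5: peel off IMFs C_k(t) until too few extrema remain in r_n(t)."""
    remainder = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(remainder) is None:  # remainder is (near) monotone: stop
            break
        imf = sift(remainder)
        imfs.append(imf)
        remainder = remainder - imf
    return np.array(imfs), remainder


# Toy signal: two oscillations plus a linear trend.
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + t
imfs, residual = emd(x)
print(imfs.shape)
print(np.allclose(imfs.sum(axis=0) + residual, x))  # True: x(t) = sum of IMFs + residual
```

Because each IMF is subtracted from the running remainder, the decomposition is exactly additive by construction; the quality of the individual IMFs depends on the boundary rule and stopping criterion chosen.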
In this way, EMD decomposes the original signal $x(t)$ into $n$ IMFs and a residual:
$$x(t) = \sum_{k=1}^{n} C_k(t) + r_n(t)$$
EMD has many advantages, but it also has shortcomings, such as mode mixing and endpoint effects. Endpoint effects arise because the spline envelopes are poorly constrained near the ends of the signal, so different ways of handling the endpoints lead to different decomposition results. Mode mixing, in which oscillations of very different time scales appear in the same IMF, may cause serious aliasing in the time-frequency distribution and degrade the decomposition accuracy. Building on EMD, ensemble empirical mode decomposition (EEMD) [44] was proposed. EEMD exploits the uniform spectral distribution of white noise: white noise is added to the signal being analyzed so that components of different time scales are automatically separated onto the reference scales that suit them. By supplementing missing scales with the added noise, EEMD performs well in signal decomposition. EEMD can be described as follows:
1. Add normally distributed white noise to the original signal.
2. Treat the noisy signal as a whole and use EMD to decompose it into IMFs.
3. Repeat steps 1 and 2, adding a different normally distributed white noise sequence to the original signal each time.
4. Average the corresponding IMFs over all repetitions to obtain the final result.
Compared with the Fourier transform, wavelet decomposition and other methods, EEMD is intuitive, direct, a posteriori and adaptive. As an improvement of EMD, EEMD can effectively mitigate the mode mixing and endpoint problems of EMD. Because the added white noise has zero mean, its contributions cancel out as more noisy trials are averaged, so the averaged IMFs can be taken directly as the decomposition result. Therefore, EEMD is chosen to decompose the raw time series.
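The four EEMD steps can be sketched on top of a simplified EMD as follows. This is an illustrative sketch, not the configuration used in this study: the noise amplitude (0.2 times the standard deviation of the signal), the number of trials (50), the fixed number of sifting iterations (10) and the fixed number of IMFs (6, chosen to match the IMF1 to IMF6 used by EEMD-CNN) are all assumed values, and the function names are hypothetical. Fixing the number of IMFs per trial ensures that IMFs from different trials can be averaged row by row.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def _envelope_mean(x):
    """Mean of the upper and lower cubic-spline envelopes, or None if too few extrema."""
    t = np.arange(len(x))
    hi = np.r_[0, argrelextrema(x, np.greater)[0], len(x) - 1]
    lo = np.r_[0, argrelextrema(x, np.less)[0], len(x) - 1]
    if len(hi) < 4 or len(lo) < 4:
        return None
    return (CubicSpline(hi, x[hi])(t) + CubicSpline(lo, x[lo])(t)) / 2.0


def _emd_fixed(x, n_imfs, n_sift=10):
    """Simplified EMD that always returns n_imfs rows (zeros once extrema run out)."""
    residual = x.copy()
    imfs = np.zeros((n_imfs, len(x)))
    for k in range(n_imfs):
        if _envelope_mean(residual) is None:
            break
        h = residual.copy()
        for _ in range(n_sift):               # fixed sifting count (assumed criterion)
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs[k] = h
        residual = residual - h
    return imfs, residual


def eemd(x, n_imfs=6, n_trials=50, noise_ratio=0.2, seed=0):
    """Ensemble EMD: average the IMFs of many noise-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma = noise_ratio * np.std(x)
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(n_trials):                                 # step 3: repeat
        noisy = x + rng.normal(0.0, sigma, size=len(x))       # step 1: add white noise
        imfs, res = _emd_fixed(noisy, n_imfs)                 # step 2: EMD of noisy signal
        imf_sum += imfs
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials             # step 4: ensemble average


# Synthetic daily series standing in for soil temperature: annual cycle + weekly wiggle.
days = np.arange(730)
series = 10 + 8 * np.sin(2 * np.pi * days / 365) + np.sin(2 * np.pi * days / 7)
imfs, residual = eemd(series)
print(imfs.shape)  # (6, 730)
# The averaged components reconstruct the series up to the averaged noise,
# whose standard deviation shrinks roughly like sigma / sqrt(n_trials).
print(np.std(imfs.sum(axis=0) + residual - series))
```

The last line is a numerical check of the cancellation argument above: each trial reconstructs its own noisy copy exactly, so the only reconstruction error left after averaging is the mean of the added noise.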

2) CONVOLUTIONAL NEURAL NETWORK (CNN)
CNN [45]- [48] is a kind of feedforward neural network that contains convolution operations and is one of the representative algorithms of deep learning. The hidden layers of a CNN usually consist of convolutional layers and pooling layers, which lets a CNN make good use of input data with a two-dimensional or three-dimensional structure. When such data is fed into a CNN, each convolutional layer (without padding) reduces the height and width of its input while increasing the number of channels, thereby extracting features from the original data. The resulting three-dimensional feature map is then flattened into a one-dimensional vector, which passes through fully connected layers to produce the output of the CNN. CNN models usually contain several pooling layers to reduce dimensionality, computation and memory consumption. However, the input data in our model is small and simple, so no pooling layer is used. CNNs are often used for image recognition, image classification and natural language processing.
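To make these shape changes concrete, the following NumPy sketch runs one forward pass of a small CNN without pooling on a single 7 × 7 input. The layer sizes (two 3 × 3 convolutional layers with 16 and 32 channels, ReLU activations, one fully connected output) and the random weights are assumptions for illustration only; the actual architecture is not specified here, and training is omitted.

```python
import numpy as np


def conv2d_valid(x, kernels, bias):
    """'Valid' 2-D convolution (cross-correlation, as in most deep learning libraries).

    x: (H, W, C_in), kernels: (kH, kW, C_in, C_out), bias: (C_out,)
    Returns an array of shape (H - kH + 1, W - kW + 1, C_out).
    """
    H, W, _ = x.shape
    kH, kW, _, c_out = kernels.shape
    out = np.empty((H - kH + 1, W - kW + 1, c_out))
    for i in range(H - kH + 1):
        for j in range(W - kW + 1):
            patch = x[i:i + kH, j:j + kW, :]                 # local receptive field
            out[i, j] = np.tensordot(patch, kernels, axes=3) + bias
    return out


def relu(z):
    return np.maximum(z, 0.0)


def forward(x, params):
    """Two convolutional layers (no pooling), flatten, one fully connected output."""
    h = relu(conv2d_valid(x, params["k1"], params["b1"]))   # 7x7x1  -> 5x5x16
    h = relu(conv2d_valid(h, params["k2"], params["b2"]))   # 5x5x16 -> 3x3x32
    flat = h.reshape(-1)                                    # 3*3*32 = 288 values
    return flat @ params["w_fc"] + params["b_fc"]           # one predicted temperature


rng = np.random.default_rng(0)
params = {
    "k1": rng.normal(0.0, 0.1, (3, 3, 1, 16)), "b1": np.zeros(16),
    "k2": rng.normal(0.0, 0.1, (3, 3, 16, 32)), "b2": np.zeros(32),
    "w_fc": rng.normal(0.0, 0.1, 288), "b_fc": 0.0,
}
sample = rng.normal(size=(7, 7, 1))    # one 7 x 7 input with a single channel
print(conv2d_valid(sample, params["k1"], params["b1"]).shape)  # (5, 5, 16)
print(forward(sample, params))
```

Each valid 3 × 3 convolution trims one row and one column from every border (7 → 5 → 3) while the channel count grows (1 → 16 → 32); the 3 × 3 × 32 map is then flattened to 288 values before the fully connected layer.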

3) EEMD-CNN
In this study, EEMD-CNN is the method we propose to predict soil temperature. A temperature series is one-dimensional and therefore cannot be used directly as the input of a two-dimensional convolutional neural network. However, the nonstationary and nonlinear temperature series can be decomposed by EEMD into several IMFs and a residue. The IMFs can be regarded as a supplement to the original one-dimensional data: the original data is extended with the IMFs so that each two-dimensional input has the same number of rows and columns. The workflow of the EEMD-CNN model is shown in Figure 2. The EEMD-CNN model is trained as follows:
1. Use EEMD to decompose the soil temperature series into several IMFs and a residue.
2. Combine a sequence of 7 consecutive daily temperatures with IMF1 to IMF6 over the same 7 days. The resulting two-dimensional 7 × 7 array is used as the new input data.
3. Use the temperature of the day following the 7-day sequence as the target to train the EEMD-CNN model.
4. Feed the test set into the trained model and compare the predicted series with the original temperature series. Evaluate the performance of the EEMD-CNN model using several statistical evaluation metrics.
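Steps 2 and 3 amount to a sliding-window transformation. The sketch below assumes the IMFs have already been computed (for example with EEMD) and are stored as a NumPy array with IMF1 to IMF6 as rows. The function name `make_samples` and the `horizon` parameter, which covers the three-day and five-day delays evaluated later, are hypothetical; placing the original series in the first row is also an assumption, since the row order is not specified here.

```python
import numpy as np


def make_samples(series, imfs, window=7, horizon=1):
    """Turn a series and its IMFs into 2-D EEMD-CNN inputs and targets.

    series: shape (T,), the original daily temperature series
    imfs:   shape (n_imfs, T), IMF1..IMF6 aligned with the series
    Returns X with shape (N, 1 + n_imfs, window) and y with shape (N,), where
    y[n] is the temperature `horizon` days after the last day of window n.
    """
    series = np.asarray(series, dtype=float)
    features = np.vstack([series[np.newaxis, :], imfs])   # row 0: original, rows 1..: IMFs
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window                               # window covers days start..end-1
        X.append(features[:, start:end])
        y.append(series[end - 1 + horizon])
    return np.array(X), np.array(y)


# Usage with placeholder data: a 20-day ramp and all-zero IMFs.
series = np.arange(20.0)
imfs = np.zeros((6, 20))
X, y = make_samples(series, imfs, horizon=1)
print(X.shape, y[0])    # (13, 7, 7) 7.0
X5, y5 = make_samples(series, imfs, horizon=5)
print(X5.shape, y5[0])  # (9, 7, 7) 11.0
```

With six IMFs and a 7-day window, each sample is the 7 × 7 array described in step 2; a longer horizon simply shifts the target and yields fewer samples.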

C. COMPARATIVE EXPERIMENTAL METHOD
1) PERSISTENCE FORECAST (PF)
In this study, the temperature series have a daily time step. PF [41] is a simple forecast method that uses the current day's temperature as the forecast for the next day. Since the temperature difference between two consecutive days is usually small, it is feasible to predict the next day's temperature directly from that of the previous day. PF also has the lowest computational cost of all prediction methods. If a machine learning method performs worse than the persistence forecast, it would be better to use today's temperature to predict tomorrow's than to use that method. PF can therefore be regarded as the minimum standard for evaluating the feasibility of machine learning methods: any soil temperature prediction model needs to beat PF.
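A minimal pure-Python sketch of this baseline and its RMSE follows. The function names are hypothetical, and the extension to a delay of k days (today's value used as the forecast for k days later) is an assumption about how PF is applied to the longer delays.

```python
import math


def persistence_forecast(series, horizon=1):
    """Return (forecasts, targets): the value on day t is the forecast for day t + horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    return list(series[:-horizon]), list(series[horizon:])


def rmse(targets, forecasts):
    """Root mean square error between paired targets and forecasts."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(targets, forecasts)) / len(targets))


temps = [10.0, 11.0, 13.0, 12.0]
forecasts, targets = persistence_forecast(temps, horizon=1)
print(forecasts)                   # [10.0, 11.0, 13.0]
print(rmse(targets, forecasts))    # 1.4142135623730951 (sqrt(2))
```

Any learned model evaluated on the same test period and horizon should produce a lower RMSE than this value to be worth its cost.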

2) BACKPROPAGATION NEURAL NETWORK (BPNN)
Artificial neural networks (ANN), which imitate the structure of the animal nervous system, are among the most widely applied machine learning models, and the backpropagation algorithm is commonly used to train them. An ANN trained by the backpropagation algorithm is called a BPNN. A BPNN usually consists of an input layer, one or more hidden layers and an output layer. Data enters at the input layer and passes through the hidden layers, and the predicted result is obtained at the output layer. The error between the output and the label is then propagated backward to update the weights of each neuron. In this study, BPNN uses the temperature series to predict future temperature.
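The forward pass and backpropagation update described above can be sketched in NumPy for a network with one hidden layer. This is a minimal sketch, not the configuration used in this study: the tanh activation, 16 hidden neurons, full-batch gradient descent with learning rate 0.02, the 7-day input window and the synthetic sine series are all assumptions.

```python
import numpy as np


def train_bpnn(X, y, n_hidden=16, lr=0.02, epochs=2000, seed=0):
    """Train a one-hidden-layer network by backpropagation (full-batch gradient descent).

    X: (N, d) input windows, y: (N,) targets. Returns the weights and the MSE history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), n_hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer (tanh) -> linear output neuron.
        a1 = np.tanh(X @ W1 + b1)                 # (N, n_hidden)
        y_hat = a1 @ W2 + b2                      # (N,)
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule from the output error back to every weight.
        g_out = 2.0 * err / n                     # dLoss / dy_hat
        g_W2 = a1.T @ g_out
        g_b2 = g_out.sum()
        g_z1 = np.outer(g_out, W2) * (1.0 - a1 ** 2)   # through the tanh derivative
        g_W1 = X.T @ g_z1
        g_b1 = g_z1.sum(axis=0)
        # Gradient-descent update of the weights.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


t = np.arange(400)
series = np.sin(2 * np.pi * t / 50)               # synthetic series, already in [-1, 1]
X = np.array([series[i:i + 7] for i in range(len(series) - 7)])  # 7-day windows
y = series[7:]                                    # next-day targets
params, losses = train_bpnn(X, y)
print(losses[0], losses[-1])
```

All gradients are computed before any weight is changed, so each update uses the same forward pass, which is what plain backpropagation with gradient descent prescribes.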

3) LONG SHORT-TERM MEMORY NEURAL NETWORK BASED ON ENSEMBLE EMPIRICAL MODE DECOMPOSITION (EEMD-LSTM)
The long short-term memory neural network (LSTM) is a kind of recurrent neural network (RNN) that has been used successfully in many fields, such as natural language processing, speech recognition, handwriting recognition and time series prediction. In an RNN, only the h-state (hidden state) is passed between two adjacent cells. An RNN is not suitable for long sequences because recent inputs affect the h-state more strongly while the influence of earlier inputs fades. LSTM solves this problem with an input gate, an output gate and a forget gate. These three gates are added to the LSTM cell, together with a c-state (cell state) that runs through the whole chain of cells. The c-state records long-term memory, and the three gates decide what is recorded into the long-term memory and what is delivered to the next cell as short-term memory in the h-state.
VOLUME 9, 2021
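The gate mechanism can be written out as one LSTM time step in NumPy, using the standard formulation (sigmoid gates, tanh candidate). This is an illustrative sketch with random, untrained weights; the hidden size of 8 and the stacked gate order are arbitrary choices. Feeding one temperature per day corresponds to a plain LSTM input, while feeding the original value plus IMF1 to IMF6 per day (7 features, assumed here to mirror EEMD-CNN) corresponds to an EEMD-LSTM input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t: (d,) input for one day; h_prev, c_prev: (n,) previous h-state and c-state.
    W: (4n, d), U: (4n, n), b: (4n,) stack the parameters of the input gate,
    forget gate, output gate and candidate update, in that (arbitrary) order.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:n])              # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])         # forget gate: how much old memory to keep
    o = sigmoid(z[2 * n:3 * n])     # output gate: how much memory to expose
    g = np.tanh(z[3 * n:])          # candidate values for the c-state
    c_t = f * c_prev + i * g        # c-state: long-term memory
    h_t = o * np.tanh(c_t)          # h-state: short-term output passed to the next cell
    return h_t, c_t


rng = np.random.default_rng(0)
d, n = 7, 8                         # 7 features per day (original + IMF1..IMF6), 8 units
W = rng.normal(0.0, 0.1, (4 * n, d))
U = rng.normal(0.0, 0.1, (4 * n, n))
b = np.zeros(4 * n)
window = rng.normal(size=(7, d))    # placeholder for 7 consecutive days of features
h, c = np.zeros(n), np.zeros(n)
for x_t in window:                  # unroll the cell over the 7 days
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)                      # (8,)
```

The additive update `c_t = f * c_prev + i * g` is what lets information survive many steps: when the forget gate is near 1 and the input gate near 0, the c-state passes through essentially unchanged.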
As in EEMD-CNN, the IMFs obtained by EEMD are combined with the original data and used as the inputs of EEMD-LSTM to predict future temperature.

D. MODEL TRAINING AND TEST
In this study, nine temperature series from three areas and three depths are used to train models that predict the temperature one day, three days and five days later. Each series is split into two parts: the first 80% of the data is used for training, and the remaining 20% is used as the test set.
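To evaluate the performance of EEMD-CNN and compare it with the other models, we use several statistical evaluation metrics: mean squared error (MSE), root mean square error (RMSE), mean absolute error (MAE) and r-squared (r2-score).
The evaluation metrics are defined as follows, where $y_i$ is the true value in the test set, $\hat{y}_i$ is the value predicted by the model, $\bar{y}$ is the mean of the true values and $n$ is the number of test samples:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$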
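The four metrics can be computed in a few lines of NumPy. The sketch below (function name hypothetical) follows the standard definitions of MSE, RMSE, MAE and the coefficient of determination R², which is also the quantity scikit-learn's `r2_score` computes.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for one test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "R2": float(1.0 - ss_res / ss_tot),
    }


print(evaluate([1, 2, 3, 4], [1, 2, 3, 5]))
# {'MSE': 0.25, 'RMSE': 0.5, 'MAE': 0.25, 'R2': 0.8}
```

MSE and RMSE penalize large errors more heavily than MAE, while R² compares the model's squared error with that of always predicting the test-set mean.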

III. RESULTS AND DISCUSSION
In this section, BPNN, LSTM, EEMD-LSTM, persistence forecast and the proposed EEMD-CNN model are used to forecast the soil temperature of three areas. For each area, the results cover three depths, and the series at each depth is evaluated under three delays. The performance on the Lägern data is shown in Table 2, on the Oensingen data in Table 3, and on the Fluhli data in Table 4.

A. EXPERIMENTAL RESULTS IN LÄGERN
In the Lägern experiment, the temperature series come from depths of 5 cm, 10 cm and 30 cm below ground. There are 3782 daily records (08/24/2004-12/31/2014) at each depth. We extract 3500 consecutive records from each of the three series and split each into a training set and a test set. The data is processed according to each model's requirements, and the models are trained on the training set. The trained models are then used to forecast temperature on the test set; the results are shown in Table 2. As shown in Table 2, all models perform well when forecasting soil temperature with a one-day delay, and EEMD-CNN performs best. With a three-day delay, the performance of all models declines, but EEMD-CNN declines the least. With a five-day delay, EEMD-CNN still performs best. At all three depths, EEMD-CNN's performance with a five-day delay is better than that of the second-best model, EEMD-LSTM, with a three-day delay. Figures 4-6 compare the models' predictions at the three depths under the three delays. Figure 7 compares the forecasts of the same day's soil temperature under the three delays. Figure 7 shows that even under the influence of delay, EEMD-CNN makes more stable and accurate predictions. EEMD-LSTM is affected more by delay: the longer the delay, the greater the error. For BPNN and LSTM, the delay makes the predictions deviate significantly. The figure shows that both models try to fit the original temperature, but the delay directly increases their error. Figure 8 shows that EEMD-CNN has the best performance among the five models under the same delay, and its accuracy advantage becomes more obvious as the delay increases.

B. EXPERIMENTAL RESULTS IN OENSINGEN AND FLUHLI
The estimation results for Oensingen and Fluhli are shown in Table 3 and Table 4. As these tables show, the five models trained on the soil temperature data of these two areas perform roughly as they did in Lägern. In Table 3 and Table 4, among the models trained on data from the same depth and evaluated with the same delay, EEMD-CNN performs best. This ranking is consistent with the results in Lägern.

C. ANALYSIS
After training all models using data from three areas, we find that EEMD-CNN performs best both across the different temperature series and when predicting the temperature a few days ahead. This result is probably due to the convolution kernels used in the convolutional neural network. BPNN uses the basic neural network cell, the common neuron of the feed-forward artificial neural network architecture. LSTM and EEMD-LSTM use recurrent cells, which are connected not only to each other but also along the timeline. A recurrent cell lets the model carry over part of the previous state, so that continuous sequences are handled better. LSTM improves on the RNN by adding a cell state that is maintained along the entire input sequence, so it can be better applied to continuous time series. Both BPNN and LSTM performed reasonably well at predicting the temperature one day ahead. The reason is that the last input node of the BPNN is close to the target temperature and is assigned a higher weight, so with a small correction the model can produce a prediction with a small error. Similarly, in LSTM the input of the last cell has the most important effect, and the state produced by the previous cells adjusts this final input to produce the prediction. This yields acceptable performance with a one-day delay. With delays of three or five days, however, BPNN still gives a high weight to the last input node, whose value now differs more from the label. As a result, the fit of the model deteriorates as the delay increases.
As Figure 7 shows, the model tries to fit the label, but it can only predict as well as the input value at the last node allows. Therefore, BPNN's predictions with a three- or five-day delay closely resemble the persistence forecast. LSTM has the same problem: each input to a recurrent cell is the temperature of one day in the series, and the input of the last cell influences the prediction most, so as the delay increases, LSTM's results become similar to BPNN's. Adding EEMD improves the predictions. EEMD-LSTM also uses recurrent cells, but the input to each cell is not just the temperature of one day; it also includes the IMFs decomposed from the temperature series by EEMD. The IMFs enrich the features fed into the network and help the model fit better. With the help of the IMFs, EEMD-LSTM performs better than LSTM not only with a one-day delay but also when predicting the temperature several days ahead. CNNs are usually used for image and voice data. These types of data have more than one dimension, which suits convolution kernels that extract local features within a neighborhood. Moreover, the IMFs are not merely inserted as an additional sequence between the temperatures of two adjacent days; they extend the data along a second, longitudinal dimension, turning the temperature series into a two-dimensional array. This expanded data has exactly the form that convolution operations require, and each local block of the array is related to its adjacent blocks. Therefore, unlike models that use one-dimensional inputs, such as BPNN and LSTM, EEMD-CNN's convolution operations can better capture the features in the data that affect temperature change.
These features become clearer after being extracted by several convolutional layers. As a result, the EEMD-CNN model fits best, both on the data from different depths and when predicting the temperature a few days ahead.

IV. CONCLUSION
In this article, EEMD-CNN is proposed for daily soil temperature forecasting. Because the daily temperature series is one-dimensional, EEMD is used to decompose it, turning a single temperature feature into multiple features. These features are combined with the original temperature series to form a two-dimensional input, on which the CNN is trained to forecast temperature. The EEMD-CNN model was tested on data from three areas, at three depths and with three different delays, and compared with other models; EEMD-CNN performed best in all these cases. These tests show that EEMD-CNN may be useful for soil temperature prediction and could be a starting point for temperature prediction systems in other fields.