Estimating Soil Moisture Over Winter Wheat Fields During Growing Season Using Machine-Learning Methods

Soil moisture is vital for crop growth and directly affects crop yield. Conventional soil moisture monitoring based on synthetic aperture radar (SAR) is often influenced by vegetation cover and surface roughness. Machine-learning methods are not constrained by physical parameters and have strong nonlinear fitting capabilities. In this study, machine-learning methods were applied to estimate soil moisture over winter wheat fields during the growing season. Quad-polarization RADARSAT-2 data were acquired, and ground measurements from 240 sample plots in the study area were collected. In addition to the four linear polarization channels, polarimetric decomposition parameters were extracted to expand the SAR feature space. Three advanced machine-learning models were selected and compared: support vector regression, random forests (RF), and gradient boosting regression tree. To improve the performance of the models, three feature-selection methods were compared, based on Pearson correlation, support vector machine recursive feature elimination, and RF, respectively. The coefficient of determination (R²) and root-mean-square error (RMSE) were used to compare and assess the performance of the models. The results revealed that polarimetric decomposition parameters were effective in estimating soil moisture, and that the RF model obtained the highest prediction accuracy (training set: RMSE = 2.44 vol.% and R² = 0.94; validation set: RMSE = 4.03 vol.% and R² = 0.79). This study concluded that polarimetric decomposition parameters combined with machine-learning and feature-selection methods can estimate soil moisture with high accuracy, which helps monitor soil moisture across agricultural fields during the growing season.



I. INTRODUCTION
In the Earth ecosystem, soil moisture is an important part of the land surface water cycle, which directly affects surface runoff and the water and energy exchange between the atmosphere and the surface [1]-[4]. In agricultural applications, soil moisture is a crucial component of soil fertility, an important factor affecting crop growth and development, and an early warning indicator of crop drought [5], [6]. Therefore, it is widely used in crop growth modeling and yield forecasting [7].
Traditionally, soil moisture content (SMC) is measured through field surveys, which are time-consuming and laborious and yield only limited sampling data [8]. Compared with optical remote sensing, microwave remote sensing has stronger penetrability, which enables it to penetrate vegetation, soil, and other surface covers. It is also not disturbed by weather conditions. Therefore, microwave remote sensing is increasingly used in estimating SMC [9], [10]. Due to its high sensitivity to soil moisture, synthetic aperture radar (SAR) has been extensively used in SMC estimation at high temporal and spatial resolution [11]-[13]. Moreover, because polarimetric SAR data provide information from multiple polarizations, they can be used to estimate soil moisture not only over bare soil surfaces but also over vegetated areas [14]. To analyze and understand the scattering mechanisms of ground targets, polarimetric decomposition of the original SAR data to extract hidden physical information has gradually proved to be an effective method [15]-[17]. Huynen [18] first proposed the polarization target decomposition theorem in 1978. Since then, many researchers have developed other polarization decomposition methods [19], [20]. Because full polarimetric SAR can provide abundant polarimetric information, polarimetric parameters obtained by polarimetric decomposition models have been applied in soil moisture inversion [21]-[23].
Hajnsek et al. [8] first investigated the potential of soil moisture inversion under various crop covers at L-band based on model decomposition. In their study, the three-component decomposition method proposed by Freeman and Durden [19] and its modified versions were used. Huang et al. [24] proposed a self-adaptive two-component decomposition, which took into account scattering from the crop surface and canopy volume to estimate soil moisture for C-band RADARSAT-2 SAR. Wang et al. [7] proposed a polarimetric decomposition method for the C-band that ignored the dihedral scattering component and took the attenuation of vegetation into account to simplify the inversion of soil moisture.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Generally, soil moisture estimation based on the SAR data can be realized using the following approaches [25]: 1) empirical/semiempirical models; 2) theoretical electromagnetic models; and 3) machine-learning approaches.
The empirical/semiempirical model approach is based on statistical laws obtained from abundant field experiments at certain study sites [26], such as the models of Oh et al. [27] and Dubois et al. [28]. However, such models are usually only applicable to a specific range of surface roughness conditions, SMC, radar frequencies, and SAR incidence angles. In addition, it is difficult for empirical or semiempirical models to solve nonlinear problems. In the second approach, a theoretical model simulates the backscattering coefficient from soil properties (such as dielectric constant and surface roughness); this is a frequently used method for retrieving soil moisture in regions with known characteristics [26]. Fung et al. [29] proposed the integral equation model, which is one of the most popular theoretical models for soil moisture estimation. However, this model is not suitable for vegetated areas or for areas without prior knowledge, such as surface roughness information. In recent years, machine learning has gained popularity in soil moisture retrieval because it avoids modeling complex physical relations and is efficient at solving nonlinear problems. Another advantage of machine learning is that the number of required parameters is not limited by the surface parameters [10], [25].
Using machine-learning models, soil moisture has been successfully estimated over both bare land and vegetated areas. Combining active and passive microwave remote sensing data, Pasolli et al. [30] used two machine-learning approaches, namely support vector regression (SVR) and a multilayer perceptron neural network, to retrieve soil moisture. Zhang et al. [31] used SVR to estimate soil moisture in bare farmland based on multiband satellite data. Based on Soil Moisture Active Passive brightness temperature, Tong et al. [10] used two machine-learning methods, namely SVR and random forests (RF), and a statistical ordinary least squares model to obtain the dynamic change of soil moisture of agricultural land in southeast Australia from 2015 to 2019. However, few studies have compared and analyzed various machine-learning methods combined with polarimetric parameters for estimating SMC during the wheat growing season. To fill this gap in the current literature, this study attempts to estimate SMC within the winter wheat growing season based on polarimetric parameters and three advanced machine-learning methods. The machine-learning methods used in our study are SVR, RF, and gradient boosting regression tree (GBRT). These methods were selected because they have proven to be effective in estimating a variety of ecological parameters, including soil moisture [11], [32]-[35].
This study has two main purposes. The first is to investigate the potential of multiple polarimetric parameters obtained from polarimetric decomposition models, combined with the backscattering coefficient, for soil moisture retrieval in an agricultural region. The second is to evaluate the performance of three machine-learning models (SVR, RF, and GBRT) combined with three feature-selection methods [based on the Pearson correlation coefficient, support vector machine recursive feature elimination (SVM-RFE), and RF] in soil moisture estimation.

A. Ground Truth Data Collection
A rain-fed agricultural site located in southwestern Ontario, Canada, was chosen as the study area (see Fig. 1). The common crops grown in this region are corn, soybean, and winter wheat. In our study, soil moisture was measured only in the winter wheat field. Therefore, only the winter wheat field was selected for model construction and accuracy evaluation. The study area was an L-shaped region with an area of about 27 hm² (see Fig. 1). Winter wheat in the study region is typically sown in October every year. With the gradual warming of the weather in the following spring, winter wheat starts its regrowth in April. Winter wheat harvest usually occurs at the end of July or in early August, depending on the weather conditions. Ground measurements for this study were conducted during the wheat growing season from May to July 2019, and a total of eight field campaigns were carried out. The acquisition dates of the ground data coincided with those of the RADARSAT-2 satellite overpasses.
The soil sampling locations were laid out so that the distance between any two adjacent sampling points was more than 50 m. Surface SMC was measured at a depth of 0-5 cm using a theta-probe soil moisture sensor. To reduce random measurement error, SMC was measured six times at each sampling point, and the mean value was taken as the actual SMC of the sampling point. At the same time, a global positioning system was used to record the locations of the sampling points. During the entire growing season of wheat, a total of 240 samples were collected in the study area (see Table I). The number of sites surveyed on each of the sampling dates was 32, except July 10, on which only 16 sample sites were surveyed. Soil moisture readings ranged from 4.17 to 40.20 vol.% among all sampling sites throughout the field campaigns.

B. RADARSAT-2 Data Acquisition and Preprocessing
Eight full polarimetric RADARSAT-2 images covering the study area were acquired during the wheat growing season in 2019 (see Table II). The data received from the data provider were in single-look complex format in the fine quad polarization mode. The nominal spatial resolution of these images is about 8 m. The RADARSAT-2 image of May 20 was acquired on an ascending path, and the remaining images were all acquired on descending paths. The SAR incidence angles ranged from 17.22° to 50.22° for all the image acquisitions in our study.
3) Polarimetric speckle filtering using a refined Lee filter with a 7 × 7 window for noise reduction.

C. Sentinel-2 Data Acquisition and Preprocessing
Sentinel-2 is a high-resolution multispectral imaging satellite with a multispectral imager. To obtain the vegetation description parameters of the sampling sites in our study, Sentinel-2 images as close as possible to the respective sampling date were obtained. In total, six cloud-free Sentinel-2 images were obtained. The details of Sentinel-2 data and the corresponding sampling date are given in Table III.
Sentinel-2 data were processed in two steps: first, radiometric calibration and atmospheric correction, which convert the top-of-atmosphere apparent reflectance to surface reflectance; second, calculation of the normalized difference vegetation index (NDVI) of the study area as the vegetation description parameter.
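The NDVI step can be illustrated with a minimal sketch. It assumes the standard definition NDVI = (NIR − Red) / (NIR + Red) applied to surface reflectances; for Sentinel-2 these would typically be bands 8 and 4, although the source does not state which bands were used. The function name and the reflectance values are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from surface reflectances."""
    denom = nir + red
    if denom == 0:
        return 0.0  # assumption: treat a zero denominator as "no signal"
    return (nir - red) / denom


# Hypothetical surface reflectances for a dense green canopy and for
# senescent (yellowing) wheat; not values from the study.
green_canopy = ndvi(nir=0.45, red=0.05)
senescent = ndvi(nir=0.30, red=0.20)
print(round(green_canopy, 2), round(senescent, 2))
```

A green canopy absorbs strongly in the red band and therefore yields a higher NDVI than a senescent one.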

III. METHODOLOGY
In this study, three types of machine-learning models for estimating soil moisture in the agricultural region through integrating polarimetric decomposition parameters and backscattering coefficient were used and compared. Fig. 2 illustrates the workflow of soil moisture estimation used in this study.
2) Different polarimetric decomposition methods were applied to obtain the polarimetric parameters.
3) Feature parameters of the polarimetric decomposition parameters and four linear polarizations of each sample point were extracted.
4) Soil moisture estimation database was constructed based on the feature parameters and measured soil moisture.
5) Training and validation sets were generated. For the sample points of each date, 70% were randomly selected for model training, and the remaining 30% were used for model validation. In total, 165 training samples and 75 testing samples were obtained.
6) A model was established on the training set based on different machine-learning models and feature-selection methods.
7) Performance of different models was evaluated to verify the effectiveness of feature selection.
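The per-date 70/30 split in step 5 can be sketched as follows. This is a minimal illustration, not the authors' code: the campaign labels, the function name, the random seed, and the choice to round the training count down are all assumptions.

```python
import random
from collections import defaultdict


def split_by_date(samples, train_fraction=0.7, seed=0):
    """Split (date, sample_id) pairs into training and validation sets per date."""
    rng = random.Random(seed)
    by_date = defaultdict(list)
    for date, sample_id in samples:
        by_date[date].append(sample_id)

    train, valid = [], []
    for date in sorted(by_date):
        ids = by_date[date][:]
        rng.shuffle(ids)
        n_train = int(train_fraction * len(ids))  # assumption: round down
        train += [(date, i) for i in ids[:n_train]]
        valid += [(date, i) for i in ids[n_train:]]
    return train, valid


# Hypothetical campaign labels: seven dates with 32 plots and one with 16.
plots_per_date = {f"campaign_{k}": 32 for k in range(1, 8)}
plots_per_date["campaign_8"] = 16
samples = [(d, i) for d, n in plots_per_date.items() for i in range(n)]

train, valid = split_by_date(samples)
print(len(train), len(valid))
```

With rounding down, each 32-plot date contributes 22 training samples and the 16-plot date contributes 11, which gives 165 training and 75 validation samples. This matches the totals reported above, although the source does not state how the per-date counts were rounded.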

A. Feature Extraction
A feature space for soil moisture estimation was created according to RADARSAT-2 datasets. Initially, the available parameters of RADARSAT-2 data for the study region were four linear polarization channels. Then, the polarimetric decomposition parameters extracted from the original images were used to extend the SAR feature space.
In this study, the coherency matrix T3 [36] and various polarimetric decomposition methods were applied to extract relevant polarimetric features; detailed descriptions of these polarimetric parameters can be found in [36]. The features extracted in this article are listed in Table IV.

B. Machine-Learning Methods
1) Support Vector Regression: The support vector machine (SVM) is based on the principle of structural risk minimization. The application of SVM to regression prediction is called SVR [40]. Because SVR generalizes well, it has been widely used in the remote sensing inversion of ecological parameters [41]. For nonlinear problems, the core idea of SVR is to map the problem into a high-dimensional space where it becomes linear, and then to use a kernel function in place of the inner product in that space, which simplifies the calculation [40]. Four kinds of kernel functions are commonly used in SVR: polynomial, linear, sigmoid, and Gaussian. For the estimation of surface parameters in remote sensing, the Gaussian kernel has proved to be effective for ecological parameters [30], [34], [42]. Therefore, we selected the Gaussian kernel in this study. When constructing an SVR model, some model parameters have a great influence on the estimation results; typical ones are "gamma" (kernel parameter) and C (penalty coefficient) [34]. Therefore, it is necessary to set appropriate values for gamma and C.
2) RF Regression: RF is a popular ensemble learning method that is built from multiple decision trees and is grounded in statistical theory; it can solve both classification and regression problems [43]. When RF is used for regression, it is called random forest regression (RFR). The RF model is established through the following steps. First, the bagging algorithm is used to draw bootstrap subsets from the training dataset. Second, the classification and regression tree algorithm is applied to construct a basic decision tree from each bootstrap dataset. Finally, all decision trees are combined to generate an RF model [33], [34]. Because the result of the RF algorithm is the average of the predicted values of all decision trees, RF is highly resistant to overfitting [43].
3) Gradient Boosting Regression Tree: The core idea of GBRT is the gradient boosting algorithm, which was proposed by Friedman in 1999 and is an improvement over the traditional AdaBoost algorithm [44]. Each iteration aims to reduce the residual: a new model is built in the gradient direction of residual reduction. GBRT is also a machine-learning method based on multiple decision trees with strong generalization ability; it regresses data through a linear combination of basis functions while reducing the residual error produced in the training process [35], [44]. The advantage of GBRT is that it can deal with many types of data. However, due to the sequential nature of the boosting algorithm, GBRT can hardly be parallelized. The most important model parameters in GBRT are the number of decision trees ("n-estimators"), the maximum depth of each decision tree ("max depth"), and the learning rate.
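As an illustration of SVR with a Gaussian kernel, the following is a minimal sketch using scikit-learn. The data are synthetic, the feature scaling step is an added assumption, and the values of C and gamma are illustrative rather than the tuned values from the study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for SAR features (X) and soil moisture (y); not the paper's data.
X = rng.uniform(-2.0, 2.0, size=(200, 3))
y = 20.0 + 5.0 * np.sin(X[:, 0]) + 2.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, 200)

X_train, X_valid = X[:140], X[140:]
y_train, y_valid = y[:140], y[140:]

# Gaussian (RBF) kernel; C and gamma values here are illustrative, not tuned.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma=0.3))
model.fit(X_train, y_train)

r2_valid = model.score(X_valid, y_valid)
print(f"validation R^2: {r2_valid:.2f}")
```

A larger gamma makes the kernel more local and the fit more flexible, while a larger C penalizes training errors more heavily; both therefore need to be chosen with care.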
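The averaging step can be checked with a short sketch. It assumes scikit-learn's `RandomForestRegressor` and synthetic data; the point is only that the forest prediction equals the mean of the predictions of its individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))          # synthetic features, not the paper's data
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.3, 150)

# bootstrap=True means each tree is trained on a bootstrap sample (bagging).
forest = RandomForestRegressor(n_estimators=50, bootstrap=True, random_state=0)
forest.fit(X, y)

# Average the individual tree predictions by hand.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

print(np.allclose(manual_mean, forest.predict(X)))
```

Averaging many trees that were each trained on a different bootstrap sample reduces the variance of the prediction, which is the usual explanation for the resistance to overfitting.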
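The residual-fitting idea can be sketched from scratch for the squared-error loss, where the negative gradient is simply the residual. This is a simplified illustration on synthetic data, not the implementation used in the study; scikit-learn's `GradientBoostingRegressor` exposes the corresponding parameters as `n_estimators`, `max_depth`, and `learning_rate`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 6.0, size=(200, 1))     # synthetic data for illustration
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, 200)

n_estimators, max_depth, learning_rate = 100, 2, 0.1  # illustrative values

prediction = np.full_like(y, y.mean())       # initial constant model
mse_history = [np.mean((y - prediction) ** 2)]
trees = []

for _ in range(n_estimators):
    residual = y - prediction                # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X, residual)                    # each new tree models the residual
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_history[0]:.3f} -> {mse_history[-1]:.3f}")
```

Each tree depends on the predictions of all earlier trees, which is why the loop cannot be run in parallel across trees.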

C. Feature-Selection Algorithms
1) Pearson Correlation Coefficient: The Pearson correlation coefficient (R) is the simplest measure of whether two variables are linearly correlated, and its value lies between −1 and 1. The closer the absolute value of R is to 1, the stronger the linear correlation between the two variables; conversely, the closer R is to 0, the weaker the linear relationship between them. It can be expressed as

$$R = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} \tag{1}$$

where Cov(X, Y) represents the covariance of X and Y, Var(X) is the variance of X, and Var(Y) is the variance of Y.
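A small worked example may help. The sketch below computes R as Cov(X, Y) divided by the square root of Var(X) Var(Y) and ranks features by |R| with the target; the feature names and all values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


def rank_by_abs_r(features, target):
    """Return feature names sorted by decreasing |R| with the target."""
    scores = {name: abs(pearson_r(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical values: three made-up features and a made-up SMC series.
smc = [10.0, 15.0, 20.0, 25.0, 30.0]
features = {
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0],      # perfectly linear in smc
    "feat_b": [5.0, 3.0, 4.0, 1.0, 2.0],      # noisy negative trend
    "feat_c": [2.0, 1.0, 2.0, 1.0, 2.0],      # no linear trend
}
print(rank_by_abs_r(features, smc))
```

Ranking by the absolute value means that a strongly negative correlation counts as much as a strongly positive one.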

2) SVM Recursive Feature Elimination:
The recursive feature elimination (RFE) method uses a base model to carry out multiple rounds of training. After each round of training, the features with the smallest weight coefficients are removed, and the next round of training is carried out on the reduced feature set.
When SVM is selected as the base model in RFE, it is called SVM-RFE [45]. The core idea of SVM-RFE is to repeatedly build SVM models. After each model construction, all features will be assigned a weight coefficient, and the features with the minimum weight will be eliminated. Repeat the above steps for the remaining features until the number of remaining features reaches the required number [46]. According to the order of feature elimination in the iterative construction process of the model, the feature importance ranking based on the SVM-RFE can be obtained. The feature that is removed first has the least importance to the model.
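The elimination loop can be sketched with scikit-learn's `RFE` wrapper around a support vector regressor. Several choices here are assumptions that the source does not specify: a linear kernel (so that per-feature weights are available), one feature removed per round, standardized inputs, and synthetic data.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))                      # synthetic features
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, 200)

X_scaled = StandardScaler().fit_transform(X)

# A linear kernel is assumed so that the SVM exposes per-feature weights (coef_).
selector = RFE(SVR(kernel="linear"), n_features_to_select=1, step=1)
selector.fit(X_scaled, y)

# ranking_ == 1 marks the feature that survived every round (most important);
# larger values mark features that were eliminated earlier.
order = np.argsort(selector.ranking_)
print("features from most to least important:", order.tolist())
```

Setting `n_features_to_select=1` runs the elimination to the end, so every feature receives a distinct rank that reflects the order in which it was removed.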
3) Random Forest: In addition to solving classification and regression problems, RF is also an embedded feature-selection method [47]. RF can be used for feature selection because it calculates the importance of each variable during model construction [42]. In RF, a bootstrap aggregation algorithm is used to construct a bootstrap set for each tree from the training data, which leaves roughly one third of the training data unused by that tree. These training samples are called out-of-bag (OOB) samples [42]. The performance of the model can be evaluated from its error rate on the OOB samples. After random noise is added to each feature in turn, the error rate is recalculated, and the importance of the feature to the model is judged from the change in the error rate [42], [43], [47].
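A rough sketch of this idea with scikit-learn follows. Note the substitution: instead of perturbing features on the OOB samples, it uses `permutation_importance`, which shuffles one feature at a time and measures the drop in score on the data supplied (here the training data). The OOB score is reported separately. The data are synthetic and the settings are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 4))                      # synthetic features
y = 4.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.2, 300)

forest = RandomForestRegressor(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=0
)
forest.fit(X, y)
print(f"OOB R^2: {forest.oob_score_:.2f}")         # score on out-of-bag samples

# Stand-in for the noise-based OOB procedure: shuffle one feature at a time
# and record how much the score drops (computed here on the training data).
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("features from most to least important:", ranking.tolist())
```

A feature whose shuffling barely changes the score contributes little to the model and is a candidate for removal.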

D. Cross Validation (CV) and Parameter Optimization
The values of some typical parameters in a machine-learning model affect the performance of the model. Therefore, it is necessary to adopt appropriate methods to determine these values for the different models. In this study, CV combined with grid search (GS) was used to determine the values of typical parameters of the machine-learning methods.
The main idea of CV is to separate the dataset into K parts, of which K − 1 parts form the training set and the remaining part forms the validation set. In this study, K was set to 12. GS adjusts the model parameters by exhaustive search and is usually combined with CV to optimize them. The CV results of the model are obtained for every candidate parameter combination by looping over all combinations, and the combination with the best results is selected. When GS and CV are used to optimize model parameters, the mean square error on the validation set is often used to assess the estimation performance of the model. In this study, the optimized parameters were C and "gamma" for SVR, "n-estimators" for RFR, and "n-estimators" and "learning rate" for GBRT.
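The combination of GS and K-fold CV can be sketched with scikit-learn's `GridSearchCV`, here for the SVR parameters C and gamma with K = 12 and mean square error as the criterion. The candidate values, the shuffling of the folds, and the synthetic data are assumptions; the source does not list its search grid.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.uniform(-2.0, 2.0, size=(120, 2))          # synthetic data
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, 120)

# Candidate values are illustrative; the paper does not list its search grid.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=KFold(n_splits=12, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",   # scikit-learn maximizes, hence the sign
)
search.fit(X, y)

best_mse = -search.best_score_
print(search.best_params_, best_mse)
```

Every one of the nine parameter combinations is fitted twelve times, once per fold, and the combination with the lowest mean validation error is kept.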

E. Performance Assessment
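In this study, R [see (1)], R² [see (2)], and root-mean-square error (RMSE) [see (3)] were selected as the evaluation indices for the accuracy of the soil moisture estimation models. A higher R² and a lower RMSE indicate a better fit.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \tag{3}$$

where ŷ_i and y_i represent the predicted and measured SMC values of the ith sample, respectively; ȳ represents the mean value of the measured soil moisture values; and n is the total number of samples used.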
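For concreteness, the two error measures can be computed as in the sketch below. It assumes the usual definitions, namely R² as one minus the ratio of the residual sum of squares to the total sum of squares about the measured mean, and RMSE as the square root of the mean squared difference. The SMC values are hypothetical.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root-mean-square error between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(measured, predicted)) / n)


# Hypothetical measured and predicted SMC values in vol.%.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(r_squared(measured, predicted), rmse(measured, predicted))
```

For these values the residual sum of squares is 26 and the total sum of squares is 500, so R² is 0.948 and RMSE is the square root of 6.5, about 2.55 vol.%.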

A. Feature Selection
From the original images, a total of 30 feature variables were obtained. To improve the performance of the soil moisture estimation models, feature-selection methods were used to remove redundant features and improve the estimation accuracy [48]. Three feature-selection methods (R, SVM-RFE, and RF) were compared in this study. Different methods can yield different rankings of feature importance. The R between each feature parameter and the measured SMC was calculated, and the results are shown in Fig. 3. Fig. 3 shows that no individual parameter is well correlated with SMC: the absolute R between these parameters and soil moisture is mostly below 0.5. The R-based importance ranking ordered the variables by the absolute value of R between each parameter and SMC.
The importance of the variables based on SVM-RFE followed the order in which the variables were eliminated when constructing the model, and the resulting feature importance ranking is given in Table V. For the RF feature-selection method, an importance score for each variable, which expresses how much the feature contributes to the RFR model, was obtained after building the RFR model on the training set with all variables. The higher the importance score of a feature, the greater its impact on the prediction results. The results are shown in Fig. 4.

B. Model Training and Performances
The importance ranking of features selected by different methods is different. To compare the performances of various feature-selection methods and machine-learning models, features selected by different methods were applied to each of the three machine-learning models. The specific method was as follows. When constructing different regression models, the number of features was gradually increased according to the feature importance ranking obtained by feature-selection methods. First, only the feature with highest importance ranking was used to construct the model, and then the top two features were used to construct the model. Finally, the number of features was gradually increased until all the features were used. In addition, during the process of model construction, GS and CV were used to determine the typical parameters of different models.
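The incremental procedure can be sketched as a loop over the top-k features of a given ranking. This is a simplified illustration: it uses a random forest with fixed settings and synthetic data, the ranking is hypothetical, and the GS and CV tuning that accompanied each model in the study is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def scores_by_feature_count(X_train, y_train, X_valid, y_valid, ranking):
    """Fit one model per top-k feature subset; return validation R for each k."""
    scores = []
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]                         # the k highest-ranked features
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X_train[:, cols], y_train)
        pred = model.predict(X_valid[:, cols])
        scores.append(float(np.corrcoef(pred, y_valid)[0, 1]))
    return scores


rng = np.random.default_rng(6)
X = rng.normal(size=(240, 6))                      # synthetic features
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.5, 240)
X_train, X_valid, y_train, y_valid = X[:165], X[165:], y[:165], y[165:]

ranking = [0, 1, 2, 3, 4, 5]                       # hypothetical importance order
scores = scores_by_feature_count(X_train, y_train, X_valid, y_valid, ranking)
best_k = int(np.argmax(scores)) + 1
print(best_k, [round(s, 2) for s in scores])
```

Plotting the scores against k gives a curve of accuracy versus number of features, from which the smallest feature set with near-best accuracy can be read off.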
The estimation results of the three machine-learning methods (SVR, RF, and GBRT) in combination with the different feature-selection methods on the validation set are illustrated in Fig. 5. The horizontal axis represents the number of variables used, and the vertical axis represents the correlation coefficient between the estimated SMC and the measured SMC. Fig. 5 reveals that the fitting accuracy of the three models first increased and then plateaued, especially for the RF and GBRT models. Compared with the other two models, the fitting accuracy of SVR on the validation set fluctuated more, but the overall trend was the same.
For SVR-M, the best fit was obtained with the RF-based feature-selection method (R = 0.81), using eight features. The highest prediction performance based on SVM-RFE was R = 0.81 with 19 features. The highest prediction performance based on the correlation coefficient was R = 0.80 with 23 features. Thus, SVR combined with the three feature-selection methods differed little in the best prediction accuracy, but the number of features used to build the model varied widely, and SVR-M based on RF feature selection used far fewer features. RFR-M and GBRT-M showed similar behavior. Table VI presents the best fit achieved by the three models and the number of features used.
From Table VI and Fig. 5, it can be seen that with the RF-based feature-selection method, the different machine-learning methods achieved a higher estimation accuracy with fewer features (SVR: 8; RF: 13; and GBRT: 5). When the machine-learning models were constructed using feature selection based on R or SVM-RFE, more features were required. For feature selection based on R, the number of features used by the machine-learning models to achieve the best fit was as follows: SVR: 19; RF: 27; and GBRT: 27; for the SVM-RFE feature-selection method, the result was SVR: 19; RF: 27; and GBRT: 27. However, compared with the R-based models, the models using SVM-RFE feature selection achieved an acceptable performance when the number of features was small, as shown in Fig. 5.
From the above results, it can be concluded that regression models based on RF feature selection had the highest R and the lowest RMSE on the validation set when a small number of features were selected. The results of the different models based on RF feature selection are shown in Fig. 6, which also shows that, with RF feature selection, the three models (SVR-M, RFR-M, and GBRT-M) achieved their best fit with fewer features. Figs. 7 and 8 show the scatterplots of the estimated SMC against the measured SMC on the training set and validation set when the three machine-learning models achieved their best estimation performance on the validation set. RFR-M achieved the highest fitting accuracy on the training set (R² = 0.94 and RMSE = 2.44 vol.%) and also on the validation set (R² = 0.79 and RMSE = 4.03 vol.%). Although the performance of GBRT-M on the training set was worse than that of SVR-M, its fitting performance on the validation set was better than that of SVR-M, with R² = 0.72 and RMSE = 4.28 vol.%. SVR-M had the worst fitting performance on the validation set, with R² = 0.66 and RMSE = 4.48 vol.%.

C. Soil Moisture Dynamics During Winter Wheat Growing Period
Fig. 9 shows how the mean SMC of the sample plots changed throughout the winter wheat growing period. Because the winter wheat in the study area was rain fed, the dynamic change of soil moisture was not affected by irrigation. It can be seen from Fig. 9 that the mean measured SMC reached its maximum on May 9. This is because it was raining in the study area during the initial hours of the field sampling on this date; therefore, the mean SMC of the sampling points on this date was high. From May 16 to June 2, the mean SMC was approximately constant, which may be due to a dynamic balance between the supply of soil moisture by precipitation, the absorption of water by vegetation, and the water lost through soil evapotranspiration. From June 2 to June 16, the mean SMC first decreased and then increased, which was likely due to a large difference in precipitation before and after June 9. Compared with other dates, the SMC of July 10 was the lowest. The high temperature in July led to strong soil evapotranspiration, and there was no appreciable precipitation in the study area for several days before the July 10 soil sampling date. By comparing the mean measured SMC with the mean SMC estimated by the different models, it can be seen that all three models achieved a similar performance for mean SMC on all sampling dates except the last. Overall, all three models were able to track the dynamic change of soil moisture well.
Fig. 10. Soil moisture map for the study area on different dates based on the RF model.

D. Soil Moisture Map
The three machine-learning methods yielded similar results when estimating the dynamic change of average SMC, as depicted in Fig. 9. Based on the scatterplots of the training set and the validation set when the three machine-learning methods achieved their best estimation accuracy (see Figs. 7 and 8), the RF model yielded the highest estimation accuracy. This indicates that the RF model combined with the RF-based feature-selection method is the best soil moisture estimation model. This model was then applied to all pixels in the study area to obtain the spatial distribution of SMC on different dates (see Fig. 10). The SMC maps in Fig. 10 are in good agreement with the measured mean SMC shown in Fig. 9. For instance, the average SMC on June 16 was significantly higher than that on the two adjacent sampling dates, June 9 and July 10, as shown in Fig. 9. The same conclusion can be drawn from Fig. 10.

E. Performance of SMC Estimation Under Different Coverages of the Winter Wheat Plants
In this study, NDVI was used as a surrogate for winter wheat biomass. Based on the six Sentinel-2 images of the study area, the mean NDVI of the sampling sites on the respective sampling dates was calculated (see Fig. 11). Fig. 12 illustrates the RMSE between the measured and estimated SMC on each of the six dates using different machine-learning methods.
As can be seen from Fig. 11, the NDVI shows a rapid initial increase followed by a slower increase and then a rapid decrease during the late growth stage (the last sampling date). The NDVI trend was consistent with the growth process of the winter wheat. During May, winter wheat was experiencing fast vegetative growth, and the ground coverage of the winter wheat plants also increased rapidly, corresponding to the rapid increase in NDVI. From the end of May to mid-June, the winter wheat continued to grow and reached peak growth with the highest green cover and biomass. In July, the winter wheat plants were nearly mature and started to yellow, which weakened absorption in the red band and caused a rapid decline in NDVI. It can be seen from Fig. 12 that the RMSE of the test set on the five sampling dates from May 16 to June 16 was low, below 5.2 vol.%. On the last sampling date, however, the RMSE between the measured and estimated values of the different models was larger.

V. DISCUSSION
There are many cases using machine-learning models to estimate SMC in the agricultural region. However, the combination of polarization decomposition parameters and machine-learning methods has received little attention [14]. In addition, the comparison and application of feature-selection methods to increase the accuracy of SMC estimation are rarely mentioned. In this study, SVR-M, RFR-M, and GBRT-M were constructed and optimized based on the multitemporal RADARSAT-2 data, GS-CV parameter optimization algorithms, and feature-selection methods. Based on the above SMC estimation results, the following outcomes can be observed.

A. Influence of Different Feature-Selection Methods on Model Estimation Performance
By comparing the models constructed by different featureselection methods, it could be found that the method based on R could hardly improve the performance of the model. Because when almost all the features were used, the estimation accuracy of the model on the validation set achieved the highest. This may be because there is no good linear relationship between the measured SMC and the extracted feature parameters, as shown in Fig. 3. It could be noted that when using the feature-selection method based on SVM-RFE and RF, different models could achieve a better estimation effect using less features. Therefore, these two methods can make full use of the polarization information and reduce the redundancy of features' parameters. The features selected by RF based and SVM-RFE were different, as given in Fig. 4 and Table V. This is because SVM-RFE and RF feature selection are based on different model building principles [43], [46]. However, feature selection based on RF and SVM-RFE is consistent in the ranking of some features. For example, HH, VV, and vanZyl_vol have higher ranking under the RF-based and SVM-RFE feature-selection methods (ranking 1, 2, and 3, and 1, 2, and 4 respectively), T 33 and Pauli_g have lower ranking (ranking 28 and 29, and 27 and 29, respectively). It indicates that some of the features are of high importance to the soil moisture estimation, while some of them are not.
The proposed models obtained their best estimation performance on the validation set when RF feature selection was used. The features selected by RF not only performed better in RFR-M but also performed well in the other two machine-learning models. This suggests that the RF-based feature-selection method is the most effective for improving the accuracy of SMC estimation. However, Fig. 5 also shows that when all 30 feature parameters are used, the different machine-learning models still achieve a satisfactory fitting performance on the validation set. Therefore, all features can be used to build the model if the slight loss of accuracy on the validation set is acceptable.
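The limited benefit of correlation-based selection can be illustrated with a minimal sketch. It assumes a simple filter that ranks features by the absolute Pearson coefficient; the function names and the toy values are hypothetical and are not taken from the study. The sketch shows that a feature which fully determines the target through a nonlinear relationship can still receive a correlation of zero and would therefore be discarded by such a filter.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_by_abs_r(features, target):
    """Return feature names sorted by |r| with the target, strongest first."""
    scores = {name: abs(pearson_r(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data (not measurements from the study).
x = [-3, -2, -1, 0, 1, 2, 3]
target = [v * v for v in x]              # target depends on x only through x**2

features = {
    "nonlinear_feature": x,                       # determines the target, but not linearly
    "noisy_linear_feature": [9, 3, 2, 1, 0, 5, 8],  # roughly linear in the target
}

ranking = rank_by_abs_r(features, target)
r_nonlinear = pearson_r(x, target)       # zero, although target is a function of x
```

In this toy case the filter places the noisy linear feature first and gives the nonlinear feature no weight, whereas a model-based selector such as SVM-RFE or RF importance is not restricted to linear dependence.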

B. Performance of SVR-M, RFR-M, and GBRT-M
Based on the above results, the different regression models achieved their best estimation performance on the validation set with the RF feature-selection method. Therefore, the performances of the three machine-learning models using RF-based feature selection were compared. Among the proposed models, RFR-M achieved the best estimation result for both the training set and the validation set. This may be because RF-based feature selection is essentially calculated by using all features to construct an RFR model. Therefore, the features selected by RF are better suited to the RF model than to the other two models. Of the remaining two models, GBRT-M estimated SMC better than SVR-M: although GBRT-M performed worse than SVR-M on the training set, it performed better on the validation set. This suggests that GBRT-M had better generalization ability than SVR-M.
Regarding SVR-M, this model performed worst on the validation set, with the lowest R2 and the highest RMSE. However, some studies have shown that SVR is effective and robust when the sample size is small [49], [50], and some articles on the inversion of ecological parameters [51]-[53] indicated that SVR always outperformed other machine-learning methods. This differs from the results obtained in this study. The poor prediction on the validation set using SVR-M may be attributed to the following. First, the characteristics of the polarization parameters are not suitable for SMC inversion using SVR. Second, when constructing the model, some important parameters of SVR (gamma and C) were not set properly. Therefore, when using SVR-M to estimate SMC, it is necessary to retune the model parameters or introduce new feature variables to improve the accuracy.
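Retuning gamma and C can be sketched as a cross-validated grid search. The example below is an illustration only: it assumes scikit-learn, an RBF kernel, standardized inputs, a small hypothetical parameter grid, and synthetic stand-in data, none of which are the settings or data used in this study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: 120 samples, 5 features (hypothetical, not study data).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(120, 5))
y = 25 + 8 * np.sin(2 * X[:, 0]) + 3 * X[:, 1] ** 2 + rng.normal(0, 1.0, 120)

# SVR is sensitive to feature scale, so inputs are standardized inside the pipeline
# (the scaler is refit on each training fold, avoiding leakage into the test fold).
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical search grid over the two parameters named in the text.
param_grid = {
    "svr__C": [0.1, 1, 10, 100],
    "svr__gamma": [0.001, 0.01, 0.1, 1],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,                                   # 5-fold cross-validation
    scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_             # cross-validated RMSE of the best setting
```

If the best setting lies on the edge of the grid, the grid would normally be widened in that direction before the result is trusted.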
As can be seen from Fig. 8, all models have difficulty in estimating very high or very low SMC. For extremely high soil moisture values, the predicted SMC was a little lower than the measured SMC, while for extremely low soil moisture values, the opposite was true. This result is consistent with those obtained from the inversion of surface parameters using machine-learning methods in [33], [34], and [48]. The frequency distribution of SMC at the sample plots is shown in Fig. 13. The SMC of most sample plots was between 20 and 35 vol.%, and the number of sample plots with extremely high or low values was relatively small, which led to the poor fitting of the models for soil moisture values at these extremes. Fig. 9 illustrates that the measured mean SMC on July 10 was 9.0 vol.%, which was not within the range of effective estimation of the models. Therefore, the SMC on the last day was obviously overestimated by the different models, as shown in Fig. 9.
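The pattern of overestimation at low SMC and underestimation at high SMC can be quantified by computing the mean error within bins of measured SMC, alongside the overall RMSE and R2. The following is a minimal sketch; the measured and predicted values and the bin edges are hypothetical and chosen only to mimic the behaviour described above.

```python
import math

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))

def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def mean_bias_by_bin(measured, predicted, edges):
    """Mean (predicted - measured) for each measured-SMC bin [edges[i], edges[i+1])."""
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        errs = [p - m for m, p in zip(measured, predicted) if lo <= m < hi]
        result[(lo, hi)] = sum(errs) / len(errs) if errs else None
    return result

# Hypothetical SMC values in vol.% (not data from the study).
measured = [9.0, 12.0, 22.0, 25.0, 28.0, 30.0, 33.0, 40.0, 42.0]
predicted = [15.0, 16.0, 23.0, 25.0, 27.0, 30.0, 32.0, 36.0, 37.0]

bias = mean_bias_by_bin(measured, predicted, edges=[0, 20, 35, 50])
overall_rmse = rmse(measured, predicted)
overall_r2 = r_squared(measured, predicted)
```

A positive bias in the lowest bin and a negative bias in the highest bin, with a near-zero bias in the well-sampled middle bin, would correspond to the behaviour described for Fig. 8.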

C. Impact of Winter Wheat Cover on Soil Moisture Estimation
Over vegetated areas, such as agricultural fields, the presence of vegetation can induce complex volume scattering, which in turn reduces the sensitivity of the radar signal to soil moisture. The traditional soil moisture estimation method based on the radiative transfer model (RTM) must consider the contribution of vegetation scattering. Therefore, in RTM-based soil moisture inversion, vegetation cover does affect the estimation of soil moisture.
In this study, the performances of the proposed methods under different winter wheat coverages were evaluated, as shown in Figs. 11 and 12. Among the six dates on which NDVI was calculated using Sentinel-2, the SMC was well estimated except for the last sampling date in July. Varying degrees of vegetation cover can affect the received radar signal, leading to changes in the polarization decomposition parameters. Since the soil moisture estimation database in this study was built from soil moisture collected at different wheat growth stages, the different coverage of wheat plants was also considered indirectly, and the estimation accuracy of SMC at the five dates from May 16 to June 16 was high. As for the last sampling date, the large RMSE is due to the overestimation of SMC on that day, which was analyzed in Section V-B.
In general, the SMC can be estimated satisfactorily under varying vegetation cover. Therefore, in the case of dense wheat cover, this method can also provide some support for the estimation of soil moisture in winter wheat fields.

D. Additional Contribution of Machine-Learning Method to Physical-Based Method for Soil Moisture Retrieval
Soil moisture retrieval methods based on physical models need many parameters, and the application scope of physical-based methods is limited, so the model parameters must be within the applicable scope. In the experimental results of this study, soil moisture was generally estimated well for different phases and different wheat covers. Although the machine-learning method is intrinsically a "black-box" model, its contribution to physical models for soil moisture retrieval is worth discussing.
When there is a lack of prior knowledge of surface parameters, soil moisture can be estimated by machine-learning methods. In addition, soil moisture retrieval based on physical models and their inversion is substantially ill-posed, which means that different combinations of input parameters of the physical model may produce similar backscattering signals. If the machine-learning method is validated to be effective in SMC estimation for the study region, the results of the physical model can be further screened by referring to the results of the machine-learning method, and the soil moisture in line with the real situation of the study area can be obtained. Moreover, the machine-learning method has strong nonlinear fitting capability. If the sensitive parameters in the physical model are taken as the input parameters of the machine-learning method and the output parameters of the physical model are taken as the output parameters of the machine-learning method, the physical model can be simplified.
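The screening idea can be sketched with a toy example. The forward model below is a hypothetical linear stand-in with made-up coefficients, not a real scattering model, and the parameter grids and the machine-learning estimate are likewise assumed values. It only illustrates how several moisture and roughness combinations can reproduce the same backscatter, and how an independent machine-learning estimate could be used to choose among them.

```python
def toy_forward_model(moisture, roughness):
    """Hypothetical linear stand-in for a physical backscatter model (dB).

    The coefficients are invented for illustration only.
    """
    return -20.0 + 0.25 * moisture + 4.0 * roughness

def candidate_solutions(observed_db, moistures, roughnesses, tol=0.1):
    """All (moisture, roughness) pairs whose simulated backscatter matches the observation."""
    return [
        (m, s)
        for m in moistures
        for s in roughnesses
        if abs(toy_forward_model(m, s) - observed_db) <= tol
    ]

def screen_with_ml(candidates, ml_moisture):
    """Keep the candidate whose moisture is closest to the machine-learning estimate."""
    return min(candidates, key=lambda c: abs(c[0] - ml_moisture))

# Hypothetical search grids: moisture in vol.%, roughness in cm.
moistures = [10, 14, 18, 22, 26, 30, 34]
roughnesses = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]

# Simulated observation generated from a known "true" state (26 vol.%, 1.0 cm).
observed_db = toy_forward_model(26, 1.0)

# Ill-posedness: many parameter pairs explain the same observation equally well.
candidates = candidate_solutions(observed_db, moistures, roughnesses)

# An assumed machine-learning SMC estimate is used to pick one candidate.
selected = screen_with_ml(candidates, ml_moisture=27.2)
```

In this toy setting every moisture value on the grid has a roughness that reproduces the observation, so the inversion alone cannot identify the moisture; the machine-learning estimate supplies the additional constraint.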

VI. CONCLUSION
In this article, the potential of machine learning combined with the polarization decomposition parameters in SMC estimation was evaluated. In addition, the SMC estimation results obtained by SVR, RF, and GBRT combined with three feature-selection methods (based on R, SVM-RFE, and RF) were investigated and compared. The following conclusions can be drawn according to the above results and analysis.
1) The polarization decomposition parameters combined with the backscattering coefficient have high potential for soil moisture estimation.
2) The feature-selection method based on RF or SVM-RFE is effective, and the model can achieve good estimation with only a few features. The proposed machine-learning models achieved the best estimation performance by using the RF-selected features.
3) Compared with the other two models, the RFR model achieves the best fitting accuracy on both the validation set and the training set. Therefore, it was selected for soil moisture mapping.
Despite the encouraging results, this study still has some limitations, which should be addressed in future research. For example, although our field sampling covered a great part of the winter wheat growing season, the sampling area was limited to one wheat field and the number of samples was small. The spatial coverage of the study area was small, and the study dealt with only a single crop type. For other crop types, the effectiveness of this method for soil moisture estimation needs to be verified in future research. In addition, the effectiveness of this method for soil moisture monitoring and mapping over a large agricultural area remains to be examined. Furthermore, when enough training samples are available, more advanced estimation methods, such as deep learning, can be considered. Despite these limitations, we can conclude that combining the polarimetric decomposition parameters, the backscattering coefficient, and machine learning is effective for SMC modeling in the wheat growing area. To a certain extent, it can provide support for soil moisture monitoring in agricultural areas.