A Polarimetric Decomposition and Copula Quantile Regression Approach for Soil Moisture Estimation From Radarsat-2 Data Over Vegetated Areas

This article proposes a novel framework for probabilistic estimation of surface soil moisture (SSM) based on polarimetric decomposition and copula quantile regression, mainly focusing on solving the low correlation between synthetic aperture radar (SAR) backscattering coefficients and SSM in corn-covered areas. Cloude–Pottier decomposition and adaptive nonnegative eigenvalue decomposition can extract more polarization parameters, explaining the implicit information in polarization data from different theoretical levels. Polarization parameters and the backscattering coefficients for different polarizations constitute predictor variable parameters for estimating the SSM. The dimensionality of the predictor variable parameters is reduced by supervised principal component analysis to derive the first principal component. SPCA ensures a high correlation between the first principal component and the SSM. Finally, the Archimedes copula function simply and effectively constructs the nonlinear relationship between SSM and the first principal component to complete the quantile regression estimation of SSM. Results show that the root-mean-square error range of SSM estimation is 0.039–0.078 cm$^{3}$/cm$^{3}$ and the correlation coefficient (R) is 0.401–0.761. In addition, copula quantile regression constructs an uncertainty range for the SSM estimate, which can be used to judge the reliability of the estimate.

characteristics of high temporal heterogeneity and high spatial variability, which makes it difficult for in-situ measurement to meet the accuracy requirements of precise irrigation and yield estimation [8]. The development of remote sensing technology provides an effective method for SSM estimation with multiscale and high temporal resolution [9], [10], [11]. There are two types of remote sensing satellites: 1) optical remote sensing satellites and 2) microwave remote sensing satellites. Due to the wavelengths involved in optics, it is not possible to estimate SSM in the presence of vegetation directly. Optical remote sensing usually uses vegetation indexes and thermal infrared remote sensing to indirectly estimate soil moisture [12], [13], [14]. Previous studies have shown that passive microwave remote sensing can effectively estimate SSM with high temporal resolution [15]. However, due to the spatial resolution of 10-50 km, passive microwave remote sensing can only be used in climate studies on a global scale [16]. As an active microwave remote sensing with a high spatial resolution [17], synthetic aperture radar (SAR) has excellent potential in the application of SSM estimation at the catchment scale as well as field-scale [18].
In the last three decades, numerous studies have illustrated the potential of SAR data to estimate SSM. Several scholars have developed physical models based on radiative transfer equations, describing the relationship between SSM and SAR backscattering coefficients in different regions [19], [20], [21], [22], [23]. However, the theoretical models are complex forward model expressions, and the analytical equation from the backscattering coefficients to the SSM is difficult to be obtained. Several empirical models have been proposed [24], [25], [26]. The empirical models are established based on measured data and specific scenarios, which depend on specific scenes and are difficult to be applied to different regions. Some scholars have proposed semiempirical models combining theoretical and empirical models [27], [28]. These models have given positive results in bare surfaces or sparsely vegetation-covered areas. For the estimation of SSM in vegetation-covered areas, some vegetation scattering models have developed successively [29], [30], [31]. With the development of SAR and the increase of remote sensing data, some new methods gradually appear in the study of SSM estimation, such as the multiangle method [32], [33], multifrequency method [34], [35], machine learning and artificial neural network method [36], [37], change detection method [38], [39], probabilistic graphical estimation method [40], [41], multipolarization method [42], [43], [44], and so on. However, acquiring multiangle and multifrequency data simultaneously at the same location is challenging. Artificial neural networks (ANN) can be useful but require large amounts of data, which includes measured data or obtained data through theoretical models. Change detection methods usually assume that vegetation and surface roughness are invariant, which is not satisfied during crop cultivation, growing, and harvesting. Multipolarization data can separate volume, surface, and double-bounce scattering, which has obvious potential in SSM estimation. Probabilistic graphical methods statistically establish nonlinear dependencies between two or more variables when a definite relationship between the variables cannot be determined.
The sensitivity of different polarized electromagnetic waves to surface scattering targets differs, which gives polarimetric SAR the potential in SSM estimation [45], [46]. Polarimetric decomposition is the practical method for extracting information from quad-polarimetric data [47], [48]. In recent years, numerous scholars have researched the estimation of SSM using polarimetric data or different polarimetric decomposition methods. For example, in 2017, Jagdhuber et al. [49] used an iterative, generalized hybrid decomposition to estimate SSM from L-band airborne SAR data with high accuracy and a root-mean-square-error (RMSE) from 0.04 to 0.044 cm 3 /cm 3 . In 2016, He et al. [42] used the polarimetric decomposition method to estimate SSM on data from the third soil moisture active passive experiment (SMAPEx-3), with a RMSE from 0.1 to 0.14 cm 3 /cm 3 . In 2019, Shi et al. [43] used the Freeman two-component polarimetric decomposition method to estimate SSM from the data of Gaofen-3 with a RMSE of 0.087 cm 3 /cm 3 . These methods of estimating SSM mainly use polarization decomposition to obtain surface scattering or double-bounce scattering parameters and then use theoretical models to estimate SSM from these parameters. In contrast to other model-based polarimetric decomposition methods [45], [50], [51], [52], the method used by He et al. [42] considers a general volume scattering model independent of reflection symmetry, which can describe vegetation scattering more accurately.
Polarization decomposition extracts the polarimetric parameters in quad-polarimetric or dual-polarimetric SAR data, and SSM estimation is also required to establish the relationship between polarimetric parameters and SSM. Probabilistic graphical models provide another intuitive and efficient approach to the dependence of SSM and SAR polarimetric parameters [53]. Compared to commonly used deterministic methods, probabilistic graphical models can better capture the random characteristics between different measurements and provide an uncertainty range [54]. In a probabilistic graphical approach, the copula function can establish dependencies between variables with joint probability distributions with different marginal distributions. Some scholars have used copula to estimate SSM and received pleasing results [40], [41]. In 2017, Pal et al. [41] estimated SSM from quad-polarimetric RISAT1 data using Archimedean copula function, with an RMSE in 0.076-0.115 cm 3 /cm 3 . Nguyen et al. [40] used the D-vine copula quantile regression to estimate the SSM from the dual-polarimetric Sentinel-1 data in 2021, and the range of the RMSE is 0.028-0.092 cm 3 /cm 3 . Polarimetric SAR data combined with probabilistic graphical methods are advantageous in SSM estimation of vegetationcovered areas. However, few scholars have assessed the impact of polarimetric decomposition parameters on SSM estimation in probabilistic graphical models.
This article probabilistically estimates the SSM of the corncovered area from Radarsat-2 quad-polarimetric data using a polarimetric decomposition and copula quantile regression method. The eigenvalue-based polarimetric decomposition [47] and the adaptive model-based polarimetric decomposition [42] are predominantly employed. The supervised principal component analysis (SPCA) [55], which can ensure the high correlation between the principal components and dependent variables, reduces the dimensionality of decomposition parameters and backscattering coefficients to derive the first principal component. Subsequently, the Archimedean copula function establishes the dependence between SSM and the first principal component. SSM is estimated using the dependency relationship developed by the Archimedean copula function. The SSM estimated at the 50th quantile is considered the SSM estimation. The 20th and 80th quantile values are chosen for the lower and upper bounds of the SSM uncertainty range so that the actual SSM falls within the uncertainty range with 60% probability. The innovation of this article is that polarimetric decomposition with different principles extracts as much information as possible from the polarimetric data. The SPCA further filters out the useless polarimetric information and improves the correlation of SAR-related parameters with SSM. Copula quantile regression can predict the SSM at different quantile points, which can be applied in drought and flood monitoring.
The rest of this article is organized as follows. The study area, Radarsat-2 data, and in situ measurements are detailed in Section II. Section III presents the details of the polarimetric decomposition algorithm, SPCA, and copula quantile regression algorithm. The estimation results and discussions are reported in Section IV. Finally, Section V concludes this article.

A. Study Area
For this study, the experimental data are from the Heihe watershed allied telemetry experimental research (HiWATER), a multiscale hydrological observation experiment in the Heihe river basin (HRB) from 2012 to 2015 [56], [57]. The HRB, located between 97.1-102.0°E and 37.7-42.7°N, is a typical inland river basin in the arid area of northwest China and belongs to the alpine and drought region. The HiWATER arranged sensor networks at different scales in the upstream, midstream, and downstream of the HRB, respectively. A dense soil moisture sensor network has been formed in the artificial oasis area in the midstream. The midstream artificial oasis area is selected as the study area for this experiment, which is the most extensive corn seed production base in China. The local average annual rainfall and temperature are 198 mm and 7°C, respectively, belonging to a typical temperate arid climate. The irrigation infrastructure is complete, mainly river irrigation and well irrigation. The experiment region covers a 5.5 × 5.5 km area in HRB, including villages, roads, orchards, and corn paddy fields, which can represent the crop structure and farming practices in the midstream irrigation area. The geographical location of the HRB is shown in Fig. 1. Due to the complex topography and climate of the HRB, DEM information and Köppen climate classification [58] are also given in Fig. 1. The experimental area is located within the red rectangle in Fig. 1, which belongs to the temperate arid climate with an average elevation of 1500 m. The land-use type of the experimental area is, as shown in Fig. 1. Additional information about the test area, such as soil texture, field capacity, and other information, can be found in literature [59] and [60].

B. In Situ Measurements
The HiWATER measured vegetation height, soil texture, and field capacity in the midstream study area [61], [62]. The experiment arranged three sensor networks, BNUNET, WATERNET, and SoilNET, which output SSM and surface temperature measurements every 10 min [60], [61], [62]. The measurement depth of SSM is 4 cm below the ground surface. Each sensor network contains several soil moisture sensors, with specific locations displayed as solid black dots in Fig. 1. Two types of soil moisture sensors, SPADE and Hydra Probe II, are available in the three networks. The sensors have been validated using a two-point calibration method, one of which is a measurement in the desert sand after air seasoning; the other point is an observation of saturated soil in local agricultural fields. Instrumentation errors of the two soil moisture sensors are 0.032 and 0.011 cm 3 /cm 3 , respectively [59]. The previous study selected SSM and Terra-SAR data in May and June as the study objects [60]. In July, the vegetation height exceeded 1 m, so the correlation between the backscattering coefficients of X-band and SSM is extremely low. This study uses Radarsat-2   Table I. The minimum value of 0.22 cm 3 /cm 3 and the maximum value of 0.54 cm 3 /cm 3 for the measured SSM indicate a wide range of SSM variability. Since only one view of Radarsat-2 data was acquired in the whole HiWATER, this experiment only targets the area with SSM greater than 0.22 cm 3 /cm 3 . The HiWATER also measured vegetation canopy height data [63]. The vegetation height data for dates close to the Radarsat-2 was presented, as shown in Table II. From Table II, it can be concluded that the average height of corn on July 6 was between 135.5 and 149.0 cm.

C. Remote Sensing Data
The SAR data considered in this study is a Radarsat-2 stripmap quad-polarimetric SAR image (HH, HV, VH, VV) acquired on July 6, 2012, at the artificial oasis in the midstream of the HRB. The acquisition mode of the image is fine quad polarization, and the beam is Q8. With operation in the C-band, the Radarsat-2 SAR sensor has a wavelength of 5.5 cm, with minimum and maximum incidence angles of 22.6°and 28.4°, respectively, in descending orbit. The resolution of the singlelook complex (SLC) image is about 4.73 m in the range direction and 4.77 m in the azimuth direction. The incident angle range of the overlapping part of the SAR image and the experimental area is 27.1°to 27.5°, so the influence of the incident angle on SSM estimation can be ignored. The data, including field measurements and Radarsat-2 data, can be found on the digital black river website (http://heihe.tpdc.ac.cn/zh-hans/).
During the Radarsat-2 image acquisition, the most experimental area was covered with corn, with a canopy higher than 1 m. Therefore, the sensitivity between the backscattering coefficient and SSM is analyzed before estimating SSM to ensure the correlation between SSM and the backscattering coefficient. The sensitivity analysis between the measured SSM and the backscattering coefficients of the four polarimetric modes of Radarsat-2 is shown in Fig. 2.
From Fig. 2, the correlation between the backscattering coefficients of the four polarizations and SSM is relatively low. There is no correlation between SSM and backscattering coefficients for HV and VH polarization. It is impossible to use HV and VH alone to estimate the SSM in corn-covered areas. The backscattering coefficients of both HH and VV polarization showed a weak correlation with SSM. According to the above analysis, it is difficult to achieve the accuracy of SSM estimation for precision irrigation by directly estimating SSM using a single backscattering coefficient. Therefore, extracting meaningful information from the polarimetric data is necessary to increase the correlation.
The preprocessing of Radarsat-2 images is accomplished by the Sentinel application platform toolbox (SNAP) and polarimetric SAR data processing and education toolbox (PolSARpro) provided by the European Space Agency (ESA). The preprocessing steps for extracting the backscattering coefficient include multilook, 7×7 refined Lee filtering to reduce the speckle noise of the images [42], and radiometric calibration to obtain the backscattering coefficients of the SAR image. Finally, the SAR images are geometrically calibrated and geocoded using a digital elevation model (DEM).

III. METHODS
The overall workflow of the proposed SSM estimation algorithm is shown in Fig. 3. The complete algorithm has several steps: decomposition of quad-polarimetric images, data normalization processing, dimensionality reduction with SPCA, construction of the joint distribution with Archimedean copula functions, acquisition of conditional distribution, estimation of SSM, and algorithm evaluation.
1) Cloude-Pottier Decomposition: The Cloude-Pottier decomposition is an eigenvalue decomposition of the coherence matrix (T) of the quad-polarimetric SAR data to produce three eigenvalues in descending order of magnitude. Subsequently, the eigenvalues are combined and calculated as three polarimetric parameters, which are alpha angle (α), entropy (H), and anisotropy (A). The expressions of the three parameters are (1), (2), and (3), respectively [47] where 3 are the eigenvalues of T, 3 k=1 P k = 1, and cosα i is the magnitude of the first component of the ith eigenvector e i of T.
The significant advantage of the Cloude-Pottier decomposition is that it does not depend on a specific scattering mechanism, and the eigenvalues do not transform with the antenna coordinate system. PolSARpro software is used in this experiment to complete the Cloude-Pottier decomposition of quad-polarimetric SAR data, which mainly consists of the following steps: Extraction of scattering matrix from polarimetric data, converting the polarimetric data into a coherent matrix, performing a refined Lee filter of 7 × 7 on the coherent matrix, geocoding the filtered coherent matrix, finally decomposing the coherent matrix to get the required three parameters.
2) Adaptive Nonnegative Eigenvalue Decomposition: The adaptive nonnegative eigenvalue decomposition is specifically designed to overcome the powers of surface and double-bounce scattering components in model-based decomposition may become negative due to overestimation of the volume scattering power [42], [64]. By adaptively solving for the vegetation scattering component, adaptive nonnegative eigenvalue decomposition decomposes T into four components, which are volume scattering (T v ), surface scattering (T s ), double-bounce scattering (T d ), and the residual components (T r ) after removing these components [42], [64]. Traditional model-based polarization decompositions are established under the assumption of scattering symmetry. However, Li et al. found that the scattering symmetry assumption is not necessarily satisfied even on natural surfaces [65]. Therefore, the adaptive nonnegative eigenvalue decomposition uses a general volume scattering model that does not need to satisfy the scattering symmetry assumptions (4) For volume scattering, Arii et al. [66] proposed a general model which does not require the assumption of scattering reflection symmetry, as shown in (5). This generalized model approximates the vegetation canopy as a thin cylindrical cloud determined by the average orientation angle θ 0 and the degree of randomness σ around the average orientation angle. The parameter θ 0 is from 0 to π, while the parameter σ is from 0 when the thin cylinder is delta distributed to 0.91 when the thin cylinder is randomly uniformly distributed [42] T where T α , T β (θ 0 ), and T γ (θ 0 ) are given in (6)- (8). p(σ) and q(σ) are sixth-order polynomials of σ, as shown in (9) and (10) [67], p(σ) = 2.0806σ 6 − 6.3350σ 5 + 6.4864σ 4 − 0.4431σ 3 − 3.9638σ 2 − 0.0008σ + 2.0 (9) q(σ) = 9.0166σ 6 − 18.7790σ 5 + 4.9590σ 4 − 14.5629σ 3 − 10.8034σ 2 + 0.1902σ + 1.0.
From (4), the remaining scattering (T rem ) after subtracting the volume scattering from the total coherence matrix includes surface scattering, double-bounce scattering, and residual scattering, which can be written as In volume scattering, it is necessary to determine three parameters (f v , θ 0 , σ), two of which are already in the determined range (θ 0 , σ). An adaptive algorithm can be employed to determine these parameters. The θ 0 and σ are gradually increased from their minimum values, and for each pair of θ 0 and σ, finding the f v that maximizes the T rem when all eigenvalues of T rem are positive. Subsequently, a set of parameters that minimizes the T r among the combination of parameters obtained in the previous step is selected as the parameters of volume scattering.
After removing the volume scattering component, the adaptive nonnegative eigenvalue decomposition assumes that the other components satisfy the scattering symmetry [42], [64]. According to the relationship between eigenvalue decomposition and model decomposition proposed by Van et al. [64], the expressions of the remaining three terms can be written as [42] where α and φ represent the scattering mechanism angle and scattering phase angle, respectively, while the subscripts s and d represent surface scattering and double-bounce scattering. f r denotes the intensity of residual backscattering. These parameters can be acquired by eigenvalue decomposition of the T rem matrix, as described by Van

B. Supervised Principal Component Analysis
All parameters and SSM are first normalized to minimize the effect of different parameter magnitudes on SPCA, and the normalization is done as where X i and X n i represent the polarization decomposition parameters, backscattering coefficients, and SSM before and after normalization is used, respectively. μ(X i ) is the average of X i , and σ(X i ) is the standard deviation of X i .
Principal component analysis (PCA), which projects data into a low-dimensional space while maximizing the preservation of inherent features contained in the data, has been used in soil moisture estimation [68]. As an improved PCA, SPCA is principally suitable for regression problems in which there are many predictor variables, but just a small number of them are correlated with the dependent variable [55]. The p-vector of standardized regression coefficient (s) between each predictor variable X = [X 1 , X 2 , . . . , X p ] normalized to the center of 0 and the normalized dependent variable y is first calculated, as [55] With ||X j || = X T j X j . X t represents the combination of X j that satisfies s j ≥ t as the input data for the PCA. The t can be obtained using cross-validation. As shown in (17), a singular value decomposition is done for X t to get the matrix U t = [µ t,1 , µ t,2 , . . . , µ t,m ]. µ t,1 is called the first principal component of X t In this article, the predictor variables are the parameters of two polarimetric decompositions and four different polarimetric backscattering coefficients, and the dependent variable is SSM. SPCA directly removes predictor variables that have a very low correlation with SSM and combines the remaining predictor variables into a new variable. This new variable is involved in subsequent SSM estimates.

C. Copula Quantile Regression
Copulas are functions that model the marginal distributions of arbitrary two or more variables using a few parameters combined to a fully dependent structure to produce a joint probability distribution between random variables [69]. In order to construct the dependence between two random variables, two main steps need to be completed: 1) estimating the marginal distribution of a single random variable and 2) selecting the optimal copula function. Estimating the probability distribution of the population from the sample mainly includes the parametric method and the nonparametric method. Because the limited parameters in the parameter method cannot accurately capture the characteristics of the data, this experiment selects the nonparametric estimation method, namely, the kernel density estimation (KDE) method. For the sample X = (x 1 , x 2 , . . . , x n ) in random variables, the K(•) is the kernel function; the Gaussian kernel function is chosen for this experiment. For the Gaussian kernel function, the optimal window widthĥ is shown aŝ σ can be replaced by the standard deviation of the sample. The probability distributions of the first principal component of SPCA and SSM estimated using KDE are denoted as u spca = G spca (x spca ) and v mv = F mv (mv), respectively. The dependence between the first principal component of the SPCA and SSM is modeled using the Archimedean copula functions, which capture the dependence between hydrological variables flexibly [70]. The expressions, generator functions, and parameter ranges of the three commonly used Archimedean copula functions are shown in Table III [40], [41].
The parameter θ can be calculated by fitting each Archimedean copula using the canonical maximum pseudolikelihood method [71]. Subsequently, the optimal copula function is selected using the magnitude of the distance between the theoretical copula and the empirical copula [40], [41]. For the data of this experiment, the Gumbel copula function is selected because of the best fit. In order to find the quantile of SSM from the joint distribution, the marginal distribution of SSM (mv) conditioned on the first principal component of the SPCA (x spca ) needs to be derived first, with the expression shown as follows [40], [41]: At this point, the qth quantile of SSM (mv q ) can be estimated using the inverse function of the probability distribution of SSM and the conditional distribution of SSM on the first principal component of the SPCA, respectively, as shown in (21) [41]. The 50th quantile is calculated as an estimation of SSM. The 20th and 80th quantiles are calculated as the lower and upper bounds of the uncertainty range of the SSM, respectively.

D. Assessment Method
The evaluation of the estimation results of the SSM is performed in two cases, one is to evaluate the error between the 50th percentile estimation value and the observation value, and the other is to discuss the upper and lower bounds of the uncertainty range and the observation value of SSM. For the 50th quantile SSM estimation, the mean relative error (MRE) and RMSE are selected to evaluate the observed and estimated values. Linear and nonlinear correlations between estimations and observations of SSM are assessed using Pearson's correlation coefficient R and Kendall's correlation coefficient τ , respectively. The relative average deviation amplitude (RDA) is applied to evaluate the uncertainty range, which primarily uses the actual difference between the quantified observation value and the intermediate value of the prediction bound to assess the estimation [40]. The expressions of these evaluation parameters are shown as follows: (26) where N is the number of samples, mv o is the observed SSM, mv p is the estimated SSM, q u is the upper bound of an uncertain estimate, q l is the lower bound of an uncertain estimate, subscripts i and j are the order of samples, and i = j, P [•] denotes the probability that the two variables are harmonious and discordant. In addition, the experiment uses the Kling-Gupta efficiency (KGE) to evaluate the efficiency, with the expression as in (27).
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

A. Polarization Decomposition
The most crucial difficulty of SSM estimation in vegetationcovered regions is to separate the contributions of vegetation and surface scattering to the backscattering coefficients. From Fig. 2, the correlation between the different polarimetric backscattering coefficients of Radarsat-2 and SSM is relatively weak due to the influence of vegetation. Polarimetric decomposition can extract more information implicit in SAR polarimetric data, thereby improving the ability of SAR data to estimate SSM. Different polarimetric decomposition methods explain the polarimetric data from different perspectives. The three parameters H, α, and A of the Cloude-Pottier decomposition primarily describe the scattering mechanism of targets. The physical mechanism between these three parameters and SSM and surface roughness were explained in the article by Hajnsek et al. 2003 [45]. α is independent of surface roughness and correlated with SSM and local incidence angle, A is highly correlated with surface roughness, and H is correlated with both. In contrast, the adaptive nonnegative eigenvalue decomposition mainly describes the scattering power of different scattering mechanisms, such as surface scattering (Odd), volume scattering (Vol), and double bounce scattering (Dbl). The linear correlation coefficients (R) and nonlinear correlation coefficients (τ ) between the decomposition parameters (H, α, A, Odd, Vol, and Dbl), backscattering coefficients (HH, HV, VH, and VV), and SSM are shown in Figs. 4 and 5.
From Figs. 4 and 5, the correlations between all parameters and SSM are sufficiently low. The correlation between SSM and Vol is higher than that between SSM and Odd, indicating that the vegetation in this study is not transparent at the C-band microwave. Typically, SSM is mainly related to soil surface scattering Odd, and Vol is mainly related to vegetation scattering. The area of this experiment is densely vegetated, and the surface scattering component is minimal relative to the vegetation scattering, so Odd is not sensitive to the change of SSM. Meanwhile, the variation of Odd saturates with the increase of SSM in the study area. This situation has also appeared in the study by Bai et al. [72]. Ma et al. [73] suggested a correlation between SSM and vegetation water content in this study area. The quantity of water inside the vegetation depends on the humidity of the soil. The less soil moisture, the more vegetation hydric stress would be present. Therefore, there is a certain correlation between Vol and SSM. Vol is also correlated with many parameters in addition to vegetation water content, including density, structure, and species of vegetation, which makes the correlation of Vol with SSM only 0.319. In addition, the correlation between α and SSM is consistent with the physical interpretation of α by Hajnsek et al. 2003. There are significant correlations between some different polarimetric decomposition parameters. For example, the R between H and α is 0.894, and the τ is 0.739, indicating that the information contained in H and α are highly consistent [74]. Based on the above analysis, it is necessary to find an algorithm to improve the correlation between parameters and SSM further while reducing the number of predictor variables; therefore, an SPCA algorithm satisfies both objectives.

B. Comparisons of SPCA Results
To illustrate that the SPCA produces the first principal component with a higher correlation with SSM than PCA, the linear and nonlinear correlation coefficients between SSM and the first principal components derived from the two algorithms are analyzed. Simultaneously, the correlation between SSM and the first principal component of different combinations of polarimetric decomposition parameters and backscattering coefficients is analyzed to compare the effects of polarimetric decomposition. The combinations of different polarimetric parameters and backscattering coefficients mainly include four polarimetric backscattering coefficients (σ pq ), backscattering coefficients with Cloude-Pottier decomposition parameters (σ pq − C), backscattering coefficients with adaptive nonnegative eigenvalue decomposition parameters (σ pq − A), and backscattering coefficients with all parameters of the two decompositions (σ pq − AC). The parameter combinations are shown in Table IV. After processing σ pq , σ pq − C, σ pq − A, and σ pq − AC, the R, and τ between SSM and the first principal components of SPCA and PCA are shown in Tables V and VI, respectively.  As summarized from the tables, the correlation between the first principal component of SPCA and SSM is higher than that of PCA for the four-parameter combinations. This phenomenon occurs because SPCA eliminates HV, VH, and other parameters that are weakly or uncorrelated with SSM parameters, while PCA can only reduce dimensionality and cannot guarantee the correlation with the dependent variable. The comparison of σ pq , σ pq − C, σ pq − A, and σ pq − AC demonstrates that the utilization of polarization decomposition improves the correlation between SSM and the first principal component. With the data used in this experiment, σ pq − C performs slightly more effectively than σ pq − A. From Figs. 4 and 5, it is apparent that the correlations between the parameters obtained from the Cloude-Pottier polarization decomposition and adaptive nonnegative eigenvalue polarization decomposition are not significant, which implicates that the parameters obtained from the two polarization decompositions contain different information. More different information can interpret the data from several aspects, thereby eliminating the influence of vegetation on SSM estimation to a greater extent. Therefore, combining the two polarimetric decomposition methods improves the correlation between the first principal component and SSM.

C. Estimation of SSM at In-Situ Measurement Points
The applicability of the proposed algorithm for SSM estimation is demonstrated in both out-of-sample and in-sample situations. σ pq , σ pq − C, σ pq − A, and σ pq − AC are employed to articulate the effectiveness of the proposed method. In the out-of-sample, the SSM data and the first principal component calculated by SPCA are divided into a training dataset and a validation dataset, with a sample size ratio of 4:1. First, the order of the data is shuffled, and then the data are randomly divided into five disjoint parts with similar numbers. Each part takes turns as the validation dataset, and the remaining four parts are the training dataset. The training dataset is applied to create the copula model, and the validation dataset is employed to evaluate  Table VII. From Table VII, it can be concluded that the average MRE and RMSE between the SSM observed by the sensor and the SSM estimated by using σ pq − AC is relatively lower than that using σ pq , σ pq − A, and σ pq − C. The RMSE using σ pq − A and σ pq − C are approximately equal, and both are lower than using σ pq , which implies that using polarimetric decomposition reduces the RMSE estimated by the SSM to a certain extent. From the perspective of correlation, the average R of σ pq − A, σ pq − C and σ pq − AC is 0.158, 0.182, and 0.333 higher than σ pq , respectively. The average nonlinear correlation τ is 0.141, 0.104, and 0.222 higher. RDA is not significantly different in four cases. In addition to this, after using the parameters of polarimetric decomposition, the KGE also improves. From the maximum value of the MRE and RMSE parameters and the minimum value of the R and τ parameters, the SSM estimation method combining two polarimetric decompositions and copula quantile regression is stable and robust. In conclusion, combining two polarimetric decomposition parameters and different polarimetric backscattering coefficients has significant potential in estimating SSM. The scatter and uncertainty range of SSM estimation for the randomly selected group of one validation dataset are presented in Fig. 6 to visualize the results of SSM estimation. Fig. 6(a), (c), (e), and (g) describe the scatter plots between the estimated SSM and the observed SSM in the case of σ pq − AC, σ pq − C, σ pq − A, and σ pq . In addition to estimating the value of SSM, Copula quantile regression can provide the uncertainty range of SSM estimation, which are given in  Due to the limited amount of data and the small amount of verification data for out-of-sample experiments, it is difficult to analyze the error distribution of the estimation method, so the experiment also uses in-sample verification. In-sample means that the measurements from all SSM sites and the first principal component calculated with the SPCA are used as the training dataset for the copula model, while the same dataset is applied as the validation dataset to evaluate the validity of the model [40]. The evaluation parameters obtained by estimation of SSM insample are given in Table VIII. It can be concluded from Table VIII that the application of polarimetric decomposition in SSM estimation significantly improves the linear and nonlinear correlation between the estimated value and the measured value. The comparison of other evaluation parameters is also consistent with the out-of-sample. Fig. 7 shows the scatter plot and error histogram of the estimated and observed SSM values when σ pq − AC is used. Fig. 7(a) shows the scatter of SSM observations and estimations points on the 1:1 line. When the actual SSM is high, the estimation is lower than the observation. When the observation SSM value is low, the estimation is higher than the observation, consistent with Pal et al. [41]. The histogram of error distribution Fig. 7(b) suggests that the estimation error is between -0.15 and +0.1 cm 3 /cm 3 , which indicates the robustness of the SSM estimation algorithm proposed in this article. The combined analysis of Fig. 7 suggests that the significant errors primarily occur when the SSM is greater than 0.4 cm 3 /cm 3 , caused by the reduced sensitivity between backscattering coefficients and SSM when the SSM is relatively high. Fig. 8 compares the observed and estimated SSM values with the uncertainty range when σ pq − AC is used. It can be concluded from Fig. 8 that  66.4% of the observations of SSM are scattered within the uncertainty range, and 33.6% of the points are outside the uncertainty range.
Ma et al. [73] also studied the SSM in this experimental area. They selected the TerraSAR data on June 26 and July 7 to estimate the SSM of these two days. The estimated RMSE ranges from 0.047 cm 3 /cm 3 to 0.092 cm 3 /cm 3 with different water cloud model parameters. The SSM range estimated in this experiment is slightly better than that of Ma et al. Meanwhile, the method proposed in this experiment does not require the knowledge of a priori information, such as vegetation and surface roughness.

V. CONCLUSION
This study proposes a novel SSM estimation method combining polarimetric decomposition and copula quantile regression to overcome the challenge of low correlation between SAR backscattering coefficients and SSM in vegetation-covered areas. The algorithm first extracts more information from the quad-polarimetric SAR data by employing Cloude-Pottier decomposition and adaptive nonnegative eigenvalue decomposition. Subsequently, the utilization of SPCA reduces the predictor variables from ten parameters (backscattering coefficients and polarimetric decomposition parameters in Table IV) to the first principal component. The use of SPCA improved R and τ between the first principal component of σ pq − AC and SSM from 0.315 and 0.170 to 0.443 and 0.252, respectively. Finally, the dependency between the first principal component and the SSM is established using the Archimedes copula function. The experimental results show that the method proposed in this article can effectively estimate the SSM in the vegetationcovered area. In addition, the quantile regression based on the Archimedean copula function can give the uncertainty range of SSM estimation, which can be used to judge the reliability of the SSM estimate. Furthermore, beyond that, this article also has limitations. As shown in Fig. 7, there is a phenomenon that high SSM is underestimated. This phenomenon has appeared in many previous studies [41], [42], [50]. The main reason is that when the SSM increases, the sensitivity between the backscattering coefficients and the SSM will decrease. It is one of the unresolved difficulties of this experiment. The subsequent study will improve the high SSM underestimation by combining soil texture information, precipitation, and irrigation. In addition, the HiWATER only collected a Radarsat-2 quad-polarimetric image, so the SSM range of the experiment was small. Also, the dominant vegetation in the study area is corn, and it was impossible to verify the applicability of the proposed method in different vegetation-covered areas. Therefore, subsequent experiments will collect large datasets of areas covered by different vegetation and investigate SSM estimation methods in areas where SSM varies widely.