Comprehensive Evaluation of Multisource Soil Moisture Products in a Managed Agricultural Region: An Integrated Hydrologic Modeling Approach

Soil moisture (SM) is important in understanding Earth's hydrologic cycles. However, an overall performance of multisource SM products is still unclear due to a lack of comprehensive validation. Using SM simulated by hydrologic models as a reference to perform validation is a promising alternative since SM simulation is not restricted by its coverage or scale. In this study, an integrated hydrologic model (ParFlow-CLM) forced by hydrometeorological and agricultural water use reanalysis data is built in the middle reaches of the Heihe River Basin (HRB), a typical managed agricultural region in Northwestern China. Using the ParFlow-CLM simulated SM data as the validating reference, ten SM products, including four single-source RS SM, three merged SM, and three assimilated SM products are systematically assessed by a comprehensive evaluation framework composed of fifteen statistical performance indicators. For validation, the results show that ParFlow-CLM SM simulations agree with the time domain reflectometry probe and cosmic ray neutron probes observations. The merged SM and assimilated SM outperform the single-source RS SM products if compared with ParFlow-CLM simulated data. Results from the intercomparison demonstrate that hydraulic conductivity and leaf area index are the dominant factors in SM spatial variations based on the generalized additive model. The statistical linkage indicates that mean absolute deviation, uncertainty at 95% (U95), Nash–Sutcliffe's efficiency, and combined performance index serve as substitutes for quantifying the relative uncertainty of SM. This study paves a way for model-data intercomparison of SM products in the HRB as well as in other arid and semiarid basins.


I. INTRODUCTION
S OIL moisture (SM) is a key element linking the atmosphere and terrestrial ecosystems [1]. Comprehensive validation and intercomparison of SM products could help improve many hydrometeorological applications, such as precipitation forecasting [2], flood and drought monitoring [3], and climate change studies [4]. Currently, SM estimates are primarily obtained by in situ measurements, satellite retrievals, data assimilations, and hydrologic model simulations.
In recent decades, microwave remote sensing (RS), especially passive microwave approaches, has become an effective technology for SM global mapping owing to its high sensitivity to surface SM [5], [6], [7]. The passive microwave SM products include advanced microwave scanning radiometer (AMSR-E) on the Earth observing system (EOS) Aqua satellite [8], subsequent advanced microwave scanning radiometer 2 (AMSR2) onboard the global change observation mission (GCOM-W1) spacecraft [9], and microwave imaging radiometer with aperture synthesis onboard the soil moisture and ocean salinity (SMOS) satellite [10]. These satellite SM products, which have global coverage at a certain revisit time step, are feasible in global-scale applications, as they offer a snapshot of the evolving land surface processes at a specific local time [11].
However, single RS-source SM suffers from many deficiencies in practice, such as discontinuity in time series and higher uncertainty applied in dense vegetation cover areas [12]. The European Space Agency (ESA) Climate Change Initiative (CCI) project considers an integration of active and/or passive SM products and provides three effective merged SM products, which aims to overcome the drawbacks of the single RS-source SM [13]. The CCI SM dataset consists of three SM products by different blending techniques [14].
Additionally, as a multidisciplinary approach that considers several land surface processes, SM products by data assimilation are an effective tool for providing spatiotemporally continuous multilayer SM at different scales. Two of the widely used assimilated SM products are NASA's Global Land Data Assimilation System (GLDAS) SM product [15] and the Global Land Evaporation Amsterdam Model (GLEAM) SM product [16]. For better understanding and utilizing the above-mentioned SM products, an effective validation of SM products is commonly performed by comparisons against reference data, aiming at obtaining systematic and random errors or deviations in SM products and accessing quantitative statistics on the SM quality [17].
One key issue in validating the SM products is the reference data selection. The reference data, which are assumed to be reliable in representing the "validating referenced" SM value, are commonly provided from three sources: field campaigns, in situ networks, and model simulations [18], [19]. in situ measurements (e.g., time domain reflectometry probes or ground penetrating radar probes) that arguably provide the most precise SM estimates are usually limited in spatial expansion [20]. However, the mismatch of spatial-temporal scales in targeted RS SM data between field campaigns and in situ networks are inevitable [18], [21], [22]. Using SM simulated by hydrological or land-surface models as a reference to perform validation is a promising alternative since SM simulation is not restricted by its coverage or scale [18]. Model simulations can provide regional SM maps with spatial-temporal continuity [23], [24]. Another advantage of using model simulations is that the model can output several relevant hydrological variables which can be used to analyze the corresponding responses in SM variations, such as evapotranspiration (ET) and infiltration (I) [25]. Usually, precipitation (from Tropical Rainfall Measuring Mission and Global Precipitation Measurement precipitations datasets) is utilized as an auxiliary variable alone to help analyze water balance in SM validation [26]. However, clouds or atmospheric distortions usually cause the time series discontinuity of precipitation, and thus spatial-temporal distributions of I cannot be directly obtained [27]. Vertical water exchange between the atmosphere and the surface soil layer is equal to I-ET, which is actually the net downward flux [28]. Therefore, it is important to obtain the correlation between SM responses and I-ET variations in the same framework.
Another important issue in validating the SM products is the evaluation method and assessment indicators. According to the work in [26] and [29], methods in SM validation are generally classified into two categories: direct and indirect evaluations. Direct evaluation usually uses statistical performance indicators calculated by comparing reference data against targeted SM products. Indirect evaluation is mainly based on cross-validation and consistency analysis of spatiotemporal trends using impact factors [30], [31]. It is noted that both direct and indirect evaluation methodologies need to improve. For direct evaluation, the shortcoming is that most SM validation research usually uses limited statistical performance indicators to assess the comparison results. Gruber et al. [18] summarized the current direct validation methods for SM products. Gueymard [32] provided an important review of validation methodologies and statistical performance indicators in solar project fields. Both reviews not only provided a theoretical background about SM validation studies but also proactively and effectively formed a validation framework. It is then feasible to transform the methodology system from Gueymard [32] to SM product validation. For indirect evaluation, the weakness is neglecting the linkage between cross-validation methods and indices in the direct validation. Liu et al. [31] reported that the three-cornered hat (TCH) based relative uncertainty (RU) and root mean square error (RMSE) indices have good consistencies. However, the correlations of RUs and director performance indicators as well as the uniformities in direct and indirect results are still unknown [33]. It is therefore important to obtain their mathematical relationships.
The novelties of the current study include: 1) providing a reference SM from an integrated hydrological model with superior accuracy and high spatial-temporal resolution over a modern agricultural area, 2) developing a validation methodology based on fifteen statistical indicators and linking the SM RU and other statistical performance indicators, and 3) diagnosing the impact degrees of soil and vegetation environmental factors on SM products. Without any doubt, SM estimations are critical in arid and semiarid regions and are heavily affected by irrigation [7], [34]. With respect to the water budget, irrigation increases SM and in turn ET while decreasing runoff and depleting river and irrigation channel flows as well as groundwater, depending on the agricultural water use strategies [35]. From a hydrological perspective, irrigation strategies have huge impacts on local hydrological dynamics such as groundwater flow systems and groundwater-surface water storage changes, which would further affect streamflow, ET, and SM variations [36], [37]. Therefore, it is important to consider the complex water source use in SM validation in model simulations. Numerous validation studies based on simulated SM have been widely carried out, e.g., in the Rur and Erft catchments in Germany [38], over the US continent [39], on the Tibetan Plateau in China [40], and globally [41], [42]. In this study, we provide a comprehensive validation and intercomparison of ten SM products, namely, JAXA, LPRM-C, LPRM-X, SMOS-IC, CCI-A, CCI-P, CCI-C, GLEAM-A, GLEAM-B, and GLDAS SM products, based on outputs from a high-resolution integrated hydrologic model (ParFlow-CLM) in a typical managed agricultural region, the Heihe River Basin (HRB) in Northwestern China [ Fig. 1(a)].
The objectives of this study are: 1) to evaluate ten SM products using integrated hydrologic model simulations based on fifteen statistical performance indicators, 2) to diagnose the impacts of eight environmental factors (DEM, clay content, sand content, porosity, saturated hydraulic conductivity, organic matter content, leaf area index, and root abundance) on SM products, and 3) to find the precise correlations between the relative uncertainties and other statistical performance indicators. The fifteen statistical performance indicators employed in this study include the mean absolute difference (MAD), mean bias difference (MBD), RMSE, standard deviation of the residual (SD), uncertainty at 95% (U95), the t-statistic (TS), Pearson's correlation coefficient (R-Pearson), Spearman's Rho correlation coefficient (R-Spearman), slope of best-fit line (SBF), Nash-Sutcliffe's efficiency (NSE), Legates's coefficient of efficiency (LCE), Willmotts's index of agreement (WIA), the Kolmogorov-Smirnov integral (KSI), a combined performance index (CPI), and the TCH-based RU.
The rest of this article is organized as follows. Section II introduces the study area, the integrated hydrologic model, multisource SM products, and other auxiliary data. Section III provides the validation methodologies and explains the calculation approaches of impact quantifications and factor contributions. Section IV presents the assessment results, and Section V provides a further discussion on SM RU. Finally, conclusions are summarized in Section VI.

A. Study Area and In Situ Observations
The Heihe River, the second largest inland river situated in arid-semiarid region of Northwestern China, originates from the alpine region of the Qilian Mountains, flowing north and terminating in the Juyan Lakes in the Gobi deserts [43]. The study area is located in the middle reaches of the HRB (hereafter referred as the mHRB, 38.5°-39.5°N and 100°-101°E), which is controlled by an endorheic climate with annual precipitation ranging from 100 to 200 mm and potential evaporation within the range of 1200-1800 mm [22], [44]. Being famous for its grain production, the mHRB is covered by a large amount of irrigated farmland [ Fig. 1 The in situ SM measurements within the mHRB were established during the HiWATER project, which was initialized in 2012 and designed to be a comprehensive ecohydrological experiment [21]. In this study, the SM data collected using the time domain reflectometry probe (TDRP) and cosmic ray neutron probes (CRNP) [22] were used for validation [ Fig. 1(d)]. The TDRP automatically collected SM data at a depth of 2 cm and 10-min intervals since September 2012. The CRNP was installed in October 2012 and measured the intensity of ambient low-energy tertiary neutrons to calculate surface SM with a radius of 360 m and at an original 10-min interval. All the in situ data were resampled to daily time steps after standard quality control processes [45].

B. Multisource SM Products
Ten global surface SM products were considered in this study for intercomparison, including AMSR, SMOS, and ESA CCI, as well as the land data assimilation products GLEAM and GLDAS. Although these SM products were sourced from Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply. different technologies (i.e., retrieval, reanalysis and merge, and assimilation), here they were collectively referred to as "RS SM products" and distinguished as single RS-source SM, merged SM, and assimilated SM, respectively. Information on the SM products is shown in Table I.

1) AMSR (AMSR-E/AMSR2):
The AMSR SM products contain two series of products: the pioneer AMSR-E and its successor, AMSR2. The AMSR-E on the EOS Aqua satellite was launched by NASA on May 4, 2002 [8], while the AMSR2 onboard the GCOM-W1 satellite was launched by JAXA on May 18, 2012 [9]. As the ascending and descending overpasses of AMSR2 are identical to those of AMSR-E, the term "AMSR" represents the combination of both products.
Here, we utilized the three most widely used SM products from the AMSR series: JAXA SM product (version 8 for AMSR-E and version 3 for AMSR2, hereafter referred to as JAXA products) [9], [46], [47] and two LPRM SM products (generated from the C-band and X-band, hereafter referred to as LPRM-C and LPRM-X products, respectively) [48]. The JAXA SM data are available at https://www.eorc.jaxa.jp/AMSR/index_en. html, while LPRM SM data can be found on the NASA Earth Observation Data website (https://search.earthdata.nasa.gov/).
2) SMOS IC: The SMOS satellite was launched on November 2, 2009, and has since become an invaluable tool for monitoring key components of the global water cycle. SMOS provides global coverage at the equator every three days, with an ascending orbit at 06:00 local solar time and a descending orbit at 18:00 local solar time [49]. The L-band microwave emission of the biosphere (L-MEB) model, based on a zero-order radiative transfer equation, is the forward model used for SMOS SM retrieval [50]. The newly developed SMOS-IC product (version 1.d) is a recently improved product that corrects some errors in the initialization of vegetation optical depth (version v.105) [51] The data can be accessed from https://data.catds.fr/cecsm/ Land_products/.
3) ESA CCI: This study utilized three ESA CCI SM datasets: the active RS product (referred to as the CCI-A product), which is generated by fusing four scatterometers based on C-band observations, the passive RS product (referred to as the CCI-P product), generated from ten radiometers, and the combined product (referred to as the CCI-C product), which is a blended product based on CCI-A and CCI-P results [52]. These datasets were originally developed by the University of Vienna and can be accessed from https://www.esa-soilmoisture-cci.org/. 4) GLEAM: Two GLEAM SM datasets (version 3.6), GLEAM-A and GLEAM-B, were used in this study, which are generated from different forcing data in the surface layer (0-0.01 m) [53]. GLEAM-A uses radiation and air temperature datasets from the reanalysis ERA-Interim, while GLEAM-B is driven by RS datasets [16]. Furthermore, precipitation, snow water equivalent, vegetation optical depth, and vegetation fractions are largely driven by RS data for both datasets. Additionally, both GLEAM-A and GLEAM-B assimilate the CCI-C SM product. The dataset utilized in this study was obtained from Vrije University Amsterdam (VU) and can be accessed from https://www.gleam.eu/.

5) GLDAS:
The GLDAS was provided by a joint effort of NASA's Goddard Space Flight Center and NOAA's National Centers for Environmental Prediction (NCEP). Among the four available land surface models, the level-4 Noah model [54] was selected. The GLDAS2 Noah SM data with 0.25°spatial resolution and a temporal resolution of 3 h were obtained, and the top layer (0-0.01 m depth) of the output SM was used. Daily averaged values were employed in this study.

C. Auxiliary Data
The DEM and landcover maps are shown in Fig. 1. Spatial variation analysis was conducted using eight environmental factors, namely: 1) DEM; 2) clay content (hereafter referred to as clay); 3) sand content (hereafter referred to as sand); 4) porosity (Por); 5) saturated hydraulic conductivity (Ks); 6) organic matter content (OMC); 7) leaf area index (LAI); and 8) root abundance (RA). The LAI data were sourced from the "30 m month compositing leaf area index (LAI) product of the Heihe River Basin" [56] and can be downloaded from the National Tibetan Plateau/Third Pole Environment Data Center. Soil texture data utilized in this study were obtained from [57]. The Por, Ks, OMC, and RA data were sourced from two datasets, namely, "The Global Dataset of Soil Hydraulic and Thermal Parameters for Earth System Modeling" [58] and "The Global Soil Dataset for Earth System Modeling" [59] (http://globalchange.bnu.edu.cn/research/data). These four data were also resampled to a spatial resolution of 0.005°to fit the ParFlow-CLM grid.

A. ParFlow-CLM Model
The reference SM in this validation study was simulated by the integrated hydrologic model ParFlow-CLM [60], [61], which couples the ParFlow model for simulating groundwater [62], [63] and the Common Land Model (CLM) [64], and thus feasible for simulating regional SM with high-resolution. ParFlow-CLM solves the three-dimensional variably saturated groundwater flow and surface flow in a globally implicit manner; the land-surface formulation includes radiation budget, plant water use, land-energy balance, and snow processes. More information on the ParFlow-CLM can be found in Maxwell and Condon [65], and here, we only provide the water budget calculation as follows [28]: where P rain , P snow , R irri , R in , R out , ET veg , and E dir represent rain, snow, anthropogenic irrigation, inward overland flows, outward overland flow, subsurface influx, subsurface outflux, transpiration, and evaporation from barelands, respectively. Q in and Q out denote groundwater flux to deeper reservoirs and downstream from the mHRB. ΔS soil , ΔS surf , ΔS gw and ΔS snow represent the soil reservoir, surface water ponding, groundwater reservoirs, and snow water equivalent, respectively. All the components of the water budget in (1) are in unit volume per time.
In this study, the ParFlow-CLM model was horizontally discretized at a 0.005°(−500 m) spatial resolution with 8 vertical layers with a total thickness of 102 m. The top layer was 0.045 m, and a free overland surface boundary condition was employed for continuity of pressure and flux at the surface-subsurface interface. The DEM data used to calculate x-and y-direction slopes was derived from the "30 m Aster-gdem Data in Qilian Mountain Area" [66]. The land cover map was obtained from the "Landcover dataset at Qilian Mountain area (V1.0)" [67]. The original spatial resolutions of both datasets were 30 m and then were resampled to the same criteria as those of the ParFlow-CLM. The model parameters, including van Genuchten parameters, Manning coefficients, subsurface geology properties, and vegetation properties, are listed in Table II.
The forcing data consists of the hydrometeorological data and agricultural water use data. The required atmospheric forcings of ParFlow-CLM (i.e., temperature, precipitation, specific humidity, longitudinal and latitudinal wind speeds, and downward longwave and shortwave radiations) were obtained from the gridded outputs by Weather Research and Forecasting (WRF) model simulations [68]. Considering the extensive usage and the associated effects of irrigation on the regional water and energy budgets, the agricultural water use including both surface water and groundwater irrigation, was derived from the Irrigation Datasets with 30 s spatial resolution over the entire HRB [69], [70]. Subsequently, irrigation is applied to the surface as spray irrigation. The irrigation schedule was made on an hourly basis throughout the growing seasons according to the Zhangye Water Affairs Bureau (www.zhangye.gov.cn/swj/). All the data could be accessed from the National Tibetan Plateau Data Center/Third Pole Environment Data Center (http://data.tpdc. ac.cn/). The ParFlow-CLM simulation began with a 3-year spin-up (2008-2010) followed by a 3-year simulation (2011-2013). The initial time step was hourly based but outputted at daily scale. Differences between ET and I were calculated as well to obtain the response of SM products to I-ET.

B. Validation Methodologies (Statistical Performance Indicators)
As previously noted, assessments and quantitative analyses of SM products and their behaviors are crucial for understanding the terrestrial water cycle. Previous studies have tended to focus on a particular aspect, such as qualitative analysis of deviation against observations. For decades, RS SM validation has relied on simple conventional statistical indicators, such as MAD, RMSE, and coefficient of determination [26], [29]. Comprehensive assessments are absolutely necessary by providing a multiclass evaluation. Therefore, in this study, we proposed an overall statistical indicator validation methodology to validate the SM [32].
The overall statistical indicator validation methodologies were first introduced in Gueymard [32] in solar product fields. Here, we revised the forms of validation indicators to better understand the overall performance of ten SM products. The statistical candidate indices and index metrics commonly employed in performance assessment can be classified into four categories as follows: 1) Indices representing the deviation (as for other model data) or error (as for in situ data) of individual points, of which the value would be 0 for a perfect match or ideal model. 2) Indices representing holistic performance (as for timeseries-self-correlation) or mutual coherence (as for crosscorrelation), of which the value would be 1 for a perfect match. 3) Indices representing distribution conformity, of which the value would be 0 for a perfect match. 4) Indices exploring the relative uncertainties. More information about statistical indices and formulas can be found in [31] and [32].
In what follows, the ith simulated SM data (reference) point is noted as s i , and the ith RS estimated SM data (target) point is noted as t i . The average values of simulated SM data and RS estimated SM data are noted s m and t m , respectively. N is the number of the data points. The ith estimated-modeled difference (or error) is t i − s i . Unlike the absolute units used in some validation studies (e.g., [17], [30], [31], and [73]), the stating expression of the s m value was converted back to the percent form, which is consistent with [32].
1) Class 1-indicators of dispersion: MAD, which is also referred as the mean absolute deviation or mean absolute error MBD, which also refers to the mean bias deviation or mean bias error RMSE is also referred to as the root mean square difference SD is calculated as follows: U95 is calculated as follows: TS is calculated as follows: MAD, MBD, and RMSE are fundamental in RS retrieval and validation, while U95 and TS are less common in the RS field than former ones. Similarly, they convey relatively similar information, with the cosmetic advantage that a smaller value indicates a better SM retrieval.
2) Class 2-indicators of overall performance: R-Pearson is calculated as follows: R-Spearman is calculated as follows: SBF is calculated as follows: (10) NSE is calculated as follows: LCE is calculated as follows: WIA is calculated as follows: R-Pearson, R-Spearman, and SBF vary between 1 for a perfect positive match and −1 for a negative match, whereas 0 is total disagreement. LCE and NSE vary between 1 for perfect agreement and −∞ for complete disagreement. The WIA varies only between 1 and 0.

3) Class 3-indicators of distribution similitude:
Here, the goal is to compare one or more cumulative frequency distributions of modeled data to those of a reference dataset. The KSI and CPI were utilized.
KSI is calculated as follows: where D n is the absolute difference between the two normalized distributions within irradiance interval n. X max and X min are the maximum and minimum values of the binned reduced irradiance, x, and A c is a characteristic quantity of the distribution where the critical value D c is a statistical characteristic of the reference distribution, defined as a function of its number of points, N where Φ(N ) is a pure function of N, for which an accurate numerical approximation has been obtained. Here, we assume that Φ(N ) is a constant Φ (N ) = 1.63. CPI is defined as follows: where all values are expressed in the same unit as that of SM. KSI and CPI both vary between 0 for perfect agreement and ∞ for complete disagreement. The cons and pros of the evaluation indicators and detailed guidelines can be found in [32].

4) Class 4-indicators of relative uncertainties:
The TCH method facilitates the solution of estimating SM uncertainties with no prior knowledge. Fundamentally, the TCH method allows more than three SM products to take part in the calculation simultaneously, with a tolerance to the error cross-correlation. At a certain RS grid, TCH splits each SM time series of the observations into two terms where P 0 is the true SM value at the grid scale and ε i represents the corresponding error. The TCH-based RU is where r ii is the unknown diagonal element of the ε i covariance. Detailed methodologies of both TCH theory and TCH-derived RU equations are well documented in [31].

C. Definition of the Variation Coefficient
The variation coefficient (CV) is defined as follows: where v i andv are the ith, averaged value of SM and other environmental factors of 0.005°variables within a 0.25°grid. n is the sample number of the subgrids. Specifically, SM, DEM, Clay, Sand, Por, Ks, OMC, LAI, and RA participated in the CV calculations.

D. Contribution of Environmental Factors to SM Uncertainty
The generalized additive model (GAM) was utilized to quantify the contributions of each factor to the total deviance of the ten SM RUs. GAM has been widely used in practice with its availability of statistically well-founded smoothing parameter estimation methods that are numerically efficient and robust [74]. Provided that g i (RU i ) can be represented via several basis expansions of environmental factors (ψ i ), the form of GAM is written as follows [75]: where RU i is the TCH-based RU and g i is the link function. M (M = 7 + 1) is the number of environmental factors along with the sample size. α and β i are unknown coefficients, and f i are unknown basis functions used for smoothing environmental factors ψ i . ε is the total error term. The GAM model (version 1.8-40) can be accessed from https://rdocumentation.org/packages/ mgcv/versions/1.8-40. For more details on the GAM algorithm, please refer to [74] and [75], while two examples in calculating contributions to RS SM can be found in [31] and [76].

A. Validation of the ParFlow-CLM SM Simulation
The ParFlow-CLM simulated SM was first validated against two groups of in situ observations from the Daman meteorological superstation. The TDRP only probe a very small soil volume while the CRNP is better suited for the validation of satellite-based SM products due to larger footprint area (with a horizontal radius of about 360 m in the mHRB [45]). Fig. 2 displays the comparison of the ParFlow-CLM simulation with TDRP and CRNP SM observations for the Daman superstation. The observed SM data are the reference (s i ), and the simulated SM data are the target (t i ). The low RMSEs (< 0.06 m 3 /m 3 ) and relatively high NSE indicated that the ParFlow-CLM simulation and observations are consistent in SM variations. The R-Person values were 0.83 and 0.93, while the R-Spearman values were 0.73 and 0.87 against TDRP and CRNP, respectively, demonstrating that the three SM time series were in good agreement. It is noted that ParFlow-CLM simulated results were almost equally distributed around the 1:1 line with TDRP and CRNP SM observations [ Fig. 2(b)], demonstrating that the accuracy of the simulated SM. All the three SM time series can capture rainfall events with different magnitudes. Moreover, effects of irrigation on SM were identified through SM time series fluctuations from June to September. In winters, ParFlow-CLM SM continuously decline similar to observations. This showed that ParFlow-CLM simulation captured the temporal variation of both TDRP and CRNP SM measurements. All these revealed that ParFlow-CLM SM simulations agreed well with TDRP and CRNP observations and could fully reflect seasonality features of SM.  Compared with the TDRP SM time series data, the ParFlow-CLM SM had a tendency similar to that of CRNP, owing to different spatial scales. According to Montzka et al. [77], the CRNP approach is better suited for evaluations of satellite-based SM products because it integrates small-scale spatial variations in SM [78]. The validated results demonstrated that the SM simulated by ParFlow-CLM exhibited a higher degree of similarity to the SM dynamics obtained from the CRNP, along with better correlation. This suggests that the ParFlow-CLM SM simulation is capable of representing SM fluctuations at kilometer scale, thus providing an enhanced reference dataset for validating single-source RS, merged, and assimilated SM products. Notably, the time-series of the TDRP SM presented a discontinuity period, which could be attributed to the inclusion of extra hydrogen sources. This observation reinforces the potential of using hydrologic model simulations as a reliable reference dataset, exhibiting improved time-series continuity, for validating SM products [79], [80].

B. Performance of Ten SM Products
All available SM estimates for each product during the whole period (2011-2013) were evaluated to obtain the accurate statistical indicators for the purposes that users can utilize these SM products directly [17]. First, AMSR and SMOS RS SM data were resampled to daily time steps after quality control processes. GLDAS 3-h SM data were also merged to daily scale. All the results of the statistical performance indicators are listed in Table III.

1) Time-Series Comparisons and Performance Statistics:
The time series for all SM products are plotted in Fig. 3(a). It is observed that the JAXA SM product exhibited a significantly limited dynamic range. Both LPRM-X and LPRM-C SM exhibited larger dynamic ranges compared with ParFlow-CLM simulations. Bindlish et al. [81] found that LPRM SM products have huge anomalous SM retrievals that are greater than the soil porosity over the Little Washita watershed, which was consistent with our study in the mHRB. Lu et al. [73] attributed this phenomenon to the erroneous estimation of the vegetation optical thickness. SMOS-IC SM had the largest variations with several discontinuities after quality controls. The three CCI SM products tended to underestimate the ParFlow-CLM SM to different extents. However, the distribution of CCI-C SM was close to that of ParFlow-CLM SM. The SM time-series data pairs of GLEAM-A, GLEAM-B, and GLADS were the same as ParFlow-CLM [see Fig. 3(d)] because they could all be regarded  as "modeled SM" [17]. Additionally, the averages and variances of these three SM products were similar to those of the ParFlow-CLM SM, with smaller ranges. The overall performance shown in Fig. 3(b) was based on R-Pearson calculated using all nine RS grid data. The LPRM-X, LPRM-C, GLEAM-A, GLEAM-B, and GLDAS SM retrievals are near-equally distributed around the 1:1 line against the ParFlow-CLM data. Table III shows the overall statistical indicators for the ten SM products against the ParFlow-CLM SM. For the deviation or error of SM time-series statistics (Class 1), GLDAS SM For the TCH-based RU results (Class 4), GLDAS had the smallest value, while SMOS-IC obtained the largest value. Considering that GLEAM-B and GLDAS participated in RU calculations, the CCI-C SM gave the lowest RU value (13.18%) except for these two SM products involved. In summary, merged SM and assimilated SM products outperformed single RS-source SM products. These results were similar to SM validation using in situ networks (e.g., [82], [83], [84], [85], [86], and [87]). All the validation results concluded that JAXA SM shows underestimations, and LRPM SM shows overestimations in the mHRB. Additionally, the merged and assimilated SM outperformed the single-source RS SM products. In general, the GLEAM-B and GLDAS SM products exhibited higher levels of accuracy. Furthermore, assimilated or merged products demonstrated superior performance compared with individual RS-based SM products. The GLDAS SM product demonstrated the highest level of accuracy, which can be attributed to its extensive calibration using ground-based observations (https://disc.gsfc.nasa.gov/datasets/ GLDAS_NOAH025_3H_2.1/). Additionally, the ESA and VU continuously release new versions of CCI and GLEAM SM products to achieve better precision. Gruber et al. [52] stated that one of the goals of the ESA CCI SM merging algorithm developments was to improve the performance statistics. Therefore, the single RS-source SM products need to be improved using more in situ observations and high temporal-spatial resolution simulated results (such as ParFlow-CLM) for evaluation and calibration.
In summary, for Class 1 (indices representing the deviation), CCI-C SM had the smallest SD value while GLEAM-A had the smallest MBD value. The TS value of LPRM-X was the smallest. The MAD, RMSE and U95 values of GLDAS SM were quite minimum. In indicators of Class 2 (representing holistic performance), the NSE, LCE, and WIA values of GLEAM-B SM were the largest. The R-Person and R-Spearman values of GLDAS SM were the largest while the CCI-A SM had the largest SBF value. In indicators of Class 3 (representing distribution conformity), LPRM-X SM had the smallest KSI and CPI values. Lastly for the Class 4 (TCH-based RU results), GLDAS SM had the smallest value.
2) Response of SM Products to I-ET: In addition to the temporally compared results presented in Fig. 3(a), correlations between SM and I-ET (infiltration -evapotranspiration) were investigated with daily step simulations. As shown in Fig. 3(a), ET varies significantly in the mHRB region, and the largest ET values occur during the monsoon season (from June to September). It was usually generated after precipitation events. We found that almost all the RS time-series SM products respond to variations in I-ET to different extents. Only LPRM-X and LPRM-C SM showed poor agreements with I-ET. Fig. 3(c) shows the correlations between ten SM products and ParFlow-CLM I-ET simulation. JAXA had the highest R-Pearson/R-Spearman (0.423/0.496) among all the validated SM products, showing that it is the most sensitive to vertical moisture changes. Although the statistical performance indicators of JAXA showed poor agreement with ParFlow-CLM SM variations, they were relatively more linked to temporal I-ET variability. Therefore, JAXA SM has strong application potential in moisture memory calculations and drought identifications [34], [88], such as a pulse-reserve indicator in the soil-plant continuum water balance [89]. Both LPRM-C and LPRM-X SM showed negative correlations with I-ET, meaning that LRPM SM products need to be improved in capturing the dynamics of vertical moisture movements. For the rest, all the merged or assimilated SM products (namely, CCI-A, CCI-P, CCI-C, GLEAM-A, GLEAM-B, and GLDAS) had a weak positive correlation with I-ET, which met basic expectations [6]. In summary, JAXA had high sensitivity to I-ET while LPRM-X and LPRM-C failed to capture the dynamics of I-ET.

C. Impacts of Environmental Factors on SM Products
Ks and porosity play an important role in the propagation of the head potential fluctuation in subsurface flow simulations which strongly affect the SM [90]. The soil texture and organic matter content factors are also sensitive to SM changes [81]. Meanwhile, the LAI and RA are significant vegetation factors in SM retrievals and assimilations [83], [91]. To understand the impact of each parameter on SM products, the spatial variations in the eight environmental factors (i.e., DEM, sand, clay, Por, Ks, OMC, LAI, and RA) were calculated, quantitatively estimated, and qualitatively analyzed. Fig. 4 maps the spatial distribution patterns of eight environmental factors. The null value was neglected in calculating the variation coefficients. Fig. 5 presents the CVs of SM and environmental factors (b), together with the biases of SM products against ParFlow-CLM SM (a). Here, the term "Max|10bias|" denotes the maximum value of the absolute bias of ten SM products. This value is plotted to better detect the connections between the environmental variable heterogeneities and SM biases. Overall, the largest bias of the ten SM products against the ParFlow-CLM SM occurs in grids 2 and 5 [ Fig. 5(a)]. Additionally, the greatest values of CVs in SM are generated in grids 2 and 5, indicating that the spatial variation is a nonnegligible element in RS SM. Overall, according to the impact patterns, we can classify the environmental factors into three categories (ranking from lowest to highest): 1) unnecessary and insufficient conditions, such as sand and clay; 2) unnecessary but sufficient conditions, such as DEM, Por, OMC, and RA; and 3) necessary and sufficient conditions, such as Ks and LAI. Previous studies (e.g., [82] and [84]) declared that the impacts of vegetation-related indices are generally larger than those of soil texture and that terrain complexity has an intervening impact. We expanded the involved environmental factors and provided their connections with SM products. Here we found that the variation coefficients of Ks and LAI had a strong correlation and consistency with the SM variation coefficient. This indicated that Ks and LAI, representing the contributions of soil physical properties and overlying vegetation properties, were dominant factors in SM spatial variations.

A. Relationship Between SM RU and Environmental Factors
In this study, we provide a comprehensive validation of ten multisource SM products against ParFlow-CLM simulations. A set of fifteen direct statistical performance indicators along with the TCH-based RU were utilized for the evaluations. These indicators have been commonly used in RS SM assessments during the past decades (e.g., [17], [20], [92], [93], [94], and [95]). However, the relative uncertainties were newly introduced to RS SM assessments [33], [96], [97], [98]. Subsequently, the key drivers for spatial variations of RS SM estimations were explored. Moreover, the linkages between SM RUs and other statistical performance indicators were discussed.
The drivers of eight environmental factors for TCH-based RU were explored using the GAM models. First, for a better balance in quantifying environmental factors, the relative elevation was picked by subtracting the minimum value of elevation. Then, the SM products were spatially resampled to obtain an SM RU matrix in ParFlow-CLM. Several screening procedures were conducted because the effects of sample size and data continuity must be carefully treated [31], [75]. The statistics of the first and second maximum contributing environmental factors are shown in Table IV. The GAM model was used to carry out the deviance explanation, which is related to ten SM relative uncertainties. The adjusted R 2 in each GAM model is also shown in Table IV. Both LPRM SM products are not included due to their adjusted Almost all the merged and assimilated SM products are mainly impacted by LAI, possibly due to the anthropogenic influences, which include grain production, plantations, and wetland park development. All these human activities can strongly change LAI and therefore perturb SM variables. Therefore, in-depth analysis is needed to further explore the quantitative impacts of LAI on the TCH-based SM RU. Fig. 6 displays the impacts of LAI on the TCH-based SM RUs, (a) for single RS-source SM products and (b) for merged SM and assimilated SM products. Specifically, the RU of JAXA initially decreased slightly and then exhibited an undulating increase until LAI reached 0.6. The RU of SMOS showed a decreasing trend before LAI reached 0.2, followed by a continuous increase thereafter. The RU of CCI-A exhibited a slight decrease and then remained constant, while the RU of CCI-P displayed a sharp decrease before starting to increase from LAI = 0.5. For CCI-C, the change pattern was similar to CCI-P with smaller magnitudes. In contrast, for assimilated SM products [GLEAM-A in blue, GLEAM-B green, GLDAS in pink in Fig. 6(b)], the RUs exhibited a monotonic decrease as LAI increased. This indicated that the impacts of LAI on these SM products were weaker than those of single RS-source SM products. This conclusion was similar to that of Liu et al. [31], which was obtained over the Qinghai-Tibet Plateau. Therefore, we can conclude that the impacts of upper vegetation layers on assimilated SM products are small. However, we can see that the RUs of GLEAM-A and GLEAM-B are larger than 10% when the LAI was smaller than 0.1 (under sparse vegetation or bare land).

B. Linkage Between SM RU and Other Statistical Performance Indicators
Liu et al. [31] reported that the RUs and RMSE indices had good consistencies, which showed that the RUs share some mathematical connections with statistical performance indicators. Here, we performed a comprehensive analysis to find the precise relationships between the relative uncertainties and other statistical performance indicators. Fig. 7 maps the comparisons of indicators of Class 1, Class 2, Class 3, and RU. Scatter plots of TCH-based RU and fourteen other statistical indicators of ten SM products, which were obtained in performance validation, are used here to draw the panels. Here, we define P-Person ≥ 0.8 or R 2 ≥ 0.6 as an intrinsic strong correlation relationship. As shown in Fig. 7, MAD, U95, NSE, and CPI showed different types consistent with TCHbased RU. The MAD and U95 displayed a strong positive linear correlation with RU, while the NSE showed a strong negative linear correlation. Furthermore, the CPI exhibited a nonlinear correlation relationship with RU with polynomial fitting. This indicates that in a statistical sense, MAD, U95, NSE, and CPI can be utilized as substitutes for quantifying the SM RU.

VI. CONCLUSION
This study aimed to perform a comprehensive validation of ten SM products (namely, JAXA, LPRM-C, LPRM-X, SMOS-IC, CCI-A, CCI-P, CCI-C, GLEAM-A, GLEAM-B, and GLDAS) using a set of statistical performance indicators in the middle reaches of the HRB. Forced by hydrometeorological and agricultural water use reanalysis data (including both surface water and groundwater irrigations), an integrated hydrologic model (ParFlow-CLM) was built to provide the reference SM data because of its superior performance in simulating SM [28], [99], [100]. Ten SM products were evaluated against the ParFlow-CLM simulations by a comprehensive evaluation framework composed of fifteen statistical performance indicators, which was classified into four categories as indicators of dispersion, overall performance, distribution similitude, and RU. Moreover, eight environmental factors (DEM, clay content, sand content, porosity, saturated hydraulic conductivity, organic matter content, leaf area index, and root abundance) were utilized to explore the drivers for spatial variations of these SM products. The precise correlations between the relative uncertainties and other statistical performance indicators were also investigated. The main findings are summarized as follows: 1) The ParFlow-CLM simulation and SM observations were observed to be in reasonable agreement for a holistic performance. The R-Person values were 0.83 and 0.93, while the R-Spearman values were 0.73 and 0.87 against TDRP and CRNPs, respectively, indicating that the simulation and observations match well with the magnitude of the SM variations. Moreover, compared with TDRP SM time series, ParFlow-CLM SM was closer to that of CRNP, owing to different spatial scales. This indicated that ParFlow-CLM SM simulation can capture SM variations at spatial scale and is a better reference data for validating SM products. Overall, the single RS-source SM products need to be improved using more in situ observations and high temporalspatial resolution simulated results for evaluation and calibration. 4) JAXA has the highest R-Pearson/R-Spearman (0.423/ 0.496) among all the validated SM products, showing that it is the most sensitive to vertical moisture changes. Although the statistical performance indicators of JAXA show poor agreement with ParFlow-CLM SM variations, they are relatively more linked to temporal infiltration-ET variability. Therefore, JAXA SM has strong application potential in moisture memory calculations and drought identifications. 5) The variation coefficients of saturated hydraulic conductivity (Ks) and leaf area index (LAI) have a strong correlation and consistency with SM variation coefficient. This indicates that Ks and LAI, representing the contributions of soil physical properties and overlying vegetation properties, are dominant factors in SM spatial variations.
In particular, based on the GAM, the LAI is the most dominant factor with the largest contribution fraction to SM RU, possibly due to the anthropogenic influences. Moreover, deviances explained by LAI related to SM relative uncertainties in merged SM and assimilated SM are relatively small, demonstrating that the single-source RS SM products need to be improved over dense vegetated regions. 6) Comparisons of indicators of Class 1 (representing the deviation), Class 2 (representing holistic performance), Class 3 (representing distribution conformity), and SM RU show that the MAD and U95 have strong positive linear correlations with RU, while the NSE shows a strong negative linear correlation. Additionally, the CPI depicts a nonlinear correlation relationship with RU with polynomial fitting. In total and in a statistical sense, MAD, U95, NSE, and CPI can be utilized as substitutes for quantifying the SM RU. The study domain (mHRB) was a small-sized, semiarid subbasin, with an alluvial aquifer that serves as a representative example of other arid and semiarid basins. Although the current study might be transferable to other regions, future work is essential to simulate larger climatic and hydrologic areas to better understand the performances of SM products. Future simulations of enlarged domains would also facilitate further analysis of the impacts of soil and vegetation factors on SM products as well as the linkage between SM RU and other statistical performance indicators.