Triple Collocation Analysis and In Situ Validation of the CYGNSS Soil Moisture Product

The Cyclone Global Navigation Satellite System (CYGNSS) soil moisture (SM) product offers high temporal resolution, but its relative strengths and weaknesses are unknown. In this article, we analyze the performance of the CYGNSS SM product across varied land covers and climates using triple collocation (TC) analysis and in situ validation. The Soil Moisture Active Passive (SMAP), Advanced Microwave Scanning Radiometer 2 Land Parameter Retrieval Model (AMSR2 LPRM), and European Space Agency Climate Change Initiative (ESA CCI) Active SM products were used as references and as alternative datasets to calculate the TC-based standard deviation (SDTC) and correlation (RTC), together with the in situ validation metrics Pearson's correlation coefficient (R) and unbiased root-mean-square error (ubRMSE). The TC analysis indicated that CYGNSS had a relatively low median SDTC of 0.024 m3/m3 and a median RTC of 0.419. Validation against 251 in situ SM stations showed that CYGNSS obtained a relatively low median ubRMSE of 0.057 m3/m3 along with a low median R of 0.414. Both the interproduct TC comparisons and the in situ validation revealed that the CYGNSS product is characterized by small SDTC and ubRMSE values but performs poorly in capturing SM temporal variability. Additionally, the ability of CYGNSS to capture SM temporal variability degraded over barren areas with arid/semiarid climates, including Northern Africa, the Arabian Peninsula, and Central Australia, and over forested regions with temperate/tropical climates, including eastern South America, the Indo-China Peninsula, and Southeastern China. This suggests that capturing SM temporal variations over barren and forested regions is a key priority for improving CYGNSS SM algorithms.

Similar to the Soil Moisture and Ocean Salinity (SMOS) [18] and Soil Moisture Active Passive (SMAP) [19] missions, CYGNSS operates at the L band, which is capable of penetrating cloud cover and is sensitive to changes in SM content. Unlike the SMOS and SMAP missions, which rely on radiometers to estimate SM, the CYGNSS reflectometry technique exploits delay-Doppler maps (DDMs) of reflected Global Navigation Satellite System (GNSS) signals, relating multipath delay to earth's surface parameters with the aid of existing knowledge of SM [12], [20]. A DDM Imager (DDMI) on board each CYGNSS satellite receives the bistatic specular reflection tracks from GNSS satellites. In theory, all the reflected signals from the multiple GNSS constellations can be used for earth remote sensing [17]. This is one of the most appealing features of spaceborne GNSS reflectometry: the multisource GNSS reflected signals and low-cost passive instrumentation reduce the size and complexity of sensors and enable the use of small satellite constellations in remote sensing.
The CYGNSS SM product was released on November 13, 2020 and was developed by the University Corporation for Atmospheric Research and the University of Colorado at Boulder (CU). It is the first SM product retrieved from GNSS bistatic radar that spans the field scale to the quasi-global scale. However, CYGNSS was originally designed for sensing ocean surface wind speeds, and the DDMI is not an ideal SM sensor. The CYGNSS DDMI receives forward-scattered GNSS L-band signals whose reflection points are pseudorandomly positioned, with irregular spatiotemporal resolution [17]. In contrast to mainstream microwave remote sensing satellites that have repeatable swaths and fixed local overpass times, the irregular spatiotemporal sampling of CYGNSS data makes it challenging to convert the reflectivity to SM [12], [21]. Therefore, it is critical to conduct a comprehensive evaluation and reliability analysis of the CYGNSS SM product before it is used in hydrometeorological studies and applications. The CYGNSS SM retrieval is still in its early phase of calibration, and evaluations have mainly relied on SM observations from in situ sites and the SMAP mission [12], [22], [23], [24], [25]. Systematic errors in the SMAP/in situ observations propagate into CYGNSS SM retrievals because the CYGNSS SM values are calibrated against the SMAP/in situ SM measurements. Additionally, the available in situ stations are limited in number and only record SM at the point scale. As a result, SMAP or in situ SM data alone are not an adequate reference for assessing the CYGNSS SM product. Moreover, few publications have investigated the error characteristics of the new CYGNSS SM product in comparison with other microwave remote sensing SM products and in situ SM observations across different land covers and climates.
The TC method [26] is a statistical tool that calculates the random errors of three independent datasets and has been widely used for the evaluation and fusion of remotely sensed SM datasets [27], [28], [29], [30].
The goal of this article is to identify the relative strengths and weaknesses of the CYGNSS SM product at the quasi-global scale, using the SMAP, AMSR2 LPRM, and ESA CCI Active SM products as references together with validation against 251 in situ sites. The TC analysis characterizes the random errors, and the in situ validation estimates the systematic bias of the CYGNSS SM product. We first compared the spatiotemporal variations of the four microwave SM datasets and identified their similarities and differences. Then, the TC method was used to calculate the standard deviations and correlation coefficients of the CYGNSS SM product by means of three collocated datasets. Finally, the performance of the CYGNSS SM product was evaluated against in situ SM measurements under different land covers and climates.

II. MATERIALS AND METHODS
The influence of temperature on vertical soil profile variations is smaller during the early morning overpass than during the afternoon overpass [31]. Therefore, we selected the SM products retrieved from the descending orbits of SMAP (6:00 A.M.) and Advanced Microwave Scanning Radiometer 2 (AMSR2) (1:30 A.M.), which correspond to the local morning overpass time. The datasets used in this article are listed in Table I.

A. Satellite SM Datasets
1) CYGNSS SM Data: The CYGNSS mission consists of eight low earth-orbiting satellites at a common inclination angle of 35°. Unlike conventional radar, which has both transmitters and receivers, CYGNSS is only equipped with receivers. A DDMI mounted on each observatory receives and processes the signals reflected from earth's surface [32]. The direct signals from GNSS satellites are used to obtain the position, velocity, and time information of the GNSS receiver, while the reflected signals are used to generate DDMs over the ocean and land surface to sense various surface physical parameters.
To produce SM from the CYGNSS reflectivity, the peak value of the DDM, corrected for the effects of gain, range, and incidence angle, was used to derive the effective surface reflectivity in dB (Pr,eff). Reflectivity observations affected by open water, as well as outliers, were removed based on empirical quality control. Then, Pr,eff was transformed into SM using the best-fit linear regression between collocated CYGNSS Pr,eff and SMAP SM on the same day. More details about the CYGNSS SM retrieval algorithm are provided in [33].
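The regression step described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit for a single grid cell; the function names and the sample values are hypothetical, and the operational algorithm in [33] may differ in how it forms and applies the fit.

```python
from statistics import mean


def fit_linear(reflectivity_db, smap_sm):
    """Least-squares slope and intercept of SMAP SM against reflectivity (dB).

    Both inputs are equal-length lists of collocated, same-day values for
    one grid cell (an assumption made for this illustration).
    """
    x_bar, y_bar = mean(reflectivity_db), mean(smap_sm)
    sxx = sum((x - x_bar) ** 2 for x in reflectivity_db)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(reflectivity_db, smap_sm))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def reflectivity_to_sm(reflectivity_db, slope, intercept):
    """Apply the fitted line to new reflectivity values (dB -> m3/m3)."""
    return [slope * x + intercept for x in reflectivity_db]


# Hypothetical collocated samples, chosen to lie exactly on a line.
slope, intercept = fit_linear([-20.0, -15.0, -10.0], [0.10, 0.20, 0.30])
estimates = reflectivity_to_sm([-12.5], slope, intercept)
```

Because the fit is trained on SMAP SM, any systematic error in the SMAP values is inherited by the resulting estimates, which is why an independent evaluation is needed.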
2) SMAP SM Data: The SMAP mission is equipped with an L-band passive radiometer and an active radar, and provides global-scale SM products with unprecedented accuracy. The active radar products were only available until July 7, 2015, due to the failure of the radar power supply. The SMAP SM retrieval algorithms are based on the tau-omega model [34]. For more information about the current algorithm of the SMAP SM product, readers can refer to [35]. The SMAP descending SM data with 36 km resolution in the EASE-Grid 2.0 (EASEv2) projection were selected for this article. The SMAP SM datasets are freely available from (https://nsidc.org/data/).

3) AMSR2 LPRM SM Data:
The AMSR2 radiometer crosses the equator at 1:30 P.M. and 1:30 A.M. (local time). Four alternative SM datasets are offered by two operational SM retrieval methods. The Japan Aerospace Exploration Agency official look-up table algorithm provides SM data ranging from 0 to 0.6 m3/m3 based on the 10.65 GHz (X-band) brightness temperature (TB) observations. The Land Parameter Retrieval Model (LPRM) provides SM products ranging from 0 to 1 m3/m3 based on 6.9 GHz (C1-band), 7.3 GHz (C2-band), and 10.65 GHz (X-band) TB observations. It is noteworthy that radio frequency interference has a significant negative impact on the C-band AMSR2 SM retrievals [36], [37]. Thus, we chose the X-band AMSR2 LPRM SM (hereinafter called LPRM) product to evaluate the performance of the CYGNSS SM product. More information about the AMSR2 SM retrieval algorithms is provided in [38]. The LPRM SM datasets can be freely obtained from (https://gcmd.gsfc.nasa.gov/).

4) ESA CCI Active SM Data:
The European Space Agency Climate Change Initiative (ESA CCI) provides global long-term satellite surface SM products to support climate research. The datasets are updated regularly through the integration of new sensors and improved algorithms. The ESA CCI SM version 06.1 consists of three datasets: 1) the Active SM dataset (August 1991-December 2020), generated from active scatterometers using the TU-Wien (Vienna University of Technology) change detection algorithm; 2) the Passive SM dataset (November 1978-December 2020), generated from passive radiometers using the LPRM algorithm; and 3) the Combined SM dataset (November 1978-December 2020), incorporating data from both radiometer and radar sensors. In this evaluation, we used the ESA CCI Active SM product (version 06.1), which integrates the Advanced Scatterometer (ASCAT) Metop-A/B satellite SM datasets. More details about the ESA CCI SM retrieval algorithms are available in [39], and the datasets are available from (https://www.esa-soilmoisturecci.org/).

5) Global Land Data Assimilation System (GLDAS) Noah SM Data:
The GLDAS Noah model uses satellite and ground observation data together with advanced land surface models and data assimilation techniques to provide optimized near-real-time SM data for four soil layers (0-10, 10-40, 40-100, and 100-200 cm) [40]. The simulations of the top 10 cm layer, with a temporal resolution of 3 h and a spatial resolution of 0.25°, were applied in the TC analysis. The GLDAS Noah (hereinafter called Noah) SM data can be downloaded from (https://data.gesdisc.earthdata.nasa.gov/data/GLDAS/).

B. In Situ SM Datasets
In this article, the CYGNSS SM product was evaluated using 251 in situ SM stations collected from the International Soil Moisture Network (ISMN) database [41]. Fig. 1 shows the spatial distribution of the in situ stations across the six land cover classes; see also Table II.
Since the penetration depth of the GNSS reflectometry technology is 0-5 cm [42], only in situ SM measurements from the same depth with quality flags marked "G" were chosen from the ISMN database. The differences among the morning, nighttime, and daily station-averaged SM measurements were not significant [43]. Therefore, similar to previous validation studies of satellite SM products [44], [45], [46], the available in situ SM measurement nearest the satellite overpass time was selected for SMAP and LPRM, while the in situ SM observations over 24 h were averaged for CYGNSS and ESA CCI Active.
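The two temporal matching rules described above (nearest-to-overpass for SMAP and LPRM, 24 h averaging for CYGNSS and ESA CCI Active) can be sketched as follows. This is only an illustration: the function names, the dictionary layout (local hour mapped to SM value for one day), and the sample values are assumptions, not the authors' processing code.

```python
def nearest_to_overpass(hourly_sm, overpass_hour):
    """Return the valid in situ value closest in time to the overpass.

    hourly_sm maps local hour (0-23) to SM in m3/m3, or None when the
    reading is missing or did not carry the "G" quality flag.
    """
    valid = {h: v for h, v in hourly_sm.items() if v is not None}
    if not valid:
        return None
    hour = min(valid, key=lambda h: abs(h - overpass_hour))
    return valid[hour]


def daily_mean(hourly_sm):
    """Average all valid in situ readings of the day."""
    values = [v for v in hourly_sm.values() if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical readings for one station and one day.
station_day = {0: 0.20, 1: 0.21, 2: None, 6: 0.25, 12: 0.30}
smap_match = nearest_to_overpass(station_day, 6.0)   # 6:00 A.M. overpass
lprm_match = nearest_to_overpass(station_day, 1.5)   # 1:30 A.M. overpass
cygnss_match = daily_mean(station_day)               # 24 h average
```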

C. Ancillary Datasets
The reliability and accuracy of SM products are affected by many factors, such as vegetation, precipitation, and climate. This study investigated the performance of the CYGNSS SM product in varied land cover classes and climate conditions. The Global Precipitation Measurement (GPM) mission carries advanced radar/radiometer instruments and has improved prediction capabilities in mid-to-high latitudes. The GPM mission provides three kinds of precipitation data (IMERG Early, IMERG Late, and IMERG Final) with spatial resolutions of 0.5° and 0.1°. The IMERG Final product with a resolution of 0.1° was used in this article. All the GPM precipitation data can be accessed from (https://gpm.nasa.gov/data).
In the assessments, the 17-class Moderate Resolution Imaging Spectroradiometer (MODIS) International Geosphere-Biosphere Program (IGBP) land cover types were reclassified into six primary classes (unvegetated, croplands, grasslands, shrublands, woodlands, and forests) [47]. The SM retrievals over open water areas were effectively masked [33], and fewer data were recorded for urban areas [23]. Therefore, we chose only the barren or sparsely vegetated type in the unvegetated class. The 2018 MODIS MCD12Q1 product and the four main Köppen-Geiger climate zone categories (tropical, arid, temperate, and cold) [9] were used to investigate the reliability of the microwave remote sensing SM products at the quasi-tropical scale. Detailed information on the land cover classes and climate zones used in this evaluation is listed in Table III.

D. Methodology
Before the evaluation, several data preprocessing steps were required to ensure consistency between the remote sensing SM measurements and the in situ observations:
1) SM units: The original unit of ESA CCI Active SM was converted from percentage (%) to volumetric water content (m3/m3) through porosity (sand, silt, clay, and organic percentages), as described by [48].
2) Grid size and spatiotemporal coverage: The 3-h interval Noah SM data were averaged on a daily basis, and the SMAP, ESA CCI Active, LPRM, and Noah SM datasets were resampled to a consistent grid size (36 × 36 km2) and coverage (approximately 136°W to 164°E, 38°N to 38°S) using the nearest-neighbor interpolation method.
3) Anomalous data processing rules: Abnormal SM values (less than 0 or greater than 1.0) were deleted.
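A minimal sketch of the unit conversion, daily averaging, and range filtering steps is given below. It assumes that porosity is already available per grid cell as a fraction (its derivation from soil texture follows [48] and is not reproduced here) and that the percentage is scaled linearly by porosity; the function names and sample values are hypothetical, and spatial resampling is omitted.

```python
def saturation_to_volumetric(saturation_percent, porosity):
    """Convert SM in percent to m3/m3 by scaling with porosity.

    porosity is a fraction (m3/m3) assumed to be known for the grid cell.
    """
    return saturation_percent / 100.0 * porosity


def daily_average(three_hourly):
    """Average the valid 3-h values of one day (None marks a missing step)."""
    valid = [v for v in three_hourly if v is not None]
    return sum(valid) / len(valid) if valid else None


def mask_abnormal(values, lower=0.0, upper=1.0):
    """Replace values outside [lower, upper] m3/m3 with None."""
    return [v if v is not None and lower <= v <= upper else None
            for v in values]


# Hypothetical inputs for one grid cell.
active_sm = saturation_to_volumetric(50.0, 0.45)       # 0.225 m3/m3
noah_daily = daily_average([0.2, 0.4, None])           # mean of valid steps
cleaned = mask_abnormal([-0.1, 0.3, 1.2, None])        # out-of-range removed
```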
1) Triple Collocation: We used the TC-based standard deviation (SDTC) and correlation coefficient (RTC) to evaluate the overall performance of three mutually independent SM products. The TC method makes three assumptions [26], [27]: 1) each dataset has a linear relationship with the true values; 2) the errors in the datasets are mutually independent and do not change with time; 3) the errors in the datasets are independent of the true values. The CYGNSS SM dataset is derived by calibrating the CYGNSS reflectivity observations against SMAP SM, and both the SMAP and LPRM retrievals use the tau-omega model as the forward model; these three SM datasets are therefore correlated to some extent. To comply strictly with the key TC assumption of zero error cross-correlation, the Noah and ESA CCI Active SM datasets were introduced to construct collocated triplets (see Table IV) for identifying the interproduct differences of SM products over diverse land cover and climate types.
The anomaly time series of SM relative to the seasonal climatology reflect the temporal responses to precipitation events and dry-downs. Because the SM datasets share a seasonal cycle, the climatology variations across the three datasets are likely to be cross-correlated and to violate the TC assumption of zero error cross-correlation [49]. To reduce the influence of climatology in the raw collocated datasets, we removed the monthly average signals from the time series of the collocated triplets. In addition, TC estimates based on fewer than 100 triplets are problematic [50]; therefore, we excluded the grid cells in which the number of available SM triplets did not meet the minimum requirement (>100).
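The climatology removal and minimum-sample rule described above can be sketched as follows. The sketch assumes that "monthly average signals" means the mean of each calendar month pooled over all years, which is one possible reading of the text; the function names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def remove_monthly_climatology(dates, values):
    """Subtract the calendar-month mean from each value.

    Assumption: the climatology is the mean of all values falling in the
    same calendar month, pooled over years.
    """
    by_month = defaultdict(list)
    for d, v in zip(dates, values):
        by_month[d.month].append(v)
    climatology = {m: sum(vs) / len(vs) for m, vs in by_month.items()}
    return [v - climatology[d.month] for d, v in zip(dates, values)]


def has_enough_triplets(x, y, z, minimum=100):
    """True when more than `minimum` days have all three products valid."""
    n_valid = sum(1 for a, b, c in zip(x, y, z)
                  if a is not None and b is not None and c is not None)
    return n_valid > minimum


# Hypothetical four-day series spanning two Januaries and one February.
days = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 2, 1), date(2019, 1, 1)]
anomalies = remove_monthly_climatology(days, [0.1, 0.2, 0.3, 0.3])
```

Grid cells for which `has_enough_triplets` returns False would be skipped before the TC metrics are computed.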
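For each grid cell, the three SM time series can be related to the unknown true values by

$$\theta_i = \alpha_i + \beta_i \theta + \varepsilon_i, \quad i \in \{x, y, z\} \tag{1}$$

where $\theta_x$, $\theta_y$, and $\theta_z$ represent the independent spatiotemporal datasets, and $\theta$ is their true value. The scale parameter $\beta_i$ corresponds to the sensitivity of the satellite observations $\theta_i$ to $\theta$, while $\alpha_i$ represents the systematic additive bias. $\varepsilon_i$ represents the random error with a zero mean. The variance ($\sigma_i^2$) and covariance ($\sigma_{ij}$) of the SM datasets can be written as

$$\sigma_i^2 = \beta_i^2 \sigma_\theta^2 + 2 \beta_i \sigma_{\theta \varepsilon_i} + \sigma_{\varepsilon_i}^2 \tag{2}$$

$$\sigma_{ij} = \beta_i \beta_j \sigma_\theta^2 + \beta_i \sigma_{\theta \varepsilon_j} + \beta_j \sigma_{\theta \varepsilon_i} + \sigma_{\varepsilon_i \varepsilon_j} \tag{3}$$

According to the assumptions of error orthogonality ($\sigma_{\theta \varepsilon_i} = 0$) and zero error cross-correlation ($\sigma_{\varepsilon_i \varepsilon_j} = 0$, $i, j \in \{x, y, z\}$ and $i \neq j$), (2) and (3) can be simplified as

$$\sigma_i^2 = \beta_i^2 \sigma_\theta^2 + \sigma_{\varepsilon_i}^2, \qquad \sigma_{ij} = \beta_i \beta_j \sigma_\theta^2 \quad (i \neq j) \tag{4}$$

Consequently, the standard deviation ($\sigma^{*}_{\varepsilon_i}$) and correlation coefficient ($\rho^{*}_i$) can be derived in (5) and (6), respectively [51]

$$\sigma^{*}_{\varepsilon_x} = \sqrt{\sigma_x^2 - \frac{\sigma_{xy}\sigma_{xz}}{\sigma_{yz}}}, \quad \sigma^{*}_{\varepsilon_y} = \sqrt{\sigma_y^2 - \frac{\sigma_{xy}\sigma_{yz}}{\sigma_{xz}}}, \quad \sigma^{*}_{\varepsilon_z} = \sqrt{\sigma_z^2 - \frac{\sigma_{xz}\sigma_{yz}}{\sigma_{xy}}} \tag{5}$$

$$\rho^{*}_x = \sqrt{\frac{\sigma_{xy}\sigma_{xz}}{\sigma_x^2 \sigma_{yz}}}, \quad \rho^{*}_y = \sqrt{\frac{\sigma_{xy}\sigma_{yz}}{\sigma_y^2 \sigma_{xz}}}, \quad \rho^{*}_z = \sqrt{\frac{\sigma_{xz}\sigma_{yz}}{\sigma_z^2 \sigma_{xy}}} \tag{6}$$

2) Statistical Error Metrics: The error metrics between the time series of satellite SM estimates and the matching in situ SM observations for the validation period include the Pearson correlation coefficient (R), root-mean-square error (RMSE), averaged bias (Bias), and ubRMSE, which are given as follows:

$$R = \frac{\sum_{t=1}^{n} \left( mv_{Sat}(t) - \overline{mv}_{Sat} \right) \left( mv_{in\,situ}(t) - \overline{mv}_{in\,situ} \right)}{\sqrt{\sum_{t=1}^{n} \left( mv_{Sat}(t) - \overline{mv}_{Sat} \right)^2 \sum_{t=1}^{n} \left( mv_{in\,situ}(t) - \overline{mv}_{in\,situ} \right)^2}} \tag{7}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( mv_{Sat}(t) - mv_{in\,situ}(t) \right)^2} \tag{8}$$

$$Bias = \frac{1}{n} \sum_{t=1}^{n} \left( mv_{Sat}(t) - mv_{in\,situ}(t) \right) \tag{9}$$

$$ubRMSE = \sqrt{RMSE^2 - Bias^2} \tag{10}$$

where $t$ is the time of observation, $mv_{Sat}(t)$ is the remote sensing SM estimate at time $t$, $mv_{in\,situ}(t)$ is the in situ SM measurement at time $t$, overbars denote temporal means, and $n$ is the number of samples for which both satellite SM retrievals and in situ SM measurements are available.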
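As an illustration of the covariance-based TC estimators and the validation metrics described above, the following sketch computes them for plain Python lists. It is not the authors' implementation: the function names are hypothetical, sample covariances use an n - 1 denominator (an assumption), and negative variance estimates caused by sampling noise are clipped to zero.

```python
from math import sqrt


def _cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def triple_collocation(x, y, z):
    """Return (SD_TC, R_TC), each a tuple ordered as (x, y, z).

    SD_TC is the random-error standard deviation in each dataset's own
    units; R_TC is the correlation of each dataset with the unknown truth.
    """
    sxy, sxz, syz = _cov(x, y), _cov(x, z), _cov(y, z)
    total = (_cov(x, x), _cov(y, y), _cov(z, z))
    # Signal variance beta_i^2 * sigma_theta^2 for x, y, and z.
    signal = (sxy * sxz / syz, sxy * syz / sxz, sxz * syz / sxy)
    sd = tuple(sqrt(max(t - s, 0.0)) for t, s in zip(total, signal))
    r = tuple(sqrt(max(s / t, 0.0)) for t, s in zip(total, signal))
    return sd, r


def validation_metrics(sat, insitu):
    """R, RMSE, Bias, and ubRMSE between satellite and in situ series."""
    n = len(sat)
    diff = [s - g for s, g in zip(sat, insitu)]
    bias = sum(diff) / n
    rmse = sqrt(sum(d * d for d in diff) / n)
    ubrmse = sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    r = _cov(sat, insitu) / sqrt(_cov(sat, sat) * _cov(insitu, insitu))
    return {"R": r, "RMSE": rmse, "Bias": bias, "ubRMSE": ubrmse}
```

The inputs are expected to be the collocated, gap-free series for one grid cell or station; the three TC inputs should be anomaly series so that a shared seasonal cycle does not inflate the covariances.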

A. Spatial and Temporal Variations
Different inversion algorithms and modeling parameters can produce different SM values at the same satellite grid pixel. To identify the similarities and differences between the long time series of CYGNSS and the other satellite SM datasets, a comparative analysis of the spatial and temporal variations within each match-up grid pixel was conducted on the same calendar days from March 18, 2017 to August 16, 2020. Due to the lack of SMAP SM data from June 20 to July 22, 2019, and of a few days of CYGNSS SM data in 2019 and 2020, a total of 1210 days of SM data are involved in the calculations.
Global spatial maps of satellite SM products serve as the first step in the qualitative evaluation of the CYGNSS SM product. Fig. 2 shows the comparisons of global spatiotemporal patterns among our selected microwave remote sensing SM products for single-day (June 6, 2018) and 1210-day averaged conditions. The spatial variability along the latitudinal and longitudinal directions was also analyzed. Fig. 3 shows the distribution maps [...] which are characterized by considerable gaps without any observations. This result indicates that CYGNSS has a distinct advantage in monitoring surface SM in a timely manner, and it could be a significant complement to radar and radiometer remote sensing. For the case of the 1210-day averaged SM, CYGNSS SM is missing over the Tibetan Plateau [see Fig. 2(b)] because of the removal of CYGNSS DDM data at surface elevations greater than 600 m [10]. For ESA CCI Active [see Fig. 2(h)], substantial SM data are missing over the Amazon Plain and Congo Basin, because the active retrievals tend to be less reliable in these densely vegetated regions and SM measurements that cannot meet the retrieval accuracy requirement are masked.
In arid/semiarid regions (including northern/southern Africa, the Arabian Peninsula, and central/western Australia) or wet regions (such as the Amazon, Congo, India Peninsula, and Indo-China Peninsula), both the SM spatial variations between CYGNSS (see Fig. 2) [...] Fig. 3(b) and (c) show that the R values between the CYGNSS and LPRM SM products and between the CYGNSS and ESA CCI Active SM products are smaller in northern Africa, the Arabian Peninsula, and southeastern China. Only 21% of the R values between the CYGNSS and LPRM SM datasets, and 29% of the R values between the CYGNSS and ESA CCI Active SM datasets, are greater than 0.5.
In general, similar spatiotemporal variations are observed between the CYGNSS and SMAP SM datasets in arid/semiarid and wet regions, indicating that the CYGNSS SM has a reasonable mean spatial pattern. However, in addition to northern Africa, there are certain discrepancies between CYGNSS and the LPRM and ESA CCI Active SM datasets in some densely vegetated regions (including the Amazon, Congo, and southeastern China).

B. Triple Collocation Analysis
The TC evaluation was conducted to identify the relative performance of the CYGNSS dataset based on the random error metrics, in comparison with the SMAP, LPRM, and ESA CCI Active SM products. Table V shows the quantities of the SDTC and RTC values as well as their respective median values. The RTC and SDTC values were obtained from the collocated datasets in Table IV. Fig. 4 shows the distribution of the SDTC and RTC metrics at the quasi-tropical scale. To further illustrate the impacts of vegetation and climate on the accuracy of the satellite SM datasets, boxplots of the SDTC and RTC estimates are aggregated for six land cover classes and four climate zones, as shown in Figs. 5 and 6, respectively.
1) TC-Based Accuracy Assessment: As listed in Table V, [...] As shown in Figs. 4 and 5, the distribution of the SDTC and RTC metrics varies across different regions and land covers. In general, SMAP performs best, with small SDTC and large RTC values in most areas, and ESA CCI Active exhibits the second-best performance over substantial areas. Both the CYGNSS SDTC and RTC values are relatively small, whereas the LPRM SDTC values are around 0.04-0.12 over areas with moderate to dense vegetation, including southeastern North America, eastern South America, the Sahel, South Africa, India, the Indo-China Peninsula, and southeastern China. All the products have a relatively small (large) SDTC (RTC) value in each grid over substantial regions (e.g., southeastern North America, India, and central Australia). However, all the products exhibit relatively low SDTC and RTC values over northern Africa and the Arabian Peninsula, which have barren land covers and arid/semiarid climates. Possible reasons for this discrepancy are as follows: 1) The radiometer/radar has difficulty receiving the relevant signals over extremely dry environments [30], [52]. 2) The SM retrievals are derived from deeper soil layers because the lower frequency microwave bands penetrate deeper in dry soil [53], [54]. 3) The SM variations are so small that it is difficult for microwave instruments to detect their dynamics. Notably, relative to the other SM products, CYGNSS shows extremely low SDTC values ranging from 0 to 0.02 and RTC values between 0 and 0.2 in northern Africa and the Arabian Peninsula. A possible reason is that the SM variations are small or even absent in the arid climate, so that any effects of random noise are amplified in the CYGNSS retrievals [33].
2) Comparisons Over Different Land Cover Classes: The boxplots (see Fig. 5) aggregated for land cover types are consistent with the spatial distributions (see Fig. 4) of the SDTC and RTC values. The SMAP SM data generally outperform the other products over most land cover types. All the products performed poorly in the woodlands and forests classes, with relatively large SDTC and small RTC values. However, ESA CCI Active exhibits relative advantages in woodlands and forests, which can be attributed to the high sensitivity of the active microwave instrument to surface SM [55]. Among the sparsely vegetated classes (i.e., barren, croplands, and grasslands), all the products generally performed poorly in croplands, with large SDTC and lower RTC values. Among the moderately to densely vegetated classes (i.e., shrublands, woodlands, and forests), all the products generally performed well in shrublands, with relatively small SDTC and large RTC values. These patterns reflect the effects of vegetation on SM retrievals. Relative to the other land cover types, all the products had lower SDTC values over the barren lands, but their RTC values degraded. Previous validation works [52], [56] also indicated that satellite-based products have difficulty capturing the temporal SM variations over barren areas. This discrepancy also appears in the subsequent in situ validations in barren lands. The LPRM product slightly outperformed the L-band (SMAP/CYGNSS) and C-band (ESA CCI Active) products over barren lands, as evidenced by larger RTC values; this may be because the relatively high-frequency X-band (LPRM) SM estimates, which originate from a shallower soil layer, matched the assumed truth. CYGNSS obtained smaller SDTC values, with a median of 0.009, and low RTC values, with a median of 0.22, over the barren lands. In contrast, the CYGNSS RTC values increased for croplands, grasslands, shrublands, and woodlands.
This is mainly because the reflectivity decreases greatly over barren lands [57], as evidenced by the lowest correlation coefficient between CYGNSS reflectivity and in situ SM observations occurring over barren lands [58]. CYGNSS reflectivity signals are sensitive to topographic roughness and to the water content of vegetation and surface soil, and topographic roughness and vegetation height are inversely correlated with the magnitude of the reflected multipath measured by GNSS reflectivity [59]. It could be speculated that both vegetation and topographic roughness have a great influence on the reflectivity received by CYGNSS, with vegetation having the greater influence in vegetated areas and topographic roughness having the greater influence in barren regions. Therefore, a physics-based CYGNSS retrieval algorithm needs to consider the effects of both topographic roughness and vegetation cover.

3) Comparisons Over Different Climate Types:
The distributions of the SDTC and RTC values under different climate types are illustrated in Fig. 6. In general, all the products exhibited lower SDTC and larger RTC values in the arid climate, possibly due to the lower vegetation attenuation (e.g., in the Middle East, western America, and central Australia) [54], [56]. As evidenced by the relatively large SDTC and small RTC values, all the satellite SM products performed poorly in the cold climate zone, where subpar soil temperature leads to deviations in microwave SM retrievals [60]. SMAP displays SDTC values smaller than 0.04 and RTC values larger than 0.6 over most climatic regions. CYGNSS obtained consistently smaller SDTC and RTC values, whereas LPRM performed relatively poorly, with larger SDTC values, although its RTC values were generally larger than those of CYGNSS.

C. In Situ Validations
Given the large heterogeneity of SM in coarse-scale sparse networks and the nonadditive property of correlations [61], and following previous validation studies [33], [62], Table VI shows the median ubRMSE and median R between satellite measurements and in situ observations, as well as their respective standard deviations. [...] [44], [63]. CYGNSS also performs poorly in characterizing the temporal dynamics of in situ SM, with the lowest median R of 0.414 compared with LPRM (0.491), ESA CCI Active (0.514), and SMAP (0.621). This result indicates that it is a challenge for CYGNSS to capture the dynamic variations of ground SM. Moreover, CYGNSS performs poorly over the SNOTEL network (median RMSE = 0.090 m3/m3 and median R = 0.073) because these sites are in mountains with needleleaf trees, where Pr,eff is significantly affected by the dense vegetation and topographic roughness.
1) Comparisons Over Different Land Cover Classes: As illustrated in Fig. 7, the ubRMSE (R) over the densely vegetated areas (e.g., woodlands and forests) is generally greater (smaller) than that in the regions with sparse and moderate vegetation (e.g., grasslands and shrublands). CYGNSS generally has smaller ubRMSE values than LPRM and ESA CCI Active, but larger values than SMAP, in moderately to densely vegetated areas (e.g., grasslands, shrublands, woodlands, and forests). In terms of R, SMAP has the best performance in woodlands and forests, LPRM outperforms the other products in barren and shrubland areas, and CYGNSS has a slightly poorer performance in densely vegetated areas (e.g., shrublands, woodlands, and forests). However, the correlations between the satellite and in situ SM observations over the barren lands are smaller than those over the moderately to densely vegetated regions. In addition to the three reasons given in the TC analysis for the poor performance over barren lands, the following may also explain the discrepancy: 1) The in situ SM measurements over barren lands with arid or semiarid climates are easily influenced by precipitation and evapotranspiration, making it difficult to capture the SM dynamic variations. 2) The statistics may have a slight representativeness bias, because it is difficult to accurately depict the characteristics and patterns of SM variability over barren lands by relying upon only eight in situ stations of the PBO, SCAN, and USCRN ISMN networks. It is especially noteworthy that ESA CCI Active performs poorly over shrublands [see Fig. 7(b)] because shrublands are mostly distributed in arid/semiarid areas, and ASCAT does not perform well in arid environments [64], [65]. The ESA CCI Active (ASCAT-A/B) satellite SM data and the in situ SM data are not strongly correlated, but even a complete lack of correlation does not mean that the satellite data are inaccurate [54].
2) Comparisons Over Different Climate Types: As shown in Fig. 8(a), for ubRMSE, CYGNSS performed better in arid and temperate climates than in tropical and cold climates. In general, SMAP outperforms CYGNSS, ESA CCI Active, and LPRM in most climate conditions. As shown in Fig. 8(b), in terms of capturing the temporal trends of the in situ SM observations over the four climate zones, ESA CCI Active exhibits relatively more volatility than the other three products. Additionally, the temporal performance is consistent with the TC evaluation, which indicated that all the products are degraded over the cold climate regions. Note that some representativeness bias may exist in the statistics due to the limited number of in situ stations.
3) Product Intercomparison Analysis: To present the temporal variation of SM observations from in situ stations and remote sensing satellites, we selected in situ SM sites that satisfy two criteria: 1) the grid pixel should contain at least three in situ stations; and 2) the grid pixel should contain more than 100 samples that both the satellite data and in situ observations available during the validation periods. Based on these two criteria, 20 in situ stations distributed over six grid pixels were selected. The grid pixels were categorized into four land cover classes and three climate zones to assess the performance of the CYGNSS SM product. Table VII shows  The pixel in Bénin is characterized by an equatorial tropical climate and woodlands land cover. The pixel in Niger is distinguished by a typical hot semiarid steppe climate and cropland land cover. As shown in Fig. 9, both the in situ SM measurements exhibit a typical seasonal cycle with distinct wet seasons. The time series of satellite SM retrievals show good agreement with the precipitation fluctuation, and ESA CCI Active captures the precipitation events better than the other remote sensing SM products in both pixels. For capturing the absolute accuracy, CYGNSS achieves the lowest ubRMSE of 0.023 m 3 /m 3 and a larger R of 0.763 at the Niger, but a relatively large ubRMSE of 0.064 m 3 /m 3 and a smaller R of 0.689 at Bénin. This discrepancy proves that the errors of CYGNSS SM product in densely woodlands regions are greater than those in sparsely vegetated regions under similar climatic conditions. The pixel in Madison has a warm temperate climate and grassland land cover, and the precipitation is relatively common throughout the year. As shown in Fig. 10    terms of the number of observations involved in the statistical calculations, the amount of CYGNSS data is approximately twice that of SMAP and LPRM, indicating that CYGNSS has obvious advantages in temporal coverage in Madison. 
Except for the LPRM SM estimates that were overestimated with a bias of 0.005 m 3 /m 3 , the other SM products at the L-band (SMAP and CYGNSS) and C-band (ESA CCI Active) were characterized by underestimated. The comparison shows that the LPRM product has the worst performance for capturing the variation of in situ measurements over Madison. This result occurs because compared with the X-band, the L-and C-band has a deeper penetrating depth and are less susceptible to the vegetation layer. The pixel in Coconino has a temperate climate with a hot dry summer and forests land cover. As shown in Fig. 11, the amplitude of the SM variation is relatively large. This is probably because the soil is dry and rigid in extremely arid climatic conditions, which may cause dramatic SM variation even with minimal precipitation. In general, the overall accuracy of the satellite SM retrieval is not satisfactory. The time series of the satellite SM variations are not consistent with the in situ SM variations in the dry season, and the R values between the satellite SM estimates and in situ SM observations are relatively low. The poor performance may be explained by the fact that the stations are near open water bodies. Despite the efforts to minimize the impact of water bodies during the SMAP and CYGNSS retrieval, it is inevitable that some permanent water bodies are seasonally covered in vegetation. Open water bodies have a strong passive microwave signal and can produce positive biases in surface SM content retrieval, and seasonal variations in open water may further reduce the accuracy of surface SM retrievals [66]. Similarly, active microwave SM retrievals are also impacted by open water bodies [67].
As shown in Fig. 12, precipitation events are more frequent at Yanco and Kyeamba, and all the satellite products overestimate SM at the two pixels. At Yanco, CYGNSS exhibits the best performance in capturing the variations of the in situ SM observations, with the largest R value of 0.693, followed by SMAP (0.634), ESA CCI Active (0.609), and LPRM (0.495). CYGNSS also achieves the best accuracy, with the lowest ubRMSE of 0.032 m3/m3. The ESA CCI Active SM estimates show good agreement with the in situ measurements, with a ubRMSE of 0.049 m3/m3, which is smaller than that of SMAP (0.052 m3/m3) and LPRM (0.072 m3/m3). At Kyeamba, CYGNSS performs better than the other products on every metric except R. In capturing the temporal variations of the in situ SM measurements, CYGNSS (R = 0.674) outperforms LPRM (R = 0.582) but is inferior to the SMAP (R = 0.754) and ESA CCI Active (R = 0.716) products. In terms of ubRMSE and RMSE, however, CYGNSS performs better than the other SM products at both pixels.
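The validation metrics quoted in this comparison (bias, RMSE, ubRMSE, and R) can be illustrated with a minimal sketch. It assumes the standard definitions, with ubRMSE obtained by removing the squared mean bias from the squared RMSE, and collocated satellite and in situ series of equal length. The function name validation_metrics and the sample values are hypothetical and are not taken from this article's processing chain.

```python
import numpy as np

def validation_metrics(sat, insitu):
    """Bias, RMSE, ubRMSE, and Pearson R for collocated SM series (m3/m3).

    Hypothetical helper: pairs with a missing value in either series
    are dropped before the statistics are computed.
    """
    sat = np.asarray(sat, dtype=float)
    insitu = np.asarray(insitu, dtype=float)
    valid = np.isfinite(sat) & np.isfinite(insitu)
    sat, insitu = sat[valid], insitu[valid]

    diff = sat - insitu
    bias = diff.mean()                      # mean difference (satellite minus in situ)
    rmse = np.sqrt(np.mean(diff ** 2))      # includes the bias contribution
    # Unbiased RMSE: remove the mean bias; clip guards against round-off.
    ubrmse = np.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    r = np.corrcoef(sat, insitu)[0, 1]      # Pearson correlation coefficient
    return {"n": int(valid.sum()), "bias": bias,
            "rmse": rmse, "ubrmse": ubrmse, "r": r}

if __name__ == "__main__":
    # Illustrative values only (not data from the article).
    insitu = [0.12, 0.18, 0.25, 0.21, np.nan, 0.15]
    sat = [0.10, 0.17, 0.22, 0.20, 0.19, 0.13]
    print(validation_metrics(sat, insitu))
```

Because ubRMSE discards the mean offset, a product can score a low ubRMSE while still tracking the temporal dynamics poorly, which is why R is reported alongside it.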

D. Summary and Discussion
The fact that SDTC (RTC) values are generally smaller (larger) than the ubRMSE (R) estimates from in situ validation is likely related to representativeness errors. To the best of our knowledge, the TC method characterizes random errors, whereas the in situ validation estimates the systematic bias [68], [69]. In the TC evaluation, the CYGNSS dataset has relatively low SDTC and RTC values relative to the assumed underlying truth. The in situ validation also demonstrates that CYGNSS performs poorly in capturing the temporal dynamics. The relatively low SDTC (ubRMSE) and RTC (R) values suggest that the CYGNSS retrieval can yield decent estimates of the SM mean but poor estimates of the variability. A possible reason for the CYGNSS performance degradation is that the link between Pr,eff and SM may not be linear, and it is likely inaccurate to assume that the sensitivity of Pr,eff to CYGNSS SM does not change over time [33]. Both limitations may lead to smaller RTC values for CYGNSS in comparison with those of other SM products. In addition, the low correlations may be due to a low bias in SM temporal variance (especially in Northern Africa, the Arabian Peninsula, and the Middle East). Therefore, more effort should be devoted to improving the CYGNSS retrieval algorithms to obtain better correlations. Over barren areas, CYGNSS shows relatively small SDTC and ubRMSE values, as well as low RTC and R values. One possible reason is that the SM variation in arid areas is too small to detect, and another is that the CYGNSS reflectivity is degraded by surface roughness. Over densely vegetated lands, CYGNSS shows relatively large SDTC and ubRMSE values, as well as low RTC and R values. The reasons may be that the CYGNSS reflectivity is greatly degraded over densely vegetated areas and that the fitting-based estimates are constrained by the initial dynamics of the SMAP SM datasets.
The SMAP modeling and training datasets in densely vegetated areas are weakly related; as a result, the correlations of CYGNSS are weak in both the TC evaluation and the in situ validation.
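The TC-based metrics discussed above can be illustrated with a minimal sketch in the covariance notation commonly used for (extended) triple collocation. It assumes three collocated datasets whose errors are zero-mean, mutually uncorrelated, and uncorrelated with the true SM. The function name triple_collocation and the synthetic inputs are hypothetical; the exact implementation used in this article may differ.

```python
import numpy as np

def triple_collocation(x, y, z):
    """Estimate the error standard deviation and truth correlation of x, y, z.

    Hypothetical helper in covariance notation. Assumes collocated
    series with mutually independent errors; triplets with any missing
    value are dropped.
    """
    data = np.vstack([x, y, z]).astype(float)
    data = data[:, np.all(np.isfinite(data), axis=0)]
    q = np.cov(data)  # 3 x 3 sample covariance matrix

    # Error variance of each dataset, expressed in its own data space.
    err_var = np.array([
        q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2],
        q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2],
        q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1],
    ])
    # Squared correlation of each dataset with the unknown truth.
    r2 = np.array([
        q[0, 1] * q[0, 2] / (q[0, 0] * q[1, 2]),
        q[0, 1] * q[1, 2] / (q[1, 1] * q[0, 2]),
        q[0, 2] * q[1, 2] / (q[2, 2] * q[0, 1]),
    ])
    # Sampling noise can push estimates slightly out of range; clip them.
    sd_tc = np.sqrt(np.clip(err_var, 0.0, None))
    r_tc = np.sqrt(np.clip(r2, 0.0, 1.0))
    return sd_tc, r_tc

if __name__ == "__main__":
    # Synthetic demonstration with known error levels (illustrative only).
    rng = np.random.default_rng(42)
    truth = rng.normal(0.25, 0.05, 50_000)
    x = truth + rng.normal(0.0, 0.02, truth.size)
    y = 0.1 + 0.8 * truth + rng.normal(0.0, 0.03, truth.size)
    z = 1.2 * truth + rng.normal(0.0, 0.04, truth.size)
    sd_tc, r_tc = triple_collocation(x, y, z)
    print("SDTC:", sd_tc, "RTC:", r_tc)
```

In this formulation, a dataset with a damped dynamic range can show a small SDTC together with a small RTC, because the error standard deviation is expressed in the dataset's own data space while the correlation is normalized by its total variance.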
Relative to the other products, both the TC analysis and the in situ validation reveal that SMAP generally performs best across varied land covers and climate classes. Using SMAP SM data as a high-quality reference dataset is therefore a necessary and scientifically sound choice for calibrating the CYGNSS reflectivity observations to SM in the current retrieval approaches. However, the SMAP SM product has its own intrinsic instrumental and retrieval errors, and these errors propagate into the CYGNSS retrievals.

IV. CONCLUSION
The CYGNSS SM dataset derived from reflectance signals is a new SM product with the inherent advantage of high temporal resolution at a quasi-global scale. However, the error characteristics of this new SM product are largely unknown. In this article, we conducted the first comprehensive evaluation of the CYGNSS SM product from 2017 to 2020 based on TC analysis and in situ validation. We investigated the relative strengths and weaknesses of the CYGNSS SM product through comparison with the SMAP, LPRM, and ESA CCI Active SM products from three perspectives: spatiotemporal variations, TC analysis, and in situ validation over varied land covers and climates.
The CYGNSS and SMAP SM datasets displayed similar variations along the longitudinal and latitudinal directions and across arid/semiarid and wet regions. The TC evaluation indicated that CYGNSS has a relatively low median SDTC (0.024 m3/m3) and a low median RTC (0.419). The CYGNSS SM retrievals were validated against observations from 251 in situ stations with a median ubRMSE of 0.057 m3/m3 (standard deviation = 0.025 m3/m3) and a median correlation coefficient of 0.414 (standard deviation = 0.260).
The CYGNSS SM product has certain relative advantages and limitations in comparison with the other three microwave remote sensing SM products. For the SDTC metric of the TC analysis and the ubRMSE metric of the in situ validation, the general performance of the CYGNSS SM product is comparable to that of SMAP and better than that of LPRM and ESA CCI Active. For the TC-based correlation RTC and the Pearson correlation R of the in situ validation, the overall performance of CYGNSS is inferior to that of the other three SM products. Both the interproduct comparisons of the TC analysis and the in situ validation revealed that the CYGNSS SM product was characterized by small SDTC and ubRMSE values but performed poorly in capturing the temporal dynamics of SM.
In addition, CYGNSS shows considerable performance degradation in capturing the SM temporal variability over barren areas, including Northern Africa, the Arabian Peninsula, and Central Australia with arid/semiarid climates, and over forested regions, including eastern South America, the Indo-China Peninsula, and Southeastern China with temperate/tropical climates. Since the CYGNSS algorithms and products are being continuously refined, this suggests that capturing SM temporal variations over barren and forest regions is a key priority for improving CYGNSS SM algorithms.
XianYun Zhang received the B.Sc. degree in engineering surveying and the M.Sc. degree in geodesy from Chang'an University, Xi'an, China, in 1997 and 2008, respectively.
Since 2009, he has been an Associate Professor with the School of Mining, Guizhou University. He teaches GNSS satellite measurement principle and application, and three-dimensional laser scanning and application. He was involved in a number of scientific research projects, including GNSS precision positioning and GNSS-R soil moisture inversion. His main research interests include GNSS precise point positioning, GNSS-R soil moisture inversion, and multisource remote sensing data fusion environmental parameter inversion.
Cheng Tong received the B.Sc. degree in geographic information science from the Anhui University of Science and Technology, Huainan, China, in 2017. He is currently working toward the Ph.D. degree in agricultural remote sensing with the Institute of Applied Remote Sensing and Information Technology, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, China.
His research interests include active and passive microwave remote sensing of soil moisture and nighttime light remote sensing.
Sinan Li received the M.Sc. degree in land resource management from Yunnan University, Kunming, Yunnan, in 2020. He is currently working toward the Ph.D. degree in agricultural remote sensing with the Institute of Applied Remote Sensing and Information Technology, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, China.
His research interests include territorial space planning, sustainable agriculture, urban development, and nighttime light remote sensing.
Ke Wang (Member, IEEE) received the Ph.D. degree in agricultural remote sensing from Zhejiang University, Hangzhou, China.
He is currently a Professor with Zhejiang University. His research interests include land-use planning and management, digital agriculture, and geographic information system.