On the Use of Native Resolution Backscatter Intensity Data for Optimal Soil Moisture Retrieval

The accuracy of soil moisture estimated from synthetic aperture radar (SAR) backscatter data at high resolutions is limited by speckle. A common practice to mitigate speckle is to multilook the data prior to retrieving soil moisture. While multilooking indeed reduces speckle, it also decreases the spatial resolution and removes possibly useful high-resolution information from the data. We therefore hypothesized that using higher resolution backscatter data for soil moisture retrieval would lead to higher retrieval accuracies. A high-resolution field study combined with a synthetic experiment showed that calculating soil moisture prior to multilooking to the final target resolution, the calculate-then-average (CtA) approach, has substantial advantages over the average-then-calculate (AtC) approach. Currently, the AtC strategy is most often applied in soil moisture studies, mainly due to its computational advantage compared to the CtA approach. We show that by using backscatter data with a higher source resolution than the target resolution, we could improve soil moisture retrieval over an agricultural field.

Space-borne SAR sensors emit microwaves and measure the fraction of radiation scattered back by the Earth's surface. This backscatter intensity can be used to retrieve soil moisture with a forward model. The European Space Agency's (ESA) Sentinel-1 constellation provides high spatiotemporal resolution SAR data with a ground-range resolution of 20 m; its revisit time was six days until the failure of the Sentinel-1B satellite in December 2021. At these high spatial resolutions, SAR backscatter accuracy is limited by the speckle that is inherent to the data. Speckle causes variations in the backscatter intensity that do not necessarily relate to variations in soil moisture. Multilooking the data reduces the speckle, and for that reason soil moisture products are often presented at relatively low resolutions (500-1000 m) [6], [7], [8].
Multilooking is generally performed on the backscatter data (e.g., [9]), rather than on the soil moisture data after inversion (e.g., [10]). This choice has the advantage of higher computational efficiency. However, applying a multilook means assuming that the mean of the surrounding pixels is equal to the central pixel of interest [11]. This assumption does not hold when there is significant spatial variability in soil moisture or in other soil parameters that influence backscatter intensity [12]. Over such heterogeneous surfaces, multilooking of the backscatter averages not only speckle and soil moisture conditions, but also other types of information contained in the backscatter signal (land cover, vegetation water content, and roughness). These contributions can then no longer be easily removed from the signal, because the information on individual pixels is lost.
Ma et al. [13] have briefly compared the use of high-resolution backscatter data to the use of multilooked backscatter data for soil moisture estimation. Two strategies were applied and compared (Fig. 1): the average-then-calculate (AtC) strategy and the calculate-then-average (CtA) strategy. The AtC strategy consists of multilooking backscatter data to the target resolution and then converting it to soil moisture, whereas the CtA strategy consists of computing soil moisture from high-resolution backscatter data, and then multilooking it to the target resolution. Ma et al. [13] found that the CtA strategy showed significantly better results than the AtC strategy, at a spatial resolution of about 125 000 m. In a related study, Satalino et al. [14] used a synthetic dataset to show that increased accuracy of the CtA strategy occurs, especially when model errors are expected to be large. To the best of our knowledge, the difference between the two aggregation strategies has not been tested on a sub-field scale, even though the processes underlying the effects of multilooking (e.g., heterogeneity) change with changing spatial resolution. The strategies have also not yet been tested with in situ soil moisture data.
We hypothesize that a substantial loss of information occurs in the AtC strategy compared to the CtA strategy. Hence, reducing the spatial source resolution (i.e., the resolution of data going into the retrieval algorithm) to mitigate the speckle, especially at high target resolutions, can negatively impact soil moisture retrieval accuracy. To test this hypothesis, we performed a synthetic experiment with increasing variation in soil moisture content (SMC) and roughness and applied the two strategies to these data. The experiment was duplicated using in situ data to determine whether the results from the synthetic experiment could be confirmed in a field in southeastern Luxembourg [15].

A. In Situ Soil Moisture Data
A field experiment was performed on a 2.5 ha non-irrigated agricultural field in southeastern Luxembourg [15]. The field was sampled at a 20 m resolution under different moisture and vegetation conditions. Data were collected in 38 field visits between March 2020 and June 2021, except during the peak of the growing season, when soil moisture retrieval would be unreliable. The observed soil moisture was taken to be the average of five soil moisture measurements taken at each sampling location with a time domain reflectometry (TDR) probe with 5 cm pins. These measurements were calibrated using volumetric soil samples and linear regression.

B. Field Experiment
For the field experiment, satellite soil moisture was retrieved from descending Sentinel-1 (S1) IW ground range detected (GRD) backscatter data with a 33.5° incidence angle over the field. Soil moisture was inverted from these data using the MULESME algorithm [6], [15]. MULESME is a multitemporal physical pixel-based algorithm that infers soil roughness and soil moisture from backscatter intensity by inverting the Oh forward model [16]. MULESME assumes that SMC changes considerably faster than surface roughness, and any change in backscatter intensity can therefore be attributed to changes in SMC. Since the algorithm is pixel-based, the soil moisture retrieval in a pixel is independent of that in its neighboring pixels.
SMC maps were retrieved from backscatter intensity data at six different resolutions (20, 40, 60, 80, 100, and 120 m) following the two strategies shown in Fig. 1. For both strategies, backscatter images were preprocessed to a square 20 m pixel. For the AtC strategy, these images were then multilooked into the five lower resolutions, and finally, the SMC was retrieved for all six resolutions. Alternatively, for the CtA strategy, the SMC was retrieved at a 20 m resolution, and subsequently multilooked into the different target resolutions.
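The order of operations in the two strategies can be illustrated with a minimal sketch. It assumes that multilooking is a plain average over non-overlapping pixel blocks, and it replaces the retrieval step with `toy_retrieval`, a hypothetical nonlinear function that is neither MULESME nor the Oh model. The input image is made-up data on a 20 m grid; all function names are illustrative.

```python
import numpy as np

def block_mean(img, factor):
    """Average non-overlapping factor x factor blocks (a simple multilook)."""
    rows, cols = img.shape
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by factor")
    blocks = img.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

def toy_retrieval(sigma0_lin):
    """Hypothetical stand-in for the inversion step (NOT MULESME or the Oh model):
    a monotonic, nonlinear map from linear backscatter to SMC in m3 m-3."""
    return np.clip(0.05 + 0.6 * np.sqrt(sigma0_lin), 0.0, 0.6)

def average_then_calculate(sigma0_lin, factor):
    """AtC: multilook the backscatter first, then retrieve SMC."""
    return toy_retrieval(block_mean(sigma0_lin, factor))

def calculate_then_average(sigma0_lin, factor):
    """CtA: retrieve SMC at the source resolution, then multilook the SMC."""
    return block_mean(toy_retrieval(sigma0_lin), factor)

rng = np.random.default_rng(0)
# Made-up linear backscatter values on a 120 x 120 grid of 20 m pixels.
sigma0 = rng.uniform(0.01, 0.2, size=(120, 120))

for factor in (1, 2, 3, 4, 5, 6):  # target resolutions of 20 to 120 m
    atc = average_then_calculate(sigma0, factor)
    cta = calculate_then_average(sigma0, factor)
    diff = np.abs(atc - cta).mean()
    print(f"{20 * factor:>3} m target: shape {atc.shape}, mean |AtC - CtA| = {diff:.4f}")
```

Because the stand-in retrieval is nonlinear, the two orderings give different SMC maps whenever the backscatter varies within a block, and identical maps when it does not. The sketch says nothing about which ordering is more accurate; that depends on the actual retrieval model and noise.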

C. Synthetic Experiments
Three synthetic experiments with increasing data variability were set up to study how the computation strategy impacts SMC retrieval accuracy. A square region of 18 × 18 km (or 360 000 pixels at a 20 m resolution) was used for the synthetic experiments. Synthetic data for the experiments consisted of 15 days of soil moisture, roughness, normalized difference vegetation index (NDVI), land cover, local incidence angle (LIA), and backscatter intensity data. In all three experiments, speckle was added to the synthetic backscatter data, since speckle is inherent to SAR data and does not contain relevant information for soil moisture retrieval. SMC and roughness were varied in time and/or space as shown in Table I. In experiment 1, SMC varied only in time, and roughness was constant in both time and space. Any multilooking, therefore, only removed speckle, and no information on SMC or roughness was lost. In experiment 2, spatial SMC variation was added, so that during multilooking some information on SMC was lost since the resulting aggregated backscatter is an average of different values of SMC. In experiment 3, roughness also varied in space, thereby losing even more information during the multilooking. The synthetic data were used as input data for the MULESME algorithm and processed according to the two different strategies as outlined in Section II-B.
Synthetic data were sampled from either a normal or a uniform distribution, with their parameters derived from in situ (Section II-A) and satellite (Section II-B) field data. Both sets of experiments (i.e., "normal" and "uniform") were performed, because using a uniform distribution leads to increased spatial variability in soil moisture compared to the normal distribution. However, the uniform distribution is less comparable to a ground truth scenario, where neighboring soil moisture pixel values are usually similar. Hence, the results from the "normal" analysis were compared to the in situ data, and the results from the "uniform" analysis were used to demonstrate a more extreme case. Since random sampling was used, the entire experiment was carried out 15 times with different random values to reduce the chance that the presented results are merely the result of a stochastic artifact. The random sampling is described for every variable in Sections II-C1-II-C4.
1) Soil Moisture: For the synthetic SMC dataset based on normal distributions, data were sampled from two truncated normal distributions, one for the temporal variation and one for the spatial variation. The mean of both distributions was set to the spatiotemporal average of the S1 retrieved SMC data over the studied field (i.e., 0.19 m³ m⁻³) and the distribution was truncated at the fifth and 95th percentiles of the S1 retrieved SMC data (i.e., 0.08 and 0.29 m³ m⁻³). The standard deviation for the spatial variation was taken to be the same as the spatial standard deviation in the in situ SMC data, and for the temporal variation it was taken to be the same as the temporal standard deviation in the in situ SMC data. For the uniform distributions, the lower and upper limits were set to the same as for the normal distribution: 0.08 and 0.29 m³ m⁻³.
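As an illustration of this sampling step, the sketch below draws one spatial SMC field from a truncated normal distribution and one from a uniform distribution with the limits quoted above. The standard deviation `SPATIAL_STD` is a hypothetical placeholder, since the in situ value is not given here, and the truncation is implemented by simple redrawing, which is an assumption about the implementation. How the temporal and spatial draws are combined is not specified in the text and is not modeled.

```python
import numpy as np

def truncated_normal(rng, mean, std, low, high, size):
    """Sample a normal distribution restricted to [low, high] by redrawing
    every value that falls outside the interval (rejection sampling)."""
    out = rng.normal(mean, std, size)
    bad = (out < low) | (out > high)
    while bad.any():
        out[bad] = rng.normal(mean, std, bad.sum())
        bad = (out < low) | (out > high)
    return out

rng = np.random.default_rng(42)

MEAN, LOW, HIGH = 0.19, 0.08, 0.29  # m3 m-3, values quoted in the text
SPATIAL_STD = 0.04                  # hypothetical; the text derives it from in situ data

smc_normal = truncated_normal(rng, MEAN, SPATIAL_STD, LOW, HIGH, size=(100, 100))
smc_uniform = rng.uniform(LOW, HIGH, size=(100, 100))

print("normal  field: min/max/std =", smc_normal.min(), smc_normal.max(), smc_normal.std())
print("uniform field: min/max/std =", smc_uniform.min(), smc_uniform.max(), smc_uniform.std())
```

With these placeholder settings the uniform field has the larger spatial spread, which matches the role of the "uniform" experiments as the more extreme case.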
2) Roughness: In the experiments with a constant value for roughness, the spatiotemporal field average roughness was taken from the MULESME runs that were performed for the field study (1.2 cm). In the case of spatially variable roughness, the data were sampled from a normal distribution, based on the same mean, and supplemented with the standard deviation (0.48 cm) and fifth and 95th percentiles (0.50 and 3.8 cm) of the MULESME output. For the uniform distribution, the same percentiles were used as lower and upper limits, respectively.
3) NDVI, Land Cover, and LIA: The synthetic NDVI, land cover, and LIA data were set to be constant in both space and time. NDVI was set at 0.15 to mimic near-bare soil conditions, optimal for soil moisture retrieval. The land cover was set to class 211 [non-irrigated arable land in Corine Land Cover (CLC)], which is the same classification as the field in the in situ study. LIA was set to the field average S1 incidence angle (33.5°).
4) Backscatter Intensity: The Oh forward model was then used to infer synthetic vertical-vertical (VV) and vertical-horizontal (VH) backscatter from the sampled values of SMC, roughness, and LIA. Noise was added in all three experiments, based on the noise in the real S1 backscatter data over the field. By inferring the noise from real data, we accounted not only for multiplicative speckle noise but also for thermal noise [17].
The noise was added to the synthetic data by multiplying the synthetic linear backscatter at 20 m resolution by a random sample from a truncated normal distribution with a mean of one and a minimum of zero. The standard deviation of the distribution was derived from backscatter data over pixels that showed homogeneous soil moisture conditions: in that case, any remaining variation in backscatter is most likely caused by speckle. Homogeneous pixels were identified by first selecting days where in situ SMC data had a spatial standard deviation lower than 0.03 m³ m⁻³. Second, the 30% of pixels where the in situ SMC was closest to the field average were selected. Satellite backscatter was then extracted over the selected pixels and their spatial standard deviation was computed. Finally, the median of these standard deviations was used as the standard deviation of the speckle distribution.
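The noise procedure described above can be sketched as follows. Two points are assumptions rather than statements from the text: the spatial standard deviation of the backscatter is divided by its mean so that it can scale a unit-mean multiplier, and the truncation at zero is implemented by redrawing negative factors. The function names and the demo data are hypothetical.

```python
import numpy as np

def estimate_speckle_std(smc_insitu, sigma0_lin, max_spatial_std=0.03, keep_fraction=0.3):
    """Estimate the speckle standard deviation from homogeneous conditions.

    smc_insitu, sigma0_lin: arrays of shape (n_days, n_pixels).
    """
    per_day = []
    for smc_day, s0_day in zip(smc_insitu, sigma0_lin):
        # Step 1: keep only days with a low spatial SMC standard deviation.
        if smc_day.std() >= max_spatial_std:
            continue
        # Step 2: keep the pixels whose SMC is closest to the field average.
        n_keep = max(2, int(round(keep_fraction * smc_day.size)))
        closest = np.argsort(np.abs(smc_day - smc_day.mean()))[:n_keep]
        # Step 3: spatial standard deviation of backscatter over those pixels.
        # ASSUMPTION: normalized by the mean to obtain a relative value.
        selected = s0_day[closest]
        per_day.append(selected.std() / selected.mean())
    if not per_day:
        raise ValueError("no homogeneous days found")
    # Step 4: median over the selected days.
    return float(np.median(per_day))

def add_speckle(sigma0_lin, speckle_std, rng):
    """Multiply linear backscatter by a normal factor with mean parameter one,
    truncated at zero by redrawing negative values."""
    factor = rng.normal(1.0, speckle_std, sigma0_lin.shape)
    negative = factor < 0
    while negative.any():
        factor[negative] = rng.normal(1.0, speckle_std, negative.sum())
        negative = factor < 0
    return sigma0_lin * factor

# Demo on made-up data: 10 days, 60 pixels, homogeneous SMC, constant clean backscatter.
rng = np.random.default_rng(1)
smc = 0.19 + rng.normal(0.0, 0.01, size=(10, 60))
clean = np.full((10, 60), 0.05)
noisy_obs = add_speckle(clean, 0.25, rng)
est = estimate_speckle_std(smc, noisy_obs)
print("estimated relative speckle std:", est)
```

In the demo, the value passed to `add_speckle` plays the role of the unknown real-world speckle level, and `estimate_speckle_std` attempts to recover it from the noisy observations.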

D. Analysis
For both the synthetic and the field experiment, two performance metrics were used to compare observed and retrieved soil moisture: the Pearson correlation (R), and the unbiased root mean square error (ubRMSE) [18]. The analysis was performed for both retrieval strategies.
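For reference, the two metrics can be written in a few lines. This is a minimal sketch using the common definition of ubRMSE as the root mean square difference after removing the mean of each series; the function names and the sample values are illustrative only.

```python
import numpy as np

def pearson_r(observed, retrieved):
    """Pearson correlation coefficient between two 1-D series."""
    observed = np.asarray(observed, dtype=float)
    retrieved = np.asarray(retrieved, dtype=float)
    return float(np.corrcoef(observed, retrieved)[0, 1])

def ubrmse(observed, retrieved):
    """Unbiased RMSE: RMSE computed on the anomalies of both series,
    so that a constant offset (bias) does not contribute."""
    observed = np.asarray(observed, dtype=float)
    retrieved = np.asarray(retrieved, dtype=float)
    obs_anom = observed - observed.mean()
    ret_anom = retrieved - retrieved.mean()
    return float(np.sqrt(np.mean((ret_anom - obs_anom) ** 2)))

# Made-up example values in m3 m-3.
obs = np.array([0.12, 0.18, 0.25, 0.21, 0.15])
ret = np.array([0.15, 0.20, 0.31, 0.22, 0.19])
print("R      =", pearson_r(obs, ret))
print("ubRMSE =", ubrmse(obs, ret))
```

A retrieval that differs from the observations only by a constant offset therefore has an ubRMSE of zero and a correlation of one.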
In Section III, the Pearson correlation is not shown for the "uniform" synthetic data. In those cases, when moving to lower spatial resolutions, the soil moisture values in the field converge toward the mean of the uniform distribution. Hence, variation in the data decreases with coarser resolutions and the resulting Pearson correlation cannot be fairly compared between different resolutions.
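The shrinking variation can be reproduced with a small sketch, assuming spatially independent pixel values drawn from the uniform distribution with the limits of Section II-C1 and aggregation by plain block averaging. The grid size and function name are hypothetical.

```python
import numpy as np

def block_mean(img, factor):
    """Average non-overlapping factor x factor blocks."""
    rows, cols = img.shape
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by factor")
    return img.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(7)
# Independent uniform SMC values on a made-up 120 x 120 grid of 20 m pixels.
smc_uniform = rng.uniform(0.08, 0.29, size=(120, 120))

spread = {}
for factor in (1, 2, 3, 4, 5, 6):
    resolution = 20 * factor
    spread[resolution] = float(block_mean(smc_uniform, factor).std())
    print(f"{resolution:>3} m: spatial std = {spread[resolution]:.4f} m3 m-3")
```

Under these assumptions the spatial standard deviation falls roughly in proportion to the number of pixels per block side, so the range of values over which a correlation is computed differs strongly between resolutions.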

III. RESULTS AND DISCUSSION
The analysis performed on the "normal" synthetic data [Fig. 2(a), top two rows] shows that for all source resolutions and experiments, correlation increases and ubRMSE decreases with a coarser target resolution. Differences in performance were found in the results of the two computation strategies CtA (circles) and AtC (triangles). For every target resolution, the CtA strategy outperforms the AtC strategy. This indicates that retrieving soil moisture at fine resolutions prior to multilooking results in higher retrieval performance at both fine and coarse target resolutions. The CtA performance is especially good when the difference between source resolution and target resolution increases, and peaks at the lowest target resolution.
When moving from experiments 1 to 3, i.e., with increasing spatial variability in the data, the performance variations between the different source resolutions grow (Fig. 3). A general decrease in performance is visible, especially in the correlation. The difference between the performance of the two strategies also increases. In experiment 1, no spatial variation in soil moisture and roughness was simulated. We hypothesized that under these conditions, the difference between the performance of the two strategies would be minimal as any spatial variation in backscatter is only due to speckle, and multilooking to mitigate speckle thus indeed only reduces speckle without losing any other type of information. The results confirm our hypothesis, with small but consistent differences in performance between different resolutions and between the AtC and CtA strategies. In experiment 2, where spatial variation in SMC was added, the performance metrics deteriorated compared to experiment 1. Furthermore, the difference between the two strategies as well as the difference between the six source resolutions increased. In this second experiment, multilooking not only averages the speckle but also different values of SMC. This trend continued when spatial variation in soil roughness was added to the simulation in experiment 3: the difference in performance further increased, as hypothesized.
Even stronger patterns were found in the "uniform" synthetic data [Fig. 2(a), bottom rows]. Again, for all source resolutions and experiments, ubRMSE decreases with a coarser target resolution, and for each target resolution, the best performance is found for data with high source resolutions. In comparison with the normal distribution, the performance differences between the two strategies are larger, as well as the performance differences between the first two experiments. Interestingly though, the difference between the first and third experiments is smaller than the difference between the first and second experiments, contrary to the experiments based on synthetic data with a normal distribution (Fig. 3).
In this idealized synthetic scenario, it is possible that speckle has a smaller effect on the results than in a field experiment. An analysis based on the field experiment [Fig. 2(b)] confirms this. However, the patterns found in the synthetic data are still visible in the field experiment data. The best SMC accuracy at any target resolution is obtained when the calculation is performed with backscatter data with a finer source resolution. At the same time, this does not mean that the highest possible source resolution should be used at every target resolution: at target resolutions of 80 m or coarser, the data with a 40 m source resolution show better performance than the data with a 20 m source resolution. This could indicate that at 20 m resolution, the soil moisture signal in the backscatter data was still too weak compared to the speckle to produce consistent soil moisture estimates over the field, and that a performance improvement at coarser target resolutions was possible by aggregating the source data to 40 m before calculating soil moisture. At fine target resolutions (20 or 40 m), using the highest possible source resolution remains the best choice.
In summary, the results from the synthetic and field experiments confirm our hypothesis: the CtA strategy leads to better retrieval performance than the AtC strategy. Moreover, we saw that in general, when more information is contained in the backscatter data, more information is lost in multilooking and the resulting performance difference between the two computation strategies increases. The exception to this was found in data derived from uniform distributions. In those data, the CtA-AtC performance difference was more pronounced in experiment 2 than in experiment 3, indicating that the added spatial variation in roughness does not have as large an impact on the results as the added spatial variation in SMC, although it could also have been caused by the numerical setup of the synthetic experiment.

IV. CONCLUSION
We performed a synthetic experiment using Sentinel-1 C-band SAR data to test the performance of two retrieval strategies: CtA and AtC. We hypothesized that the AtC strategy leads to the loss of important information on soil moisture conditions, which would mean that applying the CtA strategy would lead to higher accuracies over the area of interest. Our results showed that, indeed, applying the CtA strategy to native resolution (20 m) Sentinel-1 data led to a smaller ubRMSE and a higher correlation on all tested target resolutions (20-120 m).
The results from the synthetic experiment were confirmed in a 2.5 ha field in southeastern Luxembourg that was intensively sampled for in situ soil moisture conditions. Since in both the synthetic and the field experiment an increase in performance was found even at small resolution gaps between source and target data, we expect that the presented results are also relevant for coarser resolutions and for soil moisture applications on larger scales.