Generalized Fine-Resolution FPAR Estimation Using Google Earth Engine: Random Forest or Multiple Linear Regression

Accurate estimation of fine-resolution fraction of absorbed photosynthetically active radiation (FPAR) is urgently needed for modeling land surface processes at finer scales. While traditional methods can hardly balance universality, efficiency, and accuracy, methods using coarse-resolution products as a reference are promising for operational fine-resolution FPAR estimation. However, current methods confront major problems of underrepresentation of FPAR-reflectance relations within coarse-resolution FPAR products, particularly for densely vegetated areas. To overcome this limitation, this article has developed an enhanced scaling method that proposes an outlier removal procedure and a method weighting the selected samples and models FPAR through weighted multiple linear regression (MLR) between the coarse-resolution FPAR product and the aggregated fine-resolution surface reflectance. Meanwhile, a random forest regression (RFR) method has also been implemented for comparison. Both methods were particularly applied to Landsat 8 OLI and moderate resolution imaging spectroradiometer (MODIS) FPAR data on the Google earth engine. Their performance was tested on a regional scale for an entire year. The results of the enhanced scaling method were closer to the in situ measurements (RMSE = 0.058 and R2 = 0.768) and were more consistent with the MODIS FPAR (RMSE = 0.091 and R2 = 0.894) than those of the RFR, particularly over densely vegetated pixels. This indicates that a well-designed simple MLR-based method can outperform the more sophisticated RFR method. The enhanced scaling method is also less sensitive to the number of training samples than the RFR method. Moreover, both methods are insensitive to land cover maps, and their computation efficiency depends on the number of images to be estimated.


I. INTRODUCTION
T HE faction of absorbed photosynthetically active radiation (FPAR) is defined as the fraction of photosynthetically active radiation (PAR) absorbed by vegetation, where PAR refers to the incoming solar radiation within the spectral range of 0.4-0.7 μm [1]. It provides critical information on vegetation photosynthesis for modeling the energy-and-carbon exchange between the biosphere and atmosphere. Numerous land surface process models, e.g., soil-vegetation-atmosphere transfer models [2] and production efficiency models [3], require spatially explicit FPAR data as input to calculate the vegetation productivity, energy budget, and carbon transfer [4]. As the only efficient approach to acquire large-scale spatially continuous FPAR [5], accurate estimation from satellite data is increasingly important [6]. While many global satellite FPAR products are currently available at coarse spatial resolutions (250 m-1 km) [7], [8], [9], fine-resolution FPAR products are urgently needed for modeling finer-scale land surface processes and impacts [6], [10], [11]. However, despite the many methods developed for FPAR estimation from satellite data, they still have a long way to go from algorithm development to operational fine-resolution FPAR estimation in terms of algorithm generalization, computation efficiency, accuracy, and robustness.
Traditional methods for FPAR estimation include two main groups: physical and empirical methods. Physical methods use radiative transfer equations [7], gap probability [12], or recollisional probability [2], [13] to model the relations between FPAR and canopy reflectance through a physical description of the light transfer process. Physical methods have been successfully used in generating global operational FPAR products, e.g., moderate resolution imaging spectroradiometer (MODIS) leaf area index (LAI)/FPAR products [7], but they can hardly adapt to fine-resolution satellite data because they require many input parameters, such as the clumping index and soil albedo. Empirical methods model the linear or nonlinear relations between reflectance and FPAR from radiative transfer simulations [14], [15], [16] or in situ measurements [17]. They are simple and efficient and have been widely used at local scales, but they may experience major difficulties in extending an established empirical model to wider areas or longer time periods.
Recently, an alternative approach has been to model the complex relations between reflectance and vegetation parameters using operational coarse-resolution products as a reference through This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ statistical learning methods. Multiple linear regression (MLR) was initially used to estimate Landsat LAI [18], and then the random forest regression (RFR) method was also used to estimate LAI from Landsat data [19] or FPAR from sentinel-2 data [20] using MODIS LAI/FPAR products as a reference. These methods inherited the sound physical foundations of the operational coarse-resolution products [7], [8], [9], but their major challenge is to accurately represent the complex FPAR-reflectance relations contained in the operational coarse-resolution FPAR product that are highly variable in space and time and may not adapt to fine resolutions due to the scale effect [21], [22]. Both the MLR in [18] and RFR in [19], [20] selected homogeneous MODIS pixels from multiple images to capture the LAI-reflectance relations and avoid scaling issues. Considering the difference in FPAR-reflectance relations between different solar zenith angles (SZAs) on different dates, Wang et al. [23] proposed a scaling method to estimate fine-resolution FPAR via MLR learning from each pair of MODIS FPAR and Landsat images. However, these methods still face the problem of underrepresentation of FPAR-reflectance relations over densely vegetated areas, where fewer homogeneous MODIS pixels can be selected and the fine-resolution reflectance saturates with increasing vegetation density. This underrepresentation results in the underestimation of high FPAR values and prevents the further application of these methods to the operational estimation of fine-resolution FPAR. Therefore, a generalized method for fine-resolution FPAR estimation is urgently needed.
In addition, although both the RFR and MLR methods have been successfully used in retrieving fine-resolution vegetation parameters, few studies have compared their performance in generating operational fine-resolution FPAR products. The RFR method is advantageous over other traditional machine learning methods, e.g., artificial neural networks, for its capability of dealing with high dimensional problems and high computation efficiency, while the MLR method is simple, efficient, and can work with very few samples. The RFR method was found to outperform the simpler MLR method in downscaling MODIS LST products [24], [25], but the performance of a statistical learning algorithm depends on many factors. Whether the more complex RFR method outweighs the simpler MLR method remains an unsolved issue.
Therefore, this article aims to develop a generalized method for the operational estimation of fine-resolution FPAR based on the scaling method proposed by Wang et al. [23], hereafter referred to as the enhanced scaling method. To overcome the underrepresentation of FPAR-reflectance relations over densely vegetated pixels, the enhanced scaling method proposed a method weighting the selected samples, and modeled the FPARreflectance relations through a weighted MLR method. Other improvements include the self-adaptive selection of representative samples and an outlier removal procedure. The RFR method was also developed for comparison, and the retrieval chain was modified from the RFR method used for LAI estimation in [19]. Both the MLR-based enhanced scaling method and RFR method were implemented on the Google earth engine (GEE) to provide operational fine-resolution FPAR products. The sensitivity of the two methods to the number of training samples and to different land cover maps and their computation efficiency were investigated in detail. The performance of the two methods was compared over a large area covering more than 50 000 km 2 and during the entire year from September 2020 to August 2021. The results were evaluated against the MODIS FPAR and in situ FPAR measurements. This article expects to promote the operational estimation of fine-resolution FPAR, enhance our understanding of the performance of statistical learning methods, and provide a reference for retrieving other land surface parameters using a similar approach.

A. Study Area
The study area is the Guanzhong area in Shaanxi Province, China, covering a total area of 55 623 km 2 (see Fig. 1). The Guanzhong area features complex terrain and spans a wide range of elevations. The northern part belongs to the southern edge of the Loess Plateau, the southern part is the steep Qinling Mountains, and the Weihe River runs from west to the east in the central parts. The land cover types in the study area vary, including forests, croplands and built areas. Thus, the study area is representative for testing the performance of the two methods on a regional scale.
B. Data Set 1) Landsat 8: A total of 86 scenes of Landsat 8 OLI images with cloud coverage less than 10% from September 2019 to August 2021 were selected on the GEE as atmospherically corrected surface reflectance data in the UTM/WGS 84 projection (see Fig. 2). To implement the RFR method, Landsat data from September 2019 to August 2020 were used for training, and those from September 2020 to August 2021 were used for validation. The enhanced scaling method was only implemented from September 2020 to August 2021. 2) MODIS LAI/FPAR Product: The MODIS Version 6 LAI and FPAR product (MCD15A3H) at four-day intervals and 500 m spatial resolution [26] was used in this article. To match the selected Landsat images spatially and temporally, the MODIS FPAR images at the dates of available Landsat images were selected and resampled to 480 m resolution on the GEE. 3) Land Cover Maps: According to [19], the land cover maps used in the RFR method were supposed to be similar to the MODIS land cover product (MCD12Q1) used in MODIS FPAR estimation. For this reason, the finer resolution observation and monitoring of global land cover (FROM-GLC) product at 30-m resolution in 2015 [27] was used as input for the RFR method. To address the discrepancy in classification schema and biomes between the FROM-GLC and MCD12Q1 data, the land cover categories of the FROM-GLC data were analyzed and recoded carefully in comparison to the MCD12Q1 data using the method described in [19]. Finally, we defined eight biomes   The SunScan Canopy Analyzer [28], [29] was used to measure FPAR at each sampling point by measuring the incident and reflected PAR above and below the canopy layer [30] for crops (i.e., maize and wheat) in the study area. For high trees, the fraction of intercepted PAR was measured as the approximation of FPAR by measuring the incident PAR in nearby open areas and under the canopy.
To match the satellite data in time and space, the measured FPAR was temporally normalized to the time of satellite overpass using the method in [30], and those within the same Landsat pixel were averaged to the 30-m resolution. Finally, a total of 63 in situ FPAR measurements were acquired to validate the Landsat FPAR retrievals.

A. Enhanced Scaling Method
The core of the MLR-based scaling method [23] is to explicitly linearize the complex relations between MODIS FPAR and Landsat surface reflectance. The FPAR is modeled as a linear combination of multiband reflectance, which varies with land cover types and time, but is consistent across different spatial resolutions. As a result, a specific MLR model can be trained for each land cover type from each pair of MODIS FPAR and Landsat images as follows: where F c,t is a 1 × l matrix that includes MODIS FPAR values of l selected samples for land cover type c on date t, P c,t is a (n b +1) × l vector matrix that includes the reflectance of n b bands plus a constant term of l selected samples, A c,t is a (n b +1) × 1 vector matrix that represents the algorithm coefficients (including the constant term) for a specific linear retrieval model, and ε is the residual error between the prediction and true value. The implementation of the scaling method includes four major steps: classifying the fine-resolution images automatically using the K-means algorithm; selecting the best-quality and homogenous MODIS FPAR retrievals and the corresponding aggregated fine-resolution surface reflectance to compose the training samples; determining the coefficients of the linear model at 480 m resolution through the MLR for each land cover type on each date; and estimating the fine-resolution FPAR using the established MLR model. However, applying the existing scaling method to operational estimation of fine-resolution FPAR confronts a major problem of underrepresentation of densely vegetated pixels that have high FPAR values and relatively high heterogeneity due to growth differences. This problem leads to the underestimation of FPAR for densely vegetated pixels. Particularly when MODIS FPAR is very high (>0.8), the retrieved Landsat FPAR rarely exceeds 0.8. This underestimation problem can be attributed to a series of factors, including: using only the best-quality MODIS FPAR retrievals derived from the main algorithm under unsaturated conditions; the constant standard for selecting homogeneous MODIS pixels without considering seasonal variations; interference from invalid samples that have disproportionate MODIS FPAR and Landsat normalized difference vegetation index (NDVI) values [19]; and the saturation of Landsat reflectance for dense vegetation.
To address the underrepresentation problem, the enhanced scaling method made improvements in the following aspects: First, the enhanced scaling method utilizes high-quality MODIS FPAR retrievals from the main algorithm under both saturated and unsaturated conditions, which have quality flags lower than 50 (QC ≤ 50). This would include more saturated samples of dense vegetation.
Second, the enhanced scaling method proposes a self-adaptive standard for selecting homogeneous MODIS pixels. The mean band-average coefficient of variation (CV) of the surface reflectance is used as the threshold. A MODIS pixel with a CV value less than the mean CV is regarded as pure. Since spatial heterogeneity varies with vegetation growth, the mean CV represents the dominant spatial heterogeneity at a given time, which can help select adequate samples to include the growth variations within the same land cover type.
Third, the enhanced scaling method proposes a 3σ-based method to remove invalid samples, which have Landsat NDVI values falling outside the normal range of MODIS FPAR values. Specifically, the selected samples are sorted ascendingly by MODIS FPAR and divided into different groups by 0.02 intervals. For each group, the mean (μ) and standard deviation (σ) of Landsat NDVI are calculated. The samples having Landsat NDVI values outside the 95% confidence interval [μ-2σ, μ+2σ] are removed as outliers. As a result, invalid samples with disproportionate Landsat NDVI and MODIS FPAR values, especially those with low MODIS FPAR but high Landsat NDVI values, are removed, allowing the underrepresentation of FPAR over densely vegetated pixels to be improved. This 3σ-based outlier removal method can remove exceptional values while maintaining the normal values. The method requires sufficient samples (i.e., >10) within each MODIS FPAR group. When the number of samples within each group is < = 10, the neighboring groups can be combined or the 0.02 interval can be increased.
Fourth, saturation of Landsat surface reflectance is a major cause of the underrepresentation of the FPAR-reflectance relations in densely vegetated areas [19], [20]. This saturation is represented by the very small increase in Landsat NDVI with increasing vegetation density after Landsat NDVI > 0.8. Due to the saturation problem, the Landsat FPAR values estimated by the scaling method rarely exceed 0.9 [23].
To overcome this saturation problem, the enhanced scaling method proposes a weighting method to increase the weights of samples with very high MODIS FPAR values. The weights are calculated as follows: where W t represents the weight for each sample and θ t is the ratio of samples with MODIS FPAR > =0.9 occupying the samples with MODIS FPAR > =0. 8. When θ t = 0 or θ t = 1, indicating that no samples have MODIS FPAR > = 0.9 or all samples with MODIS FPAR > =0.8 are also > = 0.9, the weights remain at 1. When θ t > 0, the value of W t ranges from 1 to 2; the lower θ t is, the greater the weight of the samples with MODIS FPAR > = 0.9. Based on the weighting procedure, the enhanced scaling method employs a weighted MLR approach to model the FPARreflectance relations.
The enhanced scaling method inherited the four major steps of the scaling method and is primarily modified in selecting, screening, and weighting samples. The input bands of the model include green, red, near-infrared (NIR), shortwave infrared 1 (SWIR1) and shortwave infrared 2, while the blue band was excluded for its low signal-to-noise ratio after atmospheric correction [32], [33], [34], [35]. The enhanced scaling method is self-adaptive and implemented as an automated process on GEE (sample codes are available at https://code.earthengine. google.com/aec5d7e00b8b98d358b803ad0ed8fb2b), as shown in Fig. 3.

B. Random Forest Regression
The RFR method was initially proposed by Breimans [36] as a nonlinear statistical ensemble method to prevent overfitting [37]. The RFR builds multiple unrelated decision trees through a random selection of features and bootstrap sampling from the entire training set [20]. Each decision tree can yield a prediction, and the final output of an RFR is the mean of all predictions from all individual decision trees. The RFR method used for fine-resolution FPAR estimation was inherited from the method used for fine-resolution LAI estimation in [19] (sample codes are available at: https://github.com/yanghuikang/Landsat-LAI) and implemented on the GEE because it is more generalized for regional applications than that designed for cropland in [20].
To ensure the representativeness of the samples, the RFR in [19] designed a complete workflow of the sample generation process, which was also used in this article and involved five major steps as follows.
1) Spatially representative MODIS pixels were selected as those within which more than 90% of the 30-m FROM-GLC pixels belonged to a single land cover type, hereafter referred to as "spatial samples." 2) For each spatial sample, high-quality FPAR retrievals for homogeneous MODIS pixels were screened using the criteria of a quality flag below 50 (QC ≤ 50) and a band-average CV value below 0.15. 3) Within the homogeneous samples, outliers were removed as the samples whose NDVI fell outside the 1.5 interquartile range (IQR) from the first and third quartile for each FPAR interval. 4) All the selected samples were combined, and the sample balance among different FPAR values was controlled. We calculated the sample density in space and down-sampled densely populated areas to make the sample distribution more uniform. 5) The samples were further screened to remove confounding unsaturated and saturated samples that share similar spectral signatures from Landsat surface reflectance but contrasting FPAR values, hereafter referred to as "saturation screening process." A random classifier was trained to identify saturated and unsaturated samples based on Landsat surface reflectance in green, red, NIR, SWIR bands, NDVI, normalized difference water index (NDWI), and solar illumination angles. Separate models were built for different land cover types. Based on ten-fold crossvalidation, the misclassified samples were removed from the samples. Finally, a total of 124 509 samples were finally selected, which were distributed evenly across the entire study area and across the range of MODIS FPAR values, as shown in Fig. 4.
Considering the differences in the FPAR-reflectance relations between different land cover types, we built random forest (RF) models for each land cover type. Model inputs include surface reflectance in the green, red, NIR, and SWIR1 bands, NDVI, NDWI [38], geographic coordinates (longitude, latitude) of the pixel center, and solar zenith and azimuth of the scene center. In the RFR training, the number of regression trees was set to 100, the minimum leaf population was 50, and the number of variables for each segmentation was 5.

C. Validation and Comparison
Both the MLR-based enhanced scaling method and the RFR method were implemented on the GEE for estimating Landsat FPAR from September 2020 to August 2021. The results were validated using in situ FPAR measurements and compared with MODIS FPAR products. The root-mean-square error (RMSE), mean absolute error (MAE), and coefficient of determination (R 2 ,) were used to evaluate the accuracy with in situ data and the consistency with MODIS FPAR data [20], [23], [39]. The performance of the two methods was also evaluated for homogeneous and heterogeneous pixels.
Moreover, the sensitivity of both methods to the number of training samples was investigated. For the RFR method, varying numbers of images were inputted for training, and the Landsat FPAR images were estimated and evaluated for the entire year from September 2020 to August 2021. As the enhanced scaling method only requires one single pair of Landsat-MODIS images, the Landsat image with p127/r036 on May 30, 2021, was used, and the performance of the method was investigated by using a varying number of samples, which were randomly selected from the entire training set, for the MLR process.   and histograms reveal that the RFR underestimated high FPAR values in spring (May) and summer (August) and overestimated low FPAR values in winter (February), while the results of the enhanced scaling method were closer to the MODIS FPAR in spatial details and absolute values. Fig. 6 compares Landsat FPAR retrievals using the two methods with the MODIS FPAR image in the test area on August 2, 2021. Investigation of spatial details over the subareas of cropland and woodland clearly shows that the enhanced scaling method accurately restored the spatial patterns of MODIS FPAR, while the RFR underestimated high FPAR values. The histograms also show that the results of the MLR-based enhanced scaling method were highly consistent with the MODIS FPAR, while the RFR underestimated high FPAR values. Entirely built upon the selected samples, the RFR may experience major problems in densely vegetated heterogeneous areas, where fewer samples could be selected and averaging over a large number of samples may underrepresent pixels with too little or too much vegetation.

A. Consistency With MODIS FPAR
Since both the RFR and enhanced scaling method rely on pure MODIS pixels to model the relations between FPAR and reflectance [22], [40], the estimated Landsat FPAR of both methods for the entire year from September 2020 to August 2021 were compared with MODIS FPAR in homogeneous MODIS pixels and heterogeneous MODIS pixels, as shown in Fig. 7. For both methods, the density contours of homogeneous pixels were generally more concentrated around the 1:1 line than those of heterogeneous pixels. The contour lines of the enhanced scaling method bulged out in the direction of the 1:1 line very well and were more concentrated than those of the RFR for all pixels. The contour lines show that the RFR overestimated low FPAR values (i.e., <0.5) and underestimated high FPAR values (i.e., >0.8) for all pixels. This indicates that the RFR model built on historical data may not adapt to new data. In Fig. 7(a) and (b), a few homogeneous pixels with MODIS FPAR values ranging from 0.4 to 0.7 derived high Landsat FPAR values of approximately 0.8. These pixels were removed from sample screening as outliers but were kept in the validation. Compared with the MODIS FPAR products, the Landsat FPAR retrievals of the enhanced scaling method (overall RMSE = 0.076, MAE = 0.052, and R 2 = 0.887) were more accurate than those of the RFR method (overall RMSE = 0.091, MAE = 0.068, and R 2 = 0.819). Fig. 8 shows the retrieved Landsat FPAR values and the MODIS FPAR values as a function of day of year (DOY) from September 2020 to August 2021. In general, the retrieved Landsat FPAR values from the enhanced scaling method were more consistent with the MODIS FPAR values than those of the RFR in terms of the mean values, variations and RMSE for all three vegetation types. The RFR overestimated the FPAR values during the pregrowth and senescence stages for all three vegetation types. During peak growth stages, the RFR underestimated the FPAR values for the woodlands but overestimated the FPAR values for the grasslands and croplands. Such bias can be attributed to the adaptability of the RFR model built on training samples to the new data. In contrast, the MLR-based enhanced scaling method performed best over croplands, slightly underestimated the FPAR values for the woodlands and overestimated the FPAR values for the grasslands during peak growth stages. Such bias might be related to the high spatial heterogeneity in the grasslands during the peak season, as shown by the large standard deviations in the MODIS FPAR and Landsat FPAR retrievals in Fig. 8.

B. Validation With in Situ FPAR
The scatter plots in Fig. 9 reveal that the FPAR values estimated by both the RFR and the enhanced scaling method were generally consistent with the in situ measurements. This can be attributed to the high accuracy of the MODIS FPAR data over croplands, as validated in previous studies [30], [41],   [42]. The FPAR estimated by the RFR method saturated at 0.75 when the in situ FPAR exceeded 0.7, while the enhanced scaling method accurately estimated the FPAR values, which were very close to the 1:1 line from low to high values. This indicates that the MLR-based enhanced scaling method can estimate fine-resolution FPAR values more accurately.

C. Sensitivity to Training Samples
As a data-driven method, RFR relies on a large number of samples to model the FPAR-reflectance relations. The impacts of the number of training samples on the RFR were thus investigated by varying the number of input images for training. The trained RFR models were applied to the study area for the entire year from September 2020 to August 2021. Fig. 10(a) shows the change in the accuracy of the RFR with varying numbers of input images for training. A rapid increase in the accuracy was observed when the number of input images increased from 1 to 7, while the number of screened samples increased from 850 to 6000. The accuracy was then stabilized when the number of input images increased from 10 to 20, corresponding to at least 7400 training samples. For 20 to 35 input images, the accuracy remained at a relatively high level, corresponding to at least 29 000 training samples. However, when the number of input images was 35, a continual increase in the number of input images reduced the accuracy. This is because adding different training samples can cause different effects. Adding a peak-season image will add more saturated samples, which can improve the accuracy of saturated pixels but reduce the accuracy of unsaturated pixels. Consequently, the final effects of adding a peak-season image depend on the relative proportions of saturated and unsaturated pixels in the images to be predicted. Therefore, when the number of input images was 35, two peak-season images were added, and the decrease in the accuracy of unsaturated pixels was greater than the increase in the accuracy of saturated pixels, resulting in an overall decrease in the accuracy. This indicates that the RFR is very sensitive to training samples. Fig. 10(b) shows the change in the accuracy of the MLR-based enhanced scaling method with varying numbers of samples. When the number of samples increased from 100 to 600, the RMSE decreased from 0.095 to 0.088, and the R 2 , increased from 0.772 to 0.798. When the number of samples was > 600, the accuracy of the MLR was very stable. This indicates that MLR requires fewer samples and is less sensitive to the training samples.

A. Sensitivity to Land Cover Types
Both the RFR and MLR methods require land cover maps to establish land cover-specific models for estimating fineresolution FPAR. Relevant research shows that the uncertainty of land cover maps will affect the retrieval accuracy [43], [44], [45], [46]. Thus, we investigated the sensitivity of both methods to land cover maps by using different land cover maps. For both methods, the same Landsat image with p127/r036 on May 30, 2021, was used and classified into five land cover types using three different methods, including the unsupervised K-means clustering algorithm, classification and regression tree (CART) algorithm and RF algorithm. For the CART and RF supervised algorithms, two sets of training samples were used: manually selected training samples and randomly generated samples. Therefore, including the FROM-GLC land cover map, six land cover maps were available for both the RFR and MLR methods.
For the RFR method, the FROM-GLC land cover map was used as the benchmark land cover map. Using the benchmark land cover map as ground truth, the accuracy of the other five land cover maps ranged from 53% to 82%, as given in Table II. To maintain the same study area as the enhanced scaling method, the training samples for the RFR on the Landsat image with p127/r036 were selected from the entire set of training samples. Each land cover map was used to update the land cover types of the selected samples.
For the enhanced scaling method, the land cover map previously derived using the unsupervised K-means clustering algorithm was used as the benchmark land cover map. The accuracy of the other five land cover maps ranged from 53% to 91%, as given in Table II. For both methods, each land cover map was used as input to generate a new FPAR image. The FPAR image corresponding to the benchmark land cover map was used as the benchmark FPAR image to evaluate the other FPAR images. The statistical metrics, including RMSE, MAE, and R 2 , were computed for the respective pairs of FPAR images estimated using the different land cover maps. The magnitude of the difference in FPAR (FPAR) between the new and benchmark FPAR images was examined. Table II compares the FPAR images using the different land cover maps. The table data show that large discrepancies existed between different land cover maps, as the overall Kappa coefficients between the respective pairs of land cover maps varied from 0.43 to 0.87. Generally, the differences in FPAR were small for both the RFR and enhanced scaling methods, with RMSE ranging from 0.034 to 0.037 and from 0.025 to 0.038, respectively. The R 2 values exceeded 0.966 for all cases, indicating strong correlations between the different FPAR images using different land cover maps. The pixels with FPAR < 0.05 occupied more than 89.98% in the results of the RFR and 91.04% in the results of the enhanced scaling method. All these statistics reveal that the different land cover maps would result in limited differences in FPAR retrievals for both the RFR and enhanced scaling methods. Moreover, the differences in the FPAR images are positively related to the consistencies between the respective pairs of land cover maps. For example, in cases 4, 8, and 9, the land cover maps with an overall accuracy exceeding 80% resulted in very low RMSE values between the respective pairs of FPAR images.
The results of the enhanced scaling method were generally more accurate than those of RFR, while case 10 using the FROM-GLC land cover map in the enhanced scaling method is an exception that derived the lowest accuracy. This indicates that the use of a different land cover map from a different date can significantly reduce the accuracy of the enhanced scaling method.

B. Computation Efficiency
Both the RFR and enhanced scaling methods were automated and streamlined on the GEE, and their computation efficiency was further investigated. Fig. 11 compares the computation time for predicting varying numbers of images using the RFR and enhanced scaling methods. For the RFR method, the input images for training were kept the same as in Section III-B, and the time for predicting different numbers of images was recorded. The time used by the RFR includes the time for screening samples, training the model, and estimating the FPAR for new images. As shown in Fig. 11, the RFR used 100 min for screening samples, while it used a half minute for estimating the FPAR for one Landsat image. For the MLR-based enhanced scaling method, the processes of sample screening, MLR training, and FPAR estimation are completed on one single pair of MODIS-Landsat images, which in all took 1 min. Thus, the time used by the RFR and enhanced scaling method was equal when predicting approximately 200 images. This offers us insights into choosing an optimal method. MLR-based enhanced scaling is highly efficient for predicting fewer images, while RFR is optimal for predicting large numbers of images. However, although both methods were implemented on the GEE, the enhanced scaling method is more automated than the RFR by excluding all human interference. In contrast, the sample balancing in the RFR requires an overall analysis of the screened samples to decide whether and how the samples are balanced. The time of this process was not included in the comparisons above.

C. Improvement Over the Scaling Method
To validate the improvements over the existing scaling method, the results of the existing scaling method were also compared with the MODIS FPAR for the entire validation period and with the in situ FPAR measurements, as shown in Fig. 12. It clearly shows that the existing scaling method significantly underestimated FPAR for densely vegetated pixels with FPAR>0.75 compared with MODIS FPAR and in situ FPAR measurements. The Landsat FPAR retrievals from the existing scaling method were less consistent with the MODIS FPAR and less accurate than the enhanced scaling method.

D. Advantages and Disadvantages of Both Methods
Both the RFR and enhanced scaling methods have been successfully applied to FPAR estimation on the GEE, while both methods have their advantages and disadvantages. Both the RFR and enhanced scaling methods can inherit the high accuracy and wide applicability of the main algorithm of the MODIS FPAR product and are easily automated on the cloud computing platform.
To increase the representativeness of the FPAR-reflectance relations in the MODIS FPAR products, both the RFR and MLR methods used high-quality FPAR retrievals over homogeneous MODIS pixels and involved an outlier removal process. The difference is that the sample screening in the RFR is more rigorous and complex. First, the RFR adopted a higher standard (e.g., a lower CV threshold) for homogeneous MODIS pixels because the nonlinear RFR algorithm built over heterogeneous MODIS pixels cannot be applied to Landsat pixels due to scale effects [21], [22]. Second, the proposed 3σ-based outlier removal method in the enhanced scaling method is less rigorous. In our experiments, the 3σ-based method removed fewer outliers than the IQR method in the RFR method. In statistical learning models, a few outliers within the samples can enhance the generalization ability of the model [31]. Third, the RFR involved two additional processes of sample balancing and saturation screening because an unbalanced sample set and spectral similarities between saturated and unsaturated samples would undermine the skill of the RFR [19], [47]. In contrast, as a linear algorithm, the enhanced scaling method used self-adaptive CV thresholds to define homogeneous MODIS pixels, which can capture the major within-class spatial variations and seasonal variations. A weighted MLR procedure was proposed to increase the representativeness of densely vegetated samples.
To analyze the effect of sample screening on the RFR, we took a subarea covering the Landsat scene p127/r036 and used the sample screened by the enhanced scaling method to train the RFR. A total of 323362 training samples were screened from the 5 Landsat images from 2019 to 2020 using the samples screened by the enhanced scaling method, which is much more than the 124 509 samples screened by the RFR method. The trained RFR models were used to estimate Landsat FPAR images from 2020 to 2021, and the results were shown in Fig. 13. Compared to the results of the previously trained RFR, using the samples screened by the enhanced scaling method slightly improved the values of RMSE and R 2 for the RFR, while the overestimation of low FPAR values and the underestimation of high FPAR values were even more significant. This indicates that the sample screening strategy in the enhanced scaling method does not improve the performance of the RFR, which may be attributed to the following reasons.
First, sample balancing and saturation screening are important for the RFR to achieve optimal accuracy [19], but are not included in the enhanced scaling method. The sample set screened by the enhanced scaling method generally has a much larger size and a higher proportion of unsaturated samples (i.e., 50% and 65% for the RFR and enhanced scaling method for the woodland, respectively), which improves the overall prediction accuracy but does not improve the underestimation of high FPAR values.
Second, the definition of homogeneous MODIS pixels is more rigorous in the RFR than in the enhanced scaling method. Consequently, many samples screened by the enhanced scaling method have larger subpixel spatial variations, particularly for densely vegetated areas, where scale effects possibly occur [21], [22] and undermine the performance of the RFR.
Third, the algorithm design might be a major cause for the prediction bias in the RFR method. The RFR averages the predictions of all candidate decision trees as the final output, which tends to produce a moderate value instead of a very high or a very low value [48], [49]. Previous studies found similar biases for low and high values in downscaling LST satellite products and attempted to correct the residuals using the coarse-resolution product [50], [51]. However, such correction is performed at coarse resolution, thereby causing a block effect in the fine-resolution images, and was not considered in this article.
Therefore, the enhanced scaling method is even simpler, more accurate, and less sensitive to the training samples than the RFR method based on better representation of FPAR-reflectance relations in densely vegetated areas. The enhanced scaling method is more ready-to-use for requiring only one pair of MODIS FPAR and Landsat images, yet the FPAR-reflectance relations trained from one image pair can only apply to the image pair observed at similar SZAs. In contrast, the RFR requires many input images to cover various dates, locations, SZAs and land cover types to model the complex FPAR-reflectance relations. The major strength of the RFR method is that once built, it can quickly estimate fine-resolution FPAR from satellite images, while the major limitations include prediction biases and higher requirements for training samples. In practical applications when MODIS FPAR values are poorly retrieved on some dates, the RFR can still work, while enhanced scaling can use the MODIS FPAR data at adjacent dates in the same year or similar dates in another year as alternatives to establish the FPAR-reflectance relations.
Therefore, for a specific application, the choice of an optimal method depends on a series of factors. The enhanced scaling method is more ready-to-use with fewer requirements on input data, higher accuracy, and a higher degree of automation, while the RFR method is suitable for continuous estimation of FPAR for hundreds of images in the same study area.

VI. CONCLUSION
In this article, we developed an enhanced scaling method that integrates an outlier removal procedure and a method for weighting the samples in the MLR for operational fine-resolution FPAR estimation using the MODIS FPAR product as a reference. The proposed enhanced scaling method based on weighted MLR was compared to the RFR method on the GEE for regional application in the large Guanzhong area for predicting Landsat FPAR images for one entire year from September 2020 to August 2021. The main conclusions can be drawn as follows.
1) Both the enhanced scaling method and RFR method generate MODIS-consistent and spatially consistent fineresolution FPAR. The RFR method overestimated low FPAR values and underestimated high FPAR values, while the enhanced scaling method outperformed the RFR method. In particular, the underestimation of high FPAR values for densely vegetated pixels was significantly improved in the enhanced scaling method. 2) Both the enhanced scaling method and RFR method were validated with in situ FPAR measurements. The enhanced scaling method performed better than the RFR method, particularly for densely vegetated pixels. Overall RMSE values of 0.049 and 0.046 and overall R 2 values of 0.685 and 0.768 were attained for the RFR and enhanced scaling methods, respectively. 3) The enhanced scaling method is less sensitive to the training samples, while the RFR method has a higher degree of reliance on the training samples. The enhanced scaling method is easier to implement with a minimum requirement of one pair of MODIS-Landsat images and is more accurate than the RFR method over both homogeneous and heterogeneous pixels.