Spatial Downscaling of Vegetation Productivity in the Forest From Deep Learning

Accurately estimating vegetation productivity in forest areas is important for studying terrestrial ecosystems and the carbon cycle. The Global LAnd Surface Satellite (GLASS) vegetation production datasets provide new long-term products of gross primary production (GPP) and net primary production (NPP) for monitoring carbon exchange and carbon storage. However, the coarse spatial resolution of the GLASS GPP/NPP products has limited their application in ecosystem service assessment at regional scales. In this paper, a spatial downscaling method based on the GLASS vegetation production datasets and four typical deep learning methods (deep neural network, convolutional neural network, back propagation neural network and recurrent neural network) is proposed to generate high resolution GPP/NPP for the forest areas of the upper Luanhe River basin in northern Hebei Province, China. The downscaled GPP/NPP were validated with ground measurement data and reference high resolution GPP/NPP data, and the accuracy of the downscaled GPP/NPP from the different deep learning methods was compared. The results indicate the applicability and feasibility of deep learning methods for downscaling GPP/NPP. Direct validation and cross validation demonstrated that the GPP/NPP downscaled with the convolutional neural network achieved the highest accuracy.


I. INTRODUCTION
Vegetation productivity is the largest carbon flux component in terrestrial ecosystems and plays an important role in describing global and regional carbon exchange and the carbon cycle [1]. Based on light use efficiency (LUE) theory or eco-physiological processes [2], many global or regional gross primary production (GPP) and net primary production (NPP) products have been generated, such as the MODerate Resolution Imaging Spectroradiometer (MODIS) daily GPP and annual NPP products and the SPOT Vegetation (VGT) products.
The associate editor coordinating the review of this manuscript and approving it for publication was Jon Atli Benediktsson .
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The Global LAnd Surface Satellite (GLASS) vegetation production datasets were published in recent years and are believed to show less inter-annual variation because they integrate the regulation of several major environmental variables [3], [4]. However, the coarse spatial resolution (500 m) of the GLASS GPP/NPP products has limited their application in ecosystem service assessment at regional scales. Although some efforts have been made and some progress has been achieved [5], [6], [7], exploiting multiple-scale data streams to generate high resolution vegetation productivity remains challenging, especially in forest areas with complex terrain and high spatial heterogeneity [6]. New methods are urgently needed to generate high resolution GPP and NPP in forest areas by combining multiple-scale remote sensing data, which is important for improving estimates of forest carbon stocks and fluxes.
Spatial downscaling is the process of increasing the spatial resolution of coarse datasets using information from ancillary data at a finer resolution [8], [9]. In recent years, many attempts have been made to downscale coarse resolution remote sensing datasets to generate high resolution GPP/NPP through statistical models or data fusion models. Statistical models link coarse GPP/NPP with high resolution datasets through predictors such as vegetation indexes (VIs) and solar-induced chlorophyll fluorescence (SIF) [10], [11]. For example, Chen et al. [12] proposed a linear downscaling model from MODIS to Landsat to obtain high resolution albedo, evapotranspiration and GPP. Yue et al. [13] used the Carnegie-Ames-Stanford Approach (CASA) model and statistical downscaling methods to calculate NPP and soil water content.
Hu and Mo [14] developed a framework to disaggregate the Global Ozone Monitoring Experiment-2 (GOME-2) SIF dataset to detect regional GPP variations by using statistical relationships between SIF and the Normalized Difference Vegetation Index (NDVI), the fraction of absorbed photosynthetically active radiation (FPAR) and a soil moisture index. However, studies have indicated that the accuracy of statistical downscaling models depends on the robustness of the empirical equations. The statistical relationships used in the downscaling process may vary and become unstable across regions [6], especially in areas with complex terrain such as forests.
Data fusion models provide another way to generate high resolution data from multiple-source and multiple-scale data streams. Many researchers have used data fusion models, such as the spatial and temporal adaptive reflectance fusion model (STARFM) [15] and the enhanced spatial and temporal adaptive reflectance fusion model (ESTARFM) [16], to integrate multiple-scale satellite images into high resolution surface reflectance datasets, from which high resolution GPP/NPP is then generated. For example, Singh [17] blended Landsat and MODIS data to generate a high resolution chlorophyll index for retrieving GPP. Yu et al. [18] proposed a downscaling method for leaf area index (LAI) and FPAR based on STARFM and used the Multi-source data Synergized Quantitative (MuSyQ) algorithm to generate time series of high resolution GPP/NPP. He et al. [19] used a satellite data-driven LUE model to estimate GPP at 30 m resolution from a fused NDVI dataset reconstructed by blending Landsat and MODIS reflectance data. However, data fusion models may introduce uncertainty into the input data, and hence into the GPP/NPP estimates [20]. Characterizing how errors propagate through the downscaling process is difficult [21], [22]. Moreover, the applicability and accuracy of data fusion models in highly heterogeneous forest areas still need to be improved [23]. Under these conditions, methods that downscale GPP/NPP directly, and thereby reduce error accumulation and propagation, are needed.
Because deep learning can model complex nonlinear relationships between multiple-scale remote sensing images, some downscaling methods based on deep learning have also been developed in recent years. For example, Zhu et al. [24] downscaled snow depth maps by fusing microwave and optical remote sensing data with a deep neural network (DNN). Zhao et al. [25] compared the performance of a deep belief network (DBN), a residual network (ResNet) and a back-propagation neural network (BPNN) in downscaling soil moisture. Wang et al. [26] developed a super resolution deep residual network (SRDRN) to downscale daily precipitation and temperature. Yu et al. [27] proposed an inverse distance weighting and feed forward neural network (IDW+DNN) and a deep matrix network (DMN) to downscale tropospheric nitrogen dioxide. In summary, deep learning methods have shown great potential for downscaling environmental parameters [28]. However, few studies have focused on the performance of typical deep learning models in downscaling vegetation productivity. Therefore, the accuracy of these deep learning methods needs to be compared and their applicability to GPP/NPP downscaling analyzed.
The objectives of this paper are: (i) to generate 30 m GPP/NPP products for forest areas from the 500 m GLASS datasets by using a downscaling method, and (ii) to assess the performance of the downscaling method with different deep learning methods. This paper may provide a new way to generate high resolution GPP/NPP in forest areas from coarse resolution GPP/NPP datasets by using deep learning models. The paper is organized as follows. First, the study area and the deep learning based GPP/NPP downscaling methods are introduced in Data and Methods (Section II). Then, the downscaled GPP/NPP are validated using ground observation data and reference data, and the accuracy of the downscaling methods based on different deep learning models is compared in Results (Section III). Finally, the advantages and uncertainties of the downscaling methods are analyzed in Discussion (Section IV).

II. DATA AND METHODS
A. STUDY AREA
A case study was conducted in the forest area of the upper Luanhe River basin in northern Hebei Province, China (Figure 1). The area has a semiarid continental monsoon climate, with temperatures of about 0 °C in winter and 20 °C in summer. The annual mean precipitation is about 400 mm [7]. The Saihanba Forest Farm, a national forest park and nature reserve as well as an important ecological shield in northern China, is located in the northern part of the study area. The main tree species in the forest farm are larch (Larix ologensis), Scots pine (Pinus sylvestris), birch (Betula platyphylla Suk) and spruce (Picea asperata Mast).

B. DATA AND DATA PROCESSING
1) REMOTE SENSING DATA
GLASS GPP/NPP products: The GLASS GPP products (Version V60) were generated with a revised light use efficiency model at a spatial resolution of 500 m and a temporal resolution of 8 days. Because the revised LUE model integrates the regulation of several environmental variables, such as atmospheric carbon dioxide concentration, radiation components and atmospheric vapor pressure deficit (VPD), the GLASS GPP/NPP products are believed to be more temporally continuous than the MODIS products and to reproduce inter-annual variations effectively [3], [4]. The GLASS NPP products (Version V60), also with a spatial resolution of 500 m and a temporal resolution of 8 days, were derived from GLASS GPP by using a respiration index (the ratio of NPP to GPP) calculated from 19 dynamic global vegetation models. Validation at field sites demonstrated that the GLASS GPP/NPP products had high accuracy (mean R² of 0.81, with an average RMSE of 2.13 g C m⁻² d⁻¹ and an average absolute bias of 0.81 g C m⁻² d⁻¹ over all the investigated sites) [4]. Quarterly GLASS GPP/NPP from 2017 to 2021 were aggregated from the 8-day products and used in this paper to train the deep learning models to generate high resolution GPP/NPP datasets.
Forest type map: Forest types in this paper were derived from GLC_FCS30-2020. The GLC_FCS30-2020 product was generated with a random forest classification method from 2019 to 2020 time series of Landsat surface reflectance data, Sentinel-1 Synthetic Aperture Radar (SAR) data, digital elevation model (DEM) terrain elevation data, a global thematic auxiliary dataset and a prior knowledge dataset [31], [32]. Studies have demonstrated that this land cover map has high accuracy (overall accuracy: 95.1%, kappa coefficient: 0.898) when validated with 15 regional field datasets.
DEM: The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model (GDEM) Version 3 (ASTGTM), with a spatial resolution of 30 m, was used in this paper to analyze spatial heterogeneity [33]. Validation against 10 m Japanese digital topographic datasets demonstrated that the geolocation error was about 0.3 m to the west and 5.4 m to the north, and that the standard deviation of the elevation error was 12.1 m.

2) CARBON FLUX DATA
Carbon flux data from 1 August 2020 to 1 August 2021, with a time interval of 30 minutes, were collected at two eddy covariance systems at the Saihanba Forest Farm. The locations of the two eddy covariance systems are shown in Figure 1. The underlying forest types of the two systems were larch and Pinus sylvestris, respectively. The data processing steps included time delay correction, density fluctuation correction, secondary coordinate rotation, sonic virtual temperature conversion, gap filling and flux partitioning [34], [35], [36]. GPP was then obtained by partitioning the observed net flux into GPP and ecosystem respiration, which can be described as [35], [36]:
$$\mathrm{GPP} = R_{eco} - \mathrm{NEE}$$
where NEE is the net ecosystem carbon dioxide exchange and R_eco is the ecosystem respiration.
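The flux partitioning step can be illustrated with a minimal sketch. It assumes the conventional relation GPP = R_eco − NEE (NEE negative for net uptake) and half-hourly fluxes expressed in µmol CO2 m⁻² s⁻¹; the function names and the sample values are hypothetical and are not taken from the measurements in this paper.

```python
import numpy as np

SECONDS_PER_HALF_HOUR = 1800.0
GRAMS_C_PER_UMOL = 12.011e-6  # grams of carbon in one micromole of CO2


def partition_gpp(nee, reco):
    """GPP = Reco - NEE, element-wise, in the units of the inputs."""
    return np.asarray(reco, dtype=float) - np.asarray(nee, dtype=float)


def flux_to_gc_total(flux_umol):
    """Integrate half-hourly fluxes (umol CO2 m-2 s-1) into a total in g C m-2."""
    return float(np.sum(flux_umol) * SECONDS_PER_HALF_HOUR * GRAMS_C_PER_UMOL)


# Hypothetical half-hourly values (umol CO2 m-2 s-1): night, morning, noon, evening
nee = np.array([2.0, -4.0, -10.0, 1.0])
reco = np.array([2.0, 3.0, 4.0, 2.5])

gpp = partition_gpp(nee, reco)   # half-hourly GPP
total = flux_to_gc_total(gpp)    # GPP summed over the four half hours, g C m-2
print(gpp, total)
```

Summing the half-hourly totals over all records in a quarter would give a field GPP in g C m⁻² per 3 months, the unit used for the validation results.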

3) FIELD DATA
Ground-based field data were collected at 32 plots (11 sample plots of Pinus sylvestris, 4 sample plots of white birch and 17 sample plots of larch) in September 2017 and July 2018 at the Saihanba Forest Farm (Figure 1). The geographic coordinates and diameter at breast height (DBH) of all trees in each 30 m × 30 m plot were measured. Core samples were taken from both sides of each tree at breast height with a 5-mm diameter drill. The calendar year of each tree ring was then determined using standard dendrochronological techniques [37]. The annual DBH and tree height were reconstructed from the tree-ring width data, and the annual aboveground biomass was estimated with the biomass estimation formula of each tree species [38]. Since the field sites were located in areas with few forest management activities, the annual increase in biomass was regarded as an approximation of annual NPP and was used to validate the downscaled NPP in this paper [7].

4) METEOROLOGICAL DATA
Quarterly meteorological datasets (mean temperature and precipitation) with a spatial resolution of 30 m were generated by averaging the daily mean temperature and precipitation interpolated from 17 meteorological stations in and around the study area. Quarterly mean solar shortwave radiation with a spatial resolution of 30 m was obtained by averaging the daily solar shortwave radiation derived from the Mountain Microclimate Simulation Model (MT-CLIM) [39].

C. METHODS
The flowchart of the deep learning based downscaling of GLASS GPP/NPP is shown in Figure 2. First, homogeneous GLASS pixels were selected using information from the Landsat VIs within each GLASS pixel. Second, the training datasets were built from the GLASS GPP/NPP of the homogeneous pixels, the meteorological datasets (photosynthetically active radiation (PAR), temperature and precipitation), the Landsat data (VIs and surface reflectance) and the DEM. Third, the deep learning models were built on the training datasets to generate GPP/NPP with a spatial resolution of 30 m. Finally, direct validation and cross validation were conducted to assess the accuracy of the downscaled GPP/NPP.

1) DOWNSCALING OF GLASS GPP/NPP
In highly heterogeneous forest areas, the coarse resolution (500 m) GLASS pixels are usually mixed pixels that contain information from both vegetation and background soil. To build a more robust relationship between GPP/NPP and the input variables, and to reduce the uncertainty of the deep learning models, pure GLASS pixels were selected to obtain high quality training samples. In this paper, the pure pixels were selected based on the coefficient of variation within each GLASS pixel, defined as the ratio of the standard deviation to the mean [40] of the Landsat NDVI values within that GLASS pixel. For each forest type (evergreen needleleaf forest, ENF; deciduous broadleaf forest, DBF; mixed forest, MF; deciduous needleleaf forest, DNF), the 30% of GLASS pixels with the lowest coefficient of variation were selected as pure pixels.
As GPP/NPP can be stressed by air temperature and water conditions, four VIs (NDVI, kNDVI, EVI and NDWI) related to photosynthetic activity, vegetation density and water conditions were selected among the input variables of the deep learning models. Specifically, NDVI and EVI, which describe terrestrial photosynthetic vegetation activity [41], have been widely used in GPP/NPP estimation. kNDVI is more resistant to saturation, bias and complex phenological cycles, and correlates well with biomass and vegetation productivity [30]. NDWI is sensitive to changes in hydrological conditions and in the liquid water content of vegetation canopies, and can therefore reflect the level of water stress when estimating GPP/NPP [18], [42].
The training data were then built from the GLASS GPP/NPP in the pure pixels and the corresponding meteorological datasets (photosynthetically active radiation, temperature and precipitation), Landsat datasets (surface reflectance and VIs) and DEM. To make these datasets comparable, the meteorological and Landsat datasets in the pure pixels were aggregated to the GLASS scale (500 m × 500 m).
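The pure pixel selection and the four VIs can be sketched as follows. This is only an illustration under stated assumptions: the index formulas are the commonly used forms (kNDVI in its simplified form tanh(NDVI²), NDWI from near-infrared and shortwave-infrared reflectance, EVI with its standard coefficients), which the text does not spell out; the block size of 16 fine pixels per coarse pixel is a simplification, since 500 m is not an integer multiple of 30 m; and the selection is shown for a single forest type on synthetic reflectance. All function names are hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def kndvi(nir, red):
    # Simplified kernel NDVI, tanh(NDVI^2); assumed form, not confirmed by the text
    return np.tanh(ndvi(nir, red) ** 2)


def evi(nir, red, blue):
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def ndwi(nir, swir):
    return (nir - swir) / (nir + swir)


def block_cv(fine, block):
    """Coefficient of variation (std / mean) of the fine pixels in each coarse block."""
    rows, cols = fine.shape
    r, c = rows // block, cols // block
    tiles = fine[: r * block, : c * block].reshape(r, block, c, block)
    return tiles.std(axis=(1, 3)) / tiles.mean(axis=(1, 3))


def select_pure(cv, fraction=0.30):
    """Boolean mask of coarse pixels whose CV lies in the lowest `fraction`."""
    return cv <= np.quantile(cv, fraction)


# Synthetic 30 m reflectance for one forest type (hypothetical values)
rng = np.random.default_rng(0)
red = rng.uniform(0.03, 0.08, size=(64, 64))
nir = rng.uniform(0.30, 0.50, size=(64, 64))

cv = block_cv(ndvi(nir, red), block=16)   # 4 x 4 coarse pixels
pure = select_pure(cv)                    # mask of the most homogeneous coarse pixels
print(cv.round(3))
print(pure)
```

The same block reshaping, with `mean` in place of the CV, would aggregate the 30 m predictors to the coarse grid before they are paired with the GLASS GPP/NPP values.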
For each forest type (ENF, DBF, MF, DNF), 70% of the samples were used to build the deep learning models, and the remaining 30% were used to validate the models.
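A per-type 70/30 split of this kind can be sketched as below. The random shuffling, the seed and the sample counts are assumptions made for illustration; the text does not state how the samples were drawn.

```python
import numpy as np


def stratified_split(forest_type, train_fraction=0.7, seed=0):
    """Return (train_idx, val_idx) with the split applied within each forest type."""
    forest_type = np.asarray(forest_type)
    rng = np.random.default_rng(seed)
    train, val = [], []
    for label in np.unique(forest_type):
        idx = np.flatnonzero(forest_type == label)
        rng.shuffle(idx)                               # random order within the type
        n_train = int(round(train_fraction * idx.size))
        train.append(idx[:n_train])
        val.append(idx[n_train:])
    return np.sort(np.concatenate(train)), np.sort(np.concatenate(val))


# Hypothetical sample counts per forest type
labels = np.array(["ENF"] * 10 + ["DBF"] * 20 + ["MF"] * 10 + ["DNF"] * 30)
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Splitting within each type keeps the forest type proportions the same in the training and validation subsets.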
Deep learning can simplify the physical models used in environmental parameter retrieval and is effective in establishing relationships between remote sensing images and environmental parameters [28]. The BPNN is a traditional neural network framework, while the DNN, convolutional neural network (CNN) and recurrent neural network (RNN) are the mainstream deep learning architectures in remote sensing [28]. In this paper, four deep learning models (DNN, CNN, BPNN and RNN) were adopted to generate 30 m GPP/NPP based on the training datasets built from the pure GLASS pixels and the corresponding meteorological and Landsat datasets.
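The idea of training at the coarse scale and predicting at the fine scale can be illustrated with a deliberately small stand-in. The sketch below is not one of the four architectures used in this paper, whose layers and hyper-parameters are not given in the text; it is a one-hidden-layer network trained by back-propagation in NumPy on synthetic data, and every name, feature and value in it is hypothetical.

```python
import numpy as np


def train_mlp(x, y, hidden=16, epochs=1000, lr=0.02, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                    # forward pass
        pred = (h @ w2 + b2).ravel()
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        g_pred = (2.0 / len(y)) * err[:, None]      # gradient of MSE w.r.t. predictions
        g_w2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ w2.T) * (1.0 - h ** 2)      # back-propagate through tanh
        g_w1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


def predict(params, x):
    w1, b1, w2, b2 = params
    return (np.tanh(x @ w1 + b1) @ w2 + b2).ravel()


# Synthetic coarse-scale samples: three scaled predictors and a nonlinear target
rng = np.random.default_rng(1)
coarse_x = rng.uniform(0.0, 1.0, size=(200, 3))
coarse_y = 2.0 * coarse_x[:, 0] + np.sin(3.0 * coarse_x[:, 1]) + 0.5 * coarse_x[:, 2]

# Standardize with coarse-scale statistics and reuse them at the fine scale
mu, sigma = coarse_x.mean(axis=0), coarse_x.std(axis=0)
params, losses = train_mlp((coarse_x - mu) / sigma, coarse_y)

fine_x = rng.uniform(0.0, 1.0, size=(1000, 3))      # stands in for the 30 m predictors
fine_pred = predict(params, (fine_x - mu) / sigma)
print(losses[0], losses[-1])
```

The design point the sketch is meant to show is that the fitted relationship is learned from coarse, homogeneous samples and then applied unchanged to the fine-resolution predictors, so the fine-scale inputs must be scaled with the same statistics as the training data.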

2) VALIDATION OF THE DOWNSCALED GPP/NPP
To assess the performance of the downscaling method, the downscaled GPP was compared directly with the field GPP derived from the carbon flux data, and the downscaled NPP was compared directly with the field-investigated NPP. To reduce co-registration errors between the images and the field plots, the average GPP/NPP in a 3 × 3 pixel window around each field site was compared with the field GPP/NPP. In addition, the downscaled GPP/NPP were validated pixel by pixel against reference downscaled GPP/NPP [7], which were generated with a data fusion approach and the MuSyQ model [7]. The coefficient of determination (R²) and the root mean square error (RMSE) were used to quantify the accuracy of the downscaled GPP/NPP, and the mean difference (MD) [43] was used to evaluate the degree of under- or over-prediction.
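The validation statistics and the 3 × 3 window average can be sketched as follows. Two assumptions are made here that the text does not confirm: R² is taken as the squared Pearson correlation of the linear relationship, and MD is taken as the mean of predicted minus observed values, so that a positive MD means over-prediction. The function names and sample values are hypothetical.

```python
import numpy as np


def r_squared(pred, obs):
    """Squared Pearson correlation between predictions and observations."""
    return float(np.corrcoef(pred, obs)[0, 1] ** 2)


def rmse(pred, obs):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))


def mean_difference(pred, obs):
    """Mean of (pred - obs); positive values indicate over-prediction on average."""
    return float(np.mean(np.asarray(pred) - np.asarray(obs)))


def window_mean(image, row, col, half=1):
    """Mean of the (2*half+1) x (2*half+1) window at (row, col), clipped at edges."""
    r0, r1 = max(row - half, 0), min(row + half + 1, image.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, image.shape[1])
    return float(np.nanmean(image[r0:r1, c0:c1]))


# Hypothetical field and downscaled values
obs = np.array([100.0, 200.0, 300.0, 400.0])
pred = np.array([120.0, 210.0, 330.0, 420.0])
print(r_squared(pred, obs), rmse(pred, obs), mean_difference(pred, obs))

# Hypothetical 5 x 5 downscaled image; value compared with a plot at row 2, col 2
image = np.arange(25, dtype=float).reshape(5, 5)
print(window_mean(image, 2, 2))
```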

III. RESULTS
A. VALIDATION WITH GROUND OBSERVED GPP/NPP
Compared with the 500 m GLASS GPP/NPP (Figure 3), the downscaled 30 m GPP/NPP (Figure 4, Figure 5) showed finer-scale features that were more clearly identifiable. In general, the spatial distribution of the downscaled GPP/NPP from DNN, CNN, BPNN and RNN was broadly consistent with that of the GLASS GPP/NPP products.
In general, a good linear relationship existed between the downscaled NPP and the field observed NPP (R² ranged from 0.71 to 0.80, RMSE from 74.33 to 92.63 g C m⁻² (3 months)⁻¹, and MD from 41.65 to 54.21 g C m⁻² (3 months)⁻¹) (Figure 7). With CNN, R² reached 0.80, RMSE was 74.33 g C m⁻² (3 months)⁻¹ and MD was 41.65 g C m⁻² (3 months)⁻¹ (Figure 6(b)), which also indicated that the downscaled NPP was most accurate with CNN. Most points in Figure 7 were located under the 1:1 line, indicating that the downscaled NPP was overestimated in the study area. The main reason is that the GLASS NPP products themselves were overestimated; because the training samples were taken from the GLASS products, the downscaled NPP was also overestimated in most cases.

B. VALIDATION WITH REFERENCE GPP/NPP
1) TIME SERIES OF DOWNSCALED GPP/NPP
In general, the time series of GLASS GPP, downscaled GPP from deep learning and reference GPP agreed well for ENF, DNF, DBF and MF (Figure 8). The downscaled GPP agreed better with the GLASS GPP because the training datasets were obtained from the GLASS GPP products. The downscaled GPP showed more seasonal variation than the GLASS products, and its seasonal trend matched the reference GPP well. Moreover, the downscaled GPP was higher than the GLASS GPP in the first and fourth quarters (the downscaled GPP was about 40 to 100 g C m⁻² (3 months)⁻¹, whereas the GLASS GPP and the reference GPP were less than 20 g C m⁻² (3 months)⁻¹).
The temporal patterns of GLASS NPP, downscaled NPP and reference NPP are shown in Figure 9. The downscaled NPP likewise agreed better with the GLASS NPP, while its seasonal trend matched the reference NPP better. The downscaled NPP was also overestimated in the first and fourth quarters and underestimated to some extent in the second and third quarters.

2) CROSS VALIDATION WITH REFERENCE GPP/NPP
The downscaled GPP/NPP were validated pixel by pixel against the reference GPP/NPP. In general, good linear relationships existed between the downscaled GPP and the reference GPP, as shown in Table 1 (R² ranged from 0.47 to 0.64, RMSE from 89.64 to 127.90 g C m⁻² year⁻¹, and MD from 50.91 to 90.36 g C m⁻² year⁻¹). The best consistency between the downscaled GPP and the reference GPP was obtained with CNN.

IV. DISCUSSION
... [44], support vector machine (SVM) [45]. Studies have indicated that a sufficient number of good quality samples is the basis of machine learning and deep learning models [46], [47], [48]. In this paper, a GPP/NPP downscaling method was proposed based on deep learning models and the GLASS datasets. The deep learning models were trained on the pure pixels of the GLASS products to increase the stability and reliability of the downscaling model and to generate high accuracy GLASS-like GPP/NPP. High accuracy was obtained when validating with ground measurement data and reference high resolution GPP/NPP datasets. Comparison with the accuracy of several typical GPP/NPP downscaling methods, including statistical models, data fusion models and interpolation models (Table 3), indicated that the accuracy of the GPP/NPP downscaling method in this paper was satisfactory.

B. UNCERTAINTIES ANALYSIS
In the process of GPP/NPP downscaling and validation, the uncertainty of the input datasets, the limitations of the deep learning models, the algorithm used to composite cloud free images and the way the predictions were validated all introduce errors into the results.
First, the accuracy of a deep learning model depends to a large extent on the quality of the training samples. Deep learning is regarded as a process of learning from big data with large-scale computing power [50]. Large, high quality training datasets not only provide benchmarks for deep learning work but also improve the applicability and feasibility of deep learning models [51]. In this paper, GPP/NPP in pure GLASS pixels and the corresponding composited Landsat datasets and meteorological datasets were used as inputs to train the deep learning models. Therefore, the accuracy of the downscaled results is tied to the accuracy of the source data (GLASS, Landsat, meteorological datasets and DEM), and any errors in the source data propagate into the final downscaled results.
To improve the quality of the training data, methods for selecting more representative training samples with higher precision and lower spatial heterogeneity may be developed in the future.
Second, the limitations of the deep learning models may influence the accuracy of the downscaled GPP/NPP. Specifically, deep learning models may fail to fully reconstruct the spatial pattern, and may overestimate low values and underestimate high values [52], [53]. In the future, the hyper-parameters of the deep learning models will be optimized to improve the accuracy.
Third, the compositing algorithm used to obtain cloud free and radiometrically consistent Landsat reflectance has some impact on the results. A pixel-based algorithm [24], which integrates different pixel characteristics for optimized compositing, was adopted in this paper. When the images were acquired far from the center of the time interval, errors may be introduced in the compositing process, which adds uncertainty to the high resolution GPP/NPP estimates.
Lastly, the way GPP was validated may also introduce uncertainty. Studies have indicated that validating GPP at the footprint source scale may be more reasonable than at the in-situ scale [35], [36]. In the future, the footprint of the field GPP in the forest may be analyzed, and up-scaling methods may be used to scale the in-situ GPP/NPP up to the footprint source area or the regional scale to validate the results.

V. CONCLUSION
Estimating vegetation productivity is important in research on terrestrial ecosystems, the carbon cycle and climate change. In this paper, pure GLASS vegetation production pixels and the corresponding Landsat data, meteorological data and DEM were used to train deep learning models (DNN, CNN, BPNN and RNN), which were then used to downscale the GLASS datasets to generate high resolution GPP/NPP. Validation with field data demonstrated that the models achieved high accuracy (R²: 0.86 to 0.92, RMSE: 60.51 to 74.54 g C m⁻² (3 months)⁻¹ for downscaled GPP; R²: 0.71 to 0.80, RMSE: 74.33 to 92.63 g C m⁻² (3 months)⁻¹ for downscaled NPP). Comparison with the reference GPP/NPP showed good consistency between the downscaled and reference GPP/NPP time series, and good linear relationships between the downscaled and reference GPP/NPP (R²: 0.47 to 0.64, RMSE: 89.64 to 127.90 g C m⁻² year⁻¹ for downscaled GPP; R²: 0.43 to 0.65, RMSE: 50.92 to 75.95 g C m⁻² year⁻¹ for downscaled NPP). The results indicate that deep learning has great potential for downscaling GPP/NPP, and that the GPP/NPP downscaled with CNN achieved the highest accuracy in the study area.
TAO YU received the Ph.D. degree in cartography and geographic information system from Beijing Normal University, in 2019. He is currently a Research Assistant with the Institute of Forest Resource Information Techniques, Chinese Academy of Forestry. His research interests include scale effects and scaling methods in remote sensing, applications of remote sensing in carbon cycle, and global change.
YONG PANG received the Ph.D. degree in cartography and geographic information system from the Institute of Remote Sensing Applications, Chinese Academy of Science, in 2006. He is currently a Professor with the Institute of Forest Resource Information Techniques, Chinese Academy of Forestry. His research interests include LiDAR remote sensing and applications in forestry and carbon mapping.
RUI SUN received the Ph.D. degree in physical geography from Beijing Normal University, in 1998. He is currently a Professor with Beijing Normal University. His research interests include remote sensing for natural resources and environment, remote sensing for vegetation productivity, and chlorophyll fluorescence.
XIAODONG NIU is a Postdoctoral Researcher with the Institute of Forest Resource Information Techniques, Chinese Academy of Forestry. His research interests include carbon and water cycle in forest ecosystems and the effects of climate change on forest productivity and evapotranspiration.