Mapping Annual Global Forest Gain From 1983 to 2021 With Landsat Imagery

The world's forests are experiencing rapid changes due to land use and climate change. However, a detailed map of global forest gain at fine spatial and temporal resolutions is still missing. To fill this gap, we developed an automatic framework for mapping annual forest gain globally using Landsat time series, the Landsat-based detection of trends in disturbance and recovery (LandTrendr) algorithm, and the Google Earth engine platform. First, stable forest samples collected based on the first all-season sample set and an automated sample migrate method were used to determine annual normalized burn ratio (NBR) thresholds for forest gain detection. Second, with the NBR time series from 1982 to 2021 and LandTrendr algorithm, we produced a dataset of global forest gain year from 1983 to 2021 based on a set of decision rules. Our results reveal that over 60% gains occurred in Russia, Canada, the United States, Indonesia, and China, and approximately half of global forest gain occurred between 2001 and 2010. The forest gain map developed in this study exhibited good consistency with statistical inventories and independent regional and global products. Our dataset can be useful for policy-relevant research on the global carbon cycle, and our method provides an efficient and transferable approach for monitoring other types of land cover dynamics.

the United States, Indonesia, and China, and approximately half of global forest gain occurred between 2001 and 2010. The forest gain map developed in this study exhibited good consistency with statistical inventories and independent regional and global products. Our dataset can be useful for policy-relevant research on the global carbon cycle, and our method provides an efficient and transferable approach for monitoring other types of land cover dynamics.
Index Terms-Change detection, forest disturbance, land cover, Landsat-based detection of trends in disturbance and recovery (LandTrendr).

I. INTRODUCTION
F OREST makes up approximately 31% of the global land area, and plays a significant role in both environmental and social conservations [1], [2]. They provide a diversity of economic benefits and ecosystem services, including the production of wood [3], [4], the storage of carbon [5], [6], and the protection of biodiversity [7], [8]. However, according to statistics produced by the Food and Agriculture Organization (FAO), about 420 million hectares of forests have been lost worldwide since 1990 [9], which contributes to climate change, soil erosion, flood risk, and disease outbreaks [10], [11]. To address this issue, ambitious afforestation and reforestation programs have been implemented in several countries including China [12], Russia [13], and the United States [14]. These programs, alongside the natural expansion or recovery of forests in some regions, have reduced the net rate of global forest loss [9].
Monitoring forest cover change at the globe scale is important in the context of rapid anthropogenic change. Forests are a significant source of carbon storage and sink, and their protection is essential to reduce the impacts of climate change. They also provide livelihoods for over a billion people, and their responsible management is critical for achieving sustainable development goals [15]. Optical remote sensing, particularly the open-accessed Landsat archive, is proving immensely valuable for long-term monitoring of forests. Products, such as the Landsat-based global forest map, which tracks annual losses since 2000 [16], the humid tropics map that tracks changes in the period 1990-2019 [17], and the mangrove forest change map in China during 1985-2015 [18], have been developed. However, these studies have focused more on the detection of forest loss or the forest change monitoring of specific regions or tree species. Currently, there is still a lack of a globally consistent tool for monitoring forest gain.
There are two main approaches to monitor forest cover gain over time. The first approach involves using forest age maps derived from the biophysical characteristics of trees (e.g., tree rings or height), along with regional climate conditions [19], [20]. Based on the field estimates of age, developing statistical or machine learning models were developed for the relationships between forest age and various covariates [21], [22], [23]. However, obtaining accurate estimates of forest age in the field is time-consuming and challenging, and the relationship between forest type and biophysical characteristics is not always accounted for in the modeling among various tree species. The second approach involves tracking long-term changes in land cover [24], where the year of forest gain is defined as the year when a nonforested pixel transitions to forest according to a set of classification rules. However, differences in land cover classification due to temporal inconsistency in image quality can overestimate annual losses and gains of forest cover [25], [26].
In this study, we aim to introduce a novel and automated approach for obtaining global forest gain year maps using the Landsat-based detection of trends in disturbance and recovery (LandTrendr) algorithm [27] and Landsat imagery captured between 1982 and 2021. Considering the vast amount of data processing required, this study was implemented using the Google Earth engine (GEE) platform [28], which has been widely used in the field of remote sensing, including forest monitoring [16], land cover mapping [29], crop yield estimation [30], etc. The spatial and temporal distribution of global forest gain since 1982 is then analyzed, and the dataset generated in this study is validated using statistical inventories and independent global and regional products.

II. METHODS
A new framework was developed in this study for the automatic mapping of spatial and temporal changes in global forest gain using long-term Landsat observations on the GEE platform (see Fig. 1). To achieve this, the study first delineated stable forest samples during the period of 1982-2021. The samples were delineated based on the first all-season sample set (FAST) [31] using an automated sample migrate method [32], [33], and annual normalized burn ratio (NBR) [34] thresholds for forest gain detection were generated based on these samples for different biomes [35]. Then, the study employed the NBR time series and LandTrendr to detect forest gain years at the pixel level, while minimizing false detections of forest gain by using global forest change (GFC) [16], European Space Agency (ESA) WorldCover 2021 [36], and the thresholds obtained in the first step. Finally, the accuracy of the mapping results was evaluated using statistical inventories and previous products.

A. Definition of Forest and Forest Gain
To ensure accuracy and consistency in forest gain monitoring, this study used multiple datasets, including FAST [31], GFC [16], and ESA WorldCover 2021 [36], and thus required harmonization of forest and forest gain definitions. The forest definition used in this study was based on FAST, where forest was defined as having a tree cover of 10% or greater, which was also compatible with the forest definition of ESA WorldCover 2021. To maintain the consistency of forest definition, a 10% threshold for the band of tree canopy cover in GFC was used. In addition, the definition of forest gain followed that of the GFC band of gain, which refers to "a nonforest to forest change entirely within the study period." B. Data and Data Processing 1) Landsat Imagery: In this study, all available Landsat Tier-1 surface reflectance images from 1982 to 2021 were used, including data collected by the thematic mapper (TM), enhanced thematic mapper plus (ETM+), and operational land imager (OLI) sensors. To avoid possible detection errors caused by seasonal disturbance [27], Landsat images used here were collected in the same season (from June to August). The pixels with clouds or clouds shadows were removed using the C function of the mask algorithm [37]. Fig. S1 in the Supplementary Material provides a visual representation of the Landsat imagery available for this study, spanning from 1982 to 2021, and demonstrates the number of available pixels that were used for forest gain detection. Furthermore, a transformation function [38] was used to ensure the consistency of reflectance between different sensors (TM, ETM+, and OLI). After that, NBR [34], which is known to be effective for forest disturbance monitoring [39], [40], [41], [42], was generated. NBR is an index that uses a Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
combination of near-infrared (NIR) and short-wave infrared (SWIR2) bands [see (1)]. Generally, healthy and dense vegetation has high NBR values, and the annual maximum value of NBR was used to detect the year of forest gain, given as follows: To address missing data resulting from the uneven data distribution (e.g., cloudy conditions in tropics meant few images were usable) and composite the continuous annual time series data from 1982 to 2021, a three-year sliding window was implemented in this study [43]. By moving backward from 2021, the null value was augmented with the sliding window by the mean value of the two adjacent years. Furthermore, for 2021, the closest year's value was used for the null value. In cases where the previous year's data were null, the following year's data were used instead.
2) Global Forest and Land Cover Datasets: Ancillary datasets were used to provide global forest and land cover data, including the Hansen maps of GFC [16] and the ESA World-Cover 2021 [36]. The GFC dataset provides detailed information on the area ratio of tree canopy cover for the year 2000, annual forest loss between 2001 and 2021, and total forest gain from 2000 to 2012 with a resolution of 30 m. The ESA WorldCover 2021 product was developed to map global land cover with a resolution of 10 m in 2021 and was resized and reprojected to 30 m for this study. These datasets were combined to produce a current (2021) global forest map, which was used to mask possible falsely detected forest gain.

C. Determining Annual NBR Thresholds for Forest Gain Detection
To identify stable forest areas during the entire period of 1982-2021, we utilized the forest samples in FAST and employed a sample migration method [32], [33]. This method allowed us to define stable forest pixels as those that remained unchanged throughout the study period. Finally, a total of 25 955 samples (pixels) were used to calculate the annual NBR thresholds for forest gain detection as follows [44]: where Thrd y is the threshold for the yth year, and NBR mean and NBR std were the mean and standard deviation of the annual NBR maximum of the stable forest samples in the year, respectively.
To account for the variation in forest growth conditions across different regions of the world, this study utilized the terrestrial and terrestrial-freshwater biomes developed by the International Union for Conservation of Nature (IUCN) [35]. These biomes include a range of regions, such as tropical or subtropical forests, temperate-boreal forests and woodlands, shrublands and shrubby woodlands, savannas and grasslands, deserts and semideserts, polar or alpine (cryogenic), intensive land use, and palustrine wetlands. Using these biomes, region-specific thresholds were calculated for forest gain detection at a regional level.

D. Forest Gain Year Detection Using LandTrendr
The LandTrendr algorithm [27], a spectral-temporal segmentation-based change detection algorithm, was used in this study to estimate forest gain year, implemented on the GEE [45] with parameter settings from a previous forest disturbance monitoring study [27]. In addition, to account for the differences between natural forest restoration and plantation gain, the parameters of time series fitting and segmentation were adjusted with more sensitive settings [43] for the product of spatial database of planted trees [46], which is a global plantation extent dataset developed by national or regional compiling.
As shown in Fig. 2, the LandTrendr algorithm was used for forest gain year estimation. The algorithm fitted the NBR time series and identifies breakpoints, and then temporal segments were sorted by the observed year of the start vertex in the segment (Mag). To detect the year of forest gain, the segment should satisfy certain conditions, given as follows.
2) The fitted NBR value for the start vertex in the segment (Start year ) should be lower than the threshold for the year.
3) The pixel should be classified as forests in the current (2021) forest maps. Then, the first year in the segment in which fitted NBR reached the threshold was preliminarily judged as the forest gain year (Gain year ). To remove false detections caused by missing data and short-term noise, the duration (Dur) and quality assessment (QA = Number of effective observations/Dur) between Start year and Gain year were calculated. Pixels with Dur > 1 (for plantations Dur >= 1) and QA > 0.5 were determined as the final forest gain year. If there were no year in the first segment meeting the rules, the second segment would be used to detect the year of forest gain, and so on. The results were discarded for pixels with no segments that fit the rules.
Moreover, considering the wide application and high mapping accuracy of GFC, the binary band of gain in GFC, representing forest gain during 2000-2012, was used as the baseline for the extent of forest gain between 2000 and 2012. For pixels classified as "gain" in GFC but not detected as forest gain during 2000-2012 in our study, the forest gain year was set as the year when NBR increased the most.

E. Evaluation
To assess the accuracy of the global forest gain year dataset produced in this study, we conducted two types of evaluations: 1) comparison with statistical inventories from the FAO; 2) comparison with forest gain, forest age, and long-term annual land cover products from previous studies (see Table I).
The FAO statistical database (FAOSTAT) was used to obtain information on planted forest area, which is annually updated during 1990-2018 for different countries [47]. Planted forests refer to forests dominated by trees established through planting or artificial seeding, including coppicing from trees that were originally planted or seeded. Planted trees are the main source of forest gain in most countries [9], so we compared our estimates of forest gain against these data. In addition, to include the comparison to natural restoration of forest, we also used the  annual global forest expansion area provided by the global forest resources assessment (GFRA) to evaluate our dataset at the statistical level.
In order to further validate the accuracy of our forest gain mapping results, we compared them with a range of independent regional and global products. Given the variability in spatial and temporal resolutions of these products, we employed various data processing methods and evaluation strategies to accurately compare the results. Our comparison included forest gain maps from GFC [16], which provides information on global forest gain trends from 2000 to 2012, and forest gain maps from North American land change monitoring system (NALCMS) [48], which focus on forest gain regions in North America during 2010-2015. We used these products to assess the agreement of the spatial distribution of forest gain with our dataset for the same period.
For European Space Agency Climate Change Initiative Land Cover Dataset (ESA-CCI-LC) [51], annual global land cover (AGLC) [52], China land cover dataset (CLCD) [50], and landscape change monitoring system dataset (LCMS) [49], the forest gain year was defined as the year when the gain of forests was observed in the land cover time series, enabling the generation of annual forest gain maps. Besides, secondary forest age for Brazil (SFAB) [24] also provided annual forest gain maps, which makes accurate evaluation possible. To compare the year of forest gain in our dataset with those in these products, we employed a stratified sampling strategy to collect ∼ 2500 validation samples for each product. This ensured a representative sample from various forest gain years, and the spatial distribution of the samples was visualized in Fig. S2 in the Supplementary Material. We calculated annual F1 scores to quantitatively evaluate the accuracy of the forest gain maps, which take into account both precision and recall and provide a robust assessment of the models' monitoring power. Besides, the overall accuracy (OA) of each product was obtained using an error matrix-based calibration estimator with a 95% confidence interval [53].

A. Global Forest Gain Patterns
Annual global forest gain maps from 1983 to 2021 were generated and shown in Fig. 3. We further calculated the area of forest gain for each continent and each country, which are tabulated in Table II and Table S1 in the Supplementary Material, respectively. The results suggest that global forest gain was concentrated mainly between 2001 and 2010, accounting for approximately half of all forest gain area. The most substantial gains occurred in the northern hemisphere, with Russia, Canada,   Fig. 4. Global forest gain intensity obtained with 10-km grids. and the United States being the largest contributors. In addition, Indonesia and China also had significant areas of forest gain. In total, the top five countries accounted for over 60% of the global forest gain area, highlighting their importance in global forest management.
Furthermore, to analyze patterns of forest gain on the local scale, the 10-km grids were utilized to obtain forest gain intensity, which was defined as forest gain area divide forest area of 2021. As shown in Fig. 4, we can observe several hotspots with high forest density and high intensity of forest gain as follows. 1) The analysis of forest gain revealed areas of high intensity in North America, particularly in the southeastern United States. This region has undergone extensive deforestation over the last few centuries, and the forest gain observed in this area may be attributed to afforestation efforts and the regrowth of natural forests. However, the high-intensity forest gain found in this region may also be caused by the harvesting of wood products, which has been a major industry in the southern United States for several decades [54]. 2) Our study showed that forest gain in South America is concentrated mainly in Chile and Brazil. The regions with the largest increase in Chile are the southern regions of Araucanía and Los Ríos, which are dominated by plantation forestry. For Brazil, forest gains are mainly concentrated in the Amazon region and the Serra do Mar in the southeast. 3) European countries with the highest forest gain intensity are Spain, Portugal, France, U.K., and Hungary, which are mostly due to afforestation programs and harvested forest [55], [56].

4) High forest gain intensity can be found in northwestern
Africa near the Mediterranean Sea, which may be attributed to the successful implementation of biosphere reserves and forest development projects in the region [57]. 5) China, India, and Indonesia are major contributors to forest gain in Asia. The revegetation policies of the Grainfor-Green Project in China [13], [22] and the Green India Mission [58] help to increase the forest cover. In Indonesia, the government's efforts to reduce deforestation and promote reforestation have also led to an increase in forest gain [59]. 6) For Australia and Oceania, the areas of high forest gain intensity are mainly located in Australia, which could be attributed to afforestation and reforestation policies, as well as natural regeneration after disturbances, such as fires and logging [60].

B. Evaluation Results
First, the extent of global forest gain was evaluated with planted forest area from FAOSTAT [47], forest expansion area from GFRA [9], and forest gain area from ESA-CCI-LC [51] (see Fig. 5). The overall trends of forest gain in our dataset were Fig. 6. Comparison of forest gain area in different countries with FAO-STAT [47] and GFRA [9]. Each point in the figure represents a country. The red solid and black dotted lines indicate the linear regression line and the 1:1 ratio line, respectively. The horizontal axis represents the forest gain area in our dataset. The vertical axis represents (a) the planted forest area in FAOSTAT and (b) the forest expansion area in GFRA. Fig. 7. Spatial agreement between the detected forest gain and various forest gain, forest age, and long-term annual land cover products, including LCMS in the United States [49], CLCD in China [50], SFAB in Brazil [24], and ESA-CCI-LC [51] and AGLC [52] global maps. similar to the statistics and ESA-CCI-LC product. However, it was noted that the forest gain prior to 2000 may have been underestimated due to the unavailability of Landsat imagery at that time. Furthermore, the forest gain area of each country as obtained from the dataset was found to be in good agreement with the data from FAOSTAT and GFRA, as depicted in Fig.  6. However, the forest gain area in our dataset tended to be underestimated, as some newly planted trees may not have been detected using the method employed.
Second, the comparison between the detected forest gain and various forest gain, forest age, and long-term annual land cover products was carried out in a spatial manner. The level of agreement between the two products was calculated as the overlapping area divided by the total area of both products (see Fig. 7). It is essential to consider the time coverage of the different products when comparing them. In the case of comparing GFC (2001-2012) and LCMS , it is necessary to consider only the forest gain data for the period of 2001-2012 in both sources to ensure a fair comparison. The Fig. 8. Annual F1 scores with different tolerances and OA with ±3-year tolerance obtained from forest gain, forest age, and long-term annual land cover products, including LCMS in the United States [49], CLCD in China [50], SFAB in Brazil [24], and ESA-CCI-LC (ESA) [51] and AGLC [52] global maps. analysis aimed to determine the level of agreement between the different products in terms of their location of the detected forest gain. The results of the comparison showed that our forest gain mapping results have the highest agreement with GFC, LCMS in United States, and CLCD in China, and the agreement with NALCMS in North America and SFAB in Brazil was second only to LCMS and GFC, respectively. In addition, the agreement with ESA-CCI-LC and AGLC was also ranked highly, placing our results in the top three in terms of consistency with these products, further reinforcing the validity of our mapping framework. These results, thus, demonstrate the potential of our mapping method to provide relevant and valuable information on the state and evolution of forest gain across the globe.
Third, to obtain quantitative accuracy evaluation results, we compared our dataset with the global forest gain maps developed with ESA-CCI-LC [51] and AGLC [52] global maps, as well as three independent regional 30 m products: LCMS in the United States [49], CLCD in China [50], and SFAB in Brazil [24]. Specifically, annual F1 scores with different tolerances were obtained from these forest gain, forest age, and long-term annual land cover products (see Fig. 8). The results indicate that the F1 score for each year was relatively high (∼ 0.60) when the deviation was limited to within ±3 years [see Fig. 8(a)-(e)]. Similarly, the OA reached ∼0.60-0.7 with the tolerance of ±3 years [see Fig. 8(f)]. This suggests that our approach is effective in identifying forest gain events with a relatively high degree of consistency with other products by allowing for some degree of temporal flexibility. In order to provide a more comprehensive understanding of the accuracy and reliability of our results, we have conducted a regional comparison with annual global gain maps obtained from previous research works. The comparison results are presented in Fig. 9. By examining the similarities and differences between our results and other global or regional products, we can see the differences and similarities between the forest gain distribution in each zoomed region.
Finally, we summarized the abovementioned evaluation results obtained from independent forest gain, forest age, and longterm annual land cover products (see Table III). In comparison to NALCMS [48], our dataset shows a higher agreement with GFC [16] in terms of the spatial distribution of forest gain. In addition, our dataset exhibits good consistency with four 30-m forest gain year maps from LCMS in the United States [49], CLCD in China [50], SFAB in Brazil [24], and the global AGLC map [52]. The mean values of annual F1 scores for these maps were 0.52, 0.58, 0.63, and 0.59, respectively, when allowing for a deviation of ±3 years. Moreover, our dataset demonstrates satisfactory agreement with ESA-CCI-LC [51], despite the significant difference in spatial resolution, with an average annual F1 score of 0.58 when the tolerance is set to ±3 years. These results suggest that our approach is effective in identifying forest gain events and can provide reliable and accurate information to support forest management and conservation efforts at various spatial scales.

C. Uncertainty of Global Forest Gain Year Data
The global forest gain year data developed in this study is the first annual forest gain product from 1983 to 2021 at 30 m resolution. Nevertheless, there are still some uncertainties that need to be acknowledged for the possible future applications. First, this study utilized a combination of GFC [16] and ESA WorldCover 2021 [36] to create a current (as of 2021) global forest map. This allowed the detection method to filter out NBR growth that did not result from forest gain, such as cropland expansion, and to eliminate data noise from the monitoring. However, the use of these products also introduced uncertainties due to their limitations, and users should be mindful of the possible risks when interpreting the results.
Second, the forest gain area exhibited a sudden increase in 2001 and a decrease in 2013, as shown in Fig. 5(a). However, these observations should be considered in the context of the lack of Landsat data before 2000 and the discontinuities in Landsat observations in 2013. These factors may result in an underestimation of forest gain before 2000 and in 2013. In addition, Landsat data with uneven spatiotemporal distributions and images with bands in Landsat ETM+ images may introduce noise and discontinuities into the dataset.
Third, in order to obtain NBR thresholds that can be applied to different regions of the world, IUCN biomes were used in Fig. 9. Regional comparison of our results with ESA-CCI-LC [51], AGLC [52], and regional products of LCMS in the United States [49], CLCD in China [50], and SFAB in Brazil [24]. this study to calculate thresholds at the regional level. This approach helps to avoid some of the monitoring errors associated with a single global threshold, but it may also result in unreasonable discontinuities at the boundaries between different biomes.
Fourth, it is important to note that the parameter settings for LandTrendr can greatly impact the final forest gain maps. Different settings can produce vastly different results, and the settings used in this study were based on previous forest disturbance monitoring research and visual inspection. However, the growth rate of forests varies globally, and using the same parameters for all regions may lead to errors. In areas with rapid growth, more sensitive parameters may be needed, while adjustments should also account for overestimation due to data noise. Partition modeling may provide a solution, as was done for plantations in this study, but this can also introduce possible discontinuities. Therefore, the basis for partitioning and the coherence of the global mapping results still need to be carefully considered. In general, the parameters used in this study may result in underestimated of forest gain area in some regions, but are reasonable on a global scale.

IV. CONCLUSION
In this study, we developed the first global forest gain year maps from 1983 to 2021 at 30 m resolution using an efficient and automatic framework and long-term Landsat observations. The dataset exhibits good consistency with statistical inventories and independent regional and global products, and the high spatial and temporal resolutions enable detailed information on global forest gain. The dataset has the potential to be used in a wide range of applications, including afforestation and reforestation policy making, yield prediction from tree crops, and carbon cycle research. Furthermore, the approach used in this study is portable and can be applied to the monitoring of other land cover classes with appropriate spectral index and parameter settings, making it a valuable tool for studying and monitoring changes in land cover and land use over time.

DATA AVAILABILITY
The Annual Global Forest Gain (AGFG) dataset presented in this study can be obtained in GEE as an Image: https://code.earthengine.google.com/?asset=projects/eetufangbobo/assets/ForestGain/forestGainFinal.