Exploring Photon-Counting Laser Altimeter ICESat-2 in Retrieving LAI and Correcting Clumping Effect

The Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) employs a unique multibeam photon counting approach to acquire a near-continuously sampled profile and provides more precise technology for mapping the leaf area index (LAI) at the global scale. The inversion accuracy of LAI is affected by the clumping effect, which has been an open question for spaceborne laser scanning (SLS). Here, we present a segmented method based on the path length distribution model to calculate the clumping-corrected LAI independently using ICESat-2 data. The results showed that the LAI derived by the proposed method with a 200 m segment was consistent with the airborne laser scanning (ALS)-derived LAI, with a root mean squared error (RMSE) of 0.37. A satisfactory agreement (RMSE $=1.03$ ) was also shown between moderate resolution imaging spectroradiometer (MODIS) LAI and ICESat-2 LAI. Moreover, the LAI derived by the proposed method was on average 31.72% higher than the LAIe derived by Beer’s law, which indicated that the proposed method achieved the purpose of correcting the clumping effect. The gap probability was calculated by the 200 m moving window and the path length distribution was obtained by the 1 m moving window as the model input had the highest accuracy. In addition, the limitation of the point cloud data and the time lag of ICESat-2 acquisitions and ALS observations may affect the inversion accuracy of LAI. This study proposed a feasible way to correct the clumping effect and invert LAI independently using ICESat-2 data, which has the potential to characterize vegetation structure precisely at regional and global scales.


I. INTRODUCTION
L EAF area index (LAI), commonly defined as one-half of the total leaf area projected per unit of the horizontal ground area [1], is a critical canopy structure parameter for modeling terrestrial carbon cycle, surface energy balance, and rainfall interception [2], [3], [4].
Techniques utilizing remote sensing to retrieve LAI can be divided into three categories according to the platform: 1) ground-based measurement; 2) airborne observation; and 3) satellite observation [5], [6], [7], [8]. As the most accurate measurement of forest LAI, ground-based measurement is usually used to verify the LAI inversion of higher platforms (e.g., airborne or spaceborne) [9], [10], [11], [12]. However, these methods are limited by labor and time when observing in a large area. Airborne observation can implement over larger areas than ground measurement, and it can be used to validate the LAI derived from satellites [4]. However, aircraft observation is too expensive to acquire the LAI with frequent repetition in a large area. Satellite observation, characterized by a short repetition period and large-scale observation, provides a key technical means for retrieving LAI over a continental or global scale [13], [14].
The spaceborne instruments used to retrieve LAI mainly include passive optical sensors and active light detection and ranging (LiDAR) equipment [15], [16]. Currently, the passive optical inversion method has achieved remarkable achievements, and it is also a common method for estimating global vegetation LAI [17], [18], [19], [20]. Combined with global land surface reflectance products, it provides an effective way to estimate LAI in large regions [21]. The passive optical satellite-borne derived LAI is based on empirical relationships with vegetation index [22], [23], [24] or physical radiative transfer model [25], [26], [27]. Although these studies have been extremely significant, there are some limitations because of the limited penetrability of sunlight in dense forests [4], [14], [28]. In contrast, spaceborne laser scanning (SLS) has a good penetration ability to provide vertical canopy distribution and characterize the interior canopy information [16], [29], [30]. Therefore, it is significant to develop a method of deriving LAI from SLS. SLS sensors mainly include: 1) geoscience laser altimeter system (GLAS), onboard Ice, Cloud, and land Elevation  [31]; 2) Advanced Topographic Laser Altimeter System (ATLAS), onboard ICESat-2 [32]; 3) Global Ecosystem Dynamics Investigation (GEDI), onboard International Space Station (ISS) [33]; and 4) Gaofen-7. GLAS, GEDI, and Gaofen-7 are fullwaveform recordings, while ATLAS is a photon-counting laser altimeter. The full-waveform spaceborne LiDAR has more capability to penetrate dense canopies [34], but its disadvantages are large footprint diameter and large sampling interval along the flight track. Compared to full-waveform LiDAR, photon-counting LiDAR has a small footprint and a high laser pulse repetition rate, which produces a relatively high point density and can acquire a near-continuously sampled profile alongside the flight direction [35]. Therefore, ICESat-2 is suitable for portraying fine forest structure parameters. Many studies have used ICESat-2 data to acquire forest canopy height [35], [36] and aboveground biomass [34], [37], [38], [39], [40] and have achieved remarkable results. However, the application of ICESat-2 geolocated photon data for correcting the clumping effect and estimating clumping-corrected LAI has not yet been demonstrated in the literature. In view of the novelty of ICESat-2 photon counting data, it is necessary to conduct research to understand its uses and limitations in characterizing forest structure parameters.
Currently, gap fraction-based methods are the mainstream method of SLS inversion of LAI [41], [42], [43], which invert the effective LAI (LAIe), not the actual LAI. The LAIe is defined as the LAI value that assumes the foliages are simple random distribution in the ideal homogeneous canopy, which has the identical gap probability in the clumped canopy. However, in natural forest canopy scenes, the aggregation of leaves will increase the gap probability so that the LAIe will be smaller than the true LAI. The clumping effect is the key factor that affects the accurate LAI inversion of SLS. There have been studies using clumping index (CI) products [44], [45] or combined optical data [14] to correct the clumping effect. The method of using the CI products will cause certain errors in using a fixed value to correct the clumping effect because the CI is a time-varying quantity [45], [46]. The method of combining SLS with optical data (Landsat-5) to correct the clumping effect has achieved the correction of the clumping effect between the canopy, but the correction of the clumping effect within the canopy needs to be improved, and this method also requires optical images as auxiliary data [14]. Therefore, it is an open issue to develop a general method based on ICESat-2 to correct the clumping effect and invert the true LAI.
In the past few decades, some scholars have proposed bit correction algorithms for the clumping effect based on the fine gap distribution information obtained by optical instruments [9]. However, the fine gap distribution information is unavailable from SLS because its laser spot size reaches tens of meters, which is much larger than the gap size between the canopy and within the canopy. Hu et al. [46] employed path length distribution, which describes the canopy thickness distribution and can be acquired from 3-D point clouds of airborne laser scanning (ALS) to model and correct the clumping effect. The use of path length distribution provides a possibility to correct the clumping effect and retrieve true LAI using large-footprint laser scanners that cannot acquire fine gap distribution information. Jiang et al. [47] have used the path length distribution method to achieve the correction of the clumping effect using the GEDI; however, their results were based on virtual scenes, not real forests. Therefore, the clumping-corrected LAI from SLS based on the real forests has not yet been reported. The ICESat-2, which has a 0.7 m interval of footprint in the along-track [48], [49], [50], also has the potential to provide 3-D information of the canopy for requiring path length distribution and modeling the clumping effect, although its spatial organization is significantly different from that of ALS. ALS could repeatedly sample to achieve uniform coverage of the point cloud in the study area, but the point cloud data obtained by SLS are spatially discontinuous. The path length distribution method suitable for ALS research cannot be directly applied to the research of LAI retrieval by ICESat-2. Therefore, it is of great significance to develop an improved path length distribution method to retrieve the clumping-corrected LAI based on ICESat-2. In this context, the study mainly has the following three goals: 1) to investigate the use of ICESat-2 point clouds for retrieving laser metrics and path length distribution over forests; 2) to propose a method to correct the clumping effect and retrieve the LAI using path length distribution method in ICESat-2 segment; and 3) to validate the ICESat-2 LAI with the ALS LAI.

A. Study Area
The study was conducted in the Genhe forest reserve, which is located in Genhe, Hunbulir, Inner Mongolia, China, and between the latitudes 50 • 20 N and 52 • 30 N, and between the longitudes 120 • 12 E and 122 • 55 E [see Fig. 1(a)-(c)]. Topography ranges from 753 to 1158 m, with an average elevation of 910 m, and hilly topography (slopes of less than 15 • ) is the main geomorphic type, accounting for 80% of the total area [51]. The climate in Genhe has a monsoon-influenced subarctic climate and a cold and humid temperate forest climate, which renders Genhe the coldest city in Inner Mongolia. The average annual precipitation is approximately 464 mm, and the average annual temperature is approximately −5.3 • C [52]. The forest is mainly composed of Larix gmelinii (Rupr.) Rupr., Betula platyphylla Suk, and Pinus sylvestris var. mongolica Litv. [53].

B. ICESat-2 Dataset
As the successor of ICESat, ICESat-2 was launched in September 2018 and equipped with an ATLAS instrument, which had a photon-counting LiDAR to make measurements. The ATLAS instrument transmits laser pulses with a wavelength of 532 nm at 10 kHz every 70 cm along ground tracks, and the laser pulse is split to generate six individual beams, arranged in three pairs [54]. The beam pairs have a spacing of 3.3 km in the across-track direction, and each pair includes a strong beam and a weak beam and the two beams have a distance of 90 m [55]. The diameter of the laser footprint is approximately 17 m [54]. The ICESat-2 data offer new opportunities for the fine sampling of global terrain and vegetation as well as simulating the carbon stocks.
The products of ICESat-2 are divided into three levels. The level-2 product of global geolocated photons (ATL03) [48] was used in this study to acquire fine canopy height distribution. ATL03 data were acquired from National Snow and Ice Data Center (https://nsidc.org) in HDF-5, which recorded the longitude, latitude, and absolute height above the World Geodetic System 1984 (WGS84) for each photon event [54]. In the study area, we obtained the point clouds data of release 005 on 25 May 2019 and 21 August 2020. The strong beam ground track (1L and 2L) in 2019 and 1L in 2020 (see Fig. 1) were chosen to retrieve the LAI.

C. Airborne LiDAR Dataset
Airborne LiDAR data were acquired using a Leica ALS60 system onboard a Yun-5 aircraft in the summer of 2012. The Leica ALS60 operated at an altitude of 1800 m with a 1064 nm wavelength. The grid LAI derived from airborne LiDAR data based on the path length distribution method was used to validate the LAI derived from ICESat-2, which has been validated by the field measurements and has a spatial resolution of 5 × 5 m [46]. The overlapping part of ICESat-2 and airborne LiDAR was used to validate. Although ICESat-2 and airborne LiDAR have a time lag, the forest in Genhe is less disturbed by humans and only grows under natural conditions, so the tree changes relatively little.

D. Land Cover Dataset
The moderate resolution imaging spectroradiometer (MODIS) land cover type (MCD12Q1) and the global 30 m land-cover classification with a fine classification system product (GLC_FCS30) were used to compare the vegetation type under the same MODIS pixel in 2020. The MCD12Q1 product was acquired from the NASA Earth data website (https://www.earthdata.nasa.gov), with a spatial resolution of 500 m at yearly intervals. The GLC_FCS30 product was acquired from CASEarth (https://data.casearth.cn), with a spatial resolution of 30 m and a temporal resolution of five-year. Two land cover products were collected in 2020 to ensure temporal consistency with the ICESat-2 data.

III. METHODS
An improved path length distribution model was developed by combining the ICESat-2 photon-counting data to correct the clumping effect and derive the true LAI. The methodology consisted of preprocessing ATL03 photon data, acquiring path length distribution, calculating the gap probability from LiDAR metrics, deriving the LAI for each segment, and comparing the LAI derived by ICESat-2 with the ALS-derived LAI. The workflow for estimating LAI with ICESat-2 data is provided in Fig. 2.

A. ICESat-2 Photons Preprocessing
We used the LAStools [56] for preprocessing ATL03 point cloud data. First, point cloud information (longitude, latitude, and height) was extracted through the txt2las module [see Fig. 2(a)]. Second, we conducted photon quality control by using the ICESat-2 ALT03 signal photon label (photon signal confidence >= 2) in order to remove the noise signals [see Fig. 2(b)]. The confidence level associated with each photon event is selected as a signal, per surface type (0-noise. 1-added to allow for buffer but the algorithm classifies as background, 2-low, 3-medium, 4-high) [54]. Third, the ground points were extracted and the relative height above the ground for each point was calculated through the lasground module [see Fig. 2(c)]. Finally, the preprocessed height normalized point clouds data were obtained for future processing.

B. Segmentation
The segment is defined as an along-track span (or aggregation) of recorded photon events from a single ground track [54]. In this study, segment sizes of 50, 100, and 200 m were used to split the ground trajectory, and then the gap probability and path length distribution were acquired for each segment (see Fig. 2). Finally, we compare the effects of different segmentation scales on the LAI inversion accuracy.

C. Path Length Distribution From ICESat-2 Point Clouds
The path length distribution represented the thickness distribution of the canopy filled by the leaves, and the thickness distribution represented the spatial distribution of leaves. Assuming that the height of the top of the canopy can represent the thickness of the canopy, the path length distribution can be obtained by counting the height of the top surface of the canopy. On the basis of the normalized point cloud data, the moving window of 1 m size was used to search for the maximum height in the window to obtain the top of the canopy [see Fig. 2(d)]. Then, the path length distribution [see Fig. 2(e)] was acquired from the top of the canopy in each segment based on statistics. Moreover, moving windows with sizes of 2 and 5 m were tested.

D. Gap Probability From LiDAR Metrics
The LiDAR metrics were calculated for the LAI estimation in each segment through the number-based ratio [29], [46] LPM = N ground /N all (1) where N ground is the number of ground points, N all is the number of all points in the segment. The points were classified as canopy and ground return with a height threshold of 2 m to exclude low grass and retain relatively high underlying vegetation.

E. Path Length Distribution Model
The path length distribution and gap probability of each segment obtained above were input to the path length distribution model [9], [57] where lr is the relative path length normalized to [0, 1], l is the absolute path length at a location of a transect, and l max is the maximum path length along the transect.p lr (lr) is the frequency of lr falling within the infinitesimal interval [lr, lr + d(lr)].
The LAI of the path length distribution model can be expressed as LAI = 1 0 (FAVD · l max )·lr· p lr (lr)d(lr) (4) where FAVD is the leaf area volume density. The intermediate variables FAVD · l max can be derived by solving where P is the gap probability of each segment, G is the leaf projection coefficient which can be calculated from leaf angle distribution, and it is set as 0.5 for canopies with a spherical distribution of leaf angles.

F. Clumping
The CI describes the level of leaf grouping within the canopy relative to the random distribution and is a key structural parameter of the plant canopy [58]. CI was quantified by a plant dispersion parameter at a given zenith angle θ P(θ ) = e −G(θ )L/cosθ (6) where L is the LAI, is the CI, and its value is usually less than 1.0 for canopy. Chen et al. [59] defined an average CI as CI = LAIe/LAI (7) where LAIe was inverted by Beer's law LAI and LAIe were calculated for each segment, and a single CI was calculated for the segment.

A. LAIe Retrieved With ICESat-2
The LAIe was obtained through Beer's law and gap probability. The gap probability was acquired through LPM, where a 2 m height threshold was chosen to distinguish ground and non-ground points. The calculation of LPM was based on a 200 m size segment. The gap probability and the LAIe also have a fixed 200 m segment. The gap probability and LAIe derived from ICESat-2 were validated through ALSderived gap probability and LAIe in overlapping regions of ALS and ICESat-2. The results showed that the gap probability retrieved by ICESat-2 performed a good correlation against the ALS-derived gap probability [see Fig. 3(a)], and the root mean squared error (RMSE) was 0.06, which indicated that the number-based ratio [29] was suitable to calculate the gap probability from ICESat-2 point clouds. Moreover, the LAIe retrieved by ICESat-2 [see Fig. 3(b)] showed good consistency with the ALS-derived LAIe (RMSE = 0.29, R = 0.81). Meanwhile, the scatter points were roughly distributed on both sides of the 1:1 line.

B. LAI Retrieved With ICESat-2
The LAI retrieved by ICESat-2 with a 200 m segment was validated by ALS-derived LAI (see Fig. 4). The results showed that LAI retrieved by ICESat-2 performed a strong correlation against the ALS-derived LAI, with an RMSE of 0.37. This indicated that the ICESat-2 photons had the potential to accurately portray the forest canopy structure. Compared to the LAIe derived by ICESat-2, the LAI was 31.72% higher than the LAIe, which showed that the segment method based on the path length distribution model has achieved correcting the clustering effect.
In addition to validation with airborne data, we also compared the ICESat-2 LAI with the MODIS LAI (see Fig. 5). Due to the scale difference between the ICESat-2 and MODIS, we aggregated the ICESat-2 LAI to the pixel scale of MODIS,  matching MODIS pixels. There is a satisfactory agreement between ICESat-2 LAI and MODIS LAI (see Fig. 5), with an RMSE of 1.03. Most of the relative error (RE) values were lower than 20%, accounting for 48% of all values. These results also indicated that the proposed method could reliably derive LAI with ICESat-2 photons. Moreover, MODIS LAI was lower than the ICESat-2 LAI for LAI less than about 1.0.

C. LAI Distribution Along the ICESat-2 Track
The LAIe and LAI along the ICESat-2 track in the study area are shown in Fig. 6

A. LAI at Different Segment Sizes
The LAIs of different segment sizes (50, 100, 150, and 200 m) were compared with ALS-derived LAI with a spatial resolution of 5 m and MODIS LAI. The ICESat-2 LAI was highly correlated with the ALS-derived LAI at the 200-m segment, followed by 100 and 150 m (see Table I). The consistency showed that the 200-m segment photons already  had enough photons and path length distribution information for statistics in each segment. To further compare the accuracy of ICESat-2 LAI at different segment scales, we also calculated the RMSE and RE between ICESat-2 LAI and MODIS LAI. The RMSEs of 50, 100, and 200 m segment LAI were not much different, with the value of 0.98, 1.0, and 1.03, respectively. The RE between ICESat-2 LAI and MODIS LAI below 20% was most at the segmentation of 200 m, followed by 50 and 150 m (see Fig. 7). These results also indicated that the 200 m segment was suitable for ICESat-2 photons to invert LAI and correct the clumping effect. Some segments that had worse quality points were deleted by visual interpretation when comparing the ICESat-2 LAI with ALS LAI and MODIS LAI. The 200 m segments chosen to compare with ALS or MODIS must have higher point cloud density, like Fig. 8(b), which could capture the top of the canopy profile. The segments which had spare point cloud density [see Fig. 8(a)] or most of the points marked as ground [see Fig. 8(c)] would be deleted.

B. Comparison of ICESat-2 LAI and MODIS LAI
We aggregated ICESat-2 LAI to the scale of MODIS LAI and then made a comparative analysis. Although the results showed satisfactory accuracy, ICESat-2 LAI was significantly higher than MODIS LAI when MODIS LAI was less than 1.0. We found these pixels to compare the real landscape [see Fig. 9(a)], MODIS land cover type [see Fig. 9(b)], and GLC_FCS30 land cover type [see Fig. 9(c)]. According to the Sentinel-2 image, these pixels may be covered by forests and low shrub vegetation and they were inhomogeneous. According to the MODIS land cover product, these pixels passed by  the ICESat-2 track were woody savanna. In the GLC_FCS30 product, these pixels were covered by deciduous broadleaved forest and deciduous needle-leaved forest, which were more consistent with our field investigations than MODIS. This result showed that the MODIS LAI values of these pixels may be relatively lower. Vegetation type was an important input parameter of the algorithm of MODIS LAI, which provided the vegetation structural information [45]. Therefore, the MODIS LAI algorithm was sensitive to the vegetation types and misclassification can induce uncertainty for LAI estimation [15]. In contrast, our method could acquire a relatively reliable LAI value by correcting the clumping effect. The method was less affected by vegetation type because of the 3-D structure information as the main input parameter.

C. Impact of Moving Window Sizes on LAI
The path length distribution and gap probability can be obtained through the moving window of different sizes on the basis of the 200 m segment. In this study, we compared the impact of moving window sizes of 1, 2, and 5 m on the acquisition of path length distribution, and the inverted LAIs were compared with the ALS LAI (see Fig. 10). We found that the 1 m moving window had the highest accuracy (RMSE = 0.37, MRE = 0.15), followed by 2 m (RMSE = 0.51, MRE = 0.20), and the lowest accuracy was 5 m (RMSE = 0.63, MRE = 0.26). The correction of the clumping effect in the 1 m window was also the most obvious, and the LAI value was obviously higher, which indicated that the top of the canopy with the fine resolution is more conducive to the acquisition of path length distribution and the correction of the clumping effect. Although ICESat-2 had achieved satisfactory performance, ICESat-2 LAI was still lower than ALS LAI. The reason may be that the ICESat-2 had a larger sampling interval than ALS and the variation of canopy heights was underestimated [60], [61].
As for the gap probability, we assigned the same weight value to different windows in the segment, which could avoid the situation that the gap probability was too high or too low due to uneven point distribution. This study tested the window sizes of 5, 10, 20, 50, 100, and 200 m to calculate the gap probability, LAIe, and LAI based on the 200 m segment and compared them with the ALS. The ICESat-2 gap probability calculated by the 200 m moving window was in best agreement with the ALS-derived gap probability [see Fig. 11(a)], with a standard deviation (STD) of 0.09 and a root mean square deviation (RMSD) of 0.06, followed by 5 m. Moving windows of 10, 20, 50, and 100 m were not much different, with the RMSD of 0.06, 0.07, 0.07, and 0.06, respectively. These results indicated that the gap probability of ICESat-2 based on a 200 m moving window was reasonable. The Taylor diagram of the LAIe [see Fig. 11(b)] showed that the LAIe and the gap probability were consistent, and the LAIe calculated by the 200 m moving window was the best agreement compared with the airborne (STD = 0.33, RMSD = 0.20). From Fig. 11(c), the LAI inverted by the gap probability, which was calculated by a moving window of 5 m, was in good agreement with the ALS LAI. In summary, the LAI retrieved based on the gap probability obtained from a moving window of 200 m was consistent with the LAI retrieved by the airborne. In addition, it has to be acknowledged that there are relatively few verification points for comparison between ALS and ICESat-2 because of the limitation of ALS data coverage and the discontinuity of ICESat-2 data. In future studies, more airborne and ground observation data should be collected with dense coverage of the ICESat-2 track.

VI. CONCLUSION
In this study, we presented a segmented method based on the path length distribution model to retrieve the clumping-corrected LAI using ICESat-2 laser altimetry data. The ICESat-2 photon point clouds data were segmented with a specific length, then the gap probability from LPM and path length distribution from the height of the top of the canopy were acquired to retrieve the CI and LAI at the segment scale. The gap probability was calculated based on the 200 m segment and path length distribution obtained based on the 1 m moving window as the model input had the highest accuracy in LAI estimation through a direct comparison between ICESat-2 LAI and ALS LAI. The proposed method based on the path length distribution model has the ability to characterize the clumping-corrected LAI using ICESat-2 laser altimetry data, with an RMSE of 0.37. Comparing the inverted LAI segments with different sizes, it was found that the inverted LAI with a 200 m segment was preferred for the ICESat-2 laser altimetry data.
The limitation of the point cloud data, the time lag of ICESat-2 acquisitions, and ALS observations may affect the accuracy of LAI inversion. Future studies focusing on the quality check of ICESat-2 point cloud data in larger regions will help improve the LAI estimation using ICESat-2. In general, this study paved the way for the use of ICESat-2 data alone to invert LAI and correct the clumping effect, and the segment method based on path length distribution is worth further testing and application in larger areas.