Improving GEDI Footprint Geolocation Using a High-Resolution Digital Elevation Model

Global Ecosystem Dynamics Investigation (GEDI) is a lidar system on-board the International Space Station designed to study forest ecosystems. However, GEDI footprint low accuracy geolocation is a major impediment to the optimal benefit of the data. We thus proposed a geolocation correction method, GeoGEDI, only based on high-resolution digital elevation models (DEMs) and GEDI derived ground elevations. For each footprint, an error map between GEDI ground estimates and reference DEM was computed, and a flow accumulation algorithm was used to retrieve the optimal footprint position. GeoGEDI was tested on 150 000 footprints in Landes and Vosges, two French forests with various stands and topographic conditions. It was applied to GEDI versions 1 (v1) and 2 (v2), by either a single or four full-power laser beam tracks. GeoGEDI output accuracy was evaluated by analyzing shift distributions and comparing GEDI ground elevations and surface heights to reference data. GeoGEDI corrections were greater for v1 than for v2 and agreed with errors published by NASA. Within forests, GeoGEDI improved the root mean square error (RMSE) of ground elevation in Landes by 26.8% (0.34 m) and by 13.3% (0.14 m) for v1 and v2, respectively. For Vosges, ground elevation RMSE improved by 59.6% (3.82 m) and 36.2% (1.41 m), for v1 and v2, respectively. Regarding surface heights, except for v2 in Landes, where insufficient variations in topography combined to GEDI ground detection issues might have penalized the adjustment, GeoGEDI improved GEDI estimates. Using GeoGEDI showed efficient to improve positioning bias and precision.

Improving GEDI Footprint Geolocation Using a vegetation structure [1]. Launched by NASA in 2018, GEDI is a high-resolution laser system installed onboard the International Space Station (ISS) [2]. Since March 2019, GEDI has been acquiring high-quality 3-D observations over noncontiguous 25 m circular footprints on the ground, between 51.6°North and South latitudes, which have proven highly relevant to the study in forest ecosystems on a global scale [2], [3].
GEDI footprint geolocations are derived from GEDI's own inertial measurement unit, global positioning systems, and star tracker sensors onboard the ISS [2], [4], [5]. However, the ISS's low orbit, size, and shape result in increased mechanical vibrations and greater variations in orientation and altitude than traditional Earth Observation satellites [6]. Consequently, the horizontal position precision of GEDI footprints was expected at 10 m after calibration [2]. For GEDI products' first version (v1), released before in-flight calibration, the mean 1 σ horizontal geolocation error reached 23.8 m. After a calibration process accounting for geolocation biases, a second data (v2) version was released in April 2021 with a positioning error estimated at 10.2 m, with final targeted accuracy at 8 m [4], [7]. Assuming as in [8] that GEDI geolocation errors follow a normal distribution N (μ = 0 m, σ = 10 m), 68.3%, 78.9%, and 95.4% of the footprints would have a horizontal location error within 10, 12.5, and 20 m, respectively. Owing to footprint diameter on the ground (i.e., 25 m), more than 20% of footprints overlap by less than 50% with the expected footprint. This hampers the comparison and combination between GEDI data and other georeferenced data, such as field measurements and continuous remote sensing data, and therefore GEDI products' qualification and the development of models to predict vegetation attributes from GEDI data [1], [9].
Recent studies assessed GEDI data quality to estimate ground elevation, canopy height, and aboveground biomass (AGB) through comparison with aerial lidar system (ALS) data [10], [11], [12], [13]. GEDI was found to provide accurate ground elevation and canopy top heights measurements, although errors can reach up to several meters [14], [15], [16], [17], [18]. A significant part of errors was attributed to low horizontal accuracy [1], [2], [8], [12], [14], [18]. Based on GEDI data simulations, Milenković et al. [19] showed that AGB estimation errors increase with increasing geolocation error. The geolocation error has more impact in heterogeneous forests and in fragmented land-covers than in very homogeneous forests [8], [19]. Slope and density of canopy cover have shown to influence GEDI estimations [14], [15], [17], [20], [21], but the link with geolocation error impact has not been tested in these studies. However, as This  geolocation errors in GEDI coordinates in slope terrain can result in larger elevation differences between the actual and provided coordinates than in flat terrain, it is reasonable to hypothesize that slope terrains will be more impacted by geolocation errors than flat ones.
Improving the georeferencing is important and requires specific approaches. The most widespread geolocation improvement method uses ALS data to simulate GEDI-like waveforms around the original footprint location [12], [22], [23]. The method processes by successive footprint clusters along individual ground tracks and a corrected geolocation is assigned where correlation between simulated and actual GEDI waveforms is maximized [12], [23]. Different studies used this approach to improve either v1 [12], [24] or v2 [17] data. Lang et al. [12] compared GEDI derived canopy heights with ALS heights, after geolocation correction, and obtained a 3.6 m root mean square error (RMSE) and a −0.3 m bias, while RMSE dropped to 2.7 m and bias to −0.1 m for 70% most certain position predictions, i.e., highest correlations between real and simulated waveforms. Liu et al. [17] compared ground elevation accuracy for v2 with and without geolocation correction and observed that improving geolocation led to a slight decrease in RMSE and in mean absolute error (MAE) of 0.12 m (4.15 m without and 4.03 m with correction) and 0.33 m (2.13 m without and 1.80 m with correction), respectively. Furthermore, Ni et al. [25] provided a comparison for AGB models based on relative height (RH) metrics obtained from v1 and v2, and from an optimized geolocation based on waveform matching of v1. When geolocation of v1 data was optimized, the determination coefficient (R 2 ) of the RH-based AGB model was sharply improved compared to v1 and slightly better than the one obtained with v2 data.
The method presented in Hancock et al. [23] has been primarily and successfully used to improve GEDI georeferencing. However, it requires waveform simulation from ALS data and is therefore limited to areas surveyed with ALS system, ideally at a time close to GEDI acquisitions. The method also requires downloading GEDI waveforms, a level 1 (L1) product that needs significant storage capacity and is not as user-friendly as higher level products. To overcome these limitations, the aim of this article is to develop an alternative georeferencing method based on the hypothesis that ground elevation data from reference digital elevation model (DEM) and GEDI level 2 (L2) ground elevation estimates are sufficient to improve the geolocation of GEDI footprints and to assess its performance. The approach, henceforth, referred to as GeoGEDI, should benefit from high-resolution DEM increasing availability and temporal stability, thus enabling much broader use. GeoGEDI was tested on v1 and v2 data for different forest and terrain conditions. Its performance was evaluated by analyzing magnitude and direction of the corrections and the impact on GEDI ground elevation and canopy height errors. The rest of the article is organized as follows. Section II introduces the data used to test and evaluate GeoGEDI. In Section III (method), GeoGEDI algorithm is detailed, prior to the presentation of the experimental setup and statistical analyses. Results are reported and discussed in Sections IV and V, respectively. Finally, Section VI concludes this article.

A. Study Sites
Two contrasting French forest environments were considered, the Landes de Gascognes, or Landes' lowland forest, and the Vosges mountainous area. The Landes region is located in southwestern France and cover the largest metropolitan French forest. The relief of the Landes is mainly flat, with elevations ranging from 0 to 200 m and mean slope of 2.6% (±4.7%). Forests account for 74% of the area and are almost entirely composed of maritime pine (Pinus pinaster Ait) plantations [26], with an average canopy cover of 45% (±23%), measured at plot level by the National Forest Inventory. The Vosges site is located in northeastern France and is much more heterogeneous in terms of topography and forest stands. It covers part of the Vosges forest and the Haguenau forest, a large lowland forest. Elevations range from 100 to 1200 m, with mean terrain slope of 17.8% (±17.0%). Dominant species are European beech (Fagus sylvatica), silver fir (Abies alba), and Norway spruce (Picea abies) [26]. The forest cover is dense with mean canopy cover of 78% (±21%). Study sites were bounded by the extents of reference digital height models (DHMs) (see Section II-C-1). The Landes study site covers 14 051 km 2 and the Vosges study site covers 6 264 km 2 . They will further be referred to as Landes and Vosges.

B. GEDI L2A Data
The GEDI instrument is composed of three lasers emitting 14 ns long near-infrared laser pulses at high frequency (242 Hz). One laser is split into two coverage beams, while the other two lasers produce two full-power beams. Each beam is deflected every other shot by the beam dithering units, which results in eight parallel ground tracks. Tracks are spaced 600 m apart and composed of 25 m diameter circular footprints 60 m apart along-track. For each footprint, the lidar waveform backscattered by the Earth's surface is recorded [2]. The recorded waveforms are processed to provide GEDI data products at footprint level. In GEDI L2A products, ground elevation, top of canopy, and RH metrics are derived from geolocated waveforms (L1B product). RHs correspond to cumulative waveform energy from bottom (0%) to top (100%), in 1% increments (RH0 to RH100) [27].
GEDI L2A products over study sites were downloaded from NASA's archive center [28], [29]. A total of 30 and 15 orbits crossing Vosges and Landes sites, respectively, and for which both version 1 (v1) and 2 (v2) GEDI products are available, were selected. Acquisition dates range from May 2019 to May 2020. The latitude, longitude, and elevation of the lowest mode (i.e., ground peak) were assimilated to footprint center coordinates and mean ground elevation within the area covered by the footprint, respectively. RH98 was used to assess the maximum height as suggested in [11] and [22]. To avoid issues with poor quality data in forest environment, only full-power footprints with good quality flags were used, as recommended in [11]. After filtering, Landes and Vosges study sites were sampled with 73 280 and 78 719 footprints, respectively (total: 151 999 footprints, Fig. 1). C. Reference Datasets 1) High-Resolution DEM and DSM: DEMs at a spatial resolution of 1 m were downloaded from the BD ALTI product of the National Institute of Geographic and Forest Information (IGN) [30]. For both study sites, the DEMs were derived from ALS acquisitions and delivered with altimetric and planimetric mean quadratic errors within 0.2 and 0.6 m, respectively [31]. Digital surface models (DSMs) representing the top of canopy, top of buildings, or other first return objects were also acquired from IGN, at the same spatial resolution. They were generated from either photogrammetric or ALS point clouds. DSMs were chosen in order to have a minimal temporal acquisition difference with GEDI data. For the Landes, the chosen DSM was produced using a photogrammetric point cloud generated using aerial photographs acquired in summer 2018 at a 35-cm resolution and processed using MicMac dense matching algorithm [32]. For the Vosges, the DSM was computed using ALS data acquired in winter 2020, and characterized by an average first return point density of 4.8 pt/m 2 . On both sites, a DHM was obtained by subtracting ALS DEM from DSM. To allow for comparison with GEDI products, DEMref and DHMref, a 1-m resolution focal mean DEM and focal maximum DHM, were computed by using a sliding 25-m diameter circular window at each pixel.
2) Forest Database: BD Forêt v2 [33] provides information about the composition and density for forest stands which have areas of greater than 5000 m 2 . The open-source database was used to classify footprints as forest or nonforest.
The different datasets are summarized in Table I.

III. METHOD
In this section, the GeoGEDI method is presented (in Section III-A) and the experimental setup is designed (in Section III-B). The latter includes parameter settings and filtering criteria used before analyzing algorithm outputs. The statistical analyses used to assess the algorithm performance are presented in Section III-C. The official French coordinate system, Lambert 93, was used during all the processing steps and analyses. While all IGN datasets were given in Lambert 93, GEDI data had to be transformed from WGS84 to Lambert 93. GEDI's latitude and longitude coordinates were transformed to Lambert 93 coordinates and GEDI's ellipsoidal heights were transformed to fit Lambert-93 altitude system by applying an altimetric conversion grid [34].

A. GeoGEDI Algorithm
GeoGEDI aims to match GEDI ground elevations to a reference DEM. Therefore, two inputs are needed: 1) GEDI footprint positions and ground elevations, and 2) DEMref. Each footprint F i (with i ranging from 1 to the total number of footprints in the study area) is processed independently. However, coregistration relies on footprints clusters (Fig. 2). For each footprint F i , the cluster C i is made of n i footprints acquired in a short time interval (δ time ) centered on F i acquisition time. ISS structural vibration frequency is estimated between 0.1 and 1 Hz [35], [36], which is lower than the GEDI laser emission frequency (242 Hz). Consequently, it can be assumed that position errors of footprints belonging to a small cluster C i are temporally correlated. During the small amount of time considered for a cluster, the pointing deviations due to ISS movements and vibrations will be similar in direction and magnitude. The lasers will not be randomly pointing in different directions and the cluster mean shift can be used to correct the position of Fi. C i 's optimal position (Δ opt ) is searched within a maximal distance of ±shift max (m) in X and Y and with a shift step (δ shift ) defined as a multiple of the DEMref resolution [i.e., k * r, with k ∈ N * and r, the resolution of DEMref (i.e., 1 m here)]. This results in where n i number of footprints in the cluster; z p DEMref values; z p GEDI ground elevations; and dz p difference between z p andẑ p .
Each MAE p value is associated to its specific shift in X and Y from the initial footprint position, resulting in a 2-D MAE i map providing a description of spatial error distribution according to shifts [Fig. 2(c)]. The best shift Δ opt is computed from the MAE i map using a two-step procedure. First, a divergent flow accumulation algorithm is applied to the MAE i map [ Fig. 2(d)]. The FD8 flow accumulation algorithm [37] was used (whitebox R package [38], [39])-a multidirectional flow algorithm commonly used to identify catchment areas and analyze drainage patterns in hydrological studies from raster DEMs. Unlike unidirectional algorithms, multidirectional flow algorithms allow flow dispersion and suit better in flat areas, while results between both types of algorithms are similar in the presence of slope [40], [41]. From each DEM cell, the flow is distributed toward the downslope neighboring cells according to proportions depending on the difference in elevation between the starting cell and its neighboring cells, i.e., the higher the difference, the higher the proportion [40], [41], [42]. The computation continues across grid cells until no more neighboring lower cell is encountered, i.e., once the flow has reached its catchment area. The final highest scores identify cells where flows most often stopped. When applied to the MAE i map, flow accumulation leads to the point with the lowest error. Cells with highest scores highlight the areas corresponding to the shifts minimizing differences between DEMref and GEDI ground elevations. Second step: computing Δ opt from the flow accumulation map. First, a convergence area is defined by selecting a given percentage of cells having the highest accumulation flow values. Then, Δ opt is defined as selected cells' barycenter and computed as the average coordinates weighted by flow accumulation values. The approach integrates information from the entire error map and is relevant to address situations with no clear identified minima, for example, when several cells exhibited the same or similar maximum scores.

1) GeoGEDI Algorithm's Parameter Settings:
Considering the positional accuracy of GEDI v1 provided in Beck et al. [7], we used 50 m as a reasonable upper shift limit (shift max ). Even though the DEMref spatial resolution was 1 m, δ shift was set to 2 m for computational efficiency. This results in N shift = 2601 tested positions for each footprint. The convergence area was defined as the 1% cells having the highest accumulation flow value. This choice resulted from an experimental tradeoff to include enough pixels to describe the convergence area while limiting the selection of secondary convergence areas pixels.
GEDI laser units are fixed at different positions, with slight orientation differences, and each has its own depointing capacity, resulting in different viewing angles. Consequently, GeoGEDI should theoretically be applied to a cluster of footprints belonging to the same beam track, thus aligned on the ground. However, matching elevations along a single direction could be suboptimal for a robust footprint position adjustment. To overcome this limitation, GeoGEDI can be applied to a cluster including several beam tracks. To analyze the pros and cons of giving priority to the logic of acquisition geometry or 2-D spatial distribution of points when coregistering GEDI data and DEMref, GeoGEDI was applied by track or considering the four full-power beam tracks together, using the same time interval (δ time ). Selecting δ time lower than the period of structural vibration of the ISS (0.1 to 1 Hz) is recommended. After testing several time intervals, δ time was set to ±0.215 s to select a sufficient number of footprints for the adjustment, while avoiding large changes in shifts inside the cluster. This δ time corresponds to a 3-km distance along a track and to ∼50 and ∼200 footprints for the single-beam and four-beam approach, respectively.
GeoGEDI was initially designed for GEDI v1 release. It was also applied to v2 data to demonstrate its potential for later releases with an improved geolocation. We hypothesize that the algorithm will also improve the later version, as NASA v2 products are said to be corrected for biases only, while GeoGEDI is supposed to improve the precision, i.e., to correct for nonsystematic errors due to ISS vibrations, in addition to correcting biases. For each of the 151 999 footprints, GeoGEDI was applied with four configurations. The different GeoGEDI outputs based on v1 or v2, using either the single-beam or four-beam approach, will be referred to as v1_1, v1_4, v2_1, and v2_4.
2) Data Filtering: Once the shifts were computed, several filters were applied. First, footprints associated to too small clusters were discarded. Indeed, cluster size (n i ) can be lowered due to removing low-quality footprints (see Section II-B). Threshold value was set to 1/4 of the theoretical maximum number of footprints for the considered time interval, corresponding to 13 and 50 footprints for the single-beam and four-beam approaches, respectively. All footprints that did not meet one of the abovementioned criteria, with either v1 or v2 dataset, were excluded. From the 151 999 footprints, 150 093 were kept for further analysis. Second, in each data set, i.e., v1_1, v1_4, v2_1, or v2_4, footprints where the shift in X or Y for Δ opt reached shift max (i.e., 50 m) were discarded.
Finally, some footprints were discarded due to issues identified in GEDI ground elevation assessment. Six waveform interpretation algorithms (01 to 06) were defined by the GEDI science team to identify the ground peak from GEDI waveforms, with different thresholds and smoothing settings [7], [27]. In GEDI L2A data v1, the default algorithm for all footprints was algorithm 01. In v2, the presumed best ground elevation is provided for each footprint along with the corresponding algorithm. This leads to possible changes in best algorithm choice and in differences in ground detection and elevation between the two GEDI versions. For v1, the default choice is always algorithm 01. For v2, the selected optimal algorithm was either 01 or 02 for our study sites. A comparison with DEMref revealed great ground elevation underestimation for some footprints where algorithm 02 was selected, probably due to faulty ground peak detection ( Fig. 3 and Appendix A, Table V). To eliminate these misestimations in further analyses, footprints having a ground elevation difference between v1 and v2 of more than 1.5 m were discarded. This concerned 26.9% and 39.3% of footprints processed with algorithm 02, corresponding to 3.4% and 13.8% of footprints, for Landes and Vosges, respectively.
Please note: this source of error was identified after processing footprints; and the footprints were used during the georeferencing process. To limit influence of erroneous ground peak detection when comparing error estimates for different datasets, they were discarded regarding ground elevation and surface height estimation analyses. Fig. 3. Example of a sorted-out GEDI footprint waveform of (a) v1 using Algorithm 01 and (b) v2 using Algorithm 02. Ground elevation of the variable "lowest_mode" in red and RH98 transformed to surface elevation (and RH98) in green.

C. Statistical Analysis
Analyses were performed on the four GeoGEDI sets, i.e., v1_1, v1_4, v2_1, and v2_4. Statistics regarding differences between NASA v1 and v2, further referred to as v1_v2 results, were also reported as baseline for discussion. As effective GEDI footprint positions are unknown, GeoGEDI's performance can only be evaluated indirectly: 1) shifts were analyzed and 2) ground elevation and surface height errors were compared before and after applying GeoGEDI.
1) GeoGEDI's Shift Analysis: As GeoGEDI is supposed to correct for geolocation errors, checking whether GeoGEDI positions tend to be in the same direction and shifts of the same magnitude than NASA's is a complementary source of algorithm assessment.
Both shift magnitudes and directions were analyzed. In order to analyze mean shift directions while taking into account major differences in orientation between ascending and descending orbits as well as minor differences according to ISS's exact flight path, the coordinate system was changed. X and Y shifts, expressed according to West/East and South/North directions, were transformed into X T and Y T considering a coordinate system linked to the local orbit ground track direction. X T axis follows the orientation of the orbit ground track [i.e., flight path direction relative to the West/East direction assessed by calculating the orientation of the track between the first and last footprint (of a same beam) of each orbit from v2 dataset] and Y T axis is perpendicular to X T , forming a local orthonormal coordinate system, centered on the initial footprint position (v1 or v2). Angular deviations can therefore be estimated when transforming new X T and Y T to polar coordinates, i.e., the footprints Euclidean distance to the initial position (0;0) and the shift angle relative to the track direction (X T ).
First, shift magnitudes' mean, median, and standard deviations were assessed. Then, mean relative shift distances and directions were used for dataset mean positions intercomparison. The mean positions were also compared by beam, so as to identify possible beam-dependent behavior.
Additionally, the temporal evolution of shift distances and directions was visually analyzed by plotting the positions of successive footprints belonging to the same orbit. For visual simplification, the temporal variability was illustrated for three  Fig. 4(c)]. This highlights differences between ground tracks among the different datasets.
2) Elevations and Heights Qualification: GeoGEDI outputs are expected to improve the agreement between GEDI and reference elevations and heights. Therefore, ground elevation and surface height errors were analyzed. Ground elevation errors are expected to diminish as the algorithm is based on minimizing ground elevation errors. However, height errors analysis provides a fully independent evaluation of the algorithm performance. It consists in comparing GEDI RH98 data with DHMref.
The evaluation relied on four standard metrics: MAE (2), mean error [ME (3)], error standard deviation [σ (4)], and RMSE (5). These metrics were computed for the six different datasets (v1, v2, v1_1, v1_4, v2_1, and v2_4) where n number of footprints in the dataset; z i DEMref values; z i GEDI ground elevations; dz i difference between z i andẑ i ; and dz sample's mean difference between z i andẑ i .
The same statistics were used for height estimations, replacing DEMref by DHMref and z by h.
For each footprint, available auxiliary information included: 1) the study site, 2) forest vs. nonforest status, 3) shift magnitude, 4) and a local slope indicator. The latter was defined as the ground elevation range at each GEDI footprint level, and was computed from the 1-m raster DEM using v1 footprint positions. Forest vs. nonforest status was established using both the forest map (see Section II-C-2) and DHMref. All nonforest footprints of the forest map were assigned the "nonforest" class, while forest footprints with a less than 2-m DHMref value were reclassified as "nonforest," in order to remove footprints acquired over clear-cuts or areas that changed from forest to agricultural land between the last forest map update and GEDI data acquisitions. Distributional metrics were compared for several datasets, defined based on auxiliary information. To evaluate the shift magnitude influence, footprints were divided into five classes based on quantiles of shift magnitude distribution, resulting in an equal number of footprints per classes. Classes were noted C Q1 , C Q2 , C Q3 , C Q4 , and C Q5 .   Although footprints average corrected positions were relatively close to each other, there were notable differences among experimental setups. Fig. 6 illustrates the spread in shift distributions for an example orbit. Fig. 6, unlike Figs. 5 and 7, is presenting the applied shifts in X and Y coordinate system, i.e., following the usual West/East and South/North axis, in order to illustrate the shifts with regards to the search window.  Fig. 8 illustrates GeoGEDI positions' temporal evolution for an orbit segment and highlights differences between ground tracks corresponding to the various datasets. V1 and v2 tracks are nearly parallel, which translates the bias correction announced by NASA. Tracks obtained with GeoGEDI wobble around v2 tracks, and may vary quickly over time, as illustrated in Fig. 8. In Landes, local variations are greater than in Vosges. Within only 3 km, the v1_1 track can deviate by more than 50 m from the reference track line [ Fig. 8(c)].

B. Impact of GeoGEDI Corrections on Ground Elevation and Surface Height Estimates
Next, the differences between DEMref and GEDI ground elevations and between DHMref and GEDI RH98 are referred to as dz (Z DEMref -Z GEDI ) and dh (H DHMref -H GEDI ), respectively.
1) Evaluation of Ground Elevation and Surface Height for Forest and Nonforest Areas: Table III shows ground    Surface height results are presented in Table IV. Overall, GEDI heights were closer to reference heights at v2 positions than at v1: ME, σ, and RMSE all decreased. The greatest height assessment improvements were achieved with the four-beam approach, except for v2 in Landes; there, GeoGEDI brought no improvement. For Vosges, slightly better performances were observed with v2-based approaches than with v1-based ones. In both sites, mean heights were underestimated for forest footprints-ME ranging from 0.54 to 0.76 m for Landes and from 2.38 to 2.69 m for Vosges-and overestimated for nonforest footprints-ME ranging from −1.12 to −1.41 m for Landes and from −0.84 to −1.10 m for Vosges. RMSEs were similar for both land uses, with values ranging from 4.19 (v2, nonforest) to 5.25 m (v1, forest) and from 5.99 (v2_4, forest) to 7.58 m (v1, nonforest), for Landes and Vosges, respectively. Overall, in Vosges, RMSEs were lower for forest footprints than for nonforest footprints. Opposite results were observed in Landes. 2) Shift Magnitudes Influence: GeoGEDI shift magnitudes impact on ground elevation and height estimates was considered, to evaluate whether large shifts were justified or artifacts. Fig. 9 compares dz distributions between v1 and v1_1 [ Fig. 9(a)] and between v2 and v2_1 [ Fig. 9(b)] for five shift magnitude classes (see Section III-C-2). The improvement in ground elevation accuracy increases with shift magnitude increase. For v1_1, RMSEs were lowered by 18.8%, 39.0%, 54.6%, 62.0%, and 68.1% for classes C Q1 , C Q2 , C Q3 , C Q4 , and C Q5 , respectively. The same trend, with a decrease in precision and an improvement in bias [ Fig. 9(b)], was observed for v2, although improvements in accuracy were less pronounced. For v2, ground elevation RMSEs were, respectively, improved by 5.7%, 18.5%, 27.5%, 39.9%, and 61.0%. As already noticed, shifts applied to v2 were much smaller than those applied to v1 (see class limits, Fig. 9). For v1_1, 20% of footprints were shifted by less than 12.8 m, while for v2, this quantile limit was 6 m.
Regarding surface heights [ Fig. 9(c) and (d)], compared to v1, v1_1 RMSEs decreased by 4.2%, 11.8%, 14.7%, 20.6%, and 6.4% for classes C Q1 to C Q5 . Like for ground elevations, the further the shift, the more important the improvement in height estimates in the first four classes. However, RMSEs did not continue to improve for C Q5 . Compared to v2, v2_1 height RMSEs were slightly improved by a maximum of 3.2% for the smallest shift distances (C Q1 , C Q2 , and C Q3 ). But RMSE improved by only 0.6% for class C Q4 , and even deteriorated by 24%-from 4.55 to 5.64 m-for footprints belonging to C Q5 [ Fig. 9(d)]. Note that C Q5 is mainly composed of Landes footprints (79% Landes against 21% Vosges).

3) Influence of the Slope:
In sloped terrain, a small error in geolocation results in large ground elevation errors. As expected, the higher the slope indicator, the higher the errors in ground elevations [ Fig. 10(a) and (b)]. For example, v1 ground elevation RMSEs were 0.98, 1.70, 2.87, 4.65, and 9.05 m for the five slope classes reported in Fig. 10. Moreover, the higher the slope indicator, the greater the improvement brought by GeoGEDI, and, compared to v1, v1_1 ground elevation RMSEs were improved by 9.7%, 31.3%, 48.2%, 59.2%, and 63.4%, for classes C 1 , C 2 , C 3 , C 4 , and C 5 , respectively. Similar results were obtained for v2_1 regarding v2, with improvements of 5.7%, 12.7%, 21.4%, 31.7%, and 41.7% for all five slope classes.
The slope effect on height estimates is illustrated in Fig. 10(c) and (d). For all datasets, the smaller the slope, the better the estimate. For v1, GeoGEDI outputs improved the flattest footprints' height accuracy by 4.9%. For the other four classes, height RMSEs decreased between 18.1% and 20.8%. For v2_1, height RMSE increased by 9.6% for footprints with no slope (C 1 ) and height RMSE was improved by 1.7% for footprints with low slope (C 2 ). Concerning footprints with greater slope (C 3 , C 4 , C 5 ), height RMSEs were improved by 5.1%, 6.0%, and 5.7%, respectively.

A. Shift Analyses Corroborate GeoGEDI's Efficiency
GeoGEDI-based mean shifts were in accordance with horizontal geolocation errors announced by NASA's user guide [7]. Logically, shifts obtained with v1-based approaches were greater than those obtained with v2-based approaches (Table II). Beck et al. [7] studied GEDI geolocation error over a 30-week time-span. The mean of weekly computed 1 σ errors was  (Fig. 5), thus providing additional GeoGEDI robustness validation. The on v1 applied corrected angles all rotated toward the same direction, above and perpendicular to the flight path direction. Compared to initial v1 positions, v2 positions were moved in a direction of ∼93°with respect to v1 track direction and GeoGEDI v1_1 and v1_4 in a direction of ∼85°and ∼87°, respectively. This is in line with the direction found by Quirós et al. [21]. In order to correct the positions for geolocation bias, they tested eight directions (0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°) at two distances (5 and 10 m) from the initial v1 position and defined the best fitting position for each footprint based on the lowest RMSE between GEDI elevation and aerial lidar DEM. Among the 17 tested positions (e.g., eight directions at two distances and the central initial position), the best fitting position was at 10 m and 270°clockwise, corresponding to a 90°angle above the flight path (i.e., standard counterclockwise angle measurement used in this article). A total of 31.88% of their footprints had the lowest RMSE for this position.
Moreover, GeoGEDI per beam results complied with theoretical expectations. When the single-beam approach was applied to v1, resulting mean positions were paired according to laser units. Nevertheless, mean positions also exhibited small differences, possibly arising from the difference in pointing direction between the two beams of a beam pair emitted by the same laser unit. Compared to initial v1 positions, v2 positions were rotated by ∼78°for one beam pair and by ∼109°for the other pair. GeoGEDI v1_1 average positions were rotated by 76°for beam pairs 1000 and 1011, while beams 0101 and 0110 were rotated by ∼97°. When applied to v2, GeoGEDI mean shifts were almost identical for all beams regardless of the laser unit, confirming NASA's biases correction on v2 products. Mean positions of v2_1 and v2_4 corrected initial v2 positions with 20°to 35°a ngles. Finally, we assumed GeoGEDI could correct for geolocation source inaccuracy that cannot be handled from ISS-borne sensors and in-flight calibrations, such as ISS structural vibrations. Beyond the trends provided by mean shifts, the quick shifts temporal changes and their magnitude are worth noticing. For both, the single-beam and the four-beam approaches, two consecutive footprints could have significantly different shifts with respective clusters differing by few footprints. Yet, shift values followed a relatively continuous pattern (Fig. 8). This continuity is important, as it is key for our assumption validity, i.e., using footprints acquired in a shorter time interval than the highest vibration period captures these vibrations impact on the geolocation error. Resulting GeoGEDI tracks have more variable and less "smoothed" track patterns than those observed in NASA footprint positions, highlighting that GeoGEDI succeeded in capturing part of ISS high frequency variations. As a result, computed shifts were observed as spatially correlated (shift continuity). However, we are aware that GeoGEDI tracks are probably still slightly smoothed compared to real tracks, as footprints were corrected for local mean deviations.

B. GeoGEDI Advantages and Limitations
The proposed georeferencing method proved efficient and robust for a diversity of environments (Section V-B-1), even if some limitations (Section V-B-2) and possible ways of improvements (Section V-B-3) were identified.

1) GeoGEDI Main Strengths:
One of GeoGEDI's major assets is it needs only two inputs: 1) coordinates and "low-est_mode" variable from GEDI L2A footprints and 2) a highresolution DEM, which are increasingly available worldwide. Additionally, it is simpler than methods based on waveform correlation between GEDI and ALS simulations [23]. Results indicated that GeoGEDI greatly improved consistency in ground elevation between GEDI and DEMref (see Section IV-B-1). Height estimates were also improved for most cases, except for v2-based approaches in Landes (see Section V-B-2). Consistency between GEDI estimations and reference values proved considerably improved in sloped areas where even small geolocation error can lead to high discrepancy.
However, results on ground elevation and canopy height accuracy after improving the geolocation were still and inevitably influenced by study site characteristics. The Landes are flat and stands are mainly composed of maritime pine, a species that lets a high proportion of light reach the ground. On the contrary, in the topographically complex area of the Vosges mountains, stands are more dense and are composed of species with higher foliage density. Several studies have reported a link between an increase in RMSEs and an increase in either vegetation density [15], [17], [21] or terrain slope [1], [17] for both GEDI ground elevation and vegetation height products. For example, Liu et al. [17] reported high ground RMSEs (6-7 m) for dense and tall vegetation and a 2.88 m RMSE for areas with slope <5°compared to 6.70 m for areas with slope >30°. Similarly, errors for v1 and v2 forest footprints are much higher in the Vosges than in the Landes, and remain higher in the Vosges even after geolocation has been improved, e.g., v2_1 ground elevation RMSEs are 0.91 and 2.49 m and canopy height RMSEs are 4.99 and 6.10 m, in Landes and Vosges, respectively.
Concerning canopy heights estimations, they are directly impacted by ground estimation accuracy [17] and thus by the abovementioned factors. Despite a large geolocation bias correction, improvements in RMSEs between v1 and v2 remain limited [i.e., 5.25 m down to 4.48 m (−17 %) over the Landes and 7.48 m down to 6.30 m (−19 %) over the Vosges]. This can be attributed to the relative stability of vegetation height at stand level as both study sites are mainly occupied by even-aged production forests. Even once shifted, a majority of footprints will be located in the same stand and have a similar canopy height value than at their initial location.
The uncertainty of reference data may also affect the discrepancy between GEDI and reference data. Most importantly, the time and seasonal differences between the two data acquisitions allow for changes in vegetation heights. The Landes have significant forest dynamics in pine plantations [43], drastically impacting canopy heights.
2) GeoGEDI Limitations in Flat Areas: Validation highlighted better GeoGEDI performances for Vosges than for Landes. Shift distances v2_1 and v2_4 for Landes were also higher than for Vosges, departing from horizontal geolocation errors announced by the user guide [7]. Additionally, mean shift distances barely decreased between approaches applied on v1 and on v2. The presence of large flat areas in Landes might explain such results. Typically, DEMref values in Landes optimal position search windows are highly similar, which impedes convergence toward minimal error and finding the optimal position. The error analysis by shift magnitude classes [ Fig. 9(d)] highlights issues with footprints belonging to C Q5 (shift ≥ 32.6 m) for v1 and to C Q4 and C Q5 (shift ≥ 14.6 m) for v2. While all classes' ground elevation estimates improved, surface height estimations of footprints with the largest shifts worsened. These classes may include footprints for which GeoGEDI converged toward a suboptimal position. These geolocation errors have more impact on height accuracy than on ground elevation estimates due to the lower variability in elevation compared to surface height variability. It is worth noticing that those very large shifts mainly concern Landes footprints (61% of the footprints in C Q5 in v1 and 79% of C Q5 in v2 belong to Landes). In Section III-B-2, we also reported that a subset of footprints was removed prior to statistical analyses because the convergence process was interrupted at the search window limit. This mainly concerned Landes footprints, with up to 8.7% of footprints compared to 1% in Vosges, suggesting the algorithm had punctually some converging issues in flat areas. The important dispersion of GeoGEDI shifts [e.g., Fig. 8(c)] can be explained by ISS large movements and vibrations, or by convergence issues in flat and textureless areas.
3) Recommendations on the Use of GeoGEDI and Possible Improvements: On the one hand, using the single-beam approach better respects the lidar systems acquisition geometry. On the other hand, using the four-beam approach increases the number of footprints in the cluster and spatial dimension (from 1-D profile to 2-D sampling), which is likely to increase elevation variability within the cluster, especially in low-relief areas. For v1-based approaches, best estimates were observed with the single-beam approach. Therefore, it is more important to respect the acquisition geometry than to build on the beneficial effect of 2-D sampling. To improve georeferencing of v1 data, the single-beam approach should be preferred in all cases. Processing GeoGEDI by beam pair clusters could also be considered in future studies, increasing the number of footprints, while respecting the instrument geometry. NASA v2 geolocation was corrected for bias and is less, or even no more impacted by acquisition geometry effects thanks to in-flight calibration. Therefore, the four-beam approach can be considered on v2. Single-beam and four-beam approaches gave very similar GeoGEDI outputs. GeoGEDI v2_1 estimates were slightly better for ground elevations, whereas v2_4 estimates were slightly better for surface height estimates. Both approaches can be used to further improve GEDI v2 geolocation. However, assessing height estimates aimed to provide an independent validation, suggesting that the four-beam approach should be preferred to process v2 data.
Furthermore, in low-relief environment, increasing the cluster size would increase heterogeneity in elevations, allowing better convergence of the flow accumulation algorithm. However, it would also result in "smoother" tracks closer to v2 tracks, and thus to lower improvement in geolocation precision with less consideration to errors due to high frequencies vibrations. Moreover, as it is only based on GEDI and DEMground elevations, GeoGEDI would certainly benefit from improved ground peak detection in L2A data. Indeed, even if GeoGEDI improved estimates, footprints with sharp local ground underestimates were included during adjustment process and might have impacted GeoGEDI's v2-based outputs.
Results could also be improved by increasing the search window beyond 50 m and by using a smaller shift step, e.g., equal to the DEMref resolution (1 m), instead of the 2-m δ shift that was used in this article. However, this would result in a sharp computation time increase, and should be accompanied by an optimization strategy, e.g., considering a multiscale approach, using a large step (∼5 m) to identify the main shift direction, followed with a more local search with a smaller search window and smaller shift step to refine the optimal position. Moreover, as stated in Section III-A, the flow accumulation map value at the optimal Δ opt position can be interpreted as an indicator of GeoGEDI's reliability. The lower the accumulation value of Δ opt , the higher the ambiguity around Δ opt . Examples of low confidence footprints can be found in Fig. 11 in Appendix B. A simple threshold could be used and added to each footprint by adding a tag, warning users about possible convergence issues, similarly to quality and degrade flag implemented by NASA.

VI. CONCLUSION
GEDI footprints provide large scale and high sampling density data about forest structure. But low georeferencing accuracy can be detrimental to their use for predictive models of forest attributes. The proposed method is based on GEDI ground elevations and a high-resolution DEM, to improve geolocation of GEDI footprints. The method was tested on GEDI v1 and v2 for two French forests, broadleaved-dominated forest in a flat area and dense coniferous-dominated forest in a mountainous area. Our results quantified the georeferencing improvements undertaken by NASA between version 1 and 2. Besides, a ground detection issue was identified for GEDI v2 footprints using algorithm 02. However, GeoGEDI successfully improved GEDI v1 and v2 footprints positioning, simultaneously reducing bias and improving precision components. Despite improved footprint geolocation in GEDI v2, already corrected for the systematic error components, there is room for additional improvement. Yet, its performance depends on the topography, with lack of convergence in very flat areas. The method showed efficient to correct for ISS attitude and altitude variations for a diversity of forest environments, and to assess GEDI data quality with more confidence. The methods' relative simplicity allows for fast and efficient large-scale deployment, wherever a high-resolution DEM is available. With improvements in the range of those obtained with more complex methods based on waveform processing, the method is a good alternative candidate to process GEDI data prior to implementing methods requiring a precise matching of data sources, such as for data fusion purposes.

APPENDIX A GEDI GROUND ELEVATION ESTIMATIONS BY SELECTED GROUND PEAK ALGORITHM
See Table V.

APPENDIX C AVAILABLE DATASET AND SCRIPT
The dataset processed for this article is available online [44]. GEDI footprints with coordinates of unchanged GEDI v1 and v2 releases and the coordinates calculated with GeoGEDI algorithm, as well as all variables used for this article, are included.