Machine-Learning-Method-Based Inversion of Shallow Bathymetric Maps Using ICESat-2 ATL03 Data

The application of empirical methods for satellite-derived bathymetry is limited by the lack of in situ bathymetric data in remote, inaccessible areas. This challenge has been addressed with the launch of Ice, Cloud, and land Elevation Satellite-2 (ICESat-2). This study provides an accurate bathymetric photon extraction process for ICESat-2 ATL03 data, and the $R^2$ value of the bathymetric photons obtained using this process and airborne bathymetric LiDAR data is up to 99%. Next, based on two types of remote sensing data, ICESat-2 and Sentinel-2, machine learning models, including linear regression (LR), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost), were trained to obtain bathymetric maps. The experimental results show that the mean root mean square error (RMSE), mean absolute error (MAE), and mean relative error (MRE) values of the LR models are less than 3.02 m, 2.38 m, and 86.03%, respectively. The mean RMSE, MAE, and MRE values of the LightGBM and CatBoost models are less than 0.91 m, 0.66 m, and 23.17%, respectively. It is concluded that the proposed denoising process for ICESat-2 ATL03 data is effective, and the results of the bathymetric maps obtained using these data are satisfactory. Thus, the proposed approach is effective, and this strategy can be used to replace conventional bathymetric inversion methods to obtain high-accuracy bathymetric maps.


I. INTRODUCTION
It is fundamental to obtain accurate shallow water depth information for marine ecological environment research and safe marine navigation. In traditional bathymetry, bathymetry rods and echo sounders are utilized; although these tools can be used to obtain measurements with high accuracy, the measurements are subject to several factors and environmental constraints, and it is arduous to work in remote and unreachable areas [1]. Satellite-derived bathymetry (SDB), which has gained significant attention from researchers as a result of the advancement of remote sensing technologies, has been used to obtain bathymetric maps [1], [2], [3], [4], [5].
The current methods to derive shallow water bathymetry can be categorized mainly into theoretical [3], [4], [6], [7], semiempirical [2], [8], [9], [10], [11], and empirical [12], [13], [14] methods. The radiative transfer process serves as the foundation for the theoretical methods used to create the water depth inversion equation, which makes it challenging to obtain and calculate many of the optical parameters of water. Empirical methods are based on the relationship between the reflectance of remote sensing imagery and in situ bathymetric data. Machine learning (ML), such as random forest, is also a special form of empirical method that has high accuracy in a specific area [12], [13], [14], [15], [16], [17], [18]. Using a support vector machine, bathymetric inversion of Sint Maarten Island and Ameland Inlet was performed; the performance for shallow water depths was improved, and accurate bathymetric maps were effectively obtained [19]. However, the lack of in situ bathymetric data for constructing a method in some study areas has become an important issue limiting its application. Nonetheless, Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) was launched successfully in 2018, offering a fresh approach to this issue [20], [21], [22], [23].
ICESat-2 carries an instrument called ATLAS (advanced topographic laser altimeter system), a highly sensitive photon-counting LiDAR system. ICESat-2 has three pairs of beams [24]. Each beam pair contains two beams, i.e., a strong beam and a weak beam, and the strong beam has four times the energy of the weak beam. Because ICESat-2's detectors are so sensitive, and for other reasons, the raw photons contain a large number of noise photons, especially during daylight hours. Consequently, numerous methods have been proposed to extract signal photons from ATL03 data. A modified density-based spatial clustering of applications with noise method was used to separate signal photons from noise photons in ATL03 data and apply them to SDB [25]. The multipeak Gaussian fitting method was used to extract the sea surface photons from ATL03 data, and then the final photons were obtained after three median filtering iterations [26]. The water signal was extracted directly from the ATL03 data file using a semiautomatic method, which is effective in clear water and good weather conditions [22]. The adaptive variable ellipse filtering bathymetric method was proposed to accurately sort the photons at the bottom of the sea to obtain bathymetric data [27]. The difficulty of signal photon extraction is increased by the influence of environmental factors and a nonuniform noise distribution. The existing denoising algorithms still cannot eliminate manual and empirical threshold selection. Hence, a simple, feasible, and accurate process to extract accurate bathymetric photons is necessary.
In this study, we first extract accurate bathymetric photons from ICESat-2 ATL03 data, and then the following steps are completed: 1) removing noise photons with large errors by Level-1 denoising, 2) separating the sea surface and seafloor photons, 3) obtaining accurate bathymetric photons by Level-2 denoising. The final photons replace in situ bathymetric data combined with Sentinel-2 imagery to build ML methods to obtain bathymetric maps of Waimanalo and the Antelope Reef. Simultaneously, the accuracy index of the methods is evaluated and analyzed. We can combine the benefits of both types of data and perform high-accuracy shallow water bathymetry on a massive scale by fusing the ICESat-2 ATL03 data with multispectral satellite imagery. The framework of this study is shown in Fig. 1.

A. Study Areas
In this article, there are two study areas. The first study area is Waimanalo (see Fig. 2), which is located to the southeast of Oahu. Oahu is the third largest Hawaiian island, covers an area of approximately 1546 km², and is the most populated island. The island has two parallel mountain ranges, the Koolau and the Waianae ranges, which are connected by a central plateau. The coast is tortuous, and there are many coral reefs along it. There is also clear water, making this area suitable for SDB.
The second study area is Antelope Reef, which is situated in the southwest corner of the Yongle Atoll in the Paracel Islands, as shown in Fig. 3. This atoll becomes a closed lagoon inside after low tide.
B. Data Collection
1) Sentinel-2 Multispectral Imagery: The Sentinel-2 mission comprises Sentinel-2A and Sentinel-2B, which were launched on 23 June 2015 and 7 March 2017, respectively. They were placed in the same sun-synchronous orbit with a 180° phase shift. Each satellite carries a multispectral instrument and orbits the Earth at an altitude of 786 km. The spectral range covers the visible light, near-infrared, and shortwave infrared regions, and there are three spectral bands in the red-edge range, which have high resolution and can be used to extract more water information hidden in the band. There are a total of 13 spectral bands, and they have distinct spatial resolutions at the ground that range from 10 to 60 m [28]. The Sentinel-2 Level-1C product used in this study is free and available from the European Space Agency (https://scihub.copernicus.eu/dhus/#/home). The Level-1C product is transformed into the Level-2A product using the Sen2Cor toolbox, followed by resampling to 10 m in the free Sentinel Application Platform.
To avoid errors caused by environmental conditions such as cloud shadow, sun glint, and waves when imaging, three Sentinel-2 Level-1C products are used in each study area. Detailed information on the two study areas and the data used are shown in Table I.
3) Airborne Bathymetric LiDAR Data: The bathymetric data for Waimanalo were collected by the Scanning Hydrographic Operational Airborne LiDAR Survey (SHOALS) system and are available at http://www.soest.hawaii.edu/coasts/data/oahu/shoals.html. The SHOALS system consists of near-infrared and green lasers, a laser oscillator, an optical receiver, and a data processing unit and is capable of detecting depths up to 40 m. The error in the vertical direction is less than 0.15 m, and the point spacing between the sounding data is between 3 and 15 m [30]. It is a relatively mature airborne laser bathymetry system. These data are used to validate the accuracy of denoising and SDB on Waimanalo.
2) Separating Photons of Sea Surface and Seafloor: The photon density distribution differs significantly between the sea surface and the seafloor, with the majority of photons concentrated at the sea surface. To differentiate between the sea surface and seafloor photons, a Gaussian function is employed. However, fitting a Gaussian function to all photons may result in more than two peaks on the elevation histogram. To address this issue, this study uses the elevation histogram to determine the sea surface height and assumes that elevation fluctuation does not exceed 5 m, restricting the photon range to within this limit [31]. Photons with elevations within the range of (μ − 3σ, μ + 3σ) are classified as sea surface photons, whereas photons with elevations below μ − 3σ are considered seafloor photons. The one-dimensional (1-D) Gaussian function used is as follows:
$$f(x) = \alpha \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
where α is the amplitude, μ is the expectation, and σ is the standard deviation of the crest.
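The separation rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `split_surface_seafloor`, the 0.1 m bin width, and the synthetic photons are hypothetical, and a sigma-clipped mean and standard deviation stand in for the Gaussian fit described in the text. Only the 5 m fluctuation limit and the μ ± 3σ rule come from the text.

```python
import random
import statistics
from collections import Counter


def split_surface_seafloor(heights, bin_width=0.1, max_fluctuation=5.0, n_iter=5):
    """Label each photon 'surface', 'seafloor', or 'above' with the mu +/- 3*sigma rule."""
    # Elevation histogram: the most populated bin approximates the sea surface height.
    bins = Counter(round(h / bin_width) for h in heights)
    mu = bins.most_common(1)[0][0] * bin_width
    # Restrict the estimate to photons within the assumed 5 m fluctuation limit.
    near = [h for h in heights if abs(h - mu) <= max_fluctuation]
    sigma = statistics.pstdev(near)
    # Assumption: sigma-clipped moments replace the Gaussian fit of the text.
    for _ in range(n_iter):
        core = [h for h in near if abs(h - mu) <= 3 * sigma]
        mu, sigma = statistics.fmean(core), statistics.pstdev(core)
    labels = []
    for h in heights:
        if h < mu - 3 * sigma:
            labels.append("seafloor")
        elif h > mu + 3 * sigma:
            labels.append("above")  # not discussed in the text; kept separate here
        else:
            labels.append("surface")
    return mu, sigma, labels


# Synthetic photons (hypothetical): a dense surface layer and a sparse seafloor.
rng = random.Random(0)
surface = [rng.gauss(0.0, 0.2) for _ in range(500)]
seafloor = [rng.uniform(-10.0, -2.0) for _ in range(100)]
mu, sigma, labels = split_surface_seafloor(surface + seafloor)
print(f"surface height {mu:.2f} m, sigma {sigma:.2f} m")
```

In practice the Gaussian would be fitted to the elevation histogram itself; the clipped moments are used here only to keep the sketch dependency-free.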
3) Level-2 Denoising: In Level-2 denoising, accurate bathymetric photons are obtained from the extracted seafloor photons using a median filtering algorithm or grid-based statistical denoising. The algorithm is chosen by manual identification: median filtering is selected when the noise points are sparsely distributed, and the method based on grid statistics is selected when the noise points are relatively densely distributed. To compare the applicability of the two algorithms more intuitively, both were applied to 20190613 GT1L, 20190524 GT2L, and 20190524 GT3L for comparative analysis. The result with the best denoising effect is used as the input data for the bathymetric inversion.
a) Median filtering: The fundamental idea behind median filtering is to swap out a photon's value in a digital sequence for the median of the photon values nearby so that the surrounding elevation levels are closer to the real value and isolated noisy photons are removed [28].
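The idea can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `median_filter`, the truncated windows at both ends of the sequence, and the sample heights are assumptions.

```python
import statistics


def median_filter(values, m):
    """Replace each value by the median of the m-wide window centred on it."""
    if m < 1 or m % 2 == 0:
        raise ValueError("window length m must be a positive odd number")
    v = (m - 1) // 2
    out = []
    for i in range(len(values)):
        # Assumption: windows are truncated at the ends of the sequence.
        window = values[max(0, i - v): i + v + 1]
        out.append(statistics.median(window))
    return out


# Hypothetical seafloor heights (m) with two isolated noise photons (-1.0 and -9.0).
heights = [-5.0, -5.1, -1.0, -5.2, -5.3, -9.0, -5.2, -5.4]
smoothed = median_filter(heights, 3)
print(smoothed)
```

With a window of 3, both isolated outliers are replaced by neighbouring seafloor heights, while the remaining values change only slightly.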
Let a 1-D sequence $f_1, f_2, \ldots, f_n$ be filtered with a window of length m (m is an odd number). Median filtering extracts m numbers successively from the input sequence and takes the median value of these data as the filtered output (Med denotes the median value of the 1-D sequence) using the following formula:
$$y_i = \mathrm{Med}\{f_{i-v}, \ldots, f_i, \ldots, f_{i+v}\}, \quad v = \frac{m-1}{2}$$
The larger the window value is, the smoother the obtained seafloor photon distribution. However, the amount of data will be reduced accordingly. For 20190609 GT1L/R and 20190613 GT1L with validation data, window values from 1 to 100 are tested in turn to find the one that gives the optimal denoising result. For 20190524 GT2L and 20190524 GT3L without validation data, we use the empirical method to determine the optimal window value.
b) Grid-based statistical denoising: Because some data contain a large amount of noise, bathymetric photons cannot be accurately obtained by median filtering. Therefore, a grid-based statistical denoising method is proposed.
The method involves several basic steps, which are given as follows: 1) plotting the distribution of seafloor photons, with the Along-Track parameter as the horizontal coordinate and the Height parameter as the vertical coordinate, 2) dividing the distribution into an n × m grid, 3) counting the number of photons in each grid cell, and 4) retaining the grid cell with the highest number of photons in each column by comparing the photon counts.
4) Refraction Correction and Tidal Correction: The h_ph parameter of ICESat-2 ATL03 data is calculated by only considering the laser propagation in a single air medium. To obtain the true height, refraction correction for seafloor photons is required [32]. Fig. 4 shows the geometric relationship of a single beam refraction correction in air and water media, where $\theta_1$ and $\theta_2$ are the angle of incidence and angle of refraction, respectively; $n_1$ and $n_2$ are the refractive indices of air and water, which are 1.00 and 1.34, respectively; A and B are the propagation distance of the laser underwater when no refraction occurs and the actual propagation distance of the laser when refraction occurs, respectively; H is the observation depth of the laser underwater; ref_elev is one of the parameters of ATL03 data; $\theta_1 = \pi/2 - \mathrm{ref\_elev}$; and $\beta = \gamma - \alpha$. According to Snell's law,
$$n_1 \sin\theta_1 = n_2 \sin\theta_2$$
The remaining quantities follow from triangle HAD and, using the sine theorem and the cosine theorem in triangle ABC, from C and α, which finally give the refraction-corrected position of the seafloor photon.
Tidal correction is performed by subtracting the tidal data at the time of photon acquisition from the bathymetric value of the bathymetric photon and adding the tidal data at the time of Sentinel-2 imaging.
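The two corrections can be sketched in Python as follows. The refraction part uses a commonly applied flat-sea-surface geometry. The mapping of the variables `a`, `b`, and `c` to A, B, and C in the text, the choice γ = π/2 − θ1, and the function names are assumptions, not expressions confirmed by the text. Only the refractive indices, θ1 = π/2 − ref_elev, β = γ − α, and the tidal rule are taken from the text.

```python
import math

N_AIR, N_WATER = 1.00, 1.34  # refractive indices of air and water given in the text


def refraction_offsets(depth, ref_elev):
    """Return (dY, dZ): along-track and vertical shifts for an uncorrected depth in metres."""
    if depth <= 0:
        return 0.0, 0.0
    theta1 = math.pi / 2 - ref_elev                           # angle of incidence
    theta2 = math.asin(N_AIR * math.sin(theta1) / N_WATER)    # Snell's law
    a = depth / math.cos(theta1)    # assumed: slant path if no refraction occurred
    b = a * N_AIR / N_WATER         # assumed: slant path shortened by the slower light speed
    c = math.sqrt(a * a + b * b - 2 * a * b * math.cos(theta1 - theta2))  # cosine theorem
    alpha = math.asin(b * math.sin(theta1 - theta2) / c)                  # sine theorem
    beta = (math.pi / 2 - theta1) - alpha   # beta = gamma - alpha, gamma assumed pi/2 - theta1
    return c * math.cos(beta), c * math.sin(beta)


def tide_adjusted_depth(depth, tide_at_photon, tide_at_image):
    """Subtract the tide at photon acquisition and add the tide at Sentinel-2 imaging."""
    return depth - tide_at_photon + tide_at_image


# Nadir-pointing example: the corrected depth reduces to depth / 1.34.
dy, dz = refraction_offsets(10.0, math.pi / 2)
corrected = 10.0 - dz
print(round(corrected, 3), round(tide_adjusted_depth(corrected, 0.3, 0.5), 3))
```

For a nadir-pointing beam the sketch reduces to dividing the apparent depth by the refractive index of water, which is a quick sanity check on the geometry.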

A. ML Models
The models used in this article are as follows, and all are implemented in Python 3.9.7.
1) Linear Regression Model: Linear regression (LR) is a model for prediction that is based on the linear combination of independent variables. In SDB, there are multiple independent variables, so the LR method is used. This method is easy to understand and apply, and the weights can visually express the importance of each band:
$$y = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n + b$$
where y is the dependent variable, $x_1, x_2, \ldots, x_n$ are the independent variables (model inputs), i.e., the radiometric brightness values, $\theta_1, \theta_2, \ldots, \theta_n$ are the regression coefficients of the independent variables $x_i$, and b is the intercept.
2) Light Gradient Boosting Machine Model: LightGBM uses a histogram-based partitioning algorithm instead of the traditional presorted traversal algorithm, which gives faster parallel training, lower memory consumption, and higher accuracy. In addition, it has the advantages of supporting various distributions, handling massive volumes of data more adaptively, and effectively preventing overfitting. LightGBM not only increases the accuracy of the prediction but also greatly accelerates the prediction speed and reduces memory utilization [34].
3) Categorical Boosting Model: Yandex proposed categorical boosting (CatBoost), an innovative gradient boosting technology [35], [36]. This method has undergone significant parallelization enhancements, allowing the layout to be finished more quickly and more easily on internet networks [34].
CatBoost is an improved algorithm in the framework of the GBDT algorithm, which has the advantage of overcoming the gradient bias and effectively solving the problem of prediction bias, improving the accuracy of the algorithm, enhancing the generalization ability, and preventing the occurrence of overfitting.
In this study, the reflectance data from all bands of the multispectral images are used as input for each model, whereas the output of the model is the extracted bathymetric data from accurate bathymetric photons; 80% of the dataset is used as the training set. The accuracy of the SDB is verified using airborne bathymetric LiDAR data for the Waimanalo area. For Antelope Reef data, where no in situ bathymetry data are available, the remaining 20% of the ICESat-2 ATL03 data are utilized for verification.
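The training setup can be illustrated with a minimal NumPy sketch of the 80/20 split and the LR model. The synthetic reflectances, the band weights, and the function names are hypothetical, and ordinary least squares is assumed as the fitting method. A gradient boosting regressor such as LightGBM or CatBoost would be trained on the same split in place of `fit_linear_regression`.

```python
import numpy as np


def train_test_split_80_20(X, y, seed=0):
    """Shuffle the samples and keep 80% for training, 20% for verification."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(0.8 * len(y))
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]


def fit_linear_regression(X, y):
    """Least-squares fit of y = theta . x + b; returns (theta, b)."""
    A = np.column_stack([X, np.ones(len(y))])  # last column carries the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


# Hypothetical data: 200 photons, 4 band reflectances, depths from an exact linear rule.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 0.3, size=(200, 4))
y = X @ np.array([-30.0, 12.0, 5.0, -2.0]) + 8.0

X_tr, X_te, y_tr, y_te = train_test_split_80_20(X, y)
theta, b = fit_linear_regression(X_tr, y_tr)
rmse = float(np.sqrt(np.mean((X_te @ theta + b - y_te) ** 2)))
print(X_tr.shape, X_te.shape, rmse)
```

Because the synthetic depths are exactly linear in the bands, the held-out error is essentially zero; real reflectance data would not behave this way, which is the motivation for the nonlinear models.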

B. Accuracy Evaluation Methods
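The denoising process is evaluated using $R^2$ (coefficient of determination), root mean square error (RMSE), mean absolute error (MAE), and mean relative error (MRE). The RMSE, MAE, and MRE are used to evaluate the models:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(h_i - \hat{h}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|h_i - \hat{h}_i\right|$$
$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|h_i - \hat{h}_i\right|}{h_i} \times 100\%$$
where i indexes the bathymetric photons, n is the number of bathymetric photons used to validate the models, $h_i$ is the bathymetric value of the ith validation photon, and $\hat{h}_i$ is the estimated depth.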
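The three error measures can be computed as in the following minimal Python sketch. The function names and sample depths are hypothetical, and the sketch assumes the standard definitions with depths stored as positive, nonzero values so that the relative error is well defined.

```python
import math


def rmse(h, h_hat):
    """Root mean square error between validation depths h and estimates h_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, h_hat)) / len(h))


def mae(h, h_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(h, h_hat)) / len(h)


def mre(h, h_hat):
    """Mean relative error in percent; assumes every validation depth is nonzero."""
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(h, h_hat)) / len(h)


# Hypothetical validation depths and model estimates, in metres.
h = [2.0, 4.0, 10.0]
h_hat = [2.5, 3.0, 10.0]
print(rmse(h, h_hat), mae(h, h_hat), mre(h, h_hat))
```

The example shows why the MRE penalizes shallow-water errors more heavily: the 0.5 m error at 2 m depth contributes as much relative error as the 1 m error at 4 m depth.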

A. Accurately Obtaining Bathymetric Photons From ICESat-2 ATL03 Data
The algorithm described in this article is used to accurately extract bathymetric photons from data in two regions, Waimanalo and Antelope Reef, and the trajectories of the study are shown in Figs. 2 and 3. For the three laser beams (20190609 GT1L, 20190609 GT1R, and 20190613 GT1L) focused on Waimanalo, the process proposed in this article is used to filter the ATL03 data, and the feasibility of the algorithm is evaluated using the airborne bathymetric LiDAR data. Due to the lack of in situ bathymetry data on the Antelope Reef (20190524 GT2L and 20190524 GT3L), this study only analyses the denoising effect of two algorithms from a visual perspective.
Because some ATL03 data have a wide range of noise points, obvious noise points can be removed by Level-1 denoising. Figs. 5-9 show the results after Level-1 denoising, and the denoising effect is very clear. Subsequently, the elevation histogram was utilized to determine the height of the sea surface for each dataset, which is represented by the solid red line in Figs. 5(b) and 9(b). To distinguish between the photons at the sea surface and those at the seafloor, a Gaussian function was used, with photons falling within the solid red line being considered sea surface photons.
For data from the 20190609 GT1L and 20190609 GT1R laser beams, median filtering is used for Level-2 denoising. In Figs. 5(d) and 6(d), the denoising results are the red points, and the refraction-corrected photons are the magenta points.
In Figs. 8 and 9, d1 is the processing result based on median filtering, and d2 is the processing result based on grid-based statistical denoising. For the 20190613 GT1L data, the median filtering algorithm was not effective in accurately extracting bathymetric photons because the significant noise photons present in Fig. 7(d1) could not be removed. Therefore, the grid-based statistical denoising algorithm proposed in this study was employed for filtering, and the results are presented in Fig. 7(d2). Compared with the median filtering algorithm, the grid statistics method, which divides the data into a 17×10 grid (empirically specified), retains signal photons and eliminates noisy photons to a significant extent. As shown in Table II, $R^2$, RMSE, MAE, and MRE improved by 6.44%, 2.21 m, 0.94 m, and 2.75%, respectively.
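The grid statistics method can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `grid_denoise` and the synthetic photons are hypothetical, and the 17×10 grid is read as 17 along-track columns by 10 height rows, which the text does not state explicitly.

```python
def grid_denoise(along_track, height, n_cols, n_rows):
    """Return indices of photons lying in the most populated cell of each column."""
    x0, x1 = min(along_track), max(along_track)
    z0, z1 = min(height), max(height)

    def cell(value, lo, hi, n):
        # Map a value to a cell index in [0, n - 1]; the maximum falls in the last cell.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * n), n - 1)

    cells, counts = [], {}
    for x, z in zip(along_track, height):
        key = (cell(x, x0, x1, n_cols), cell(z, z0, z1, n_rows))
        cells.append(key)
        counts[key] = counts.get(key, 0) + 1

    # For every along-track column, remember the row holding the most photons.
    best_row = {}
    for (col, row), c in counts.items():
        if col not in best_row or c > counts[(col, best_row[col])]:
            best_row[col] = row
    return [i for i, (col, row) in enumerate(cells) if best_row[col] == row]


# Hypothetical track: 170 seafloor photons near -5.5 m plus 5 scattered noise photons.
x = [float(i) for i in range(170)] + [5.0, 40.0, 80.0, 120.0, 160.0]
z = [-5.5 + 0.01 * (i % 5) for i in range(170)] + [0.0, -10.0, -2.0, -8.0, -1.0]
kept = grid_denoise(x, z, n_cols=17, n_rows=10)
print(len(kept), "of", len(x), "photons kept")
```

Because a single cell is kept per column, a seafloor that crosses a cell boundary within one column loses part of its signal, which is consistent with the fixed-grid limitation discussed later in the article.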
The results from two laser beams (20190524 GT2L and 20190524 GT3L) in the Antelope Reef area are shown in Figs. 8 and 9. The two datasets were divided into 100×10 and 150×10 grids [see Figs. 8(c) and 9(c)], respectively, but some noise could not be effectively removed [see Figs. 8(d2) and 9(d2)], and signal photons were missed; the denoising effect is not as good as that of median filtering [Figs. 8(d1) and 9(d1)]. Therefore, we still use the accurate bathymetric photon data obtained by median filtering when performing the SDB of Antelope Reef.
The extracted seafloor photons based on the suitable algorithm are significantly correlated with airborne bathymetric LiDAR data. Compared with in situ bathymetric data, as shown in Table II, the $R^2$ is greater than 99.74%, the RMSE is less than 0.75 m, the MAE is less than 0.50 m, and the MRE is less than 16.85%.

B. Accuracy of Different ML Models
Using accurate bathymetry photons from ICESat-2 ATL03 data and Sentinel-2 multispectral imagery, three ML methods (i.e., LR model, LightGBM model, and CatBoost model) are constructed for three dates in two study areas, Waimanalo and Antelope Reef. To assess the inversion effect, three quality indicators are used: RMSE, MAE, and MRE.
Figs. 10-14 show the error scatter diagrams of retrieved water depth and ICESat-2 ATL03 data at two areas on three dates. The black dashed line is the 1:1 line, whereas the red line corresponds to the regression line. N is the number of data used to validate the models. The better the inversion effect is, the more the scatter points converge to the 1:1 line and the better the model fitting. The worse the inversion effect is, the more the scatter points diverge from the trend line. The figures show that the degree of regression is lower using the LR model. The error scatter diagrams also show that the LR model has a poorer effect in deep water. Concentrated scatter distributions are observed when using the LightGBM and CatBoost models, and the trend lines are close to the 1:1 line; thus, an ideal inversion result is obtained when using these models.
Figs. 15-19 show the shallow water bathymetric maps of Waimanalo and Antelope Reef using multidate satellite images and ICESat-2 ATL03 data obtained by inversion using the LR, LightGBM, and CatBoost ML methods. In Figs. 15-19, panels (a), (e), and (i) show true color images of different dates in each study area. In the Waimanalo study area, it can be seen from the Sentinel-2 true color images that the water depth near the shore on the left side is shallow, the bottom can be seen, and the water depth on the right side is deeper. For the Sentinel-2 images of the Antelope Reef study area, the water depth in the surrounding area is shallow, and the water depth in the middle is deeper. Figs. 18 and 19 show the bathymetric maps of Antelope Reef obtained after inversion, and the water depth distribution is approximately 6 m. Due to the low accuracy of the LR model, the bathymetric map obtained based on the LR model has a large deviation from those obtained from the LightGBM and CatBoost models.
Tables III and IV show the accuracy index results of bathymetric inversion for the Waimanalo and Antelope Reef data. From an overall perspective, significantly improved accuracy is obtained using the LightGBM and CatBoost models compared with using the LR ML model. For the Waimanalo data, the mean RMSE, MAE, and MRE values of the inversion results using the LightGBM and CatBoost models are 0.81 m, 0.605 m, and 18.095%, respectively. For the Antelope Reef data, the average RMSE, MAE, and MRE values of the inversion results using

C. Comparison of Strong and Weak Laser Beams
The 20190609 GT1L and 20190609 GT1R data in the Waimanalo study area, which correspond to the strong laser beam and weak laser beam of the same beam pair, respectively, have an energy ratio of approximately 4:1. The number of photons also differs greatly. The number of strong laser beam photons is larger than the number of weak laser beam photons after denoising. Comparing the inversion results of the strong and weak beams with respect to the three indicators (see Figs. 10 and 11), better performance is obtained using the strong beam than the weak beam.

A. ATL03 Data Processing Procedure
As seen from Figs. 5-7, when processing ICESat-2 ATL03 data, the elevation distribution histogram is first used to remove a large number of outliers that obviously do not conform to the elevation of the study area. For the extraction of sea surface photons, the Gaussian function adopted in this article extracts the sea surface simply and accurately. Manual recognition is used to determine whether Level-2 denoising is performed by using a median filter or grid-based statistical denoising. For the laser beams in Figs. 5 and 6, a remarkable filtering effect can be obtained by using the median filter algorithm. However, for the photons shown in Fig. 7, since the data collected during the day (16:32:47 local time) are greatly affected by the solar background, it is difficult to distinguish signal photons from noise photons. Thus, the proposed grid-based statistical denoising was adopted, and the accuracy is 99%. Although not all noise was removed, a relatively obvious denoising effect was observed and the accuracy was greatly improved. For data whose noise photons and signal photons are uniformly distributed and cannot be accurately denoised using median filtering, the grid-based statistical method can effectively achieve the denoising effect.
Based on the above-mentioned results, we can demonstrate the effectiveness and feasibility of the proposed method. Therefore, the data collected from two laser beams (20190524 GT2L and 20190524 GT3L) on Antelope Reef are denoised using two methods. By trying different grid divisions, the algorithm can achieve a good denoising effect under a suitable grid. At present, the algorithm has some limitations. The divided grid size is uniform, and it is not possible to adjust the grid size for different data. For example, obvious signal photons were missing in Fig. 8(d2), which is an aspect of the algorithm that needs to be improved in the future. However, on the whole, grid-based statistical denoising is comparable to median filtering, a widely used algorithm, and it is applicable to a wider range of data, with strong universality.
Consequently, in both study areas, the denoising process proposed has excellent performance in terms of accurately obtaining bathymetric photons from daytime and nighttime ATL03 data, which is crucial for using ICESat-2 ATL03 data for SDB.

B. SDB With ICESat-2 Photons
To compare and analyze the accuracy of the three models, scatter plots of the errors of the three models for the two study areas and three dates were created, and the RMSE, MAE, and MRE values were calculated. For the three multispectral images used in the same study area, the depth inversion results were affected and biased to some extent due to the different imaging moments of Sentinel-2, which are partially affected by clouds and shadows. In analyzing the inversion results for the three dates, the errors on 16 April 2019 are relatively large at Waimanalo, which may be caused by several obvious clouds in the image and errors in tidal correction [25]. However, the results are generally consistent, mainly because the study areas are more remote and less affected by human activities, and the selected images are almost cloud-free. Therefore, there is no effect on the inversion method, and it is more reliable to take the average value as the accuracy analysis of the bathymetric inversion methods. Overall, the RMSEs of both the LightGBM and CatBoost models are less than 1.12 m, the MAEs are less than 0.82 m, and the MREs are less than 27.01%. The RMSE, MAE, and MRE values of the LR model are less than 3.42 m, 2.90 m, and 93.10%, respectively. Markedly better inversion results are obtained using the LightGBM and CatBoost models than using the LR model. It can be concluded that the nonlinear ML method can better handle SDB in complex and diverse environments compared to the linear ML method. Figs. 10-14 are used to more directly determine the inversion effects of these three methods, and the error scatter plots also show that the LR model has difficulty accurately describing the water depth information above 20 m in the Waimanalo area. The scatter distributions of the LightGBM and CatBoost models are around the reference line, and the inversion results are more satisfactory.
In addition, these models have a greater advantage in solving the bathymetric inversion problem because of the [...] It can be concluded that the bathymetric maps obtained based on the LR model have large deviations from those obtained using the LightGBM and CatBoost models due to the low accuracy of the LR model. The bathymetric maps based on the LightGBM and CatBoost models obtained in this study are generally consistent with the actual topography. Both the LightGBM and CatBoost models are based on the boosting algorithm, in which a set of weak learners is trained and combined with a certain strategy, resulting in a strong learner. LightGBM uses a leaf-wise algorithm with a depth limit to reduce error; thus, better accuracy can be obtained using the same number of splits. In CatBoost, the need for extensive hyperparameter optimization is reduced, which subsequently reduces the probability of overfitting. Therefore, higher inversion accuracy can be obtained by applying these two models to bathymetric inversion problems.
Based on the results of analyzing the strong and weak beams, it can be concluded that the inversion effect is better if there are more data points simulated in the same study area when the depth distribution is approximately the same. When establishing the bathymetric inversion method, more samples of a priori bathymetric data are beneficial to obtain a better training model. Thus, a strong laser beam is used for the actual bathymetric data in the Antelope Reef study area.

VI. CONCLUSION
In this study, we used ICESat-2 ATL03 data and Sentinel-2 multispectral images to build ML methods to obtain accurate bathymetric maps. This approach could be used to overcome the lack of in situ bathymetric data. Meanwhile, we developed a high-precision denoising process with a correlation of more than 99% compared to airborne bathymetric LiDAR data, as well as an accuracy improvement of 6.44%, 2.21 m, 0.94 m, and 2.75% for $R^2$, RMSE, MAE, and MRE, respectively, compared to the results of median filtering. Three ML methods were constructed using the accurate bathymetric photon data obtained by this process as bathymetric data and Sentinel-2 images, and the accuracy of SDB was verified by airborne bathymetric LiDAR data on Waimanalo. In each study area, LightGBM and CatBoost ML methods took full advantage of their ability to solve nonlinear and highly complex problems, resulting in much higher accuracy values, with mean RMSE, MAE, and MRE values of less than 0.91 m, 0.66 m, and 23.17%, respectively.
Comparing our research to the previous studies mentioned above, it can be noted that the biggest improvement is the use of ICESat-2 ATL03 data and the ability to accurately obtain bathymetric photons without many parameters according to the denoising process proposed in this article. Thus, SDB is no longer limited by a lack of in situ bathymetric data. The RMSE obtained by the ML model and verified with the measured water depth data is between 0.49 and 1.12 m. Our results show that the accurate bathymetric photons obtained by using the denoising process proposed in this article are fully capable of replacing the in situ bathymetric data for SDB, and it is very significant to select suitable, high-accuracy, and nonlinear ML methods for SDB. The insights gained from this study may be of assistance to map large areas and perform high-accuracy bathymetry in remote areas. In future research, we will continue to develop automatic extraction methods for signal photons.