The Spatio-temporal Reconstruction of Lake Water Levels Using Deep Learning Models: A Case Study on Altai Mountains

— Monitoring of lake water level (WL) changes with satellite altimeters is pivotal in assessing the dynamics of hydrological ecosystems. However, the spatial coverage, temporal interval, and quality degradation of altimeter data limit the continuity of the measurements. In this study, a learning-based framework is proposed for the reconstruction of WLs for inland lakes and reservoirs. This is achieved by learning the relationship between lake WLs and the related hydrological and climate variables employing the deep learning models. By introducing hydrological knowledge into the data-driven learning framework, the lakes are firstly clustered into several groups for training and prediction considering the spatial homogeneity and heterogeneity of water cycling process among multiple lakes. Secondly, for each cluster category, the number of WL training samples is augmented using the empirical function fitted with lake level-area pairs, and the augmented samples are used in the pretraining process to improve the accuracy of the deep learning model simulation. The obtained models are used for estimating the missing WLs and to construct a consecutive 192-month WL dataset (2003-2018) for the 14 lakes (>20 km 2 ) in the Altai Mountains. The typical multiple layer perceptron and deep belief network models are tested. Validation indicates that the proposed method performs well in WL reconstruction in the case of a large proportion of missing data. Moreover, the performance of learning-based models can be effectively improved by introducing the idea of spatial clustering and pretraining. The comparative tests also show that the proposed method outperforms the traditional level-area fitting methods.


I. INTRODUCTION
HE ongoing global warming trend and intensified human activities have greatly influenced the dynamics of inland lakes, and have brought severe challenges to monitoring global and regional hydrological and ecological changes [1]. As one of the key physical parameters of lake hydrology, the variation of lake water levels (WLs) can directly *  reflect the change of a hydrological ecosystem [2]. Therefore, the dynamic monitoring of lake WLs is of great significance for the assessment of the regional response of lakes to climate change, as well as the utilization and protection of water resources.
Lake WLs are usually measured by in-situ hydrological stations and altimetry satellites. However, hydrological stations are generally sparsely distributed, especially for remote mountainous areas, and are thus unable to meet the demand of dense monitoring of WL fluctuations. Since the late 1980s, the development of remote sensing altimetry techniques and the increased availability of satellite altimeter data have provided us with an effective way to conduct large-scale and highaccuracy monitoring of lake WL changes [3], [4].
Satellite altimeters record the range between the nadir point and sensor by analyzing the waveforms of reflected electromagnetic pulses. According to the working wavebands, altimeter systems can be categorized into two kinds, i.e., laser altimeters and radar altimeters [5]. The radar altimeters carried on board satellites use microwave pulses (e.g., TOPEX/Poseidon, CryoSat-1/2, ENVISAT, ERS-1/2, Jason-1/2/3), and can thus receive returned echoes in all weather conditions, regardless of the cloud cover [6]- [8]. However, the footprint sizes of radar altimetry data are generally too coarse to monitor most of the medium-and small-size lakes [9]. For the satellite laser systems (e.g., the Geoscience Laser Altimeter System (GLAS) on the Ice Cloud, and land Elevation Satellite (ICESat-1)), the altimeters use pulses in the visible and nearinfrared wavelengths to obtain surface elevation with a submeter level accuracy [10], [11]. Although the GLAS data have a finer spatial resolution (smaller footprints and denser points along the tracks) and have been reported to have a higher vertical accuracy compared with radar altimetry measurements, the temporal coverage is limited due to the long repeat cycle (~90 days) and invalid data caused by cloud contamination [12]. Moreover, the GLAS sensor stopped operating in 2009, and 2 thus the data only cover from 2003 to 2009. In September 2018, the subsequent ICESat-2 mission was launched and began to acquire data points with denser tracks [13]. Due to the sparse track distribution and temporal intervals of the pulse points, some researchers have tried to integrate multiple altimetric data sources to extend the spatial and temporal coverage of the survey points [14], [15]. However, the spatial coverage, temporal interval, and quality degradation of altimeter data still limit the continuity of the measurements [16], [17]. Therefore, the reconstruction of lake WLs is necessary for fine-scale monitoring and the evaluation of water resource variations.
One of the common ways to estimate missing WL values is to establish the statistical relationship between the extracted surface area and measured WL of each lake, and to reconstruct the time-series WL data depending on the corresponding water area derived from remote sensing images. This empirical method has been proved to be effective in previous works, where linear and polynomial curve fitting methods have been widely used in estimating WLs based on matched lake area data [18]- [20]. However, the level-area (L-A) fitting approach assumes that the terrain around the lakes is relatively stable, and the performance varies for different lakes. Moreover, the missing WLs can only be reconstructed with known water area values. The water surface area is generally extracted from optical remote sensing images, which are also easily affected by rainy and cloudy weather conditions.
The essence of spatio-temporal reconstruction is to estimate the unknown value of a target object or geographical variable according to the auxiliary attributes. Considering the spatiotemporal correlation among samples, geostatistical methods are commonly used to predict the sampling values to be estimated, e.g., univariate and multivariate interpolation methods [21], [22] and Bayesian geostatistical models [23], which are widely used in geospatial fields. However, these strategies often fail to fully mine the complex nonlinear correlations between the data sources across the spatial and temporal dimensions in the modeling process. As a result, it is difficult to achieve highprecision estimation in cases with large temporal gaps or distinctive spatial heterogeneity. In recent years, machine learning methods has been widely utilized in solving hydrological problems, including the prediction of river flow, lake WLs, and water quality factors [24]- [27]. Artificial neural networks (ANNs) are one of the most popular models, and have the potential to learn the complex nonlinear relationships between multi-dimensional data in a nonparametric manner. The machine learning methods have shown great advantages and potential in estimating WLs, by taking the related geographical and environmental variables into account. However, the current machine learning-based methods still have some limitations. Firstly, most of the current methods for WL prediction train the model for a single lake, and use the temporally dense in-situ WL data for the training process [25], [28]. For basin-scale lake level monitoring with satellite altimetry data, there are generally larger temporal gaps among WL measurements. Moreover, the homogeneous and heterogenous fluctuation states among multiple lakes related to recharge sources, climate patterns, and human activities need to be considered [29]. This situation has increased the difficulty of obtaining accurate WL estimation results.
Deep learning, which is a powerful tool to better represent the nonlinear numerical relationship from raw data with more hidden layers, may be a potential way to address the problem [14], [22], [30], [31]. For example, Zhu et al. tested a recurrent neural network (RNN) to predict the monthly lake WL of 69 temperate lakes in Poland and obtained good estimation results [31]. Wen et al. used a deep neural network (DNN) to simulate and predict the lake level dynamics on a 2-hour time scale, and the results showed that the performance of the deep learning model is beneficial for exploring the mechanism of lake level dynamics [22]. However, the prediction model was still trained separately for each lake in this study, and the potential spatial correlation was ignored. In addition, the long short-term memory (LSTM) neural network is a popular framework for lake WL prediction, which captures the short-and long-term dependencies and learns the sequential relationships [32]- [34]. Nevertheless, this model requires temporally dense sequential data for the training process. On the whole, the deep learning models generally require sufficient training samples to learn the numerical relationship between the input and target variables. In the case of large spatial and temporal gaps, the amount of satellite altimetry data may be insufficient to train a wellestablished deep learning model.
To address the above issues, this study was aimed at introducing deep learning techniques to the reconstruction of lake WLs. The Altai Mountains region was chosen as the study area, where the abundant lakes play a significant role in freshwater supply and are sensitive to regional climate variations. The overall objective of this study was to develop a deep learning-based method to obtain the spatial-temporal complex relationship between lake WLs and the auxiliary components related to terrestrial water storage changes, and to then reconstruct a region-scale monthly WL dataset. The core idea behind this study was to integrate the hydrological knowledge into the data-driven learning framework, including the lake level-area correlation, the spatial homogeneity and heterogeneity of water cycling process among multiple lakes within basin-scale. Specifically, the main contributions include: 1) The lakes were clustered into several groups for training and prediction, based on the different climate conditions and recharge sources. This is different with the single-lake models established in the previous works, where similar or different hydrological changing patterns among lakes could be explored.
2) For each cluster category, the amount of WL data for the model training was augmented using the empirical function fitted with lake level-area pairs. Considering the uncertainties of the fitted WLs, a two-step training strategy was proposed, where the augmented samples were used in the pretraining followed with a fine-tune process to improve the accuracy of the learning model simulation.
3) The deep learning models were trained in the proposed manner of two-step training, and constructed the consecutive monthly WL dataset using the estimated missing WLs for the 14 lakes (>20 km 2 ) in the Altai Mountains from 2003 to 2018.
To clarify the influence of model structure, the typical multiple layer perception (MLP) and deep belief network (DBN) models were involved in the tests. The evaluation of the estimated WL values was conducted using the random testing set and the independent HydroWeb datasets. Finally, the finescale spatial and temporal variation of lake WL change was analyzed based on the reconstructed WL dataset.

A. Study Area
The Altai Mountains (Fig. 1) are located in the middle of Europe-Asia continent, spanning four countries, i.e., China, Kazakhstan, Russia, and Mongolia. The altitude in this region varies from 2000 m in the northwest to over 4500 m in the central area. In the central high-mountain area, there are largearea glaciers with a total area of ~1191 km 2 , which have shown a rapid retreating trend over the past decades [35]. As a result, glacier and snow meltwater is the major water supply for the lakes distributed around this region [36]. The precipitation decreases from northwest to southeast, with an annual average of ~250 mm. Moreover, the precipitation increases with altitude, due to the topography effect [37]. Overall, the Altai Mountains region is widely regarded as a vast transnational "water tower" with a complex distribution of lakes, which is dependent on precipitation, glacial meltwater, and upstream river supply [38]. The scale of human activities in the region is limited, due to the harsh natural environment; however, the increase of land development and industrial water consumption is having an impact on the regional ecosystem and water resource changes [39].
There are more than 40 lakes over 5 km 2 in Altai Mountains region, with a total area of more than 9700 km 2 . In this work, we selected 14 typical lakes (Table Ⅰ) larger than 20 km 2 , whose total area represents over 90% of the overall lake area in this region.

B. Optical Satellite Images
The Landsat Thematic Mapper (TM)/Enhanced Thematic Mapper Plus (ETM)+/Operational Land Imager (OLI) surface reflectance collection image series were used to extract the lake extent of the study area at a monthly scale from 2003 to 2018. The satellite revisit cycle is 16 days, and the spatial resolution of the visible and near-infrared bands is 30 m. To ensure highquality extracted lake area data, 586 scenes were selected with no cloud cover over the lake areas of interest ( Table Ⅱ). The months without a valid cloud-free image were masked as missing observations, as shown in Fig. 2. The water area was extracted based on the automated water extraction index (AWEI) [40]. Through a literature study and experimentation, 0 was used as the threshold for AWEI values to distinguish water and non-water [41].
In this work, the data collection and image processing were implemented on the Google Earth Engine (GEE) platform. Firstly, the Landsat surface reflectance images were selected considering the spatial coverage, temporal resolution, and cloud cover. Since the failure of the ETM+ airborne scan line corrector (SLC) resulted in the loss of data strips after 2003 (Landsat 7 ETM+ SLC-off), the "Gapfill" function was employed to fill the gaps within the data tiles. After water area extraction using the AWEI index, the results were visually examined and the misclassification was manually edited. Thus, the influence of hill shade and glaciers on the extraction of the lake boundaries could be partly eliminated. The paired lake area and the WL data will be used to obtain the regression relationship between the two variables, and the L-A relationship was then used to obtain an initial estimation of WL from the known lake area information for the model construction.

C. Satellite Altimetry Data
The altimetry data for the monthly lake WLs were obtained from the ICESat-1 Level 2 Global Land Surface Altimetry (GLAH14) dataset (2003)(2004)(2005)(2006)(2007)(2008)(2009) [43]. Detailed descriptions of the two altimeters are provided in Table Ⅲ. We integrated the WL data for the 14 lakes obtained by the ICESat-1 and CryoSat-2 altimetric missions to derive time-series WL data for 2003-2009 and 2010-2018, respectively. The spatial and temporal coverage of the initial data extraction is shown in Fig.  2. Due to the low temporal resolution and the exclusion of invalid data points, there are large temporal gaps in the time series. It is worth noting that only ICESat-1 and CryoSat-2 data were incorporated in this study to provide the initial data sequence. The reason for this is that the incorporation of other satellite observations would only slightly extend the overall data coverage. Given that the proposed method is designed to tackle data reconstruction in the case of large spatial and temporal gaps, the extra satellite data were used in the validation stage.
The lake WLs correspond to the elevation of earth surface regarding to the reference ellipsoid measured with satellite altimetry [44]. To maintain consistency with the CryoSat-2 data, the datum of the ICESat-1 GLAS data was converted from the T/P reference ellipsoid to the WGS84 ellipsoid. The ground height observed by the altimeter was then converted to the EGM96 geoid height to obtain the monthly variation of lake WLs [45]. Based on the extracted water surface extent, the temporally matched altimetry points within each lake were segmented. The outliers contaminated with gross errors were then excluded using the three-sigma criteria, and the remaining data values were averaged to represent the monthly WL.

D. Auxiliary Meteorological Data
The Global Land Data Assimilation System (GLDAS) merges satellite-and ground-based observational data to generate optimal fields of land surface states and fluxes, by using advanced land surface modeling and data simulation technologies. There are simulation products of different land surface models, e.g., the NOAH, Mosaic, CLM, and VIC land surface models [46]. Previous studies have shown that the main natural variables affecting lake WL changes are temperature (T), precipitation (P), evapotranspiration (E), surface runoff (Q S ), and underground runoff (Q U ) [29], [47], [48]. Thus, these five meteorological and hydrological parameters were considered in this study for simulating the water cycle process. These parameters were all derived from the GLDAS_NOAH025_M version v2.1 dataset, with a spatial resolution of 0.25 × 0.25 degrees and a temporal resolution of one month [49]. The variables were used to construct the training set as the input auxiliary information.

E. HydroWeb
To evaluate the performance of the reconstruction model, the validation was conducted from two aspects. Except for the validation conducted using the testing set randomly selected from the altimetry measurements of the ICESat-1/GLAS and CryoSat-2 SIRAL altimeters, independent validation was also conducted using the HydroWeb datasets. It is a database established by Laboratoire d 'etudes en Geophysique et Oceanographie Spatiales (LEGOS) [50]. The HydroWeb database provides monthly lake WL and storage data derived from multi-source altimetry data, including ERS-1, T/P, GEOSAT Follow-On (GFO), ERS-2, Jason-1/2, and Envisat data, for over 150 large lakes around the world, starting from 1995. In this paper, HydroWeb water level data cover five lakes including Ulungu, Hala, Dorgon, Uvs and Hyargas in the Altai Study Area.

III. METHODS
The overall flowchart of the proposed WL reconstruction method is shown in Fig. 3, which is composed of four major procedures: 1) Firstly, the data for multiple lakes were clustered based on the different climate conditions, recharge sources, and change patterns.
2) For each lake cluster, the PLSR algorithm was used to select the optimal regional meteorological variables to construct the input set for the model training and testing.
3) The L-A fitted empirical model was used to expand the training samples for each lake, and a two-step strategy was used to train the reconstruction model for each lake category. 4) Based on the well-trained model, the consecutive monthly WLs in the Altai Mountains region from 2003 to 2018 were then reconstructed.

A. Lake Clustering
The lake fluctuations result from the combined influence of climate variations, hydrological conditions, and human activities. Generally speaking, lakes distributed in the same climate zone and hydrologic catchment could show similar behaviors in storage dynamics. Correspondingly, the WL changes of lakes with different recharge sources and hydrological conditions can differ significantly [52]. In this study, the lakes were clustered into several groups for the training and prediction. In this way, the spatial homogeneity and heterogeneity of the hydrologic cycling conditions among the multiple lakes could be considered, and training on clustered multiple lakes could generate more WL samples than single-lake modeling. Following the work of Song et al. [20], we used the cosine similarity index (CSI), which calculates the cosine value of the angle between two vectors in the vector space, to identify the similarity between two lakes' hydrological conditions: where and are the vectors composed of the monthly hydrological and meteorological factors (i.e., T, P, E, Q S , and Q U , as introduced in Section Ⅱ) related to any two lakes for 2003-2018. The results range from −1 to 1, with 1 meaning the same, and 0 indicating no correlation between the two lakes. The CSI values were calculated for each pairing among the 14 lakes, and we obtained 91 non-repetitive values. For the initial cluster centers corresponding to the 14 lakes, the maximum similarity was used to merge the two closest clusters into a new category. The cluster centers were iteratively updated with merging process, and the iterations were stopped when all the lakes were grouped into groups. In this study, the cluster category was set as 4 with a preliminary judgement of the 14 lakes ( Fig. 1 and Table Ⅰ) in terms of spatial distribution, as well as climate and hydrological conditions.
The initial clustering results, as shown in Table Ⅳ, are based on the similarity between the hydrological and meteorological forcing data related to lake variation. However, the response mechanism of the lake WL changes to the environmental variables varies with the lake type, basin landform, and interactive effects. Therefore, lakes with similar environmental conditions may have different pivotal factors reflecting WL changes. Based on these facts, the initial lake clustering results were adjusted with a further consideration of their geospatial relationship, as well as the recharge and discharge conditions. The final clustering results are shown in Table Ⅳ and Fig. 4. Category 1 indicates the plain terminal lakes distributed in the southwestern arid basin of the Altai Mountains. Compared with the initial clusters, Achit Lake, which is located in the humid area and is mainly fed by a glacial river, was adjusted to Category 2. Category 2 refers to the set of lakes distributed in the humid climate zone around the mountain range being supplied from glaciers and rivers, with surrounding altitudes of 1410-1450 m. As an exoreic lake in the arid area of the Mongolian Plateau, and closely connected with Hala Lake, the response mechanism of Dorgon Lake should be more similar to that of the Category 4 lakes, which are far away from the glaciers. The six lakes belonging to Category 3 in the initial clusters were further divided into two types. One category was composed of the small lakes in the middle of the high-altitude Altai Mountains region that are mainly fed by meltwater from neighboring glaciers. The other two large alpine closed lakes located in the northwestern Mongolian Plateau, i.e., Uvs and Hyargas, are primarily affected by evapotranspiration and precipitation, and have shown a significant decreasing trend for water storage in the past decades [39]. These lakes were allocated into a separate category, referring to Category 5.

B. Environmental Variable Selection
The PLSR method was used to quantitatively evaluate the contribution degree of the environmental variables to WL changes [53]. The WL data and the environmental variables (i.e., T, P, E, Q S , and Q U ) can be organized as follows: where is the input matrix composed of the five time-series meteorological factors (i.e., = 5 ) , and is the number of samples corresponding to the effective WL observations. indicates the output WLs. Thus, and 1 denote the elements in the ℎ row and ℎ column of and , respectively. With principal component analysis and residual updating, the PLSR equation for the dependent variables can be correspondingly constructed as: where ̃( ∈ [1, ]) means the normalized vector of ℎ environmental variable, and 0 is the normalized WL dataset. For more details about the PLSR algorithm, we refer the reader to the Appendix. In our work, the regression coefficient ( ∈ [1, ]) between the normalized dependent variable and independent variables was used as the contribution degree of each variable in explaining the WL changes. The higher the coefficient, the greater role of the independent variable in reflecting the variation of the dependent variable. In other words, the lake WL dynamics have a stronger response to the change of the specific environmental forces with larger coefficients, which were selected to construct the optimal subset of forcing factors for the model input.

C. Model Construction 1) Sample Augmentation
As shown in Fig. 2, a large proportion of the WL data is missing, and few data points are available for most of the lakes before 2010. The inadequate number of training data samples and their uneven distribution would degrade the model performance. To extend the spatial and temporal data coverage, we proposed to incorporate the estimated WLs inferred from the lake area in the case of missing altimetry information. In previous works, linear or polynomial curve fitting has been commonly used to establish the relationship between the timeseries WLs and the temporally matched lake areas, and positive results have been obtained [19], [20]. Therefore, the empirical L-A statistical relationship in the Altai Mountains region was constructed using the paired monthly area and WL observations (the red grid cells in Fig. 2) for each lake. In our study, a linear function was employed [54],which can be expressed as: where and respectively indicate the measured area and WL at the ℎ month, while and are the regression parameters. The parameters can be solved by minimizing the residual sum of squares by the use of the ordinary least squares regression algorithm. The WLs can then be estimated using the established relationship with known lake areas.

2) MLP and DBN
Over the past few decades, ANN models have become popular in various geoscience-related fields for classification or prediction problems in a supervised manner [55], [56]. An ANN model is generally composed of an input layer, several hidden layers, and an output layer, where each layer is composed of several neuron units. The nonlinear relationship between the input-output variables can be learned during the training process through the multiple layers in the neural network. With the rapid development of computational power, the potential of ANNs has been improved with deeper and distinguished model structures, where MLP, DBN, and RNN are the popular networks. Among them, the MLP is a representative structure of DNN, with cross-layer fully connected neurons [57]. Through the multiple layers, the neurons receive signals from the previous layer by weighted connections. The weighted inputs are added up and produce the output of the current layer through a nonlinear transfer function, and are then input into the next layer until the output layer is reached. The model optimization is usually solved using the backpropagation (BP) method, with the output errors propagated backward through the network layers and the weights fine-tuned to reduce the summed error loss.
The DBN model proposed by Hinton et al. with greedy layerwise pretraining, was another type of the most widely used deep learning models [58]. The DBN model is composed of several restricted Boltzmann machine (RBM) layers and one BP layer, as shown in Fig. 5. Each RBM is formed by a visible layer and a hidden layer, in which the hidden layer of the first RBM is the visible layer of the next RBM, and the multiple RBMs are stacked to extract the characteristics of the input data. The RBMs in the DBN are trained sequentially, starting from the input layer. Specifically, the hidden layer trained with the visible layer in the first RBM unit can be expressed as: where is the input to the hidden layer from the visible layer, while 0, and indicate the weights and bias for the ℎ neuron. (•) denotes the transfer function, which is defined as shown in Eq. (6). The results are then used as the training input for the next RBM, and the training process terminates when all the layers are traversed. The weights in the ℎ iteration are updated following the contrastive divergence algorithm: is the learning rate. In this equation, 1 is the reconstruction result from hidden layer ℎ 1 based on Eq. (5). Similarly, ℎ 1 and ℎ 1 2 are the corresponding results from and 1 , respectively. Once the input data are fed into the model, the pretraining is firstly implemented in an unsupervised manner to generate the initial parameters. In the following fine-tuning stage with labeled data as supervision, the refined weights of the DBN are trained iteratively, backward through the network, using the BP algorithm [59].

3) Model Training and Testing
The MLP and DBN were employed as the deep learning models for the reconstruction of the missing WLs. Despite the differences in model structures, both the two models were trained and tested in the same way. As shown in Fig. 4, for each lake category, the input variables are the subset of the environmental driving parameters, the spatial location (the latitude and longitude coordinates) of the lakes, and the temporal information at a monthly scale. The output is the lake WL, which denotes the target in this study. To establish a welltrained relationship between the input forces and references, the monthly observed WLs from January 2003 to December 2018 were used for the model training. Before the model training, the L-A fitting was used to augment the amount of training data. The reconstruction model was then trained in a two-step manner. Specifically, the model was first pretrained incorporating both the observed WL data and the empirically estimated values as the target dataset. The pretrained model was then further finetuned using only the true observations acquired by the satellite altimeters, where the pretrained weights were used to initialize the network. In the testing stage, the corresponding components of the input data were imported into the final well-trained models, and the missing WLs could be correspondingly estimated.
The implementation details are as follows. Five DBN or MLP models were trained separately, corresponding to the five lake categories defined in this paper. For each lake belonging to the same category, the input variables were normalized independently. Moreover, to suppress the impacts of the time lag effect and data anomalies, the environmental variables were extracted by the use of a temporal window. This means that, for the reconstruction of the WL at month , the sequential variables at month [ − 2, − 1, ] comprise the model input.
The datasets with measured WL information were then randomly divided into a training set (80%) and a test set (20%), where the test set was used to validate the model prediction performance. The parameters employed for implementing MLP and DBN models were determined after the trail experiments. For more details, please refer to Table A-Ⅰ.

D. Model Evaluation
The determination coefficient (R 2 ), root-mean-square error (RMSE), and mean absolute error (MAE) are used in this paper [60]. The specific calculation methods for the related indicators are as follows: where is the reference, and ̂ is the prediction. ̅ is the average of the observations, and n is the number of samples.
With higher values of R 2 and lower values of RMSE and MAE, the reconstruction method is considered to obtain better results.

A. Results of the Environmental Variable Selection
With regard to the clustered lake categories, the different driving factors reflecting the WL variations were selected using the PLSR algorithm (Section Ⅲ).  As presented in Table Ⅴ, the contribution degree of precipitation is relatively small for all the lake categories, which implies that precipitation might not be the pivotal influencing factor for the variation of lake WL changes in the study area. As introduced in Section Ⅱ-A, the precipitation in this region shows a decreasing trend from northwest to southeast (Fig. 6 (a)), while the 14 lakes are mainly distributed in the central and eastern areas. The annual precipitation shown in Fig. 6 (b) shows that the interannual variation trend of the regional precipitation is not significant. Furthermore, the high-altitude precipitation above the snow line mainly falls as snow, where the contribution can be partly attributed to runoff derived from meltwater. Therefore, glaciers melting and river runoff in this alpine area played more significant roles in lake replenishment. In terms of the water supply sources, the lakes directly fed by glacial meltwater are generally sensitive to the temperature variations. Moreover, for the lakes distributed in the arid/semiarid areas (e.g., Ulungu, Uvs, and Hyargas), evapotranspiration has a significant influence on their storage variation [61]. We empirically set the threshold t as 0.1 to select the optimal subsets of the input for the reconstruction models, which are labeled as the bold values in Table Ⅴ.

B. Results of the Sample Augmentation
The L-A linear fitting method was used for the training sample augmentation, and the fitting results for the 14 lakes are presented in Fig. A1. The fitted R 2 values range from 0.48 to 0.95, which indicates that the correlation between lake area and surface elevation varies with the different lakes. Based on the established empirical relationship, the unknown WL samples with valid lake area data could be reconstructed as an initial estimation. In the second column of Table Ⅵ, we list the number of estimated samples based on the L-A linear fitting for each lake category. As described in Section Ⅲ-C, the WL data measured by satellite altimeter were randomly divided into a training dataset (80%) and a test dataset (20%). The pretraining process was conducted with the training set combining the fitted estimations and the true altimetry WL training dataset. In the fine-tuning stage, only the true observations were used as the target to obtain the final model parameters. It can be observed that the L-A fitting can increase the training samples by an average of 43%. The effectiveness of the sample augmentation with regard to the model performance is validated in next section.

C. Model Performance 1) Model Performance Analysis
The performance of the proposed WL reconstruction method is analyzed in this section, where the effectiveness of the lake clustering, forcing variable selection, and two-step training is respectively discussed. The results are summarized in Tables Ⅶ-Ⅷ, where the accuracy of the estimated WL values is validated using the same test datasets (see Table Ⅵ). In Tables Ⅶ-Ⅷ, the rows referring to "SINGLE" mean that the results are obtained with 14 models trained separately for each lake, which is the common solution adopted by the existing works. Moreover, "ALL" denote the results obtained without the lake clustering process, where the whole training dataset for the 14 lakes was used to train the model. The tables also show the model performance with different input variable sets corresponding to the different lake categories and the results are denoted as "MLP-X" an "DBN-X". Table Ⅷ lists the model performance results with pretraining using the augmented samples, referring to "P-MLP" and "P-DBN".
Firstly, the overall accuracy shows that both the MLP and DBN models can obtain favorable reconstruction results. The DBN model generally achieves a slightly better performance, with the more efficient network structure used to learn the complex nonlinear relationship among the data. The "SINGLE" and "ALL" models show no significant differences in terms of the overall statistics. However, when the characteristics of the lakes in terms of climate and hydrological conditions are considered, the performance shows an obvious improvement, for both the MLP and DBN models. Moreover, model construction based on the selected optimal forcing variables also promotes the simulation accuracy in the majority of the categories. This is reflected in the reduction of the overall RMSE value of 0.972 m for the DBN-X model compared with 1.102 m for the DBN-SINGLE and 1.084 m for the DBN-ALL model. The possible reason for the error reduction is that a better relationship can be learned by the predictors with the distracting elements excluded. On the other hand, the lower dimension of input can also improve the model training efficiency.
Comparing the results presented in Tables Ⅶ-Ⅷ, the main difference is whether augmented samples estimated using L-A linear fitting model are used for the pretraining and to obtain the initialized network parameters. In most cases, the sample augmentation can improve the estimation accuracy. It can be analyzed from the results that the effectiveness of the pretraining process depends on the quality of the fitted samples. For example, for Category 1 consisting of Ulungu and Jili lakes, the accuracy of the results derived from P-DBN-X is slightly lower than that of the original DBN. As shown in Fig. A1, it can be seen that the L-A fitted R 2 for these two lakes is 0.58 and 0.68, respectively. The relatively low regressed correlation for both lakes might indicate undesirable estimations, which would result in an inaccurate pretrained model parameter set. Overall, the proposed P-DBN-X method achieves the best results, with the lowest overall RMSE of 0.912 m and MAE of 0.647 m. The performance of learning-based models can be effectively improved by introducing the idea of spatial clustering and pretraining, with the overall RMSE improvement reached 17.0% for MLP and 17.2% for DBN compared with the SINGLE and ALL models.
Except for the overall statistics, we show the prediction results obtained with DBN using different training strategies for each lake in Fig. 7. Given that DBN-ALL and P-DBN-SINGLE obtain the results with the second best accuracy in Table Ⅶ  and Table Ⅷ respectively, the reconstruction results of those two models were presented. It can be clearly observed from the figures that the red dotted lines referring to as the results obtained by P-DBN-X model have a more similar varying trend with the referenced dark lines. This is consistent with the quantitative evaluation results presented in Tables Ⅶ-Ⅷ. Besides, the quantitative results of DBN modelling with different training strategies for the 14 lakes are presented in Table A-Ⅱ. The results further demonstrate that the proposed method with lake clustering and two-step training generally has significant improvement on the results' accuracy compared with the traditional solutions, especially for the small lakes with limited valid observations.  7. Comparison of predicted lake levels with DBN models using different training strategies with the true observations (testing data) for each lake involved in this study.

2) Comparison with the Empirical L-A Fitting
The empirical L-A fitting model has been widely used in previous works for the estimation of missing WLs. In this section, the deep learning-based reconstruction method is compared with the L-A fitting algorithm. Due to the limited spatial coverage of the sample set, only the 10 lakes (i.e., Ulungu, Jili, Markakol, Ureg, Achit, Har US, Hala, Dorgon, Uvs, and Hyargas) with sufficient samples are involved in the evaluation. The overall quantitative evaluation results are presented in Table Ⅸ, where the validation dataset for the three models is the same for a fair comparison. Because the L-A fitting method can only obtain estimations when the lake area observations are available, the data samples with both valid lake area and WL are divided into subsets for regression (211 points) and validation (80 points) in this comparative analysis. The regression set is used to obtain the L-A statistical relationship for each single lake, and the evaluation results for the derived estimations are presented in the first row of Table Ⅸ. DBN-ALL means the model trained with the whole training dataset of 10 lakes, and DBN-SINGLE indicates the model trained for each lake (L-A estimations not included). Moreover, P-DBN-X incorporate the L-A fitted samples in the pretraining stage. The model training mechanism is almost the same as that described in Section Ⅲ, except that the validation set is excluded from the training dataset for all the models involved in the comparison.
As presented in Table Ⅸ, the L-A fitting method obtains the best results in this test, obtaining the overall lowest RMSE of 0.637 m and MAE of 0.587 m. The P-DBN-X model achieves the second best results, with a similar accuracy (RMSE: 0.667 m, MAE: 0.593 m) to L-A fitting. However, the L-A fitting method is limited to the cases with accurate lake area information. Due to the cloud cover and temporal interval of the satellite observations, it is always difficult to obtain highquality consecutive lake area data. Therefore, the proposed deep learning-based method is more suitable for the reconstruction of fine-scale lake WLs at a basin scale. The  Fig. 8, the RMSE and MAE values of the P-DBN-X results are superior to the L-A fitted estimations. The P-DBN-X method shows a more robust performance, with the test R 2 values ranging from 0.40 to 0.69. In contrast, the regression performance varies with the different lakes. For example, the R 2 value of the L-A fitted results for Har US lake is only 0.02.

3) Independent Validation with the HydroWeb Datasets
Based on the above analysis, the proposed P-DBN-X model is used to reconstruct the monthly consecutive lake WL data for the Altai Mountains region. To further evaluate the accuracy of the reconstructed results, the data from HydroWeb are employed for an independent validation for the five lakes with valid observations in this area, i.e., Ulungu, Hala, Dorgon, Uvs, and Hyargas. The estimated WL values are compared with the reference values by the use of scatter plots in Fig. 9.
The evaluation results show that the reconstructed WLs are highly consistent with the measured WL data, with small RMSE and MAE values below 1 m. The R 2 values, however, differ for the different lakes. For example, the R 2 value for Uvs Lake is 0.12, which shows a relatively low correlation. It can be speculated that this is because the fluctuations of the interannual WL change of Uvs Lake are small, and the overall variation within the study period is less than 1 m. Similar reasoning can be applied when analyzing the results for Ulungu Lake. The small-range WL fluctuations for the lakes can cause low R 2 values, but the low error statistics indicate that the reconstruction model is still reliable.

4) Long-term Variation Trends of the WL Series
Based on the reconstructed WLs, the monthly consecutive lake WL time series could be obtained. The temporal WL variation curves for the 14 lakes during 2003-2018 in the Altai Mountains region are shown in Fig. 10. The curves in Fig. 10 show that the predicted WL values generally agree well with the true observations, and the intra-annual dynamics can be reconstructed well. Comparatively, the reconstructed results of single-lake DBN modelling (Fig. A2) show more violent fluctuations, where the marginal values imply the unstable performance of single-lake models. Moreover, the change trends of the lake WL variations during the study period were analyzed using the Mann-Kendall test (see Appendix), where the changes of the lakes are defined as either slight or significant (Fig. 11).
Among these lakes, Hyargas, Toblo, and Kurgan lakes are the main lakes suffering WL drops. Moreover, Hyargas Lake experienced the most significant WL drop, at ~4.4 m from 2003 to 2018. On the other hand, Ulungu and Jili lakes in the southwestern part of the region are the main lakes showing an increase. From the trend analysis mapping in Fig. 11, eight of the lakes exhibited an insignificant change trend for 2003-2018, and six lakes showed significant changes, of either an increase or decrease in WL.
It can be observed from Fig. 10 that lakes in the same category show similar WL trends of either increasing or decreasing WL. For example, both Ulungu and Jili lakes, which belong to Category 1, show a clear increasing trend, which indicates that these two large plain terminal lakes in southwestern Altai experienced expansion. As for the lakes of Category 2, the three lakes (i.e., Ureg, Markakol, and Achit lakes) maintained a steady WL and show slightly declining trends. Moreover, the lakes of Category 3 (such as Tolbo, Khoton, and Kurgan lakes) exhibited seasonal fluctuations, with the highest lake levels around September. As shown in Fig.  6, precipitation in the central Altai area is at a relatively low level. Therefore, it can be inferred that the WL changes of these lakes can be mainly attributed to the amount of glacier melt runoff. This is consistent with the results of the environmental variable selection, where temperature and runoff were found to be the pivotal driving factors related to lake WL changes of Category 3. For those alpine closed lakes located in the western Mongolia Plateau with low precipitation, e.g., Uvs and Hyargas, evapotranspiration is the main means of discharge. Among these lakes, the continuous declining WL of Hyargas Lake can be related to the hydroelectric dam established in this area. Furthermore, Hala and Dorgon lakes distributed in the southeastern study area first suffered shrinkage and then showed a rising WL during the study period, where the lowest WL occurred around the middle of 2010. The regional meteorological changes (i.e., the rectangular region in Fig. 11) are analyzed in detail in Fig. 12. With the limited precipitation in the western Mongolian Plateau, the lake variations were closely related to the synergistic effects of the regional annual average temperature and evapotranspiration during 2003-2018 ( Fig. 12 (a)-(b)). Specifically, the monthly meteorological factors (Fig. 12 (c)-(d)) show that persistent high temperature and evapotranspiration started in April 2010, and lasted the whole of the summer. Thus, the extreme climate conditions caused significant lake storage deficiency in the late spring and summer of 2010. These findings could be easily missed without the consecutive high-accuracy monthly WL observations.

V. DISCUSSION
The results and analysis in Section Ⅳ show that the spatiotemporal reconstruction of missing lake WLs based on the proposed deep learning-based method is an effective way to obtain monthly consecutive WL records. However, the main uncertainties with regard to the model performance might come from three main aspects. Firstly, the number and distribution of the training samples is one of the main influencing factors. As shown in Table Ⅷ, the overall testing error for the Category 3 lakes (i.e., MAE: 0.946 m) is larger than that for the other types. The main reason for this is that the areas of the lakes in this category are small, and thus the tracks of ICESat-1 GLAS for 2003-2009 provide few valid observations covering the lake extent (as in Fig. 10). The uneven distribution of the training samples can result in model bias. Secondly, the quality of the reconstruction results relies on the input environmental datasets. Although we aim to suppress the anomalies in the climate and hydrological datasets by extracting the inputs within a temporal window, the uncertainties of the simulated environmental elements are inevitable. However, the validation results show that the accuracy of the reconstructed WL values satisfies the demand for high-quality monitoring of lake WL changes. Lastly, the quantitative assessment of the influence of human activities on lake WL variations is rarely considered in the modeling described in this paper. The Altai Mountains stretch across four countries, and are located in a high-altitude region with a harsh natural environment. The materials related to social activities in this region are difficult to access. Fortunately, the strength of the human activities in mountainous regions with relatively low population is much weaker than that in urban areas. Moreover, the influence of water diversion and reservoir storage and release is partly reflected in the runoff variations.
From Fig. 10, it is apparent that the coverage of the original WL observations has different temporal deficiency. The lake WL variations could be monitored at an annual scale for only seven large lakes in our previous work [39]. Furthermore, the correlation between the climate factors and lake changes could only be analyzed using annual average data. As a result, it was difficult to obtain a comprehensive analysis of the dynamics of the lakes with only the limited measured samples. In this study, the monthly variations of 14 lakes in the Altai Mountains are established through the incorporation of reconstructed values. As presented in Section Ⅳ-C, the seasonal fluctuations of lake WLs can reveal the potential change mechanism behind the variations. This will help researchers to further explore the impact of climate change or human activities on the changes of water resources.

VI. CONCLUSION
In this study, we proposed a deep learning-based WL reconstruction method to fill the temporal gaps in satellite altimetry observations for lakes and reservoirs. A 192-month consecutive WL dataset for the 14 lakes larger than 20 km 2 in the Altai Mountains for 2003-2018 was generated. Based on the proposed framework, the related climate and hydrological variables were used as the model input, and the measured WLs were used as the output reference to train the model. The lakes were clustered into several categories for the model training. Moreover, the training samples were augmented using the L-A empirical model, and a two-step pretraining and fine-tuning strategy was proposed to improve the accuracy of the deep learning model simulation. The validation results obtained using the testing dataset (overall RMSE = 0.941 m for MLP, RMSE = 0.912 m for DBN) and the HydroWeb dataset indicated that the proposed method showed a robust performance in WL reconstruction in the case of a large proportion of missing data, while DBN slightly outperformed MLP with more efficient model structure. Moreover, we showed that the performance of a deep learning model can be greatly improved by introducing the idea of spatial clustering and pretraining. The comparative experiments also showed that the proposed method can outperform the L-A fitting methods. Overall, the reconstructed WL datasets can obtain the fine-scale (i.e., monthly or sub-monthly) spatial and temporal variation of lake WL changes at a basin scale. The WL data could be combined with lake bathymetry data to construct quantitative lake water storage changes.
In terms of the limitations of this work, more comprehensive studies are needed to further explore the response of lake WLs to environmental factors and human activities. To adapt the proposed method for use in other study areas with intensive human activities, social and economic variables (e.g., population data, regional gross domestic product) would need to be integrated into the proposed deep learning framework. Moreover, the uncertainty of the regional environmental data derived from the model simulation needs to be considered, and the potential uncertainty could be tackled with multi-source data fusion techniques.

A. Partial Least Squares Regression (PLSR)
The environmental variables considered in this study were temperature (T), precipitation (P), evapotranspiration (E), surface runoff (Q S ), and underground runoff (Q U ). The WL data and the environmental variables can be normalized as follows: where means the input matrix composed of the five timeseries meteorological factors (i.e., = 5), and is the number of samples corresponding to the effective WL observations. indicates the output WLs. Thus, and 1 denote the elements in the ℎ row and ℎ column of and , respectively. Moreover, ̅ is the average value of the ℎ column vector of , and ̅ is the average of . Similarly, and are the standard deviations. Subsequently, the principal components 1 were extracted from the normalized data matrices ̃0 and ̃0 : 1 =̃0 (A4) where denotes the eigenvectors corresponding to the maximum eigenvalues of ̃0̃0̃0̃0 . The extracted results represent the decorrelated components, and their covariance with the target variable 0 is maximized. The expression of ̃0 and ̃0 can then be written as [52]: ̃0 = 1 1 +̃1, ̃0 = 1 1 +̃1 (A5) where 1 and 1 are the regression coefficient vectors derived from the least squares regression, and ̃1 and ̃1 are the residuals of the regression equation for ̃0 and ̃0 , respectively.

B. Mann-Kendall Trend Analysis
The Mann-Kendall (M-K) trend analysis [62] has outstanding applicability in the non-normal distribution of meteorological and hydrological data, and the trend analysis of long-term time series is not easily affected by a few outliers. For a time sequence { | i = 1, 2, ... , n}, the MK trend test statistic is calculated as follows: where is the ℎ data value of the time series; is the length of the data sample; and is a sign function, which is defined as: In this paper, the time series of the reconstructed lake WL data from 2003 to 2018 was used to analyze the corresponding significance levels. In general, when the statistic is greater than 1.96, there is a significant trend change existing in the series.