Predicting changes in spatiotemporal groundwater storage through the integration of multi-satellite data and deep learning models

Continuous monitoring and accurate spatiotemporal predictions of groundwater storage changes can help support sustainable development and efficient groundwater resources management. In this study, remote sensing-based data were used to develop two deep learning predictive models: a long short-term memory (LSTM) model and a convolutional neural network-LSTM (CNN-LSTM) model. The input variables were the terrestrial water storage anomaly (TWSA) recorded by the Gravity Recovery and Climate Experiment (GRACE) satellites and the GRACE Follow-On (GRACE-FO) mission, precipitation measured by the Tropical Rainfall Measuring Mission (TRMM) satellite, average temperature and soil moisture obtained from the Global Land Data Assimilation System (GLDAS), and the normalized difference vegetation index (NDVI) and modified normalized difference water index (MNDWI) acquired from the Landsat 5 and 8 satellites. A comparison of the groundwater storage change predictions of the two deep learning models with in situ measurements of the National Groundwater Monitoring Network (NGMN) in South Korea showed that the CNN-LSTM model [root mean square error (RMSE) = 44.92 mm/month, correlation coefficient (r) = 0.70, index of agreement (IOA) = 0.81] was more accurate than the LSTM model [RMSE = 47.44 mm/month, r = 0.62, and IOA = 0.77]. Furthermore, a comparison of the performance of the two models for three specific regions showed that the CNN-LSTM model captured spatiotemporal variations in groundwater storage better. Additionally, a comparison of cumulative groundwater storage change trends with the NDVI, MNDWI, and TWSA data showed that changes in land cover and total water storage affect groundwater storage. These results demonstrate that the use of parameters recorded by multiple satellites as training data can play an important role in the performance of deep learning models.


I. INTRODUCTION
The importance of a stable water supply and systematic water resource management has increased with the frequent occurrence of abnormal weather conditions due to climate change [1]. Groundwater is a vital freshwater resource that can help overcome these problems, and continuous, systematic monitoring of groundwater changes is essential for designing, developing, and implementing systems for effective groundwater resource management [2]. In South Korea, observation networks are operated nationwide by government departments to continuously monitor groundwater and achieve sustainable use [3], [4]. Reliable and accurate spatiotemporal predictions of groundwater storage changes are important for gauging the amount of groundwater available and efficiently managing groundwater resources [5]-[7]. Groundwater variations are determined by the hydrogeological properties and boundary conditions of groundwater systems as well as by meteorological, hydrological, and land cover change effects. Systematic groundwater prediction is challenging because of various nonlinearities and the complex spatial variability of groundwater [5].
In recent years, remote sensing approaches have been proposed for estimating spatiotemporal groundwater storage changes, especially by combining multi-satellite data, such as data from the Gravity Recovery and Climate Experiment (GRACE) and GRACE Follow-On (GRACE-FO) satellites, the Tropical Rainfall Measuring Mission (TRMM), and Landsat, on the basis of the water balance method [7]-[10]. Although these approaches provide relatively simple and reliable estimates over a wide range of regions, they are not suitable for forecasting or nowcasting groundwater storage changes owing to the low spatial/temporal resolution of satellite data. Furthermore, because the GRACE satellite-based water balance method involves only hydrological components (Δterrestrial water storage, Δsnow water equivalent, Δsurface water, and Δsoil moisture [9]), vegetation and anthropogenic effects are often ignored. Recently, some studies have shown that vegetation and land cover changes, captured by indices such as the NDVI and MNDWI, are dominant factors for predicting groundwater recharge and potential [4], [11], [12]. This study therefore uses multiple satellite data sets covering various influencing factors to increase the reliability of the results.
The main challenge is to predict how changes in various earth system components affect groundwater storage changes, and therefore, it is necessary to analyze the groundwater response to such changes by quantifying the relationship between changes in climate factors, surface water extent, hydrological components, and vegetation [5], [6], [11].
Machine learning approaches, such as those involving artificial neural networks (ANN), support vector machines (SVM), and random forests (RF) based on nonlinear interdependence, have been developed to solve complicated problems in computer vision, image recognition, and time-series forecasting. Various machine learning methods have been applied to the development of groundwater prediction models, e.g., hybrid feedforward neural network (FNN) [6], [13], nonlinear autoregressive models with exogenous input (NARX) [14], [15], and ANN [7], [16] models. Although there have been attempts to predict hydrological components, predicting groundwater storage changes remains challenging because most studies have focused on ground observation point data. In particular, the resolution with which spatiotemporal patterns can be predicted is limited by the scale of the ground observations (location, time interval, etc.) used as input data [17]. In one study that attempted to overcome these limitations, Reference [7] used an ANN algorithm and multi-satellite data for monitoring spatiotemporal variations in groundwater storage and droughts on the Korean Peninsula. While this method was shown to be effective for monitoring groundwater storage variations with relatively good performance (correlation coefficients of r ~ 0.9 and r ~ 0.6 for the training and test periods, respectively), input parameter selection and overfitting in specific regions (e.g., coastal regions and data-scarce regions) are issues that still need to be addressed.
In recent years, deep learning models, such as the long short-term memory (LSTM) model, which is a type of recurrent neural network (RNN), and the convolutional neural network (CNN) model, have been developed to solve long-term dependency problems in complex and nonlinear prediction, and they have shown good performance [18]. These models have multiple hidden layers that facilitate the use of complex and deep network structures to accurately identify the relationship between predictor and target variables [19]. Gradient vanishing (or exploding) problems and overfitting issues faced by machine learning models can be reduced using deep learning models. Furthermore, the rectified linear unit (ReLU) is an activation function that mitigates vanishing gradient problems during the weight update process and can be combined with a high dropout rate [20]. Constructing a predictive model with a deep learning approach typically involves five stages: 1) preprocessing the data for use as input/output data, 2) designing the model with an appropriate number of layers and defining the inputs/outputs, 3) training the model by using loss functions, 4) optimizing the network (hyperparameter optimization), and 5) validating the model.
LSTM and CNN models have been applied in many fields, including water resources applications such as terrestrial water storage [21], [22], surface water [23], [24], and flood forecasting [25], but they have only recently begun to be applied to groundwater prediction [5]. Reference [26] developed groundwater potential maps using SVM and CNN models and showed that the CNN has better predictive ability than the SVM. References [5], [27], [28] compared NARX, LSTM, and CNN models for predicting groundwater levels at the point scale of groundwater observations. Although these deep learning models improved groundwater prediction skill, they were evaluated only at the point scale. In addition, only meteorological observation data (e.g., precipitation and temperature) and in situ groundwater data were considered as input. Therefore, it is necessary to improve spatiotemporal groundwater storage change prediction models by considering various factors, such as meteorological, vegetation, and hydrological components.
In this study, we combined deep learning models with remote sensing-based data for various influencing factors (meteorological, land cover, and hydrological), which can provide continuous spatial information for predicting spatiotemporal changes in groundwater storage. To our knowledge, this is the first attempt to use deep learning for groundwater prediction models based on remote sensing data in South Korea. Considering the potential effectiveness of combining deep learning models and remote sensing image data, this study aimed to (1) develop deep learning models (LSTM and CNN-LSTM) for predicting groundwater storage changes from multi-satellite data and images, (2) generate long-term spatial maps of groundwater storage changes for South Korea, and (3) identify groundwater storage changes related to land cover change.

A. STUDY AREA AND GROUND OBSERVATIONS
The study area covers South Korea (Fig. 1), which is located in temperate East Asia and has a monsoonal climate. There are five major river basins (Han, Nakdong, Geum, Seomjin, and Yeongsan) divided by mountain ridges, and most rivers flow westward because the eastern part of the country is at a higher altitude than the western part (Fig. 1(a)). Shallow alluvial deposits exist along the major rivers, which tend to recharge in the eastern highlands and discharge in the western lowlands. The National Groundwater Monitoring Network (NGMN) in South Korea provides information on groundwater monitoring wells operated by the National Groundwater Information Management and Service Center [29]. There are two types of monitoring wells: those in shallow alluvial aquifers (mean well depth: 30 m) and those in deep bedrock aquifers (mean well depth: 70 m). Only bedrock aquifer monitoring wells have been constructed in areas without alluvial deposits [30], [31]. Based on data availability and continuity, 166 groundwater monitoring stations (95 in alluvial aquifers and 71 in bedrock aquifers) were selected (Fig. 1(b)) and their groundwater levels were obtained. Monthly observations from 2003 to 2019 were used for estimating groundwater storage changes (GWSC) and for training the LSTM and CNN-LSTM models.

B. REMOTE SENSING-BASED DATA
In this study, remote sensing-based data on parameters that directly or indirectly affect groundwater storage were used as input for the deep learning models. The inputs were precipitation (P) and average temperature (T) as meteorological factors; the normalized difference vegetation index (NDVI) and modified normalized difference water index (MNDWI) as proxy factors for land cover; and the soil moisture content (SM) and terrestrial water storage anomaly (TWSA) as hydrological factors. The satellite products adopted are listed in Table 1.

1) TRMM SATELLITE PRODUCT
The TRMM Multi-satellite Precipitation Analysis (TMPA) has provided satellite precipitation products since 1998. TMPA 3B43 V7 is one of the TMPA products; it is obtained by combining TRMM microwave imager data, geosynchronous infrared estimates, and gauge data from the Global Precipitation Climatology Centre (GPCC) [32]. TMPA 3B43 V7 provides monthly precipitation with a spatial resolution of 0.25° (approximately 25 km). Previous studies have evaluated the TMPA 3B43 products over South Korea and have found good agreement with in situ data in terms of the correlation coefficient (r: 0.94-0.96) [33], [34].

2) GLDAS REANALYSIS DATA
The GLDAS is a data assimilation system based on global land surface models, such as the Community Land Model (CLM), Noah, Mosaic, and Variable Infiltration Capacity (VIC) models, that uses ground observations and multiple satellites [35]. It provides various hydrometeorological reanalysis data and is used worldwide for spatial analysis, climate change studies, and weather forecasting. Previous studies have evaluated GLDAS products, such as precipitation, evapotranspiration, and soil moisture, over South Korea and have found good agreement with ground observation data (r: 0.7-0.9) [36]-[39]. In this study, T and the sum of the SM of each layer (0.0-2.0 m; four layers) were extracted from the GLDAS Noah model at monthly intervals with a spatial resolution of 0.25° × 0.25°.

3) LANDSAT SATELLITE-BASED LAND COVER INDEX
Landsat imagery has been widely used to study and monitor various aspects of land cover [40]. Landsat satellites provide global coverage with a resolution of 30 m, and Level-1 T1 images covering South Korea are available from the USGS Earth Explorer [41]. A total of 2,652 images were selected from the Landsat 5 Thematic Mapper (TM) for the period 2003.01-2013.02 and from the Landsat 8 Operational Land Imager (OLI) for the period 2013.03-2019.12. Radiometric calibration and atmospheric correction to remove clouds and shadows were performed using the fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH) algorithm based on metadata files from the USGS [42]. Since there were missing data after calibration, the Landsat images were averaged monthly to generate NDVI and MNDWI time series and mosaicked using nearest-neighbor mean resampling over the study area. The NDVI was calculated, as shown in (1), from the near-infrared (NIR) and red visible (RED) bands of the calibrated Landsat images, and the MNDWI was computed from the green visible (GREEN) and shortwave infrared (SWIR) bands, as shown in (2):

NDVI = (NIR - RED) / (NIR + RED) (1)

MNDWI = (GREEN - SWIR) / (GREEN + SWIR) (2)

The NDVI and MNDWI values range from -1 to 1, and high NDVI and MNDWI values reflect healthy vegetation and water bodies, respectively.
The NDVI and MNDWI, with a 30 m spatial resolution, were converted to fractional values at 0.25° to match the spatial resolution of the other input variables. For fractional value extraction, thresholds were applied to segment vegetation and non-vegetation regions and water and non-water features by considering both indices. The fractions of vegetation and water cover were calculated using the following equations:

F_V = N_veg / (N_veg + N_non-veg) (3)

F_W = N_water / (N_water + N_non-water) (4)

where F_V and F_W are the fractional vegetation and water cover, respectively, and N_veg, N_non-veg, N_water, and N_non-water are the numbers of vegetation, non-vegetation, water, and non-water pixels, respectively.
References [43] and [44] determined NDVI thresholds over South Korea of 0.31 for bare soil and 0.86 as the maximum for vegetation by using Landsat images. Furthermore, many previous studies have set the MNDWI threshold for extracting water bodies to zero. In this study, the threshold was therefore set to the range 0.31-0.86 for vegetation and 0 for water bodies.
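As an illustrative sketch (with hypothetical array values, not the study's actual data), the index computation in (1) and (2) and the threshold-based fraction extraction can be written as follows; the function and variable names are assumptions for illustration:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED), as in (1)."""
    return (nir - red) / (nir + red)

def mndwi(green, swir):
    """MNDWI = (GREEN - SWIR) / (GREEN + SWIR), as in (2)."""
    return (green - swir) / (green + swir)

def fractions(ndvi_img, mndwi_img, veg_range=(0.31, 0.86), water_thr=0.0):
    """Fractional vegetation and water cover within one coarse (0.25 deg) cell,
    using the thresholds of [43], [44] for vegetation and 0 for water."""
    veg = (ndvi_img >= veg_range[0]) & (ndvi_img <= veg_range[1])
    water = mndwi_img > water_thr
    return veg.mean(), water.mean()

# hypothetical 30 m reflectance patches falling inside one 0.25 deg cell
nir = np.array([[0.5, 0.6], [0.2, 0.4]])
red = np.array([[0.1, 0.1], [0.15, 0.3]])
green = np.array([[0.2, 0.1], [0.3, 0.2]])
swir = np.array([[0.1, 0.3], [0.05, 0.25]])
fv, fw = fractions(ndvi(nir, red), mndwi(green, swir))
# fv and fw are the vegetation and water pixel fractions of the cell
```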

4) GRACE AND GRACE-FO SATELLITE MASCON SOLUTION
The GRACE mission, launched by NASA and the Deutsches Zentrum für Luft- und Raumfahrt (DLR), operated between 2002 and 2017. GRACE data capture temporal gravitational field changes due to mass changes in the atmosphere, oceans, water, and ice zones. The success of the GRACE mission led to the launch of GRACE-FO in May 2018 for the same purpose [45].
Changes in the gravitational field can be converted into changes in the equivalent water thickness, or terrestrial water storage (TWS). GRACE/GRACE-FO Level-3 (RL06) data include TWSA data processed in one of two ways: using spherical harmonic coefficients or a mass concentration (Mascon) solution. GRACE/GRACE-FO data are available from various institutes, including the University of Texas at Austin Center for Space Research (CSR), the GeoForschungsZentrum (GFZ), and the Jet Propulsion Laboratory (JPL). In this study, the TWSA was extracted from CSR RL06M (RL06 Mascon), the latest GRACE and GRACE-FO data release [46], [47]. The CSR RL06M TWSA data are provided on a 0.25° × 0.25° grid, but they represent hexagonal tiles of an equal-area geodesic grid with a size of 1.0° × 1.0° at the equator [47]. There was a temporal gap of approximately 14 months between the GRACE and GRACE-FO records. Many studies have reported on filling the gap between the GRACE and GRACE-FO time series using interpolation (linear [48], singular spectrum analysis [48], and cubic spline [49], [50] methods). Reference [48] applied linear interpolation to fill the gap and confirmed that the resulting trend is positively biased by over 10% compared with the original series. The interpolation results of different studies may vary, and the interpolated series will also affect the prediction. Nevertheless, since no other data can replace the GRACE and GRACE-FO data, interpolation methods are widely used. In this study, a cubic spline, which is suitable for time-series interpolation, was applied [49], [50].
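The cubic-spline gap filling can be sketched as follows; the TWSA values below are illustrative only, not actual CSR RL06M data:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# hypothetical monthly TWSA series (mm), indexed by months since the start,
# with a ~14-month gap standing in for the GRACE/GRACE-FO mission gap
t_obs = np.array([0, 1, 2, 3, 18, 19, 20, 21], dtype=float)
twsa_obs = np.array([5.0, -3.0, -10.0, -6.0, 4.0, 8.0, 2.0, -1.0])

spline = CubicSpline(t_obs, twsa_obs)   # passes exactly through observed months
t_full = np.arange(0, 22, dtype=float)
twsa_filled = spline(t_full)            # gap months filled by the cubic spline
```

Because the spline interpolates (rather than smooths), the observed months are reproduced exactly and only the gap months are estimated.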

1) MODEL STRUCTURE
The deep learning models employed in this study were the LSTM and CNN-LSTM models. LSTM is a deep learning algorithm commonly used for regression, especially for time-series prediction, and it also alleviates the vanishing gradient problem in long-term predictions [18]. LSTM is slightly more complex than the traditional RNN in calculating hidden states and adds a cell memory. In the LSTM cell memory, three types of gates (input, forget, and output gates) delete unnecessary memories and determine what should be remembered by the network. LSTM has a conveyor-belt-like structure that carries cell states, and each cell state is updated through learned gates.
For an input sequence X = [x_1, x_2, ..., x_T] and an output set, the process is expressed as follows. The input gate i_t decides which current information to remember, and the candidate state c̃_t memorizes the updated information:

i_t = σ(W_i x_t + U_i h_(t-1) + b_i) (5)

c̃_t = tanh(W_c x_t + U_c h_(t-1) + b_c) (6)

where σ is the activation function, W and U are weight matrices, h_(t-1) is the hidden state at time t-1, and b is the bias term.

The forget gate f_t deletes memories; x_t and the hidden state at t-1 pass through the activation function:

f_t = σ(W_f x_t + U_f h_(t-1) + b_f) (7)

The input gate and forget gate determine the new memory cell state c_t by the elementwise product (⊙):

c_t = f_t ⊙ c_(t-1) + i_t ⊙ c̃_t (8)

Finally, the output gate o_t is used to determine the hidden state h_t. The value of the long-term cell state passes through the hyperbolic tangent function and becomes a value between -1 and 1, and the hidden state values are directed to the output layer:

o_t = σ(W_o x_t + U_o h_(t-1) + b_o) (9)

h_t = o_t ⊙ tanh(c_t) (10)
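A minimal NumPy sketch of a single LSTM cell (with random, purely illustrative weights) makes the gate computations above concrete:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the parameters for the input (i),
    forget (f), candidate (c), and output (o) parts."""
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])        # input gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate state
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])        # forget gate
    c = f * c_prev + i * c_tilde                                # new cell state
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])        # output gate
    h = o * np.tanh(c)                                          # hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 6, 4   # e.g., six input variables, four hidden units (illustrative)
W = {k: rng.normal(size=(n_hid, n_in)) for k in 'ifco'}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in 'ifco'}
b = {k: np.zeros(n_hid) for k in 'ifco'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(3, n_in)):  # unroll over three time steps
    h, c = lstm_cell(x_t, h, c, W, U, b)
# h stays bounded in (-1, 1) because the output gate multiplies tanh(c)
```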
A CNN is advantageous for feature extraction and consists of convolution layers and max-pooling layers. Each convolution layer contains convolution kernels. The CNN-LSTM is a CNN combined with the LSTM architecture, designed specifically for sequence prediction problems with spatial input data such as images. In the CNN-LSTM model, the CNN is used to extract spatial information from satellite images or data, and the LSTM is used for sequence prediction. In other words, after features are extracted from the several input variables by the CNN, temporal information on irregular trends in the time-series components is modeled by the LSTM based on the extracted features.
In this study, the remote sensing data-driven LSTM and CNN-LSTM models were compared and analyzed. The models were constructed using TensorFlow with the Keras library in Python (version 3.6), and Fig. 2 shows the LSTM and CNN-LSTM structures. The LSTM model comprised an input layer, LSTM layers consisting of cell memories, a fully connected dense layer, and an output layer. The remote sensing-based input variables were normalized using the min-max scaling method to avoid disrupting convergence in the network. This method is defined by (11), which transforms the data to the range 0-1:

x' = (x - x_min) / (x_max - x_min) (11)
The ReLU function was adopted as the activation function since it converges faster than the sigmoid or hyperbolic tangent function. The loss function for updating the network was the mean squared error (MSE), coupled with the Adam optimizer. The Adam optimizer is a commonly used technique for determining the weights that minimize the loss function in time-series prediction tasks.
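The min-max normalization in (11) can be sketched as follows, assuming the monthly inputs are stacked column-wise in a NumPy array (the values shown are illustrative, not the study's data):

```python
import numpy as np

def min_max_scale(x):
    """Scale each column of x to [0, 1]: x' = (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    xmin = x.min(axis=0)
    xmax = x.max(axis=0)
    return (x - xmin) / (xmax - xmin)

# hypothetical monthly inputs: columns = [P, T, SM, TWSA, NDVI, MNDWI]
inputs = np.array([[120.0, 15.2, 300.0, -20.0, 0.45, 0.10],
                   [260.0, 24.8, 340.0,  35.0, 0.78, 0.05],
                   [ 30.0,  1.3, 280.0, -55.0, 0.31, 0.12]])
scaled = min_max_scale(inputs)
# each column's minimum maps to 0 and its maximum maps to 1
```

Scaling per variable keeps precipitation (hundreds of mm) from dominating indices that range over [-1, 1] during gradient descent.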

2) HYPERPARAMETER TUNING
Hyperparameters are variables used to configure deep learning models, and they are generally set based on the experimenter's prior knowledge of the optimal implementation of such models [51]. A deep learning model is a complex type of neural network with various hyperparameters, such as the batch size, number of training iterations, and number of hidden layers, that should be tuned before training. Hyperparameter tuning improves the performance of deep learning models and helps avoid overfitting or underfitting problems. Hyperparameter tuning techniques include manual search, grid search, random search, and Bayesian optimization. Bayesian optimization finds hyperparameters by applying Bayes' rule on the basis of the experimenter's prior knowledge, and it takes relatively less time to determine optimal parameters for large datasets than grid search and random search [52].
In this study, Bayesian optimization methods based on Python's Keras tuning library were used for cross-validation to determine the optimal hyperparameters. Hyperparameter sets and the optimization range of values were configured as shown in Table 2. The time series were divided into training and test periods, and these two periods were set to 60% (January 2003 to February 2013) and 40% (March 2013 to December 2019) of the total period, respectively.
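The study used the Keras tuning library; as a self-contained illustration of the idea behind Bayesian optimization, the following NumPy sketch fits a Gaussian-process surrogate to a hypothetical validation-loss curve over candidate lag times and picks new candidates with a lower-confidence-bound rule (the loss surface, kernel, and all names are assumptions for illustration only):

```python
import numpy as np

def rbf(a, b, ls=2.0):
    """Squared-exponential kernel between two 1-D point arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-6):
    """GP posterior mean/std at x_te given observations (x_tr, y_tr)."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = rbf(x_tr, x_te)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y_tr
    var = np.clip(np.diag(rbf(x_te, x_te)) - np.einsum('ij,ij->j', Ks, sol), 0, None)
    return mu, np.sqrt(var)

def objective(lag):
    """Hypothetical validation loss as a function of lag time (months)."""
    return (lag - 6.0) ** 2 / 10.0 + 0.5

lags = np.arange(1.0, 13.0)                      # candidate lags: 1-12 months
tried, losses = [1.0, 12.0], [objective(1.0), objective(12.0)]
for _ in range(5):
    mu, sd = gp_posterior(np.array(tried), np.array(losses), lags)
    order = np.argsort(mu - 1.96 * sd)           # lower-confidence-bound rule
    nxt = next(float(lags[i]) for i in order if float(lags[i]) not in tried)
    tried.append(nxt)
    losses.append(objective(nxt))
best_lag = tried[int(np.argmin(losses))]
```

Each iteration evaluates the most promising untried lag, balancing low predicted loss against high uncertainty; this is the same exploit/explore trade-off that the Keras tuner automates over the full hyperparameter table (Table 2).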

A. MODEL PERFORMANCE DEPENDENCE ON HYPERPARAMETER OPTIMIZATION
In the deep learning modeling process, the hyperparameter settings used when training the model have a significant influence on its performance. The purpose of training is to update the parameters to minimize the loss value through optimization methods [25]. Fig. 3 shows the loss results of both the LSTM and CNN-LSTM models under Bayesian optimization for each lag time and number of hidden layers. The LSTM model had a greater loss for lag times of one and two months than for lag times of three to six months. For lag times of more than three months, the loss value increased with the number of hidden layers.
The CNN-LSTM model performance improved as the lag time increased and converged at six months. The final network architecture for GWSC prediction was designed with one hidden layer and a lag time of three months for the LSTM model and two hidden layers and a lag time of six months for the CNN-LSTM model. The GWSC data predicted by each model were compared with observations at ground stations distributed across South Korea. Groundwater level change data (Δh) from the groundwater stations were converted to GWSC in terms of equivalent water height by using the specific yield (S_y), as shown in (12):

GWSC = S_y × Δh (12)

In this study, the specific yield was set to 0.20, as reported in previous studies [7], [8].
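The conversion in (12) is a simple scaling; a minimal sketch with hypothetical head changes:

```python
import numpy as np

SPECIFIC_YIELD = 0.20  # dimensionless, as reported in previous studies [7], [8]

def gwsc_from_head_change(delta_h_mm):
    """Convert groundwater level change (mm) to storage change in
    equivalent water height (mm): GWSC = S_y * delta_h."""
    return SPECIFIC_YIELD * np.asarray(delta_h_mm, dtype=float)

# hypothetical monthly head changes at one well, in mm
delta_h = np.array([150.0, -80.0, 30.0])
gwsc = gwsc_from_head_change(delta_h)  # -> [30.0, -16.0, 6.0] mm/month
```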
The test results (March 2013 to December 2019) were examined in terms of the root mean square error (RMSE), correlation coefficient (r), and index of agreement (IOA) between the observed and predicted GWSCs to assess the accuracies of the two models. Fig. 4 shows density scatterplots of the correlation between all cells of the predicted GWSC and the in situ measurements at the corresponding locations during the test period. The GWSC predicted by the CNN-LSTM model showed a higher correlation with the in situ measurements (Fig. 4(b)) and higher accuracy (RMSE = 44.92 mm/month, r = 0.70, and IOA = 0.81) than that predicted by the LSTM model. While the LSTM model (RMSE = 47.44 mm/month, r = 0.62, and IOA = 0.77) was less accurate (Fig. 4(a)), its results did not differ significantly from those of the CNN-LSTM model.
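The three evaluation metrics can be computed as follows; the IOA is written here in Willmott's standard formulation, which is assumed to match the study's, and the observation/prediction values are illustrative:

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def corr(obs, pred):
    """Pearson correlation coefficient r."""
    return float(np.corrcoef(obs, pred)[0, 1])

def ioa(obs, pred):
    """Willmott's index of agreement (assumed formulation):
    IOA = 1 - sum((O-P)^2) / sum((|P-Obar| + |O-Obar|)^2)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    obar = obs.mean()
    denom = np.sum((np.abs(pred - obar) + np.abs(obs - obar)) ** 2)
    return float(1.0 - np.sum((obs - pred) ** 2) / denom)

# hypothetical observed and predicted GWSC values (mm/month)
obs = [10.0, -5.0, 20.0, 0.0]
pred = [12.0, -4.0, 18.0, 1.0]
scores = (rmse(obs, pred), corr(obs, pred), ioa(obs, pred))
```

An IOA of 1 indicates perfect agreement and 0 indicates no agreement, which is why it complements the RMSE (scale-dependent) and r (insensitive to bias).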

B. EVALUATION OF PREDICTED GROUNDWATER STORAGE CHANGES
The performance of the deep learning models over South Korea is shown by the time series in Fig. 5. The magnitude and variability of the GWSCs obtained from the LSTM (red dotted line; Fig. 5(a)) and CNN-LSTM (green line; Fig. 5(b)) models were consistent with those of the in situ measurements (black line). The TMPA precipitation during the study period is also presented in Fig. 5. The LSTM and CNN-LSTM models captured both the monthly and seasonal dynamics of the in situ GWSC associated with precipitation and showed stable performance over the test period (March 2013 to December 2019). However, a comparison of the LSTM model's predictions with the in situ measurements indicated that the model tended to overestimate GWSCs. Fig. 6 shows the spatial distribution maps of the RMSE, correlation coefficient, and IOA obtained by comparing the gridded results of the deep learning models with in situ measurements interpolated to a 0.25° spatial resolution by inverse distance weighting (IDW) during the test period. The grids on the red or yellow scale are characterized by large errors and negative or non-significant correlations, respectively, whereas the blue scale represents cells with significant linear correlations. Comparisons between the deep learning model predictions and the in situ measurements showed good agreement overall, with only small areas characterized by high RMSE values. Overall, the CNN-LSTM model's predictions appeared to be better than the LSTM model's, and for both models, higher RMSE values were notably observed in the southwestern coastal and eastern mountainous regions. This error appears to result from the limited number of training samples for these areas.
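The IDW interpolation of station measurements onto the model grid can be sketched as follows; the power parameter and coordinates are illustrative assumptions, as the study's exact IDW settings are not specified:

```python
import numpy as np

def idw(xy_obs, z_obs, xy_grid, power=2.0):
    """Inverse distance weighting: estimate values at xy_grid as a
    distance-weighted average of station values z_obs at xy_obs."""
    d = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)        # guard so a coincident station dominates
    w = 1.0 / d ** power
    return (w @ z_obs) / w.sum(axis=1)

# hypothetical wells (lon, lat) with GWSC values (mm/month)
obs_xy = np.array([[127.0, 37.5], [128.5, 35.8], [126.8, 35.2]])
obs_z = np.array([10.0, -5.0, 3.0])
grid_xy = np.array([[127.0, 37.5], [127.5, 36.5]])  # two 0.25 deg cell centers
z = idw(obs_xy, obs_z, grid_xy)
# a grid point coinciding with a well reproduces that well's value,
# and every estimate stays within the range of the station values
```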

C. SPATIOTEMPORAL GROUNDWATER STORAGE CHANGES AND LAND COVER CHANGES
Monthly GWSC maps generated for South Korea were used to visually evaluate the performance of the proposed deep learning models over the test period. The interpolated in situ data were compared with the values obtained from the two proposed deep learning models. Fig. 7 shows examples of GWSC maps obtained from the predictions of the CNN-LSTM and LSTM models as well as from in situ measurements for March 2019 and August 2019. Overall, the GWSC spatial patterns for both selected months were well predicted by the LSTM and CNN-LSTM models (second and third columns in Fig. 7), as evidenced by a comparison with the corresponding maps generated from the in situ data (first column in Fig. 7).
The CNN-LSTM model simulated the spatial patterns of the GWSCs fairly accurately. The high variability in the eastern areas (blue areas for March 2019) and the northern and southeastern areas (blue and red areas for August 2019) was well matched. For the LSTM model, although the spatial variability pattern of the GWSC appeared to be in good agreement, the predicted maps could not capture the variability as well as the maps derived from the CNN-LSTM predictions and in situ measurements. This showed the superiority of the CNN-LSTM model, which is predominantly attributable to its convolutional filters.
The GWSC, land cover indices, and TWSA were compared to analyze whether the influencing factors considered in the deep learning models affected the GWSCs. The land cover changes (NDVI and MNDWI) recorded by Landsat 5 and 8 and the TWSA from GRACE and GRACE-FO were analyzed by comparing them with the cumulative GWSC (in situ and deep learning predictions) from 2003 onward.
Seoul (A in Fig. 8), a part of Jeollanam-do (B in Fig. 8), and a part of Gyeongsangbuk-do (C in Fig. 8) were chosen for detailed examination as the regions that use the most groundwater. Region A is South Korea's capital and most densely populated city, and its groundwater usage is 6.2% of the total amount of groundwater used in the country; the groundwater usage is 14.3% and 14.0% in regions B and C, respectively [29]. In all three regions, both the vegetation cover fraction and the waterbody cover fraction decreased in 2008 compared with 2003 (Table 3 and Fig. 8), apparently because of the drought that occurred in Korea in 2008 (Table 3).

Fig. 9 shows the time series of cumulative GWSC obtained from the in situ measurements and the LSTM and CNN-LSTM models, and a linear trend is evident for regions A (Fig. 9(a)), B (Fig. 9(b)), and C (Fig. 9(c)). The CNN-LSTM model (green lines in Fig. 9) not only captured the decreasing trend of the in situ GWSCs well (black lines in Fig. 9) but also clearly showed the highest accuracies for the three regions (r = 0.88, 0.92, and 0.82 for regions A, B, and C, respectively). The LSTM model (r = 0.87, 0.87, and 0.81 for regions A, B, and C, respectively) was less accurate (red dotted lines in Fig. 9). The TWSA data of GRACE and GRACE-FO have a direct influence on the hydrological dynamics, including the GWSCs, so the cumulative GWSCs were compared with the TWSA. The GWSC rate was inferred to be −9.09 ± 0.4 mm/year for 2003.01-2019.12 in region A, similar to the rate of decrease in the TWSA estimated from the GRACE and GRACE-FO data (−5.21 ± 0.2 mm/year; Fig. 9(a)). In region B, the depletion rate of the GWSCs was −4.00 ± 0.2 mm/year (Fig. 9(b)), and the TWSA decrease rate was −3.24 ± 0.2 mm/year, comparable to the depletion rate. During the study period, the GWSCs in region C also decreased slightly, at a rate of −4.14 ± 0.2 mm/year, while the TWSA showed a decrease rate of −8.68 ± 0.3 mm/year.
These results indicate that the predicted GWSCs are related to the influencing factors of hydrological and land cover change derived from satellites. Despite the difficulty of evaluating the complex variability of groundwater dynamics using various variables, the proposed prediction approach based on appropriate multi-satellite data is expected to provide a new direction for developing high-performance GWSC prediction models in a relatively simple way.

IV. CONCLUSIONS
Monitoring GWSC is a nonlinear and complex task, and physical modeling requires a large amount of data on various variables and a high proficiency of the researchers. Deep learning models provide a promising alternative for capturing the complex interactions between variables related to GWSC, and satellite data for various variables can be easily incorporated into deep learning models to predict spatiotemporal GWSC. In this study, LSTM and CNN-LSTM models were used to predict GWSC over South Korea by using different feature combinations from multi-satellite data. The deep learning models involved various input variables (precipitation, average temperature, soil moisture storage changes, TWSA, NDVI, and MNDWI). To understand the sensitivity of their predictive performance and avoid overfitting problems, hyperparameter optimization based on Bayesian optimization was performed using different parameter combinations, and the prediction accuracies of the combinations were compared.
Overall, the results showed that the CNN-LSTM model predicted GWSC maps better than the LSTM model in terms of the RMSE, r, and IOA. The LSTM model performed rather well as a shallow neural network (one hidden layer), whereas for spatiotemporal data such as satellite data, the CNN-LSTM model with an added convolution layer appeared to be better. The optimal lag time was three months for the LSTM model and six months for the CNN-LSTM model; the latter model showed better performance with a longer lag time than the former.

FIGURE 9. Cumulative GWSC time series for specific regions: (a) region A, (b) region B, and (c) region C, compared with the TWSA from GRACE and GRACE-FO.
These results indicate that the deep learning models based on satellite data proposed in this study (with meteorological, hydrological, and land cover change indicators) are suitable for long-term groundwater change modeling. It is concluded that considering influencing factors such as precipitation, land cover changes, and the TWSA enabled the models to accurately predict spatiotemporal changes in groundwater storage. Predictions for some areas with relatively low accuracy may be improved by considering additional input data related to GWSCs or other hyperparameter optimization methods. Furthermore, relatively accurate predictions of GWSCs, such as those provided by the proposed models, can be used to determine appropriate measures to deal with climate change, thereby helping society respond to the challenges it poses.