A Novel Seepage Behavior Prediction and Lag Process Identification Method for Concrete Dams Using HGWO-XGBoost Model

Seepage monitoring is a vital task in the risk management of concrete dams. Considering the lag effect of input factors, this paper presents a novel seepage monitoring model for concrete dams and proposes an effective method for identifying the lag process. Firstly, extreme gradient boosting (XGBoost) was adopted to predict the dam seepage. Hybridizing grey wolf optimization (HGWO), which integrates differential evolution (DE) into grey wolf optimization (GWO), and five-fold cross validation were utilized to optimize the hyper-parameters of XGBoost. Secondly, under the same search range and four evaluation indicators, the models optimized by HGWO and by three other algorithms were compared to confirm the global optimization capability of HGWO. Six state-of-the-art methods were also introduced to verify the effectiveness and feasibility of the proposed model. Then, based on the computation method of factor importance in decision tree models, we evaluated the relative importance of each component in the proposed model. Finally, according to the factor importance, the lag process of upstream water level and rainfall was identified, and a new equivalent water level calculation method was proposed. Monitoring data from three piezometric tubes on a concrete dam were taken as the experimental object. The results show that the improved HGWO has stronger global optimization ability, and the HGWO-XGBoost model achieves satisfactory prediction for seepage in concrete dams. Compared with the traditional trial-and-error method, the lag process computation method proposed in this paper provides a better recognition effect, which is of great value to the seepage monitoring and control of concrete dams.


I. INTRODUCTION
Dam safety monitoring is an essential means to control risk and understand the operational status of a dam. A large number of instruments are embedded in dams to monitor all aspects of daily behavior [1]. Seepage control is one of the most important tasks in dam surveillance [2], [3]. Therefore, in order to grasp the seepage law of dams, it is essential to establish a reasonable monitoring model on the basis of massive data.
The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Cusano.
Generally, seepage is related to the loads acting on the dam, and the magnitude of dam seepage directly reflects the anti-seepage and drainage performance of the dam. Previous research has indicated that water level, rainfall, temperature and aging are the main factors influencing dam seepage [4], [5]. However, the transfer and dissipation of water pressure in an unsaturated body take a certain time. Compared with dam displacement, the lag effect of these factors makes the dam seepage model more complicated and the seepage more difficult to predict.
Traditional statistical models mostly adopt multiple regression or stepwise regression, assuming that the variables are independent, to solve the mapping relation between dam seepage and explanatory variables [6]-[8]. But this assumption is often invalid because of multicollinearity among variables, which may severely limit the robustness and accuracy of the model [9]-[11]. Furthermore, to account for the lag effect, a large number of previous water level and rainfall factors must be included in the statistical model, which makes the model prone to ill-conditioned problems.
To solve these problems, Wu and Gu greatly simplified the factors by using the previous segmental average values or equivalent values of the upstream reservoir water level and rainfall, based on the lag influence function, to establish different forms of seepage monitoring models [12]. Even so, the information loss in the reduction process, as well as the nonlinear relationship between explanatory variables and dependent variables, still restricts the accuracy of the model to a certain extent.
With the rapid development of computer technology, plenty of data-driven machine learning algorithms have been proposed. Although large amounts of computation are usually required, their high accuracy has led scholars to gradually shift their research focus to these methods [13]-[17].
In recent years, research on applying machine learning methods to dam seepage analysis has been carried out, and positive results have been achieved. Sharghi et al. [18] adopted a neural network, a support vector machine and an adaptive neural fuzzy inference system to establish a monitoring model of piezometric heads in an earthfill dam, and integrated the three methods to improve the prediction performance. Shi et al. [19] used a radial basis function neural network optimized by a genetic algorithm to predict the seepage discharge of a concrete face rockfill dam. Based on the data mining method, Hu and Ma established a zoned safety monitoring model to estimate the uplift pressures of concrete dams [20]. Nourani et al. [21] proposed a statistical model of the piezometric tube water level by using feed-forward back-propagation and radial basis function neural networks. Chen et al. [22] developed a kernel extreme learning machine to study dam leakage, and employed global sensitivity analysis to evaluate the importance of each input factor. Salazar et al. [23] analyzed the leakage problem of the La Baells Dam with boosted regression trees and explored the interpretability of the model. Wang et al. [24] decomposed the piezometric tube water level and the reservoir water level into a base value plus a daily variation, and then proposed a seepage statistical model combining linear regression and support vector regression (SVR).
It can be seen from the literature that, compared with traditional statistical methods, machine learning algorithms can explore the implicit relationships among variables more deeply and achieve better prediction performance. At the same time, however, the interpretability of the models is usually sacrificed. Although some scholars have attempted to improve the interpretability of dam seepage monitoring models (such as [22]-[24]), most of them merely evaluated the importance of the previous segmental average values of input factors. An average value is a relatively general concept that cannot directly reflect the specific lag process of the relevant variables.
In order to accurately evaluate the lag behavior of dam seepage, this study takes the piezometric tubes of a dam as an example. Based on extreme gradient boosting (XGBoost), which has been recognized as one of the most advanced machine learning algorithms in recent years [25]-[28], a monitoring model of the uplift pressure of a concrete dam is established with consideration of the time lag effect. Under cross-validation, hybridizing grey wolf optimization (HGWO) is applied to optimize the hyper-parameters of the XGBoost model. Then four quantitative evaluation indicators are used to comprehensively evaluate the advantages of HGWO and of the proposed model against six state-of-the-art baseline models. According to the growth process of decision trees [29], [30], the lag processes of upstream water level and rainfall are identified, and a new computation method of the equivalent water level is proposed.
The structure of this paper is as follows: Section II briefly describes the relevant methods and theories involved in this study. Section III introduces the research design, the implementation of the model and the experimental results in detail, in combination with a specific engineering case. Finally, the main conclusions and future work are given in Section IV.

II. METHODOLOGY
A. VARIABLES CONSIDERED FOR THE SEEPAGE MONITORING MODEL OF CONCRETE DAMS
The measured data show that the uplift pressure of a concrete dam foundation is mainly affected by the upstream and downstream water levels. The slope seepage induced by rainfall also has a certain influence on the uplift pressure. The fissure size of the bedrock changes as a result of temperature fluctuation. In addition, given the degradation of impermeable material performance, an aging component should also be included. Therefore, the traditional hydraulic, precipitation, temperature, and time effect (HPTT) statistical model of piezometric tube water level can be expressed as [2], [12]:
$$Y = Y_{Hu} + Y_{Hd} + Y_p + Y_T + Y_\theta \tag{1}$$
where $Y$ is the piezometric tube water level at the monitoring point; $Y_{Hu}$ and $Y_{Hd}$ respectively denote the upstream and downstream water level components; $Y_p$, $Y_T$ and $Y_\theta$ refer to the rainfall component, temperature component and aging component, respectively.

1) UPSTREAM WATER LEVEL COMPONENT $Y_{Hu}$
The equivalent value of the upstream reservoir water level is a weighted average of the previous water levels:
$$\bar{H}_u = \sum_{i=1}^{n} w_i H_i \tag{2}$$
where $H_i$ ($i = 1, 2, \cdots, n$) represents the $i$th previous water level; $w_i$ is the weight of $H_i$ and $\sum_{i=1}^{n} w_i = 1$. Thus the upstream water level component can be obtained as:
$$Y_{Hu} = a_u \bar{H}_u \tag{3}$$
where $a_u$ is the regression coefficient; $\bar{H}_u$ denotes the equivalent value of the upstream reservoir water level. It is generally believed that the influence process of upstream water level on dam seepage basically follows a normal distribution (see Figure 1). The weight of upstream water level on the $t$th day before the monitoring day can be expressed as [2], [12]:
$$w(t) = \frac{1}{\sqrt{2\pi}\,x_2} \exp\left[-\frac{(t - x_1)^2}{2 x_2^2}\right] \tag{4}$$
where $x_1$ is the number of lag days of the upstream water level on uplift pressure; $x_2$ is the number of influence days of the upstream water level; both need to be determined by trial [31], [32]; $w(t)$ satisfies $\int_{-\infty}^{0} w(t)\,dt = 1$. The equivalent value of the upstream reservoir water level can be expressed as:
$$\bar{H}_u = \int_{-\infty}^{0} w(t)\,H(t)\,dt \tag{5}$$
where $H(t)$ is the upstream water level at time $t$. Generally, there is only one measured value of upstream water level per monitoring day. Therefore, in practical application, the continuous integral can be replaced by a discrete sum, and the summation interval can be taken as 2 to 3 times $x_2$.
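As a minimal sketch of the discrete form of the equivalent water level, the Python snippet below builds normal-shaped weights for the days preceding the monitoring day and forms the weighted average. The weight shape $\exp[-(t - x_1)^2 / (2x_2^2)]$, the function names `lag_weights` and `equivalent_level`, the default truncation window of about $x_1 + 3x_2$ days, and the renormalisation of the truncated weights are assumptions made for illustration only.

```python
import math


def lag_weights(x1, x2, window=None):
    """Normal-shaped weights for t = 0, 1, ..., window-1 days before the monitoring day.

    x1: assumed number of lag days (centre of the bell curve)
    x2: assumed number of influence days (spread of the bell curve)
    """
    if window is None:
        # Assumed truncation: cover the peak plus three spreads.
        window = int(math.ceil(x1 + 3 * x2)) + 1
    raw = [math.exp(-((t - x1) ** 2) / (2.0 * x2 ** 2)) for t in range(window)]
    total = sum(raw)
    # Renormalise so that the truncated, discrete weights sum to 1.
    return [r / total for r in raw]


def equivalent_level(levels, x1, x2, window=None):
    """Weighted average of past levels; levels[t] is the value t days before the monitoring day."""
    weights = lag_weights(x1, x2, window)
    if len(levels) < len(weights):
        raise ValueError("not enough history for the chosen window")
    return sum(w * h for w, h in zip(weights, levels))
```

In a trial-and-error search, `equivalent_level` would be re-evaluated for each candidate pair ($x_1$, $x_2$) and the pair giving the best regression fit would be retained.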

2) DOWNSTREAM WATER LEVEL COMPONENT $Y_{Hd}$
The downstream water level also has a lag effect on uplift pressure. But since the downstream water level is generally measured less frequently and fluctuates gently, only the downstream water level of the monitoring day is taken as a factor [2]. The downstream water level component can be written as:
$$Y_{Hd} = a_d H_d \tag{6}$$
where $a_d$ is the regression coefficient; $H_d$ is the downstream water level of the monitoring day.

3) RAINFALL COMPONENT $Y_p$
During rainfall, part of the precipitation forms surface runoff, which flows into the reservoir and changes the water level; the rest becomes groundwater. The groundwater then percolates through the bedrock joints and fissures and affects the uplift pressure of the dam foundation. There is an obvious lag effect between the uplift pressure and rainfall. Similar to the upstream water level component, the rainfall component can be expressed as:
$$Y_p = d_i \bar{P}, \qquad \bar{P} = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi}\,x_4} \exp\left[-\frac{(t - x_3)^2}{2 x_4^2}\right] P(t)\,dt \tag{7}$$
where $d_i$ is the regression coefficient; $\bar{P}$ denotes the equivalent value of rainfall; $P(t)$ is the rainfall at time $t$; $x_3$ and $x_4$ are the number of lag days and influence days of rainfall, respectively.

4) TEMPERATURE COMPONENT $Y_T$
The bedrock fissures are affected by temperature fluctuation, and the uplift pressure of the dam foundation changes correspondingly. The temperature of the bedrock basically cycles with an annual period. In the absence of measured bedrock temperature, multi-period simple harmonic waves can be used as the temperature variables [2], [12]. The temperature component can be expressed as:
$$Y_T = \sum_{i=1}^{n} \left[ b_{1i} \sin\frac{2\pi i t}{365} + b_{2i} \cos\frac{2\pi i t}{365} \right] \tag{8}$$
where $n = 1$ or $2$; $t$ is the cumulative days from the initial monitoring day; $b_{1i}$ and $b_{2i}$ are the regression coefficients.

5) AGING COMPONENT $Y_\theta$
The aging component is an important component of uplift pressure. It changes rapidly at the initial impounding stage and then becomes stable with time. The common expression is as follows:
$$Y_\theta = c_1 \theta + c_2 \ln\theta \tag{9}$$
where $\theta$ is the cumulative days from the initial monitoring day divided by 100; $c_1$ and $c_2$ are the regression coefficients.
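To sum up, the traditional statistical model of the piezometric tube water level at a single monitoring point located in the dam foundation can be expressed as follows:
$$Y = a_0 + a_u \bar{H}_u + a_d H_d + d_i \bar{P} + \sum_{i=1}^{n} \left[ b_{1i} \sin\frac{2\pi i t}{365} + b_{2i} \cos\frac{2\pi i t}{365} \right] + c_1 \theta + c_2 \ln\theta \tag{10}$$
where $a_0$ is the constant term; the remaining symbols have the same meaning as above.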
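To make the structure of the statistical model concrete, the sketch below assembles one row of regressors for a single monitoring day. It assumes annual harmonics with a 365-day period for the temperature component and the pair ($\theta$, $\ln\theta$) for the aging component; the function name `hptt_row` and its argument names are hypothetical, and the equivalent values of water level and rainfall are taken as already computed.

```python
import math


def hptt_row(h_u_eq, h_d, p_eq, t_days, n_harmonics=2):
    """Regressors of an HPTT-type model for one monitoring day.

    h_u_eq: equivalent upstream water level
    h_d:    downstream water level on the monitoring day
    p_eq:   equivalent rainfall
    t_days: cumulative days since the initial monitoring day (must be > 0)
    """
    row = [1.0, h_u_eq, h_d, p_eq]          # constant term and hydraulic / rainfall factors
    for i in range(1, n_harmonics + 1):     # temperature harmonics (assumed 365-day period)
        angle = 2.0 * math.pi * i * t_days / 365.0
        row.append(math.sin(angle))
        row.append(math.cos(angle))
    theta = t_days / 100.0                  # aging factors
    row.append(theta)
    row.append(math.log(theta))
    return row
```

Stacking such rows over all monitoring days gives the design matrix of a multiple linear regression, whose fitted coefficients play the role of the regression coefficients in the statistical model.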

B. THE METHODOLOGY OF XGBoost
1) THE BOOSTING TREE METHOD
The boosting method based on decision trees is called the boosting tree algorithm, which is a binary classification tree for classification problems and a binary regression tree for regression problems. It is considered as one of the methods with the best performance in statistical learning [33].
The purpose of machine learning is usually to use training data to find the mapping relationship between independent variables and response variables. Unlike traditional machine learning methods, the boosting tree algorithm is optimized in function space. In order to find the optimal mapping function, an objective function $L(y, F(x))$ is usually defined, and $F(x)$ is solved by minimizing the expected value of the objective function over the joint distribution of $(x, y)$:
$$F^{*} = \arg\min_{F} E_{y,x}\, L(y, F(x))$$
As for a given sample combination:
$$\{x_i, y_i\}_1^N$$
where $x_i = \{x_{i1}, x_{i2}, \cdots, x_{ik}\}$ and $y_i$ are the feature vector and response variable of the $i$th sample, respectively. When the joint distribution of $(x, y)$ is estimated by finite data samples $\{x_i, y_i\}_1^N$, $E_y[\cdot \mid x]$ cannot be computed accurately; it can only be obtained by optimizing the objective function through the following parameterized form:
$$F(x) = \sum_{m} \beta_m h(x; a_m)$$
where $h(x; a)$ denotes the weak learner, which is the decision tree in this case; $\beta$ is the weight of $h(x; a)$; $a$ refers to the parameters of decision trees. To prevent overfitting, Friedman [34] proposed a shrinkage method, which helps the algorithm approximate the result gradually by applying a small weight to each decision tree, similar to $\beta$ of equation (14). Because of shrinkage, each iteration leaves enough room for the subsequent decision trees to improve the model.
Therefore, the boosting tree method can be expressed as the additive model of decision trees:
$$F(x) = \sum_{m=1}^{M} \rho\, h(x; a_m) \tag{15}$$
where $M$ is the number of decision trees and $\rho$ denotes the learning rate, that is, the weight of weak learners.

2) XGBoost
The XGBoost model can be expressed as:
$$\hat{y}_i = \sum_{k} \rho f_k(x_i; a_k) \tag{16}$$
where $\hat{y}_i$ is the predicted value of the $i$th sample; $f_k$ denotes the $k$th decision tree; $\rho$ and $a$ have the same meaning as the corresponding symbols in equation (15). In order to get the optimal model, the objective function can be written as:
$$Obj = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \tag{17}$$
where $y_i$ is the measured value of the $i$th sample; $T$ is the number of leaf nodes in a single tree; $w$ refers to the vector of scores on the leaf nodes of a single tree; $l$ is the loss function between the predicted value and the measured value of the sample; $\Omega$ denotes the regularization term to prevent overfitting; $\gamma$ and $\lambda$ are coefficients. The boosting tree method adopts a forward stagewise algorithm. Taking $F_0(x) = f_0(x)$ as the initial value of the model, a decision tree is obtained in each iteration by solving the objective function and is used to update the current model; the optimal model $F(x)$ is obtained after several iterations.
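When the structural form of the objective function is relatively simple, equation (14) is easy to solve. Under these circumstances, the direction in which the current objective function approaches the minimum is the global optimal direction. But for the general form, it is difficult to solve. To address this issue, the XGBoost model uses the gradient boosting method to optimize the objective function. The second-order Taylor expansion of the objective function at the $t$th iteration can be expressed as:
$$L^{(t)} \simeq \sum_{i=1}^{N} \left[ l\left(\hat{y}_i^{(t-1)}, y_i\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{18}$$
where $g_i = \partial l\left(\hat{y}_i^{(t-1)}, y_i\right) / \partial \hat{y}_i^{(t-1)}$ and $h_i = \partial^2 l\left(\hat{y}_i^{(t-1)}, y_i\right) / \partial \left(\hat{y}_i^{(t-1)}\right)^2$ are the first and second derivatives of the loss function, and $\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$ is the regularization term. At the $t$th round, $l\left(\hat{y}_i^{(t-1)}, y_i\right)$ is already known and can be dropped as a constant. Let $I_j$ be the set of instances assigned to the $j$th leaf node and $w_j$ the value on the $j$th leaf node. Hence equation (18) can be simplified as:
$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T \tag{19}$$
Setting $\partial \tilde{L}^{(t)} / \partial w_j$ equal to 0, the optimal vector of scores on leaf nodes can be obtained as:
$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \tag{20}$$
By substituting equation (20) into equation (19), equation (19) can be written as:
$$\tilde{L}^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \tag{21}$$
Finally, the optimization of the objective function is transformed into the optimization of equation (21). Equation (21) can also be used to evaluate the structure quality of decision trees at the $t$th iteration of the XGBoost model. When the structure of the trees is determined, the quality is only related to the first and second derivatives of the loss function.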
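The closed-form leaf score and the resulting structure quality can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the loss is taken to be the squared error $l = \frac{1}{2}(\hat{y} - y)^2$, so that $g_i = \hat{y}_i - y_i$ and $h_i = 1$, and the helper names `squared_error_grads`, `leaf_weight` and `structure_score` are hypothetical and are not functions of the XGBoost library.

```python
def squared_error_grads(y_true, y_pred):
    """First and second derivatives of l = 0.5 * (y_pred - y_true)**2 w.r.t. y_pred."""
    g = [p - y for y, p in zip(y_true, y_pred)]
    h = [1.0 for _ in y_true]
    return g, h


def leaf_weight(g, h, lam):
    """Optimal score of one leaf: -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)


def structure_score(leaves, lam, gamma):
    """Quality of a fixed tree structure (lower is better).

    leaves: list of (g_list, h_list) pairs, one pair per leaf node.
    """
    score = 0.0
    for g, h in leaves:
        g_sum, h_sum = sum(g), sum(h)
        score += -0.5 * g_sum * g_sum / (h_sum + lam)
    return score + gamma * len(leaves)
```

Comparing `structure_score` before and after a candidate split gives the gain used to decide whether the split is worth making; larger `lam` and `gamma` make splits harder to justify.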

C. FACTORS IMPORTANCE COMPUTATION
There are several alternative ways to calculate feature importance. The importance of a feature can be described by the total number of times it is selected during the growth of decision trees. Alternatively, feature importance can be defined as the proportion of the model improvement obtained when the feature is selected as the segmentation variable to the total improvement across all splits [35]. The improvement index can be the reduction in error, the information gain, the reduction of the Gini index, and so on.
The variation amplitude of these indexes decreases gradually during the generation of gradient boosting decision trees. Therefore, the improvement at node splits of the first few decision trees plays a decisive role in the computed factor importance. However, XGBoost subsamples both samples and features in each node split. That is, when the magnitude of improvement in a metric is taken as the definition of feature importance in the XGBoost model, the results may depend on how the first few decision trees happen to be generated, which is largely a matter of chance and not sufficient to reflect the actual situation. For this reason, in this study we describe feature importance by the number of times a feature is selected as the segmentation variable.
For ensemble learning algorithms built on decision trees, Breiman et al. [36] proposed a method to define the importance of input factors:
$$I_k^2(F) = \frac{1}{M} \sum_{m=1}^{M} I_k^2(T_m), \qquad I_k^2(T_m) = \sum_{j=1}^{J-1} 1_j(X_k) \tag{22}$$
where $T_m$ is the $m$th decision tree in model $F$; $M$ is the total number of trees; $I_k^2(T_m)$ denotes the importance of factor $X_k$ in $T_m$; $J$ is the number of leaf nodes, so that $T_m$ contains $J - 1$ non-terminal nodes; $1_j(X_k)$ is the indicator function of whether variable $X_k$ is selected as the segmentation variable at the $j$th node.
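The count-based definition of importance can be sketched as follows. The data structure is an assumption made for illustration: each tree is represented simply as the list of feature names used at its non-terminal nodes, and the function names `split_count_importance` and `normalise` are hypothetical.

```python
from collections import Counter


def split_count_importance(trees):
    """Average number of times each feature is chosen as a split variable per tree.

    trees: list of trees; each tree is a list of the feature names
           used at its non-terminal nodes (assumed representation).
    """
    counts = Counter()
    for split_features in trees:
        counts.update(split_features)
    n_trees = len(trees)
    return {feature: c / n_trees for feature, c in counts.items()}


def normalise(importance):
    """Rescale importances so that they sum to 1 (relative importance)."""
    total = sum(importance.values())
    return {feature: v / total for feature, v in importance.items()}
```

With the XGBoost Python package, a comparable split count is what `Booster.get_score(importance_type="weight")` reports, although that call returns raw counts rather than per-tree averages.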

D. THE METHODOLOGY OF HGWO
1) GWO
GWO is a swarm intelligence optimization algorithm proposed in 2014 by Mirjalili et al. [37], which simulates the cooperative mechanism of a gray wolf pack during predation. The intelligent feedback mechanism and adaptive convergence factor enable it to balance global search and local optimization. For this reason, GWO performs well in solution precision and convergence speed, while being simple in structure and easy to implement [38].
Gray wolf packs consist of α, β, δ and ω wolves. The α wolves, which usually denote the best solutions, are the highest leaders of the pack; the β wolves, which assist the α wolves, are the sub-optimal individuals; the δ wolves, which represent the third-fittest solutions, carry out the orders of the α and β wolves while managing the ω wolves; the ω wolves are the search individuals. The α, β and δ wolves jointly assess the prey position and guide the ω wolves to move, so as to encircle the prey from all sides and finally attack and capture it (see Figure 2).
GWO simulates the social hierarchy, division of labor, and hunting process of gray wolf packs, and specifically includes the following three steps:
1. Surround the prey: The gray wolves search for and gradually approach the prey. The mathematical model can be written as:
$$D = \left| C \cdot X_p(t) - X(t) \right|, \quad X(t+1) = X_p(t) - A \cdot D, \quad A = 2a \cdot r_1 - a, \quad C = 2 r_2 \tag{23}$$
where $t$ is the number of iterations; $X_p$ represents the position vector of the current prey; $X$ denotes the position vector of the grey wolf population; $A$ and $C$ are coefficient vectors; $r_1$ and $r_2$ are random vectors with values in $[0, 1]$; $a$ decreases linearly from 2 to 0 with $t$.
Mathematically, the behavior of encircling prey can be simulated by reducing the value of $a$. The update formula of $a$ is as follows:
$$a = 2 - \frac{2t}{M} \tag{24}$$
where $M$ is the maximum number of iterations.
2. Hunting: Keep the three fittest wolves in each iteration. According to their positions, the position vector of the grey wolves can be updated. The mathematical model can be expressed as:
$$\begin{aligned} D_\alpha &= \left| C_1 \cdot X_\alpha - X \right|, & D_\beta &= \left| C_2 \cdot X_\beta - X \right|, & D_\delta &= \left| C_3 \cdot X_\delta - X \right| \\ X_1 &= X_\alpha - A_1 \cdot D_\alpha, & X_2 &= X_\beta - A_2 \cdot D_\beta, & X_3 &= X_\delta - A_3 \cdot D_\delta \\ X(t+1) &= \frac{X_1 + X_2 + X_3}{3} \end{aligned} \tag{25}$$
where $X_\alpha$, $X_\beta$ and $X_\delta$ respectively represent the positions of the α, β and δ wolves in the current population; $X$ is the position vector of the gray wolf population as before; $D_\alpha$, $D_\beta$ and $D_\delta$ respectively represent the distances between the candidate wolves and the three fittest wolves; $A_1$, $A_2$, $A_3$ and $C_1$, $C_2$, $C_3$ are coefficient vectors generated in the same way as $A$ and $C$.
3. Attack and search for prey: $A$ is a random number on $[-a, a]$, whose range shrinks with the linear decrease of $a$. When $|A| > 1$, the gray wolves disperse in their respective areas to search for prey. On the contrary, when $|A| < 1$, the gray wolves launch attacks to hunt the prey.
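The three steps above can be sketched in plain Python as follows. This is an illustrative implementation under several assumptions: the fitness is minimized, coefficients $A$ and $C$ are drawn independently for each dimension and each leader, positions are clipped to the search bounds, and the three leaders are kept as the best solutions found so far. The function name `gwo` and its parameters are hypothetical.

```python
import random


def gwo(fitness, bounds, n_wolves=10, max_iter=200, seed=0):
    """Minimal grey wolf optimizer (minimization). bounds: list of (low, high) per dimension."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []
    for t in range(max_iter):
        # Alpha, beta and delta: the three best positions seen so far.
        leaders = sorted(wolves + leaders, key=fitness)[:3]
        a = 2.0 - 2.0 * t / max_iter          # decreases linearly from 2 to 0
        updated = []
        for x in wolves:
            new_x = []
            for j, (lo, hi) in enumerate(bounds):
                guided = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a   # random number in [-a, a]
                    C = 2.0 * rng.random()           # random number in [0, 2]
                    D = abs(C * lead[j] - x[j])      # distance to this leader
                    guided += lead[j] - A * D
                # Average of the three guided positions, clipped to the bounds.
                new_x.append(min(max(guided / 3.0, lo), hi))
            updated.append(new_x)
        wolves = updated
    return min(wolves + leaders, key=fitness)
```

For hyper-parameter tuning, `fitness` would wrap a cross-validated error of the model for the parameter vector being evaluated, and `bounds` would hold the search range of each hyper-parameter.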

2) DE
DE is a global optimization algorithm proposed by Storn and Price according to the mechanism of biological evolution which mainly includes mutation, crossover and selection operations [39], [40].
1. Mutation operation: three different individual vectors $X_{r_1}$, $X_{r_2}$ and $X_{r_3}$ ($r_1 \neq r_2 \neq r_3 \neq i$) are randomly selected from the population in each generation. A new individual is constructed with one of them as the basis vector and the difference of the other two as the difference vector, which can be expressed as:
$$V_i(P+1) = X_{r_1}(P) + F_r \left[ X_{r_2}(P) - X_{r_3}(P) \right] \tag{26}$$
where $V_i(P+1)$ refers to the position vector of the $i$th new individual after mutation; $P$ is the number of iterations; $F_r$ denotes the scaling factor in the range of [0,2].
2. Crossover operation: the test vector is generated by exchanging information between the new and former individuals, which can be expressed as:
$$U_{i,j}(P+1) = \begin{cases} V_{i,j}(P+1), & \text{if } rand(0,1) \le C_r \text{ or } j = j_{rand} \\ X_{i,j}(P), & \text{otherwise} \end{cases} \tag{27}$$
where $U_{i,j}(P+1)$ represents the $j$th gene of the $i$th test vector; $C_r$ is the crossover factor with the range of [0,1]; $j_{rand}$ is a random integer within $[1, d]$.
3. Selection operation: a one-to-one selection is made between the parent generation and the child generation. The selection rule is as follows:
$$X_i(P+1) = \begin{cases} U_i(P+1), & \text{if } U_i(P+1) \text{ is fitter than } X_i(P) \\ X_i(P), & \text{otherwise} \end{cases} \tag{28}$$
3) HGWO
GWO has a strong local optimization ability, whereas differential evolution can effectively help the algorithm jump out of local optima to achieve global search. Therefore, Zhu et al. [41] proposed HGWO, which combines DE and GWO.
Step 1: The parent population is randomly generated through equation (29), which can be expressed as:
$$X_{i,j} = X_j(low) + rand(0,1) \times \left[ X_j(up) - X_j(low) \right] \tag{29}$$
where $X_j(low)$ and $X_j(up)$ are the lower bound and upper bound of the $j$th gene, respectively; $i \in [1, k]$ and $k$ is the population size.
Step 2: Calculate the fitness function value of the individuals in the parent population and rank them to find the positions of the α, β and δ wolves.
Step 3: According to the positions of the three fittest wolves in the parent population, update the positions of individuals with equation (25).
Step 4: Use equation (26) to get the mutant population of the parent population.
Step 5: Obtain the child population by crossing the parent population and the mutant population in the form of equation (27).
Step 6: Make a one-to-one selection between the parent population and the child population through equation (28) to retain the fitter individuals.
Step 7: If the number of iterations is less than the given maximum, return to Step 2 to continue the iteration. Otherwise, the position of the α wolf in the current population is the optimal solution of the algorithm.
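A compact Python sketch of Steps 1 to 7 is given below. It is an illustration rather than the authors' implementation: minimization of the fitness is assumed, positions are clipped to the bounds, the best individual seen so far is returned as a safeguard, and the names `hgwo`, `f_r` and `c_r` are hypothetical stand-ins for the scaling factor and crossover factor.

```python
import random


def hgwo(fitness, bounds, n_wolves=10, max_iter=200, f_r=0.5, c_r=0.2, seed=0):
    """Hybrid GWO + DE sketch (minimization). Requires n_wolves >= 4."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(v, j):
        lo, hi = bounds[j]
        return min(max(v, lo), hi)

    # Step 1: random parent population inside the bounds.
    parents = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
               for _ in range(n_wolves)]
    best = min(parents, key=fitness)
    for t in range(max_iter):
        # Step 2: rank the parents and pick alpha, beta and delta.
        leaders = sorted(parents, key=fitness)[:3]
        a = 2.0 - 2.0 * t / max_iter
        # Step 3: GWO position update guided by the three leaders.
        moved = []
        for x in parents:
            pos = []
            for j in range(dim):
                guided = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    guided += lead[j] - A * abs(C * lead[j] - x[j])
                pos.append(clip(guided / 3.0, j))
            moved.append(pos)
        parents = moved
        next_gen = []
        for i, x in enumerate(parents):
            # Step 4: DE mutation from three distinct individuals other than i.
            r1, r2, r3 = rng.sample([k for k in range(n_wolves) if k != i], 3)
            j_rand = rng.randrange(dim)
            child = []
            for j in range(dim):
                v = parents[r1][j] + f_r * (parents[r2][j] - parents[r3][j])
                # Step 5: binomial crossover between mutant and parent genes.
                take_mutant = rng.random() <= c_r or j == j_rand
                child.append(clip(v, j) if take_mutant else x[j])
            # Step 6: one-to-one selection keeps the fitter individual.
            next_gen.append(child if fitness(child) < fitness(x) else x)
        parents = next_gen
        best = min(parents + [best], key=fitness)
    # Step 7: after the last iteration, report the best position found.
    return best
```

The default values `n_wolves=10`, `max_iter=200`, `f_r=0.5` and `c_r=0.2` mirror the settings reported for the case study; any other values are equally valid inputs to the sketch.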

E. THE PROPOSED SEEPAGE PREDICTION AND IDENTIFICATION METHOD
According to the above-mentioned theories, a novel seepage behavior prediction model for concrete dams based on HGWO-XGBoost is proposed. In the light of this model, an effective identification method of lag process is proposed. Figure 3 shows the flowchart of model implementation.

III. CASE STUDY
A. ENGINEERING INTRODUCTION
A concrete double-curvature arch dam is located in Panzhihua City, Sichuan Province, China. The maximum dam height is 240 m and the elevation of the dam crest is 1205 m. In order to effectively monitor the uplift pressure, 22 piezometric tubes are arranged along the dam foundation, numbered PZ01∼PZ22. The piezometric tube water level is measured by osmometers.
There are 5 piezometer tubes located in the first row behind the impervious curtain, among which PZ05 and PZ17 have been damaged. In this study, the remaining intact piezometer tubes PZ01, PZ09 and PZ13 are selected. Figure 4 shows the layout of the piezometer tubes.
The complete monitoring sequence from 2011 to 2017 was taken as the training set, and the measured data from 2018 to August 2019 were taken as the testing set. The environmental variables and monitoring data are measured almost once a day, so there are more than 3200 sets of sample data at each of the three piezometer tubes. The process lines of environmental factors and piezometric tube water level are shown in figure 5 and figure 6, respectively.

B. EVALUATION INDICATORS
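In order to assess the accuracy of the model more objectively, four evaluation indicators, including the determination coefficient ($R^2$), mean absolute percent error (MAPE), root mean square error (RMSE) and mean absolute error (MAE), are used to evaluate the performance of the models on the training set and the testing set. The calculation formulas are as follows:
$$R^2 = 1 - \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$
$$MAPE = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$$
$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|$$
where $y_t$ is the measured value, $\hat{y}_t$ is the output value of the model, $\bar{y}$ is the mean of the measured values and $n$ is the number of samples. The better the model performance, the larger $R^2$ and the smaller the error indexes.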
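The four indicators can be computed with a few lines of Python. The sketch below uses the standard definitions ($R^2$ as one minus the ratio of residual to total sum of squares, MAPE in percent), which are assumed here to match the paper's formulas; the function name `evaluate` is hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Return R2, MAPE (in percent), RMSE and MAE for paired measured / predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAPE": 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)),
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n,
    }
```

Because piezometric levels are expressed as elevations of roughly a thousand metres, MAPE values tend to be very small for this kind of data, so RMSE and MAE are usually the more discriminating indicators.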

C. PARAMETER OPTIMIZATION FOR XGBOOST
For machine learning algorithms, the selection of hyper-parameters has a crucial impact on model performance. In addition to adding a regularization term to the objective function and using the shrinkage method, XGBoost also adopts subsampling to avoid overfitting. In each iteration, a certain proportion of samples is randomly selected for the learning of a single decision tree, and a certain proportion of features for its growth. When processing high-dimensional data, this strategy makes full use of the existing data sets while preventing strongly correlated features from being used excessively and causing overfitting.
On the choice of lag factors, scholars usually adopt the previous segmental average values or equivalent values of the upstream water level and rainfall during the first 30 days before the monitoring day to consider the influence of these two kinds of factors on the dam seepage, and some good results have been obtained [2], [12], [19], [24].
In view of the above characteristics, in order to effectively distinguish the lag effects of the upstream water level component and the rainfall component, the original measured upstream water level and rainfall of the monitoring day and of the previous 30 days were selected as model inputs in this paper. The forms of the downstream water level component, temperature component and aging component were the same as in equation (10). The factors considered for the XGBoost model are listed in table 1.
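Constructing the lagged inputs amounts to sliding a 31-day window over the daily series. The sketch below assumes one measurement per day with no gaps; the function name `lagged_inputs` and the column ordering (water level lags first, then rainfall lags) are choices made for illustration.

```python
def lagged_inputs(upstream, rainfall, max_lag=30):
    """Build one input row per monitoring day from day `max_lag` onward.

    Each row holds the upstream level at lags 0..max_lag followed by
    the rainfall at lags 0..max_lag (lag 0 is the monitoring day itself).
    """
    rows = []
    for t in range(max_lag, len(upstream)):
        level_lags = [upstream[t - k] for k in range(max_lag + 1)]
        rain_lags = [rainfall[t - k] for k in range(max_lag + 1)]
        rows.append(level_lags + rain_lags)
    return rows
```

The first `max_lag` days of the record produce no rows because their history is incomplete; the remaining factors (downstream level, harmonics, aging terms) would be appended to each row in the same order as in table 1.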
In XGBoost, eight parameters need to be determined so as to control the generation of decision trees and avoid overfitting. The meaning and search range of these parameters are shown in table 2. GWO, HGWO, DE and GA were each used to optimize the model. The parameters of HGWO and DE are set to: population size SearchAgents_no = 10; maximum number of iterations Max_iter = 200; scaling factor $F_r$ = 0.5; crossover probability $C_r$ = 0.2. GWO has two parameters, SearchAgents_no and Max_iter, which are set the same as in HGWO. In GA, SearchAgents_no and Max_iter are set the same as in GWO. In addition, GA has two further parameters, the crossover fraction $F_c$ and the migration fraction $F_m$, which are set to 0.9 and 0.01, respectively.
In order to improve the generalization ability of the model, avoid overfitting and eliminate the influence of the current data on algorithm performance as far as possible, we adopted five-fold cross validation during parameter optimization to evaluate the performance of the model under a given parameter combination. For each piezometer tube, the training set was randomly divided into five parts of approximately the same size; in turn, four parts were taken as the training set and the remaining part as the testing set. The average of the five results was taken as the final output of the model. The parameter optimization results are shown in table 3, where the best metric values of the models are marked in boldface.
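The five-fold procedure can be sketched without any external library. In the snippet below, `fit`, `predict` and `score` are placeholders supplied by the caller (for example a wrapper around an XGBoost regressor and an RMSE function); the names `k_fold_indices` and `cross_val_score` are hypothetical and are not the scikit-learn functions of similar name.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly split sample indices into k folds of approximately equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_score(fit, predict, score, X, y, k=5, seed=0):
    """Average score over k folds, each fold being held out once."""
    folds = k_fold_indices(len(y), k, seed)
    results = []
    for i, held_out in enumerate(folds):
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = predict(model, [X[j] for j in held_out])
        results.append(score([y[j] for j in held_out], preds))
    return sum(results) / k
```

Used as the fitness function of the optimizer, `cross_val_score` would be called once per candidate hyper-parameter combination, with `fit` closing over that combination.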
Optimized by any of the four algorithms, the XGBoost model shows high accuracy with little error. It is noticeable that, with the same population size and number of iterations, HGWO has better global optimization ability than the other algorithms. The HGWO-XGBoost model performs better than the other models at every piezometer tube and on every evaluation indicator, except that it has the same $R^2$ as some of the models. The other three algorithms perform similarly. By combining GWO with DE, HGWO outperforms either of them. Especially at PZ09, the MAPE, RMSE and MAE of the HGWO-XGBoost model are less than 40% of those of the GWO-XGBoost model.

D. MODELS PERFORMANCE COMPARISON
As can be seen from the previous analysis, HGWO clearly has better global optimization capability than the other three algorithms. To demonstrate the feasibility of the proposed model, six state-of-the-art methods were introduced as baseline models. They include SVR, multilayer perceptron (MLP), extreme learning machine (ELM), multiple linear regression (MLR), random forest (RF) and gradient boosting decision tree (GBDT), which have been widely used in dam safety monitoring. Among them, RF and GBDT, like XGBoost, are ensemble learning methods based on decision trees.
To avoid ill-posed problems in the traditional statistical model, equivalent values of upstream water level and rainfall were used to represent the influence of these two components. We applied a trial-and-error method based on HGWO to the training set to obtain the numbers of lag days and influence days. The selection and computation of input variables are the same as in equation (10), and the computation results are shown in table 4.
For the remaining five baseline models based on machine learning, in order to objectively evaluate their ability to identify and predict the dam seepage behavior, the model inputs are the same as those of XGBoost, and HGWO is likewise adopted to optimize their hyper-parameters.
The key hyper-parameters of the comparison models are shown in table 5. The performance of the SVR model is mainly controlled by the regularization parameter C and the kernel coefficient gamma, with ranges of [0,1000] and [0.001,1], respectively. The key hyper-parameters of MLP mainly include the number of neurons in the hidden layers ($N_e$), with a range of [10,500], and the learning rate ($l_e$). For ELM, only one control parameter needs to be optimized: L, the number of hidden nodes, with a range of [10,500]. The RF model and the GBDT model have four and six key hyper-parameters, respectively. Among them, $l_m$ is the minimum number of samples required at a leaf node, with a range of [1,5]. The other parameters have the same meaning as their counterparts in XGBoost (see table 2). The performance of the models on the training set is also evaluated by five-fold cross validation during parameter optimization, and the computation results are shown in table 6, where the best metric values of the models are marked in boldface. The histograms of the evaluation indexes are shown in figure 7.
We can see from table 6 and figure 7 that, in terms of the fitting results on the training sets, the models based on machine learning algorithms achieve excellent accuracy. The fitting accuracy of the traditional statistical model is slightly lower at PZ09 and PZ13, but the error is acceptable, indicating that the input factors in section II-A can well summarize the causes of seepage behavior. In other words, the input factors selected in this study are reasonable. The XGBoost model and the GBDT model are the most outstanding overall, followed by the RF model and the SVR model. The XGBoost model performs better at PZ09, while the GBDT model performs slightly better at PZ01 and PZ13.
As far as the testing sets are concerned, the models perform differently than they do on the training sets. The accuracy of the three ensemble learning models is particularly outstanding. It can be seen that the XGBoost model performs best among the comparison models in all respects at all piezometer tubes. Among the other four models, the MLP model has the best and most stable performance, with little prediction error at all tubes. The ELM model and the MLR model follow, but perform poorly at some monitoring points.
It should be noted that the performance of the XGBoost model on the testing sets is slightly inferior to that of the SVR model on the training sets, whereas the prediction accuracy of the SVR model is greatly reduced compared with its fitting results. In addition, the prediction accuracy of the XGBoost model is slightly better than that of the GBDT model, which is contrary to the performance they show on the training sets. This indicates that the introduction of the regularization term helps the XGBoost model avoid over-fitting more effectively. The optimal model results are shown in figure 8.
The methods mentioned above can be divided into two major categories. One comprises single models, including SVR, MLP, ELM, and MLR. The other comprises the more advanced integrated (ensemble) models, including XGBoost, RF, and GBDT. The basic strategy of these three ensemble learning algorithms is to integrate multiple decision trees, through bagging or boosting, to complete the data mining task. Compared with single models, this combination strategy gives the integrated models the potential to explore deeper nonlinear relationships between factors and response variables, as well as more stable performance on testing sets.
At the same time, the larger algorithm structure means that the integrated models have more hyper-parameters. In the XGBoost model especially, the introduction of subsampling and a regularization term gives it the most hyper-parameters, but the comparison results also show that it has the best performance. Although the integrated models incur higher computational costs, these are negligible at the scale of data encountered in dam safety monitoring.

E. RELATIVE IMPORTANCE OF INPUT COMPONENTS
Since the HGWO-XGBoost model has the highest accuracy, we focus only on this model to interpret the piezometric tube water level. We computed the importance of all input factors according to the method introduced in section II-C. The importance of the factors was then summed according to the components they belong to. We tested the validity of the proposed method by analyzing whether the relative importance of each component is consistent with common engineering understanding.
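The aggregation of factor importance into component importance can be illustrated with a short sketch. The factor names, the importance scores and the function name `component_importance` are hypothetical and chosen only for illustration; the actual per-factor scores would come from the method of section II-C.

```python
from collections import defaultdict


def component_importance(factor_importance, factor_component):
    """Sum per-factor importances by component and return percentages."""
    totals = defaultdict(float)
    for factor, value in factor_importance.items():
        totals[factor_component[factor]] += value
    grand_total = sum(totals.values())
    return {name: 100.0 * value / grand_total for name, value in totals.items()}


# Hypothetical factor names and importance scores, for illustration only.
factor_importance = {
    "H_u_day0": 3.0,
    "H_u_day1": 1.0,
    "P_day0": 2.0,
    "T_sin": 1.0,
    "theta": 1.0,
}
factor_component = {
    "H_u_day0": "upstream water level",
    "H_u_day1": "upstream water level",
    "P_day0": "rainfall",
    "T_sin": "temperature",
    "theta": "time",
}
shares = component_importance(factor_importance, factor_component)
```

With these toy inputs the two upstream water level factors together account for half of the total importance, which is the kind of component-level share compared against engineering expectations.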
Figure 9 shows the relative importance of the five input components at the three monitoring points. The upstream water level component is the most important to uplift pressure, with more than 40% relative importance at all three points (43.37%, 41.70% and 50.29%, respectively). It is followed by the rainfall component, whose relative importance at the three points is 27.20%, 32.19% and 22.73%, respectively. The time component has a minor influence on seepage. Its relative importance at PZ01 and PZ13 is close (10.71% and 11.06%, respectively), but slightly lower at PZ09 (6.93%). The relative importance of the temperature component is slightly greater than that of the time component. The downstream water level component has the least influence on the uplift pressure, at around 4% at each of the three points.
From the perspective of engineering experience, the magnitude of the upstream water level plays a crucial role in dam seepage, followed by the effect of groundwater, which is closely related to rainfall. In this case study, the upstream water level is 140∼180 m higher than the downstream water level, which also varies far less. The correlation between the downstream water level component and the dam seepage should therefore be far weaker than that of the upstream water level component. The calculated results in figure 9 are thus in accordance with practical engineering knowledge.

F. LAG PROCESS IDENTIFICATION
Whether at PZ01, PZ09 or PZ13, the upstream water level and rainfall have the greatest influence on uplift pressure. However, both of them show a strong time lag effect. In order to control the seepage behavior of the dam effectively, it is necessary to accurately evaluate the lag process of the upstream water level and rainfall.
Based on the factor importance results obtained in section III-E, we analyzed the upstream water level factors and the rainfall factors separately. Take the upstream water level component as an example. With the passage of time, the water level factor of the monitoring day becomes a previous water level factor, and the previous water level factors simultaneously turn into earlier ones. Therefore, the importance distribution of the water level factors on the monitoring day and the preceding 30 days with respect to uplift pressure can be regarded as equivalent to the lag process of water level on uplift pressure. The distribution of factor importance is shown in figure 10.
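The shifting of factors described above can be sketched as follows. This assumes, for illustration, that each day from the monitoring day back to 30 days earlier enters the model as a separate factor; the actual factor definitions are those of table 1. The function name `lagged_factors` and the synthetic water level series are hypothetical.

```python
def lagged_factors(series, max_lag=30):
    """For each day t >= max_lag, return [x_t, x_(t-1), ..., x_(t-max_lag)]."""
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in range(max_lag + 1)])
    return rows


# Hypothetical daily upstream water levels, for illustration only.
water_level = [float(v) for v in range(100, 140)]
rows = lagged_factors(water_level, max_lag=30)

# The monitoring-day value of one row becomes the lag-1 value of the next row.
shift_holds = rows[1][1] == rows[0][0]
```

Because every value moves one position to the right from one row to the next, the importance assigned to position k can be read as the influence of the water level k days after it was observed.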
Figure 10 shows some consistent lag laws at the three piezometer tubes:
• The water level on the monitoring day has the most significant influence on uplift pressure. The effect of this water level does not disappear immediately, but gradually declines over time. In the first 4 days or so, the effect is relatively distinct and drops rapidly; it then declines slowly and stabilizes.
• The lag process of upstream water level on uplift pressure accords with the expectation that it approximately follows a normal distribution, although it is clearly not a strict normal distribution.
• As far as rainfall is concerned, the most interesting observation is that its lag process does not follow a normal distribution at all. The lag process of rainfall is multi-peaked and fluctuating, which is completely different from what was anticipated.
In this study, the trial-and-error method based on HGWO-MLR and the factor importance computation method based on HGWO-XGBoost were each used to identify the lag process of upstream water level and rainfall. A comparison of the results obtained by the two methods shows some differences. To explore the rationality of the results, we first used the two methods to construct the equivalent values of upstream water level and rainfall.
It can be observed from figure 10 that the upstream water level has a clear influence period. The water level factors within the period have an obvious effect on the uplift pressure of the monitoring day, whereas those outside the period have a comparatively small and roughly even influence. The influence period of upstream water level at PZ01 and PZ09 is roughly the first 7 days, whereas that at PZ13 is 14 days. In terms of the effect of rainfall on uplift pressure, the lag process is more complicated. Thus, the full 30 days are taken as the influence period of rainfall.
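Given an influence period, one plausible way to turn factor importance into an equivalent value is to normalize the importances inside the period so that they sum to 1 and use them as weights in a weighted sum. The sketch below illustrates this under stated assumptions: the function name `equivalent_value`, the importance scores and the water levels are hypothetical, and the period is counted from the monitoring day (index 0), which may differ from the paper's convention.

```python
def equivalent_value(lagged_values, importance, period):
    """Weighted sum of the factors inside the influence period.

    lagged_values[k] and importance[k] refer to the factor k days before
    the monitoring day (k = 0 is the monitoring day itself).
    """
    window = importance[:period]
    total = sum(window)
    weights = [w / total for w in window]  # normalized so the weights sum to 1
    value = sum(w * x for w, x in zip(weights, lagged_values[:period]))
    return value, weights


# Hypothetical importance scores and water levels (m), for illustration only.
importance = [4.0, 2.0, 1.0, 1.0, 0.5, 0.5]
levels = [180.0, 176.0, 172.0, 168.0, 160.0, 150.0]
value, weights = equivalent_value(levels, importance, period=4)
```

Factors outside the chosen period (here the last two entries) are ignored, which mirrors the observation that their influence is comparatively small.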
Based on the relative importance results, the weights of the factors within the influence period were defined such that they sum to 1. The factors in the influence period were weighted and summed to obtain the equivalent values (Y_Hu1 and Y_p1). The equivalent values (Y_Hu2 and Y_p2) by the trial-and-error method were obtained through equation (5) and equation (7)

FIGURE 1. The influence process of upstream water level on dam seepage.

FIGURE 4. Layout of the selected piezometer tubes.

FIGURE 5. Collected data of environmental factors.

FIGURE 6. Process lines of piezometric tube water level.

FIGURE 7. The histograms of evaluation indexes.

FIGURE 8. Performance of the best model.

FIGURE 9. Relative importance of input components.

FIGURE 10. The lag process of factors at three piezometer tubes.

TABLE 1. Input factors considered for the XGBoost model.

TABLE 2. Explanation and search space of the control parameters in XGBoost.

TABLE 3. The optimal parameters obtained by optimization algorithms.

TABLE 4. The computation results of lag effect by trial-and-error method.

TABLE 5. The hyper-parameters of comparison models.

TABLE 6. Performance assessment of different models.

TABLE 7. Validity comparison of lag process identification via different equivalent values.
respectively in the case of table 4. Then the stepwise regression method was used to establish the traditional HPTT statistical model. Four models were built at each monitoring point, including model I (Y_Hu1 + Y_p1), model II (Y_Hu1 + Y_p2), model III (Y_Hu2 + Y_p1) and