Joint Spatial and Temporal Modeling for Hydrological Prediction

The accurate and timely estimation of river discharge plays an important role in hydrological modeling, especially for avoiding the consequences of flood events. The majority of existing work on hydrologic prediction focuses on modeling the inherent physical process for specific river basins, while the geographic connections between rivers are largely ignored. Geographically connected rivers provide rich spatial information that can be used to predict discharge amounts. In this paper, we study a novel problem of exploiting both temporal patterns and spatial connections for hydrological prediction. We construct three relationship graphs for hydrological gauges in the study area: the hydraulic distance graph, the Euclidean distance graph and the correlation graph. We fuse these graphs into one hydrological network graph, and propose a novel framework, ST-Hydro, which exploits Graph Convolutional Networks (GCN) to learn spatial feature representations and Recurrent Neural Networks with carefully designed activation functions to capture temporal features simultaneously for hydrological prediction. Experimental results on a real-world data set demonstrate that the proposed framework can predict river discharge effectively and at an early stage.


I. INTRODUCTION
With global warming, changeable weather and frequent extreme precipitation events in recent years, flood events have become more frequent and more severe than usual. For example, it is reported that the Midwestern United States has been experiencing major floods since mid-March 2019, and nearly 14 million people have been affected by the flooding.¹ To mitigate the detrimental effects of flood disasters, hydrologic prediction becomes particularly important. Hydrologic prediction forecasts hydrological quantities (e.g., runoff, water level) over a future period (e.g., a few hours) from earlier hydrological and meteorological data. It can provide a basis for decision-making in flood control and disaster relief, and is important for the rational utilization of water resources.
Therefore, it is critical to predict hydrological events effectively and in a timely manner to support hydrologists in decision making. The majority of existing methods fall into three types: conceptual models, physical models and data-driven models. Conceptual models are based on physical concepts and empirical formulas of hydrological phenomena, and physical models are based on the physical process. They can predict hydrological events for specific rivers, but they may need a large number of hydrologic parameters and cannot be easily adapted to other water basins. Specifically, since the parameters of the physical process differ considerably between basins, the parameters and even the structure of physically based models may need to be significantly modified. Therefore, data-driven hydrologic prediction models have recently gained increasing attention. For example, Hwang et al. [24] proposed an improved support vector machine method for nonlinear hydrological time series prediction with better prediction accuracy, and Kratzert et al. [27] proposed a new rainfall-runoff model using Long Short-Term Memory (LSTM) networks. However, most existing data-driven models focus on learning the temporal characteristics of historical data.
The associate editor coordinating the review of this manuscript and approving it for publication was Md. Moinul Hossain.
¹ https://en.wikipedia.org/wiki/2019_Midwestern_U.S._floods
In fact, hydrological data has rich spatial information. For example, the upstream reach of a river directly affects the runoff downstream, and two nearby rivers are likely to have similar rainfall processes, which results in similar changes in runoff. The spatial relationships across rivers provide important signals and have great potential to advance hydrological prediction performance [10], [37]. However, most current data-driven hydrological forecasting methods focus on modeling temporal patterns without considering the rich spatial characteristics. In essence, we investigate: (1) how to build a hydrological topology map and mine spatial information; and (2) how to combine temporal features with spatial features to improve prediction accuracy. The solution to these challenges results in a novel framework, ST-Hydro, which incorporates Graph Convolutional Networks (GCN) for learning the Spatial feature representations and Recurrent Neural Networks with carefully designed activation functions for capturing Temporal features simultaneously for Hydrologic prediction. The contributions are as follows:
• We study a novel problem of exploiting both spatial and temporal information for data-driven hydrological prediction;
• We propose a principled way to jointly learn spatial features with Multi-Graph Convolutional Networks and capture temporal features with an adaptive Gated Recurrent Unit for hydrological prediction;
• We conduct extensive experiments on real-world datasets to demonstrate the effectiveness and timeliness of the proposed model.
The paper is organized as follows. Section II reviews the related research on hydrological prediction and the applications of GCN. Section III introduces the details of the model. In Section IV, we compare the performance of our model with that of other prediction models. Section V concludes the paper and describes future work.

II. RELATED WORK
In this section, we briefly discuss the related work on: (1) hydrologic prediction; and (2) graph neural networks for temporal modeling.

A. HYDROLOGIC PREDICTION
Existing hydrological models are mainly divided into conceptual models, physical models and data-driven models. A conceptual model is based on the physical concepts of hydrological phenomena and some empirical formulas; examples are the Xin'anjiang model [6] and the HBV model [31]. A physical model builds on the laws of conservation of mass and energy, together with the characteristics of runoff generation and concentration, to construct and solve a set of hydrodynamic equations that simulate changes in time and space; examples are the SHE model [11] and the SWAT model [1]. A data-driven model deduces the response function from input and output data and produces outputs for given inputs through empirical analysis; an example is the ANN model [25]. Many hydrologic models are designed for specific water basins and may not be easily adapted to other watersheds. The parameters differ considerably between basins and are not universal, so when the same kind of model is applied to a different watershed, the model parameters and even the model structure may need to be greatly modified [7], [22]. In addition, time-series-based prediction methods, which analyze the patterns in historical hydrological data and predict future trends, have recently been developed for hydrologic forecasting. Data-driven models can make up for some shortcomings of physical and conceptual models. For example, after training, a data-driven model can be applied to other basins without extensive parameter adjustment. In the data age, data-driven models are becoming more and more popular [32], [38].
Yule [46] established the autoregressive (AR) model, laying the foundation for time series models. Nowak et al. [33] first used the AR model to predict runoff. Yu et al. [44] combined chaos theory with SVM and applied it to chaotic time series analysis with large sample data records. Hwang et al. [24] proposed a least squares support vector machine (LS-SVM) method for nonlinear hydrological time series prediction with better prediction accuracy. Xing et al. [41] proposed a new heuristic optimization algorithm, the BA algorithm, which is used to optimize SVM parameters and predict monthly average traffic in 2015. Compared with cross-validated support vector machines, the accuracy is improved. Li et al. [30] proposed an SVM flood forecasting model based on kernel principal component analysis (KPCA) and a boosting algorithm. The nonlinear KPCA is applied to extract useful information from historical flood data. Experiments show that the proposed SVM ensemble model based on KPCA and boosting can improve flood forecasting accuracy effectively. Atiquzzaman and Kandasamy [4], [5] proposed a fast prediction method for hydrological time series using the extreme learning machine (ELM), aiming at the uncertainty of traditional slow gradient-based learning algorithms in training and in iteratively determining network parameters. Based on the basic principle of flood formation, Chen et al. [13] proposed a data-driven hydrology forecasting model, Convolution Regression based on Machine Learning. This model reflects the impact of hourly rainfall on future flow changes, and the flow changes are predicted by superimposing these impacts.

B. GRAPH CONVOLUTIONAL NETWORKS FOR TEMPORAL PREDICTION
Convolutional Neural Networks (CNN) [28] are a type of feedforward neural network that includes convolution calculations and has a deep structure. They are among the representative algorithms of deep learning [21]. However, much real-world data has a graph or network structure, which is difficult for traditional CNNs to handle. In 2017, Kipf and Welling proposed semi-supervised classification with graph convolutional networks [26], which can process graph-structured data. Recently, Graph Convolutional Networks (GCN) have been widely used for feature learning on unstructured data in domains such as social networks and information networks [16]. However, few studies apply GCN to temporally dependent data. For example, Yan et al. [42] proposed the spatio-temporal graph convolutional network ST-GCN, which addresses human motion recognition based on the key points of the human skeleton. The model is built on a sequence of skeleton graphs in which each node corresponds to a joint of the human body. There are two kinds of edges: spatial edges and temporal edges. The convolution averages the feature vectors of all nodes in a neighborhood and then computes the inner product with the parameter vector of the convolution kernel. Chai et al. [12] used the inbound and outbound flows before time t to predict the inbound and outbound flows at time t. Their work models a variety of spatial relationships by constructing multiple graphs. However, its encoder-decoder structure is limited, and the dimensions of the input and output are fixed. Zhao et al. [48] proposed a temporal graph convolutional network (T-GCN), which combines GCN and the Gated Recurrent Unit (GRU) to predict traffic information. They obtain the temporal and spatial dependence from the T-GCN and predict the vehicle speed over the next hour. Kim et al. [36] considered the spatial and temporal influences together with the influence of global variables, such as weather and weekday/weekend, to reflect non-station-level changes, and then used a graph convolutional network to predict bike demand.
VOLUME 8, 2020
Spatial features are also widely used in hydrology, for example in hydrological similarity analysis [49]. In this paper, we construct a novel hydrological topological structure, and use an improved GCN to mine the spatial characteristics and an improved GRU to mine the temporal characteristics for hydrologic prediction.

III. METHODOLOGY
In this section, we present the details of the proposed framework ST-Hydro for hydrological prediction, which mainly consists of three components (see Figure 1): (1) a spatial feature learning component; (2) a temporal feature learning component; and (3) a hydrological prediction component.

A. MODELING SPATIAL FEATURES
In hydrology, hydrological processes are closely tied to geographical location and spatial features, so it is important to capture the spatial dependence between rivers. In this section, we establish the hydrological topology graph and use GCN to mine spatial features.

1) HYDROLOGICAL TOPOLOGY GRAPH GENERATION
Graph generation is an important first step for training a graph convolutional network model. If the graph cannot reflect the relationships between nodes well, the model will not mine information effectively. The hydrological topology graph $G$ is represented as a weighted graph whose nodes are stations and whose edges are the geographical relationships between stations. We use $G = (V, E)$ to describe the topological structure of the hydrological network, where $V = \{v_1, v_2, \cdots, v_N\}$ is the set of station nodes, $N$ is the number of stations, and $E$ is the set of edges. The edge weights represent the strength of the relationship between stations.
From the hydrological point of view, if a gauge is downstream of another gauge on the same river, the similarity of the flow at these two gauges will be high because of their hydraulic connection [34], [40]. However, if two gauges are in different catchments without a hydraulic connection (for example, A and C in Figure 2), the similarity of the flow at the two gauges is determined by the rainfall-runoff process in each catchment. If the two gauges are close in space, with a short Euclidean distance, their discharges tend to be similar because of the similar climatic and land cover conditions. However, a short Euclidean distance is not the determining factor: because topographic conditions can vary significantly even at small local scales, the discharge series of two nearby gauges may still differ significantly if the gauges have no upstream-downstream connection.
Here we construct three different graphs to show the relationships between hydrological stations.

a: HYDRAULIC DISTANCE GRAPH
Here we use latitude and longitude to determine the upstream-downstream relationships and the distance along the river channel. The hydraulic connection between hydrological gauges is estimated from a Digital Elevation Model (DEM): water in each grid cell flows to the neighbouring cell with the steepest slope, so we can trace the flow path of water from any point to the ocean or out of the study area. If the water path of an upstream gauge (e.g., $G_i$) goes through a downstream gauge (e.g., $G_j$), the hydraulic distance $dh_{i,j}$ is the length of the flow path between the two gauges. If they have no hydraulic connection, $dh_{i,j}$ is $\infty$. The hydraulic distance from a gauge to itself is 0. As an example, $dh_{A,B}$ between A and B is shown in Figure 2. The matrix of hydraulic distances is $D_H = [dh_{i,j}]_{N \times N}$, and the adjacency matrix of the hydraulic distance graph is $A_H$.

b: EUCLIDEAN DISTANCE GRAPH
The matrix of Euclidean distances ($D_E$) is built in the same way, with the hydraulic distance replaced by the Euclidean distance $de_{i,j}$ between the two gauges. In the distance calculation, atan2 is the two-argument arctangent function, $R$ is the Earth's radius, $\Delta\theta$ is the difference in latitude between the two geographic points, $\alpha_i$ and $\alpha_j$ are the longitudes of the two points, and $\Delta\alpha$ is the difference in longitude. Note that the angles need to be in radians rather than in decimal degrees of latitude or longitude. For example, $de_{A,B}$ between A and B is shown in Figure 2. The matrix of Euclidean distances is $D_E = [de_{i,j}]_{N \times N}$, and the adjacency matrix of the Euclidean distance graph is $A_E$.

c: CORRELATION GRAPH
The correlation between stream flows is used as a means to understand spatial patterns of stream flow dynamics [3], [8]-[10], [45]. In addition to distance, we also calculate correlations based on the daily stream flow of the stations over the last five years. We use Pearson's correlation coefficient, and $c_{m,k}$ is the correlation between station $m$ and station $k$. The adjacency matrix of the correlation graph is $A_C$.

2) GRAPH FUSION
We then merge the different graphs into one graph. We combine the graphs by a weighted sum of their adjacency matrices at the element level, using the channel distance between upstream and downstream gauges, the Euclidean distance between stations, and the correlation values between stations to build the adjacency matrix.
Here $a_{i,j}$ denotes the element in the $i$th row and $j$th column of the adjacency matrix $A$, and $\alpha$, $\beta$ and $\gamma$ are the weights of the sum. Because the hydraulic distance is more important than the Euclidean distance [34], we take 0.4 for $\alpha$, 0.2 for $\beta$ and 0.4 for $\gamma$. We can then obtain the edges and nodes of the graph and construct the hydrological topology graph.
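The graph construction above can be illustrated with a minimal Python sketch. It assumes that the Euclidean distance is the standard haversine great-circle distance (which is consistent with the atan2, $R$ and angle differences mentioned in the text), that $R$ is about 6371 km, and that $\alpha$, $\beta$ and $\gamma$ weight the hydraulic, Euclidean and correlation graphs respectively. The function names are hypothetical, and how distances are turned into adjacency weights is not specified here, so the fusion function simply takes three adjacency matrices as given.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the text only names R


def haversine_km(lat_i, lon_i, lat_j, lon_j):
    """Great-circle distance in km between two points given in decimal degrees."""
    # Angles must be converted to radians before applying the formula.
    phi_i, phi_j = np.radians(lat_i), np.radians(lat_j)
    d_phi = phi_j - phi_i                            # difference in latitude
    d_lam = np.radians(lon_j) - np.radians(lon_i)    # difference in longitude
    a = np.sin(d_phi / 2) ** 2 + np.cos(phi_i) * np.cos(phi_j) * np.sin(d_lam / 2) ** 2
    a = np.clip(a, 0.0, 1.0)                         # guard against rounding error
    return 2 * EARTH_RADIUS_KM * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


def correlation_adjacency(flows):
    """Pearson correlation between stations; flows has shape (time, stations)."""
    return np.corrcoef(flows, rowvar=False)


def fuse_graphs(a_h, a_e, a_c, alpha=0.4, beta=0.2, gamma=0.4):
    """Element-wise weighted sum of three adjacency matrices.

    Assumption: alpha, beta and gamma weight the hydraulic, Euclidean and
    correlation graphs in that order.
    """
    return alpha * a_h + beta * a_e + gamma * a_c
```

Under these assumptions, `fuse_graphs(A_H, A_E, A_C)` returns a single weighted adjacency matrix that can be passed to the graph convolution.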

3) GRAPH CONVOLUTION
In this section, we describe the graph convolution. The essence of discrete convolution is a weighted sum. Convolution in a CNN uses a filter with shared parameters to extract spatial features by computing the weighted sum of a central pixel and its neighboring pixels. However, a CNN cannot process data that lacks a regular grid structure. Unlike CNN, GCN can extract the spatial features of topological graphs. Here we use GCN to capture the spatial features of the hydrological information (shown in Figure 3). For graph-structured data, we should consider the feature information and the structure information of the nodes at the same time, that is, the feature matrix $X$ and the adjacency matrix $A$. For example, when predicting runoff, $X$ contains daily runoff data and rainfall data. The model is defined layer by layer, where $l$ is the layer index, $X^{(l)}$ is the node feature matrix of layer $l$, and $A$ is the adjacency matrix. The feature transformation of the nodes is carried out first, and the adjacency matrix is normalized by the degree matrix. After self-loops are added, the relationship between each node and its adjacent nodes is taken into account. In the specific model formula, $x_j^{(l)}$ is the feature of a neighbor node $j$ of node $i$ (including $i$ itself) in layer $l$, $\sigma$ is a nonlinear transformation, $A$ is the adjacency matrix, $\hat{A}$ is the adjacency matrix with self-loops added, $\hat{D}$ is the degree matrix corresponding to $\hat{A}$, $N_i$ is the set of all neighbors of node $i$ (including itself), $W^{(l)}$ is the weight of layer $l$, and $b^{(l)}$ is the bias of layer $l$.
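As an illustration of the propagation step described above, the following Python sketch assumes the widely used symmetric normalization $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}$ with $\hat{A} = A + I$. This choice matches the symbols listed in the text but is an assumption of the sketch, as are the function names and the requirement that edge weights are non-negative.

```python
import numpy as np


def normalize_adjacency(a):
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative adjacency matrix A."""
    a_hat = a + np.eye(a.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)              # node degrees of A + I (all >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by deg[i]^{-1/2} and column j by deg[j]^{-1/2}.
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(x, a_norm, w, b, activation=np.tanh):
    """One graph convolution: activation(A_norm @ X @ W + b)."""
    return activation(a_norm @ x @ w + b)
```

Each output row is a weighted average of the transformed features of a node and its neighbors, which is how rainfall and runoff at related stations enter the representation of a given station.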

B. LEARNING TEMPORAL FEATURES
We have described the spatial feature extraction of the hydrological topological structure; we now describe how to learn temporal features from time series. The Gated Recurrent Unit (GRU) [14] improves on the gate design of Long Short-Term Memory (LSTM) [19] to overcome the vanishing gradient problem. In GRU, the activation function is very important. Commonly used activation functions include sigmoid, tanh, softsign, and ReLu.
Sigmoid activation function [43]: $\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$. The sigmoid function is soft saturated and differentiable everywhere in its domain. When $|x|$ is large, the output changes very little. In other words, when the input tends to positive or negative infinity, the gradient approaches zero, so the gradient vanishes. In addition, the sigmoid output is always positive, so it suffers from the mean shift problem.
Tanh activation function [18]: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$. The tanh function has a value range of -1 to 1, so its output mean is close to 0, which overcomes the mean shift problem of the sigmoid function. This brings the stochastic gradient closer to the natural gradient, reduces the number of iterations required to solve for the network parameters, and shortens the training time of the deep network. However, it is still soft saturated, which easily causes the gradient to vanish.
Softsign activation function [17]: $\mathrm{softsign}(x) = \frac{x}{1 + |x|}$. The softsign function is also antisymmetric and differentiable. Its value range is between -1 and 1, and its output mean is close to 0, which overcomes the mean shift problem of sigmoid. Its curve is flatter than that of tanh, so its derivative decreases more slowly and it mitigates the vanishing gradient problem better.
ReLu activation function [2]: $\mathrm{ReLu}(x) = \max(0, x)$. When the input falls on the negative half axis, the gradient of the neuron is always 0 and the neuron is not activated for any input, which means that the neuron is dead. This can prevent training from converging. When the input falls on the positive half axis, the derivative is always 1, which keeps the gradient from shrinking and avoids the vanishing gradient problem. The value range of the ReLu function is non-negative, so the output mean is greater than 0 and it suffers from mean shift.
Although the ReLu function effectively avoids the vanishing gradient, it has the shortcomings noted above: its output is always 0 for $x < 0$, so neurons can die and the output mean is greater than 0. The output of the softsign function on the negative half axis is less than 0, and its derivative decreases slowly. Combining the softsign function with the right half of the ReLu function therefore not only mitigates the mean shift problem but also prevents neurons on the left half from dying. Accordingly, we use the softsign function for $x < 0$ and obtain a new activation function, SoLu, whose parameter $\beta$ is set to 0.25. The SoLu function combines the advantages of two commonly used activation functions, alleviating the vanishing gradient and output mean shift problems, and is also more robust.
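To make the construction concrete, the following Python sketch implements softsign, ReLu and one possible form of SoLu. The text states only that the softsign function is used for $x < 0$ and that $\beta$ is 0.25; the sketch assumes, hypothetically, that $\beta$ scales the softsign branch. Other placements of $\beta$ are possible, so this should be read as an illustration rather than the exact definition.

```python
import numpy as np


def softsign(x):
    """softsign(x) = x / (1 + |x|), with outputs in (-1, 1)."""
    return x / (1.0 + np.abs(x))


def relu(x):
    """ReLu(x) = max(0, x)."""
    return np.maximum(0.0, x)


def solu(x, beta=0.25):
    """Hypothetical SoLu: identity for x >= 0, beta-scaled softsign for x < 0.

    Assumption: beta multiplies the softsign branch; the source only
    gives the value beta = 0.25.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, beta * x / (1.0 + np.abs(x)))
```

With this form, negative inputs produce small negative outputs with a non-zero gradient, so the unit does not die, while positive inputs keep a derivative of 1 as in ReLu.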
The new activation function SoLu proposed in this paper is applied to the GRU instead of tanh. We propose the SoLu-GRU model to capture the temporal dependence. The model uses the SoLu to improve the output activation function of the GRU model in the hidden layer(shown in Figure 4).
In Figure 4, $x_t$ is the input of the neuron at the current time, $h_t$ is the output of the neuron at the current time, $h_{t-1}$ is the output of the neuron at the previous moment, and $\sigma$ is the sigmoid function.
The construction steps are as follows:
Step 1: Build an update gate.
$W_1$ is the weight of the update gate, and $h_{t-1}$ is the output of the neuron at the previous moment. The larger the value of $z_t$, the less information is retained from the previous moment and the more information is taken from the current moment.
Step 2: Build a reset gate.
$W_2$ is the weight matrix of the reset gate. When the value of $r_t$ is 0, the information transmitted by the neuron at the previous moment is discarded as useless, and only the input of the neuron at the current moment is kept as the input.
Step 3: Build the pending output section. We use SoLu instead of tanh. $W_3$ is the weight used to compute the pending output.
Step 4: Build the output section.
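The four steps can be sketched in Python as a single recurrent update. The sketch assumes the standard GRU equations without bias terms, with the candidate (pending) output passed through SoLu instead of tanh. The concatenated-input layout, the absence of biases and the form of `solu` (with $\beta$ scaling the softsign branch) are assumptions of this illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def solu(x, beta=0.25):
    # Hypothetical SoLu form: identity for x >= 0, beta-scaled softsign otherwise.
    return np.where(x >= 0, x, beta * x / (1.0 + np.abs(x)))


def solu_gru_step(x_t, h_prev, w1, w2, w3):
    """One SoLu-GRU step; each weight has shape (hidden, hidden + inputs)."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(w1 @ concat)                                # Step 1: update gate
    r_t = sigmoid(w2 @ concat)                                # Step 2: reset gate
    h_cand = solu(w3 @ np.concatenate([r_t * h_prev, x_t]))   # Step 3: pending output
    return (1.0 - z_t) * h_prev + z_t * h_cand                # Step 4: output
```

In this form a larger $z_t$ keeps less of $h_{t-1}$ and more of the pending output, which agrees with the description of the update gate in Step 1.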

C. HYDROLOGIC PREDICTION
The ST-Hydro model proposed in this paper adds spatial features to time series prediction. GCN is used to capture spatial dependence, while GRU, which is improved by SoLu activation function, captures temporal features.

1) SPATIAL FEATURE EXTRACTION
We take a 2-layer GCN as an example, where $X_t$ is the feature matrix, $A$ is the adjacency matrix, $\sigma$ is a nonlinear transformation (ReLu or another function can be used), $\hat{A}$ is the adjacency matrix with self-loops added, $\hat{D}$ is the degree matrix corresponding to $\hat{A}$, $W_0$ is the weight of the first layer, and $W_1$ is the weight of the second layer.

2) LEARNING TEMPORAL FEATURES
Based on the spatial features, we use the improved GRU to extract the temporal features, where $W_z$ and $W_r$ are the weights of the update gate and the reset gate.

3) LOSS FUNCTION
We use $Y_t$ and $\hat{Y}_t$ to denote the observed value and the predicted value of runoff, and the loss function is defined in terms of these two quantities. We then obtain the predicted value through the fully connected layer, where $O(\cdot)$ is a linear function and $W_c$ is its weight.

IV. EXPERIMENTS AND RESULTS
In this section, we present the experiments to evaluate the effectiveness and timeliness of the proposed ST-Hydro framework. Specifically, we aim to answer the following evaluation questions:
• EQ1: Can ST-Hydro improve hydrological prediction performance by modeling the spatial and temporal features simultaneously?
• EQ2: How robust is ST-Hydro for hydrological prediction in one particular flood?
• EQ3: Is the multi-graph convolutional network better than a single-graph convolutional network?
• EQ4: Is the proposed activation function more effective for temporal modeling?

A. DATA DESCRIPTION
In this section, we introduce the data sets used in the experiments: hydrological data sets from Jiangxi Province, China. The experimental data include runoff data, longitude and latitude data, and rainfall data.
After excluding stations affected by missing data, relocation or stoppages, we selected 139 stations in Jiangxi Province as the research stations (see Figure 5). The discharge and rainfall data of summer floods from 1998 to 2010 are used as the experimental data. Because part of the rainfall record is incomplete, we fill in the missing rainfall values with linear interpolation, a method commonly used in hydrology. We take each hydrological station as a node, use the station relationships as the input adjacency matrix, and use the feature matrix to represent the historical runoff and rainfall information. We use 80 percent of the data set for training and 20 percent for testing.
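The preprocessing described above can be sketched as follows. The sketch assumes that missing rainfall values are marked as NaN and that the 80/20 split is made in chronological order; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np


def fill_missing_linear(series):
    """Fill NaN gaps in a 1-D series by linear interpolation between known points."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    known = ~np.isnan(series)
    return np.interp(idx, idx[known], series[known])


def chronological_split(data, train_fraction=0.8):
    """Split along the first (time) axis; the chronological order is an assumption."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]
```

For example, a rainfall record `[1, nan, 3]` would be filled to `[1, 2, 3]` before being placed in the feature matrix.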

B. EVALUATION METRICS AND BASELINES
We use the widely-adopted evaluation metrics that are based on the flood forecasting indicators given in the ''Hydrological Information Forecasting Specification (GB/T 22482-2008)'', including flood peak flow, flood peak occurrence time and flood process, etc.

1) EVALUATION METRICS
a: THE DETERMINISTIC COEFFICIENT
The deterministic coefficient (DC) represents the degree of agreement between the predicted values and the measured values. The closer the value is to 1.0, the better the prediction accuracy. The formula is $DC = 1 - \frac{\sum_{i=1}^{n} (y_p(i) - y(i))^2}{\sum_{i=1}^{n} (y(i) - \bar{y})^2}$, where $y_p(i)$ is the predicted value, $y(i)$ is the measured value, $\bar{y}$ is the average of the measured values, and $n$ is the number of samples.

b: THE ROOT MEAN SQUARE ERROR
The root mean square error (RMSE) represents the deviation between the predicted value and the actual value. The calculation formula is $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_p(i) - y(i))^2}$.

c: THE FLOOD PEAK ERROR
The flood peak error (error) represents the difference between the maximum predicted value and the maximum measured value in the flow forecast of a flood peak,
where $y_p^{max}$ is the maximum of the predicted values and $y^{max}$ is the maximum of the measured values.

d: THE TIME DIFFERENCE OF THE FLOOD PEAK
The time difference of the peak (time) represents the difference between the moment when the maximum value of the predicted flow value in the flood occurs and the moment when the maximum value of the true value appears.
where $t_1$ is the time when the predicted maximum value appears, and $t_2$ is the time when the real maximum value appears.
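The four metrics can be sketched in Python as follows. The DC and RMSE functions follow their usual definitions. The flood peak error is taken here as the signed difference between the predicted and measured peaks, and the time difference as $t_1 - t_2$ in time steps; the sign conventions and the absence of any normalization are assumptions of this sketch.

```python
import numpy as np


def deterministic_coefficient(y_pred, y_obs):
    """DC = 1 - sum((y_p - y)^2) / sum((y - mean(y))^2)."""
    y_pred, y_obs = np.asarray(y_pred, float), np.asarray(y_obs, float)
    return 1.0 - np.sum((y_pred - y_obs) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)


def rmse(y_pred, y_obs):
    """Root mean square error between predicted and measured values."""
    y_pred, y_obs = np.asarray(y_pred, float), np.asarray(y_obs, float)
    return np.sqrt(np.mean((y_pred - y_obs) ** 2))


def flood_peak_error(y_pred, y_obs):
    """Predicted peak minus measured peak (signed difference, assumed form)."""
    return float(np.max(y_pred) - np.max(y_obs))


def peak_time_difference(y_pred, y_obs):
    """t1 - t2 in time steps; negative when the predicted peak comes earlier."""
    return int(np.argmax(y_pred)) - int(np.argmax(y_obs))
```

With hourly data, a `peak_time_difference` of -1 corresponds to a predicted peak one hour ahead of the measured peak.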

2) BASELINES
Here we introduce the baselines.

a: SUPPORT VECTOR MACHINE
Support Vector Machine (SVM) [39] is a generalized linear classifier that performs binary classification of data in a supervised learning manner. In recent years, SVM has been used in many fields, including hydrology [15].

b: LONG SHORT-TERM MEMORY
Long Short-Term Memory (LSTM) [23] is a kind of recurrent neural network specially designed to solve the long-term dependency problem of general Recurrent Neural Networks. Many studies [27], [35], [47] have shown the potential of LSTM for hydrological modelling applications.

c: GATED RECURRENT UNIT
The Gated Recurrent Unit (GRU) was proposed by Chung et al. [14] so that each recurrent unit can adaptively capture dependencies at different time scales. Because of its simplicity and low computational overhead, it was quickly applied in many fields and has also achieved good results in hydrology [20], [29].

C. EVALUATING OVERALL EFFECTIVENESS
To answer EQ1, we compare the prediction performance of different models. The Poyang Lake Basin in Jiangxi Province is the region with the most frequent floods in the middle and lower reaches of the Yangtze River. The analysis of the hydrological data in this basin can provide strong support for the prevention and control of flood disasters. We choose the Waizhou station in the Poyang Lake Basin to present the hydrologic prediction results.
Training parameters of SVM model are configured as follows: penalty coefficient C interval is [0. The results are shown in Figure 6 and Table 1. Figure 6 is the runoff prediction results of WaiZhou station, and Table 1 shows the evaluation metric values of different models in the prediction.
Figure 6 shows that the prediction of the SVM model fluctuates more, and its prediction curve is not as stable as those of LSTM, GRU, and ST-Hydro. The prediction results of LSTM and GRU are similar, while ST-Hydro is better than GRU and more stable than LSTM.
From Table 1, we can see that the 1-hour runoff forecast is the most accurate, and the longer the forecast horizon, the lower the accuracy. ST-Hydro is the best model in all forecast periods. LSTM and GRU perform comparably, and the difference between them is not significant. SVM performs the worst. The evaluation metrics in the table show that the DC of ST-Hydro is the highest, the difference between LSTM and GRU is not significant, and SVM is slightly worse. Therefore, the ST-Hydro model is the best in the overall prediction results.
This may be because GCN captures the spatial characteristics of runoff flow, adds some rainfall of several related stations in the space to the prediction, and the function proposed in this paper avoids the problems of gradient disappearance and mean shift.

D. EVALUATING EFFECTIVENESS OF ONE FLOOD
To answer EQ2, we choose one flood at Waizhou station to show the detailed results of the different models. The results are compared in Figure 7 and Table 2. Figure 7 shows the prediction of one flood by the different models, and Table 2 gives the results of predicting the same flood for different future periods.
In hydrological prediction, the time at which the flood peak appears, that is, the peak time, is very important. If the predicted peak time is ahead of the real peak time, flood control work can be prepared in advance. A lag in the predicted peak time brings difficulties and challenges to flood control work.
From Figure 7 and Table 2, we can see that SVM performs the worst and its predicted peak time lags, by as much as 5 hours at Waizhou station. LSTM, GRU and ST-Hydro reach the peak on time or one hour in advance, which provides guidance for flood control operations. The peak error of ST-Hydro is the smallest and its accuracy is the highest, so ST-Hydro predicts more accurately in all aspects.
Table 2 also shows that SVM consistently delays the predicted peak time, while both LSTM and GRU reach the peak exactly or one hour in advance. Most notably, ST-Hydro reaches the peak two hours ahead of time in the 5-hour prediction, which is very meaningful, and its peak error is also smaller. Overall, ST-Hydro performs better.

E. COMPARING DIFFERENT GRAPHS PERFORMANCE
To answer EQ3, we predict the same data using different graphs to capture the spatial features. Here we use the data set of Jiangxi Province to compare the hydrological prediction models built on different graphs: the hydraulic distance graph, the Euclidean distance graph, the correlation graph and the fusion graph. The discharge and rainfall data of summer floods from 2000 to 2010 are selected as the experimental data, and we take the average RMSE and DC of 15 stations for comparison. The result is shown in Figure 8. The model using the Euclidean distance graph has the worst performance. This may be because the Euclidean distance mostly reflects differences in rainfall processes, and rainfall information is already included as an input to the prediction model. The models using hydraulic distance and correlation perform similarly, with the correlation model slightly better. The model using the fusion graph is the best.

F. ASSESSING ACTIVATION FUNCTIONS
To answer EQ4, we use tanh, softsign, ReLu and SoLu in turn as the activation function for prediction.
Because training does not converge when the sigmoid function is used as the activation function of the network, the experimental comparison covers only tanh, softsign, ReLu and SoLu.
Figure 9 shows that, with 50 iterations, the SoLu activation function has the shortest training convergence time and the lowest training error. The order of the error values for the different activation functions is SoLu < ReLu < softsign < tanh. Therefore, using SoLu as the activation function not only shortens the convergence time effectively but also reduces the training error.

V. CONCLUSION AND FUTURE WORK
In this paper, we propose a novel river discharge prediction model, ST-Hydro, which combines a network-driven method, Graph Convolutional Networks (GCN), with a data-driven method, an improved Gated Recurrent Unit (GRU). GCN is used to capture the spatial features and GRU is used to capture the temporal features of the discharge samples. We also use multiple graphs to construct the GCN and use the proposed activation function SoLu to improve the Gated Recurrent Unit. Experiments show that the ST-Hydro model outperforms several state-of-the-art baselines.
Several interesting directions remain to be explored. For example, when building the hydrological topological structure map, seasonal factors can be considered, and different maps can be built for different time periods. In addition, we will consider using GCN to capture spatial features for the analysis of ungaged basins.

HUAN LIU (Fellow, IEEE) is currently a Professor of computer science and engineering with Arizona State University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the Faculty with the National University of Singapore. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating interdisciplinary problems that arise in many real-world, data-intensive applications with high-dimensional data of disparate forms, such as social media. He is also a coauthor of Social Media Mining: An Introduction (Cambridge University Press). He is also a Founding Organizer of the International Conference Series on Social Computing, Behavioral-Cultural Modeling, and Prediction, Field Chief Editor of Frontiers in Big Data and its Specialty Chief Editor in Data Mining and Management. He is also a Fellow of ACM, AAAI, and AAAS. More can be found about him at http://www.public.asu.edu/~huanliu.