Daily Water Flow Forecasting via Coupling Between SMAP and Deep Learning

Hydrological models are essential tools for forecasting the daily availability of water resources, which is used to plan the short-term operation of electrical systems. However, there is a trade-off when choosing a given model. Complex models may provide good results, but they depend on very complicated analytical and optimization procedures in addition to sophisticated data, whereas simpler models offer reasonable results with much more amenable tuning approaches. To improve the quality of simpler models, this article proposes the coupling of the Soil Moisture Accounting Procedure (SMAP) hydrological model with a Deep Learning architecture based on Conv3D-LSTM. In the proposed methodology, the SMAP is first optimized to obtain general parameters of the hydrographic basin. This optimized model’s output is used as input to the Conv3D-LSTM estimator to provide the final results. This gray estimator model can generate fast and accurate results. Studies with the goal of forecasting the natural flow seven days ahead are carried out for two large Brazilian hydroelectric plants to validate the method. The results obtained by the architecture are better than those obtained with decoupled techniques.


I. INTRODUCTION
The generation of electric energy through hydropower plants is one of the main forms of generation applied in the world. This source is the most used in countries that have considerable water potential, due to its low operating cost, when compared to other sources [1].
Brazil is one of the countries that has an electrical matrix composed predominantly of hydropower plants. In this case, the thermoelectric plants operate in order to complement the generation deficit, when the hydropower generation and alternative sources do not supply the demand of the electric system. Thus, it is necessary to carry out studies to ensure a better use of this source and to reduce the dependence on generation by thermoelectric plants, which have higher operating costs.
In the planning stages of the operation of electrical systems composed of hydroelectric plants, it is necessary to know the availability of water resources for energy production days ahead. Thus, it is important to predict the amount of water that will be available and, consequently, the energy that can be extracted.
The associate editor coordinating the review of this manuscript and approving it for publication was F. R. Islam.
Based on climate forecast data, forecasts of river inflow can be obtained through mathematical models that characterize the drainage basin in which the plant is located, or through the application of machine learning techniques.
From the point of view of hydroelectric plants, a good inflow forecast and good planning can provide: greater generation at times when energy is more expensive; less spillage; optimization of the dispatch of the plant's machines; and safety of the dam structure.
In Brazil, the electrical system is interconnected and managed by the National System Operator (ONS, Operador Nacional do Sistema), whose objective is the optimization and planning of energy generation and transmission, considering different time horizons.
Therefore, the definitions of the operation are developed by ONS and one of the challenges is the determination of future flows according to the hydrological characteristics that influence the dynamics of the water along a basin: soil, relief, temperature, evapotranspiration, rain and moisture.
One of the main aspects that must be considered in the planning is the precipitation that increases the water level of the reservoirs, whereas the drought leads to an increase in generation by the thermoelectric plants due to a lower availability of generation by the hydropower plants.
Mathematical models have been studied and developed in order to improve the way a watershed is represented with the objective that the model presents a behavior close to the real one. Thus, knowing future climate forecast data and the characteristics of the basin, it is possible to obtain an approximation of the volume of water that will be available for the dispatch of the plant days ahead.
The objective and contribution of this work is the coupling of the Soil Moisture Accounting Procedure (SMAP) hydrological model with deep neural networks (DNNs). The developed architecture produces a forecast of the natural inflow up to 7 days ahead and is based on a coupling between a three-dimensional convolutional neural network (Conv3D) and a Long Short-Term Memory (LSTM) network.
SMAP will be applied for processing soil data and generating input data for the DNN. The SMAP model was chosen because it is the model used by ONS for planning the entire Brazilian electrical system and is easy to implement. In addition, it is applied to several watersheds, which directly influences the planning, operation and price of energy throughout the national territory. The proposed methodology will be validated for two Brazilian regions where the hydropower plants (HPPs) Peixe Angical (498 MW) and Mascarenhas (198 MW) are located.
In addition to the main contribution of this work, which is an efficient architecture for predicting water flow, the following contributions, also developed throughout this article, should be highlighted:
• application of precipitation data from the MERGE/CPTEC product, instead of data observed by pluviometric stations;
• a calibration structure for the SMAP model that considers historical data for the representation of watersheds;
• local SMAP calibration at each forecast stage, with a refinement of the parameters that provides a better characterization of the watershed in moments close to the forecast. This calibration is performed by the adapted Twiddle algorithm;
• processing of gridded precipitation data through three-dimensional convolutional layers, aiming at a compaction of information and a space-time compensation of each measurement.
Finally, this article contributes, not only to studies that aim to make better use of HPP, but also to studies that aim to predict the availability of water resources in river basins.
The remainder of the paper is organized as follows. Section II shows related works and the state of the art regarding hydrological predictions. Section III describes the proposed methodology and its respective steps. Section IV presents the data considered for the simulations and the respective results and analyses. Finally, Section V presents a summary of the research conclusions and possible improvements for future validations.

II. RELATED WORKS
In recent years, several studies have been developed in order to predict hydrological behavior. In these studies, machine learning techniques are the most used, mainly in works that aim to predict river flow [2], [3]. In [4], water availability was predicted by forecasting the water level with the application of the Variable Infiltration Capacity (VIC) hydrological model for the Mekong River in Asia. In [5], a Feed Forward Neural Network (FFNN) was applied and compared with Long Short-Term Memory (LSTM). Its aim was also to forecast water levels, but applied to Polish lakes. In [6], a Multilayer Perceptron (MLP) was applied for the same purpose, but for a navigable river and, in [7], the authors apply level forecasting to an HPP and point out that this study also contributes to ensuring the safety of the dam structures.
In addition to the water level, studies to predict precipitation are also very common. In [8] artificial neural networks (ANNs) and the Support Vector Machine (SVM) were applied, together with wavelet transformation for preprocessing of the input data.
Rasouli et al. [9] also apply Deep Learning (DL) techniques in order to predict river flow up to 7 days ahead. The techniques used are SVR, Bayesian Neural Network (BNN), Gaussian Process (GP) and Multiple Linear Regression (MLR) and are compared with respect to efficiency in forecasting.
Yaseen et al. [3] present an extensive review of the literature and demonstrate the applicability of DL techniques for predicting water flow in rivers and other hydrological aspects. In addition, they apply the SVR to compare to an extreme learning machine (ELM) model in predicting water flow in a Malaysian river.
The water flow available in rivers is highly dependent on climatic aspects, which must be considered during the forecasting process. In [10], Rokaya et al. show that the freezing and thawing of rivers impact the observed inflows in cold regions. For the forecast considering these aspects, they applied artificial neural networks (ANNs) and showed that this tool is also widely used when considering the freezing effects of rivers.
In [11], water flow forecasting was also applied considering low temperature regions. However, the authors' approach considers snow melting from winter season and applies a sensitivity analysis of the initialization of soil moisture parameters.
The application of a Deep Belief Network (DBN) with variational mode decomposition (VMD) was the strategy applied in [12] in order to predict the flow of the Han River basin, in China. For the DBN calibration, the improved particle swarm optimization (IPSO) was used and the forecast was made considering 1, 3, 5 and 7 days ahead. The same region was the subject of a study in [13], in which the LSTM was applied with the VMD.
In [14], an eigenmodel was used to model the behavior of flows in a basin as a function of precipitation and evapotranspiration. The model parameters were estimated based on probability density functions using a Markov chain Monte Carlo method.
Belvederesi et al. [15] apply the single-input sequential adaptive neuro-fuzzy inference system (ANFIS) to predict flow in a Canadian river. In [16], the authors apply a Convolution Regression based on Machine Learning (CRML) to predict the water flow of a river located in a Chinese hydrological basin.
Different mathematical models that represent a watershed have been applied in some studies. In [17] the following models are compared for river flow forecasting: Sacramento Soil Moisture Accounting (SAC-SMA), modèle du Génie Rural à 4 paramètres Journalier (GR4J), McMaster University Hydrologiska Byrans Vattenbalansavdelning (MAC-HBV), the Hydrologic Engineering Center Hydrologic Modeling System (HEC-HMS) and the University of Waterloo Flood Forecasting System (WAT-FLOOD). The Dynamically Dimensioned Search (DDS) method was used for the calibration of all models.
In [18], the Soil Moisture Accounting and Routing (SMAR) model was compared to the following learning strategies in relation to the flow forecasting accuracy for the Heihe River in China: back-propagation neural network (BPNN), general regression neural network (GRNN) and a rotated GRNN (RGRNN) model.
Other approaches have also been the subject of studies for the prediction of flows. The authors of [19], [20] apply a rain-runoff model to determine soil status for short-term water flow forecasting with strategies that apply the ensemble Kalman filter (EnKF) to assimilate soil data. Reference [21] is also an example of the application of this method in problems of forecasting hydrological aspects, but with a focus on water quality. The authors of [22] apply EnKF to assimilate data for river flow forecasting using the Soil and Water Assessment Tool (SWAT) hydrological model.
In [23], Mallows' coefficient was used with the M5Tree and MARS preprocessing data models for river flow forecasting. In [24], a technique based on chaos theory was applied in conjunction with a variation of genetic programming, with the same objective.
The application of river flow forecasting for regions where there are HPP can lead to a better use of the energy that can be generated. These studies also allow planning for the dispatch of machines, in addition to guaranteeing a future perspective regarding the price of energy for commercialization [25].
In [26], ANNs were applied with the objective of predicting the water flow 3 days ahead for a river in Slovenia, where a small capacity HPP is installed. In [27], an approach was proposed in which a probability density function was obtained from the solution of the Fokker-Planck-Kolmogorov (FPK) equation. The main objective was to obtain the forecast of the monthly inflows of the Betania Hydropower reservoir, in a river in Colombia.
The coupling of techniques has been used in several studies in order to develop models for predicting hydrological aspects. The idea of this coupling is to obtain a greater forecasting capacity than is achieved by applying the techniques separately. In [28], the predictability of droughts in different regions of China was investigated using a series of statistical, dynamic and hybrid models. Statistical and dynamic models were coupled through Bayesian model averaging, which achieved the best results.
Zhou et al. [29] developed a forecast 4 days ahead of the inflow of the Three Gorges HPP reservoir. In this study, the unscented Kalman Filter (UKF) was applied with two DL techniques: BPNN and a non-linear auto-regressive with exogenous inputs recurrent neural network (NARX).
Machine learning techniques are widely used in isolation and in these models where there are couplings of different tools. In [30], the authors applied machine-learning quantile regression algorithms for probabilistic hydrological post-processing of flow calculated by the GR4J model.
A Brazilian hydrological basin was the subject of a study in [31], in which water flow was forecast in a river in which a 396 MW HPP is installed. The strategy used by the authors consisted of a forecast through stochastic optimization.
The Soil Moisture Accounting Procedure (SMAP) rain-flow model is one of the tools used by ONS to create flow scenarios for Brazilian basins. This deterministic and centralized model directly influences the planning of the operation of the electrical system and the dynamics of the Brazilian energy market.
In [32] the SMAP model was applied for the Paraopeba River region and the calibration was performed using the Dynamically Dimensioned Search (DDS) and Shuffled Complex Evolution (SCE) algorithms.
In [33] the Genetic Algorithm (GA) was applied to calibrate the parameters of the São Francisco River basin. The GA was identified as a robust tool with the capacity to obtain calibrations with a high degree of adherence between the observed and calculated flow curves.
In [34], the SMAP model was applied in a region of the Brazilian Northeast semiarid. The model calibration was performed using versions of GA, PSO and hybrid models with the Nelder-Mead algorithm. In [35], the authors applied the model for the monthly forecast of water flows for HPP Água Vermelha. The model is applied in [36] to make monthly rain forecasts in the Três Marias basin (Brazil). The SMAP model can be used in short-term forecasts in its daily version. In [37] SMAP was used with the Weather Research and Forecasting (WRF) model to forecast floods in Rio de Janeiro.
The authors in [38] coupled a conceptual hydrological model in series with a machine learning tool. The research proposes a hybrid model through a post-processing of the state variables of the hydrological model named Tank Model through the SVM.
As can be seen in the related state-of-the-art papers, most of the work on hydrological prediction focuses either on hydrological models or on learning strategies. However, as hydrological models are mathematical representations of real and complex systems, they provide only an approximation of the problem.
On the other hand, learning strategies try to map the entire system by correlating inputs and outputs. In this approach, the size and complexity of the problem generate a huge solution space to be mapped and, as a consequence, it is very difficult for any approach to have the amount of data necessary for a correct estimation.
The idea presented in this article is to use the generalization capability of deep learning approaches in smaller solution spaces. The hypothesis is that simpler hydrological models can provide a decent representation of a given region. This representation can explain and account for several situations, reducing the unmapped solution space. This reduced space may then be mapped by powerful deep learning approaches. This coupled strategy provides models that train quickly and yield excellent results.

III. PROJECT DEVELOPMENT
This work proposes the application of the SMAP model to assess the state of the soil reservoirs, which will be considered as input to the DNN. Based on a trained DNN and a well-calibrated SMAP model, the flow in the basin can be calculated as a function of precipitation and evapotranspiration. In the following sections, the methods applied and the structure of the developed tool will be described.

A. THE SMAP MODEL
The SMAP (Soil Moisture Accounting Procedure) model, originally developed by Lopes (1982) [39], describes the behavior of watersheds mathematically in a simplified way. In this model, the water flow is calculated from the outflows of reservoirs that characterize different elements of the basin.
As the model was applied in studies, modifications were proposed in order to make it more suitable for a better characterization of the behavior of the water. In this work, the version used by ONS for planning studies of the Brazilian electrical system will be applied [40].
One of the input data for the SMAP model is the average rainfall observed in the basin for the period under analysis. The total rainfall in the basin is calculated as the weighted average of the measurements of the pluviometric stations located in the region, as follows:

$$P_b(t) = \sum_{i=1}^{n} ke_i \, P_i(t) \qquad (1)$$

in which Pb(t) is the average precipitation observed in the basin at the instant t (mm/day); P1(t), P2(t), ..., Pn(t) are the precipitations observed at the pluviometric stations considered in the basin; and ke1, ke2, ..., ken are the spatial representation coefficients of each pluviometric station, which respect the following condition:

$$\sum_{i=1}^{n} ke_i = 1 \qquad (2)$$

Then, the model calculates the average precipitation of the day t, Pd(t), which is defined by a weighted average of the observed precipitation of the days close to t, as shown in the following equation:

$$P_d(t) = \sum_{j=-n}^{2} kt_j \, P_b(t+j) \qquad (3)$$

in which kt−n, kt−n+1, ..., kt0, kt+1 and kt+2 are the time representation coefficients and n is the number of days in the past that will be considered in this time weighting.
Finally, the precipitation considered, P(t), is obtained by multiplying Pd(t) by the factor Pcof, as shown in (4). Pcof must be calibrated along with the other parameters, in order to guarantee the water balance.

$$P(t) = P_d(t) \cdot Pcof \qquad (4)$$
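The spatial and temporal weighting described above can be illustrated with a minimal sketch. The station series, the coefficients `ke` and `kt`, the value of `pcof` and the function names are hypothetical and chosen only for illustration; they are not taken from the ONS calibration.

```python
def basin_precipitation(stations, ke):
    """Spatially weighted basin precipitation Pb(t) from per-station daily series."""
    if abs(sum(ke) - 1.0) > 1e-9:
        raise ValueError("spatial coefficients ke must sum to 1")
    n_days = len(stations[0])
    return [sum(k * s[t] for k, s in zip(ke, stations)) for t in range(n_days)]


def smap_precipitation(pb, kt, n_past, t, pcof):
    """P(t) = Pcof * Pd(t), where Pd(t) weights Pb over days t-n_past .. t+2."""
    if len(kt) != n_past + 3:
        raise ValueError("kt needs one coefficient per day from t-n_past to t+2")
    if t - n_past < 0 or t + 2 >= len(pb):
        raise IndexError("the weighting window falls outside the series")
    offsets = range(-n_past, 3)  # -n_past, ..., 0, +1, +2
    pd = sum(k * pb[t + j] for k, j in zip(kt, offsets))
    return pcof * pd


# Hypothetical data: two stations, five days (mm/day).
stations = [[10.0, 0.0, 20.0, 0.0, 0.0],
            [0.0, 10.0, 20.0, 0.0, 0.0]]
pb = basin_precipitation(stations, ke=[0.5, 0.5])
p = smap_precipitation(pb, kt=[0.2, 0.5, 0.2, 0.1], n_past=1, t=1, pcof=1.1)
```

With gridded data covering the basin uniformly, the spatial step reduces to a plain mean over the cells, which is the simplification adopted later in this section.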
However, rainfall data from rainfall stations are susceptible to measurement errors and failure to acquire and record data. In addition, their positions may not be the most appropriate from the point of view of the basin. Therefore, in this study, the precipitation data considered will be obtained by the product MERGE/CPTEC [41].
The precipitation provided by the MERGE product is freely accessible and is made available by the Brazilian National Institute for Space Research (INPE) [43]. The data are based on observations from GPM satellite and rainfall stations. In this product, the world is divided into small 10 km grid cells and daily average precipitation data is provided for the grid center. Through geographic coordinates, data related to regions of interest can be selected. This observed precipitation presents one more advantage in relation to the data obtained through stations, since the meteorological forecast products also present data in this format. Thus, the training and forecasting stages will present the same input data structure. As the MERGE product grids occupy the entire basin area in a uniform manner, spatial weighting is not necessary as is the case when data from sparsely located rainfall stations are used. Thus, this study uses the average of precipitation data from the grid located in the region of interest as input to the SMAP model, as it is a more robust approach and less susceptible to errors.
In addition to precipitation data, SMAP also requires daily evapotranspiration information to characterize the basin and flow calculations. In this work, the Hargreaves and Samani [42] method will be applied. In this methodology, only the maximum and minimum values of daily temperatures, which were obtained from the National Institute of Meteorology (INMET) [44], are needed.
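As an illustration of why only the daily temperature extremes are needed, the sketch below uses the widely published FAO-56 form of the Hargreaves equation, in which the extraterrestrial radiation is computed from latitude and day of the year. The text does not state which variant of the method was used, so this form, the latitude, the date and the temperatures are assumptions for illustration only.

```python
import math


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra in MJ m-2 day-1 (FAO-56 formulation)."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle; valid away from polar latitudes.
    ws = math.acos(-math.tan(phi) * math.tan(delta))
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    return (24.0 * 60.0 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def hargreaves_samani(tmax, tmin, lat_deg, day_of_year):
    """Reference evapotranspiration (mm/day) from daily Tmax and Tmin in deg C."""
    # 0.408 converts MJ m-2 day-1 into equivalent evaporation in mm/day.
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, day_of_year)
    tmean = 0.5 * (tmax + tmin)
    return 0.0023 * ra_mm * (tmean + 17.8) * math.sqrt(max(tmax - tmin, 0.0))


# Hypothetical day: latitude 20 degrees South, day 246 of the year.
ra = extraterrestrial_radiation(-20.0, 246)
eto = hargreaves_samani(tmax=30.0, tmin=20.0, lat_deg=-20.0, day_of_year=246)
```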
The daily SMAP model applied by ONS represents the basin by four reservoirs (Rsoil, Rsup, Rsup2 and Rsub). The model for this version is shown in Figure 1, in which the parameters necessary for the representation of the basin are also identified.
Potential evapotranspiration is responsible for reducing the Rsoil and Rsup2 reservoirs. After calculating the evapotranspiration Eto based on the described methodology, the values of Epr and Epmarg for the respective reservoirs are obtained as shown in (5) and (6), where Ecof and Ecof2 are adjustment coefficients that must be calibrated.
At the beginning of the simulation, it is necessary to define the initial states of the reservoirs. Thus, the values of the basic flow Eb(0), the moisture content Tu(0) and the superficial flow Sup(0) must be defined, to obtain the initial states of the reservoirs Rsoil, Rsub and Rsup, as shown in (7) to (9).
$$R_{sup2}(0) = 0 \qquad (10)$$

in which Str is a constant that must be calibrated and AD is the drainage area of the watershed. The constants kk and k2 depend on the parameters KKt and K2t, as given in (11) and (12).
Once all the parameters described have been defined, the iterative process of computing daily reservoir flows and states begins. The first quantity to be computed is the moisture content Tu(t) as a function of the state of the Rsoil reservoir, as given in (13).
Then, the runoff on day t, Es(t), is obtained. It is different from zero only if the precipitation observed on day t is greater than the soil abstraction Ai, in which case Es(t) is given by (14).
The actual evapotranspiration Er(t) is equal to the potential evapotranspiration Epr(t) if the difference Precp(t) − Es(t) is greater than Epr(t). Otherwise, Er(t) is calculated as given in (15).
The underground recharge Rec(t) is a flow that models the water being absorbed by the soil and moving to underground regions, represented by the underground reservoir Rsub. Rec(t) is zero if the state of the soil reservoir Rsoil(t) is less than the product of the field capacity Capc and the soil saturation Str. If Rsoil(t) is higher than this product, Rec(t) is obtained as given in (16), in which Crec is an underground recharge constant that must also be calibrated.
204664 VOLUME 8, 2020
The excess water that is stored on the banks of the river is represented by the Rsup2 reservoir and is characterized as an overflow from the Rsup reservoir. To calculate this overflow, Marg(t), it must be checked whether Rsup(t) is greater than the height H1 of Rsup. If so, Marg(t) is calculated as given in (17); otherwise, Marg(t) is set to zero. H1 is included among the variables that must be calibrated and k1 is a constant dependent on K1t, as given in (18).
Once the states of the reservoirs and the characteristic parameters of the basin are defined at time t, the runoff from each reservoir, which together compose the water flow calculated by the model, is calculated. The flow Ed(t) is related to the reservoir Rsup, as shown in Figure 1, and is obtained through (19), where H must be calibrated and represents the minimum height of the Rsup reservoir for the runoff Ed3(t) to be different from zero. k2 is a constant dependent on the K2t parameter, as given in (20).
The flow Ed3(t) represents a flow that exists when the surface reservoir Rsup(t) is greater than H, which represents a high storage, characteristic of large water flows. Thus, if Rsup(t) is greater than H, Ed3(t) is obtained from (21); otherwise, Ed3(t) is set to zero. In (21), k2t2 is a constant dependent on K2t2, as given in (22).
The flows Ed2(t) and Eb(t), from the reservoirs Rsup2 and Rsub, respectively, are obtained from (23) and (24), where k3 is a constant given by (25). Thus, the water flow Q(t) at time t is calculated using (26), in which the factor 86.4 is applied for unit conversion. Finally, the reservoir states for the next instant are updated from the computed flows.
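As a side note on the factor 86.4 mentioned for (26), the short derivation below shows where it comes from. It assumes that the runoff depth E is expressed in mm/day and the drainage area AD in km², which is consistent with the precipitation units used in this section but is not stated explicitly for AD; E here stands for a generic runoff depth, not a specific term of (26).

```latex
Q\,[\mathrm{m^3/s}]
  = E\,\Big[\tfrac{\mathrm{mm}}{\mathrm{day}}\Big] \cdot AD\,[\mathrm{km^2}]
    \cdot \frac{10^{-3}\,\mathrm{m/mm} \;\cdot\; 10^{6}\,\mathrm{m^2/km^2}}
               {86400\,\mathrm{s/day}}
  = \frac{E \cdot AD}{86.4}
```

For example, under these assumptions a runoff depth of 1 mm/day over a drainage area of 8640 km² corresponds to 8640 / 86.4 = 100 m³/s.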

B. HYBRID SMAP -DEEP LEARNING MODEL
Deep neural networks (DNNs) have a great capacity for modeling and pattern recognition. The main applications of DNNs in this area are for short-term water flow predictions and have observed inflows and rain predictions as inputs.
In this study, the integrative hydrological model SMAP was used in parallel to provide an estimate of the distribution of water in the hydrographic basin, in order to discriminate possible redundancies related to the observed inflow values. From the point of view of architecture training, the input variables obtained by SMAP contribute substantially. The training process of a rain-flow model based on neural networks consists of an optimization algorithm that searches a solution region with several local minima. The insertion of a mathematical model that partially characterizes the problem reduces the solution region and improves the training stage.
The dynamic characteristics of the hydrographic basin directly impact the observed flow and its behavior due to the precipitation in the region. Using these state variables of the SMAP model as input to the DNN gives the network better information about the basin and improves the water flow forecasting process.
This study does not use the calculated flow of the SMAP model as input data for DNN. The state variables of the SMAP model, which characterize the basin at each time t, are sent to DNN. Hence, the post-processing will be done in relation to the basin conditions and not to the observed flow, as it is commonly applied in coupling studies between hydrological models and Machine Learning techniques.
The first step of the proposed methodology is the calibration of the SMAP model based on a long history of inflow and precipitation. At the end of this process, the parameters obtained provide a good modeling of the watershed and a good approximation between the calculated and the observed flow. It is a step that must be performed only once.
The second step is the local calibration of the SMAP, which corrects integrative errors and adjusts the parameters obtained in the previous step in order to improve the behavior of the basin over a period of 14 days and define the states of the reservoirs at the end of this period. In this step, the observed rainfall is also adjusted throughout the optimization process. Performing an iterative process, the reservoir states are obtained through an optimization for each day in the history. These states will be used as a training set for DNN.
The third stage is DNN training, which will be held for each day in the history. The observed rainfall data from 7 days ahead, the soil reservoir states and the observed flow are the input data. Water flows of 7 days ahead are the output data.
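The pairing of inputs and outputs for training can be sketched as a sliding window over the history. This is a minimal illustration, not the authors' data pipeline: the function name and the dictionary layout are hypothetical, and the rainfall window is assumed to run from day t − 1 to day t + 7, as described for the DNN inputs.

```python
def build_samples(states, flow, rain, horizon=7):
    """Pair the inputs available on day t with the flows of the next `horizon` days."""
    samples = []
    # t starts at 1 so that rainfall of day t-1 exists; it stops so that t+horizon exists.
    for t in range(1, len(flow) - horizon):
        inputs = {
            "states": states[t],                     # Rsoil, Rsub, Rsup2, Rsup on day t
            "flow": flow[t],                         # observed flow on day t
            "rain": rain[t - 1 : t + horizon + 1],   # days t-1 .. t+horizon
        }
        target = flow[t + 1 : t + horizon + 1]       # flows of days t+1 .. t+horizon
        samples.append((inputs, target))
    return samples


# Hypothetical 20-day history with placeholder values.
flow = [float(q) for q in range(20)]
states = [(q, q, q, q) for q in flow]
rain = [0.5 * q for q in flow]
samples = build_samples(states, flow, rain)
```

In this sketch each entry of `rain` is a scalar for brevity; in the proposed method each day would instead be a precipitation grid.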
The SMAP model dynamically models the distribution of water in different soil layers, whereas the DNN has great potential for forecasting time series. The research applies the two models in order to take advantage of their intrinsic characteristics in a complementary way. Figure 2 presents an overview of the methodology, and the details of each step will be described in the following sections.

1) SMAP CALIBRATION
The first step is to obtain the parameters of the SMAP model from a set of consolidated historical data of rain, flow and evapotranspiration. Some parameters of the SMAP model are dependent on geomorphological characteristics, which define specific values or intervals for each region of study. In this study, all model parameters are calibrated through the optimization process and the geomorphological characteristics are considered when defining their limits.
For this calibration, the differential evolution algorithm was used, a global search solver available in the SciPy library [46]. The default configuration of the solver was applied for this step in order to minimize the Mean Absolute Error (MAE) between the observed and calculated flow.
The behavior of the basin is dynamic and this characteristic can lead to errors when considering a model with fixed parameters. In order to adapt the method to this characteristic, small adjustments are applied to the previously calibrated parameters for a specific period. These adjustments are made by the Twiddle algorithm [47], [48], a simple implementation method, based on local calibration and that does not require extra information about the function that must be minimized.
In addition to adjusting the model parameters, adjustments are also made to the observed precipitation data. Thus, the calculated flows will be closer to the observed flows and, therefore, the reservoir states will be better defined. With this local calibration, the integrative errors of the SMAP model are also mitigated. Thus, the application of simplified techniques for the estimation of daily evapotranspiration and precipitation data is sufficient for a good performance in the forecast and does not represent a limitation of the developed tool. This step is illustrated by Figure 3.
Thus, considering the optimization applied for a day t, the goal of Twiddle is to improve the agreement between the observed and calculated flows over the preceding 14 days. As mentioned, the applied variations are small, so that the parameters obtained for the period remain close to those obtained in the previous step; this is necessary because of the multimodal characteristic of the SMAP model. The variations allowed in this step are as follows:
• precipitation data Precp: ±20% of the observed values. This modification aims to correct possible measurement errors and ensure water balance;
• watershed parameters: ±5% of the values obtained in the calibration of the first stage;
• initial states of the reservoirs: ±20% of the values obtained for this day in the last iteration.
To apply Twiddle in the calibration of the SMAP model, some modifications were implemented in the original algorithm [47], [48], as shown below:
1) A scaling is performed based on the lower and upper bounds (lb, ub) of each variable, so that the scaled variables are limited between 0 and 1.
2) The update rate for each variable is restricted between dxmin and dxmax, so that no rate vanishes or explodes.
The pseudo-code is shown in Algorithm 1, in which x0 is the initial solution, x is the current scaled solution, dx is the array of variations of x, dxmin and dxmax are set to 10^−4 and 10^−2, respectively, and fob() is the function that calculates the MAE between the observed and calculated flows for the parameters in x.
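Since Algorithm 1 is not reproduced in this text, the following is only a minimal sketch of a bounded Twiddle search that includes the two modifications listed above (scaling to [0, 1] and clipping of the update rates). The growth and shrink factors 1.1 and 0.9, the initial rate, the fixed iteration count and the toy objective are assumptions borrowed from the commonly described form of Twiddle; they are not confirmed by the source.

```python
def twiddle(fob, x0, lb, ub, dx_min=1e-4, dx_max=1e-2, n_iter=200):
    """Coordinate search on variables scaled to [0, 1] with clipped update rates."""
    span = [u - l for l, u in zip(lb, ub)]  # assumes lb < ub for every variable

    def unscale(x):
        return [l + xi * s for xi, l, s in zip(x, lb, span)]

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    x = [clip((v - l) / s, 0.0, 1.0) for v, l, s in zip(x0, lb, span)]
    dx = [dx_max] * len(x)          # assumed initial update rate
    best = fob(unscale(x))
    for _ in range(n_iter):         # assumed stopping rule: fixed iteration count
        for i in range(len(x)):
            old = x[i]
            improved = False
            for step in (dx[i], -dx[i]):     # try increasing, then decreasing
                x[i] = clip(old + step, 0.0, 1.0)
                err = fob(unscale(x))
                if err < best:
                    best, improved = err, True
                    break
            if improved:
                dx[i] = clip(dx[i] * 1.1, dx_min, dx_max)
            else:
                x[i] = old                   # restore and shrink the rate
                dx[i] = clip(dx[i] * 0.9, dx_min, dx_max)
    return unscale(x), best


# Toy objective standing in for the MAE between observed and calculated flows.
target = [1.5, 12.0]

def fob(p):
    return sum(abs(a - b) for a, b in zip(p, target)) / len(p)

solution, error = twiddle(fob, x0=[1.0, 15.0], lb=[0.0, 10.0], ub=[2.0, 20.0])
```

In the local calibration, `lb` and `ub` would be built from the allowed variations (for example ±5% around each watershed parameter), which keeps the search close to the globally calibrated solution.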

2) DNN INPUT DATA
After the calibration step of the SMAP model, the reservoir states Rsoil(t), Rsub(t), Rsup2(t) and Rsup(t) obtained for each instant t are used as input data for a DNN. The forecasting step also requires the observed flow as DNN input, as is usual in post-processing studies using Deep Learning techniques. In addition, the purpose of this work is that other dynamic characteristics of the basin are also used as input to the DNN, in order to better define the conditions in which the observed water flow occurred.
Therefore, at each forecast step, a local calibration of the SMAP model must occur in order to obtain the state variables Rsoil(t), Rsub(t), Rsup2(t) and Rsup(t) for time t. This calibration aims to guarantee a better water balance of the model, a smaller error of these dynamic characteristics of the basin and an attenuation of integrative errors of the model.
In addition to the data obtained by SMAP, the DNN also receives the following inputs: the observed flow, which is the starting point of the forecast curve; and the daily precipitation data, in matrix format, concatenated from the previous day t − 1 to seven days ahead t + 7. Precipitation data from the previous day is considered, since the dynamics of the basin at a given time may be associated with the precipitation observed at previous times.
The MERGE product provides daily rainfall in a matrix format, where each value is associated with the rain at the center of each cell. To treat these data, the first step is to identify the cells that lie in the region of interest through their latitude and longitude coordinates. Thus, the MERGE data is cropped based on the minimum and maximum coordinates of the basin. Then, the cells that are outside the contour of the basin are identified. These cells are not of interest to the study and, therefore, their values are set to 0 so that the DNN ignores these points in the input matrix. Figure 4 shows the precipitation input of the model, forming a rectangular arrangement in three dimensions.
In the methodology validation stage, observed precipitation data is also used to calculate the flow 7 days ahead, which is then compared to the flow observed in the basin in the same period. However, the application of the method as a forecasting tool must be based on forecast data. Using a network that processes data in the format provided by MERGE represents an advantage over the application of telemetric station data, since the forecast data is also provided in the same format.
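The crop-and-mask treatment described above can be sketched for a single day as follows. This is only an illustration in plain Python: the function names, the representation of the basin contour as a list of (longitude, latitude) vertices and the ray-casting test are assumptions, and the toy grid does not correspond to real MERGE data.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def crop_and_mask(rain, lons, lats, polygon):
    """Crop a daily rain grid to the basin bounding box and zero outside cells.

    rain[i][j] is the rain at the cell centred on (lons[j], lats[i]).
    """
    min_lon = min(p[0] for p in polygon)
    max_lon = max(p[0] for p in polygon)
    min_lat = min(p[1] for p in polygon)
    max_lat = max(p[1] for p in polygon)
    cols = [j for j, lon in enumerate(lons) if min_lon <= lon <= max_lon]
    rows = [i for i, lat in enumerate(lats) if min_lat <= lat <= max_lat]
    return [[rain[i][j] if point_in_polygon(lons[j], lats[i], polygon) else 0.0
             for j in cols]
            for i in rows]


# Toy usage: a 4 x 4 grid and a triangular "basin" (hypothetical coordinates).
lons = [0.5, 1.5, 2.5, 3.5]
lats = [0.5, 1.5, 2.5, 3.5]
rain = [[float(10 * i + j) for j in range(4)] for i in range(4)]
basin = [(1.0, 1.0), (3.2, 1.0), (1.0, 3.2)]
masked = crop_and_mask(rain, lons, lats, basin)
```

Applying the same function to each day from t − 1 to t + 7 and stacking the results would give the three-dimensional arrangement described in the text.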

3) DNN ARCHITECTURE
As previously described, after the treatment of precipitation data through geographic coordinates, a set of matrix data is generated and will be applied as input data for the developed architecture.
Due to the large size of the precipitation data, three 3D convolutional layers (Conv3D) were used interspersed with pooling layers. In this step, the objective is to gradually reduce the size of the data and filter the relevant information. In addition, the spatial and temporal disposition of each grid is also considered in this processing, as there are clusters of data located nearby.
FIGURE 5. Scheme of the developed architecture. Input data: rainfall data and soil reservoir states. The architecture has 3 Conv3D + Pooling layers and a TimeDistributed Flatten layer for processing precipitation data. The LSTM layer processes the data so that a DNN can forecast the runoff data 7 days ahead.
A TimeDistributed Flatten layer is applied to the output of the last Conv3D, in order to temporally separate the results obtained from the previous layers. It is important to highlight that this processing on the precipitation data is equivalent to the procedure applied by the SMAP model described in Equations 1 to 3. That is, the layers described process the temporal and spatial differences in the model's response.
The output from the TimeDistributed Flatten layer is applied as input data to an LSTM layer for temporal processing. In addition, the reservoir states at the time t of execution are also input data for this layer. LSTM layers are regarded as state of the art for time series forecasting and are widely described and applied in other works [4], [5], [13].
The processing of data by the LSTM layer obtains results that will be applied as input to a DNN layer, the results of which are the expected flows 7 days ahead. Figure 5 illustrates the architecture described.
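To make the gradual size reduction concrete, the sketch below traces tensor shapes through the layer sequence described above without using any deep learning library. All numbers are hypothetical: the 32 × 40 grid, the filter counts, the pooling size (1, 2, 2) that preserves the time axis, the number of LSTM units, and the choice of appending the four reservoir states and the observed flow to every time step are assumptions made only for illustration. The values actually used for each basin are the ones reported by the authors.

```python
def conv3d_same(shape, filters):
    """Conv3D with 'same' padding and stride 1: only the channel count changes."""
    t, h, w, _ = shape
    return (t, h, w, filters)


def max_pool3d(shape, pool):
    """Pooling divides each of the three leading dimensions by its pool size."""
    t, h, w, c = shape
    return (t // pool[0], h // pool[1], w // pool[2], c)


def trace_shapes(days=9, rows=32, cols=40, filters=(8, 16, 32),
                 pool=(1, 2, 2), n_states=4, lstm_units=64, horizon=7):
    """Return the shape after each stage as a list of (name, shape) pairs."""
    shape = (days, rows, cols, 1)            # t-1 ... t+7 gives 9 daily grids
    trace = [("input", shape)]
    for k, f in enumerate(filters, start=1):
        shape = conv3d_same(shape, f)
        trace.append((f"conv3d_{k}", shape))
        shape = max_pool3d(shape, pool)
        trace.append((f"pool_{k}", shape))
    t, h, w, c = shape
    features = h * w * c
    trace.append(("time_distributed_flatten", (t, features)))
    # Assumption: reservoir states and observed flow are repeated at each step.
    trace.append(("lstm_input", (t, features + n_states + 1)))
    trace.append(("lstm_output", (lstm_units,)))
    trace.append(("dense_output", (horizon,)))
    return trace


shapes = dict(trace_shapes())
```

With these illustrative settings, each day's 32 × 40 grid is reduced to a 4 × 5 map with 32 channels before flattening, so the LSTM receives a short sequence of compact feature vectors rather than the raw grids.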
The ReLU activation function was applied in all layers due to its simplicity and speed in training. The parameters used for each layer were defined based on empirical and conceptual studies and depend on the size of the basin. Thus, the neuron and filter data will be presented in Section III.

4) TRAINING
Cross-validation with the Time Series Split method [45] was used for training and validation of the developed architecture. The training, validation and test sets are divided in chronological order. If the sets were divided randomly, the integrative characteristics and the time dependence of the system could reduce the tool's forecasting potential.
In the Time Series Split method, the training step starts with a portion of the data. The data is added with more recent samples at each run and the neural network is retrained. At the end of the process, the average performance of the test sets for all iterations is obtained.
From the data history, the network is always validated with the last 14 samples, whereas the rest of the history is used for training. The number of samples in the validation was defined empirically and with the purpose of reducing the distance between the moment of the forecast and the training set.
At the end of each training and validation, a test is applied at the instant t in order to predict flows up to 7 days ahead. In this study, the simulations presented aim at a comparison between the flow predicted by the developed architecture and that observed for the same period. Thus, the execution of the tool will be carried out in such a way as to foresee intervals of 1 week, until data for comparison are obtained over the entire specified period.
The size of the training history is increased by 7 samples at each run, whereas the validation and test intervals are shifted. Figure 6 illustrates the behavior of the training, validation and test intervals throughout the runs.
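The splitting scheme described in this subsection (growing training set, last 14 samples for validation, 7-day test window) can be sketched as index ranges. The function name and its arguments are hypothetical, and the sample counts in the usage example are arbitrary.

```python
def expanding_window_splits(n_samples, first_forecast, n_val=14, horizon=7):
    """Chronological train/validation/test index ranges for each run.

    first_forecast : index of the first day to be forecast
    n_val          : validation samples immediately before the forecast
    horizon        : days forecast per run; also the growth of the history
    """
    splits = []
    t = first_forecast
    while t + horizon <= n_samples:
        train = range(0, t - n_val)        # everything before the validation set
        val = range(t - n_val, t)          # last 14 samples of the history
        test = range(t, t + horizon)       # the 7 days to be forecast
        splits.append((train, val, test))
        t += horizon                       # history grows by 7 samples per run
    return splits


# Toy usage with 100 daily samples and the first forecast at day 50.
splits = expanding_window_splits(100, 50)
first_train, first_val, first_test = splits[0]
```

Because every range ends before the next one begins, no future sample leaks into training or validation, which is the reason given in the text for avoiding a random split.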
The Adaptive Moment Estimation (Adam) method was the optimizer used for training the architecture. The metric used for validation was also the Mean Absolute Error (MAE), as in the SMAP calibration.
In each iteration, the DNN obtained in the previous iteration is retrained with a patience of 10 epochs, that is, the process is interrupted after 10 epochs without improvements on the validation set.
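The patience rule can be illustrated independently of any training library. In the sketch below, the function name is hypothetical and the list of validation losses stands in for the MAE measured on the validation set after each epoch; only the patience of 10 epochs comes from the text.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where `patience` consecutive epochs
    have passed without improving on the best validation loss so far.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never exhausted: training ran through all epochs.
    return len(val_losses) - 1, best_epoch


# Toy usage: the loss improves for three epochs and then stagnates.
stop_epoch, best_epoch = early_stopping_epoch([5.0, 4.0, 3.0] + [3.5] * 20)
```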

C. MODELS VALIDATION
The comparison between the results obtained by the proposed architecture and the flows observed in the region will be made using statistical metrics. The metrics considered are as follows: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Nash-Sutcliffe coefficient of efficiency (NSE), Root Mean Square Error (RMSE) and Pearson's correlation coefficient (r).
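The NSE coefficient is the most used to evaluate the performance of hydrological models. The r coefficient indicates the correlation between the observed and predicted flow data, the MAE represents the average of the absolute error, whereas the MAPE represents the average of the absolute error relative to the observed flow, in percentage. Finally, the RMSE represents a root mean square error, which weighs the errors so as to emphasize the larger ones. The statistical metrics described are calculated as shown in the following expressions:
\[ \mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{Q_{obs,i} - Q_i}{Q_{obs,i}} \right| \]
\[ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Q_{obs,i} - Q_i \right| \]
\[ \mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} \left( Q_{obs,i} - Q_i \right)^2}{\sum_{i=1}^{N} \left( Q_{obs,i} - \bar{Q}_{obs} \right)^2} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Q_{obs,i} - Q_i \right)^2} \]
\[ r = \frac{\sum_{i=1}^{N} \left( Q_i - \bar{Q} \right) \left( Q_{obs,i} - \bar{Q}_{obs} \right)}{\sqrt{\sum_{i=1}^{N} \left( Q_i - \bar{Q} \right)^2 \sum_{i=1}^{N} \left( Q_{obs,i} - \bar{Q}_{obs} \right)^2}} \]
where Q represents the flow forecast, Q_{obs} the observed flow, \bar{Q} and \bar{Q}_{obs} the respective averages and N the amount of data. The metrics considered have different limits and optimal values, as shown in Table 1.
For the calibration of the SMAP model and the training of the architecture, the MAE was applied, as this metric is not as strongly influenced by drought periods as the MAPE nor by flood periods as the RMSE. However, all metrics are used to compare the performance of the forecast and calibration.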
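The five metrics described in this subsection can be computed with a few lines of plain Python. The sketch below uses the standard definitions of each metric; the function name and the toy flow values are illustrative and are not results from the case studies.

```python
import math


def forecast_metrics(q, q_obs):
    """MAPE (%), MAE, NSE, RMSE and Pearson's r for forecast vs. observed flows."""
    n = len(q)
    mean_q = sum(q) / n
    mean_obs = sum(q_obs) / n
    errors = [o - f for f, o in zip(q, q_obs)]
    sq_err = sum(e * e for e in errors)

    mape = 100.0 / n * sum(abs(e / o) for e, o in zip(errors, q_obs))
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sq_err / n)
    # NSE compares the squared error with the variance of the observations.
    nse = 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in q_obs)
    cov = sum((f - mean_q) * (o - mean_obs) for f, o in zip(q, q_obs))
    r = cov / math.sqrt(sum((f - mean_q) ** 2 for f in q)
                        * sum((o - mean_obs) ** 2 for o in q_obs))
    return {"MAPE": mape, "MAE": mae, "NSE": nse, "RMSE": rmse, "r": r}


# Toy usage with three hypothetical daily flows (same unit for both series).
metrics = forecast_metrics(q=[110.0, 190.0, 310.0], q_obs=[100.0, 200.0, 300.0])
```

Note that the MAPE divides by the observed flow, so it is undefined when an observed value is zero and grows quickly when observed flows are small.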

IV. RESULTS AND DISCUSSIONS
A. CASE STUDIES DESCRIPTION
To validate the developed architecture, two Brazilian hydrographic basins in which HPPs are installed were considered: HPP Peixe Angical and HPP Mascarenhas. The objective is to apply the methodology to predict the daily unimpaired inflow to their reservoirs.
HPP Mascarenhas is located in the state of Espirito Santo, in a region with a tropical coastal climate. It is part of the Rio Doce basin, and its drainage area is approximately 75,500 km². It began operating in 1974 and has an installed capacity of 198 MW.
HPP Peixe Angical is located in the state of Tocantins, in a predominantly semi-arid region. It is part of the Tocantins-Araguaia basin, with a drainage area of approximately 121,500 km². It began operating in 2002 and has an installed capacity of 498.8 MW.
Figure 7 shows the drainage areas of each basin and their respective locations. The inflow of each HPP is obtained from the National Water Agency (ANA, Agência Nacional de Águas) [49].
Each grid cell of precipitation data provided by the MERGE product is a square with sides of approximately 10 km. Thus, a large amount of rainfall data is identified within each basin, due to the territorial extent of the drainage area: 1,116 daily precipitation values for the region of HPP Mascarenhas and 2,420 for HPP Peixe Angical.
The large amount of daily data for each region justifies the application of Conv3D and Pooling layers for treatment and processing. The definition of the characteristics of each layer depends on the data from each basin. Thus, the architectural parameters for each basin were obtained empirically and are shown in Table 2.

B. SMAP CALIBRATION
The first step in the application of the developed architecture is the calibration of the SMAP model considering a long history of flows, precipitations and evapotranspiration. The objective of this stage is the definition of parameters that model the watershed by comparing the observed and the calculated water flow.
The period from 09/2012 to 11/2016 was considered for this stage. Table 3 shows the minimum and maximum limits of each parameter considered in the calibration and the values obtained in this step for each hydrographic basin. Figure 8 shows the behavior of the calibrated and observed flow over time and the data dispersion graphs for each watershed. The statistical metrics of the data for each region are presented in Table 4.
The results demonstrate that the SMAP model is a tool with excellent capacity to model watersheds. The next step is to adjust the parameters so that the representation of the basin in a shorter period is the closest to that observed. Thus, states of reservoirs that are less susceptible to errors are obtained and can be applied as input data for the proposed architecture. The next subsection presents the results obtained for the water flow forecasting step.

C. SMAP-DNN RESULTS
Initially, the entire history is calibrated by the Twiddle algorithm to obtain the reservoir states at each time point t. These data are used as the initial state of the reservoirs in the forecast period by the architecture.
For the cross-validation stage, the initial history from 09/2012 to 11/2016 was considered. The forecast was performed at intervals of 7 days. Thus, in the first run, the forecast is made from time t to time t + 7. In the next run, the forecast is made for time t + 8 to t + 15, and so on until 05/2020. Therefore, with each new execution, the training set increases by 7 samples and the validation and test sets are shifted.
For the validation of the proposed model, a long history is selected for the tool to perform the water flow forecast. This rainfall history, provided by MERGE, is an approximation of what happened in reality and is similar to forecast data in both the format in which it is made available and its accuracy. Hence, the MERGE data are used in this validation stage and play the role of the predicted precipitation in this period, since the validation must be done in a period in which observed flows are available for comparison with the DNN outputs.
To validate the methodology, the expected flows will be compared with the observed flow for the same period. In addition, the proposed architecture (SMAP-DNN) will be compared with two other forecasting strategies:
• the SMAP model, after local calibrations, will forecast based on future precipitation and evapotranspiration data;
• another DNN model, with the same Conv3D-LSTM architecture, will make the forecast without applying the reservoir states as input data.
For the forecast, the SMAP model is fed with future data on rain and evapotranspiration. These data were not used for the calibration of the SMAP model or for training the network.
Due to the random nature of the training, the strategies involving DL techniques were run 20 times each. The Twiddle algorithm is deterministic and, therefore, the SMAP model was calibrated only once at each forecast interval. The distribution of the results obtained by each technique is shown in Figure 9.
The results obtained demonstrate that the DNN has a better forecasting capacity than the SMAP model, when the two techniques are performed in an uncoupled manner. In addition, the graphs show that using the SMAP state variables as input data for the DNN improves the forecasting capacity, since these variables complement the flow observed with the dynamic characteristics of the basin at the time of the forecast.
The results of HPP Peixe Angical show that the application of the SMAP-DNN strategy reduces the mean and median [...]
This difference between the results of the two regions can be explained by the following aspects:
• the SMAP model can better represent the region of HPP Mascarenhas, due to the intrinsic characteristics of the hydrographic basin;
• climate data for a region can be more accurate;
• characteristics of a watershed can make it more difficult to obtain optimal points through the calibration process;
• the hydrographic basin may have presented a different behavior throughout the forecast period than that observed in the calibration of the SMAP in the first step.
Despite the difference between the results of the two basins, both demonstrate that the proposed architecture is effective for short-term flow forecasting. The flow data obtained by the proposed architecture are presented in Figures 10 and 11 for HPP Mascarenhas and HPP Peixe Angical, respectively. The black dotted line represents the observed flow and the colored lines represent the 7-day intervals that were generated in this forecasting step. The results show high accuracy in periods of drought and a good forecast of the trend in periods of flood. This behavior can also be observed in the scatter plots of Figure 12.
Statistical metrics were also applied to assess the best forecast obtained for each region. The behavior of this forecast was analyzed according to the number of days ahead, as shown in Table III-C. The results show that the correlation of the HPP Mascarenhas data is stronger than that of the HPP Peixe Angical data. In addition, all metrics show that the Mascarenhas forecast performed better. However, the results obtained for both basins demonstrate that the developed architecture is an excellent tool for forecasting inflows. Table III-C also shows that the forecasts for the first day have a very low error and a high correlation between the observed and calculated flow data. As expected, performance tends to decrease as the lead time increases. Even so, the forecasts for the last days also show a strong correlation between the data.
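The analysis by number of days ahead amounts to grouping the errors of all weekly forecasts by lead day before averaging. A minimal sketch for the MAE is given below; the function name, the data layout (each forecast stored with the index of its first day) and the toy values are assumptions for illustration only.

```python
def mae_by_lead_day(forecasts, observed):
    """Mean absolute error for each lead day across all forecast runs.

    forecasts : list of (start_index, values), where values[k] is the flow
                forecast for day start_index + k
    observed  : list of observed flows indexed by day
    """
    horizon = len(forecasts[0][1])
    sums = [0.0] * horizon
    counts = [0] * horizon
    for start, values in forecasts:
        for lead, q in enumerate(values):
            sums[lead] += abs(q - observed[start + lead])
            counts[lead] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy usage: two 7-day forecasts whose error grows by 0.5 per lead day.
observed = [10.0] * 14
forecasts = [(0, [10.0 + 0.5 * k for k in range(7)]),
             (7, [10.0 - 0.5 * k for k in range(7)])]
lead_mae = mae_by_lead_day(forecasts, observed)
```

The same grouping can be reused for the other metrics by collecting the forecast and observed values of each lead day into separate series.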

V. SUMMARY AND CONCLUSION
Forecasting the availability of water resources is important for several sectors, including hydroelectricity. A good estimate of future flows allows planning the operation of plants and of the electrical system. This planning includes the forecast of generation and spillage in the short and long term. This work proposed an innovative tool to improve the accuracy of water flow forecasts 7 days ahead in a watershed.
The methodology coupled the SMAP rainfall-runoff model and a Conv3D-LSTM Deep Learning architecture. The Twiddle algorithm was applied for a smooth local adjustment of the SMAP model. Modifications were made to the algorithm to guarantee a gradual process that respects the restrictions. Thus, the reservoir data was made available for architecture training in a more accurate way.
The 3D convolutional layers (Conv3D) showed great potential for obtaining spatio-temporal features of the rain. The developed tool was tested for the reservoirs of two large plants in Brazilian basins with different characteristics. The results showed that coupling the two models yielded better accuracy than applying the methods separately. The proposed architecture was built with the objective of defining only one scenario for each execution, and this is the focus of this study. Due to the random nature of the DNN training process, stochastic analyses can be developed, such as the use of quantile regression in the training stage.
The comparison of the water flows observed and predicted by the model demonstrated the existence of a strong correlation, validating the forecast stage. In addition, statistical analyses were applied to the forecasts for each day of the forecast week. The results showed that the forecast accuracy is greater at the beginning of the forecast. However, the last days of the forecast week presented results that are acceptable for planning and in accordance with the ideal limits of the statistical metrics considered.
For the simulations presented, rainfall data observed in a grid format provided by the MERGE/CPTEC product were used. For the application of the network as a forecasting tool, meteorological forecast data is required. The application of MERGE data represents an advantage from this point of view, since the forecast data has this same format.

GUILHERME M. MACIEL received the degree in electrical engineering with emphasis on robotics and automation and the master's degree in electrical engineering in the field of electrical energy systems from the Federal University of Juiz de Fora, in 2016 and 2018, respectively, and the degree in mathematics from the Claretiano University Center, in 2020. He is currently pursuing the Ph.D. degree. He has worked as an Engineering Intern with ArcelorMittal Brazil, from 2015 to 2016. He is currently a Professor of robotics with the Military College of Juiz de Fora, and a Researcher with INESC P&D Brazil. He has also been a collaborating member of the Local Robot Competition Team called Rinobot, since 2018. His research interests include optimization, machine learning, assistive robotics, and educational robotics.
VINICIUS ALBUQUERQUE CABRAL received the B.Sc. and M.Sc. degrees in electrical engineering from the Federal University of Juiz de Fora (UFJF), Juiz de Fora, Brazil, in 2017 and 2020, respectively, where he is currently pursuing the degree with the Postgraduation Program of Electrical Engineering (PPEE). He is currently a member of the Heuristic and Bioinspiration Optimization Group (GOHB), UFJF. His research interests include optimization, evolutionary algorithms, and power system analysis.
ANDRÉ LUÍS MARQUES MARCATO (Senior Member, IEEE) received the master's and Ph.D. degrees in electrical engineering from the Pontifical Catholic University of Rio de Janeiro. He is currently an Electrical Engineer with the Universidade Federal de Juiz de Fora (UFJF), Brazil, where he is also a Full Professor and the Head of the Electrical Energy Department. He has visited the Imperial College London and the Faculdade Engenharia da Universidade do Porto, as a Postdoctoral Student, in 2012. His research interest includes optimization techniques applied to hydrothermal coordination, operation, and expansion planning electrical systems. He coordinates or coordinated research projects with many companies in the Brazilian Electrical Sector, for example, Petrobras, CEPEL, Light, CESP, Cemig, and Duke Energy. He is currently an Associate Professor with UFJF. His current research interests include evolutionary algorithms, probabilistic methods, optimal power flow, robotics, autonomous vehicles, fuzzy logic, pattern recognition, and optimization. VOLUME 8, 2020