Correlated Time-Series in Multi-Day-Ahead Streamflow Forecasting Using Convolutional Networks

Information about future streamflow is important for hydropower production planning, especially for damless hydropower plants. River flow reflects various hydrological, hydrogeological, and meteorological factors, which makes direct modeling difficult and favors the use of data-driven methods. In this paper, we propose the use of one-dimensional convolutional neural networks (1d-CNN) for multi-day-ahead river flow forecasting and present a multi-input model using correlated input time series. The proposed model was applied at the Madeira River, the Amazon's largest and most important tributary, near the Santo Antônio damless hydropower plant. We compared the proposed correlated-input 1d-CNN to a single-input 1d-CNN model and several baseline models. The 1d-CNN models outperformed all baselines, and the correlated-input forecasting model is five times smaller than its single-input equivalent while also improving accuracy.


I. INTRODUCTION
Streamflow forecasting is an important activity that can help in hydro-power scheduling, reservoir planning, and flood management. Therefore, streamflow forecasting plays an important role in the environmental and economic sciences. In what concerns the Brazilian electric system, river flow forecasting is extremely relevant, considering 66.6% of domestic electrical energy was supplied by water sources in 2018 [1]. Future streamflow data could be used to model the amount of energy that would be possible to generate from water sources and use this model to optimize the use of resources, especially in systems that use damless hydroelectric plants. Indeed, the runoff is the most important input for the hydropower system and one of the most decisive parameters in hydropower production planning and decision making [2], [3].
However, the river flow forecast depends on a wide range of variables such as precipitation, evapotranspiration, ambient temperature, soil characteristics of the catchment, and urbanization [4]. Moreover, modeling river dynamics based on these variables requires a large number of hydrological, hydrogeological, and meteorological parameters [5]. In this context, the use of data-driven techniques has been growing for river flow forecasting [6]–[13].
Traditional data-driven techniques are based on statistical models that use past time-series data to fit a function and forecast its future values. Examples of these techniques are the linear autoregressive (AR), nonlinear autoregressive (NAR), autoregressive moving average (ARMA), and nonlinear autoregressive moving average (NARMA) models; they have been successfully used in river flow forecasting [6]–[8].
In recent decades, machine learning has been extensively employed in data-driven forecasting models for river flow [10]–[19] and other environmental parameters [20]–[24], substituting the parametric function fitting of traditional data-driven techniques with a machine learning model. Among the vast number of machine learning algorithms, the artificial neural network (ANN) has been widely used and proved to be equivalent to well-accepted statistical models such as ARMA, NAR, and NARMA [25]–[27]. Although ANNs are equivalent to these statistical models, ANN models offer the designer a flexible implementation: model complexity can be varied by changing the hidden layer size or the neurons' transfer function, and the number of input/output parameters can also be easily changed. Moreover, ANN algorithms have shown better performance for natural resource forecasting than statistical models [28], [29]. Hence, due to these advantages, ANN-based forecasting algorithms have been extensively used in the past years [30]–[36].
These data-driven models gained more power, flexibility, and ease of implementation as the state-of-the-art machine learning algorithms have moved towards deep learning, a type of machine learning algorithm that uses several layers to analyze data. This multilayer characteristic of deep learning structures allows multiple layers of data abstraction and feature extraction [37], [38]. Therefore, the task of feature extraction that once was performed by the machine learning scientist could now be incorporated into the machine learning model.
Although the rise of deep learning was mainly due to its two-dimensional feature extraction abilities, it has recently been applied to one-dimensional data such as time series in different fields, including sensor-based human activity recognition [39], [40], trading tendency identification [41], [42], and medicine [43].
In this work, we present a unidimensional convolutional neural network (1d-CNN) river flow forecasting model using correlated-input time series. We investigated the use of turbidity and river flow as the input of a 1d-CNN to forecast the river discharge multiple days ahead. To the best of our knowledge, the use of turbidity and discharge as the input of a 1d-CNN for river flow forecasting has never been investigated before.
The proposed correlated-input CNN was applied at the Madeira River at the Santo Antônio power plant, an important Amazon basin damless hydropower plant. Furthermore, we compared the proposed correlated-input CNN model to a single-input CNN model and baseline models: linear model between the correlated time series, seasonal persistent model, and multilayer perceptron (MLP).

A. DATA DESCRIPTION AND ANALYSIS
The study presented in this paper was conducted using discharge and water turbidity data collected at the Madeira River from 2011 to 2018 at the Santo Antônio hydroelectric power plant. A section of the northern region of Brazil can be seen in Fig. 1; the main rivers of this region are shown in blue, the Madeira River is highlighted in red, and the data collection site is marked by a cross near the city of Porto Velho. Both time series had a 24-hour sampling rate and can be seen in Fig. 2, where t[n] represents the river turbidity time series and d[n] the river discharge time series. Note that the power plant staff collects this data daily at the same site; therefore, we can ensure that both time series are synchronized in both location and time. Moreover, the model implementation does not require additional data acquisition, only the implementation of the processing algorithm.
This data was sequentially separated into three time-dependent subsets: train, validation, and test. The first 45% of the data points were assigned to the train set, the next 25% to the validation set, and the remaining 30% (the most recent data) to the test set. We used the first two subsets to train and select the forecasting models and the last subset to evaluate and compare them. This sequential separation ensures that all data used to train the forecasting models is older than the data used to evaluate them, avoiding biased evaluation due to time dependence.
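This chronological split can be sketched as follows (a minimal illustration; the function name and toy series are ours, not from the paper):

```python
import numpy as np

def sequential_split(series, train_frac=0.45, val_frac=0.25):
    """Split a time series into time-ordered train/validation/test subsets,
    so every training sample predates every evaluation sample."""
    n = len(series)
    i_train = int(n * train_frac)
    i_val = i_train + int(n * val_frac)
    return series[:i_train], series[i_train:i_val], series[i_val:]

# Toy example with 100 daily samples:
d = np.arange(100)
train, val, test = sequential_split(d)
print(len(train), len(val), len(test))  # 45 25 30
```

Because the subsets are contiguous in time, shuffling is deliberately avoided; a random split would leak future information into training.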
Before model selection, we analyzed the data to choose the number of steps for the model input and output. First, we obtained the frequency content of the time series to measure their periodicity; to that end, we calculated the discrete Fourier transform (DFT) of the turbidity and discharge time series and measured their fundamental frequency. It is known that during hydrological events, river flow and sediment concentration are related, such that discharge peaks can occur synchronously with sediment concentration peaks, or before or after the corresponding discharge peak [44]. By visual inspection of Fig. 2, we estimated that both time series have the same fundamental frequency, but with turbidity peaks occurring before the discharge peaks.
Moreover, we searched for the mean lag of the discharge time series with respect to the turbidity time series. We lagged the turbidity time series t[n] by k steps and calculated the correlation between t[n − k] and the discharge time series d[n], i.e., corr(t[n − k], d[n]); this process was repeated for several values of k. The maximum value of corr(t[n − k], d[n]) occurs when k equals the number of days by which the turbidity time series leads the discharge time series; we denote this value by k_lag.
Given the periodic characteristic of both t[n] and d[n], we expected corr(t[n − k], d[n]) to increase to a peak and then decrease, eventually falling below the k = 0 correlation value. We used this crossing point to determine the forecasting model input size. This choice was made to take advantage of the time steps in which the signals were most correlated. The number of steps ahead to forecast the river flow was chosen as k_lag, to take advantage of the high correlation between today's turbidity and the river discharge k_lag days ahead.
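The lag search described above can be sketched as follows (our helper name; a synthetic delayed series stands in for the Madeira data):

```python
import numpy as np

def lagged_correlation(t, d, max_lag):
    """corr(t[n-k], d[n]) for k = 0..max_lag: shift the turbidity back by k
    steps and correlate the overlapping samples with the discharge."""
    corrs = []
    for k in range(max_lag + 1):
        # t[n-k] aligned with d[n]: drop the first k discharge samples
        corrs.append(np.corrcoef(t[:len(t) - k], d[k:])[0, 1])
    return np.array(corrs)

# Synthetic example: d is t delayed by 10 steps
rng = np.random.default_rng(0)
t = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)
d = np.roll(t, 10)
corrs = lagged_correlation(t, d, 40)
print(int(np.argmax(corrs)))  # 10, the true lag
```

The index of the correlation peak plays the role of k_lag, and the point where the curve falls back to its k = 0 value plays the role of k_max.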

B. BASELINE MODELS
To evaluate and compare the 1d-CNN forecasting model, we propose two simple baseline models: the seasonal persistent model (sp), which exploits the periodic characteristic of the discharge, and the turbidity-to-discharge linear model (t2d), which exploits the correlation between the studied time series. We also used a multilayer perceptron (MLP) as a machine learning baseline model.

1) SEASONAL PERSISTENT MODEL
The first baseline model is based on the naive repetition of the last season's value, i.e., we take advantage of the periodic character of the river discharge and assume that the discharge at a given time will be equal to the equivalent discharge one season earlier. Let T_d be the river discharge period and y_sp[n] the k_lag-days-ahead river discharge estimated by the sp model; y_sp[n] is then given by:

y_sp[n] = d[n + k_lag − T_d]
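Assuming the sp rule repeats the value observed one period T_d earlier (i.e., the estimate of d[n + k_lag] is d[n + k_lag − T_d]), a minimal sketch is:

```python
import numpy as np

def seasonal_persistent(d, n, k_lag=38, period=362):
    """Naive seasonal baseline: forecast the discharge k_lag days ahead of
    day n as the value observed one period earlier."""
    return d[n + k_lag - period]

# Toy check on a perfectly periodic series: the forecast is exact.
period = 362
d = np.sin(2 * np.pi * np.arange(3 * period) / period)
n = 800
print(np.isclose(seasonal_persistent(d, n, period=period), d[n + 38]))  # True
```

The default k_lag and period values above are the ones reported for the Madeira data later in the paper; for any other river they would have to be re-estimated.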

2) TURBIDITY TO DISCHARGE LINEAR MODEL
The next baseline model uses the correlation between t[n] and d[n + k_lag] to estimate d[n + k_lag]. We used the train and validation data to fit a first-order polynomial relating the turbidity to the future discharge; the turbidity-to-discharge linear model is therefore given by:

y_t2d[n] = a · t[n] + b

where y_t2d is the k_lag-days-ahead river discharge estimated by the t2d model, t[n] is today's known turbidity, and a and b are, respectively, the angular and linear coefficients obtained by the first-order polynomial fit on the training and validation data.
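A minimal sketch of the t2d fit, using `np.polyfit` for the first-order polynomial (the helper name and synthetic data are ours):

```python
import numpy as np

def fit_t2d(t, d, k_lag=38):
    """Fit d[n + k_lag] ~ a * t[n] + b on historical data
    (first-order polynomial fit, as in the t2d baseline)."""
    a, b = np.polyfit(t[:-k_lag], d[k_lag:], deg=1)
    return a, b

# Synthetic check: discharge is an exact affine function of lagged turbidity
rng = np.random.default_rng(1)
t = rng.uniform(0, 1, 400)
d = np.empty(400)
d[38:] = 3.0 * t[:-38] + 5.0   # d[n] = 3 * t[n - 38] + 5
d[:38] = 5.0
a, b = fit_t2d(t, d)
print(round(a, 3), round(b, 3))  # 3.0 5.0
```

At prediction time, the model simply evaluates a · t[n] + b on today's turbidity.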

3) MLP FORECASTING MODEL
The most complex baseline model we used was the MLP-based forecasting model: a one-hidden-layer multilayer perceptron that receives the last k_max discharge values and estimates the river discharge k_lag days ahead. We chose the MLP as the machine learning baseline because neural networks have been widely used for forecasting river parameters, and the MLP is an important neural network structure due to its simplicity and universal approximation capability [45]. We trained several MLP topologies using the RMSprop training algorithm with a 0.001 learning rate and early stopping with a 10-epoch patience. We searched for the hidden layer size by testing this parameter from 10 to 90 neurons, in steps of 5 neurons. Both input and output data were normalized to the [0, 1] interval and rescaled to perform the model evaluation. The best configuration, and our MLP baseline model, was a 70-neuron MLP with a sigmoid activation function, trained until epoch 28.
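Before any such model can be trained, the series must be turned into supervised pairs: k_max past values as input and the value k_lag days after the end of the window as target. A minimal sketch (our helper name; the default window sizes are the values chosen for the Madeira data later in the paper):

```python
import numpy as np

def make_windows(series, k_max=75, k_lag=38):
    """Build supervised pairs: X[i] holds k_max consecutive past values and
    y[i] is the value k_lag days after the last day of that window."""
    X, y = [], []
    for i in range(len(series) - k_max - k_lag + 1):
        X.append(series[i:i + k_max])
        y.append(series[i + k_max + k_lag - 1])
    return np.array(X), np.array(y)

d = np.arange(200.0)
X, y = make_windows(d)
print(X.shape, y.shape)  # (88, 75) (88,)
print(X[0, -1], y[0])    # 74.0 112.0  (target is 38 days after day 74)
```

The same windowing feeds both the MLP baseline and the single-input CNN described later.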

C. CNNs IN TIME SERIES PROCESSING
Backpropagation convolutional neural networks were first introduced to solve the automatic handwritten digit identification problem [46]. While the traditional fully connected layer used by the MLP learns patterns scattered over its coordinates, convolutional layers were designed to find local patterns, such as objects in a scene, regardless of their position [37], [38]. Another advantage of the convolutional layer architecture is that the filter coefficients are shared across all input positions, while dense layers (another name for fully connected layers) have connections between every pair of neurons in adjacent layers.
These same characteristics can be extended to 1d data, such as time series: a pattern learned at a given time can be recognized in another time frame, regardless of when the pattern happened [47]. This approach to time series forecasting has been explored for air pollution and electric load estimation [47]–[49], solar power generation forecasting [50], rainfall [51], and multi-location streamflow forecasting using stacked time series [52].
The scheme of a 1d convolutional input layer used for time series feature extraction can be seen in Fig. 3; in this illustration, a convolutional layer with two kernels of size two is shown. In this work, we used a single convolutional layer followed by two dense layers. The first dense layer receives the flattened feature maps and the last one outputs the multi-day ahead discharge estimation. Therefore, the last dense layer has one neuron and the hidden dense layer a configurable hidden layer size. We used the ReLU (rectified linear unit) activation function in both the convolutional and hidden layers.
At the CNN input, part of the historical data is windowed and used as the convolutional layer input. This layer convolves the input time series with M convolutional kernels. After this convolution, a bias b_j is added and the result is fed into a ReLU function, which outputs:

v^j_{x,i} = max(0, ∑_{p=1}^{P} w^p_{ji} s_{x+p−1,i} + b_j)

where v^j_{x,i} is the value of the j-th feature map at position x with respect to the i-th input time series, w^p_{ji} is the p-th value of the P-sized convolutional kernel connecting the i-th input series to the j-th feature map, and s_{x,i} is the i-th input sequence at position x. The v^j_{x,i} values are fed into a max-pooling layer of size Q, described by:

a^j_{x,i} = max_{q = 0, …, Q−1} v^j_{Qx+q,i}

The a^j_{x,i} values are then flattened into a vector of components α_k, where k indexes each component of the feature maps, which is fed into a fully connected (hidden) layer. The output of the i-th neuron of this layer is:

H_i = max(0, ∑_k w_{i,k} α_k + b_i)

where w_{i,k} is the synaptic weight connecting α_k to neuron H_i. Finally, the predicted value, in terms of H_i and the weights w^o_i connecting the hidden layer to the output layer, is:

ŷ = ∑_i w^o_i H_i + b^o
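The forward pass described above can be sketched in plain NumPy (a re-implementation under our notation, collapsing the per-series index into a standard multi-channel convolution; it is not the authors' code, and all weight shapes and values are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def cnn1d_forward(s, W, b, Q, Wh, bh, wo, bo):
    """Forward pass of the 1d-CNN described in the text (our notation).
    s: (L, C) windowed input series, W: (M, P, C) kernels, b: (M,) biases,
    Q: max-pooling size, Wh/bh: hidden dense layer, wo/bo: output neuron."""
    L, C = s.shape
    M, P, _ = W.shape
    feats = []
    for j in range(M):
        # Valid convolution of all C input channels with kernel j, plus ReLU
        conv = np.array([np.sum(W[j] * s[x:x + P]) for x in range(L - P + 1)])
        feats.append(relu(conv + b[j]))
    v = np.stack(feats)                                      # (M, L - P + 1)
    n_pool = v.shape[1] // Q
    a = v[:, :n_pool * Q].reshape(M, n_pool, Q).max(axis=2)  # max-pooling
    alpha = a.ravel()                                        # flattened maps
    H = relu(Wh @ alpha + bh)                                # hidden layer
    return wo @ H + bo                                       # scalar forecast

# Smoke test with random weights: a 75-step, 2-channel input window
rng = np.random.default_rng(0)
s = rng.standard_normal((75, 2))
W, b = 0.1 * rng.standard_normal((8, 4, 2)), np.zeros(8)
Wh, bh = 0.1 * rng.standard_normal((30, 8 * (72 // 2))), np.zeros(30)
wo, bo = 0.1 * rng.standard_normal(30), 0.0
print(np.isfinite(cnn1d_forward(s, W, b, 2, Wh, bh, wo, bo)))  # True
```

With a 75-step window, kernel size 4, and pooling size 2, each of the 8 kernels yields 72 convolution outputs and 36 pooled values, giving the 288-component flattened vector consumed by the hidden layer.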

D. PROPOSED CNN MODELS
We propose the use of correlated time series as the input of the 1d-CNN forecasting model described in section II.C. The use of correlated time series as input to a convolutional layer was previously studied in [53], where the authors showed that feeding correlated time series into a forecasting model with a convolutional input layer followed by a recurrent layer increased the forecasting performance. As discussed in section II.A, the river turbidity and discharge are highly correlated; therefore, we used these two parameters in the correlated-input 1d-CNN. Hence, two forecasting models based on a one-dimensional convolutional layer were designed and evaluated: the baseline CNN model, named CNN-Q, which uses only the last k_max days of river discharge to forecast the river discharge k_lag days ahead; and the correlated-input CNN model, named CNN-QT, which uses the last k_max days of both turbidity and discharge to forecast the river discharge k_lag days ahead.
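For CNN-QT, each input sample stacks the two correlated series into a (k_max, 2) window; a minimal sketch (helper name and toy data are ours):

```python
import numpy as np

def make_qt_windows(d, t, k_max=75, k_lag=38):
    """Stack discharge and turbidity into a (k_max, 2) window per sample,
    targeting the discharge k_lag days after the end of the window."""
    X, y = [], []
    for i in range(len(d) - k_max - k_lag + 1):
        X.append(np.stack([d[i:i + k_max], t[i:i + k_max]], axis=1))
        y.append(d[i + k_max + k_lag - 1])
    return np.array(X), np.array(y)

d = np.linspace(0.0, 1.0, 200)
t = np.linspace(1.0, 0.0, 200)
X, y = make_qt_windows(d, t)
print(X.shape, y.shape)  # (88, 75, 2) (88,)
```

The only structural difference from CNN-Q is this second input channel; the target definition is identical.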
The design procedure for both models was the same: we searched for the best hyper-parameter combination in a subset of the CNN hyper-parameter space: M ∈ {6, 8, 12} kernels of size P ∈ {2, 4, 8}, pooling window of size Q ∈ {1, 2, 4}, and hidden layer with N_h ∈ {10, 30, 50} neurons. We trained all 324 models five times using Adam [54]. The training minimized the forecasting mean squared error (MSE), and we selected the smallest model that had a good validation mean absolute percentage error (MAPE) over the five runs.
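The grid over this hyper-parameter subset can be enumerated as follows (the `train_and_validate` helper named in the comment is hypothetical; only the enumeration itself is shown):

```python
from itertools import product

# Hyper-parameter subset searched in the text: number of kernels M,
# kernel size P, pooling size Q, and hidden layer size N_h.
grid = list(product([6, 8, 12], [2, 4, 8], [1, 2, 4], [10, 30, 50]))
print(len(grid))  # 81 combinations for each input configuration

for M, P, Q, Nh in grid:
    for run in range(5):
        # train_and_validate(M, P, Q, Nh) would train one model with Adam
        # on the MSE loss and return the validation MAPE; model selection
        # keeps the smallest model with a good mean MAPE over the 5 runs.
        pass
```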

E. EVALUATION METRICS
All baseline and CNN-based models were evaluated on the test set data. The metrics used to evaluate and compare these models were the MAPE, the normalized root mean squared error (NRMSE), the coefficient of determination (R²), and the Nash–Sutcliffe model efficiency coefficient (NSE). We also compared the models visually by plotting the real time series alongside the forecasted time series and a scatter plot comparing the observed and estimated river discharge.
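These metrics can be sketched as follows (normalizing the RMSE by the observed range is one common convention; the paper does not state which normalization it uses):

```python
import numpy as np

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y - yhat) / y))

def nrmse(y, yhat):
    """Root mean squared error normalized by the observed range."""
    return np.sqrt(np.mean((y - yhat) ** 2)) / (y.max() - y.min())

def nse(y, yhat):
    """Nash-Sutcliffe efficiency: 1 is perfect; below 0 the model is
    worse than simply predicting the observed mean."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(mape(y, y), nse(y, y))  # 0.0 1.0
```

The NSE interpretation above is what makes the negative values reported later meaningful: a negative NSE means the series mean outperforms the model.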

A. TIME SERIES ANALYSIS
The frequency-domain characteristics of the turbidity and discharge time series are shown in Fig. 4 through the DFT amplitudes of the discharge (|D(f)|) and turbidity (|T(f)|) time series. Note that the river turbidity and discharge DFT amplitudes peak together at f_0 = 0.0028 day⁻¹. Therefore, the turbidity and discharge from 2011 to 2018 at the Madeira River have the same period, T ≈ 362 days. We also calculated the correlation between |D(f)| and |T(f)| as 0.96, indicating that the minor peaks also occur at the same frequencies, with highly correlated amplitudes; the main difference between the two time series is therefore the phase of each frequency component. The correlation between t[n − k] and d[n] for k ∈ {0, 1, …, 90} days, used to choose the input and output sizes of the forecasting models, can be seen in Fig. 5. The correlation increases to a maximum of 0.74 at k_lag = 38 days and decreases back to its initial value at k_max = 75 days. Therefore, our forecasting models were configured with an input window of 75 days to predict the river flow 38 days ahead. This input window selects the last 75 data points of the historical data as model input; see the example illustrated in Fig. 3.
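The fundamental-frequency measurement can be sketched as follows (our helper name; a toy yearly-periodic series stands in for the Madeira data):

```python
import numpy as np

def fundamental_frequency(x, dt_days=1.0):
    """Return the frequency (in 1/day) of the largest non-DC DFT peak."""
    X = np.abs(np.fft.rfft(x - np.mean(x)))   # remove DC, one-sided DFT
    freqs = np.fft.rfftfreq(len(x), d=dt_days)
    return freqs[np.argmax(X)]

# A 362-day-periodic toy series sampled daily for 8 years:
n = np.arange(8 * 362)
x = np.sin(2 * np.pi * n / 362.0)
f0 = fundamental_frequency(x)
print(round(1.0 / f0))  # 362, the period in days
```

Subtracting the mean before the transform keeps the DC component from dominating the peak search.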

B. CNN MODELS CONFIGURATION
The model selection results for the proposed correlated-input and single-input 1d-CNN models are shown in Table 2. For both CNN-Q and CNN-QT, we chose the hyper-parameter combination that resulted in the best mean MAPE with the least variation over the five runs. The validation performance was 22.6% and 25.6% MAPE for the CNN-Q and CNN-QT models, respectively. Note, however, that the single-input model has more filters and hidden neurons and, therefore, more trainable parameters. Note also that a unit pooling size is equivalent to no pooling at all, since the convolution outputs are grouped one by one, which also increases the number of connections and the parameter count.
On the other hand, fewer hidden neurons were used by the CNN-QT model, which indicates that its convolutional layer performed better feature extraction than the CNN-Q convolutional layer while using fewer filters. The similar validation MAPE observed during model selection and the smaller number of trainable parameters indicate that the use of correlated time series in a 1d-CNN can indeed facilitate the forecasting task and allow smaller 1d-CNN models for the Madeira River discharge forecasting.

C. BASELINE MODELS PERFORMANCE
The metrics calculated for the proposed baseline models are summarized in Table 3. The simplest model (seasonal persistent, or sp) obtained a good performance due to the strong influence of the discharge fundamental frequency on the signal power; note in Fig. 4 the amplitude difference between the discharge fundamental frequency peak and the higher-frequency peaks.
The turbidity-to-discharge linear model (t2d), on the other hand, performed worse than expected. Due to the high correlation between t[n − 38] and d[n] seen in Fig. 5, we expected the t2d model to outperform the sp model, but the estimated t2d MAPE was twice as bad as the sp model's. We attribute the high error and low R² of the t2d model to the turbidity high-frequency components that can be seen in Fig. 4.
Among the baseline models, the MLP performed best, with low MAPE and NRMSE and an R² closer to unity. Moreover, for 38-days-ahead forecasting, the 33.60% error is a reasonable approximation, but a modest accuracy improvement over the seasonal persistent model given the MLP's complexity. A visual comparison between all baseline models can be seen in Fig. 6. In Fig. 6a the seasonal persistence is easily seen, as the first peak is repeated by the second. The scatter plot in Fig. 6b shows that the sp model outputs follow the ideal model curve slightly below it; therefore, this model tended to underestimate the discharge on the tested data, especially for higher values.
On the other hand, the linear model relating the turbidity to the 38-days-ahead discharge tended to be less accurate for smaller values, as can be seen in both Fig. 6a and Fig. 6b, and to overestimate the discharge. In Fig. 6a the high-frequency characteristics of the turbidity can be observed in the fast-changing discharge sub-peaks estimated by this model. Moreover, Fig. 6b shows that the estimated values do not follow a straight line with respect to the observed data, as reflected in the R² = 0.66; furthermore, the model saturates at around 10,000 m³/s and provides poor estimates of lower discharge values.

D. CNN MODELS PERFORMANCE AND COMPARISON
The performance metrics estimated on the test set data for the two CNN-based models designed in this work are shown in Table 4. The traditional single-input approach to 1d-CNN forecasting (model CNN-Q) obtained a large accuracy improvement over the baseline models, almost 30% over the MLP. The correlated-input CNN model proposed in this work obtained a further 20% accuracy improvement over the CNN-Q model, with a similar R², at the cost of a higher NRMSE. This represents more than 40% higher accuracy than the baseline models. Fig. 7 shows the CNN models' forecasting results and scatter plots. Note that both models are visually similar, as can be seen in the time plots in Fig. 7a. Data dispersion was also similar: as shown in Fig. 7b, the estimated values lie near the ideal model line with a similar slope, as seen numerically in the R² values of 0.93 and 0.92. Comparing the CNN models with the MLP model, the best baseline model presented in this work, we can see that the CNN models achieved better accuracy, improved slope-change detection, and are easier to implement than MLPs (no manual feature extraction is needed). Furthermore, although both CNN models provided good estimates of the target time series behavior, the CNN-QT model had better accuracy than the CNN-Q model, especially for low discharge values. Table 5 summarizes the performance of all models, including the NSE and confidence interval. We estimated the 68% C.I. from the standard deviation of each model's error. Although the sp and MLP models presented relatively close MAPE, the C.I. of the sp model was roughly five times larger than that of the MLP model; note, indeed, the data spread of the sp and t2d models in Fig. 6(b). Additionally, a negative NSE was obtained for these two models, indicating that the mean value of the discharge time series is a better estimate than the one provided by the sp and t2d models. From Table 5 we can also see that the CNN-based models present lower uncertainties, as expected from the R² values and the low data spread seen in Fig. 7(b). Comparison with the well-accepted machine learning forecasting model (MLP) tested in this work shows that the accuracy improvement obtained by the CNN-based models did not compromise the forecast confidence or the NSE.
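The 68% C.I. estimate described above (one standard deviation of the forecast errors) can be sketched as:

```python
import numpy as np

def ci68(y, yhat):
    """68% confidence interval half-width, taken as the standard deviation
    of the forecast errors, as described in the text."""
    return float(np.std(np.asarray(y) - np.asarray(yhat)))

# Errors of -1 and +1 give a half-width of 1.0:
print(ci68([10.0, 12.0], [11.0, 11.0]))  # 1.0
```

For approximately Gaussian errors, one standard deviation covers roughly 68% of the error distribution, which is what justifies reading this value as a 68% interval.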
Therefore, results showed the use of CNNs in time series could improve the forecast quality and that the use of correlated-input indeed increases accuracy, as shown in [53].
Although the CNN-QT model had more input data, and thus more features to extract and more patterns to learn, the training time was 11.5 ± 6.6 seconds for CNN-Q and 10.7 ± 6.0 seconds for CNN-QT, showing no significant difference in mean training time. Despite the greater amount of data fed into CNN-QT, its number of trainable parameters was substantially smaller, which may have favored the training time.
Indeed, as discussed in section III.B, the main advantage of the CNN-QT model over the CNN-Q model is its size. The CNN-QT was almost five times smaller than the CNN-Q model in terms of trainable parameters: it had a smaller hidden layer, fewer filters, and fewer connections from the convolutional layer to the hidden layer due to the larger pooling size. Despite its smaller size, the CNN-QT showed better forecasting performance. Hence, incorporating the turbidity time series into the 1d-CNN forecasting model improved both the model's simplicity and its accuracy.
Furthermore, the hidden layer of the CNN-QT model (30 neurons) is smaller than those of the CNN-Q model (50 neurons) and the MLP model (70 neurons). Therefore, if transfer learning is applied to the convolutional layer of the CNN-QT model, it becomes, among the machine-learning-based models presented in this paper, the computationally cheapest and fastest to train.
Moreover, the performance of the proposed CNN-based model matched the performance reported by other studies in the literature on data-driven techniques for multi-day-ahead streamflow forecasting. In [55], the authors compared ANN, WNN (wavelet neural network), ANFIS (adaptive neuro-fuzzy inference system), and WNF forecasting models for 5-days-ahead river flow forecasting and obtained peak detection errors as low as 3% for the WNN and as high as 72% for the WNF and 98% for the ANN and ANFIS models.
The work presented in [56], on the other hand, showed an ANFIS forecasting model with good performance, measured by an NSE of 0.823 for 8-days-ahead streamflow forecasting. The authors of [57] compared several multi-step forecasting models and obtained MAPEs ranging from 14.41% to 43.99% for 7-days-ahead river flow forecasting. In [58], the authors compared several regression models for river flow forecasting; the results showed great variability in the peak detection error, which ranged from 4% to 89% absolute discharge peak error, and the best model evaluated in that study obtained an NSE of 0.819 for 3-days-ahead river flow forecasting.
As one can see, the performance obtained by the proposed forecasting method agrees with the performance found in the literature for multi-day-ahead streamflow forecasting. Although the results vary considerably with the number of steps ahead and the river studied, the ∼20% MAPE and ∼0.9 NSE obtained in this work show that the use of CNNs in river flow forecasting is promising and could increase the performance of future multi-day-ahead streamflow forecasting models.

IV. CONCLUSION
This study presented the use of correlated time series as the input of a unidimensional convolutional neural network for streamflow forecasting at the Madeira River. Analyzing the Madeira's time series, we concluded that the river discharge periodicity was 362 days and that the discharge lagged the turbidity by 38 days. We used this information to design and evaluate several forecasting models.
Comparison between the proposed correlated-input 1d-CNN and a single-input 1d-CNN showed that the combined use of turbidity and discharge can reduce the size of the Madeira river flow forecasting model by a factor of five. Furthermore, the CNN-based models presented better estimation accuracy than all baseline models tested, and the CNN-QT obtained slightly better performance despite its reduced size.
Since our results showed that the turbidity leads the river discharge and that the use of these two parameters improved the forecasting model, we believe that the approach taken in this paper should work for any river where the discharge lags the turbidity. Hence, in future work it would be interesting to evaluate the proposed method on different rivers.
Therefore, the use of turbidity and discharge in a CNN-based Madeira river flow forecasting model could represent a step forward in deep learning for river flow forecasting. A smaller model can be trained with less data and still be reliable, and when training on large datasets the training is faster and computationally cheaper.
Moreover, the results obtained in this work may encourage machine learning scientists working on CNN-based forecasting models to study the use of correlated time series. Although more investigation into the forecasting of different time series and parameters is needed, the results show that the use of correlated inputs is promising with respect to model reduction.