Recurrent Neural Network-Augmented Locally Adaptive Interpretable Regression for Multivariate Time-series Forecasting

Explaining dynamic relationships between input and output variables is one of the most important issues in time-dependent domains such as economics and finance. In this work, we propose a novel locally adaptive interpretable deep learning architecture that is augmented by recurrent neural networks to provide model explainability and high predictive accuracy for time-series data. The proposed model relies on two key aspects. First, the base model should be a simple interpretable model; we obtain our base model using simple linear regression and statistical tests. Second, we use recurrent neural networks to re-parameterize the base model, making the regression coefficients adaptable at each time step. Our experimental results on public benchmark datasets show that our model not only achieves better predictive performance than state-of-the-art baselines, but also discovers the dynamic relationships between input and output variables.


I. INTRODUCTION
Time-series forecasting is crucial in many time-dependent domains, from inventory control, customer management and distribution to finance and marketing [1]. For some application areas, it is also important to explain the dynamic relationship between input and output variables rather than merely performing accurate time-series forecasting. For example, in business it is important to determine how much the price goes up when the supply goes down [2]. For this reason, classical statistical time-series forecasting approaches are still more widely used than deep learning and machine learning methods [3,4].
Recent advances in artificial intelligence (AI) have achieved superhuman performance in many applications, including image classification, speech recognition and machine translation. However, this improved predictive performance often increases the complexity of the model, turning such systems into "black box" models. As a result, it becomes hard to understand the decisions of a model and to interpret its predictions. This ambiguity has made it difficult to use black box models for time-series data. Although a few interesting deep learning-based interpretable architectures have been proposed in recent years to achieve high predictive performance on time-series tabular data [1,5-8], these models have not addressed identifying the relationship between explanatory variables and the output.
In order to address this issue, we propose a novel adaptive interpretable deep learning architecture that can identify the relationship between input and output variables for multivariate time-series data. Our proposed model consists of two key components: an interpretable base-learner and a meta-learner. Simple linear regression is chosen as our base-learner. Linear regression is among the most explainable models because it describes the linear relationship between input and output variables [9]. The sign of a regression coefficient determines whether there is a positive or negative correlation between each input variable and the target variable [10]. A positive (negative) coefficient indicates that as the input variable's value increases, the predicted mean value of the target variable also tends to increase (decrease). In other words, by determining the impact of the input variables on the target variable, we can explain the behavior of the model by capturing the relationship between the input variables and their direction of influence. However, this linearity is also the major drawback of linear regression.
The assumption of linear relationships between input and output variables is highly restrictive and usually oversimplifies complex reality; therefore, the predictive ability of linear regression is often poor. We therefore augment our base-learner with a meta-learner to improve its predictive performance. We use a long short-term memory (LSTM) network as our meta-learner. The meta-learner re-parameterizes the base-learner at each time step. In other words, we re-parameterize our base-learner using a meta-learner to determine a local linear function that "best fits" the data at each time period. Once we find a local linear function for each observation, it is easy to explain the relationship between input and output by measuring the impact of each input variable on the target variable.
The overall framework of our proposed architecture is shown in Figure 1. We first perform OLS (base-learner) to obtain regression coefficients and their standard errors. Second, we apply an LSTM neural network (meta-learner) to predict the probabilities used for finding the Gaussian critical value that updates each regression coefficient for each observation. We use two kinds of input: non-normalized and normalized. Our base-learner receives non-normalized inputs to explain the logical and global relationship between input and output variables. Our meta-learner takes normalized inputs to adapt the regression coefficients locally, and these adapted coefficients express the local relationship between input and output by measuring the impact on the target variable for each observation. Based on the predicted probabilities and the standard errors of the regression coefficients, we calculate the local regression coefficients using the confidence interval formula. Finally, we rebuild the linear regression equation from the local regression coefficients for each observation.
We first trained our proposed model on the public electricity and traffic datasets used as time-series forecasting benchmarks to compare it with state-of-the-art time-series models [11]. Our model achieved better predictive performance than the other state-of-the-art baseline time-series forecasting models.
In addition, we extensively studied the predictive performance and model interpretability on several time-series datasets from different domains. Our model also showed substantially higher performance than the machine learning and regression baselines. These experiments on the benchmark datasets demonstrate that our proposed deep learning-based explainable architecture can be one of the best predictive methods for time-series data.
The main contributions of this work are summarized as follows:
1) We propose a novel interpretable deep learning architecture that achieves both high predictive accuracy and model interpretability for time-series data.
2) We propose a novel augmentation method that improves the predictive performance of linear regression while keeping its interpretability.
3) We use recurrent neural networks to augment the linear regression by parameterizing a family of linear functions.
4) We make the linear regression coefficients adaptable within their confidence intervals; therefore, our model neither overfits nor misrepresents the relationship between input and output variables.

FIGURE 1. Overview of our proposed model, where x_t denotes the input variables, x'_t the normalized input variables and y_t the target variable at time period t. We first perform a linear regression to obtain our base-learner. Here the data should not be normalized, so that the parameters remain meaningful for explaining the relationship between input and output variables. We then train our meta-learner on the normalized input to update the parameters of our base-learner.
5) Our proposed model can determine the dynamic relationship between input and target variables by measuring the local impact of the input variables on the target variable.
6) We provide an evaluation of the proposed model on benchmark datasets in terms of predictive accuracy and model explainability.
This paper is organized as follows: Section 2 discusses related work. Section 3 presents the concept of the proposed model. Section 4 presents the datasets and experimental results. Finally, Section 5 summarizes the general findings of this study and discusses possible future research areas.

II. RELATED WORK
Time-series forecasting methods have developed into a significant and active research area [12], in which interpretable deep learning-based models have also been widely studied [1,5,13-16]. Initially, attention-based neural network models were proposed with interpretability motivations, identifying the noticeable portions of the input variables at each time step through the magnitude of the attention weights [6,18].
In addition, researchers have applied post-hoc explanation methods to pre-trained deep learning models to make the models understandable and increase human trust [19,20]. Ribeiro et al. [19] proposed the LIME technique, short for Local Interpretable Model-agnostic Explanations, in an attempt to explain any decision process performed by a black-box model. Another popular method for explaining black-box models is SHapley Additive exPlanations (SHAP) [20,21]; SHAP produces Shapley values that represent the variable importance measure for a local prediction, calculated by combining insights from six local feature attribution methods. Unfortunately, these post-hoc explainable models can misinterpret unseen data because they are usually based on permutations randomly sampled from the marginal distribution. According to Rudin [22], the best explanation should be provided by the model itself. In other words, explainability should be incorporated into the architecture so that the model makes correct predictions with logical correlations. Although a number of interpretable modeling approaches have been proposed for time-series forecasting, these studies have focused only on explaining variable contributions [5] or on decomposition for analysing univariate time series [1]; they have not explained the relationship between input and output variables.
In short, no deep learning-based model that explains the logical relationship between input and output variables for time-series data has yet been proposed. On the other hand, linear regression models are considered relatively explainable [23], especially when the regression coefficients have particular meaningful values. Therefore, we propose a novel explainable deep learning architecture that improves the predictability of linear regression using recurrent neural networks.
Another related line of work focuses on meta-learning. Utilizing one neural network to produce parameters for another neural network has been studied in the meta-learning field [24-27]. Our proposed model builds on this way of establishing a meta-learning model: we train a meta-model (neural network) to explain the parameters of its underlying base-model (linear regression). Recently, Munkhdalai and Hong [28] proposed Meta Networks (MetaNet), which learns to rapidly parameterize underlying neural networks for fast generalization. Our method is based on the MetaNet idea of fast weights, which has been used successfully on image, text and audio data [29-32]. To apply this approach successfully to time-series data, we use the meta-learner to estimate fast probabilities for finding the Gaussian critical value for each regression coefficient. In general, we attempt to enhance the predictive ability of an interpretable model using neural networks. We present our methodology in detail in the next section.

III. THE PROPOSED MODEL
Our proposed model consists of two main phases: linear regression (interpretable base-learner) and recurrent neural networks (meta-learner). We first perform simple linear regression on the training set to obtain unbiased regression coefficients and their standard errors (see Figure 2). Second, we train recurrent neural networks as a meta-learner on the normalized training set to predict the probability used for locating the percentile of the Gaussian distribution for each regression coefficient. Finally, we reconstruct the linear regression equation using the updated local regression coefficients at each time step. There are two kinds of input: non-normalized inputs for the base-learner and normalized inputs for the meta-learner. Neural networks usually take normalized inputs for faster convergence and higher predictive performance [33,34]. However, data scaling leads to misinterpretation of the relationship between input and output variables. Therefore, our base-learner takes non-normalized inputs to explain the logical and global relationship between input and output variables, while our meta-learner takes normalized inputs to predict the probabilities for finding the Gaussian critical value that updates each regression coefficient locally. These adapted regression coefficients express the local relationship between input and output by measuring the impact on the target variable for each observation.

A. BASE-LEARNER
Given a dataset (x_1, y_1), ..., (x_T, y_T) of T timestamps, a linear regression model estimates the coefficients that provide the best linear fit between the input variables and the target variable. The model for linear regression is [35]:

y_t = β_0 + β_1 x_{t,1} + ... + β_p x_{t,p} + ε_t

where ε_t are independent, identically distributed (i.i.d.) random variables with E{ε_t} = 0, E{ε_t²} = σ² and bounded third moment.
The regression coefficients can simply be computed using the OLS estimator:

β̂ = (X⏉X)⁻¹ X⏉ y

where X = [x_1⏉, ..., x_T⏉] ∈ ℝ^{T×p} is the design matrix and y = [y_1, ..., y_T] ∈ ℝ^T. The regression coefficients estimated from data are subject to sampling uncertainty; the true value of a regression coefficient can never be recovered exactly from sample data. Instead, we can construct a confidence interval for each regression coefficient:

β̂_i ± z_{1-α/2} · SE(β̂_i)

where α is the significance level, z is the inverse Gaussian distribution (critical value) and SE(β̂_i) is the standard error of the regression coefficient β̂_i. The main idea of our proposed model is to achieve better point-forecasting performance by adapting the linear regression coefficients within their confidence intervals at each time step. To find the local linear function that "best fits" each time step, we must determine the appropriate value in the confidence interval for each linear regression coefficient. To this end, we design a meta-learner that finds the appropriate regression coefficients for each time step based on the confidence interval formula: the meta-learner predicts the appropriate significance level for each regression coefficient to make it adaptable.
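As an illustration, the base-learner step can be sketched in Python. This is a minimal sketch, not the paper's implementation; the function name, the toy data and the 95% critical value z = 1.96 are our illustrative assumptions.

```python
import numpy as np

def ols_with_confidence(X, y, z=1.96):
    """OLS estimate, standard errors and confidence intervals.

    A minimal sketch of the base-learner step. The 95% critical
    value z = 1.96 is an illustrative default.
    """
    T, p = X.shape
    # OLS estimator: beta_hat = (X^T X)^{-1} X^T y
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    # Residual variance and standard errors of the coefficients
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (T - p)
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    # Confidence interval: beta_hat +/- z * SE(beta_hat)
    return beta_hat, se, (beta_hat - z * se, beta_hat + z * se)

# Toy usage: y = 2*x + noise, with an intercept column
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
y = 2.0 * x + 0.1 * rng.normal(size=200)
beta_hat, se, (lo, hi) = ols_with_confidence(X, y)
```

The interval (lo, hi) is exactly the range within which the meta-learner later adapts each coefficient.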

B. META-LEARNER
We use the long short-term memory (LSTM) neural network architecture as our meta-learner model. The LSTM network is an extension of recurrent neural networks, proposed by Hochreiter & Schmidhuber [36] as a solution to the vanishing gradient problem. LSTM helps to capture long-term dependencies by extending the memory cells with a gating mechanism that controls the information flow. The memory cell consists of three gates: the input, forget and output gates, as shown in Figure 3. These gates decide whether to add new input (input gate), erase unnecessary information (forget gate), or let the cell state impact the output at the current time step (output gate). Formally, these gates are represented as:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

where W_i, W_f, W_o, W_c denote the input weights, U_i, U_f, U_o, U_c the recurrent weights, b_i, b_f, b_o, b_c the bias vectors, x_t the vector of inputs, σ(·) and tanh(·) the activation functions, ⊙ element-wise multiplication, and h_t, h_{t-1} the outputs of the hidden layer neurons at times t and t−1.
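One memory-cell step of the standard LSTM gate equations can be sketched in NumPy. The dictionary packing of the parameters is our illustrative convention, not the paper's notation, and the random toy parameters stand in for trained weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM memory-cell step (standard gate equations).

    W, U, b hold input, recurrent and bias parameters for the
    input (i), forget (f), output (o) and candidate (g) transforms.
    """
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate cell
    c = f * c_prev + i * g   # erase old / add new information
    h = o * np.tanh(c)       # exposed hidden state
    return h, c

# Toy usage with 3 inputs and 4 hidden units, random parameters
rng = np.random.default_rng(1)
d_in, d_h = 3, 4
W = {k: rng.normal(size=(d_h, d_in)) for k in 'ifog'}
U = {k: rng.normal(size=(d_h, d_h)) for k in 'ifog'}
b = {k: np.zeros(d_h) for k in 'ifog'}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```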

C. RECURRENT NEURAL NETWORK-AUGMENTED LOCALLY ADAPTIVE INTERPRETABLE REGRESSION
As described above, the input of our meta-learner is the normalized independent variables x'_{t,i} ∈ ℝ^{T×p}, i = 1, ..., p and t = 1, 2, ..., T, and the output is the predicted probability p_t corresponding to the Gaussian critical value. Since we predict a probability for finding the critical value of the Gaussian distribution, the activation function of the output layer is the sigmoid (σ), which produces values between 0 and 1. Thus:

p_t = σ(FC(LSTM(x'_t)))

where x'_t is the normalized input, p_t denotes the predicted probabilities, LSTM is an LSTM layer, FC is a fully connected linear layer, and these layers carry the weight parameters of our meta-learner model.
We also apply additional smoothing parameters to the output of the sigmoid function to control the significance level of the confidence interval:

p̃_t = a + (1 − b) · p_t

where a, b (b > a) are smoothing parameters that should be close to 0. We can set the upper and lower limits of the confidence intervals for the regression coefficients by adjusting these smoothing parameters. For example, at a = 0.005, b = 0.006, the significance level of the lower confidence limit equals 0.005 and that of the upper limit equals 0.999. Recall that we keep the estimated regression coefficients and their standard errors as numerical inputs after performing linear regression, so we can easily reconstruct the regression equation during the learning process of the meta-learner:

ŷ_t = Σ_i (β̂_i + z_{p̃_{t,i}} · SE(β̂_i)) · x_{t,i}

where x_{t,i} is the i-th independent variable (not normalized) and z_{p̃_{t,i}} is the Gaussian critical value at probability p̃_{t,i}.
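The coefficient-adaptation step can be sketched numerically. This assumes the smoothing map p̃ = a + (1 − b)·p, which we reconstruct from the example values (a = 0.005, b = 0.006 giving the range [0.005, 0.999]); the function name and toy numbers are ours.

```python
import numpy as np
from statistics import NormalDist  # stdlib inverse Gaussian CDF

def adapt_coefficients(beta_hat, se, p, a=1e-6, b=1e-5):
    """Turn meta-learner probabilities into local coefficients.

    Assumed smoothing: p_tilde = a + (1 - b) * p maps the sigmoid
    output into [a, a + 1 - b], keeping the critical value finite.
    beta_hat, se: OLS coefficients and their standard errors.
    p: per-coefficient probabilities from the meta-learner.
    """
    p_tilde = a + (1.0 - b) * np.atleast_1d(p)       # smoothed probability
    z = np.array([NormalDist().inv_cdf(pt) for pt in p_tilde])
    return beta_hat + z * se                         # locally adapted coefficients

# Toy usage: p = 0.5 leaves a coefficient (almost) unchanged,
# since the Gaussian critical value at 0.5 is zero
beta_hat = np.array([1.0, -0.5])
se = np.array([0.2, 0.1])
local = adapt_coefficients(beta_hat, se, np.array([0.5, 0.5]))
```

Probabilities above 0.5 shift a coefficient toward the upper confidence limit, probabilities below 0.5 toward the lower limit.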

D. MODEL TRAINING
As explained in Section III.A, we first apply the OLS estimator to obtain unbiased regression coefficients and their standard errors. Then we use our meta-learner to improve the predictive power of the base-learner by adapting its regression parameters locally. With a small modification of Eq. 7, we obtain a simple model that can be trained using the stochastic gradient descent (SGD) optimization algorithm.
From Eq. 8, we can compute the prediction produced by the OLS estimator. If we subtract the OLS prediction from the actual value, the error value e_t remains on the left-hand side of the equation, and this is what our meta-learner should predict. We can consider Eq. 10 to be the nonlinear part of our model, which is similar to these studies [37,38]. However, the main advantage of our model is that the prediction performance is improved by updating the parameters of the base-learner without compromising its interpretability.
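The residual formulation can be illustrated numerically: with local coefficients β̂ + z·SE, the adapted prediction splits into the OLS forecast plus a correction term, so the correction must account for the OLS error e_t. The helper and toy values below are our illustration, not the paper's code.

```python
import numpy as np

def decompose_prediction(x_t, beta_hat, se, z_t):
    """Split the adapted prediction into an OLS part and a meta part.

    z_t: per-coefficient Gaussian critical values for this time step.
    The correction term is what must fit the OLS residual e_t.
    """
    ols_part = x_t @ beta_hat          # prediction of the base-learner
    correction = x_t @ (z_t * se)      # contribution of the meta-learner
    return ols_part, correction

# Toy check that the decomposition is exact
x_t = np.array([1.0, 2.0, -1.0])
beta_hat = np.array([0.5, 1.0, 2.0])
se = np.array([0.1, 0.1, 0.2])
z_t = np.array([1.0, -0.5, 0.0])
ols_part, corr = decompose_prediction(x_t, beta_hat, se, z_t)
full = x_t @ (beta_hat + z_t * se)  # prediction with adapted coefficients
```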
We can now design our loss function as follows:

ℒ(θ) = MSE(e, f(x, x'; θ, β̂, SE(β̂)))

where MSE is the mean squared error, f(·) is our proposed model with meta-learner parameters θ, β̂ and SE(β̂) are the regression coefficients and their standard errors estimated by OLS, and e denotes the error values.
In addition, the output dimension of the meta-learner equals the number of independent variables, and our architecture can easily be trained with SGD optimization using the backpropagation algorithm. The model training algorithm for our proposed architecture is shown below:
Algorithm: Model training algorithm
Input: Training set:

IV. EXPERIMENTAL RESULTS

A. DATASETS
We used three public benchmark time-series datasets (see Table 1) to compare the predictive performance of our proposed model with other state-of-the-art time-series baselines [11]. We also chose three additional multivariate time-series datasets (see Tables 2 and 3) to demonstrate the interpretability of our proposed model [6]. Electricity set: this dataset contains the electricity consumption of 370 customers, aggregated at an hourly level [11].
Traffic set: this dataset describes the hourly occupancy rate, between 0 and 1, of 963 car lanes of San Francisco Bay Area freeways [11].
The electricity, traffic, energy use of appliances and air quality datasets are from the UCI Machine Learning Repository [39]; the financial dataset is the finance dataset used in [6].
Air quality set: De Vito et al. [40] published this dataset, which contains 12 variables as summarized in Table 3. We considered the CO(GT) and NO2(GT) variables as target variables, predicting them in the same way as De Vito et al. [40]. The data consist of 9,357 hourly data points recorded between 3/10/2004 18:00 and 4/04/2005 14:00.
Energy set: the energy use of appliances dataset was introduced by [41], who investigated data-driven predictive models for it. The dataset consists of 27 time-series variables, including appliances energy consumption, as shown in Table 4. The data were recorded every 10 minutes, and 19,735 data points were generated between 1/11/2016 17:00 and 5/27/2016 18:00. In accordance with the focus variable of Candanedo et al. [41], appliances energy consumption was chosen as the target variable.

B. BASELINE MODELS AND HYPERPARAMETER
For the state-of-the-art time-series baselines, we directly used the predictive performances reported by Yu, Rao & Dhillon [11], who presented a temporal regularized matrix factorization (TRMF) framework that supports data-driven temporal learning and forecasting. We also directly compared our results with the DeepAR model introduced in [7]. We benchmarked against the following state-of-the-art models:
o TRMF-AR: temporal regularized matrix factorization model [11]
o SVD-AR(1): explained in [11]
o TCF: matrix factorization with the simple temporal regularizer proposed in [42]
o AR(1): n-dimensional AR(1) model [43]
o DLM: the code provided in [44]
o DeepAR [7]
For the other three datasets, we aimed to demonstrate how our model can be interpreted, and showed how it improves the predictive ability of linear regression models such as linear, lasso, ridge and Bayesian regression. We also compared the predictive performance of our proposed model with the deep learning baselines introduced in [6], namely AIS-RNN, AIS-GRU and AIS-LSTM. In addition, the TabNet model [45], a popular and high-performance deep learning-based model for tabular data, was used as a baseline.
For our proposed model, we need to define the neural network architecture and other hyperparameters of the meta-learner. An LSTM neural network was chosen as our meta-learner for time-series data. For the Electricity and Traffic datasets, our meta-learner consists of two hidden layers of 64 neurons each, since these datasets contain a large number of entities. For the other three datasets, we chose a larger meta-learner architecture consisting of two hidden layers of 512 neurons each.
We set the learning rate to 0.001 and the maximum number of training epochs to 1000. The hyperparameters of the meta-learner could be optimized with additional experiments to further improve predictive performance, but the chosen hyperparameters already render desirable performance in this study; therefore, we did not run additional hyperparameter optimization experiments, saving computation time.
In addition, an early stopping algorithm was used to select the best model for the given hyperparameters. The smoothing parameters were chosen as a = 1e-06 and b = 1e-05. We configured the same model settings for all datasets, and each dataset was partitioned into three parts based on the time sequence: training (the first 70% of the data), validation (the next 10%) and test (the last 20%) sets.
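The chronological 70/10/20 split described above can be sketched as follows; the helper name and the slice-based interface are our illustrative choices.

```python
def chronological_split(n, train=0.7, valid=0.1):
    """Time-ordered 70/10/20 split used in the experiments.

    Returns slices so the training set is the earliest 70% of the
    observations, validation the next 10%, and test the remainder.
    """
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (slice(0, n_train),
            slice(n_train, n_train + n_valid),
            slice(n_train + n_valid, n))

# Toy usage with the air quality dataset length (9,357 hourly points)
tr, va, te = chronological_split(9357)
```

Splitting by time sequence, rather than at random, avoids leaking future information into the training set.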
C. EVALUATION METRICS

We evaluated the forecasting accuracy using the root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE):

RMSE = sqrt((1/n) Σ_i (y_i − ŷ_i)²)
MAE = (1/n) Σ_i |y_i − ŷ_i|
MAPE = (100/n) Σ_i |(y_i − ŷ_i) / y_i|

where ŷ_i denotes the i-th predicted value, y_i denotes the i-th actual value and n is the number of observations. In addition, to use the same settings and directly compare forecasting performance with the studies [7], [11], [42] and [44], the normalized deviation (ND) and normalized root mean squared error (NRMSE) metrics were used:

ND = Σ_i |y_i − ŷ_i| / Σ_i |y_i|
NRMSE = sqrt((1/n) Σ_i (y_i − ŷ_i)²) / ((1/n) Σ_i |y_i|)
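The ND and NRMSE metrics can be computed as below. This is a sketch using the standard definitions from the TRMF benchmark [11], not the paper's evaluation code; the function names are ours.

```python
import numpy as np

def nd(y, y_hat):
    """Normalized deviation: sum of absolute errors over sum of |y|."""
    return np.abs(y - y_hat).sum() / np.abs(y).sum()

def nrmse(y, y_hat):
    """Normalized RMSE: RMSE divided by the mean absolute target."""
    return np.sqrt(np.mean((y - y_hat) ** 2)) / np.mean(np.abs(y))

# Toy usage: a constant error of 1 on targets [1, 2, 3, 4]
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = y + 1.0
```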

D. PREDICTIVE PERFORMANCE
We first evaluated the predictive performance of our proposed method against the other state-of-the-art baselines. We used the same settings to directly compare the forecasting performance with these studies [7,11]. The ND and NRMSE metrics were used to compare predictive performance.
As shown in Table 1, these two datasets are very large and consist of numerous entities. To apply our model to them, we trained a separate model for each entity: 370 models on the electricity set and 963 models on the traffic set. We also need to set the maximum sequence length (maximum lag) for our meta-learner (and the input variables for the base-learner).
For the electricity dataset, the maximum sequence length is 2, and the input variables include one and two lags of 'power_usage' and 'hours_from_start', together with the 'hour', 'day', 'day_of_week' and 'month' variables. The 'hour', 'day', 'day_of_week' and 'month' variables were treated as categorical and one-hot encoded.
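The one-hot encoding of such time features can be sketched as follows; the helper is our illustration of the preprocessing, with the category list fixed up front so that the training and test sets share the same encoding.

```python
import numpy as np

def one_hot(values, categories):
    """One-hot encode a categorical time feature.

    values: observed category per time step (e.g. day_of_week).
    categories: the full, fixed list of possible categories.
    """
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

# e.g. day_of_week (0-6) for three consecutive observations
enc = one_hot([5, 6, 0], categories=list(range(7)))
```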
For the traffic dataset, we set the maximum sequence length to 3; one and two lags of 'values' and 'hours_from_start' were used as continuous input variables, and 'time_on_day' and 'day_of_week' as categorical input variables, again one-hot encoded.
Our proposed model outperformed the state-of-the-art baselines on the electricity and traffic datasets in terms of the NRMSE metric, and showed comparable results for the ND metric (see Table 5).
Table 5 reports the predictive performance comparison between the state-of-the-art baselines and our proposed model. Our model outperformed the state-of-the-art baselines by achieving an NRMSE of 0.791 on the electricity dataset and 0.407 on the traffic dataset. For the ND metric, the DeepAR model showed the best performance, achieving 0.07 and 0.17 ND on the electricity and traffic datasets, respectively.
Second, we performed additional experiments to compare our proposed model with the deep learning and regression baselines. To demonstrate the effectiveness and robustness of our proposed model for time-series data, 'Nasdaq' from the finance dataset, 'CO(GT)' from the air quality dataset and 'Appliances' from the energy use of appliances dataset were chosen as target variables. The maximum sequence length was set to 5 for all datasets, and all independent variables were used as input to the meta-learner in the experimental analysis.
For the regression baselines, we chose lags (from 1 to 5) of the independent and target variables as input, matching the settings of our proposed model.
For the deep learning baselines, we directly compared our predictive performance with our previous work [6]. Munkhdalai et al. [6] proposed an end-to-end recurrent neural network architecture equipped with an adaptive input selection mechanism, named AIS-RNN, to improve prediction performance for multivariate time-series forecasting. The AIS-RNN model outperformed baselines including the Elman RNN, gated recurrent unit (GRU), LSTM, support vector machine, random forest, AdaBoost and decision tree models by up to 38% on these three benchmark datasets.
To obtain unbiased regression coefficients, we selected the input variables for our base-learner based on the t-test and the variance inflation factor (VIF). We selected the variables whose t-test p-value is less than 0.10 and whose VIF value is less than 10. The VIF values of the selected variables for the finance, air quality and energy datasets are displayed in Figures 4-6, respectively.
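The VIF screening rule can be sketched from its definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing column j on the remaining columns. This is our illustrative implementation of the standard formula, not the paper's code.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    For each column j, regress it on the other columns (with an
    intercept) and compute VIF_j = 1 / (1 - R_j^2). Variables with
    VIF >= 10 are typically flagged as collinear and dropped.
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        target = X[:, j]
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Toy usage: two nearly independent columns give VIFs close to 1
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
v = vif(X)
```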
Based on the selected base-learner models, we trained our meta-learner to improve their predictive performance. Table 6 reports the predictive performance of all models in terms of RMSE. Our proposed model outperformed the baseline models, including the deep learning and regression baselines, on the air quality dataset by achieving an RMSE of 0.62, and showed comparable results on the energy dataset with an RMSE of 59.94. AIS-GRU performed best on the finance dataset with an RMSE of 60.70, while the AIS-LSTM architecture achieved the lowest RMSE of 59.81 on the energy dataset. Our proposed model outperformed the OLS model on all datasets. For the MAE and MAPE metrics, our proposed model outperformed the baseline models on the air quality dataset by achieving 0.41 MAE and 30.71 MAPE, as shown in Tables 7 and 8. We also display the actual and predicted target variables on the finance, air quality and energy test sets in Figures 7, 8 and 9, respectively.
Typically, the predictive accuracy of linear regression is weaker than that of state-of-the-art models, but after augmentation by a neural network, its predictive performance improves dramatically. The most significant contribution of this work is that we improved the predictive power of linear regression without compromising its interpretability for time-series data.
These experimental results demonstrate that our proposed model can achieve high predictive performance on time-series data, and that it can serve as a deep learning-based interpretable architecture for time-series forecasting problems. The next part of the experiments shows the interpretability of our proposed model.

E. MODEL INTERPRETABILITY
In this section, we consider three real-world datasets: the finance dataset [6], the air quality time-series data [40] and the energy use of appliances dataset [41]. We aim to explore the dynamic relationship between the target variable and the input variables on these datasets using our proposed model.
Finance dataset: We select the Nasdaq (stock index) variable as the target variable, and the maximum sequence length (max lag) is 5. The result of our base-learner is reported in Table 9. We can now easily interpret this result; for example, if DTWEXB_1 (the 1-day lag of the trade-weighted U.S. dollar index) increases by 1 point on the previous day, the Nasdaq index falls by an average of 0.58 points. In addition, Figure 10 shows the dynamic relationship between the input and output variables on the test set. We can see how the impact of the variable DTWEXB_1 on the Nasdaq index has changed over time: the impact of DTWEXB_1 was highly volatile in 2015-2016, but since 2017 the impact has increased to 0.65. The coefficients of the input variables also move with the change of the target variable over time. The other input variables can be explained in the same way as DTWEXB_1.
Air quality dataset: The CO(GT) (true hourly averaged CO concentration) variable was chosen as the target variable for the air quality dataset. From the result shown in Table 10, we can see the impact of each variable on CO(GT). For example, CO(GT) is positively correlated with its one-hour lag: if CO(GT) increases by 1 point an hour earlier, it will increase by 0.82 points an hour later. We also display the dynamic relationship between the input variables and CO(GT) over 24 hours in Figure 11. The impact of the one-hour lag of CO(GT) equals 0.85 between 12:00 AM and 5:00 AM and is less than 0.80 between 5:00 AM and 10:00 AM. Furthermore, this impact increases from 10:00 AM to 5:00 PM and starts to decrease after 5:00 PM. Similar conclusions can be drawn for the other input variables. In addition, we display the hourly local impact of the input variables on CO(GT) over time for the air quality dataset in Figure 13.
Energy dataset: Appliances energy consumption was chosen as the target variable. Table 11 presents the result of the linear regression. From this result, we can conclude that if the 10-minute lag of T2 (temperature in the living room area) increases by 1 °C, appliances energy consumption will increase by 1.85 Wh; this may be due to the use of air conditioners or fans. In addition, Figure 12 shows the dynamic relationship between the input variables and appliances energy consumption over 24 hours. The impact of the 10-minute lag of T2 is constant at night, while during the day its impact is highly variable. We also display the 10-minute local impact of the input variables on appliances energy consumption over time for the energy test set in Figure 13.
In the end, our experimental results show that our proposed model suggests a promising direction for interpretable machine learning that combines linear regression and neural networks.

V. CONCLUSION
In this work, we introduced a novel locally adaptive interpretable regression for time-series data. We augmented a linear regression with recurrent neural networks that predict the percentile of the Gaussian distribution for each regression coefficient, making the coefficients adaptable. We conducted an extensive set of experiments to show the interpretability and predictive power of our proposed model. Our model significantly improved the predictive performance of linear regression without compromising its interpretability, and demonstrated good predictive performance compared with the state-of-the-art time-series models and regression baselines. We also applied our model to finance, air quality and energy time-series data to explain the dynamic relationships between input and output variables, displaying how the input variables affect the target variable over time.
A more general AI-based solution to the interpretability issue is to train another model that learns to explain the main predictive model. Our proposed architecture is the first attempt to design an interpretable model that has both high predictive performance and interpretability for time-series data.