The Use of Mutual Information to Improve Value-at-Risk Forecasts for Exchange Rates

In this paper, we present a simple but novel approach to improving value-at-risk forecasts. We use mutually dependent covariate returns to create exogenous break variables and jointly use these variables to augment GARCH models so that time-variations and breaks in the unconditional volatility process are accounted for simultaneously. A study of hypothetical mutual dependencies between volatility and the covariates is first carried out to investigate the levels of shared mutual information among the variables before the augmented models are used to forecast 1% and 5% value-at-risk. The results provide evidence of substantial exchange of information between volatility and the lagged exogenous covariates. In addition, the results show that the estimated augmented models have lower volatility persistence, reduced information leakages, and improved explanatory power. Furthermore, there is evidence that our approach leads to fewer violations, improved 1% value-at-risk forecasts, and optimal daily capital requirements for all the models. There is, however, evidence of relative superiority of the majority of the models for the 5% value-at-risk forecasts from our approach, although they have relatively higher failure rates. Based on these results, we recommend incorporating our approach into existing risk-modeling frameworks. It is believed that such models may lead to fewer bank failures, expose banks to optimal market risks, and assist them in computing optimal regulatory capital requirements while minimizing penalties from regulators.


I. INTRODUCTION
Risk measurement is one of the most important tasks in financial risk management for banks, corporate treasuries and portfolio management firms, as well as other financial institutions and practitioners. In financial institutions, risk estimates are used to compute capital requirements, the amount of capital that must be added to a position to make its risk acceptable to regulators [21]. Risk estimates are also employed in decision-making regarding the hedging of assets and portfolio optimization. One of the most commonly used risk measures among financial institutions is value-at-risk (VaR), with the underlying asset's volatility as an input. Inaccurate volatility forecasts may lead to underestimation or overestimation of the actual VaR forecast, and financial institutions may lose the opportunity cost or would not be able to recover losses in crisis periods [52]. Since VaR computation requires a volatility input, VaR accuracy is dependent on accurate volatility forecasts. Accurate volatility forecasts yield optimal VaR forecasts and fewer VaR violations; thus VaR forecasting assessment provides an indirect assessment of the predictive abilities of competing volatility models [15]. A common approach for computing the volatility input of VaR models is the use of GARCH models [5]; however, findings from competing GARCH models demonstrate seemingly poor volatility forecasts [16]. This problem has been partly attributed to the models' failure to account for breaks and time-variations in the unconditional volatility [2], [43]. Models which fail to account for breaks in the unconditional variance lead to sizable upward biases in the degree of persistence in the estimated GARCH models.
The fitted models thus may fail to track changes in the unconditional variance and produce systematically underestimated or overestimated volatility forecasts on average, over long horizons [43].
Risk premium theory links the returns of an asset to its volatility [13], and asset returns move in tandem. Taking into account exchange rate co-movements and the volatility-return relationship of an asset, there is a possibility of an implied reverse relationship between volatility and inter-market 1 covariate returns. Due to this perceived reverse relationship, and the fact that inter-market covariate returns are exposed to similar uncertainties in the market, the covariates may be adequate proxies for uncertainty in the markets and could also be used to construct break variables. Several attempts in the literature have addressed the problem of structural breaks and time-varying unconditional volatility; however, the use of inter-market covariates to proxy market uncertainty, and of break variables constructed from them to account for time-variations and breaks in the unconditional volatility of GARCH models, has not received much attention. A gap in the literature therefore exists, and this paper is an attempt to narrow it.
In narrowing this gap, the proxies (inter-market covariate returns) and the break variables (constructed from the proxies) are used to augment GARCH models to account for changes and breaks in the unconditional volatilities. Empirical studies suggest that changes in the levels of uncertainty in the market affect the unconditional volatility of assets [2], [3]; thus this approach has empirical support. One advantage of the approach is that, due to the high levels of co-movement between assets, the proxies and the break variables may share substantial levels of mutual information with volatility; it is therefore anticipated to yield improved volatility and value-at-risk forecasts. The approach does not require structural modifications of existing models, so its implementation is simple and could easily be integrated into various volatility and value-at-risk modeling frameworks.
1 Inter-market covariates are assets traded on the same market platform with the same delivery date. For example, the USD/ZAR and EUR/ZAR rates are considered inter-market covariates when they are traded on the same market platform (such as the inter-bank forex market) with the same delivery date (such as daily).
A pre-study of the hypothetical mutual dependencies between pre-estimated volatilities and the exogenous variables (the proxies together with their respective break variables) is undertaken to investigate the levels of the dependencies and their possible significance. The Jackknife-bias-corrected kernel density estimation approach is used to compute the mutual information, while Pearson's method is used to compute the strength of the linear dependency. Simple t-tests are used to test the significance of the linear dependencies. In assessing the individual accuracies of the VaR estimates, the conditional coverage test of [17] and the unconditional coverage test of [35] are used. Competing models which pass both tests are then ranked using the MCS procedure of [28]. We also employ several other model comparison tools to assist in selecting the best models. To assess the usefulness of the estimated models to banking institutions, we conduct a capital requirement analysis based on the regulations laid down in the Basel II and III accords.
The paper is a contribution to the literature on value-at-risk forecasting for emerging markets. It also provides further evidence in support of theoretical and empirical studies which advocate that structural breaks have potentially important implications for estimated GARCH models and value-at-risk forecasts. Volatility and returns are linked by the risk premium; hence, evidence of mutual dependencies between the covariates and volatility will indirectly support the traditional theory of return co-movements. For the rest of the paper, the concept of VaR is discussed in Section II. In Section III, we formally establish the theoretical link between inter-market covariates and volatility as well as the construction of the exogenous break variables. Data and methodologies are presented in Section IV. We present the results and discussions in Section V and summarize the paper in Section VI.

II. THE CONCEPT OF VALUE-AT-RISK AND METHODS OF ESTIMATION
Value-at-risk is concerned with the possibility of losses associated with a portfolio at a given time. It is a downside risk metric, which measures the risk that the actual return will fall below the expected return. In simple terms, it is the uncertainty about the magnitude of the difference between the returns and the expected returns. It is the risk metric preferred by many experts because of its perceived superiority in backtesting the estimated losses [45]. Technically, value-at-risk is defined as the maximum portfolio loss at a given confidence level α within a time interval [54].
Consider a scenario where a portfolio of risky assets is held over a fixed time horizon. If the loss distribution associated with this portfolio has distribution function F_L(l) = P(L ≤ l), the maximum possible loss, inf{l ∈ R : F_L(l) = 1}, evaluates the level of risk associated with holding the portfolio over the horizon. This scenario leads to the technical definition of VaR below, adapted from [45].
Definition 1: Given a confidence level α ∈ (0, 1), the VaR of a portfolio is given by the smallest number l such that the probability that the loss L exceeds l is no larger than (1 − α), that is:

VaR_α = inf{l ∈ R : P(L > l) ≤ 1 − α} = inf{l ∈ R : F_L(l) ≥ α}. (1)

In probabilistic terms, value-at-risk is a quantile of the loss distribution. Given losses L, the generalized inverse F← is called the quantile function of L such that:

F←(α) = inf{l ∈ R : F_L(l) ≥ α}. (2)

Alternatively, value-at-risk can be constructed from the probabilistic function of the underlying returns, as given in Definition 2 [45].

Definition 2: Consider the returns of an asset r_t, with the change in the value of the asset over the next k periods defined by r_t = ΔV(k) = V(t + k) − V(t). The value of VaR over time horizon k associated with the left-tail probability α of the returns' distribution is defined as:

P(r_t ≤ VaR_k(α)) = α. (3)

Value-at-risk is a function of time and the left-tail quantile of the distribution with probability α. Throughout this paper, any computations and assessments relating to VaR models are based on Definition 2.
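Definition 2 can be illustrated with a short sketch: the empirical VaR is simply the left-tail quantile of a return sample, reported as a positive loss figure. The simulated returns and the quantile indexing convention below are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch: value-at-risk as the left-tail quantile of a
# return distribution (Definition 2). Simulated data stand in for
# actual exchange rate returns.
import math
import random

random.seed(42)
# Hypothetical daily log-returns (assumed, for illustration only).
returns = [random.gauss(0.0, 0.01) for _ in range(2500)]

def historical_var(sample, alpha):
    """Empirical VaR: the alpha-quantile of the return sample, negated
    so that a 1% VaR of 0.023 means the daily loss exceeds 2.3% of the
    position value with probability at most 1%."""
    ordered = sorted(sample)
    idx = max(0, math.ceil(alpha * len(ordered)) - 1)
    return -ordered[idx]

var_1 = historical_var(returns, 0.01)   # deeper in the left tail
var_5 = historical_var(returns, 0.05)
assert var_1 > var_5 > 0                # a deeper tail quantile is a larger loss
```

The ordering check at the end reflects the monotonicity implied by Definition 2: lowering the tail probability α moves the quantile further into the left tail.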

III. THEORETICALLY LINKING VOLATILITY TO RETURNS OF INTER-MARKET COVARIATES
In linking volatility to the returns of inter-market covariates, we make the following assumptions:
1. Exchange rate returns are second-order stationary processes.
2. Exchange rate returns are co-integrated.
3. The relationship between two or more exchange rate returns evolves via an ARMA process.
4. The expectation of the square of the ARMA process in assumption (3) is analogous to an exogenous GARCH process.
The plausibility of assumption (1) is derived from empirical studies such as [53], among others. Assumption (2) can be indirectly inferred from the empirical evidence of exchange rate cointegration found in studies such as [20], [34]. Assumption (3) can be deduced from Wold's representation theorem in conjunction with assumption (2). Finally, assumption (4) can be statistically derived from assumption (3).
Consider the endogenous exchange rate returns R_t and a set of exogenous returns r_{j,t}, where j = 1, 2, . . . , k and k is the total number of currency pairs. If R_t and r_{j,t} are second-order stationary cointegrated processes, then there exist integers p, q and real coefficients λ_1, λ_2, . . . , λ_k, ϕ_1, ϕ_2, . . . , ϕ_p and θ_1, θ_2, . . . , θ_q such that, by assumption (3) (adapted from [54]):

R_t = ϕ_0 + Σ_{i=1}^{p} ϕ_i R_{t−i} + Σ_{j=1}^{k} λ_j r_{j,t} + Σ_{r=1}^{q} θ_r a_{t−r} + a_t, (4)

where R_{t−i} is the autoregressive component, a_{t−r} is the moving average term, and a_t is the mean-corrected return (innovation). By squaring both sides of (4), it can be shown that:

R_t^2 = (ϕ_0 + Σ_{i=1}^{p} ϕ_i R_{t−i} + Σ_{j=1}^{k} λ_j r_{j,t} + Σ_{r=1}^{q} θ_r a_{t−r} + a_t)^2. (5)

For stationary asset returns, the unpredictability of the direction of the returns suggests that the autocorrelation function up to lag 1 is ρ = 0. Applying the law of total variance and taking expectations with respect to the information set F_{t−1}, the volatility h_t can be expressed as

h_t = α_0 + Σ_{i=1}^{q} α_i a_{t−i}^2 + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{j=1}^{k} λ_j^2 r_{j,t}^2. (6)

It can clearly be seen that equation (6) is analogous to the exogenous version of [14], where the exogenous term is represented by the last set of terms. These terms move in tandem with volatility by the indirect implications of the asset-return co-movement theory; thus, their absolute values are used to proxy the degree of uncertainty in the exchange rate market in this paper. The use of the absolute returns of the exogenous terms as proxies, instead of their original values, is due to the fact that market uncertainty is a positive measure. If one considers the positive returns from the k exogenous exchange rates with a sample of size N, then the lag-one break variables B_{j,t−1} are constructed as indicators of breaks in the absolute exogenous returns:

B_{j,t−1} = 1 if |r_{j,t−1}| signals a break and 0 otherwise, t = 1, 2, . . . , N and j = 1, 2, . . . , k. (7)
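A minimal sketch of how the uncertainty proxies and lag-one break variables of equations (6) and (7) might be constructed in practice. The threshold rule used here (sample mean of |r| plus three standard deviations) is purely an illustrative assumption; the paper's exact break-detection rule is not reproduced.

```python
# Sketch: build |r_{j,t}| proxies and lag-one 0/1 break indicators from
# an exogenous covariate return series. The 3-sigma threshold is an
# assumed rule for illustration only.
import math
import random

random.seed(1)
# Hypothetical exogenous covariate returns (stand-in for e.g. EUR/ZAR).
exo_returns = [random.gauss(0.0, 0.012) for _ in range(500)]

def break_variable(abs_returns, k_sigma=3.0):
    """Lag-one indicator: 1 when the previous day's |return| is extreme
    relative to the sample mean plus k_sigma standard deviations."""
    n = len(abs_returns)
    mu = sum(abs_returns) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in abs_returns) / (n - 1))
    threshold = mu + k_sigma * sd
    # Lagged by one period so the variable is usable for forecasting at t.
    return [0] + [1 if x > threshold else 0 for x in abs_returns[:-1]]

proxy = [abs(r) for r in exo_returns]   # market-uncertainty proxy |r_{j,t}|
breaks = break_variable(proxy)          # B_{j,t-1} in the spirit of (7)
assert set(breaks) <= {0, 1} and len(breaks) == len(proxy)
```

The lag in the last list comprehension mirrors the B_{j,t−1} subscript: the indicator available at time t depends only on information dated t − 1.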

A. DATA
We use daily inter-bank closing spot-ask prices from the rand forex market to illustrate the empirical significance of our approach. The rand forex market was chosen because, being an emerging market and ranked as the 18th most traded currency in the world [8], it has not received much attention in terms of value-at-risk forecasting in comparison to other emerging and developed markets. Currency pairs such as USD/ZAR and EUR/ZAR, among others, were initially selected. Due, however, to problems such as autocorrelation and serial correlation in some pairs, and the fact that some pairs from developing and emerging economies have not received substantial attention in the literature when it comes to volatility and value-at-risk forecasting, only ten currency pairs were retained for the empirical exercise. The daily returns used in the study are continuously compounded as indicated in equation (8):

R_t = ln(P_t / P_{t−1}), (8)

where P_t is the closing spot-ask price on day t.
An upper case letter denotes an endogenous return, while a lower case letter denotes an exogenous return.

B. MUTUAL INFORMATION
Information theory studies communication systems. It can be employed in the study of statistical dependency between random variables via mutual information (MI). Mutual information measures the reduction in the uncertainty about a random variable conditioned on the knowledge of another variable; it is thus closely related to the concept of entropy (a measure of the uncertainty of a random variable, or the expected amount of information contained in a variable) introduced by [49]. There are several ways of measuring dependency [58], although MI is by far the best statistic in several ways [46]. Unlike linear correlation, MI is more general in the sense that it captures all information about the dependency of variables, both linear and non-linear, and it is very effective in measuring any kind of relationship [18]. MI also has a straightforward interpretation, is grounded in information theory, and is insensitive to the size of data sets. Mutual information is a function of the expected amount of information contained in a variable, called entropy [47]. Given a pair of continuous random variables (X, Y) with values over the space X × Y, the entropy of X, denoted H(X), is defined by [18] as

H(X) = −Σ_i f(x_i) log_b f(x_i),

where b is the base of the logarithm and f(x_i) is the probability density function of X evaluated at x_i. The joint entropy of X and Y is defined as

H(X, Y) = −Σ_i Σ_j f(x_i, y_j) log_b f(x_i, y_j).

The mutual information between X and Y is then [18]:

MI(X, Y) = H(X) + H(Y) − H(X, Y).

Mutual information between bivariate variables is either zero (corresponding to independence) or positive (corresponding to dependency). When MI(X, Y) = 0, observing Y tells us nothing about X; thus the variables are independent. MI is a positive unbounded measure, i.e. MI ∈ [0, ∞), with units in bits (base 2) or nats (base e).
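The definitions above can be made concrete with a deliberately simple estimator. The paper uses a Jackknife-bias-corrected KDE; the crude histogram plug-in below only illustrates the identity MI = H(X) + H(Y) − H(X, Y), and the bin count and simulated data are illustrative assumptions.

```python
# Crude histogram plug-in estimator of mutual information, for
# illustration of the entropy definitions only (not the paper's
# Jackknife-bias-corrected KDE estimator).
import math
import random
from collections import Counter

def entropy(counts, n, base=2):
    """Plug-in entropy from a table of bin counts totalling n."""
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

def mutual_information(x, y, bins=10, base=2):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) on equal-width discretizations."""
    def disc(v):
        lo, hi = min(v), max(v)
        w = (hi - lo) / bins or 1.0
        return [min(int((u - lo) / w), bins - 1) for u in v]
    dx, dy = disc(x), disc(y)
    n = len(x)
    hx = entropy(Counter(dx), n, base)
    hy = entropy(Counter(dy), n, base)
    hxy = entropy(Counter(zip(dx, dy)), n, base)
    return hx + hy - hxy          # >= 0; near zero under independence

random.seed(7)
x = [random.gauss(0, 1) for _ in range(5000)]
y = [0.8 * u + 0.6 * random.gauss(0, 1) for u in x]   # dependent on x
z = [random.gauss(0, 1) for _ in range(5000)]          # independent of x
assert mutual_information(x, y) > mutual_information(x, z)
```

The final assertion shows the qualitative behavior described in the text: a dependent pair exchanges strictly more information than an independent one, whatever the (linear or non-linear) form of the dependence.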
Among the commonly used estimation approaches for mutual information are the probability-density-based methods such as Burg's maximum entropy method (Burg's MEM), kernel density estimation (KDE), and the nearest-neighbor approach. For instance, [9], [10] used the Burg's MEM to model the flow of information between financial variables. Reference [42] also developed a new KDE for application to the large high-dimensional datasets frequently used in genomic experiments. Inasmuch as the Burg's MEM of [9], [10] and the KDE of [42] are both non-parametric estimators suitable for high-dimensional datasets, they differ in four major aspects.
In the first instance, the new kernel density estimation is based on Shannon's definitions of entropy and joint entropy. The probability density function is estimated by filtering the data with a kernel, usually symmetric and localized, which is normalized to integrate to one. The Burg's MEM, on the other hand, is based on extrapolating the autocorrelation function of the variables using the entropy-rate definition, after which the power spectral density is estimated by the Fourier transformation of the extended autocorrelation function.
Secondly, the new kernel density estimation is purposely designed to model static relationships between variables, while the Burg's MEM is designed to model dynamic relationships and to predict the future entropy of a time series by exploiting the unknown but predicted autocovariance function of the future time interval.
Thirdly, the Burg's MEM relies on the assumption of second-order stationarity. If this hypothesis is not satisfied, the series can be partitioned into smaller epochs which are approximately stationary, or the series can be represented by alternative functions instead of the usual sine and cosine functions. Simulation studies, however, have shown that the Burg's approach to spectral analysis is robust in the presence of non-stationarity. Unlike the Burg's MEM, the KDE does not rely on a stationarity assumption, although it relies heavily on the choice of tuning parameters; thus the corresponding estimators may be very unstable or seriously biased. This problem is addressed by using the Jackknife bias-correction algorithm. The Jackknife-bias-corrected kernel density estimator automates the bandwidth selection such that the optimal bandwidth is estimated. This helps to reduce the bias in the boundary region and thus improves the efficiency of estimation [58]. Unlike randomized resampling approaches for correcting bias in kernel estimation, the Jackknife approach is deterministic in the sense that it gives the same result when re-applied to any given data. In addition, by restricting the resampling to a specific group of n subsamples, substantial computational costs can be avoided. The approach puts an upper limit on the number of subsamples, and the relationships between Jackknife repetitions can be exploited to avoid redundant computations [42].
Lastly, unlike the Burg's MEM, the Kernel density estimation models the distribution of a continuous variable as a mixture of conditional distributions for each level of a categorical variable, thus it is suitable for the estimation of mutual information between a mixture of continuous and discrete variables. Due to this, KDE can be used to model the relationship between a mixture of discrete or categorical and continuous variables without the need for variable transformation.
In comparison to other density-based estimation approaches such as the mirrored KDE, ensemble KDE, copula-based generalized nearest-neighbor graphs and the mixed generalized nearest-neighbor graphs, the Jackknife-bias-corrected KDE is more computationally efficient. In addition, the procedure is completely data-driven and there is no need for a predetermined tuning parameter. It does not necessitate boundary correction and yet retains the same estimation efficiency, because the boundary biases are eliminated automatically. Furthermore, the estimates are numerically stable. Due to these advantages over existing methods, the Jackknife-bias-corrected KDE is employed in this paper.

C. VALUE-AT-RISK ESTIMATION
There are several metrics for computing risk, but in this paper we use the VaR metric. The VaR metric corresponds to an amount that could be lost at some pre-selected probability. It also measures the risk of the risk factors and the risk-factor sensitivities. The metric applies to all activities and types of risk in financial institutions, and it can be compared across different markets and different exposures. The metric can also be measured at any level, from a single trade or portfolio up to a single enterprise-wide metric covering all the risks in the firm. It can be used to find the total VaR of a very large portfolio in aggregated form, or to isolate component risks corresponding to different types of risk factors in disaggregated form. The metric, moreover, accounts for the dependencies between the components of assets or portfolios [1]. The methods used in computing VaR include the historical simulation method [33]. One of the advantages of this approach is that it determines the joint probability distribution of the market variables and avoids the need for cash-flow mapping; however, it is computationally slow and does not easily allow volatility-updating schemes to be used [30]. Notwithstanding this disadvantage, the study uses the historical simulation approach to estimate and forecast VaR, because the conditional volatility models used here are built on historical returns. The GARCH framework is used to specify the volatility input. By definition [32], VaR is given by

VaR_t^D = μ̂ + q_α^D √(ĥ_t),

where VaR_t^D denotes the VaR estimate based on a volatility model with appropriate innovation distribution D, q_α^D is the α-quantile of D, ĥ_t is the forecasted volatility at time t, and μ̂ is the estimated mean from the volatility model.
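The parametric VaR formula can be sketched in a few lines. For simplicity the innovation distribution D is taken here as the standard normal; the paper itself uses heavier-tailed choices such as the JSU, SGED and NIG distributions, so this is an illustrative simplification.

```python
# Sketch of the parametric VaR formula VaR_t = mu + q_alpha * sqrt(h_t),
# with the normal distribution standing in for the heavier-tailed
# innovation distributions used in the paper.
from statistics import NormalDist

def parametric_var(mu, h_t, alpha=0.01):
    """One-day VaR as the alpha-quantile of the assumed return
    distribution, reported as a positive loss figure."""
    q = NormalDist().inv_cdf(alpha)       # left-tail quantile (negative)
    return -(mu + q * (h_t ** 0.5))

# Hypothetical forecasts: zero mean, 1% daily volatility (h_t = 1e-4).
var_99 = parametric_var(0.0, 1e-4, alpha=0.01)
assert 0.02 < var_99 < 0.025              # roughly 2.33 * 1%
```

Swapping `NormalDist().inv_cdf` for the quantile function of a heavier-tailed distribution reproduces the structure of the VaR-GARCH models estimated in the paper.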
In considering the return series R_t, we estimate the mean μ and the volatility h_t using the ARMA-GARCH representation [54]:

R_t = ϕ_0 + Σ_{i=1}^{p} ϕ_i R_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t, ε_t = √(h_t) η_t, (13)

where ϕ_0, ϕ_i and θ_j are parameters, R_{t−i} is the i-th lag autoregressive term with corresponding order p, ε_{t−j} is the j-th lag moving average component with order q, and ε_t is the mean-corrected return (innovation). The η_t is a sequence of i.i.d. random variables and h_t is the conditional variance.
In this paper, we consider the conditional volatilities of the GJRGARCH, NAGARCH, SGARCH, TGARCH, and EGARCH models. The break variables in (9) and the absolute values of the exogenous covariates in (8) are passed to the unconditional volatilities of the GARCH models to augment them. The augmented form of the standard GARCH model of [14] is defined by adding the market uncertainty proxies identified in equation (6) and the break variables from equation (7); hence,

h_t = α_0 + Σ_{i=1}^{q} α_i ε_{t−i}^2 + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{j=1}^{k} π_j |r_{j,t−1}| + Σ_{j=1}^{k} ξ_j B_{j,t−1}.

To ensure that h_t > 0, the parameters are conditioned such that α_0 > 0, α_i ≥ 0 and β_i ≥ 0. The necessary and sufficient condition required for the existence of the second moment of the returns is that

Σ_{i=1}^{q} α_i + Σ_{i=1}^{p} β_i < 1.

The GARCH model is popular because of its simplicity in modeling very complex volatility processes. The GJR version [25] is formulated as

h_t = α_0 + Σ_{i=1}^{q} (α_i + γ_i I_{t−i}) ε_{t−i}^2 + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{j=1}^{k} π_j |r_{j,t−1}| + Σ_{j=1}^{k} ξ_j B_{j,t−1},

where I_{t−i} is an indicator function, which takes the value 1 if ε_{t−i} ≤ 0 and 0 otherwise. To ensure that the variance h_t > 0, the parameters are constrained such that α_0 > 0, α_i ≥ 0, β_i ≥ 0 and α_i + γ_i ≥ 0. The γ_i parameter provides information about asymmetric effects. If γ_i = 0, there is no volatility asymmetry; if γ_i > 0, negative shocks increase volatility more than positive shocks of the same magnitude; and if γ_i < 0, positive shocks increase volatility more than negative shocks. The persistence parameter is

P = Σ_{i=1}^{q} α_i + Σ_{i=1}^{p} β_i + Σ_{i=1}^{q} γ_i κ,

where κ is the expected value of the indicator of the standardized residuals z_t below zero (effectively the probability of being below zero),

κ = E[I(z_t ≤ 0)] = P(z_t ≤ 0).

The exponential GARCH version [41] is formulated as

ln h_t = α_0 + Σ_{i=1}^{p} [α_i z_{t−i} + γ_i (|z_{t−i}| − E|z_{t−i}|)] + Σ_{j=1}^{q} β_j ln h_{t−j} + Σ_{j=1}^{k} π_j |r_{j,t−1}| + Σ_{j=1}^{k} ξ_j B_{j,t−1},

where α_i, β_j and γ_i are real constants with p and q the respective orders. To ensure the existence of a second moment, the parameters are conditioned such that Σ_{j=1}^{q} β_j < 1. The main advantage of the EGARCH over the standard GARCH and GJRGARCH is that the EGARCH model does not require any artificially imposed non-negativity constraints on the model parameters.

The threshold GARCH version [57] of the augmented model is

√(h_t) = α_0 + Σ_{i=1}^{q} (α_i + γ_i I_{t−i}) |ε_{t−i}| + Σ_{i=1}^{p} β_i √(h_{t−i}) + Σ_{j=1}^{k} π_j |r_{j,t−1}| + Σ_{j=1}^{k} ξ_j B_{j,t−1},

where all notations are the same as for the GJRGARCH.

The nonlinear asymmetric GARCH specification [19] is formulated as

h_t = α_0 + Σ_{i=1}^{q} α_i (ε_{t−i} − δ_i √(h_{t−i}))^2 + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{j=1}^{k} π_j |r_{j,t−1}| + Σ_{j=1}^{k} ξ_j B_{j,t−1},

where δ_i is a parameter which controls the shift (asymmetry for small shocks) in the news-impact curve. The existence of the second moment requires that

Σ_{i=1}^{q} α_i (1 + δ_i^2) + Σ_{i=1}^{p} β_i < 1.

Due to numerical difficulties encountered by the Quasi-Maximum Likelihood Estimation (QMLE) method, we use variance targeting to alleviate the degree of these difficulties [22].

D. BACKTESTING METHODS FOR EVALUATING VAR ESTIMATES
Backtesting techniques require the simulation of VaR models on past returns; the predicted losses from the VaR calculations are then compared to the actual realized losses at a given time horizon. The comparison identifies periods where the portfolio losses are greater than the expected VaR. If the realized return is less than the estimated VaR, a violation or exception occurs; thus backtesting techniques are used to systematically count the number of these violations or exceptions and compare them to acceptable rates at pre-selected confidence intervals. Consider the realization of asset returns over a fixed time interval, R_{t,t+1}, with VaR estimated at time t at probability α, denoted VaR_t(α). The hit function as defined in [32] is

I_{t+1}(α) = 1 if R_{t,t+1} < −VaR_t(α) and 0 otherwise.

If the loss on day (t + 1) is larger than the predicted VaR estimate, the hit sequence returns 1, and 0 otherwise. Reference [17] explains that the hit sequence of VaR estimates needs to pass tests based on the unconditional coverage and independence properties before the estimates can be deemed accurate. Under the unconditional coverage property, the probability of the loss on day (t + 1) being larger than the predicted VaR estimate should be exactly (1 − α) or, equivalently, the probability of the loss on day (t + 1) being smaller than the predicted VaR estimate should be exactly α. Mathematically, a VaR model has correct unconditional coverage if

P(I_{t+1}(α) = 1) = E[I_{t+1}(α)] = 1 − α.

It also has correct conditional coverage if

P(I_{t+1}(α) = 1 | F_t) = 1 − α,

where the sample average of the hit sequence, π̂, estimates this violation probability. The first null corresponds to the unconditional coverage test while the last corresponds to the conditional coverage test. In this paper, the unconditional coverage Kupiec test [35] and the Christoffersen interval forecast test [17] are used.
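The Kupiec unconditional coverage test can be sketched directly from the hit sequence. The likelihood-ratio statistic below is the standard proportion-of-failures (POF) form; the chi-square(1) p-value uses the closed-form survival function, and the example hit sequence is hypothetical.

```python
# Sketch of the Kupiec unconditional coverage (POF) likelihood-ratio
# test on a 0/1 hit sequence, with a chi-square(1) p-value.
import math

def kupiec_pof(hits, p):
    """LR_uc for H0: P(hit) = p. Returns (statistic, p-value)."""
    n, x = len(hits), sum(hits)
    pi_hat = x / n
    if pi_hat in (0.0, 1.0):
        log_l1 = 0.0                      # degenerate sample convention
    else:
        log_l1 = x * math.log(pi_hat) + (n - x) * math.log(1 - pi_hat)
    log_l0 = x * math.log(p) + (n - x) * math.log(1 - p)
    lr = -2.0 * (log_l0 - log_l1)
    p_value = math.erfc(math.sqrt(lr / 2.0))   # chi2(1) survival function
    return lr, p_value

# Hypothetical backtest: 5 violations in 250 trading days at 1% VaR.
hits = [1] * 5 + [0] * 245
lr, pv = kupiec_pof(hits, 0.01)
assert pv > 0.05    # correct coverage cannot be rejected in this example
```

When the observed violation rate exactly matches the nominal rate the statistic collapses to zero, which is the intuition behind the unconditional coverage property stated above.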

E. MODEL COMPARISON TOOL
Due to the large set of models available to financial institutions, model comparison has become an integral part of the model-building process. The Model Confidence Set (MCS) test of [27] is convenient for comparisons where there is no natural benchmark. This advantage of the MCS procedure makes it more suitable for use in this study, because the reference models in this study are not natural benchmarks. Even though the MCS procedure does not require a natural benchmark model, as in the case of the multiple comparison procedure with controls, it can still rank the models in order of superiority after selecting the superior set of models. One main advantage of this procedure is that, after a set of superior models has been selected, the models can be aggregated and used to forecast future volatility levels, predict future observations conditioned on past information [11], or forecast value-at-risk levels [12].

A. DESCRIPTIVE ANALYSIS
Descriptive statistics are reported in Table 1, while time plots of the log-returns over the entire length of the sample are displayed in Figure 1. A look at Figure 1 reveals that high returns are more likely to be followed by other high returns. This indicates the presence of asymmetry in the log-returns. The figure also depicts time-varying volatility clustering in the log-returns. These observations confirm the suitability of GARCH models for the data. In Table 1, relatively extreme negative returns appear to be more pronounced than extreme positive returns for the currency pairs BWP/ZAR and NOK/ZAR. In addition, all the returns are slightly positively skewed except for the BWP/ZAR and MWK/ZAR pairs, which are negatively skewed. The excess kurtosis for all the returns is large and far from zero, and the Jarque-Bera test statistics are very large. This indicates non-normality of the returns with associated heavy tails; thus heavy-tailed distributions may be appropriate for modeling the volatility of the returns.

B. ESTIMATION OF HYPOTHETICAL MUTUAL INFORMATION
One of the main reasons we advocate the use of inter-market covariates and their break variables to model time-variations and breaks in the unconditional volatility of GARCH models is their perceived mutual dependence with volatility; thus, in this section, we investigate the levels and the significance of these dependencies. The latent nature of volatility implies that the actual shared information cannot be computed; the mutual information computed in this paper is therefore hypothetical in a broad sense. Since we intend to compute value-at-risk for the currency pairs MWK/ZAR, BWP/ZAR, BRL/ZAR, ILS/ZAR, and SEK/ZAR, it is more appropriate to use pre-estimated volatilities for these pairs under specifications (refer to Table 4) similar to those that will be employed in estimating the value-at-risks. Individual estimation of the volatilities of the above currency pairs is time-consuming; thus the simultaneous estimation approach implemented in the rugarch package [24] is utilized. The GARCH estimation procedure used for computing the value-at-risks in this paper is based on a moving window, so simple lag-one moving averages of the variables are computed. The lagging is required because we are interested in using the variables for forecasting. Since MI is a positive unbounded measure, to be able to compare two MI estimates we normalize the estimates using equation (23), adapted from [18]. The empirical results of the mutual dependency estimation are reported in Tables 2 and 3 and Figure 2. In Table 3, the endogenous break variables for MWK/ZAR and SEK/ZAR share a larger percentage of mutual information with their respective volatilities than the corresponding exogenous break variables. In relation to the volatility of MWK/ZAR, however, the non-normalized shared information with the exogenous SEK/ZAR break variable is slightly higher than that of the endogenous break variable.
In respect of the volatility of BWP/ZAR, the break variables for SEK/ZAR and INR/ZAR have the highest percentage of exchanged information with the volatility. A similar statement can be made for the break variables for MWK/ZAR, and SEK/ZAR with respect to the volatility of ILS/ZAR although the non-normalized exchanged information is relatively higher for the break variable of SEK/ZAR. Finally, the break variable for MWK/ZAR exchanges the highest percentage of information with the volatility of BRL/ZAR. In general, among the variable pairings considered in the paper, the exogenous break variables tend to exchange relatively higher mutual information with volatility in comparison to the endogenous break variables. This suggests that break variables constructed from exogenous returns have higher likelihoods of volatility predictive abilities, hence, are more likely to be adequate than the endogenous break variables in accounting for breaks in the unconditional volatilities of exchange rates.
There are quite a number of variable pairings with insignificant linear dependencies, although their respective BCMI are substantial. For example, the strength of the linear dependency between the returns of BWP/ZAR and the volatility of MWK/ZAR is abysmally low and insignificant, although the normalized BCMI is about 32%. This suggests that it is very unlikely that the two variables move together linearly; thus the relationship between them is more likely to be non-linear. This observation is consistent with the rationale behind the concept of mutual information in measuring dependency [18]. Based on the foregoing deduction, the relationships between the variable pairings in Tables 2 and 3 (indicated by cells with asterisks) are more likely to be non-linear; hence non-linear volatility models may be more appropriate for estimating the volatility of the exchange rate returns.
When the joint distribution of paired variables is bivariate normal, there is an exact logarithmic relation between mutual information and the linear correlation coefficient ρ, defined by [23] as

MI = −0.5 log(1 − ρ^2). (24)

It is observed from Figure 2 that the relationship between MI and the correlation coefficient does not assume the form of equation (24) but rather evolves randomly. This implies that the joint distributions of the paired variables are not bivariate normal; hence, since the individual variables are characterized by heavy-tailed distributions (see the data description section), the bivariate joint distributions may be mixtures of heavy-tailed distributions. In addition, the relationship suggests that increased levels of mutual information are not associated with increased strength of linear dependence; thus there may be low levels of shared information between variables even when the strength of the linear dependency is comparatively high.
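Equation (24) is easy to evaluate numerically, which makes the benchmark curve against which Figure 2 is judged concrete; the correlation values below are arbitrary illustrative inputs.

```python
# Numerical illustration of equation (24): for a bivariate normal pair,
# MI (in nats, base e) equals -0.5 * ln(1 - rho^2).
import math

def gaussian_mi(rho):
    """Mutual information implied by correlation rho under bivariate
    normality; grows without bound as |rho| approaches 1."""
    return -0.5 * math.log(1.0 - rho ** 2)

assert gaussian_mi(0.0) == 0.0                 # independence gives zero MI
assert abs(gaussian_mi(0.6) - 0.2231) < 1e-3   # -0.5 * ln(0.64)
assert gaussian_mi(0.9) > gaussian_mi(0.6)     # MI increases with |rho|
```

Deviations of the empirical (MI, ρ) pairs from this monotone curve are exactly what the text reads as evidence against bivariate normality of the paired variables.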
In conclusion, the levels of exchanged mutual information between volatilities and their respective lagged covariates provide substantial evidence of the predictive abilities of the covariates, and these may be adequate for modeling variations and breaks in the unconditional volatilities of GARCH models for exchange rates. The results indirectly support the findings of returns dependencies from [40].

C. VOLATILITY AND VALUE-AT-RISK ESTIMATIONS
In this and the next section, the VaR estimation procedures discussed in the methodology section are applied to the data in Table 4. Hereafter, the augmented VaR models are represented as VaR A while the benchmark (reference or non-augmented) models are represented as VaR B . For each model, one-step-ahead VaR forecasts at the 5% and 1% confidence levels (corresponding to the RiskMetrics methodology and the requirements of the Basel II accord) were obtained. A rolling window with a re-fitting interval of 100 days was used. From the empirical characteristics of the data in the descriptive analysis section, the return data were found to be heavy-tailed with high peaks; thus, the innovations from the standard VaR-GARCH, VaR-NAGARCH, and VaR-GJRGARCH models are modeled with the Johnson's SU reparametrized distribution (JSU). The skewed generalized error distribution (SGED) is used to model the innovations from the VaR-EGARCH model, while the normal inverse Gaussian distribution (NIG) is used to model the innovations from the VaR-TGARCH model.
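For intuition, a one-step-ahead parametric VaR is the conditional mean plus the forecast volatility scaled by the α-quantile of the standardized innovation distribution. The sketch below uses SciPy's Johnson SU distribution with placeholder shape parameters (in the paper these are estimated jointly with the GARCH parameters), so it is a hedged illustration rather than the paper's exact procedure.

```python
from scipy import stats

def one_step_var(mu, sigma, alpha=0.01, dist=stats.johnsonsu(a=-0.5, b=1.5)):
    """One-step-ahead VaR: conditional mean plus the alpha-quantile of the
    standardized innovation distribution scaled by the forecast volatility.
    The Johnson SU shape parameters (a, b) are illustrative placeholders."""
    # standardize the quantile so the innovation has zero mean, unit variance
    z = (dist.ppf(alpha) - dist.mean()) / dist.std()
    return mu + sigma * z

# Example: zero forecast mean, forecast daily volatility of 1.2%
print(one_step_var(0.0, 0.012, alpha=0.01))  # a negative return quantile
print(one_step_var(0.0, 0.012, alpha=0.05))  # less extreme than the 1% VaR
```

In the rolling scheme described above, the GARCH model (and hence mu, sigma, and the distribution parameters) would be re-fitted every 100 days and this quantile mapping applied to each one-step volatility forecast.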

1) IN-SAMPLE VOLATILITY ESTIMATION
In this section, we discuss the in-sample performances of the volatility models. The empirical results are reported in Table 5. For all models except the GJRGARCH, at least one of the regressors has a significant impact on volatility. For instance, the break variables have significant effects on the volatilities of BWP/ZAR, SEK/ZAR, and BRL/ZAR, while proxy 1 has significant effects on the volatilities of ILS/ZAR and BRL/ZAR. These observations suggest that the levels of uncertainty surrounding the exchange rate markets indeed have significant impacts on the estimated volatility. Furthermore, they suggest that extreme events, as captured by the break variables, have significant impacts on the estimated volatility. The aggregated impacts of all the regressors are seen in the improved statistical efficiencies of the estimated models, as discussed below.
The log-likelihood estimates from the unrestricted (augmented) models are significantly higher than the values from the restricted (non-augmented) versions. Consequently, the absolute AIC values for the unrestricted models are higher than the values for the restricted versions. The results suggest that the incorporation of the mutual information into the volatility modeling process has the tendency to reduce information leakages in GARCH models. Furthermore, it is observed that shocks persist more in the non-augmented models than they do in the augmented models. Consequently, it takes a shorter time for half of the shocks in the augmented models to decay than it takes in the non-augmented models. The augmented models also yielded lower estimated unconditional variances. The adjusted R-squares from the Mincer-Zarnowitz regression suggest that the explanatory powers of the augmented models are higher than those of the non-augmented models. By implication, the consistently lower MAE and RMSE values observed for the augmented models indicate that the models have relatively superior predictive abilities over the non-augmented versions. The volatility persistence results as well as the predictive accuracies are in full support of the results from [43].
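The persistence, half-life, and unconditional variance statistics discussed above follow directly from the GARCH(1,1) parameters. A minimal sketch, with illustrative parameter values rather than the paper's estimates:

```python
import numpy as np

def garch_diagnostics(omega, alpha, beta):
    """Persistence, shock half-life, and unconditional variance for a
    GARCH(1,1): sigma2_t = omega + alpha*e_{t-1}^2 + beta*sigma2_{t-1}.
    The half-life is the number of days for a shock's impact to halve."""
    persistence = alpha + beta
    half_life = np.log(0.5) / np.log(persistence)
    uncond_var = omega / (1.0 - persistence)
    return persistence, half_life, uncond_var

# Illustrative comparison (made-up values): higher persistence means shocks
# take much longer to decay, mirroring the non-augmented vs augmented contrast.
print(garch_diagnostics(1e-6, 0.08, 0.90))  # persistence 0.98, half-life ~34 days
print(garch_diagnostics(1e-6, 0.07, 0.85))  # persistence 0.92, half-life ~8 days
```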

2) OUT-OF-SAMPLE VOLATILITY ESTIMATION
Summary statistics of the forecasted volatility components of the VaR models and their performance metrics are reported in Table 6. It is evident from the Table that the augmented models have larger log-likelihoods, and these are significant in three of the models. This suggests that the augmentation may have controlled significant information leakages in the forecasting models. In addition, the volatility persistence of the augmented models is consistently smaller; thus, they may be suitable for modeling series that exhibit IGARCH effects. These results are consistent with the in-sample observations. In terms of the RMSE and MAE values, the non-augmented versions of the SGARCH and GJRGARCH models are relatively superior to the augmented versions. Similarly, the augmented models for NAGARCH and TGARCH are relatively superior to the non-augmented versions; however, neither version of the EGARCH model is clearly superior to the other, because the augmented version is superior in terms of the MAE while the non-augmented version is superior in terms of the RMSE. Based on the overall observations, we can generalize neither the superiority nor the inferiority of the augmented models in forecasting volatility. The results are in full agreement with [42] in terms of volatility persistence but in partial contrast to the same study in terms of the accuracy of the forecasts. Inspection of the plots gives the impression that there were very few VaR violations in all the models (represented by the red dots) for the 99% quantile but relatively more violations for the 95% quantile. There is a pre-indication that all the models may respond early to changing market conditions.
This is because the evolutions of the VaR estimates appear to be non-clustering. However, albeit useful, visual inspections do not constitute a proper backtest analysis; hence, we proceed to discuss the Kupiec and Christoffersen tests reported in Table 7. The Chi-Square critical values with one degree of freedom at the 1% and 5% levels are 6.635 and 3.841 respectively. Similarly, the corresponding critical values with two degrees of freedom are 9.21 and 5.99 respectively. A good VaR estimate must pass both the unconditional coverage and the independence tests; thus, our interest is in failing to reject the nulls of these tests. The nulls are rejected when the test statistics are higher than the corresponding critical values.
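For reference, both test statistics can be computed from the hit (violation) sequence alone. The sketch below is a standard textbook implementation of the Kupiec unconditional coverage and Christoffersen independence likelihood-ratio statistics; the small epsilon guards against degenerate counts are our own device.

```python
import numpy as np

def kupiec_lr(viol, p):
    """Kupiec unconditional-coverage LR statistic (asymptotically chi2, 1 df).
    viol: 0/1 hit sequence; p: nominal violation probability (e.g. 0.01)."""
    v = np.asarray(viol, dtype=int)
    T, x = v.size, int(v.sum())
    pi = min(max(x / T, 1e-10), 1 - 1e-10)   # observed rate, clamped
    ll = lambda q: (T - x) * np.log(1 - q) + x * np.log(q)
    return -2.0 * (ll(p) - ll(pi))

def christoffersen_lr(viol):
    """Christoffersen independence LR statistic (asymptotically chi2, 1 df),
    based on the first-order transition counts of the hit sequence."""
    v = np.asarray(viol, dtype=int)
    pairs = list(zip(v[:-1], v[1:]))
    n00 = sum(1 for a, b in pairs if (a, b) == (0, 0))
    n01 = sum(1 for a, b in pairs if (a, b) == (0, 1))
    n10 = sum(1 for a, b in pairs if (a, b) == (1, 0))
    n11 = sum(1 for a, b in pairs if (a, b) == (1, 1))
    eps = 1e-10                              # guard against empty cells
    pi0 = (n01 + eps) / (n00 + n01 + 2 * eps)  # P(hit | no hit yesterday)
    pi1 = (n11 + eps) / (n10 + n11 + 2 * eps)  # P(hit | hit yesterday)
    pi = (n01 + n11 + eps) / (n00 + n01 + n10 + n11 + 2 * eps)
    l0 = (n00 + n10) * np.log(1 - pi) + (n01 + n11) * np.log(pi)
    l1 = (n00 * np.log(1 - pi0) + n01 * np.log(pi0)
          + n10 * np.log(1 - pi1) + n11 * np.log(pi1))
    return -2.0 * (l0 - l1)
```

The conditional coverage statistic discussed in the next subsection is simply the sum of the two, compared against the two-degree-of-freedom critical values above.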

3) BACKTESTING VALUE-AT-RISK MODELS
It is noted from the Table that the test statistics for all the models are below the corresponding critical values, except for the VaR(GJR) and VaR A (S) models for the 5% VaR estimates; therefore, the nulls of correct exceedances and independence of failures are not rejected for the models with smaller test statistics. This signifies that the respective models passed the tests at the 1% and 5% levels.
Observations from Table 7 indicate that all the models for the 99% VaR estimates, with the exception of the VaR B (T), have LRUC test statistics less than the 6.635 critical value; thus, we fail to reject the nulls of correct exceedances of VaR violations for the models. Similarly, we fail to reject the nulls of correct exceedances of VaR violations for the 95% estimated VaR models, with the exception of the VaR A (GJR), VaR B (GJR) and VaR A (S) models. These results suggest that all the models are significantly accurate and acceptable for making risk decisions; however, we cannot categorically make this conclusion, because the unconditional test does not account for time-varying volatility, in the sense that it ignores the times at which the losses occur; thus, the test may fail to reject a model that produces clustered VaR violations [45]. To address this problem, we use the conditional coverage test. Observations from Table 7 indicate that the nulls of correct exceedances of VaR violations and independence are not rejected for all the models except the 95% VaR(GJR) models. Hence, the observed failure rates of all the models except the 95% VaR(GJR) models are not significantly different from the corresponding expected rates. We can therefore categorically conclude that, with the exception of the 95% VaR(GJR) models, the VaR estimates from all the models are significantly accurate and would respond early to changing market conditions, without clustering over time.

4) COMPARING VALUE-AT-RISK MODELS
In comparing the superiority of the estimated VaR models, the MCS procedure is used alongside the number of VaR violations, the VaR violation failure rates, and the ADmean and ADmax of VaR-violating returns contemplated in [39]. In theory, the numbers of expected violations at the 95% and 99% confidence levels for 1-step-ahead forecasts are 22 and 4.4 (5% of 440 and 1% of 440) respectively. Models with violation counts closer to the expected violations have failure rates closer to the expected rates. VaR provides only an upper bound on the losses that occur with a given frequency; it tells us nothing about the sizes of the potential losses, which are of much interest to financial risk practitioners. To address this theoretical drawback of VaR, an alternative risk measure, the expected shortfall (ES), is used. Given an integrable loss L with E(|L|) < ∞ having a continuous distribution function F_L and a confidence level α ∈ (0, 1), the expected shortfall is defined as [45]

ES_α = (1/(1 − α)) ∫_α^1 VaR_u(L) du = E[L | L ≥ VaR_α(L)] (25)

In addition to the VaR estimates, we report the mean ES values as well as the MSE values for comparison purposes. A model with a smaller mean VaR, mean VaR loss, mean ES, and MSE of the expected shortfall, a lower failure rate, a superior MCS rank, and minimum ADmax and ADmean is preferred. It may not be possible for a model to be superior in terms of all eight metrics; hence, a voting criterion based on these metrics is introduced to help select the model with overall superior ability (i.e., a model with the majority of the votes is adjudged superior). The results of the comparison metrics are reported in Table 8. A look at the MCS values reveals that all the augmented models were selected into the superior set of models at both the 1% and 5% confidence levels and are consistently ranked number one. The VaR B (NA) and VaR B (T) models were eliminated by the procedure; thus, in terms of this test, the corresponding augmented models have superior predictive abilities.
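The violation-based comparison metrics can be computed directly from the realized returns and the VaR forecasts. A minimal sketch of the violation count, failure rate, ADmean/ADmax of violating returns, and an empirical (conditional-mean) expected shortfall, under our own sign convention that VaR forecasts are negative return quantiles:

```python
import numpy as np

def var_es_metrics(returns, var_forecasts):
    """Violation count, failure rate, mean/max absolute deviation of
    violating returns from VaR, and the empirical ES of the violations.
    VaR forecasts are taken as (negative) return quantiles, one per day."""
    r = np.asarray(returns, dtype=float)
    v = np.asarray(var_forecasts, dtype=float)
    hits = r < v                            # violation: loss exceeds the VaR bound
    n_viol = int(hits.sum())
    failure_rate = n_viol / r.size
    dev = np.abs(r[hits] - v[hits])         # how far violating returns breach VaR
    ad_mean = float(dev.mean()) if n_viol else 0.0
    ad_max = float(dev.max()) if n_viol else 0.0
    es = float(r[hits].mean()) if n_viol else float("nan")  # empirical ES
    return n_viol, failure_rate, ad_mean, ad_max, es
```

Applied to 440 out-of-sample days, the 5% and 1% failure rates from such a function are the quantities compared against the 22 and 4.4 expected violations above.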
On average, the augmented models yield lower failure rates than the reference models at the 1% confidence level but not at the 5% level, which is in agreement with the results from [36]; hence, the 1% confidence level VaR models may lead to fewer bank failures. In addition, the augmented models tend to have lower MSE values for the expected shortfall estimates and lower mean VaR losses; hence, the augmented models have superior predictive abilities in comparison to the reference models, although there are exceptions to this generalization, as seen in the VaR(S) and VaR(E) models. It is further observed that the mean absolute VaR estimates from the augmented models are consistently lower than the estimates from the reference models. Furthermore, the mean ES for all the augmented models is lower than the values for the corresponding reference models, except for the VaR(S) and VaR(E) models; therefore, on average, it is anticipated that the use of these augmented models may lead to lower bank costs. The augmented models also tend to have lower mean absolute deviations for VaR-violating estimates but produce large maximum absolute deviations in some instances.
The overall predictive abilities of the models based on the voting patterns indicate that the augmented models are relatively superior to the reference models for all the 1% VaR estimates. The same conclusion cannot be made for the 5% VaR estimates because there is a split of votes among the VaR(S) models. It is worth noting that, although the VaR A (GJR) model was decisively adjudged superior to the VaR B (GJR), neither can be used in making risk decisions because both failed the independence and unconditional coverage tests. In general, the available evidence suggests the superiority of the proposed method in forecasting VaR, which agrees with the results from [36], [43], and [44].
On the surface, it seems that models with more accurate volatility forecasts do not necessarily yield better VaR forecasts when we compare the superiorities of the volatility models in Table 6 to those of the VaR models; however, this may not be fully true. The use of squared returns as a proxy for actual volatility tends to exaggerate true volatility, leading to inflated MSE values and distorted forecasts [4]. VaR forecast evaluation measures are therefore more appropriate for assessing volatility forecasts. In this regard, based on the superiority of the augmented VaR models for the 1% VaR estimates, we can conclude that our approach of modeling changes and breaks in the unconditional volatilities of exchange rates yielded improved volatility forecasts. On the other hand, we cannot make such a generalization for the 5% VaR estimates, although the majority of the augmented models outperformed the non-augmented versions.

5) CAPITAL REQUIREMENT ANALYSIS
Although our approach yielded superior VaR and ES estimates for all models at 1% and for the majority at 5%, these are of no practical importance to risk practitioners, especially banks, if the estimates cannot be used to compute acceptable regulatory capital requirements (as set out by the Basel II Accord). Capital requirements are used to control and monitor the market risk exposure of financial institutions; furthermore, they act as a buffer against adverse market conditions. Overestimation of value-at-risk forces institutions to hold excessive amounts of capital and incur opportunity costs, while underestimation overexposes institutions to market risks and to losses in their balance sheets that cannot be recovered in crisis periods [15]. This may have repercussions on their positions in the market. The Basel II accord allows banks to use their internal models to compute VaR estimates. Reference [39], however, emphasizes that banks have the responsibility to sufficiently demonstrate the accuracy of their models through backtest analysis based on the number of VaR violations. In addition, the Basel II accord has instituted penalty zones (see Table 9) that penalize bad models in terms of a multiplicative factor k, based on the VaR estimates over the last 250 business days. Based on the penalty zones, the capital requirement is defined by [7]

Capital requirement_t = max(−VaR_{t−1}, −(3 + k) VaR_60) (26)

where VaR_60 is the average VaR over the last 60 business days. In line with the quantitative standards prescribed by the Basel Committee, we focus the backtest analysis on the 99% quantile VaR estimates. Table 10 reports the mean daily capital requirements (MDCR) for the out-of-sample period. A result that immediately emerges from the Table is that the augmented models consistently produced lower MDCR in comparison to the non-augmented versions.
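Equation (26) translates into a simple rolling computation. The sketch below assumes, as in the text, that VaR values are negative return quantiles and that k is the penalty-zone multiplier from Table 9:

```python
import numpy as np

def daily_capital_requirement(var_series, k):
    """Basel II capital requirement per equation (26):
    CR_t = max(-VaR_{t-1}, -(3 + k) * mean(VaR over the last 60 days)).
    var_series holds negative return quantiles; k is the penalty multiplier."""
    var = np.asarray(var_series, dtype=float)
    cr = np.empty(var.size - 60)
    for i in range(60, var.size):
        var60 = var[i - 60:i].mean()           # trailing 60-day average VaR
        cr[i - 60] = max(-var[i - 1], -(3.0 + k) * var60)
    return cr

# Example: a constant 2% daily VaR with no penalty (k = 0) implies a capital
# charge of 3 * 2% = 6% of the position each day.
print(daily_capital_requirement(np.full(100, -0.02), k=0.0)[:3])
```

The MDCR reported in Table 10 is the mean of such a series over the out-of-sample period; a model pushed into the yellow zone (k > 0) mechanically raises this charge.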
The superior performance of the augmented models in terms of the MDCR coincides with our previous analysis based on the eight model comparison metrics. Furthermore, we observe that the majority of the augmented models avoided the regulatory penalty zones, while a few of them slipped into the yellow zone with the associated penalties; the converse of this statement holds for the non-augmented models. These observations suggest that most of the augmented models would easily (with no imposed penalty) pass the scrutiny of regulatory bodies, unlike the non-augmented versions. Outcomes in the yellow zone are plausible for both accurate and inaccurate models; however, the presumption that a model is inaccurate grows as the number of exceptions increases within the zone [7]. In Table 7, the number of violations for the VaR A (T) model is smaller than that of the VaR B (T); thus, the VaR A (T) model would face fewer hurdles (penalties) in passing the scrutiny of regulatory bodies, unlike the VaR B (T). Both versions of the VaR(S) models, however, would face similar challenges before being certified accurate by the regulatory bodies.

VI. CONCLUSION
There are undesirable consequences associated with inaccurate VaR estimation for banks; thus, accurate forecasting of value-at-risk forms an integral part of decision-making and the long-term stability of financial institutions. In this paper, 1% and 5% value-at-risks were estimated using univariate GARCH models augmented with exogenous variables. Hypothetical mutual dependencies between pre-estimated volatilities and the exogenous variables were computed to investigate the levels of exchanged information. The Jackknife bias-corrected kernel density estimation (KDE) was used to compute the mutual information, while Pearson's method was used to compute the strength of the linear dependencies. In assessing the individual accuracies of the VaR estimates, the conditional coverage and unconditional coverage tests were used. Competing models that passed both tests were then ranked using the model confidence set (MCS) procedure. Several other model comparison tools were also employed to assist in selecting the best models. We also conducted a capital requirement analysis to assess the usefulness of the models to banking institutions in computing mean daily capital requirements. In the next four paragraphs, we briefly discuss the main contributions of the study in relation to existing studies.
In studying dependencies among the variables (volatility, returns, and dummy or break variables constructed from returns), to the best of our knowledge, this is the first study to employ the concepts of mutual information and the Jackknife bias-corrected KDE approach on data from the rand forex market. Hence, this aspect of the study contributes to the literature on the relationships among the dynamics of exchange rate markets of emerging economies from the perspective of mutual information. The mutual information concepts can be used to re-examine and re-affirm the various established forms of theoretical and phenomenological relationships in the literature. In addition, due to the insensitivity of mutual information to the size of data sets, it can be used to study the relationships between financial variables with limited or small data samples.

In speculative asset markets, asset returns are related to their volatilities via the risk premium theory [13]. This relationship is the underlying basis for computing the risk premium and can be used to study the effects of an asset's volatility on its returns. The reverse of this relationship is used to study the concept of volatility asymmetry. Assets move in tandem; hence, a cross-asset returns-volatility relationship (the relationship between the volatility of one asset and the returns of another asset) and its reverse may exist. These relationships may be useful in studying the effects of cross-asset risk on asset returns and the asymmetric effects of exogenous returns on volatility, respectively. The reverse of the cross-asset returns-volatility relationship has been useful in studying the spillover of volatility asymmetry across different speculative asset markets [56], as well as in the improvement of multivariate volatility forecasts [48].
Cross-asset returns-volatility relationship is a theoretical construct, thus empirical evidence of substantial mutual dependencies between volatility and the exogenous returns support the plausibility of this hypothesis.
The study also contributes to the extant literature on VaR estimation. Its unique contribution is a simple but novel approach to account for breaks and changes in the unconditional volatility of GARCH-type models. The approaches used in [2], [43], and [44] to model breaks and time-variations involve relatively complicated procedures that are sometimes not easy to incorporate into other modeling frameworks. In comparison, our methodology is simple and easy to incorporate into other volatility frameworks, such as the stochastic volatility framework. Furthermore, since the break variables used are exogenous, unlike [31], they prevent the compounding of bias that may be introduced by consecutive endogenous outliers in the parameter estimation. In addition, unlike [31], our approach takes into consideration the actual economic state of volatility in constructing the break variables; hence, periods of crisis are modeled differently from periods of increased volatility. Again, given the superiority of the models built on our approach and the fact that the MCS procedure ranked all of them number one, our approach provides alternative or complementary tools that can be used to comprehensively mitigate risk in financial institutions. It is also useful to individual traders and investors who may not have any standard approach for computing the financial risk associated with their daily decision-making.
Finally, the forecasting structures of non-time varying GARCH models such as [14] induce a monotonic mean-reversion path on the long-run forecasts, thus the forecasts converge or revert along a monotonic path, which is inconsistent with the underlying stochastic path. This is because the conditional forecasts converge to a monotonic non-time varying long-run variance. The evidence of significant estimated coefficients of the break and the proxy variables as reported in the bottom section of Table 5 indicates a potential source of uncertainties, which can induce time-variations in the long-run variance of GARCH models so that the conditional forecasts revert or converge to a stochastic time-varying long-run variance consistent with realized volatility.
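The monotonic mean reversion described above is visible in the closed-form multi-step GARCH(1,1) forecast, h_{t+k} = σ̄² + (α + β)^{k−1} (h_{t+1} − σ̄²), where σ̄² = ω/(1 − α − β) is the fixed long-run variance. A sketch with illustrative (made-up) parameters:

```python
import numpy as np

def garch_forecast_path(h1, omega, alpha, beta, horizon=250):
    """Multi-step GARCH(1,1) variance forecasts revert monotonically to the
    constant unconditional variance omega/(1 - alpha - beta) -- the non-time-
    varying long-run level that the augmented models are meant to relax."""
    lr = omega / (1.0 - alpha - beta)          # constant long-run variance
    k = np.arange(horizon)                     # k = 0 gives the first forecast h1
    return lr + (alpha + beta) ** k * (h1 - lr)

path = garch_forecast_path(h1=4e-4, omega=1e-6, alpha=0.08, beta=0.90)
# path decays monotonically from 4e-4 toward the long-run level 5e-5
```

With significant break and proxy coefficients, the long-run level itself moves over time, so the forecast path no longer converges to a single constant.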
The main findings of the study include the following:
• The joint distributions of all the paired variables used in the hypothetical study are not bivariate normal, and since the individual variables are characterized by heavy-tailed distributions, the bivariate joint distributions may be a mixture of heavy-tailed distributions.
• The hypothetical study provides evidence of substantial percentages of exchanged information between volatility and the lagged exogenous variables, suggesting that there is a better chance of predicting exchange rate volatility with these variables.
• In general, the exogenous break variables tend to exchange relatively higher mutual information with volatility in comparison to the endogenous break variables, thus, suggesting that break variables constructed from exogenous returns have higher likelihoods of volatility predictive abilities. They are, hence, more likely to be adequate in accounting for breaks in the unconditional volatilities of exchange rates.
• Our approach led to a significant reduction of information leakages in the in-sample fitted models and perceived reductions of information leakages in the out-of-sample models. In addition, the approach also yielded less persistent volatilities, reduced half-lives, and improved in-sample explanatory powers of the models. There were also improvements in the in-sample fitted volatilities from all the models; however, the same cannot be said about the out-of-sample volatility forecasts.
• Our approach yielded better forecasts for all the 1% VaR models and the majority of the 5% VaR models. Accurate volatility is implied by an accurate VaR forecast [15], thus our approach yielded similar superiorities in terms of the volatility forecasts.
• On the usefulness of the VaR estimates in computing daily capital requirements, our approach consistently produced lower MDCR for all the models. Furthermore, the majority of the models built on our approach avoided the regulatory penalty zones, while a few of them slipped into the yellow zone with relatively small associated penalties.

In conclusion, our approach led to fewer VaR violations, improved 1% value-at-risk forecasts, lower ES forecasts, and optimal daily capital requirements; thus, the models are preferred from regulatory and institutional points of view because they would lead to optimal bank costs and fewer bank failures. The 5% value-at-risk forecasts for the VaR(NA), VaR(T) and VaR(E) models, however, may not be preferred from a regulatory point of view, although they yielded improved VaR and ES forecasts, because the models have relatively higher violations and may therefore lead to frequent bank failures or severe regulatory penalties. From an institutional point of view, however, they are recommended because of their perceived lower bank costs. It should be noted that our proposed methodology per se might not be the cause of the relatively higher violations for the 5% VaR estimates; these may be due to the use of inappropriate volatility model specifications and/or inadequate market uncertainty proxies and exogenous break variables to estimate the volatility inputs of the VaR models. In a broader sense, the results support studies which advocate that failure to account for breaks in the unconditional variance leads to sizable upward biases in the degree of persistence in estimated GARCH models, with forecasts that systematically underestimate or overestimate volatility and the subsequent value-at-risk on average over long horizons [43].
The proposed method depends on substantial levels of mutual dependencies among assets; thus, it is unlikely to yield improved forecasts when there is no substantial evidence of mutual dependence. In addition, the approach is not parsimonious because it sometimes requires more exogenous covariates and higher-order ARMA-GARCH terms to guarantee optimal parameters that may ensure forecast accuracy. The methodology can be extended to the multivariate case, where common exogenous covariates can be incorporated into the volatility processes to improve value-at-risk forecasts. The methodology may also be useful in forecasting value-at-risk for other speculative classes of assets which are known to be mutually dependent. In future work, attention will be focused on applying the methodology to broader exchange rate markets in an attempt to generalize the findings.

APPENDIX
The first row of each of the graphs in Appendixes 3 to 7 represents the 99% VaR models, while the second row is for the 95% VaR models. In each row, the first model represents the augmented model while the last represents the non-augmented version.