Exchange Rate Volatility Forecasting by Hybrid Neural Network Markov Switching Beta-t-EGARCH

This study builds on previous research to improve volatility forecasts for advanced and emerging market currencies. Given the exchange rate's nonlinear and time-varying characteristics, we introduce the neural network (NN) approach to enhance the Markov Switching Beta-t-Exponential Generalized Autoregressive Conditional Heteroscedasticity (MS-Beta-t-EGARCH) model. Our hybrid model combines the advantages of these two approaches to predict exchange rate volatility. We validate the proposed model by comparing it with various traditional volatility forecasting models on both in-sample and out-of-sample volatility forecasts. The empirical results suggest that our hybrid NN-MS-Beta-t-EGARCH outperforms the other models for both emerging and advanced market currencies.


I. INTRODUCTION
Volatility represents the degree to which a variable changes over time, and it is essential to risk evaluation in many economic applications, such as value at risk, financial asset pricing, and exchange rates [52]. Volatility has three significant characteristics in finance, namely the volatility clustering property [37], the asymmetric property [10], and the nonlinearity property [19]. These properties lead to uncertainty in financial time series. In the context of economic globalization, international transactions and cross-border capital flows have increased, and the foreign exchange market is a crucial factor affecting both. Thus, policymakers must understand exchange rate volatility before making fiscal and monetary policy decisions, especially in import-led and export-led countries.
In this respect, our study aims to forecast the exchange rate volatility of three emerging market currencies, namely the Brazilian Real, Chinese Yuan, and Indian Rupee, and three advanced market currencies, namely the Euro, Japanese Yen, and Pound Sterling. These currencies are selected because they are among the most traded currencies in recent years. We use data on these six currencies on the premise that they can serve as accurate predictors of exchange rate risk on a worldwide scale, allowing investors and exporting and importing firms to capture the volatility and prepare themselves against the risks.
The associate editor coordinating the review of this manuscript and approving it for publication was Francesco Benedetto.
Advanced and emerging countries are at different stages of development and hold different positions in the world industrial chain; therefore, each needs an exchange rate system and exchange rate policy suited to its level of economic development and degree of openness. Advanced economies tend to adopt flexible systems, while emerging economies are characterized by their trade openness, economic development, foreign-currency liabilities, foreign exchange reserve holdings, economic size, export concentration ratios, and financial development [1]. Therefore, exchange rate behavior still differs between advanced and emerging countries. Even when countries have similar macroeconomic fundamentals and have adopted freely floating exchange rates, the volatility of the exchange rate may vary significantly from one country to another [33]. Scholars have attempted to find the best method for forecasting emerging and advanced currencies; however, inconclusive results have been obtained [7], [16], [17], [20], [26], [29], [31], [32].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the past few decades, there have been several periods of high market volatility, such as the Asian financial crisis in 1997, the United States' financial crisis, the European debt crisis, and Japan's dotcom bubble. Turner et al. [5] noted that these crises are sources of structural change in volatility persistence, thereby leading to regime switching in the parameters of the variance equation. Thus, they suggested using the Markov Switching GARCH (MS-GARCH) model to cope with the exchange rate's low- and high-volatility regimes. An application of this model can be found in the work of la Torre-Torres et al. [34]. Although this model has shown good performance in volatility forecasting, it does not appear to take into account the asymmetric leverage effects in the conditional volatility. Thus, in this study, we consider the Markov Switching (MS) Beta-t-EGARCH, a relatively recent volatility model introduced by Blazsek and Ho [39].
Furthermore, we add the neural network method to our volatility forecasting endeavor, in recognition of its similarity and close relationship to many statistical techniques and of its other uses. The main strength of neural networks is their ability to learn and generalize the patterns or characteristics of data [53]. More specifically, neural networks can learn nonlinear and complex relationships among variables without a model specification, and hidden relationships in the data without imposing any fixed relationships. Besides, after learning from historical data, various NN models can generalize and predict unobserved or future data. Therefore, the neural network as a form of artificial intelligence offers researchers a fresh perspective on volatility forecasting.
Consequently, the conventional regime-switching volatility models, which do not take into consideration the neural network's method, may become obsolete in modeling volatility in the exchange rate and predicting the complicated phenomena. To deal with this problem, the exchange rate volatility can be modeled by combining both the conditional mean and the conditional variance with nonlinear time series and a neural network model to achieve possible gains in forecasting and modeling capabilities. In other words, in this study, the MS-Beta-t-EGARCH will be incorporated with the multi-layer perceptron (MLP) type of neural networks model. As suggested by Bildirici and Ersin [23], [25], augmenting the regime-switching GARCH model with multi-layer perceptron (MLP) type of neural networks would enhance forecasting capabilities. Specifically, this study proposes a hybrid forecasting model that combined MLP and Markov Switching Beta-t-EGARCH.
This paper contributes to the literature in two aspects. First, to the best of our knowledge, this is the first attempt to extend the MS Beta-t-EGARCH to the neural networks model. Second, this paper's results provide a way to improve the prediction of exchange rate volatility, and these findings are useful for the related issues of currencies, such as international trade, foreign income and expenses, and foreign exchange rate management. This paper will be organized as follows. We review the previous research in Section 2. In Section 3, the regime-switching Beta-t-EGARCH with multi-layer perceptron (MLP) type of neural network models will be explained in detail. Section 4 is the data description. Section 5 explains different models' empirical results and compares these models' performance based on the loss functions. Section 6 makes a summary.

II. RELATED WORKS
Scholars have studied the characteristics and behavior of exchange rate volatility extensively. Among the econometric methods, the standard GARCH (1,1) (SGARCH) of Bollerslev [44] is the most widely used for volatility forecasting in the foreign exchange market (see [20], [21], [43]). However, it is unable to capture asymmetric effects such as the leverage effect, which is commonly found in asset return series. The Exponential GARCH (EGARCH) model of Nelson [9] was introduced to account for this asymmetric volatility, which is its advantage over the basic GARCH (1,1). Recently, the Beta-t-EGARCH model of Harvey and Sucarrat [2] was introduced. This model relaxes the positivity constraints on the ARCH and GARCH parameters, thereby improving the long-term forecasting performance.
However, the Beta-t-EGARCH still faces some limitations: its conditional mean and conditional variance equations are based on a linear structure. This model specification is not consistent with recent financial data, which usually present low- and high-volatility periods. To account for different volatility levels or regimes, Gray [40] introduced the Markov Switching GARCH to cope with structural change in financial data. Caporale and Zekokh [14] revealed that the forecasting performance of the MS-GARCH model is superior to that of the one-regime GARCH model. Recently, Blazsek and Ho [39] considered asymmetric leverage effects for conditional volatility and proposed the MS-Beta-t-EGARCH. They compared the in-sample statistical performance of the MS-Beta-t-EGARCH model with that of the single-regime Beta-t-EGARCH model. The result showed that the two-regime model is better than the one-regime alternative.
Alternatively, many studies have pursued nonlinear modeling using neural network (NN) models. Galeshchuk [41] revealed that Markov Switching models do not improve exchange rate forecasts. She also mentioned that NN could be an alternative approach that can produce a nonlinear model without prior knowledge about the functional forms. In addition, NN does not require assumptions regarding the distribution of the data, since it learns from the data. NN models have recently been applied to predict volatility series in many fields [11], [38], [45], [52]. In the case of exchange rate volatility forecasting, Panda and Narasimhan [6] found that the neural network has better exchange rate forecasting performance than the linear regression and random walk models, not only in-sample but also out-of-sample. Other researchers also find a similar result (see [3], [13], [22]). Although the NN technique has several advantages that distinguish it from other existing prediction methods, it is a black-box learning approach because it cannot interpret the relationship between input and output or deal with uncertainties. Another major disadvantage of neural networks is that there is no formal systematic model-building approach. Fortunately, a newer line of work combines NN with GARCH-type models.
Due to the existence of complex nonlinear correlation structures among variables and larger data sets, the prediction results of conventional GARCH-type models may not be reliable [51]. Many papers in the literature have proposed extending the NN approach to both one-regime and two-regime GARCH-type models. Examples are the NN-GARCH of Donaldson and Kamstra [38], Carvalho Griebeler [26], and Kristjanpoller and Minutolo [47]; the NN-EGARCH of Tseng et al. [8] and Lahmiri and Boukadoum [42]; the NN-GARCH-type models of Bildirici and Ersin [23]; the NN-Integrating GARCH of Kristjanpoller and Minutolo [47]; the NN-two-regime Markov Switching (MS)-ARMA-GARCH models of Bildirici and Ersin [24]; and the NN-two-regime Smooth Transition (ST)-GARCH models of Bildirici and Ersin [25]. These studies combine GARCH-type models and the NN model by adding the NN structure to the existing GARCH-type equations. They confirmed that hybrid neural network and GARCH-type models provide greater predictability than traditional GARCH-type models alone. Kristjanpoller and Minutolo [47] mentioned that the NN model could capture volatility properties that could not be captured by GARCH-type models alone. Besides, the NN model introduces a potential enhancement to earlier GARCH models since it can learn from and handle erroneous, incomplete, or fuzzy data inputs [46]. That is to say, it is not easy to obtain an accurate volatility prediction when using traditional GARCH models alone, because time series are often influenced by characteristics such as high persistence in the conditional variance, nonstationarity, and the nonlinear structure of the volatility process.
Combining regime-switching GARCH-type models with neural network methods to deal with the nonlinear dynamics of volatility has received only limited attention in recent years. Bildirici and Ersin [24] used neural network augmented MS-GARCH models to predict stock volatility. They concluded that neural network models are promising. Bildirici and Ersin [25] applied hybrid Smooth Transition-GARCH models and MLP-type neural networks to forecast petrol price volatility. Their results indicated a significant improvement of this hybrid model over the traditional ST-GARCH models.
In a nutshell, there is evidence that integrating the NN model could improve the forecasting performance of regime-switching GARCH-type models. However, there are still opportunities to improve the existing models and their forecasting performance further. In this study, we propose to combine the MS-Beta-t-EGARCH model with an NN model based on the Multilayer Perceptron (MLP) to improve forecasting accuracy.
Nevertheless, to the best of our knowledge, no paper has yet tried to combine the NN model with the MS-Beta-t-EGARCH model. Thus, we aim to fill this research gap and enhance the MS-Beta-t-EGARCH performance by combining it with the NN model. Specifically, we take advantage of the NN to improve the forecasting ability of the MS-Beta-t-EGARCH approach. To do this, we extend the MS-Beta-t-EGARCH (1,1) model with an MLP-type neural network. The resulting NN-MS-Beta-t-EGARCH (1,1) depicts the essential stylized facts about asymmetries, volatility clustering, and mean reversion in different financial regimes. We expect that our model is likely to bring about an increase in forecasting capabilities.
Since deep learning requires access to high-performance computing, we illustrate that our model may work in exchange rate volatility forecasting on the simplified example of a shallow NN (single hidden layer). We show that combining a shallow NN with MS-Beta-t-EGARCH leads to better predictions than using the NN by itself (and than the traditional MS-Beta-t-EGARCH).

III. METHODOLOGY
In this section, we present the approaches related to our proposed NN-MS-Beta-t-EGARCH model.

A. BETA-t-EGARCH MODEL
The Beta-t-EGARCH model of Harvey and Chakravarty [2] is a particular type of dynamic conditional score model. It is an extended version of the EGARCH model that uses the conditional score of the Beta-t distribution to estimate the conditional variance. Thus, the model becomes more robust against outliers. The formulation also follows Nelson's conventional EGARCH model [9], where the conditional volatility is log-transformed. However, this model does not restrict the ARCH and GARCH parameters to be positive. The Beta-t-EGARCH model consists of two components, the conditional mean and the conditional volatility, and it is driven by the conditional scale parameter $\lambda_t$. The Beta-t-EGARCH (1,1) model considered in this study is formulated as in Eq. (1), where $\mu_t$ is the constant term. The conditional volatility with leverage effects is expressed as in Eq. (2), where $\alpha$ and $\beta$ are the ARCH effect and GARCH effect coefficients, respectively, and $z_t$ is the standardized residual, which is assumed to follow the Student-t distribution with mean zero, variance 1, and degrees of freedom $v > 2$. To satisfy the covariance stationarity of the Beta-t-EGARCH (1,1), we restrict $|\beta| < 1$. The conditional volatility is then computed as in Eq. (3).
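As a rough illustration of how a score-driven log-scale recursion of this kind behaves, the following Python sketch filters $\lambda_t$ through a first-order update. The display equations are not reproduced in this text, so the update used here, $\lambda_{t+1} = \omega + \beta\lambda_t + \alpha u_t$ with $u_t = (v+1)e_t^2 / (v\exp(2\lambda_t) + e_t^2) - 1$ and $e_t = y_t - \mu$, is an assumption based on the commonly cited first-order Beta-t-EGARCH form without a separate leverage term. The function name and the parameter values are hypothetical.

```python
import math

def beta_t_egarch_filter(y, mu, omega, alpha, beta, nu):
    """Return the log-scale path lambda_t for an ASSUMED first-order
    Beta-t-EGARCH recursion (no leverage term)."""
    if abs(beta) >= 1.0:
        raise ValueError("|beta| < 1 is required for covariance stationarity")
    # Start at the unconditional mean of lambda_t (the score has mean zero).
    lam = omega / (1.0 - beta)
    path = []
    for obs in y:
        path.append(lam)
        e2 = (obs - mu) ** 2
        # Conditional score of the Student-t density; bounded in [-1, nu].
        u = (nu + 1.0) * e2 / (nu * math.exp(2.0 * lam) + e2) - 1.0
        lam = omega + beta * lam + alpha * u
    return path

# Hypothetical parameters and a toy return series with one extreme observation.
path = beta_t_egarch_filter([0.0, 1e6, 0.0], mu=0.0, omega=0.0,
                            alpha=0.1, beta=0.9, nu=5.0)
print(path)
```

Because $u_t$ is bounded between $-1$ and $v$, a single extreme return can raise the log-scale by at most $\alpha v$ in this sketch, which is the sense in which a score-driven update is robust against outliers.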

B. MARKOV SWITCHING BETA-t-EGARCH MODEL
One of the Markov Switching model's key characteristics is that it allows all or some parameters to switch across different regimes. Thus, the stylized facts of financial series with two volatility regimes (i.e., high and low) can be captured by a hidden Markov chain that governs the regime switching. In the MS-Beta-t-EGARCH of Blazsek and Ho [39], the logarithm of the regime-dependent conditional scale of volatility, $\lambda_t(S_t)$, follows the Beta-t-EGARCH recursion with regime-specific parameters. Thus, the estimated parameters varying across regimes in this model are $\omega(S_t)$, $\alpha(S_t)$, $\beta(S_t)$, and $v(S_t)$. $S_t \in \{0, 1\}$ indicates the realization of the two-state Markov chain, which is the probabilistic structure of the switching regime indicator and is governed by a first-order Markov process with a constant transition probability matrix $P$. $P$ is determined by two parameters, $p_{00}$ and $p_{11}$, taking values between 0 and 1. Thus,
$$P = \begin{pmatrix} p_{00} & 1 - p_{00} \\ 1 - p_{11} & p_{11} \end{pmatrix},$$
where $p_{00}$ and $p_{11}$ are the transition probabilities from regime 0 to regime 0 and from regime 1 to regime 1, respectively. To find the optimal parameters in this model, maximum likelihood estimation (MLE) is used. The full conditional likelihood function is specified as
$$L(\Theta) = \prod_{t=1}^{T} \sum_{j=0}^{1} f\big(y_t \mid \Theta(S_t = j)\big)\, \Pr(S_t = j \mid y_{t-1}),$$
where $f(y_t \mid \Theta(S_t = j))$ and $\Pr(S_t = j \mid y_{t-1})$ are, respectively, the regime-dependent conditional density of $y_t$ and the filtered probability of regime $j = 0, 1$. $\Theta(S_t)$ is the set of regime-dependent parameters.
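To make the role of $p_{00}$ and $p_{11}$ concrete, the short Python sketch below builds the two-state transition matrix and derives two standard quantities of a first-order Markov chain: the long-run (ergodic) regime probabilities and the expected duration of a regime, $1/(1 - p_{jj})$. The function names and the numerical values are hypothetical, and the row convention (row $i$ holds the probabilities of moving out of regime $i$) is an assumption.

```python
def transition_matrix(p00, p11):
    """Two-state transition matrix; row i holds the probabilities of leaving regime i."""
    for p in (p00, p11):
        if not 0.0 < p < 1.0:
            raise ValueError("transition probabilities must lie strictly between 0 and 1")
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]

def ergodic_probabilities(p00, p11):
    """Long-run share of time spent in regime 0 and regime 1."""
    denom = 2.0 - p00 - p11
    return [(1.0 - p11) / denom, (1.0 - p00) / denom]

def expected_duration(p_stay):
    """Expected number of consecutive periods spent in a regime."""
    return 1.0 / (1.0 - p_stay)

# Hypothetical values: a very persistent regime 0 and a persistent regime 1.
P = transition_matrix(0.95, 0.90)
print(P)
print(ergodic_probabilities(0.95, 0.90))
print(expected_duration(0.95), expected_duration(0.90))
```

With a staying probability of 0.9, for instance, a regime lasts ten periods on average, which gives a quick way to read estimated transition probabilities.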

C. ARTIFICIAL NEURAL NETWORK
An ANN is a network of artificial neurons that receive inputs, change their internal states according to the inputs, and then compute outputs based on the inputs and internal states. These artificial neurons have weights that can be modified by a process called learning [49]. The ANN model can be presented as
$$y_t = \tilde{g}\Big(\sum_{h} w^{O}_{h}\, g\Big(\sum_{i} w^{I}_{h,i}\, x_i + b^{I}\Big) + b^{O}\Big),$$
where $\tilde{g}$ and $g$ are the output and input activation functions, respectively. $y_t$ and $x_i$ are the output and input, respectively. $b^{I}$ and $b^{O}$ are the bias terms of the input and output layers, respectively. $w^{I}$ is the weight vector between the hidden layer and the input layer, while $w^{O}$ is the weight vector between the hidden layer and the output layer. Fig. 1 illustrates the architecture of the ANN.
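The forward pass of such a network can be sketched in a few lines of Python. This is a minimal illustration that assumes a single hidden layer with a log-sigmoid activation, a linear output activation, and one shared input-layer bias; the function names and the example weights are hypothetical, and no training step is shown.

```python
import math

def log_sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mlp_forward(x, w_in, b_in, w_out, b_out):
    """Single-hidden-layer perceptron with a linear output activation.

    x      : list of inputs
    w_in   : one list of input weights per hidden neuron
    b_in   : bias added before the hidden activation (shared, by assumption)
    w_out  : one output weight per hidden neuron
    b_out  : output-layer bias
    """
    hidden = [log_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b_in)
              for row in w_in]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# Hypothetical weights: two inputs, two hidden neurons.
print(mlp_forward([0.3, -1.2], [[0.5, 0.1], [-0.4, 0.8]], 0.0, [1.0, -0.5], 0.2))
```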

D. MARKOV SWITCHING BETA-t-EGARCH WITH NEURAL NETWORK MODEL AND ESTIMATION
In this section, we extend the Markov Switching Beta-t-EGARCH with MLP models, which belong to the ANN family. In this respect, the conditional mean and conditional variance processes augmented with the MLP-type neural network are regime dependent. By combining neural networks with the Markov Switching model, Bildirici and Ersin [23] confirmed the model's better performance in terms of in-sample and out-of-sample volatility forecasting accuracy. The formulation of this model is similar to the Markov Switching Beta-t-EGARCH in Eqs. (4-6); however, the logarithm of the conditional scale $\lambda_t(S_t)$ additionally includes a neural network term, in which $\xi_h(S_t)$ is the additional regime-dependent output weight, $\tilde{g}(\cdot)$ is the linear activation function of the output layer, and $g(z_t(S_t)\delta_h(S_t))$ is the regime-dependent multi-layer perceptron (MLP) neural network that possesses $h$ hidden neurons in each regime. In this network, the standardized residual at lag $d$ is the input to the log-sigmoid activation function $g(\cdot)$, and $\delta_{h,d}(S_t)$ is the additional regime-dependent input weight. We note that three layers, namely one input layer, one hidden layer, and one output layer, are considered in the multi-layer perceptron neural network. To keep the estimation simple, one neuron is assumed ($h = 1$).
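One way to picture the augmentation is as an extra hidden-unit term added to the log-scale update within each regime. The Python sketch below assumes a single hidden neuron ($h = 1$), a single lag ($d = 1$), and an additive form $\lambda_t = \omega + \alpha u_{t-1} + \beta \lambda_{t-1} + \xi\, g(\delta z_{t-1})$ with a log-sigmoid $g$. This additive form, the function name, and the parameter values are assumptions made for illustration and are not taken from the original equations.

```python
import math

def nn_augmented_log_scale(lam_prev, u_prev, z_prev,
                           omega, alpha, beta, xi, delta):
    """One step of an ASSUMED NN-augmented log-scale recursion for one regime.

    xi    : output weight of the single hidden neuron
    delta : input weight applied to the lagged standardized residual
    """
    hidden = 1.0 / (1.0 + math.exp(-delta * z_prev))  # log-sigmoid activation
    return omega + alpha * u_prev + beta * lam_prev + xi * hidden

# Hypothetical regime-specific parameter sets (0 = low, 1 = high volatility).
params = {
    0: dict(omega=-0.05, alpha=0.05, beta=0.95, xi=0.02, delta=1.0),
    1: dict(omega=0.10, alpha=0.10, beta=0.85, xi=0.05, delta=2.0),
}
for regime, theta in params.items():
    print(regime, nn_augmented_log_scale(0.0, 0.3, -1.5, **theta))
```

Setting `xi` to zero recovers the plain recursion, so in this sketch the network term acts as a nonlinear correction on top of the regime-switching model.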
From the estimation point of view, maximum likelihood estimation is also used to estimate all unknown parameters in the MS-Beta-t-EGARCH. Thus, the log of the regime-dependent conditional likelihood of $y_t$ is written as
$$\log L(\Theta) = \sum_{t=1}^{T} \log \sum_{j=0}^{1} f\big(y_t \mid \Theta(S_t = j)\big)\, \Pr(S_t = j \mid y_{t-1}),$$
where $\Theta$ denotes all unknown parameters and $f(y_t \mid \Theta(S_t = j))$ is the conditional density of $y_t$, a Student-t density in which $\Gamma(\cdot)$ is the gamma function. $\Pr(S_t = j \mid y_{t-1})$ is the filtered probability, which is obtained recursively by Hamilton's filter [18].
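The filtering recursion can be illustrated with a compact Python sketch of a two-regime Hamilton filter. To keep it short, the regime-dependent densities are passed in as precomputed numbers, whereas in the full model they would be evaluated from the regime-specific volatility recursion at each step. Starting the chain from its ergodic probabilities is an assumption, and the function name and the toy density values are hypothetical.

```python
import math

def hamilton_filter(densities, p00, p11):
    """Two-regime Hamilton filter.

    densities : list of (f0, f1), the conditional density of y_t in each regime
    Returns (log-likelihood, list of updated regime probabilities).
    """
    P = [[p00, 1.0 - p00], [1.0 - p11, p11]]
    # Assumed initialisation: ergodic probabilities of the chain.
    pi0 = (1.0 - p11) / (2.0 - p00 - p11)
    predicted = [pi0, 1.0 - pi0]          # Pr(S_t = j | information up to t-1)
    loglik, updated_path = 0.0, []
    for f0, f1 in densities:
        joint = [predicted[0] * f0, predicted[1] * f1]
        lik_t = joint[0] + joint[1]       # mixture density of y_t
        loglik += math.log(lik_t)
        updated = [joint[0] / lik_t, joint[1] / lik_t]  # Pr(S_t = j | info up to t)
        updated_path.append(updated)
        # One-step-ahead prediction through the transition matrix.
        predicted = [updated[0] * P[0][0] + updated[1] * P[1][0],
                     updated[0] * P[0][1] + updated[1] * P[1][1]]
    return loglik, updated_path

# Hypothetical density values for three observations.
ll, probs = hamilton_filter([(0.40, 0.10), (0.05, 0.30), (0.35, 0.12)], 0.95, 0.90)
print(ll, probs)
```

A numerical optimiser would then maximise the returned log-likelihood over the model parameters; that step is omitted here.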
Therefore, the maximum likelihood estimator for the Markov Switching Beta-t-EGARCH model with the NN model is given by
$$\hat{\Theta} = \arg\max_{\Theta}\, \log L(\Theta).$$
We note that the output weight parameter is estimated by minimizing the sum of squared residuals of the volatilities, where $\sigma_t$ is the actual volatility.

E. MODEL SELECTION
Both AIC and BIC are goodness-of-fit measures that combine a likelihood term with a penalty on the number of free parameters in the fitted model, but they differ in the penalty: AIC uses a fixed penalty of 2 per parameter, while the BIC penalty grows with the logarithm of the sample size. However, it is hard to identify the best model when the competing models' AIC values are close to each other; hence, the raw AIC and BIC may lead to ambiguous interpretation. The AIC weights (AICw) and BIC weights (BICw) of Wagenmakers and Farrell [12] are therefore introduced as alternative model selection methods in this study. Note that these statistics can be interpreted as conditional probabilities for each model. The formulas of AICw and BICw are as follows:
$$w_i(\mathrm{AIC}) = \frac{\exp\{-\tfrac{1}{2}\Delta_i(\mathrm{AIC})\}}{\sum_{k=1}^{K} \exp\{-\tfrac{1}{2}\Delta_k(\mathrm{AIC})\}}, \qquad w_i(\mathrm{BIC}) = \frac{\exp\{-\tfrac{1}{2}\Delta_i(\mathrm{BIC})\}}{\sum_{k=1}^{K} \exp\{-\tfrac{1}{2}\Delta_k(\mathrm{BIC})\}},$$
where $\Delta_i(\mathrm{AIC}) = \mathrm{AIC}_i - \min \mathrm{AIC}$, $\Delta_i(\mathrm{BIC}) = \mathrm{BIC}_i - \min \mathrm{BIC}$, and $K$ is the number of competing models. $\min \mathrm{AIC}$ and $\min \mathrm{BIC}$ are the AIC and BIC of the best model. We note that these two weight statistics can be viewed as the probability that model $i$ ($M_i$) is the best.
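The weights are straightforward to compute from a list of criterion values, as the following Python sketch shows. The function name and the three AIC values are hypothetical and serve only to illustrate how a small difference in raw AIC translates into a probability-like weight.

```python
import math

def information_weights(values):
    """Wagenmakers-Farrell style weights from a list of AIC (or BIC) values."""
    best = min(values)
    # Relative likelihood of each model given its distance from the best one.
    rel = [math.exp(-0.5 * (v - best)) for v in values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AIC values for three competing models.
aic = [100.0, 102.0, 110.0]
for value, weight in zip(aic, information_weights(aic)):
    print(value, round(weight, 4))
```

In this toy case a gap of two AIC points already shifts most of the weight to the first model, while a gap of ten points leaves the third model with a weight below one percent.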

F. FORECASTING EVALUATION
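The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are also used to evaluate the prediction errors and model performance in the volatility forecasting analysis. These measures are computed by
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(\hat{\sigma}_i - \sigma_i\big)^2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\big|\hat{\sigma}_i - \sigma_i\big|,$$
where $\hat{\sigma}_i$ is the forecast volatility, $\sigma_i$ is the realized volatility, and $N$ is the number of forecasts. Note that the realized volatility is measured by $\sigma_i = |y_i - E(y_i)|$.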
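A minimal Python sketch of these loss functions, together with the absolute-deviation proxy for realized volatility, is given below. Using the sample mean as the estimate of $E(y_i)$ is an assumption, and the function names and the toy numbers are hypothetical.

```python
import math

def realized_volatility(returns):
    """Proxy |y_i - E(y_i)|, with E(y_i) estimated by the sample mean (an assumption)."""
    mean = sum(returns) / len(returns)
    return [abs(r - mean) for r in returns]

def rmse(forecast, realized):
    """Root mean square error between forecast and realized volatility."""
    n = len(forecast)
    return math.sqrt(sum((f - r) ** 2 for f, r in zip(forecast, realized)) / n)

def mae(forecast, realized):
    """Mean absolute error between forecast and realized volatility."""
    n = len(forecast)
    return sum(abs(f - r) for f, r in zip(forecast, realized)) / n

# Hypothetical returns and volatility forecasts.
returns = [0.4, -0.6, 1.1, -0.1]
realized = realized_volatility(returns)
forecast = [0.3, 0.7, 0.8, 0.4]
print(rmse(forecast, realized), mae(forecast, realized))
```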

IV. DATA DESCRIPTION
The data sets analyzed in this paper are financial time series of six exchange rates. These exchange rate series are chosen because the corresponding advanced and emerging currencies are among the most highly traded in the world. Note that the US dollar is not included in our analysis as it is used as the benchmark currency for those six currencies. Table 1 and Fig. 2 show the basic data information and a graph of the six currencies' returns, respectively. We can observe that the INR shows the highest average return (0.0014), followed by the GBP (0.0012). The BRL return has the highest standard deviation (76.635%), followed by JPY (63.709%) and GBP (62.484%). This evidence suggests that the British Pound is characterized by high return and high risk. Positive skewness is observed in all series except JPY. High kurtosis is also observed in all series. These results indicate that our currency returns are leptokurtic, with fatter tails than the normal distribution. In other words, the kurtosis and skewness statistics show a deviation from normality in the currency returns. We thus conduct the Jarque-Bera test to confirm the non-normal distribution. The Jarque-Bera test strongly suggests that the null hypothesis of a normal distribution for daily returns can be rejected. Furthermore, the ARCH-LM heteroscedasticity test is used to verify the ARCH effects in our currency returns. The presence of heteroscedasticity cannot be rejected for the daily return series. We also employ the Augmented Dickey-Fuller test to examine the stationarity of our series, and the result shows strong evidence supporting stationarity.
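For readers who want to reproduce the normality check on their own data, the Jarque-Bera statistic can be computed directly from the sample skewness $S$ and kurtosis $K$ as $JB = \frac{n}{6}\big(S^2 + \frac{(K-3)^2}{4}\big)$. The Python sketch below implements this textbook formula; the function name and the toy return series are hypothetical and do not come from the data used in this paper.

```python
def jarque_bera(x):
    """Return (JB statistic, skewness, kurtosis) using population moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, skew, kurt

# Hypothetical returns with one large positive outlier.
returns = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 2.5, 0.02, -0.12]
print(jarque_bera(returns))
```

Under the null hypothesis of normality the statistic is asymptotically chi-square distributed with two degrees of freedom, so values above roughly 5.99 reject normality at the 5% level.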
As shown in Fig. 2, there are low-volatility and high-volatility periods during 2007-2019. This indicates that the variance of the currency returns is not constant over time and exhibits volatility clustering, supporting the use of GARCH-type models. We can observe that most of the currencies experience high volatility around 2008-2009, corresponding to the global financial crisis that originated in the US. This figure suggests that all currencies are generated by a sequence of random variables with regime-switching probability distributions. Therefore, it might be appropriate to forecast these six currencies' volatility by using the Markov Switching Beta-t-EGARCH model.

V. EMPIRICAL RESULTS
The parameter estimation, in-sample volatility forecast, and out-of-sample forecast of the NN-MS-Beta-t-EGARCH and the baseline models (Beta-t-EGARCH, NN-Beta-t-EGARCH, and MS-Beta-t-EGARCH) were undertaken. We also considered previous hybrid deep learning models, namely the Neural Network (NN)-MS-GARCH of Bildirici and Ersin [23] and the Neural Network (NN)-GARCH of Kristjanpoller and Minutolo [48]. In addition, the neural network without GARCH (NN) of Malliaris and Salchenberger [28] was considered as another competing model. In this way, we can investigate the performance of the hybrid models and determine whether combining a neural network method with MS-Beta-t-EGARCH improves on MS-Beta-t-EGARCH in forecasting exchange rate volatility. Several criteria were then used to evaluate the performance of these models.

A. ESTIMATION RESULTS FOR DIFFERENT VOLATILITY MODELS
In the first stage, we present the parameter estimates of our proposed NN-MS-Beta-t-EGARCH model and the baseline models (MS-Beta-t-EGARCH and Beta-t-EGARCH). The estimates of the NN-MS-Beta-t-EGARCH model for the six currencies are provided in Table 2, and those of the two baseline models are reported in Table 3. The last two rows show the results of the diagnostic tests on the standardized residuals. The results can be summarized as follows. First, the results reported in Tables 2 and 3 show that the GARCH effects of the first and second regimes, $\beta(0)$ and $\beta(1)$, are close to 1 for all currencies, except for the first regime of INR. This indicates that these exchange rate markets experience a high degree of volatility persistence in both regimes. Second, the persistence measures $\alpha(0) + \beta(0)$ for regime 1 and $\alpha(1) + \beta(1)$ for regime 2 show high persistence for all the returns in both regimes. Besides, the sums of these ARCH and GARCH effects are less than one, satisfying the covariance stationarity condition. Third, the transition matrix of the Markov Switching model is estimated, and we observe $p_{00} > 0.9$ and $p_{11} > 0.9$ in all currencies except INR, indicating that the probability of staying in the same regime is larger than 90% and the chance of switching to the other regime is below 10%. Finally, the diagnostic tests presented in the last two rows are the Ljung-Box autocorrelation test and the ARCH effects test. The Ljung-Box test fails to reject the null hypothesis of no serial correlation in the standardized residuals. Furthermore, the ARCH effects test shows that we cannot reject the null hypothesis of no ARCH effects in the squared residual series.
One of the advantages of the MS-GARCH typed model is that it can be used to characterize and forecast the probability of being in a high-or low-volatility regime [34]. Therefore, the regime probabilities obtained from the MS-Beta-t-EGARCH are illustrated in Fig 3. According to Fig. 3, the high exchange rate volatility regime's smoothed probabilities are presented for our sample period from 2007 to 2019. We observe the high-volatility regime in some periods. It is observed that the probability of staying in the high-volatility regime of INR is close to one during the entire sample period, indicating the high volatility of INR during 2007-2019; however, the probability of staying in this regime is not persistent. We also notice that the probability that CNY stays in the high volatility is low during 2009-2010. This finding is consistent with the Chinese government's intervention toward CNY appreciation because of the effects of the global economic crisis on China's exporters. For GBP, JPY, and BRL, the smoothed probabilities of all currencies display that upswings are abrupt and much shorter while downswings are more gradual and highly persistent. Among these three currencies, the most severely fluctuating one is JPY.

B. IN-SAMPLE STATISTICAL PERFORMANCE
In this section, forecasts are first produced from the conventional neural network and Beta-t-EGARCH models, which serve as benchmarks for our proposed hybrid model. Specifically, six conventional volatility models, consisting of the artificial neural network (Model 1), the hybrid artificial neural network and GARCH model (Model 2), the hybrid artificial neural network and Markov Switching GARCH model (Model 3), the Beta-t-EGARCH model (Model 4), the hybrid artificial neural network and Beta-t-EGARCH (Model 5), and the Markov Switching Beta-t-EGARCH model (Model 6), are compared with our proposed hybrid artificial neural network and Markov Switching Beta-t-EGARCH (Model 7). These models are compared in terms of the AIC, BIC, AICw, BICw, MAE, and RMSE criteria for in-sample forecast performance.
In the second stage, to obtain the most accurate model, the models are further compared using the AIC, BIC, AICw, and BICw. For AIC and BIC, a lower value indicates a better trade-off between fit and parsimony. For AICw and BICw, a higher value indicates a better model fit. The model selection criteria arrive at the same conclusion for the in-sample results. We find that the regime-switching Beta-t-EGARCH model with a multi-layer perceptron (MLP) achieves a better degree of fit than the competing models for all six currencies. Besides, regime switching clearly improves the conditional volatility modeling performance, as the Markov Switching model is superior to the single-regime alternative. To obtain robust results, the model should be selected on both in-sample performance and forecasting ability. Hence, we further compare the models by their out-of-sample volatility forecasts.

C. OUT-OF-SAMPLE STATISTICAL PERFORMANCE
In the third stage, the predictive power of each model was compared by its out-of-sample performance. The world-famous economist Friedman [27] points out that ''the only relevant test of the validity of a hypothesis is a comparison of its predictions with experience.'' We therefore need to compare the predictions with the real data to validate the forecasting performance.
As we mentioned before, in this paper, we employ MAE and RMSE as loss functions for making the comparison, and a lower value of MAE and RMSE indicates a better prediction while a higher value might lead to more significant uncertainty in prediction. Table 5 presents the out-of-sample forecast performance of four competing models. The forecast horizon is selected as 1 and 21 days ahead to evaluate the models' performances in short and long horizons. The results show a significant improvement of the Markov Switching Beta-t-EGARCH with NN models in both the short-term (1-day) and long-term (21-day) volatility forecasts for CNY, JPY, EUR, INR returns. The results in this table demonstrate the NN-MS-Beta-t-EGARCH for six currency volatilities have reduced MAE and RMSE. One crucial point is that the NN-based models' MAE and RMSE values are comparatively lower than expected. Starting from T+1 day to T + 21 days, the NN based models provide lower MAE and RMSE values. We also observe that the MAE and RMSE values for the NN-MS-Beta-t-EGARCH model obtained for long horizons are higher than short horizons.
Next, we compare the performance of the single-regime and two-regime models. The two-regime switching models, NN-MS-Beta-t-EGARCH and MS-Beta-t-EGARCH, outperform the single-regime counterparts as the lower MAE and RMSE are obtained. This confirms the presence of the structural change in six currencies' volatilities.
According to the above results, our NN-MS-Beta-t-EGARCH improves the capability to model and forecast exchange rate returns and volatility over the baseline models, not only for short-term but also for long-term volatility forecasting. Hence, we conclude that a neural network approach can enhance the MS-Beta-t-EGARCH model. These results support the generalization and forecasting power of the NN-MS-Beta-t-EGARCH model.
Finally, to validate our proposed model's forecasting performance, we compare our NN-MS-Beta-t-EGARCH model with two hybrid models, namely the Neural network-MS-GARCH and the Neural network GARCH. It is observed that our proposed model provided superior results relative to these hybrid models.

D. THE OUT-OF-SAMPLE FORECASTING PERFORMANCE
Finally, we investigate whether our proposed hybrid models outperform the competing models in a statistically significant sense. To this end, out-of-sample twenty-days-ahead forecasts of conditional volatilities are generated from seven models. Then, the model confidence set (MCS) of Hansen, Lunde, and Nason [36] is used to compare the forecasting performance of these seven models. The MCS test identifies, with a given probability, the best forecasting models within the set of competing models [50]. Hansen, Lunde, and Nason [36] proposed two test statistics, the semi-quadratic statistic and the range statistic; we consider both in our comparison. The tests are constructed from two loss functions, MAE and RMSE, and we use 5,000 bootstrap simulations to obtain the p-values. Note that a higher p-value indicates better forecasting performance.
According to the MCS test results shown in Table 6, both of the loss functions indicate that the volatility forecast based on our proposed NN-MS-Beta-t-EGARCH shows the best performance as the p-value is equal to 1.000 for all currencies. This result confirms the robustness of our proposed forecasting model.
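To make the MCS procedure more concrete, the sketch below implements a simplified version with the range statistic only. It is an illustration under stated assumptions, not the implementation used in this study: the function name `mcs_range`, the circular block bootstrap, the block length, the number of replications, and the synthetic loss series are all hypothetical choices. Models are eliminated one at a time, and each model's MCS p-value is the running maximum of the test p-values up to its elimination; the last surviving model receives a p-value of 1 by convention.

```python
import numpy as np

def mcs_range(losses, n_boot=1000, block=5, seed=0):
    """Simplified MCS with the range statistic.

    losses: (T, M) array of per-period losses for M models.
    Returns a dict mapping model index -> MCS p-value.
    """
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    T, M = losses.shape

    # Circular block bootstrap indices (assumed scheme), shared by all models
    n_blocks = int(np.ceil(T / block))
    starts = rng.integers(0, T, size=(n_boot, n_blocks))
    idx = (starts[:, :, None] + np.arange(block)[None, None, :]) % T
    idx = idx.reshape(n_boot, -1)[:, :T]

    mean_full = losses.mean(axis=0)        # (M,) sample mean losses
    mean_boot = losses[idx].mean(axis=1)   # (n_boot, M) bootstrap mean losses

    alive = list(range(M))
    pvals, running_p = {}, 0.0
    while len(alive) > 1:
        lb = mean_full[alive]
        bb = mean_boot[:, alive]
        d = lb[:, None] - lb[None, :]                 # mean loss differentials
        centered = (bb[:, :, None] - bb[:, None, :]) - d
        var = np.mean(centered ** 2, axis=0)          # bootstrap variance
        np.fill_diagonal(var, 1.0)                    # avoid 0/0 on the diagonal
        se = np.maximum(np.sqrt(var), 1e-12)
        t = d / se
        stat = np.max(np.abs(t))                      # range statistic
        stat_boot = np.max(np.abs(centered / se), axis=(1, 2))
        p = float(np.mean(stat_boot >= stat))
        running_p = max(running_p, p)
        worst = int(np.argmax(np.max(t, axis=1)))     # elimination rule
        pvals[alive.pop(worst)] = running_p
    pvals[alive[0]] = 1.0
    return pvals

# Synthetic absolute-error losses: model 0 is best by construction
demo_rng = np.random.default_rng(1)
demo_losses = np.abs(demo_rng.normal(size=(200, 3))) + np.array([0.0, 0.5, 1.0])
pvalues = mcs_range(demo_losses, n_boot=500, block=5, seed=0)
print(pvalues)
```

In this reading, a p-value of 1.000 for a model means it is the last one remaining after all others have been eliminated, which is how the values reported in Table 6 can be interpreted.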

VI. CONCLUSION
In this work, we combine the merits of the two-regime Markov Switching (MS) Beta-t-EGARCH model and the multi-layer perceptron (MLP) type of neural network (NN) to predict the volatility of the most traded emerging and advanced market currencies. The proposed model aims to capture the nonlinear conditional mean process, the conditional variance, and the neural network architecture simultaneously through a hidden Markov process. It is applied to forecast the volatility of six widely traded advanced and emerging market currencies: the Brazilian Real, Chinese Yuan, Indian Rupee, Japanese Yen, Euro, and Pound Sterling. Several criteria are used to assess the accuracy of our model, and both in-sample and out-of-sample forecasts of the NN-MS-Beta-t-EGARCH are examined to investigate its accuracy and performance.
Based on this study's findings, both in-sample and out-of-sample volatility forecasts suggest that the NN-MS-Beta-t-EGARCH is superior to the conventional MS-Beta-t-EGARCH as well as the single-regime alternative. The results show that the MS-Beta-t-EGARCH models augmented with neural networks provide lower AIC, BIC, MAE, and RMSE than the competing models. We therefore conclude that augmenting conditional variance models with artificial neural networks improves modeling, as the proposed model captures the volatility more efficiently. Moreover, we compare our proposed model with the NN-MS-GARCH and the NN-GARCH and find that it still outperforms these two models on the major currencies data. We also confirm that augmenting MS-Beta-t-EGARCH with MLP leads to better predictions than using NN or MS-Beta-t-EGARCH alone. The MCS test is also used to examine the robustness of the forecasts, and the result confirms the superiority of our proposed model.
In terms of policy implications, the exchange rate is a key factor driving national monetary policy and the international trade balance. According to the present findings, the foreign exchange markets experienced a high degree of volatility persistence over the past decades, and only an extreme event might switch the market returns to the other regime. Hence, from the perspective of policymakers seeking to create a favorable environment for economic development and to avoid larger fluctuations in the exchange rate, the government does not need to interfere with the foreign exchange market when the market stays in the high volatility regime. For further research, as exchange rate dynamics are characterized by nonlinearity with high and low volatility regimes, the neural network approach is worth considering in studies of financial time-series volatility forecasting, where it may enhance forecasting capability.