A Hybrid Model for Financial Time Series Forecasting—Integration of EWT, ARIMA With The Improved ABC Optimized ELM

The practical significance and complexity of financial time series analysis create a strong demand for more reliable hybrid models that denoise the data efficiently and handle both the linear and the nonlinear patterns in the data, so as to achieve more accurate results. This paper proposes a new hybrid forecasting model for financial time series data that combines the Empirical Wavelet Transform (EWT) technique with an improved Artificial Bee Colony (ABC) algorithm, the Extreme Learning Machine (ELM) neural network, and the Auto-Regressive Integrated Moving Average (ARIMA) linear analysis model. The EWT is used to decompose and denoise the data so that the reconstructed series is more suitable for forecasting. The ABC algorithm is improved according to Good Point Sets (GPS) theory and an adaptive Elite-based Opposition (EO) strategy (GPS-EO-ABC) to overcome the drawbacks of the original algorithm and enhance its optimization performance. The ELM optimized by GPS-EO-ABC and the ARIMA model are applied independently to generate separate forecasts, which are then combined by a weight-based procedure. We test the performance of the proposed improved ABC algorithm on ten benchmark functions and simulate the proposed forecasting models on three financial time-series datasets. The results indicate that: (1) the proposed algorithm shows outstanding capability in parameter optimization, and the optimized ELM generates more stable and precise results than the original ELM, ABC-ELM, a single LSTM, and an ANN; (2) the proposed hybrid model is both effective and efficient at denoising data, correcting outliers, and coordinating linear and nonlinear patterns, and its performance in financial time series forecasting exceeds that of existing hybrid models.


I. INTRODUCTION
The statistical tools and techniques that reveal the rules behind financial time-series data and forecast their future evolution provide guidance for both governments and enterprises in predicting revenues and costs and in evading financial risks. Financial time-series analysis has always been a frontier field of financial engineering and enterprise risk management [1]. As a peculiar kind of time-series data, financial time series, derived from stock prices, interest rates, and so on, commonly exhibit the following characteristics [1]-[3]: (1) randomness and complexity in the generation process; (2) most data are collected with noise; (3) inherent non-linear relationships among the data.
Thus, decomposing the data, denoising the data, and dealing properly with nonlinear data patterns are the core points of putting financial time-series models into application. Traditional time series analysis models such as the Auto-Regressive Integrated Moving Average (ARIMA) [4], dynamics models such as Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH) [5], and machine learning models such as the Artificial Neural Network (ANN) [2] are not individually sufficient to address these problems [6]-[8]. Many studies have attempted to use a hybrid-model methodology to denoise the data, increase the chance of capturing both the linear and the nonlinear patterns in the data, and improve the forecasting performance [6]. The hybrid models in the literature mainly comprise three categories, as follows.
Hybrid models combining the aforementioned independent models. Since it was first reported in 1969, the hybrid methodology of combining forecasts has attracted much interest [9]. In the last three years, the combinations of a linear time series model with a machine learning model most commonly seen in financial time series forecasting are ARIMA with ANN [10], ARIMA with SVM [11], and ARIMA with DLN [12]. Many studies have also combined nonlinear dynamics models with machine learning models, such as GARCH with SVM [13]. In this type of hybrid modeling, one model is fitted to the residual of the other [6], [8]. If the subsequent model fails to model the residual of the prior model, the generalization performance of the whole hybrid model declines. Therefore, the model architecture and its improvement through other techniques still need further study.
Hybrid models integrating a data decomposition and denoising module. This type mainly utilizes data preprocessing approaches such as the Wavelet Transform (WT), Empirical Mode Decomposition (EMD), and Chaos Theory (CT) to establish reasonable prior information that makes the subsequent statistical inference more sensitive [14]. For example, Bao et al. [15] proposed a hybrid model based on a deep learning framework consisting of WT and DLN; the stock price series was decomposed by the WT, which made it more suitable for forecasting. Likewise, Wadia and Ismail [16] and Raimundo and Okamoto [17] applied the WT to ARIMA and SVM, respectively. Awajan et al. [18] combined EMD with ARIMA, decomposing the nonstationary and nonlinear stock market time series into Intrinsic Mode Functions (IMF) for ARIMA; the resulting performance is better than that of the original ARIMA model. Ravi et al. [19] used CT to construct a phase space before invoking a Multi-Layer Perceptron (MLP), which testified that CT is an efficient way of eliminating volatility in financial time series data.
Hybrid models mixing in swarm intelligence algorithms. This type focuses on exploiting the advantages of swarm intelligence algorithms. In recent decades, the swarm intelligence algorithms involved in linear financial time series models or machine learning models include the Genetic Algorithm (GA) [20], the Particle Swarm Optimization (PSO) algorithm [21], [22], the Firefly Algorithm (FA) [23], and the Ant Colony Optimization (ACO) algorithm [24]. Among the most popular swarm intelligence algorithms, the Artificial Bee Colony (ABC) is based on the particular intelligent behavior of honeybee swarms [25]-[27]. Compared with other algorithms, it has the advantages of fewer parameters and easy implementation [28]. The ABC algorithm and its optimized variants have also been applied to time series estimation and analysis [29]-[31]. However, the ABC optimization process easily falls into under-fitting solutions owing to the uneven distribution of initial solutions in the data space [32]-[34], leaving the solutions lacking in stability and robustness. Besides, as with other swarm intelligence algorithms, techniques that enhance both the global and local search abilities of the whole algorithm are in high demand [28], [35], with the aim of avoiding phenomena such as premature convergence and trapping in locally optimal solutions. Hence, although hybrid models depending on ABC or other swarm intelligence algorithms bring some improvement, there is still space to improve their performance.
In summary, the hybrid models mentioned above have certified their abilities to synthesize the information in financial time series, reveal the rules among the data, and generate more accurate predictions. Hence, this study proposes a hybrid financial time series forecasting model combining the data preprocessing technique Empirical Wavelet Transform (EWT), the machine learning model Extreme Learning Machine (ELM), the linear model ARIMA, and an improved ABC algorithm. This introduction has explained the central importance of the hybrid-model methodology in financial time series analysis. The study takes the form of five sections, including this introduction. Section II proposes the novelty and basic idea of the hybrid-model methodology; Section III introduces the individual models and the necessary modules in the proposed hybrid model; Section IV illustrates the testing experiments on the proposed algorithm and the simulation study on the proposed model; conclusions are presented in Section V.

II. METHODOLOGY
As illuminated in FIGURE 1, the basic idea and data processing route of the proposed hybrid model is as follows. First, the financial time series data are submitted to the data-preprocessing module based on EWT. Then, the reconstructed data are fed to the individual forecasting models, including ARIMA and the ELM improved by GPS-EO-ABC, to obtain individual outcomes. Finally, the corresponding predictions are combined using (1):

X̂(t) = Σ_j w_j e_jt, with Σ_j w_j = 1,  (1)

where w_j is the weight of the jth model and e_jt is the jth forecasting result of the time series dataset {X(t)} at the tth point. Details are illustrated in the next section.
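The weight-based combination of individual forecasts can be sketched as follows (a minimal illustration; the function name and the equal-weight example are our own, not from the paper):

```python
import numpy as np

def combine_forecasts(preds, weights):
    """Weight-based combination of individual forecasts, as in Eq. (1):
    X_hat(t) = sum_j w_j * e_jt, with the weights summing to one."""
    preds = np.asarray(preds, dtype=float)    # shape: (n_models, T)
    w = np.asarray(weights, dtype=float)
    assert np.isclose(w.sum(), 1.0), "weights must sum to one"
    return w @ preds                          # combined forecast of length T

# e.g. combining an ARIMA and an ELM forecast with equal weights
combined = combine_forecasts([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])  # -> [2.0, 3.0]
```

In practice the weights would be chosen from validation errors of the individual models rather than set uniformly.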
The novelties of the proposed hybrid model are as follows: (1) EWT [15]-[17], [36], [37] is a novel data preprocessing model that has the formalism of the WT and the adaptability of EMD [37], [38]. The prime advantage of EWT is that it remedies the drawbacks of the primary WT, which requires the specification of a prior wavelet basis and parameters and lacks a self-adaptive mechanism [39]. In this study, the EWT is applied to denoise and decompose the financial time series data and eliminate the impact of outliers, which makes the series more suitable for the ARIMA and ELM forecasts. (2) In the existing literature, the ANN [2], [8], [29], [40]-[43] is mainly based on back-propagation algorithms, which are generally far slower than required. As a Single-Layer Feedforward Neural (SLFN) network, the ELM has advantages in learning speed and relatively accurate results [40], [41], and previous studies have proved its superiority in fast time series analysis [44]. However, its application to financial time series forecasting usually does not yield good predictions owing to the randomness and intermittence of the data, so enhancing the forecasting accuracy and reliability of the individual ELM still needs further study. (3) The ABC optimization process easily falls into under-fitting solutions owing to the uneven distribution of initial solutions in the data space [32]-[34], leaving the results lacking in stability and robustness. Besides, as with other swarm intelligence algorithms, techniques that enhance both the global and local search abilities of the whole algorithm are in high demand [28], [35], with the aim of avoiding phenomena such as premature convergence and trapping in locally optimal solutions. Therefore, this study proposes an improved ABC based on Good Point Sets (GPS) theory and an adaptive Elite-Opposition-based (EO) strategy [25]-[31], [33] to address these problems.
(4) The aforementioned hybrid models have generated more precise and reliable forecasting results. Nevertheless, few studies have combined data preprocessing, a linear time series model, and a machine learning model with an improved swarm intelligence algorithm. This study fills that gap.

A. EWT MODEL
The WT represents any function as a superposition of a set of wavelets that provide crucial information about both the time and frequency domains at the located positions [36]. When employing WT technology to decompose time series data, it is important to note that the wavelet decomposition must be performed independently for each partition period in order to avoid incorporating information from future data [45]. As a combination of WT and EMD, the Empirical Wavelet Transform (EWT) was proposed by J. Gilles [37]; its core point is a self-adaptive improvement of the traditional WT. A brief introduction follows [38]. In the signal space L²(R), the EWT is built from the empirical scaling function spectrum φ̂_n(ω) and the empirical wavelet spectrum ψ̂_n(ω), expressed mathematically by (2) and (3), respectively:

φ̂_n(ω) = 1, if |ω| ≤ (1 − γ)ω_n;
φ̂_n(ω) = cos[(π/2) β((|ω| − (1 − γ)ω_n)/(2γω_n))], if (1 − γ)ω_n ≤ |ω| ≤ (1 + γ)ω_n;
φ̂_n(ω) = 0, otherwise;  (2)

ψ̂_n(ω) = 1, if (1 + γ)ω_n ≤ |ω| ≤ (1 − γ)ω_{n+1};
ψ̂_n(ω) = cos[(π/2) β((|ω| − (1 − γ)ω_{n+1})/(2γω_{n+1}))], if (1 − γ)ω_{n+1} ≤ |ω| ≤ (1 + γ)ω_{n+1};
ψ̂_n(ω) = sin[(π/2) β((|ω| − (1 − γ)ω_n)/(2γω_n))], if (1 − γ)ω_n ≤ |ω| ≤ (1 + γ)ω_n;
ψ̂_n(ω) = 0, otherwise;  (3)

where ω_n is the nth maximum of the Fourier spectrum, γ is a real number in [0, 1], and β(x) is an arbitrary C^k([0, 1]) function satisfying the properties in (4):

β(x) = 0 if x ≤ 0; β(x) = 1 if x ≥ 1; β(x) + β(1 − x) = 1 for x ∈ [0, 1];  (4)

under these conditions {φ_1, {ψ_n}_n} is an orthonormal basis of L²(R). Like the classical wavelet transform, the EWT can be implemented once this orthonormal basis is built. Therefore, the detail coefficients between the signal and the empirical wavelets are calculated according to (5), and the approximation coefficients, denoted by convention W_f^ε(0, t), according to (6):

W_f^ε(n, t) = ⟨f, ψ_n⟩ = ∫ f(τ) ψ_n(τ − t) dτ,  (5)
W_f^ε(0, t) = ⟨f, φ_1⟩ = ∫ f(τ) φ_1(τ − t) dτ,  (6)

where f is the analyzed signal and τ_n = γω_n is half the length of the ''transition phase''. Using (7), the reconstruction is obtained:

f(t) = W_f^ε(0, t) ∗ φ_1(t) + Σ_n W_f^ε(n, t) ∗ ψ_n(t).  (7)

Following this formalism, the empirical modes are given by (8) and (9) [46]:

f_0(t) = W_f^ε(0, t) ∗ φ_1(t),  (8)
f_k(t) = W_f^ε(k, t) ∗ ψ_k(t).  (9)

Thus, the EWT is an adaptive decomposition of the Fourier line that does not need to presume the position of the different modes: by using the local max-min method, the boundaries of the Fourier supports can be detected adaptively from the information contained in the processed spectrum.
Previous studies [15], [46], [47] have proved that a data-decomposition level greater than or equal to three is sufficient to describe the data in a meaningful way. In this study, the data are decomposed into four levels. The relative pseudocode is listed in Algorithm 1.
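A drastically simplified sketch of the EWT idea — detect spectral maxima, cut the Fourier axis between them, and band-filter each segment — is given below. It uses ideal band-pass filters in place of the Meyer-type transitions of (2)-(3), so it is an illustration of the boundary-detection-and-split principle only; `simple_ewt` is our own name:

```python
import numpy as np

def simple_ewt(x, n_bands=4):
    """Simplified empirical-wavelet-style decomposition: split the spectrum
    at midpoints between its n_bands largest local maxima and inverse-FFT
    each band. Ideal (brick-wall) filters are used instead of Meyer-type
    transition functions, so this only illustrates the principle."""
    N = len(x)
    X = np.fft.rfft(x)
    mag = np.abs(X)
    # local maxima of the magnitude spectrum (DC and Nyquist excluded)
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]]
    # keep the n_bands largest maxima, re-ordered by frequency
    peaks = sorted(sorted(peaks, key=lambda k: -mag[k])[:n_bands])
    # segment boundaries: midpoints between consecutive retained maxima
    edges = [0] + [(a + b) // 2 for a, b in zip(peaks[:-1], peaks[1:])] + [len(mag)]
    comps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(X)
        band[lo:hi] = X[lo:hi]        # keep only this spectral segment
        comps.append(np.fft.irfft(band, n=N))
    return comps                      # the bands partition the spectrum,
                                      # so the components sum back to x
```

Because the segments partition the half-spectrum, summing the components reconstructs the signal exactly, mirroring (7).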

B. ARIMA MODEL
The ARIMA model pioneered by Box and Jenkins is the most widespread linear time series analysis model [1], [48]. It integrates the Auto-Regressive (AR) model and the Moving Average (MA) model to forecast from the parameters. A brief introduction follows [4]. Given a time series of N data points {x(t)}, t = 1, · · · , N, an ARIMA(p, d, q) model can be defined as:

φ(B)(1 − B)^d x(t) = θ(B) ε_t,  (10)

where p and q are the model orders and d is the differencing order. B is the lag operator, such that Bx(t) = x(t − 1). The Partial Auto-Correlation Function (PACF) is the commonly used lag identification function. ε_t is white noise with zero mean and constant variance. φ(B) is the autoregressive polynomial of order p and θ(B) is the moving average polynomial of order q, expressed mathematically by (11) and (12), respectively:

φ(B) = 1 − φ_1 B − φ_2 B² − · · · − φ_p B^p,  (11)
θ(B) = 1 + θ_1 B + θ_2 B² + · · · + θ_q B^q.  (12)
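The AR part of the model can be illustrated with a plain least-squares fit of lagged values (a sketch, not the full Box-Jenkins procedure; `fit_ar` and `forecast_ar` are our own names):

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of an AR(p) model: x_t = c + sum_i phi_i * x_{t-i} + e_t.
    Returns [c, phi_1, ..., phi_p]."""
    x = np.asarray(x, dtype=float)
    # column i holds x_{t-(i+1)} for targets t = p .. N-1
    lags = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)])
    A = np.column_stack([np.ones(len(lags)), lags])   # intercept + lags
    coef, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
    return coef

def forecast_ar(x, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    return coef[0] + coef[1:] @ np.asarray(x, dtype=float)[-1 : -p - 1 : -1]
```

Differencing d times before the fit and inverting the differences afterwards would give the "I" part of ARIMA; the MA terms require an iterative estimator and are omitted here.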
C. IMPROVED ABC
The ABC algorithm was proposed by the Turkish academic Karaboga in 2005 [25]. Inspired by the honey-foraging behavior of bees, it solves multidimensional optimization problems with three categories of artificial honey bees: employed foragers, onlookers, and scouts [26], [27]. In ABC, each food source is regarded as a solution in the data space. The employed foragers and the onlookers are responsible for the exploitation of food sources, while the scouts are in charge of avoiding having too few kinds of food sources [28]. The foraging process of the artificial honey bees is essentially a solving process. The ABC algorithm steps can be described as follows [25]-[27]:

Step 1: Randomly generate an initial honey bee population of size M; the first half are the employed foragers and the second half the onlookers, and each artificial bee x is a D-dimensional vector.

Step 2: Send the employed foragers to the food sources; every forager updates the food source according to (13):

newX_ij = x_ij + γ(x_ij − x_kj),  (13)

where newX_ij is the new location of the food source, x_ij is the jth dimension of forager or food source X_i, x_kj is the jth dimension of a randomly selected X_k with k ≠ i, and γ is a random number in [−1, 1].

Step 3: Calculate the probability P_i with which the onlookers prefer each food source, according to (14):

P_i = fitness_i / Σ_m fitness_m.  (14)

Step 4: Send the scouts into the search area to avoid having too few food sources, and memorize the best food source. If the stopping requirements are met, ABC ends; otherwise, return to Step 2.
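The four steps above can be sketched as a minimal ABC minimizer (an illustration under assumed parameter values; `abc_minimize` and its defaults are our own, and a roulette-wheel choice stands in for Eq. (14) with a standard 1/(1+f) quality transform):

```python
import numpy as np

def abc_minimize(f, lb, ub, dim, n_food=20, limit=30, max_iter=200, seed=0):
    """Minimal artificial bee colony for minimizing f on [lb, ub]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_food, dim))          # food sources (solutions)
    fit = np.array([f(x) for x in X])
    trials = np.zeros(n_food, dtype=int)            # non-improvement counters

    def neighbour(i):
        # Eq. (13): perturb one random dimension toward/away from a peer k != i
        k = rng.choice([m for m in range(n_food) if m != i])
        j = rng.integers(dim)
        new = X[i].copy()
        new[j] += rng.uniform(-1, 1) * (X[i][j] - X[k][j])
        return np.clip(new, lb, ub)

    def try_improve(i):
        cand = neighbour(i)
        fc = f(cand)
        if fc < fit[i]:                             # greedy selection
            X[i], fit[i], trials[i] = cand, fc, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(n_food):                     # employed foragers
            try_improve(i)
        qual = 1.0 / (1.0 + fit - fit.min())        # quality for Eq. (14)
        prob = qual / qual.sum()
        for _ in range(n_food):                     # onlookers
            try_improve(rng.choice(n_food, p=prob))
        for i in range(n_food):                     # scouts replace stale sources
            if trials[i] > limit:
                X[i] = rng.uniform(lb, ub, dim)
                fit[i], trials[i] = f(X[i]), 0
    best = np.argmin(fit)
    return X[best], fit[best]
```

On a simple sphere function this reliably approaches the origin, which is the behavior the benchmark experiments in Section IV measure at larger scale.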
The principle of the proposed GPS-EO-ABC algorithm is to improve the initialization of the artificial bee colony according to Good Point Sets theory, enhancing the global search ability; to improve the way food sources and bees are updated based on an Elite-Opposition-based algorithm, enhancing the local search ability; and, moreover, to introduce an inertia-weight function that adaptively adjusts the search step length to avoid oscillation in late iterations.

1) GOOD POINT SETS THEORY
The studies of Liu et al. [33], Li et al. [49], and Zhang [50] have proved that ABC, FA, and GA based on good point sets theory provide a better-covered solution space and generate a more diversified initial population, which enhances the global search ability and avoids quickly falling into a local optimum. The definitions of GPS theory are given below [51].
Definition 1: Let G_D be the unit cube in D-dimensional Euclidean space, with points x = (x_1, x_2, · · · , x_D), and let r = (r_1, r_2, · · · , r_D) ∈ G_D be a good point. The point set P_n(i) = ({r_1 i}, {r_2 i}, · · · , {r_D i}), i = 1, 2, · · · , n (where {·} denotes the fractional part), has deviation φ(n) = C(r, ε) n^{−1+ε}, where C(r, ε) is a constant related only to r and ε (ε > 0).

FIGURE 2 compares the bidimensional initial population distribution of 200 samples generated by the random method with that generated by the good point sets method. Obviously, the latter is much better distributed than the former. In addition, the construction of the good point set has nothing to do with the dimension of the data space, so the method can address optimization problems for multi-dimensional functions well. Thus, this study proposes a good-point-sets-initialized ABC algorithm to enhance the global search ability of the traditional ABC and avoid premature or under-fitting solutions. The good point r_j is calculated using (15):

r_j = {2 cos(2πj/p)}, j = 1, 2, · · · , D,  (15)

where D represents the dimension of the unit cube in the Euclidean space and p is the minimum prime satisfying (p − 3)/2 ≥ D. If the coding length of the solution is 20, which means the chromosome dimension of each bee equals 20, then p = 43.

2) ELITE OPPOSITION BASED ALGORITHM
The Elite Opposition-based (EO) machine learning algorithm was introduced by Tizhoosh in 2005 [52]. Its basic idea is to learn from past data or instructions, optimize estimated solutions, and search large spaces for an existing solution [52], [53]. Studies of improved ABC depending on EO have proved its outstanding ability to coordinate global search with local search [54]. Thus, this study regards the food sources preferred by the onlookers as the Elites and implements the EO algorithm to maintain the diversity of the population and strengthen the local search ability. The related approach is as follows [52]-[54]:

Definition 2: Let an Elite food source (onto which the artificial bees are sent; it can also be regarded as an Elite of the bee colony) in D-dimensional space be X_i = (x_{i1}, x_{i2}, · · · , x_{iD}), where a_j and b_j are the lower and upper boundaries of the jth dimension. The opposition point of the Elite is defined as X̄_i, where:

x̄_{ij} = k(a_j + b_j) − x_{ij}, k ∈ (0, 1),  (16)

and the fitness f(X̄_i) is then calculated to decide whether the opposition point replaces the original by greedy selection.

In addition, traditional swarm intelligence algorithms easily sink into oscillation near the peak value of the function in late iterations [21], [22], [32], [55], which is related to the search steps of the artificial bees: if, at iteration t, the distance between the peak and a bee is less than the step length, then at iteration t + 1 the bee moves to the other side of the peak, and at iteration t + 2 it moves back to its origin. Optimization with an adaptive step length addresses this problem well [56], [57]. Taking this steady-state iteration problem into account, this study presents an adaptive Elite-Opposition-based algorithm that refers to the inertia-weight strategy of particle searching in the PSO algorithm [55].
The inertia weight W(t) of the artificial bees' search is calculated by (17):

W(t) = ω(1 − t/t_max),  (17)

where t is the iteration number, t_max is the maximum number of iterations, and ω is a constant coefficient. As t increases, the inertia weight W(t) decreases. The search step of the artificial bees is thereby decreased automatically, which enhances the search accuracy in the vicinity of the Elite points in late iterations. In this study, the opposition point of an Elite is calculated not by (16) but by (18):

x̄_{ij} = W(t)(a_j + b_j) − x_{ij}.  (18)

In summary, the pseudocode of GPS-EO-ABC is listed in Algorithm 2.
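The adaptive elite-opposition step can be sketched as follows (a sketch under assumed forms: `elite_opposition` is our name, the dynamic boundaries are taken from the current elites as is common in EO variants, and W(t) follows the decreasing shape described in the text):

```python
import numpy as np

def elite_opposition(elites, t, t_max, omega=0.9, lb=0.0, ub=1.0, rng=None):
    """Adaptive elite-opposition points x_bar = W(t)*(a + b) - x, with
    dynamic boundaries a, b taken from the current elite population and
    an inertia weight W(t) that shrinks as iterations progress."""
    rng = np.random.default_rng(0) if rng is None else rng
    a = elites.min(axis=0)                 # dynamic lower boundary
    b = elites.max(axis=0)                 # dynamic upper boundary
    W = omega * (1.0 - t / t_max)          # inertia weight, decreasing in t
    opp = W * (a + b) - elites
    out = (opp < lb) | (opp > ub)          # re-seed points leaving the space
    opp[out] = rng.uniform(lb, ub, size=int(out.sum()))
    return opp
```

Each opposition point would then compete with its elite via greedy selection, exactly as the candidate food sources do in Algorithm 2.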

D. GPS-EO-ABC ELM MODEL
The Extreme Learning Machine (ELM), proposed by Huang et al. [40], is attracting widespread interest due to its extremely fast learning rate and the little intervention it requires [41].

Algorithm 2 GPS-EO-ABC
1. Set the upper boundaries ub, the lower boundaries lb, the artificial bee population size M, the limit of food sources Limit, the maximum iteration t_max, and the iterative counter t = 0; generate the initial population P according to Definition 1 and Equ. (15);
2. Send foragers onto the food sources, calculate the fitness F(x);
3. Set the non-updated counter Bas = 0 (Bas ≤ Limit);
4. While t < t_max
5. For each forager
6. The forager searches for a new food source newX_ij according to Equ. (13);
7. Calculate F(newX_ij), update the population based on greedy selection;
8. End For
9. Calculate the preference probabilities according to Equ. (14);
10. Send onlookers onto the food sources;
11. For each onlooker
12. The onlooker searches for a new food source newX_lj according to Equ. (13);
13. Generate the adaptive elite-opposition candidate according to Equ. (18);
14. Calculate F(newX_lj), update the population based on greedy selection;
15. End For
16. For each food source with Bas = Limit
17. Send scouts to find food sources; generate a new one instead of it;
18. End For
19. Memorize the best food source;
20. t = t + 1;
21. End While

The ELM is based on single-layer feedforward artificial neural networks, which are simpler and more efficient than traditional neural networks trained by back-propagation. The standard SLFN with m hidden nodes can be written as [40], [41]:

Σ_{i=1}^{m} β_i g(ω_i · x_j + b_i) = t_j, j = 1, · · · , n,  (19)

where ω_i is the vector of input weights connecting the ith hidden node and the input nodes, i = 1, · · · , m; β_i is the output weight connecting the ith hidden node and the output node; b_i is the bias of the ith hidden node; x_j is the jth input vector, j = 1, · · · , n; t_j is the jth output variable; and g(x) is the activation function, mainly the sigmoid. This is expressed compactly by (20):

Hβ = T,  (20)

where:

H = [g(ω_i · x_j + b_i)]_{n×m}, β = (β_1, · · · , β_m)^T, T = (t_1, · · · , t_n)^T.  (21)

Thus, the output weights β are calculated by (22):

β = H†T,  (22)

where H† is the Moore-Penrose pseudo-inverse of H.
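A minimal ELM along the lines of (19)-(22) can be sketched as follows (an illustrative implementation; the class name, the seeding, and the hidden-layer size in the usage are our own choices):

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random input weights and biases,
    sigmoid hidden layer, output weights solved in one shot by the
    Moore-Penrose pseudo-inverse (Eq. (22)) instead of back-propagation."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # hidden-layer output matrix H of Eq. (21), sigmoid activation
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_in = X.shape[1]
        self.W = self.rng.uniform(-1, 1, (n_in, self.n_hidden))  # input weights
        self.b = self.rng.uniform(-1, 1, self.n_hidden)          # hidden biases
        self.beta = np.linalg.pinv(self._hidden(X)) @ y          # Eq. (22)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because only the linear output layer is solved, training reduces to a single least-squares problem, which is the source of the "extremely fast learning rate" the text refers to.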

1) PRINCIPLE OF GPS-EO-ABC ELM
The principle of the GPS-EO-ABC ELM model is as follows. Lay down the topology of the model according to the input and output data; the coding length of the artificial bees is thus determined, and each bee represents one solution for the weights and bias terms of the ELM. Execute the GPS-EO-ABC algorithm, as shown in Algorithm 2, to find the best bee, corresponding to the optimal fitness value. Decode the bee and assign the values to the ELM; through training and testing, the model achieves more accurate results under various modeling conditions. FIGURE 3 gives the flow chart of the GPS-EO-ABC ELM model.

2) STEPS
The data decomposed and denoised by EWT are employed as the inputs of the ELM. The aforementioned GPS-EO-ABC ELM algorithm is implemented as follows:

Step 1 (Encoding): Assign random values in [−1, 1] to the input weights W and the bias terms B, set the hidden neuron number, calculate the output weights, and create the ELM network according to Equ. (19), Equ. (20), Equ. (21), and Equ. (22), determining the topology of the ELM network.
Encode the initial hidden node number N, the weights W, and the bias terms B as a whole; each artificial bee includes their values.

Step 2 (Parameter Initializing): Set the upper boundary ub and lower boundary lb, the bee colony size M (the first half are the employed foragers and the second half the onlookers), the limit of food sources Limit, the maximum iteration t_max, and the iterative counter t; initialize the artificial bee population based on GPS theory.

Step 3 (Fitness Calculating): Decode the artificial bees, train and test the ELM network to obtain the error indices (MAE, MAPE, and RMSE), and use the error indices as the fitness of each bee; the minimum of the error indices corresponds to the maximum fitness.

Step 4 (Bees and Food Sources Updating): Update the bees and food sources according to Algorithm 2.

Step 5 (Judging): Judge whether the iterative error satisfies the end conditions. If it does, the optima are obtained and given to the ELM network to produce the output, and the model ends. Otherwise, judge whether the maximum iteration t_max has been reached; if not, turn to Step 4; otherwise, end.
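The encoding and fitness evaluation of the steps above can be sketched as a routine that decodes a flat bee vector into ELM parameters (a sketch under assumed shapes: `elm_fitness` is our own name, the hidden node count is fixed rather than encoded, and RMSE alone stands in for the three error indices):

```python
import numpy as np

def elm_fitness(bee, X_tr, y_tr, X_val, y_val, n_in, n_hidden):
    """Decode a bee (flat vector of length n_in*n_hidden + n_hidden) into
    ELM input weights W and biases b, solve the output layer by the
    pseudo-inverse (Eq. (22)), and return the validation RMSE, which the
    GPS-EO-ABC search minimizes."""
    W = bee[: n_in * n_hidden].reshape(n_in, n_hidden)
    b = bee[n_in * n_hidden :]
    sig = lambda Z: 1.0 / (1.0 + np.exp(-Z))
    H_tr = sig(X_tr @ W + b)
    beta = np.linalg.pinv(H_tr) @ y_tr        # output weights, one-shot solve
    pred = sig(X_val @ W + b) @ beta
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))
```

Passing this function as the objective of an ABC-style optimizer closes the loop: each candidate food source is one ELM configuration, and greedy selection keeps the configuration with the lower validation error.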

E. THE ARCHITECTURE OF PROPOSED HYBRID MODEL
The architecture of the proposed hybrid model is displayed in FIGURE 4.

IV. EXPERIMENTS AND SIMULATION
The proposed algorithm and hybrid model have been implemented using MATLAB R2018b, on a system with an Intel(R) Core(TM) i3-4150 CPU @ 3.50 GHz, 4.00 GB of memory, and the WIN10 64-bit operating system.

A. EXPERIMENTS FOR PROPOSED ALGORITHM
To evaluate the performance of the improved ABC algorithm based on good point sets theory and the adaptive elite-opposition strategy, this study compares it with the traditional ABC algorithm and the Good-Point-Sets-optimized ABC algorithm (GPS-ABC). Ten benchmark functions are selected for the test, as listed in TABLE 1. Among them, f_1(x) to f_5(x) are bidimensional functions, and f_6(x) to f_10(x) are multi-dimensional functions with dimension d = 20. The parameters of the three ABC-based algorithms are set as follows: the population size M is set to 100, the maximum iteration t_max is set to 100, and the limit of food sources Limit is set to 50. In TABLE 2, Best, Worst, Mean, and Var record the best solution, the worst solution, the average solution, and the variance of the solutions over twenty independent experiments. FIGURE 5 shows the convergence curves of the three algorithms.
Take f_6(x) for example: compared with the original ABC, the best, worst, and average solutions of the twenty experiments are decreased by 3.76 × 10⁻², 7.62 × 10⁻¹, and 2.52 × 10⁻¹, respectively. Compared with the ABC improved only by GPS, the corresponding solutions are decreased by 1.64 × 10⁻², 1.45 × 10⁻¹, and 5.25 × 10⁻², respectively. The performance of the proposed algorithm is the best, which indicates the ability of GPS-EO-ABC to approach the optimal solutions. Besides, the variance of the solutions is decreased by 3.25 × 10⁻² and 1.75 × 10⁻³ relative to ABC and GPS-ABC, respectively, which means the proposed algorithm generates accurate solutions stably and robustly. This is because GPS-EO-ABC starts from a uniformly distributed initial population according to Good Point Sets theory and introduces the elite-opposition strategy with inertia weights, which keeps the artificial bees at an adaptive step length while searching for the optima in most cases. Thus, GPS-EO-ABC has dramatic global search facilities. Furthermore, in view of the convergence curves, whether on bidimensional or multi-dimensional functions, GPS-EO-ABC displays advantages in convergence speed and constancy. Although GPS-ABC brings some improvement over ABC, GPS-EO-ABC approaches the optimal solutions with a more stable convergence curve, which demonstrates the superiority of the adaptive EO strategy. Therefore, GPS-EO-ABC provides an outstanding way to enhance the local search ability.

B. SIMULATION FOR THE PROPOSED HYBRID MODEL
1) ERROR INDICES
The MAPE and RMSE are used as the error indices:

MAPE = (1/T) Σ_{t=1}^{T} |(Y_t − Y'_t)/Y_t| × 100%,
RMSE = sqrt((1/T) Σ_{t=1}^{T} (Y_t − Y'_t)²),

where Y is the observed value, Y' is the predicted value, t is the tth time period in the dataset, and T is the total number of time periods.
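The two error indices mentioned here, under their standard definitions, can be computed as follows (function names are ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error between observed y and predicted y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent; assumes no zero
    observations, since each error is scaled by the observed value."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)
```

RMSE penalizes large absolute misses, while MAPE is scale-free, which is why the paper reports both when comparing models across datasets of different magnitudes.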
2) DATA PREPROCESSING
As illustrated in FIGURE 7, decomposing the data with the EWT segments the information well in the Fourier spectrum. In this study, prior to forecasting by ARIMA and GPS-EO-ABC ELM, the financial time series data are decomposed into four levels; the four uncorrelated subseries are extracted and one residual remains after the extraction. The four subseries are reconstructed and the residual is discarded, which helps to denoise the original time series. Besides, the PACF values between each reconstructed series and its antecedent values at lag k, used to obtain the input variables for the individual models, are shown in FIGURE 8. It is clear that the PACF values for Data 2 and Data 3 are significant at lag 1 after the elimination of internal correlation, and the PACF plot with 95% confidence bands shows no further significant lags for Data 1, which demonstrates that the data at time t strongly affect the prediction one step ahead, while the effect of the actual data from time t − 2 to time t − k on the one-step-ahead prediction is not so obvious. Thus, the values at lag 1 are selected as the input variables for training.
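The lag selection via PACF can be sketched with successive autoregressive fits, one common way to estimate partial autocorrelations (a simplified sketch; `pacf` is our own name, and production code would add confidence bands):

```python
import numpy as np

def pacf(x, nlags):
    """Partial autocorrelation estimates: pacf[k-1] is the coefficient on
    lag k in a least-squares AR(k) fit, i.e. the correlation of x_t with
    x_{t-k} after removing the effect of intermediate lags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    out = []
    for k in range(1, nlags + 1):
        # design matrix: columns x_{t-1}, ..., x_{t-k}; target x_t
        A = np.column_stack([x[k - i - 1 : len(x) - i - 1] for i in range(k)])
        coef, *_ = np.linalg.lstsq(A, x[k:], rcond=None)
        out.append(coef[k - 1])
    return np.array(out)
```

For a series whose PACF cuts off after lag 1, as reported for the reconstructed data here, only the lag-1 value survives, motivating the choice of lag-1 inputs for the forecasting models.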

3) PARAMETER SETTING
For comparison, the ANN uses the back-propagation algorithm; the LM training function is exploited to speed up the training process, the number of training epochs is set to 100, the training target is set to 0.01, and the learning rate is set to 0.1. For the LSTM, the number of hidden units is set to 10, the maximum epoch is set to 100, and the initial learning rate is set to 0.1. Each experiment is repeated ten times, and the results with the best accuracy are presented.

4) RESULTS AND COMPARISONS
The forecasting results of the abovementioned individual ELM-related models for Data 1, Data 2, and Data 3 are listed in the corresponding tables, and FIGURE 10 shows the one-step-ahead forecasts generated by the proposed hybrid model. The first five lines in each table are the results of the individual models without data preprocessing by EWT; the proposed optimized ELM not only generates more accurate results than the single ELM but also performs well compared with the single LSTM and the basic ANN. For example, relative to the LSTM, the decreases in RMSE and MAPE for Data 1 are 0.3183 and 8.2254%; 0.1380 and 3.5334%; and 0.3312 and 8.5065% for the different steps ahead. Relative to the ANN, the decreases are 0.0052 and 0.1817%; 0.0313 and 0.6576%; and 0.0337 and 0.3199%.
The middle five lines are the hybrid models combining EWT with each individual model. Take the GPS-EO-ABC ELM models with and without EWT on Data 2 for instance: the error indices of the one-step-ahead, two-step-ahead, and three-step-ahead forecasts are decreased by 0.0363 and 0.0030%, increased by 0.0202 and 0.0015%, and decreased by 0.0349 and 0.0030%, respectively. The enhancement of accuracy by data decomposing and denoising through EWT is not remarkable at two steps ahead, which reflects that the EWT achieves more accurate results in most, but not all, environments.
The facilities of the EWT technique are also reflected in the last four lines of each table, which list the forecasts of the hybrid models combining the linear financial time series model ARIMA with the machine learning models ELM, LSTM, and GPS-EO-ABC ELM without EWT, as well as the ARIMA-ANN in which linear patterns are submitted to ARIMA and nonlinear patterns to the ANN. The results indicate that the hybrid-model methodology combining a linear time series model with a machine learning model is a relatively efficient way to improve the forecasts. Take the Data 1 GPS-EO-ABC ELM-related models as an example: compared with the individual model, the MAPE and RMSE of the AR GPS-EO-ABC ELM are changed by −1.1737% and 2.0861 for one step ahead, by −1.9139% and 2.0538 for two steps ahead, and by −2.6172% and 2.9124 for three steps ahead. Compared with the hybrid models combined with EWT, the corresponding indices are changed by −1.2098% and 1.2212; −1.9126% and 2.0983; and −2.6183% and 2.9532. The increases in MAPE indicate the influence of outliers on the forecasting results; after denoising by EWT, the outliers can be corrected. Therefore, the proposed hybrid model not only decreases the MAE and RMSE further but also decreases the MAPE by 0.6168%, 1.0187%, and 1.4363% for the different steps ahead, respectively, which shows that the proposed hybrid model has the most excellent forecasting performance on each dataset.

V. CONCLUSION
In this study, we proposed a hybrid model for financial time series forecasting that incorporates the EWT, an ELM optimized by GPS-EO-ABC, and the linear time series analysis model ARIMA. The EWT is used to decompose and denoise the data, aiming to eliminate the impact of noise and correct the outliers in the original time series, which makes the series more suitable for forecasting. The improvement of the ABC, based on GPS theory and the adaptive EO strategy, focuses on enhancing both the global and local search abilities of the original ABC algorithm, which further optimizes the performance of the ELM. The data reconstructed by EWT are submitted to ARIMA and the GPS-EO-ABC ELM model for forecasting respectively, and the independent results are combined by a weight-based procedure, with the aims of exploiting the extremely fast learning rate of the ELM, adequately capturing both the linear and nonlinear patterns in the time series data, and achieving more accurate results. We used the MAPE and RMSE as error indices to test each model. The results of the experiments and simulations show that the proposed algorithm is robust and convergent. The ELM improved by GPS-EO-ABC generates more accurate and stable forecasting results than the original ELM, ABC-ELM, GPS-ABC, the single LSTM, and the basic ANN. The hybrid model combining EWT, GPS-EO-ABC ELM, and ARIMA has the most excellent forecasting performance. The proposed methodology is not only effective but also efficient, and the information derived from the proposed model could satisfy the practical requirements of financial time-series analysis.

APPENDIXES
APPENDIX I.
In order to strengthen the theoretical basis of GPS-EO-ABC, this section theoretically analyzes its convergence by constructing an absorbing Markov model.
Definition 3: Given the bee colony X(t) of the GPS-EO-ABC algorithm at iteration t, let n arbitrary distinct iterations satisfy t_1 < t_2 < ... < t_n ∈ T. If
P[X(t_n) ≤ X_n | X(t_{n−1}) = X_{n−1}, ..., X(t_2) = X_2, X(t_1) = X_1] = P[X(t_n) ≤ X_n | X(t_{n−1}) = X_{n−1}],
then {X(t)}_{t=0}^∞ is a Markov process.

Theorem 4: The random process {X(t)}_{t=0}^∞ corresponding to the artificial bee colony sequence of the GPS-EO-ABC algorithm has the Markov property.

Proof: The optimization process of GPS-EO-ABC consists of four steps, each of which is an independent process; the optimal forager and food source at iteration t are determined only by those at iteration t−1. For the initialized population X(0), the initial solutions are located at good points according to Good Point Sets theory, so the initial distribution of the X(0) bees is uniform over the whole sample space with fixed values. The state of the X(t) bees is determined only by X(t−1) and has no relationship with X(t−2), ..., X(2), X(1). Moreover, {X(t)}_{t=0}^∞ is a random process. Therefore, {X(t)}_{t=0}^∞ has the Markov property.

Definition 5: Given the Markov process {X(t)}_{t=0}^∞ and the optimal solution space Z* ⊂ Z, if the probability P[X(t+1) ∉ Z* | X(t) ∈ Z*] = 0 is satisfied, then {X(t)}_{t=0}^∞ is an absorbing Markov process.

Theorem 6: The random process {X(t)}_{t=0}^∞ corresponding to the artificial bee colony sequence of the GPS-EO-ABC algorithm is an absorbing Markov process.

Proof: By Theorem 4, {X(t)}_{t=0}^∞ is a Markov process. After each iteration, the global optimal solution of the algorithm is either updated or unchanged, since the greedy selection retains the current best solution; hence, once the colony enters the optimal solution space Z*, it never leaves it, and the condition of Definition 5 holds.

APPENDIX II.
The theoretical analysis of the financial hybrid model combining EWT with ARIMA and GPS-EO-ABC ELM is based on convex quadratic programming and the Kuhn-Tucker (K-T) conditions.

Definition 9: Let X = (x_1, x_2, ..., x_N)^T be a point in N-dimensional Euclidean space. For the programming problem
min f(X), s.t. g_j(X) ≤ 0 (j = 1, ..., m), h_i(X) = 0 (i = 1, ..., l),
if f(X) and g_j(X) are convex functions and h_i(X) are linear functions, it is a convex programming problem.

Theorem 10: Let {X(t)}_{t=1}^N be a financial time series with N data points. The combined ARIMA and GPS-EO-ABC ELM model for time series forecasting is a convex programming problem.

Proof: The hybrid model chooses MAE, MAPE, and RMSE as the error indices for forecasting accuracy; smaller index values indicate better model performance. It is known that ARIMA is a linear function. The GPS-EO-ABC ELM can be written as a quadratic optimization problem of the form
min_β (1/2)‖β‖² + (C/2)‖Hβ − T‖²,
where H is the hidden-layer output matrix, β the output weights, T the target vector, and C a regularization constant. This shows that the optimization of the proposed model is a quadratic function, and thus the model combining EWT, ARIMA, and GPS-EO-ABC ELM is a convex programming problem.
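The quadratic least-squares view of ELM training can be made concrete with a minimal sketch: random, fixed input weights and biases, a sigmoid hidden layer, and output weights solved in closed form via the Moore-Penrose pseudo-inverse. This is the basic unregularized ELM; the GPS-EO-ABC optimization of the hidden-layer parameters is not shown, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_fit(X, T, n_hidden=20):
    """Train a basic ELM: the quadratic problem min ||H @ beta - T||^2
    is solved in closed form as beta = pinv(H) @ T."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (fixed)
    b = rng.normal(size=n_hidden)                # random hidden biases (fixed)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy regression: learn y = sin(x) on [0, pi].
X = np.linspace(0, np.pi, 100).reshape(-1, 1)
T = np.sin(X).ravel()
W, b, beta = elm_fit(X, T, n_hidden=25)
print(np.abs(elm_predict(X, W, b, beta) - T).max())  # small training error
```

Because only beta is learned and the problem is linear in beta, training reduces to a single pseudo-inverse, which is the source of ELM's extremely fast learning rate noted in the paper.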
Definition 11: Let X = (x_1, x_2, ..., x_N)^T be a point in N-dimensional Euclidean space E, X' a local optimum of the convex programming problem, J the set of subscripts of all constraints active at X', and P a feasible direction at X'. Assume f(X) is differentiable at X', g_j(X) is differentiable at X' for j ∈ J, and g_j(X) is continuous at X' for j ∉ J.
According to Definition 13, the proof is complete.
HE YU received the B.B.A. degree from Anhui Normal University (AHNU), in 2010, and the M.A.Sc. degree from Guangxi University (GXU), in 2013. He is currently working with the Anhui Vocational College of Press and Publishing. He has contributed to the design and verification of the proposed hybrid model. He has published over ten articles in journals. His research interests are in artificial intelligence applications in financial engineering, machine learning algorithms, neural networks and statistical learning, and convolutional neural networks.
LI JING MING received the Ph.D. degree in management science and engineering from the Hefei University of Technology (HFUT), in 2017. He is currently working with the School of Management Science and Engineering, Anhui University of Finance and Economics (AUFE). He has contributed to the design of parallel algorithms for the combination of swarm intelligence optimization algorithm and neural network. These algorithms are usually used for classification problems and are designed by parallel interaction. In the past five years, he has been involved in the algorithm flow of the combination of swarm intelligence optimization algorithm and machine learning method, which are used for solving the problems of agrometeorological disaster prediction, haze warning, and credit evaluation. He also developed an attribute selection method based on the combination of discrete Glowworm Swarm Optimization algorithm and fractal learning. His main research interests include artificial intelligence, machine learning, and financial engineering.
RUAN SUMEI received the Ph.D. degree in business administration from the Hefei University of Technology, Hefei, China, in 2014. She currently serves as a Professor and a Master Tutor at the Finance School, Anhui University of Finance and Economics (AUFE). She has published over 40 articles in journals. Her main research interests are bank management, corporate governance, and internet finance.
ZHAO SHUPING received the Ph.D. degree in business administration from the Hefei University of Technology, Hefei, China, in 2015. He is currently working on management information systems with the Hefei University of Technology. He has developed some machine learning algorithms for classification or prediction in recent years. He has published over ten articles in journals. His main research interests are decision-making theory and data mining.