An Adaptive SVR for High-Frequency Stock Price Forecasting


The workflow of Adaptive SVR for stock price forecasting.

Abstract:

To mitigate investment risk, stock price forecasting has attracted increasing attention in recent years. To address the discreteness, non-normality, and high noise of high-frequency data, a support vector machine regression (SVR) algorithm is introduced in this paper. However, the characteristics of the same stock in different periods, or of different stocks in the same period, differ significantly, so an SVR with fixed parameters struggles to keep up with the constantly changing data flow. To tackle this problem, an adaptive SVR is proposed for stock data at three time scales: daily data, 30-min data, and 5-min data. Experiments show that the improved SVR, whose learning parameters are dynamically optimized by particle swarm optimization, achieves better results than the compared methods, namely SVR and a back-propagation neural network.
Topic: Index Modulation Techniques for Next-Generation Wireless Networks
Published in: IEEE Access ( Volume: 6)
Page(s): 11397 - 11404
Date of Publication: 09 March 2018
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Stock price forecasting is one of the most challenging tasks faced by financial organizations and private investors. To mitigate risk and gain high investment returns, a large number of prediction models have been proposed. With the development of information technology, financial research has moved from the macro to the micro level, and its focus has shifted from low-frequency data to high-frequency data. Charles Sutcliffe [1] pointed out that high-frequency data offers many opportunities for more detailed analysis of market activity. Jacquier et al. [2] showed that financial high-frequency data is non-normal and proposed a stochastic volatility model to handle this non-normality. Jobson and Korkie [3] showed that high-frequency data is highly nonlinear. Subsequently, [4] demonstrated that the variability of intra-minute returns across stocks has a crude U-shaped pattern.

The above literature shows that high-frequency data is discrete, non-linear, and non-normally distributed. In response to these characteristics, many scholars have improved existing models for low-frequency financial data to adapt them to high-frequency data. Statistical models, such as the autoregressive integrated moving average [5], have been replaced by artificial intelligence (AI) methods, including artificial neural networks (ANNs) [6], the Genetic Algorithm [7], and the Hidden Markov Model [8]. In general, the back-propagation neural network (BPNN) is the most widely used. However, BPNN suffers from the risk of over-fitting, a large number of parameters, and difficulty in obtaining a stable solution [9]. To solve these problems, intensive research on improving ANNs has been presented, such as BPNN with a genetic algorithm [13], [14] and ANNs with metaheuristics [15]. Recently, support vector machine regression (SVR) has also received growing attention for nonlinear regression estimation. SVR generalizes better owing to the structural risk minimization principle, and it has been used successfully in many fields of time series prediction, such as traffic flow prediction [10] and financial time series forecasting [11], [12]. Nevertheless, it is difficult to set the parameters of SVR, and a model fitted with fixed parameters fails to keep up with changing data. Improved SVRs have therefore been proposed, such as SVR with a fuzzy model [16] and GA-SVR [17]. In this paper, the particle swarm optimization (PSO) algorithm is used to optimize the adaption and parameters of the SVR algorithm. Major contributions include:

  • We show that stock data have different distribution characteristics in different periods and in different stocks.

  • The parameters of the traditional SVM are fixed in advance and cannot adapt to the changing characteristics of financial high-frequency data.

  • We propose a novel algorithm to cope with changing financial high-frequency data, which applies an adaptive mechanism and the particle swarm optimization algorithm to adjust parameters dynamically.

  • We conduct comparative experiments at three time scales against traditional SVR and BPNN. The results show that the proposed method is more effective than the others.

The rest of the paper is organized as follows. Section II reviews the principle of SVR as described in prior work. Section III presents the proposed adaptive SVR. Section IV analyzes the problem, followed by our experiments in Section V. Section VI concludes the paper.

SECTION II.

Related Work

This section presents the principle of SVR as described in prior research. The formalization of SVR is described in detail, which helps in understanding the method proposed in this paper. SVM/R, originally proposed by Cortes and Vapnik [18], uses a linear model to implement nonlinear class boundaries by mapping the input vectors x into a high-dimensional feature space, where a non-linear model becomes a linear one. SVM/R therefore seeks the maximum-margin hyperplane in the new space. The classification/regression problem is thus transformed into a quadratic programming problem, which can be solved by a standard optimization program.

Given a data set (x_{i},y_{i}) , i=1,2,\ldots ,n , x_{i}\in R^{n} , y_{i}\in R , where x_{i} is the n -dimensional input (historical stock data in this paper), y_{i} is the output (the predicted value), and n is the number of training samples. The form of SVR can be represented as the following equation:\begin{equation} f(x)=w^{T}\varphi (x)+b,\quad \varphi :R^{n}\to F,~w\in F \end{equation}

where w is an element of the high-dimensional feature space, \varphi (x) is the non-linear mapping function, and b is the threshold. The regression problem is then to minimize the risk of f by minimizing the structural risk, as follows.\begin{equation} R_{reg}=\frac {1}{2}\parallel w\parallel ^{2}+C\times \frac {1}{n}\sum \nolimits _{i=1}^{n} {\vert y_{i}-f(x_{i})\vert _{\varepsilon }} \end{equation}
where \parallel w\parallel ^{2} is the description function, the averaged sum is the empirical risk R_{emp} , C is a penalty parameter, and \vert y_{i}-f(x_{i})\vert _{\varepsilon } is the \varepsilon -insensitive loss. The parameters C and \varepsilon affect the precision of SVR significantly. The \varepsilon -insensitive loss is defined as follows.\begin{equation} \vert y_{i}-f(x_{i})\vert _{\varepsilon }={\begin{cases} 0, &\quad \vert y_{i}-f(x_{i})\vert \le \varepsilon \\ \vert y_{i}-f(x_{i})\vert -\varepsilon , &\quad \text {otherwise} \\ \end{cases}} \end{equation}
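The two formulas above can be checked numerically with a minimal sketch. The function names are hypothetical, and the value of \parallel w\parallel ^{2} is passed in directly because computing w requires solving the full optimization problem, which is not shown here.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Return |y - f(x)|_eps: zero inside the eps-tube, linear outside it."""
    residual = abs(y_true - y_pred)
    return 0.0 if residual <= eps else residual - eps


def regularized_risk(w_norm_sq, targets, predictions, C, eps):
    """R_reg = 0.5 * ||w||^2 + C * (mean eps-insensitive loss)."""
    n = len(targets)
    empirical = sum(
        eps_insensitive_loss(y, f, eps) for y, f in zip(targets, predictions)
    ) / n
    return 0.5 * w_norm_sq + C * empirical
```

A larger C weights the empirical term more heavily, while a larger eps widens the tube in which errors cost nothing, which is why both parameters affect precision.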

By introducing nonnegative slack variables \xi and the Lagrange function, and using the Karush-Kuhn-Tucker (KKT) conditions, the function f(x) is described as follows:\begin{equation} f(x)=\sum \nolimits _{i=1}^{n} {(a_{i}-a_{i}^{\ast })k(x_{i},x)}+b \end{equation}

where a_{i} and a_{i}^{\ast } are Lagrange multipliers and k(x_{i},x) is the kernel function. Several kernels can be used to generate the inner products, and different kernels yield different nonlinear decision surfaces and hence different results, so the kernel should be chosen according to the features of the inputs. A common choice is the Gaussian radial basis function (RBF), which we also adopt in this paper:\begin{equation} f(x)=\sum \nolimits _{i=1}^{n} {(a_{i}-a_{i}^{\ast })\mathrm {exp}(-\parallel x_{i}-x\parallel ^{2}/2\sigma ^{2})}+b \end{equation}
where x_{i} is an element of the training set, x is an element of the test set, and \sigma is the parameter of the RBF.
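To make the kernel expansion concrete, the following sketch evaluates f(x) for given multipliers. The names rbf_kernel and svr_predict are hypothetical, and the coefficients (a_{i}-a_{i}^{\ast }) and b are assumed to come from an already solved quadratic program. If g used later follows the LIBSVM convention, it would equal 1/(2\sigma ^{2}); the paper does not state this relation, so treat it as an assumption.

```python
import math


def rbf_kernel(x_i, x, sigma):
    """Gaussian RBF: exp(-||x_i - x||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, train_points, dual_coefs, b, sigma):
    """Evaluate f(x) = sum_i (a_i - a_i*) k(x_i, x) + b.

    dual_coefs[i] stands for the difference (a_i - a_i*), assumed known.
    """
    weighted = sum(
        coef * rbf_kernel(x_i, x, sigma)
        for x_i, coef in zip(train_points, dual_coefs)
    )
    return weighted + b
```

Training points far from x contribute almost nothing because the kernel decays exponentially with squared distance, and \sigma controls how quickly that decay happens.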

However, the parameters of SVM are determined in advance, usually by cross validation, and an SVM with fixed parameters is not suited to constantly changing financial high-frequency data.

SECTION III.

The Proposed Adaptive SVR

In this section, we first introduce the overall architecture of the proposed adaptive SVR. Then, the details of the proposed method are described.

A. Overall Architecture

As time goes on, the price of a stock changes considerably, so an SVR with fixed parameters has difficulty adapting to the change. Fully online learning, on the other hand, wastes computing resources and cannot meet the required execution speed. We therefore set a threshold \theta : once the estimation error exceeds \theta , the model is retrained with the latest data. To improve the parameter optimization, we introduce particle swarm optimization (PSO) into SVR. The result is an adaptive SVR based on PSO. The adaption process is shown in Figure 1.

FIGURE 1. Process of adaption in SVR.
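The threshold-triggered retraining described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation: run_adaptive, fit, and window are hypothetical names, the error is taken as the absolute difference between prediction and observed value, and fit_mean is a trivial stand-in for the PSO-tuned SVR training step.

```python
def run_adaptive(samples, fit, theta, window):
    """Predict each sample in time order; retrain when the error exceeds theta.

    samples: list of (features, target) pairs in time order.
    fit:     callable(history) -> predict callable (stand-in for SVR training).
    window:  number of most recent samples used for each (re)training.
    """
    history = list(samples[:window])   # initial training data
    predict = fit(history)
    outputs, retrains = [], 0
    for x, y in samples[window:]:
        y_hat = predict(x)
        outputs.append(y_hat)
        history.append((x, y))         # the true value becomes available
        if abs(y - y_hat) > theta:     # assumed error measure
            predict = fit(history[-window:])
            retrains += 1
    return outputs, retrains


def fit_mean(history):
    """Hypothetical stand-in model: always predicts the mean training target."""
    mean = sum(y for _, y in history) / len(history)
    return lambda x: mean
```

Retraining only on threshold violations keeps the cost far below that of refitting at every step, which is the trade-off the paragraph above motivates.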

B. The Adaptive SVR

1) The Optimization Algorithm Used in SVR

Particle swarm optimization (PSO), originally proposed by Eberhart and Kennedy [19], is an evolutionary computation technique inspired by the foraging behavior of bird flocks. The basic idea of PSO is to find the optimal solution through cooperation and information sharing among individuals. It is simple, easy to implement, and has few parameters. PSO has been widely used in optimization, neural network training, fuzzy system control, and so on.

In PSO, each potential solution is called a particle. The position of the i-th particle in the n-dimensional space, its flight velocity, and its individual optimum are expressed as follows:\begin{align} x_{i}=&(x_{i,1},x_{i,2},\cdots ,x_{i,n})\in R^{n} \\ v_{i}=&(v_{i,1},v_{i,2},\cdots ,v_{i,n})\in R^{n} \\ p_{i}=&(p_{i,1},p_{i,2},\cdots ,p_{i,n})\in R^{n} \end{align}


After obtaining the individual optimum p_{i} and the global optimum p_{g} , each particle updates its velocity and position according to the following formulas:\begin{align} v_{i}(t+1)=&wv_{i}(t)+c_{1}r_{1}(p_{i}-x_{i}(t))+c_{2}r_{2}(p_{g}-x_{i}(t)) \\ x_{i}(t+1)=&x_{i}(t)+v_{i}(t+1) \end{align}

where i=1,2,\cdots ,N , N is the number of particles, r_{1} and r_{2} are random numbers in [0, 1], c_{1} and c_{2} are non-negative acceleration factors, and w is the inertia weight.
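A single update of one particle can be worked through in a few lines. The function name pso_step is hypothetical, and r_{1} and r_{2} are passed in explicitly (rather than drawn at random) so that the arithmetic is reproducible.

```python
def pso_step(x, v, p_best, g_best, w, c1, c2, r1, r2):
    """Apply one velocity and position update to a single particle.

    x, v:    current position and velocity (lists of equal length).
    p_best:  this particle's best position so far (p_i).
    g_best:  the swarm's best position so far (p_g).
    """
    new_v = [
        w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        for xi, vi, pi, gi in zip(x, v, p_best, g_best)
    ]
    new_x = [xi + nvi for xi, nvi in zip(x, new_v)]
    return new_x, new_v
```

For example, with x = [1.0, 2.0], v = [0.5, -0.5], p_best = [2.0, 2.0], g_best = [3.0, 1.0], w = 0.5, c1 = c2 = 1.0 and r1 = r2 = 0.5, the first velocity component is 0.25 + 0.5 + 1.0 = 1.75, so the first coordinate moves from 1.0 to 2.75.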

2) The Adaptive SVR

In our proposed method, PSO is used to optimize the parameters C and g of SVR, so each particle position x_{i} has two dimensions, C and g, and v_{i} denotes the velocity of the i -th particle. The data are divided into a training set and a validation set, and we choose the parameters that give the best prediction accuracy on the validation set. With c_{m} ( m=1,\ldots ,M ) and g_{n} ( n=1,\ldots ,N ) denoting the candidate values of C and g, the cost function is shown below.\begin{equation} (C^{\ast },g^{\ast })=\mathop {\arg \min }\limits _{c_{m},g_{n}}\sum \nolimits _{i} {Loss(y_{i}-y_{i}^{\prime }(c_{m},g_{n}))} \end{equation}

where y_{i} denotes the true value of the i -th sample and y_{i}^{\prime }(c_{m},g_{n}) is the corresponding prediction, which is a function of C and g. The optimization process terminates once the prediction model meets the required accuracy.
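The search over (C, g) can be sketched as a small PSO loop. Everything named here is an assumption for illustration: pso_search is a hypothetical helper, positions are clamped to the search box (the paper does not say how bounds are enforced), the default w, c1, and c2 are generic values rather than the paper's settings, and toy_validation_loss is a stand-in for training an SVR and measuring its validation error.

```python
import random


def pso_search(loss, bounds, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize loss(*position) inside the box given by bounds with basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                 # individual optima
    p_loss = [loss(*x) for x in xs]
    g_idx = min(range(n_particles), key=p_loss.__getitem__)
    g_best, g_loss = p_best[g_idx][:], p_loss[g_idx]   # global optimum
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (p_best[i][d] - xs[i][d])
                            + c2 * r2 * (g_best[d] - xs[i][d]))
                # Clamp to the search box (an assumption of this sketch).
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            cur = loss(*xs[i])
            if cur < p_loss[i]:
                p_best[i], p_loss[i] = xs[i][:], cur
                if cur < g_loss:
                    g_best, g_loss = xs[i][:], cur
    return g_best, g_loss


def toy_validation_loss(C, g):
    """Hypothetical stand-in for the validation loss of an SVR with (C, g)."""
    return (C - 10.0) ** 2 + (g - 1.0) ** 2


best, best_loss = pso_search(toy_validation_loss, bounds=[(1.0, 100.0), (0.1, 10.0)])
```

In a real run, the loss callable would fit an SVR on the training set with the candidate (C, g) and return its error on the validation set, which makes each evaluation far more expensive than in this toy example.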

SECTION IV.

Problem Analysis

Our problem is to improve the accuracy of financial high-frequency data prediction. In this section, we focus on how to use SVR for forecasting. To forecast stock prices accurately with SVR, we studied the impacts of the penalty coefficient C and the distribution parameter g on SVR, which are two important factors for prediction accuracy. We randomly selected five stocks on the Shanghai Stock Exchange as evidence. The results show that the effect of C and g on the prediction results differs across periods and across stocks.

A. The Effect of C in Different Stocks or Different Periods

We randomly selected five stocks on the Shanghai Stock Exchange. In the analysis, we employ RMSE, MAPE, and MAD to evaluate the C of SVR. Without loss of generality, we consider three time scales: daily data, 30-minute data, and 5-minute data. We observed the prediction accuracy as C ranged from 10^{-5} to 10^{4}.

First, we study the relationship between different stocks and the C value in the same period. The three indicators (RMSE, MAPE, MAD) of the five stocks (SH06, SH16, SH26, SH36, SH56, where SH06 is short for SH600006) are shown in Figure 2. The optimal solutions for different stocks correspond to different C values, although the curves tend to move up and down together in the same period. The experiments on 30-minute data and 5-minute data lead to the same conclusion, so we only show the result for the daily data.

FIGURE 2. The effect of C in different stocks.

Subsequently, we validated our assumption that the same stock has different data distributions at different times, so that different C values are needed to obtain the best prediction. We selected three stocks (SH06, SH16, SH26) and chose three periods of each stock (each from January to March) to investigate. The detailed RMSE results for the three stocks are given in Table 1, where S06–15 means the stock SH06 in the year 2015. Table 1 reveals that the prediction performance of SVR is better when the C value is between 0 and 100, which is the range used for particle swarm optimization later. It can also be seen that the stock SH06 obtained the best result when C was 0 in 2015, 10 in 2016, and 100 in 2017. The same holds for the other two stocks. Hence, we conclude that a stock needs different C values for SVR in different periods to achieve better performance.

TABLE 1. The Effect of C in a Stock of Different Periods

B. The Effect of G in Different Stocks or Different Periods

g also strongly affects the forecasting ability of SVR. We study the effects of g with the same analysis methods as above. First, we study the effects of g on three time scales in the same period: daily data, 30-minute data, and 5-minute data. Then, we analyze the change of g for the same stock in different periods. To find how g behaves in SVR, we evaluated the error of the SVR model with g ranging from 10^{-5} to 10^{4}.

Figure 3 shows how the SVR prediction results for daily data change with g. It is clear from Fig. 3 that the prediction results are better when the g value is between 10^{-2} and 10^{2}. However, the best g values differ between stocks. Experiments on 30-minute data and 5-minute data also support our conjecture that the distribution of stock data is inconsistent at different times.

FIGURE 3. The effect of g in different stocks or different periods.

FIGURE 4. The result comparison of SH56 stock in 5-min data.

From the above analysis, we conclude that the impacts of C and g on prediction results differ across stocks and across periods. Therefore, an SVR model with fixed parameters is not applicable to different stocks or to different periods of the same stock. To solve this problem, we propose an adaptive SVR based on particle swarm optimization, which improves the versatility and efficiency of the program. The adaptive SVR applies a feedback mechanism to adjust the parameters of SVR dynamically: when the model error exceeds a threshold \theta , the SVR model is retrained.

SECTION V.

Experiments and Results

A. Dataset and Data Modeling

1) Dataset

The data set is from the Shanghai Stock Exchange and includes SH600006, SH600016, SH600026, SH600036, and SH600056; it can be downloaded from http://www.wstock.net. We choose three time scales (or frequencies) to validate our approach: daily data, 30-minute high-frequency data, and 5-minute high-frequency data. The daily data runs from the listing date to March 31, 2017. The 30-minute data runs from January 1, 2017 to March 31, 2017, and the 5-minute data from February 1, 2017 to February 28, 2017. We take eighty percent of the data as the training set and twenty percent as the test set in our experiment.

2) Data Modeling

Let X=\{x_{1},x_{2},\cdots ,x_{n}\} denote the time series used as the training set, where x_{i}=(x_{i}^{1},\cdots ,x_{i}^{5})\in R^{5} is the i-th data point and n is the number of training samples. Each x_{i} is composed of 5 elements: the open, high, low, and close prices and the trading volume. Y=\{x_{2}^{1},x_{3}^{1},\cdots ,x_{n+1}^{1}\} is the target corresponding to X . That is, we use the trading information of the previous time step to forecast the open price of the next one: (x_{i}^{1},\cdots ,x_{i}^{5})\to x_{i+1}^{1} . To fit the SVR model, the data are preprocessed by normalization: each column is linearly transformed by dividing by its maximum value, so that the data are mapped to [0, 1].
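The data modeling above can be sketched in plain Python. The helper names are hypothetical, each bar is assumed to be a list [open, high, low, close, volume] in time order, and all values are assumed to be positive so that dividing by the column maximum maps them into (0, 1].

```python
def max_normalize(rows):
    """Divide every column by its maximum value (assumes positive data)."""
    col_max = [max(col) for col in zip(*rows)]
    return [[value / m for value, m in zip(row, col_max)] for row in rows]


def make_pairs(rows):
    """Build (X, Y): the bar at step i is the input, the open at step i+1 the target."""
    X = rows[:-1]
    Y = [row[0] for row in rows[1:]]   # index 0 is the open price
    return X, Y


bars = [
    [10.0, 11.0, 9.0, 10.5, 1000.0],
    [10.5, 12.0, 10.0, 11.0, 2000.0],
    [11.0, 11.5, 10.5, 11.2, 1500.0],
]
X, Y = make_pairs(max_normalize(bars))
```

With n+1 bars this yields n input-target pairs, matching the index ranges of X and Y above.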

B. Experiment Setup and Comparison

1) Experiment Setup

Good parameter settings lead to better performance. Three models have parameters that need to be set: traditional SVR, BPNN, and adaptive SVR. We describe the parameter settings used in this experiment so that the results can be reproduced. In the traditional SVR model, the parameters are C=2.2 and g=2.8 , which are the best settings found by PSO optimization. In the BPNN model, the parameters are epochs=1000 , lr=0.01 , and goal=0.001 , where epochs denotes the maximum number of iterations, lr denotes the learning rate, and goal represents the target training error. The weights of BPNN are randomly initialized.

We use the parameter settings of the traditional SVR above as the initial values of the adaptive SVR. The threshold of the adaptive SVR is \theta =0.05 for daily data and \theta =0.02 for the other data. C ranges from 1 to 100, and g ranges from 0.1 to 10. Following the method described in Section III-B, the PSO parameters are set as follows: the inertia weight is w=0.7 and the acceleration factors are c_{1}=0.01 and c_{2}=0.01 . The maximum number of iterations is iter=400 , and the initial population size is 40.

2) Evaluation Metrics

To evaluate the proposed method, we apply three widely used quality indexes: Root Mean Square Error (RMSE), Mean Absolute Percent Error (MAPE), and Mean Absolute Deviation (MAD). The formulas are as follows:\begin{align} RMSE=&\sqrt {\frac {1}{N}\sum \nolimits _{i=1}^{N} (observed_{i}-predicted_{i})^{2}} \\ MAPE=&\frac {100}{N}\sum \nolimits _{i=1}^{N} \left |{\frac {observed_{i}-predicted_{i}}{observed_{i}}}\right | \\ MAD=&\frac {1}{N}\sum \nolimits _{i=1}^{N} \left |{observed_{i}-predicted_{i}}\right | \end{align}

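The three metrics translate directly into code. This is a minimal sketch with hypothetical function names; it assumes equal-length sequences and, for MAPE, that no observed value is zero.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percent error, in percent (observed values must be nonzero)."""
    n = len(observed)
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) * 100.0 / n


def mad(observed, predicted):
    """Mean absolute deviation."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n
```

For observed values [10.0, 20.0] and predictions [11.0, 18.0], the absolute errors are 1 and 2, so MAD is 1.5, MAPE is (10% + 10%) / 2 = 10%, and RMSE is the square root of 2.5.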

3) Performance Comparison

To verify the effectiveness and robustness of the proposed method, we performed experiments on different stocks and different time scales. The BPNN model adopted by [11] and the traditional SVR model, both commonly used for prediction, serve as comparison methods. In the following, we discuss the experimental results in detail.

The first experiment is performed on the 5-minute data set. We compared the performance of BPNN, traditional SVR, and adaptive SVR using RMSE, MAPE, and MAD. The detailed results are shown in Table 2. AD-SVR gives the best results, followed by BPNN, and SVR is the worst. Table 2 shows that the forecast results of all five stocks are greatly improved. The smallest improvement is for SH26, whose RMSE, MAPE, and MAD are 11% lower than those of traditional SVR, and AD-SVR also outperforms BPNN on SH26. The largest improvement is for SH56, whose error is reduced by about 60% compared with traditional SVR and by about 45% compared with BPNN.

TABLE 2. The Results for 5 Minutes Data

To compare the results intuitively, we show one of our experimental results in Figure 4. This figure illustrates the predictions of SVR, BPNN, and AD-SVR together with the true values, which range from 21.66 to 22.47. The result of AD-SVR is closer to the true value than those of SVR and BPNN, especially where peaks appear. As time goes on, the prediction of SVR becomes worse and worse. As discussed in Section IV, the reason is that changes in the data distribution degrade the prediction performance of SVR.

The second experiment is conducted on the 30-minute data set. The errors of SVR, BPNN, and adaptive SVR are shown in Table 3. Compared with traditional SVR, the improvements are more pronounced than on the 5-minute data set. The smallest improvement is for SH06, whose error is reduced by about 15%, and the largest is for SH56, whose RMSE, MAPE, and MAD are 79% lower than those of traditional SVR. For BPNN, it is difficult to draw conclusions from this experiment: its results on SH16, SH26, and SH36 are better than those of traditional SVR, but its result on SH56 is worse. This could be due to BPNN being affected by its initialization parameters. AD_SVR outperforms BPNN on every stock except SH36, and it is more robust than BPNN because its parameters are not random.

TABLE 3. The Results for 30 Minutes Data

Similarly, we show one of our results in Figure 5. Compared with the 5-minute data, the 30-minute data fluctuates more, ranging from 21.63 to 23.59. As Figure 5 shows, the first 25 predictions of SVR, BPNN, and AD_SVR are almost the same as the true value. As time goes by, the predictions of SVR and BPNN become worse and worse, whereas the prediction of AD_SVR keeps moving up and down around the true value: the red line and the black line are almost coincident.

FIGURE 5. The result comparison of SH56 stock in 30-min data.

The third experiment is performed on the daily data set, and the results are given in Table 4. They are similar to those of the previous experiments. Compared with SVR, the errors of AD_SVR are reduced by 8% to 41% across stocks. It should be noted that the training set of adaptive SVR is much smaller than that of traditional SVR, in order to improve the speed of the program. The results of BPNN are better than those of SVR on SH06 and SH26, and worse on SH36 and SH56. AD_SVR, however, outperforms both BPNN and SVR.

TABLE 4. The Results for Daily Data

Figure 6 shows the results of SH56 on daily data, where the price ranges from 9.91 to 25.10. The behavior is consistent with the experiments above: the results of SVR and BPNN get worse over time, and several peaks of the green and blue lines deviate greatly from the true value.

FIGURE 6. The result comparison of SH56 stock in daily data.

The experiments show that BPNN is easily affected by its randomly initialized parameters, so it is difficult for it to perform well on all stocks, and that traditional SVR is not suited to changing stock data. AD_SVR, however, is robust to changing conditions (different stocks, different periods, different time scales, and so on). The experimental results indicate the effectiveness and reliability of the AD_SVR model.

SECTION VI.

Conclusion

In this paper, an adaptive SVR based on PSO is proposed to enhance the versatility of the model and to avoid manual tuning of the SVR parameters. We tested the proposed algorithm at three time scales on stocks of the Shanghai Stock Market. The results show that the adaptive SVR has better adaptability and better prediction results than the traditional SVR and BPNN. The impact of historical models is not taken into account in this experiment, so a weighted adaptive SVR will be introduced in our future work.

ACKNOWLEDGMENT

Yanhui Guo and Siming Han contributed equally to this work.
