Twin Least Squares Support Vector Regression of Heteroscedastic Gaussian Noise Model

The training algorithm of twin least squares support vector regression (TLSSVR) transforms the inequality constraints in a pair of quadratic programming problems into equality constraints, which gives it a faster computational speed. Classical least squares support vector regression (LSSVR) assumes that the noise is Gaussian with zero mean and homoscedastic variance. However, the noise in some practical applications follows a Gaussian distribution with zero mean and heteroscedastic variance. In this paper, LSSVR is combined with twin hyperplanes, and an optimal loss function for Gaussian noise with heteroscedasticity is introduced; the resulting model is called twin least squares support vector regression with heteroscedastic Gaussian noise (TLSSVR-HGN). Like LSSVR, TLSSVR-HGN lacks sparsity, so to analyze the generalization ability of the proposed model, a sparse version, STLSSVR-HGN, is derived with a simple mechanism. The proposed models are verified on an artificial data set, several benchmark data sets and real wind speed data. The experimental results show that TLSSVR-HGN outperforms the other algorithms.


I. INTRODUCTION
Support vector machine (SVM) is a statistical learning method for solving classification and regression problems [1]-[3], and it shows advantages in various fields due to its good generalization ability. For example, it has been used in pattern recognition [4], feature selection [5], function approximation [6] and financial regression [7]. Compared with other machine learning algorithms, SVM has unique characteristics; however, it also suffers from high computational complexity. To address this problem, many researchers have established effective algorithms (e.g., SMO [8], SVMTorch [9]) and developed usable software tools (LIBSVM [10]). In addition, simpler models have been built. For example, LSSVM [11] is faster than classical SVM in terms of computational complexity because it only solves a system of linear equations, and ν-support vector machines [12] provide parameter-insensitive margin models with arbitrary shapes. Mangasarian and Wild [13] proposed multisurface proximal support vector machine classification via generalized eigenvalues (GEPSVM). Building on GEPSVM, Jayadeva et al. [14] proposed a new machine learning method for binary classification problems, called the twin support vector machine (TWSVM). On this basis, Ding et al. [15] categorized TWSVM variants and analyzed them from basic theoretical and geometrical viewpoints. TWSVM produces two non-parallel hyperplanes; each hyperplane is as close as possible to one of the two classes and, to some extent, away from the other. Compared with classical SVM, TWSVM has faster computing speed and better generalization ability.
Thanks to these advantages, TWSVM has been applied in many fields, including identity recognition [16], feature selection [17], handwritten signature verification [18], wind speed prediction [19], social lending [20] and food classification [21]. To improve multi-class classification, several methods have been proposed, including multi-class SVM using second-order cone constraints [22] and multi-class support vector machine classifiers using intrinsic and penalty graphs [23].
As for support vector regression (SVR), it solves the regression problem, and many improved algorithms exist. In particular, Suykens et al. [24] proposed least squares support vector regression (LSSVR) by introducing the least squares method. In LSSVR, the inequality constraints of SVR are transformed into equality constraints; this strategy reduces the complexity of SVR and thus improves the learning speed. The core of classical SVR is the minimization of a convex quadratic function subject to a pair of linear inequality constraints over all training samples, which leads to its slow learning speed. Peng [25] therefore proposed twin support vector regression (TSVR) as an effective regressor, defining a pair of ε-insensitive bound functions obtained by solving two related SVM-type problems. Zhao et al. [26] combined the notion of twin hyperplanes with the fast speed of least squares support vector regression (LSSVR), yielding a new regressor termed twin least squares support vector regression (TLSSVR). Parastalooi et al. [27] proposed modified twin support vector regression (MTSVR) for data regression. Peng [28] proposed solving the pair of quadratic programming problems (QPPs) of TSVR directly in the primal space (PTSVR) via a series of linear equations. Shao et al. [29] solved two related SVM-type problems to determine a pair of ε-insensitive proximal functions. Khemchandani et al. [30], inspired by Bennett and Bi's work on SVM [31], proposed a new regression model called TWSVR. Moreover, many other researchers have also done a lot of work in this direction [32]-[38].
In addition, TSVR only solves a pair of smaller quadratic programming problems, unlike classical SVR. As a result, the number of constraints in each quadratic programming problem is smaller than in classical SVR, so training is faster. However, TLSSVR transforms the inequality-constrained problem into an equality-constrained problem, which yields relatively low accuracy, so it has some limitations. In order to improve the accuracy of TLSSVR, we propose a new optimization algorithm based on twin support vector regression (TSVR) and twin least squares support vector regression (TLSSVR) to make up for some shortcomings of TLSSVR. It combines TSVR and LSSVR and introduces heteroscedastic or homoscedastic Gaussian noise models; the resulting regressors are called twin least squares support vector regression with heteroscedastic Gaussian noise (TLSSVR-HGN) and with homoscedastic Gaussian noise (TLSSVR-GN), respectively. In addition, this article introduces a sparse version of TLSSVR-HGN (STLSSVR-HGN). On real data, we use a wind speed data set, and the experimental results verify that the accuracy of TLSSVR-HGN is higher than that of STLSSVR-HGN, TLSSVR-GN, TLSSVR, LSSVR and SVR. At the same time, TLSSVR-HGN has better generalization ability than the other five algorithms. We have done similar experiments on other data sets, and the feasibility and effectiveness of the proposed methods are validated.
The rest of this article is structured as follows. The second section mainly explains some symbols used and the derivation of TSVR and TLSSVR formulas. The third section mainly introduces the TLSSVR-HGN modeling process, the TLSSVR-HGN algorithm and the STLSSVR-HGN algorithm. The fourth section mainly describes the experimental results and analysis on the artificial data set, several UCI data sets and wind speed data set. The last section summarizes this work.

II. RELATED WORK
In this section, we review the classical TSVR and TLSSVR based on Gaussian noise distribution.
Assume the noise in X_l is additive, namely y_i = f(x_i) + ξ_i (i = 1, ..., l), (1) where each ξ_i is random, under the assumption that the observations are drawn i.i.d. (independently and identically distributed) from a distribution P(ξ_i) with mean µ and standard deviation σ_i (i = 1, ..., l).

A. TWIN SUPPORT VECTOR REGRESSION
In order to improve the computational speed and generalization performance of normal support vector regression (SVR), in the spirit of the twin support vector machine (TWSVM) [14], Peng [25] proposed an efficient regressor, termed twin support vector regression (TSVR), which is obtained by solving the following two quadratic programming problems:

min_{ω_1, b_1, ξ} (1/2)‖y − eε_1 − (K(X, X^T)ω_1 + eb_1)‖² + C_1 e^T ξ
s.t. y − (K(X, X^T)ω_1 + eb_1) ≥ eε_1 − ξ, ξ ≥ 0, (2)

and

min_{ω_2, b_2, η} (1/2)‖y + eε_2 − (K(X, X^T)ω_2 + eb_2)‖² + C_2 e^T η
s.t. (K(X, X^T)ω_2 + eb_2) − y ≥ eε_2 − η, η ≥ 0, (3)

where K(X, X^T) represents the kernel matrix whose elements are k_ij = k(x_i, x_j), x_j^T is the j-th row vector of X, e is a vector of ones of proper size, ε_1 ≥ 0 and ε_2 ≥ 0 are threshold parameters, and ξ and η are slack vectors. Here, k(x_i, x_j) = φ(x_i)^T φ(x_j) is a kernel function which gives the inner product of φ(x_i) and φ(x_j) in the high-dimensional feature space, where φ(·) is usually a nonlinear mapping that maps the input x_i into a high-dimensional feature space φ(x_i).
The dual problems of (2) and (3) are quadratic programs in the Lagrange multiplier vectors α and β, where H = [K(X, X^T), e], g = y − eε_1 and h = y + eε_2. When the optimal solutions α of (2) and β of (3) are obtained, the optimal orientation vectors and biases follow as

[ω_1^T, b_1]^T = (H^T H)^{-1} H^T (g − α), [ω_2^T, b_2]^T = (H^T H)^{-1} H^T (h + β),

and the two functions f_1(x) and f_2(x) are obtained as

f_1(x) = K(x^T, X^T)ω_1 + b_1, f_2(x) = K(x^T, X^T)ω_2 + b_2.

Each one determines the ε-insensitive lower-bound or upper-bound regressor. Hence, the predictor of TSVR is constructed as f(x) = (1/2)(f_1(x) + f_2(x)). In Fig. 1, Down-sinc(X), Up-sinc(X) and sinc(X) correspond to f_1(x), f_2(x) and f(x), respectively.

B. TWIN LEAST SQUARES SUPPORT VECTOR REGRESSION
Unlike TSVR in (2) and (3), where inequality constraints are used, here equality constraints are utilized in order to obtain twin least squares support vector regression (TLSSVR) [26], giving the primal problems (6) and (7), where ω_1, ω_2 are the normal vectors of the hyperplanes, b_1, b_2 are the offsets, ξ_i, ξ_i* (i = 1, ..., l) represent the prediction residuals, and C_1, C_2 are the regularization parameters.
By the Lagrangian multiplier method, we obtain the solution of Eq. (6). After obtaining the solution α of (6), for any new testing sample x ∈ R^L we have the lower-bound prediction function of TLSSVR; analogously, we have the upper-bound prediction function of TLSSVR by (7). In LSSVR, the Lagrange multiplier vector α is proportional to the error vector ξ on the training set, indicating that LSSVR loses sparsity. In addition, compared with SVR, LSSVR has almost no robustness, which shows that LSSVR also has some shortcomings relative to classical SVR, and new methods are needed to make up for them. Moreover, the TSVR and TLSSVR models do not consider the heteroscedasticity of D_l. In the following section, we construct a new twin least squares support vector regression for heteroscedastic tasks.
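As a concrete illustration of the twin least-squares construction, two LSSVR-style linear systems can be solved, one for the ε-shifted lower bound and one for the upper bound, and their average gives the final predictor. This is only a minimal sketch, not the exact systems of [26]; the function names, the bordered-system form and the Gaussian-kernel parameter `gamma` are our assumptions.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvr_style_solve(K, t, C):
    # Solve the LSSVR linear system for targets t:
    # [[0, e^T], [e, K + I/C]] [b; alpha] = [0; t]
    l = len(t)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(l) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], t)))
    return sol[0], sol[1:]          # bias b, multiplier vector alpha

def tlssvr_fit_predict(X, y, Xtest, C=10.0, eps=0.1, gamma=1.0):
    K = gaussian_kernel(X, X, gamma)
    Kt = gaussian_kernel(Xtest, X, gamma)
    # lower-bound regressor fitted to y - eps, upper-bound to y + eps
    b1, a1 = lssvr_style_solve(K, y - eps, C)
    b2, a2 = lssvr_style_solve(K, y + eps, C)
    f1 = Kt @ a1 + b1
    f2 = Kt @ a2 + b2
    return 0.5 * (f1 + f2)          # final twin predictor
```

Because each solve is linear in its targets, averaging the two shifted regressors recovers a fit to the unshifted targets, which is why the twin construction loses nothing in this simplified form.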

III. TWIN LEAST SQUARES SUPPORT VECTOR REGRESSION OF HETEROSCEDASTIC GAUSSIAN NOISE MODEL AND ALGORITHM
As a function approximation machine, the objective is to estimate an unknown function f(x) from the data set X_f ⊆ X_l. In general, the optimal loss function in the maximum likelihood sense [39], [40] is c(ξ_i) = −ln p(ξ_i), where p(ξ_i) is the probability density of ξ_i.
We assume that the noise in Equation (1) is Gaussian with zero mean and heteroscedastic variance σ_i². In this case, the optimal loss function corresponding to Gaussian noise with heteroscedasticity is c(ξ_i) = ξ_i²/(2σ_i²) (i = 1, ..., l). If the noise in Equation (1) is Gaussian with zero mean and homoscedastic variance σ², the loss function degenerates to c(ξ_i) = (1/2)ξ_i² (i = 1, ..., l), which is the squared loss function.
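The loss above can be checked numerically: up to an additive constant that does not depend on ξ_i, c(ξ_i) = ξ_i²/(2σ_i²) is exactly the negative log-density of a zero-mean Gaussian with variance σ_i². A small NumPy sketch (the function names are ours):

```python
import numpy as np

def heteroscedastic_loss(xi, sigma):
    # c(xi_i) = xi_i^2 / (2 sigma_i^2): the maximum-likelihood-optimal
    # loss for zero-mean Gaussian noise with per-sample variance
    xi = np.asarray(xi, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return xi ** 2 / (2.0 * sigma ** 2)

def neg_log_gaussian(xi, sigma):
    # full negative log-density of N(0, sigma^2) evaluated at xi
    xi = np.asarray(xi, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.log(2.0 * np.pi * sigma ** 2) + xi ** 2 / (2.0 * sigma ** 2)
```

With sigma fixed to 1 the loss reduces to ξ²/2, the squared loss mentioned above.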

A. TWIN LEAST SQUARES SUPPORT VECTOR REGRESSION OF HETEROSCEDASTIC GAUSSIAN NOISE MODEL
Given the training samples X_f, we construct a lower-bound nonlinear regression function f_1(x) = ω_1^T φ(x) + b_1. Following the idea of forming two hyperplanes in TSVR, the model of twin least squares support vector regression of the heteroscedastic Gaussian noise model (TLSSVR-HGN) can be formally defined as

min_{ω_1, b_1, ξ} (1/2)‖ω_1‖² + C_1 Σ_{i=1}^l ξ_i²/(2σ_i²)
s.t. y_i = ω_1^T φ(x_i) + b_1 + ε_1 + ξ_i, i = 1, ..., l, (11)

where σ_i² (i = 1, 2, ..., l) are the variances, ε_1 ≥ 0 is a threshold parameter, C_1 > 0 is a penalty parameter, and ξ_i (i = 1, ..., l) are random variables.
Analogously, an upper-bound nonlinear regression function f_2(x) = ω_2^T φ(x) + b_2 is constructed. The model of TLSSVR-HGN for the upper bound can be formally defined as

min_{ω_2, b_2, ξ*} (1/2)‖ω_2‖² + C_2 Σ_{i=1}^l ξ_i*²/(2σ_i*²)
s.t. y_i = ω_2^T φ(x_i) + b_2 − ε_2 − ξ_i*, i = 1, ..., l, (12)

where σ_i*² (i = 1, 2, ..., l) are the variances, ε_2 ≥ 0 is a threshold parameter, C_2 > 0 is a penalty parameter, and ξ_i* (i = 1, ..., l) are random variables. We introduce the Lagrangian functional L(ω_1, b_1, α, ξ) for problem (11). To minimize L(ω_1, b_1, α, ξ), we take its partial derivatives with respect to ω_1, b_1 and ξ, respectively. From the KKT (Karush-Kuhn-Tucker) conditions we obtain ∂L/∂ω_1 = 0, ∂L/∂b_1 = 0 and ∂L/∂ξ = 0. Substituting these extremal conditions back into L(ω_1, b_1, α, ξ) and maximizing with respect to α, we obtain the dual problem (17) of the primal problem (11), where C_1 > 0 is the penalty parameter. After eliminating ω_1 and ξ, we obtain the solution. Similarly, we obtain the dual problem (19) of the primal problem (12), where C_2 > 0 is the penalty parameter, and after eliminating ω_2 and ξ* we obtain its solution. The lower-bound nonlinear regression function f_1(x) = ω_1^T φ(x) + b_1 and the upper-bound function f_2(x) = ω_2^T φ(x) + b_2 of TLSSVR-HGN are obtained by solving the above optimization problems, respectively. Once f_1(x) and f_2(x) are obtained, the final regressor of TLSSVR-HGN is constructed as f(x) = (1/2)(f_1(x) + f_2(x)). In particular, if the noise in Equation (1) is Gaussian with zero mean and homoscedastic variance σ², then, following the idea of forming two hyperplanes in TSVR, the model of twin least squares support vector regression of the homoscedastic Gaussian noise model (TLSSVR-GN), also called twin kernel ridge regression of the Gaussian noise model (TKRR-GN), is obtained with the primal problems (20) and (21), where ε_1, ε_2 ≥ 0 are threshold parameters, C_1, C_2 > 0 are penalty parameters, and ξ_i, ξ_i* (i = 1, ..., l) are random variables.
The dual problems of the primal problems (20) and (21) are obtained in the same way, where C_1 > 0 and C_2 > 0 are penalty parameters.

B. ALGORITHM OF AUGMENTED LAGRANGE MULTIPLIER METHOD
Before dealing with eq. (17), we need to calculate v_i. The parameters v_i are intended to pull the regression hyperplane closer to (x_i, y_i) ∈ X_f. When training errors occur for samples lying below the ε-insensitive lower-bound function, v_i weights those errors. At the beginning we roughly estimate the location of these samples using normal LSSVR [41], because we do not know which samples lie below the ε-insensitive lower-bound function. From eq. (15), the slack errors ξ_i are proportional to the Lagrangian multipliers α_i. The parameters v_i (i = 1, ..., l) are initialized as shown below.
where ρ_1 is a given parameter. Bringing v_i (i = 1, ..., l) into eq. (17) produces a new Lagrange multiplier vector α. If α meets the conditions of eq. (24), v_i (i = 1, ..., l) are recomputed. When the index set S is no longer updated, the iteration stops.

Algorithm 1 Lower-Bound Function of TLSSVR-HGN
1: Initialization: from the training data X_f obtain the kernel parameters, and determine ρ_1, ε_1 and the regularization parameter C_1;
2: From eq. (17) obtain the Lagrange multiplier sequence α, and build the index set S_1 whose elements correspond to α_i < 0, where α_i is the i-th entry of the vector α;
3: According to α, compute v_i (i = 1, ..., l);
4: Recompute α from eq. (17) with the updated v_i, and repeat from step 3 until the index set S_1 no longer changes.

Remark 1: In step 2 of Algorithm 1, the computational complexity is O(l³) when calculating the inverse matrix. In step 4, the inverse matrix is calculated again. Assuming the number of cycles until the index set S no longer changes is l_1, the computational complexity of this process is l_1 · O(l³). Hence, the total computational complexity of Algorithm 1 is (l_1 + 1)O(l³).
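The overall shape of Algorithm 1, solving a weighted LSSVR-type linear system, flagging the samples with negative multipliers, reweighting them, and re-solving until the index set stabilizes, can be sketched as follows. The concrete linear system and the reweighting rule shown here are illustrative assumptions, since eqs. (17) and (24) fix the exact forms:

```python
import numpy as np

def reweighted_lssvr(K, t, C, sigma2, max_iter=20):
    # Sketch of the Algorithm-1 loop: solve a weighted LSSVR system in
    # which per-sample weights v_i interact with the heteroscedastic
    # variances sigma_i^2, and iterate until the index set S of samples
    # with negative multipliers stops changing.
    l = len(t)
    v = np.ones(l)                       # initial weights: plain LSSVR pass
    S_prev = None
    b, alpha = 0.0, np.zeros(l)
    for _ in range(max_iter):
        A = np.zeros((l + 1, l + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.diag(sigma2 / (C * v))
        sol = np.linalg.solve(A, np.concatenate(([0.0], t)))
        b, alpha = sol[0], sol[1:]
        S = frozenset(np.flatnonzero(alpha < 0))
        if S == S_prev:                  # index set unchanged -> stop
            break
        S_prev = S
        v = np.where(alpha < 0, 2.0, 1.0)   # illustrative reweighting rule
    return b, alpha
```

Each pass costs one dense solve, consistent with the O(l³)-per-cycle count of Remark 1.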
Remark 2: Similar to Remark 1, assuming the number of calculation cycles until the index set S no longer changes in Algorithm 2 is l_2, the total computational complexity of Algorithm 2 is (l_2 + 1)O(l³).
Finally, the prediction function of TLSSVR-HGN is f(x) = (1/2)(f_1(x) + f_2(x)). Similarly, we can obtain the algorithms for the lower-bound and upper-bound functions of TLSSVR-GN.
From the solution of TLSSVR-HGN, we can see that TLSSVR-HGN is not sparse. To analyze its generalization ability, we use a simple mechanism to sparsify TLSSVR-HGN in Algorithm 3. The final prediction function of TLSSVR-HGN includes both the lower-bound and upper-bound functions, so it makes little sense to sparsify the lower-bound and upper-bound functions separately. In general, the larger α_i + α_i*, the greater the effect of the training pair (x_i, y_i) on the TLSSVR-HGN model. Therefore, a natural idea is to call training data with smaller α_i + α_i* nonsignificant data. A simple mechanism is proposed to impose sparseness by pruning support values [42], viz. α_i + α_i*, taken from the sorted support value spectrum of the linear system. Following this idea, sparsity is introduced by gradually omitting the least significant samples from the training set and retraining TLSSVR-HGN.

Algorithm 2 Upper-Bound Function of TLSSVR-HGN
1: Initialization: from the training data X_f obtain the kernel parameters, and determine ρ_2, ε_2 and the regularization parameter C_2;
2: From eq. (19) obtain the Lagrange multiplier sequence α*, and build the index set S_2 whose elements correspond to α_i* > 0, where α_i* is the i-th entry of the vector α*;
3: According to α*, compute v_i (i = 1, ..., l).

Algorithm 3 Sparse TLSSVR-HGN (STLSSVR-HGN)
1: Given the complete training set X_f, obtain the TLSSVR-HGN model by Algorithms 1 and 2;
2: For X_f in step 1, remove a small number of training samples (e.g. 5% of the set) with the smallest values α_i + α_i* in the spectrum;
3: Re-run Algorithms 1 and 2 to obtain the TLSSVR-HGN model based on the reduced training samples;
4: if the performance of TLSSVR-HGN degrades then
5: Go to step 9;
6: else
7: Go to step 2;
8: end if
9: Output the function of STLSSVR-HGN.
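Algorithm 3 is a generic prune-and-retrain loop, so it can be sketched independently of the underlying regressor. In the sketch below, `fit` and `score` are caller-supplied stand-ins (our assumption) for training TLSSVR-HGN and measuring its validation error:

```python
import numpy as np

def sparsify(X, y, fit, score, drop_frac=0.05):
    # Sketch of Algorithm 3: repeatedly drop the fraction of training
    # samples with the smallest support-value magnitude |alpha_i + alpha*_i|
    # and refit, stopping as soon as performance degrades.
    # `fit(X, y)` returns (model, support_values); `score(model)` returns
    # an error to be minimized.
    idx = np.arange(len(y))
    model, sv = fit(X[idx], y[idx])
    best_err = score(model)
    while len(idx) > 10:
        k = max(1, int(drop_frac * len(idx)))
        keep = np.argsort(np.abs(sv))[k:]        # drop k least-significant samples
        cand_idx = idx[keep]
        cand_model, cand_sv = fit(X[cand_idx], y[cand_idx])
        if score(cand_model) > best_err:         # performance degraded -> stop
            break
        idx, model, sv = cand_idx, cand_model, cand_sv
        best_err = score(cand_model)
    return model, idx
```

The floor of 10 retained samples is an arbitrary safeguard; the paper's stopping rule is purely performance-based.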
Remark 3: From Remarks 1 and 2, the total computational complexity of Algorithms 1 and 2 together is (l_1 + l_2 + 2)O(l³).
Remark 4: Assume the TLSSVR-HGN model is computed n times in Algorithm 3. The computational complexity of Algorithm 3 is then n(l_1 + l_2 + 2)O(l³), where n remains small in our experiments.

IV. EXPERIMENTS AND DISCUSSION
In this section, to check the regression performance of TLSSVR-HGN, we compare it with STLSSVR-HGN, TLSSVR-GN, TLSSVR, LSSVR and SVR on several data sets, including an artificial data set and UCI data sets. To further verify its validity, in a final experiment TLSSVR-HGN is applied to the real-world problem of short-term wind speed forecasting.
In all experiments, the regression algorithms are implemented in Python 3.7 on Windows 7, running on a PC with 4 GB of RAM. The initial parameter range of the proposed ALM approach is C ∈ [0.5, 250]. We use a 10-fold cross validation strategy to search for the optimal positive parameter C [43]. To reduce the computational complexity of parameter selection, we set C_1 = C_2 and ε_1 = ε_2. For STLSSVR-HGN, we set the number of removed samples to five percent of the total training samples in Algorithm 3. Many practical applications suggest that Gaussian kernel functions tend to perform well under general smoothness assumptions. In this work, the polynomial function and the Gaussian kernel function are used as the kernel functions of the different models.
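The parameter search described above amounts to a plain 10-fold cross-validation grid search over C; a minimal sketch follows, where the `fit_predict` interface is our assumption:

```python
import numpy as np

def cv_select_C(fit_predict, X, y, C_grid, k=10, seed=0):
    # 10-fold cross-validation over the penalty parameter C (searched in
    # [0.5, 250] in the paper); `fit_predict(X_tr, y_tr, X_val, C)` must
    # return predictions on X_val.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    best_C, best_rmse = None, np.inf
    for C in C_grid:
        fold_mse = []
        for f in folds:
            tr = np.setdiff1d(np.arange(len(y)), f)
            pred = fit_predict(X[tr], y[tr], X[f], C)
            fold_mse.append(np.mean((pred - y[f]) ** 2))
        rmse = np.sqrt(np.mean(fold_mse))
        if rmse < best_rmse:            # keep the C with lowest CV RMSE
            best_C, best_rmse = C, rmse
    return best_C
```

Setting C_1 = C_2 and ε_1 = ε_2, as the paper does, keeps this search one-dimensional in C.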
The polynomial kernel k(x_i, x_j) = (x_i^T x_j + 1)^d and the Gaussian kernel k(x_i, x_j) = exp(−γ‖x_i − x_j‖²) are used, where d is a positive integer and γ is a tuned parameter determined from the interval [0.1, 10] by cross-validation. Similarly, we set γ_1 = γ_2 to reduce the computational burden of parameter selection. To report the performance of the aforementioned algorithms, the evaluation criteria are specified before presenting the experimental results. We introduce the following criteria to evaluate the performance of the different models: mean absolute error (MAE), root mean square error (RMSE), sum of squared errors (SSE), total sum of squares (SST), and sum of squares of the regression (SSR). Obtaining a smaller SSE/SST is usually accompanied by an increase in SSR/SST [25].
MAE = (1/l) Σ_{i=1}^l |y_i − y_i*|, RMSE = sqrt((1/l) Σ_{i=1}^l (y_i − y_i*)²), SSE = Σ_{i=1}^l (y_i − y_i*)², SST = Σ_{i=1}^l (y_i − ȳ)², and SSR = Σ_{i=1}^l (y_i* − ȳ)² is the sum of squares of the regression. SSE/SST is the ratio of the sum of squared errors of the testing samples to the total sum of squared deviations, and SSR/SST is the ratio between the explained sum of squares of the testing samples and the total sum of squares. Here l is the size of the samples, y_i is the real data, y_i* is the prediction result, and ȳ = (1/l) Σ_{i=1}^l y_i is the mean of the testing samples. In addition, #SV represents the number of support vectors, and CPU time denotes the testing time of the samples [26].
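For reference, the criteria and the two ratios reported in the tables can be computed directly from their definitions:

```python
import numpy as np

def regression_metrics(y, y_pred):
    # MAE, RMSE, SSE and the SSE/SST, SSR/SST ratios used in the tables
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    l = len(y)
    err = y - y_pred
    ybar = y.mean()
    sse = np.sum(err ** 2)                 # sum of squared errors
    sst = np.sum((y - ybar) ** 2)          # total sum of squares
    ssr = np.sum((y_pred - ybar) ** 2)     # regression sum of squares
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(sse / l),
        "SSE": sse,
        "SSE/SST": sse / sst,
        "SSR/SST": ssr / sst,
    }
```

A perfect predictor gives SSE/SST = 0 and SSR/SST = 1, which matches the paper's reading of smaller SSE/SST and larger SSR/SST as better.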

A. ARTIFICIAL DATA SET
First of all, we compare the six models on the sinc(x) function, which is commonly used to test regression performance, defined as sinc(x) = sin(x)/x. To effectively check the performance of the proposed method, the training samples are perturbed by Gaussian noise in the experiment. The noisy sinc(x) samples are generated as y_i = sin(x_i)/x_i + ξ_i, with x_i ∈ U[−4, 4] and ξ_i ∈ N(0, 0.15²), (33) where U[a, b] represents the uniform random variable on [a, b] and N(c, d²) represents the Gaussian random variable with mean c and variance d².
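Generating this artificial data is straightforward; the sketch below assumes sinc(x) = sin(x)/x and the noise model of (33), with helper names of our choosing:

```python
import numpy as np

def sinc(x):
    # sinc(x) = sin(x) / x, with sinc(0) = 1 by continuity
    x = np.asarray(x, dtype=float)
    return np.where(x == 0, 1.0, np.sin(x) / np.where(x == 0, 1.0, x))

def make_sinc_data(n=200, seed=0):
    # Training inputs drawn uniformly from [-4, 4]; targets corrupted by
    # zero-mean Gaussian noise with standard deviation 0.15, as in (33).
    rng = np.random.default_rng(seed)
    x = rng.uniform(-4.0, 4.0, n)
    y = sinc(x) + rng.normal(0.0, 0.15, n)
    # Noise-free, uniformly spaced test grid
    x_test = np.linspace(-4.0, 4.0, n)
    y_test = sinc(x_test)
    return x, y, x_test, y_test
```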
We sample 200 training samples and 200 testing samples; the testing samples are uniformly generated from the sinc(x) function and do not contain any noise. Fig. 2 shows the prediction results of the different models on the sinc(x) function. Fig. 3 shows the prediction errors of four models on the sinc(x) function. Table 1 compares the results of the six models on the sinc(x) function: TLSSVR-HGN achieves the smallest MAE, RMSE, SSE and SSE/SST, and the largest SSR/SST among these algorithms. Its generalization performance is therefore the best, i.e., it attains the smallest RMSE and the largest SSR/SST. Fig. 3 shows that the prediction error of TLSSVR-HGN is closer to zero than those of the other three algorithms, and Fig. 2 shows that the fitting capacity of TLSSVR-HGN is excellent. However, like other algorithms such as TLSSVR-GN and TLSSVR, the solution of TLSSVR-HGN is not sparse, so we sparsify TLSSVR-HGN, yielding STLSSVR-HGN. From Table 1, we also compare the testing CPU time of these methods. Except for LSSVR, the results indicate that STLSSVR-HGN has an obvious advantage in testing time, because its number of support vectors is reduced. In short, TLSSVR-HGN outperforms TLSSVR-GN, TLSSVR, LSSVR and SVR in approximating this artificial function.

B. UCI DATA SET
In this part, to further evaluate the regression performance of the proposed method, we use two UCI data sets [44]: Boston house price and stock. For both data sets, the model parameters are obtained using 10-fold cross validation, and each data set is divided into 100 training samples and 100 testing samples.
On the stock data set, the market index can be used to predict the overall trend of the stock. Figs. 4-7 show the prediction results of TLSSVR-HGN, STLSSVR-HGN, TLSSVR and TLSSVR-GN on the stock data, respectively. Fig. 8 shows the prediction errors of the four models on the stock data. Table 2 lists the error statistics of the TLSSVR-HGN, STLSSVR-HGN, TLSSVR, TLSSVR-GN, LSSVR and SVR models on the stock data. The Boston house price data set has 13 features that determine the trend of the house price. Figs. 9-12 show the prediction results of TLSSVR-HGN, STLSSVR-HGN, TLSSVR and TLSSVR-GN on the house price, respectively. Fig. 13 lists the house price error statistics for the four models. Table 3 shows the error statistics of the six models on the Boston house price. Tables 2 and 3 show that TLSSVR-HGN obtains the largest SSR/SST on the two data sets and keeps the smallest SSE/SST among the six methods. In addition, the experimental results show that STLSSVR-HGN still holds an obvious advantage over TLSSVR, TLSSVR-GN and SVR in testing CPU time. However, compared with TLSSVR-HGN, STLSSVR-HGN has a larger RMSE and a smaller SSR/SST; therefore, STLSSVR-HGN is not as good as TLSSVR-HGN in terms of generalization capability. Figs. 8 and 13 show that the error of TLSSVR-HGN is closest to zero among the four algorithms. In conclusion, the proposed method has good regression performance, and STLSSVR-HGN has a smaller testing CPU time than TLSSVR-HGN, TLSSVR-GN, TLSSVR and SVR.

C. SHORT-TERM WIND SPEED FORECASTING
To show the effectiveness of the proposed model, we collected a wind speed data set from Heilongjiang Province, China. The data set consists of more than one year of wind speeds, recording the average wind-speed value every 10 minutes. Each sample has four attributes: mean, variance, minimum and maximum. We analyze a one-year time series of wind speeds. The results show that the error of the wind speed over one or two hours under the persistence forecast ([19], [45]) satisfies a Gaussian distribution; however, its variance changes with the average wind speed. This is therefore a heteroscedastic task. We now test the performance of the different techniques on wind speed prediction. The experimental setup is as follows. We transform the original series into a multivariate task by using the 12-dimensional vector x_i = (x_{i−11}, x_{i−10}, ..., x_{i−1}, x_i) as the input variables to predict x_{i+step}, where step = 1, 3. That is to say, we use the above model to predict the wind speed 10 and 30 minutes after every point x_i, respectively.
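The windowing step above, building x_i = (x_{i−11}, ..., x_i) to predict x_{i+step}, can be written as a small helper (the function name is ours):

```python
import numpy as np

def make_windows(series, width=12, step=1):
    # Turn a univariate series into (x_i, target) pairs where
    # x_i = (x_{i-width+1}, ..., x_i) and the target is x_{i+step};
    # step=1 predicts 10 min ahead, step=3 predicts 30 min ahead.
    X, y = [], []
    for i in range(width - 1, len(series) - step):
        X.append(series[i - width + 1 : i + 1])
        y.append(series[i + step])
    return np.asarray(X), np.asarray(y)
```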
For the wind speed task, we divide the data set into 100 training samples and 100 testing samples, and obtain the model parameters by 10-fold cross validation. In the data set, consecutive sample points are separated by 10 minutes, so we first predict one sample point ahead (10 minutes) and then three sample points ahead (30 minutes). Figs. 14-17 respectively show the 10-minute-ahead wind speed prediction results of TLSSVR-HGN, STLSSVR-HGN, TLSSVR-GN and TLSSVR. Fig. 18 shows the prediction errors of the four models on the wind speed 10 minutes ahead. Similarly, Figs. 19-22 respectively show the 30-minute-ahead wind speed prediction results of TLSSVR-HGN, STLSSVR-HGN, TLSSVR-GN and TLSSVR. Fig. 23 shows the prediction errors of the four models on the wind speed 30 minutes ahead. Table 4 shows the prediction error statistics of the six models on the wind speed 10 minutes ahead, and Table 5 shows those 30 minutes ahead. These results indicate that TLSSVR-HGN is an effective method for predicting wind speed. Figs. 18 and 23 show that, compared with the other algorithms, the prediction result of TLSSVR-HGN is still closest to the original data; in other words, the fitting result of TLSSVR-HGN is better than those of the other three models. From Tables 4 and 5, we can also see that, compared with TLSSVR-GN, TLSSVR, LSSVR and SVR, STLSSVR-HGN has a smaller RMSE and a larger SSR/SST, indicating that STLSSVR-HGN has better generalization performance than TLSSVR-GN, TLSSVR, LSSVR and SVR. In addition, STLSSVR-HGN still has the smallest testing CPU time among TLSSVR-HGN, TLSSVR-GN, TLSSVR and SVR. Hence, it is worthwhile to develop STLSSVR-HGN.
Observing the error statistics in Tables 1-5, we can see that the TLSSVR-HGN model is better than the STLSSVR-HGN, TLSSVR, TLSSVR-GN, LSSVR and SVR models in terms of several criteria (MAE, RMSE, SSE, SSE/SST and SSR/SST). In addition, LSSVR has the smallest testing CPU time among the six algorithms. In short, the experiments on the artificial data set, several benchmark data sets and the real-world wind speed data further confirm the feasibility and efficacy of TLSSVR-HGN and STLSSVR-HGN.

V. CONCLUSIONS
In this paper, we propose a new regression technique, referred to as TLSSVR-HGN. The classical TLSSVR, TLSSVR-GN, LSSVR and SVR techniques assume that the error distribution is a homoscedastic Gaussian. However, it has been reported that the noise models in some real-world applications exhibit heteroscedasticity, so a new technique is needed to build prediction models. In this work, we derive a heteroscedastic noise loss function and develop a new technique by proposing a twin regressor for the heteroscedastic noise model. Using the KKT conditions, we introduce the Lagrangian functional and obtain the dual problems of TLSSVR-HGN, and the ALM approach is applied to solve them. To analyze the generalization ability of the proposed model, we also propose sparse TLSSVR-HGN (STLSSVR-HGN). Finally, experimental results on the artificial data set, several UCI data sets and the real-world wind speed data confirm the effectiveness of the proposed technique. Comparing the different models on the stock, Boston house price and wind speed data sets, we conclude that TLSSVR-HGN has better generalization performance than the other algorithms and predicts the data more accurately.
This work has only discussed regression models with Gaussian heteroscedastic noise. In fact, a similar challenge exists in classification learning, and in a similar manner we can study the classification problem for the heteroscedastic noise model in future work.
SHIGUANG ZHANG received the B.Sc. degree in mathematics from Shaanxi Normal University, in 1998, the M.S. degree in mathematics from Guangxi University for Nationalities, in 2007, and the Ph.D. degree in applied mathematics from Hebei Normal University, in 2014. He is currently working with the College of Computer and Information Engineering, Henan Normal University, China. He has authored more than 20 peer-reviewed articles. He has served as a reviewer in several prestigious peer-reviewed international journals. His research interests include optimization algorithm, machine learning, and knowledge discovery for regression.
CHAO LIU received the B.Sc. degree in computer science and technology from the Hengshui College, in 2018. He is currently pursuing the master's degree with the College of Computer and Information Engineering, Henan Normal University. His main research interests include machine learning and knowledge discovery for regression. He has received funding from ten grants from the National Science Foundation of China, the China Postdoctoral Science Foundation, the Science and Technology Innovation Talents Project of Henan Province, and the Key Project of Science and Technology of Henan Province. He has authored or coauthored for over 70 articles. He has written three books on the topic of intelligent information processing. He holds 15 patents of invention. His main research interests include granular computing, optimization algorithm, intelligent information processing, and data mining. He has received the title of Henan's Distinguished Young Scholars for Science and Technology Innovation Talents. He has served as a reviewer in several prestigious peer-reviewed international journals.