Parameter Optimization of Support Vector Regression Using Henry Gas Solubility Optimization Algorithm



I. INTRODUCTION
As a popular and powerful machine learning algorithm, the support vector machine (SVM) has widespread applications. It was first proposed by Vapnik [1], and the release of the LIBSVM software package [2] brought it wider attention. Many scholars have noted the strengths of SVM, such as excellent generalization performance and a high ability to model complex, non-linear relations [3]. Support vector regression (SVR) and support vector classification (SVC) are the two major components of SVM. SVR is the focus of this work. It has been applied in many fields, such as stock market price forecasting [4], bearing health monitoring [5], electric load forecasting [6], river stage prediction [7], prediction of global solar radiation [8], predictive modeling of surface roughness in precision turning of lenses [9], and total organic carbon content prediction [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhaojun Li.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Although SVR has great advantages and wide applications, its performance still depends on its initial parameter settings, such as the kernel type, the penalty factor and the other parameters of the kernel function. The effect on generalization performance and accuracy is especially large for high-dimensional data. Searching for the optimal SVR parameter set is therefore a challenging task. The firefly algorithm (FA) was used to optimize SVR parameters for stock market price forecasting [4]. Particle swarm optimization (PSO) was employed to select SVR parameters for the prediction of total organic carbon content [10]. The ant lion optimizer (ALO) was adapted to search for optimal SVR parameters for on-line voltage stability assessment [11]. The dragonfly algorithm (DA) was combined with SVR to obtain the optimal parameters for the prediction and application of porosity [12]. The salp swarm algorithm (SSA) was introduced to optimize SVR for predicting the long-term temperature effect in structural health monitoring of concrete dams [13]. The fruit fly optimization algorithm (FOA) was combined with SVR to predict the number of vacant parking spaces after a specific period of time [14]. The genetic algorithm (GA) was applied to determine optimal SVR parameters for forecasting bed load transport rates of three gravel-bed rivers [15]. Three findings can be drawn from the above literature.
1) The prediction stability of SVR alone is excellent. However, hybrid SVR methods that combine it with other intelligent algorithms give up much of this stability to obtain better prediction accuracy.
2) Almost all of the work started from a specific application field, and the developed algorithm was not verified in other fields.
3) Convergence performance and computational complexity are rarely discussed or compared.
Motivated by these issues, this work is an attempt to fill the gaps, and makes contributions in the following two areas. First, a novel swarm intelligence optimization approach is sought to optimize SVR parameters and obtain good prediction accuracy, convergence performance and computational complexity with relatively small predictive stability loss. Second, many data sets from various fields are utilized to test the proposed approach.
The Henry gas solubility optimization (HGSO) algorithm is a recent meta-heuristic method [16], proposed by Hashim et al. [17], [18] in 2019. It is based on Henry's law and imitates the huddling behavior of gas to balance exploitation and exploration in the search space. HGSO has been evaluated on several benchmark data sets and achieved significant superiority over some competitive algorithms. In addition, the experiments reported later in this paper show that it has little influence on SVR prediction stability.
In this paper, we propose a novel SVR-HGSO model for the first time. HGSO is applied to tune the parameters of SVR to obtain high predictive accuracy and stability. First, SVR parameters are randomly generated within given ranges to form the parameter population. Second, the prediction accuracies (PAs) are obtained using the population and SVR. Third, the population and the optimal SVR parameters are updated via the PAs and HGSO. The second and third steps are repeated until the cut-off conditions are met. Finally, the optimal SVR parameters are obtained. Ten low- and high-dimensional benchmark data sets are used to assess the accuracy, convergence performance and computational complexity of the proposed approach. The SVR, SVR-FA, SVR-PSO, SVR-ALO, SVR-DA, SVR-SSA and SVR-HHO (Harris hawks optimization [19]) algorithms are compared with our approach. The plain SVR algorithm is included to observe the changes in prediction accuracy, convergence stability and convergence rate. The other hybrid algorithms are selected because they are very similar to the proposed approach and have outstanding performance and practical applications.
The innovation includes two points. First, the HGSO algorithm is applied to SVR parameter optimization. The proposed approach shows excellent overall performance against the other competitive algorithms: competitive prediction accuracy together with excellent stability and convergence performance. Second, the proposed approach is tested and verified in many fields, such as airfoil self-noise, air quality and automobile price prediction.
The remainder of this paper is organized as follows. Section II reviews the related literature. Section III presents the research methodology and analysis methods, and develops the integrated SVR-HGSO approach. Section IV provides the case study and discussion. Section V summarizes the conclusions and future work.
The no free lunch theorem in optimization [28] proves that there is not, and never will be, a single algorithm that solves all optimization problems. Hence, brand-new algorithms have the potential to outperform existing ones on some problems. HGSO is a newcomer, yet it has already shown strong optimization performance. Compared with PSO, the gravitational search algorithm (GSA) [29], the cuckoo search algorithm (CS) [30], GWO, WOA, EHO and SA, it obtained competitive and superior results [17]. These results support the feasibility and rationality of integrating it into SVR parameter optimization, so that HGSO and SVR can be combined into a new machine learning approach for prediction. The experimental results also show that the proposed approach has excellent overall performance.

III. THE SVR-HGSO APPROACH
In this section, we describe the novel SVR-HGSO hybrid approach. The general idea of the methodology is as follows: a) represent the SVR parameters as a population; b) obtain the PAs for the population using the fitness function and SVR; c) update the population and compute the optimal SVR parameters via the PAs and HGSO's core operations. Steps b and c are repeated until the cut-off condition is met, which is generally reaching the maximum number of iterations.
Two issues in this scheme need special attention [3]. The first is how the SVR parameter population is represented. The second is the choice of evaluation method for the PAs. Both are discussed below for SVR-HGSO.
1) SVR parameter population representation: The population $X$ is made up of individuals $X_i$. We express the population as $X = \{X_1, X_2, \cdots, X_n\}$, where $n$ is the number of individuals in the population. The attributes of $X_i$ are determined by the SVR parameters. ε-SVR [2] is selected as the specific SVR method in this work. We select the four main SVR parameters: the kernel type $K$, the penalty factor $C$, $\gamma$ in the kernel function and $\varepsilon$ in the loss function. The individual is therefore $X_i = \{K_i, C_i, \gamma_i, \varepsilon_i\}$.
2) The selection of evaluation method for PAs: The mean squared error (MSE) is the major index in regression or prediction problems. MSE is easy to obtain, since ε-SVR in LIBSVM outputs it directly [2]. In this paper, MSE is chosen as the fitness function (optimization objective). In a broad sense, ε-SVR and MSE jointly form the fitness function of HGSO.
The whole process of the SVR-HGSO model is depicted in Fig. 1. The specific steps are described as follows.
Step1 Confirm the lower and upper limits of $K$, $C$, $\gamma$ and $\varepsilon$ to form the lower limit set $LB = \{lb_K, lb_C, lb_\gamma, lb_\varepsilon\}$ and the upper limit set $UB = \{ub_K, ub_C, ub_\gamma, ub_\varepsilon\}$. The population $X$ is then initialized randomly within the limits of $LB$ and $UB$. The maximum number of iterations is denoted MaxIter. The loop counter $m$ is set to 1.
Step2 Divide the input samples into training data and testing data.
Step3 Based on $X$ and the training data, perform supervised learning with ε-SVR to obtain the training models.
Step4 Obtain the set of MSE values on the testing data using the training models.
Step5 Select the optimal MSE and the corresponding current optimum SVR parameter set $X_{opt}$.
Step6 If $m >$ MaxIter, go to Step9; otherwise, go to Step7.
Step7 Update $X$ based on $X_{opt}$ using HGSO's core operations: Henry's coefficient updating, solubility updating, position updating, local optimum escaping and the position updating of the worst individuals.
Step8 Set $m = m + 1$ and go to Step3.
Step9 Output $X_{opt}$.
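The control flow of Step1 to Step9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab/LIBSVM implementation used in this work: `tune`, `toy_update` and `toy_fitness` are hypothetical names, `toy_fitness` stands in for "train ε-SVR on the training data and return the test MSE" (so Step2 to Step4 are hidden inside it), and `toy_update` is a simple drift toward the current best that merely occupies the place of HGSO's core operations in Step7. All four parameters are treated as continuous here, whereas the kernel type $K$ is categorical.

```python
import random

def tune(fitness, update, lb, ub, n=10, max_iter=20, seed=0):
    """Step1-Step9 as a generic loop; returns (x_opt, f_opt)."""
    rng = random.Random(seed)
    # Step1: initialise the population uniformly inside [LB, UB]
    pop = [[l + rng.random() * (u - l) for l, u in zip(lb, ub)]
           for _ in range(n)]
    x_opt, f_opt = None, float("inf")
    m = 1
    while True:
        # Step3-Step4: one fitness value (MSE in the paper) per individual
        scores = [fitness(x) for x in pop]
        # Step5: remember the best parameter set seen so far
        i_best = min(range(n), key=lambda i: scores[i])
        if scores[i_best] < f_opt:
            x_opt, f_opt = list(pop[i_best]), scores[i_best]
        # Step6: stop once the iteration budget is exhausted (Step9: output)
        if m > max_iter:
            return x_opt, f_opt
        # Step7-Step8: update the population and advance the counter
        pop = update(pop, x_opt, lb, ub, rng)
        m += 1

def toy_update(pop, x_opt, lb, ub, rng):
    """Stand-in for HGSO: drift toward x_opt with small noise, then clip."""
    new_pop = []
    for x in pop:
        step = rng.random()
        moved = [xi + step * (bi - xi) + rng.gauss(0.0, 0.01) * (u - l)
                 for xi, bi, l, u in zip(x, x_opt, lb, ub)]
        new_pop.append([min(max(v, l), u) for v, l, u in zip(moved, lb, ub)])
    return new_pop

def toy_fitness(x):
    """Stand-in for 'train epsilon-SVR, return test MSE'."""
    return sum((v - 0.3) ** 2 for v in x)

best_x, best_f = tune(toy_fitness, toy_update, lb=[0.0, 0.0], ub=[1.0, 1.0])
```

Keeping the fitness and the update rule as arguments makes it easy to swap the placeholder update for another optimizer while leaving the loop unchanged.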

A. HENRY GAS SOLUBILITY OPTIMIZATION
The HGSO algorithm was proposed in 2019 by Hashim et al. [17]. Based on Henry's law, it imitates the huddling behavior of gas to balance exploitation and exploration in the search space and avoid local optima. It should be noted that the gas population here corresponds to the SVR parameter population. The core operations required for this work are listed as follows.

1) UPDATE HENRY'S COEFFICIENT
Henry's coefficient is updated using Eq. (1), where $T(m)$ denotes the temperature of the $m$-th generation and the reference temperature is 298.15 [17]. Note that HGSO has another level between population and individual: the cluster. The population is divided into equal clusters, whose number equals the number of gas types. Each cluster shares the same Henry's constant value ($H_j$). The first-generation coefficient is $H_j(1) = 0.05 \cdot rand()$, where $rand()$ is a function that generates a random number between 0 and 1. Another coefficient is $C_j = 0.01 \cdot rand()$.

2) UPDATE SOLUBILITY
The solubility is updated using Eq. (2), where $S_{i,j}(m)$ represents the solubility of gas $i$ in cluster $j$ of the $m$-th generation, $Ks$ is a constant, and $P_{i,j}(m)$ denotes the partial pressure on gas $i$ in cluster $j$ of the $m$-th generation, with $P_{i,j}(m) = 100 \cdot rand()$.
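The two updates described above can be sketched in code, assuming the form in which they are usually stated for HGSO [17]: $H_j(m+1) = H_j(m) \exp(-C_j (1/T(m) - 1/T^{\theta}))$ with $T(m) = \exp(-m/\mathrm{MaxIter})$, and $S_{i,j}(m) = Ks \cdot H_j(m+1) \cdot P_{i,j}(m)$. The temperature schedule, the constant $T^{\theta} = 298.15$, the default $Ks = 1$ and the function names are assumptions made for illustration.

```python
import math
import random

T_THETA = 298.15  # reference temperature (assumed, following [17])

def update_henry(h_j, c_j, m, max_iter):
    """Henry's coefficient of one cluster for the next generation."""
    t_m = math.exp(-m / max_iter)          # assumed temperature schedule T(m)
    return h_j * math.exp(-c_j * (1.0 / t_m - 1.0 / T_THETA))

def solubility(k_s, h_next, p_ij):
    """Solubility of one gas from its cluster's updated coefficient."""
    return k_s * h_next * p_ij

# Initial values as given in the text: 0.05*rand(), 0.01*rand(), 100*rand()
rng = random.Random(0)
h_j = 0.05 * rng.random()
c_j = 0.01 * rng.random()
p_ij = 100.0 * rng.random()
h_next = update_henry(h_j, c_j, m=1, max_iter=50)
s_ij = solubility(1.0, h_next, p_ij)       # Ks = 1.0 is an assumed value
```

Because $1/T(m) \ge 1$ is always larger than $1/T^{\theta}$ under this schedule, the coefficient shrinks from one generation to the next whenever $C_j > 0$.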

3) UPDATE POSITION
The position here corresponds to the SVR parameters in this work. This operation is critical; the position is updated using Eq. (3),
where $X_{i,j}(m)$ indicates the position of gas $i$ in cluster $j$ of the $m$-th generation and $X_{i,j}(m+1)$ is its next position. $r$ is a random value between 0 and 1, $\alpha = 1$, and $\beta$ is a constant. $F$ is the flag ($\pm$) that changes the direction of the search agent and provides diversity. $X_{j,opt}(m)$ is the best gas in cluster $j$ of the $m$-th generation, and $X_{opt}(m)$ is the best gas of the $m$-th generation. $F_{opt}(m)$ denotes the fitness (MSE) of the best gas of the $m$-th generation, and $F_{i,j}(m)$ is the fitness of gas $i$ in cluster $j$ of the $m$-th generation.
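A sketch of the position update, assuming the rule as it is usually written for HGSO [17]: each gas moves toward the best gas of its cluster with weight $\beta \exp(-(F_{opt}(m) + \epsilon_0)/(F_{i,j}(m) + \epsilon_0))$ and toward the solubility-scaled global best with weight $\alpha$. The small constant $\epsilon_0 = 0.05$, the default $\beta = 1$ and the function name `update_position` are assumptions; $\epsilon_0$ is unrelated to the $\varepsilon$ of the SVR loss function.

```python
import math

def update_position(x, x_cluster_best, x_best, s_ij, f_ij, f_best,
                    r, flag, alpha=1.0, beta=1.0, eps0=0.05):
    """Next position of one gas (one candidate SVR parameter vector).

    r is a random value in [0, 1]; flag is +1 or -1 (the direction flag F).
    """
    # Interaction weight: approaches beta*exp(-1) as f_ij approaches f_best
    weight = beta * math.exp(-(f_best + eps0) / (f_ij + eps0))
    return [xi
            + flag * r * weight * (cb - xi)          # pull toward cluster best
            + flag * r * alpha * (s_ij * gb - xi)    # pull toward global best
            for xi, cb, gb in zip(x, x_cluster_best, x_best)]

new_x = update_position([1.0], [2.0], [3.0], s_ij=1.0,
                        f_ij=0.1, f_best=0.1, r=0.5, flag=1)
```

The result can leave the range set by $LB$ and $UB$, so a practical implementation would clip each coordinate back into its limits before evaluating the fitness.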

4) ESCAPE FROM LOCAL OPTIMUM
Rank the individuals and select the number of worst individuals ($N_w$) using Eq. (4).

5) UPDATE THE POSITION OF THE WORST INDIVIDUALS
Regenerate the worst individuals within the numerical range using Eq. (5),
where $G_k$ denotes the position of a worst individual, $1 \le k \le N_w$, and $ra$ is a random value between 0 and 1. The pseudo-code of the core operations of HGSO is displayed in Fig. 2.
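The last two operations can be sketched together. The sketch assumes the forms usually given for HGSO [17]: $N_w = N \cdot (rand() \cdot (c_2 - c_1) + c_1)$ with $c_1 = 0.1$ and $c_2 = 0.2$, and $G_k = G_{min} + ra \cdot (G_{max} - G_{min})$, where the range limits are taken to be $LB$ and $UB$. The constants, the rounding of $N_w$ and the function names are assumptions.

```python
import random

def n_worst(n, rng, c1=0.1, c2=0.2):
    """Number of worst individuals to regenerate (assumed 10-20% of n)."""
    return round(n * (rng.random() * (c2 - c1) + c1))

def reset_worst(pop, scores, lb, ub, rng):
    """Re-draw the worst individuals (largest MSE) uniformly inside [lb, ub].

    Modifies pop in place and returns the indices that were regenerated.
    """
    n_w = n_worst(len(pop), rng)
    # Rank by fitness, worst (highest MSE) first, and keep the first n_w
    worst = sorted(range(len(pop)), key=lambda i: scores[i], reverse=True)[:n_w]
    for i in worst:
        pop[i] = [l + rng.random() * (u - l) for l, u in zip(lb, ub)]
    return worst

rng = random.Random(1)
population = [[float(i), float(i)] for i in range(20)]
mse_values = [float(i) for i in range(20)]   # illustrative scores only
replaced = reset_worst(population, mse_values, [0.0, 0.0], [1.0, 1.0], rng)
```

After this step the regenerated individuals have no valid fitness value, so they must be re-evaluated in the next pass through Step3 and Step4.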

B. ε-SUPPORT VECTOR REGRESSION
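The ε-SVR prediction approach is a type of SVR that was developed by Vapnik [1]. We consider a set of training samples $\{(xt_1, z_1), \cdots, (xt_l, z_l)\}$, where $xt_i \in R^n$ is a feature vector and $z_i \in R^1$ is the target output. The penalty factor $C > 0$ and $\varepsilon > 0$. The standard form of ε-SVR, as given in LIBSVM [2], is
$$\begin{aligned} \min_{w, b, \xi, \xi^*} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i + C \sum_{i=1}^{l} \xi_i^* \\ \text{s.t.} \quad & w^T \phi(xt_i) + b - z_i \le \varepsilon + \xi_i, \\ & z_i - w^T \phi(xt_i) - b \le \varepsilon + \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0, \quad i = 1, \cdots, l. \end{aligned} \tag{6}$$
The dual problem is
$$\begin{aligned} \min_{\alpha, \alpha^*} \quad & \frac{1}{2} (\alpha - \alpha^*)^T Q (\alpha - \alpha^*) + \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} z_i (\alpha_i - \alpha_i^*) \\ \text{s.t.} \quad & e^T (\alpha - \alpha^*) = 0, \\ & 0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \cdots, l, \end{aligned} \tag{7}$$
where $Q_{ij} = K(xt_i, xt_j) = \phi(xt_i)^T \phi(xt_j)$ is the kernel function and $e = [1, 1, \cdots, 1]^T$.
We apply LIBSVM to solve the ε-SVR problem and output $\alpha^* - \alpha$. Notably, the choice of $K(\cdot)$ has a significant impact on the output and on computational efficiency. $K(\cdot)$ must be a symmetric function satisfying Mercer's condition. Four classic forms are listed as follows.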
1) Linear kernels:
$$K(xt_i, xt_j) = xt_i^T xt_j \tag{8}$$
2) Polynomial kernels:
$$K(xt_i, xt_j) = (\gamma \, xt_i^T xt_j + coef)^d \tag{9}$$
3) RBF kernels:
$$K(xt_i, xt_j) = \exp(-\gamma \, \|xt_i - xt_j\|^2) \tag{10}$$
4) Sigmoid kernels:
$$K(xt_i, xt_j) = \tanh(\gamma \, xt_i^T xt_j + coef) \tag{11}$$
Here $d$ is the degree of the polynomial kernel. RBF and sigmoid kernel functions are highly efficient, especially when predicting high-dimensional samples. Therefore, these two functions are selected as our alternatives. The kernel function is chosen via the kernel type $K$ in LIBSVM. The coefficient $coef$ is set to 0. In general, the remaining adjustable parameters are $K$, $C$, $\gamma$ and $\varepsilon$. Note that the standard ε-SVR only handles problems whose target output has dimension 1. For a multi-dimensional target output, it is necessary to split the output into single dimensions, forecast each dimension separately, and finally average the prediction results.
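The two kernels selected above can be written out directly. The sketch below uses the forms documented for LIBSVM, $\exp(-\gamma \|u - v\|^2)$ for RBF and $\tanh(\gamma u^T v + coef)$ for sigmoid, in plain Python for readability; the function names are hypothetical, and LIBSVM itself evaluates these kernels internally.

```python
import math

def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid_kernel(u, v, gamma, coef=0.0):
    """Sigmoid kernel: tanh(gamma * <u, v> + coef); coef = 0 as in this work."""
    return math.tanh(gamma * sum(a * b for a, b in zip(u, v)) + coef)

k_rbf = rbf_kernel([1.0, 2.0], [2.0, 0.0], gamma=0.5)
k_sig = sigmoid_kernel([1.0, 2.0], [2.0, 0.0], gamma=0.5)
```

Both functions depend on the data only through $\gamma$ once $coef$ is fixed, which is why $\gamma$ is one of the four parameters tuned by HGSO.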

IV. CASE STUDY
The experimental conditions involve (i) a PC with an Intel Xeon W-2123 at 3.60 GHz, 16 GB RAM, and Windows 10, (ii) Matlab R2018a, (iii) the LIBSVM software package for Matlab, and (iv) the data sets, which were obtained from the University of California at Irvine (UCI) machine learning repository [31]. The main features of the 10 data sets are shown in Table 1. The data sets vary in their numbers of instances and attributes, in order to test the ability to deal with problems of different complexity. Data sets i and j are very large; to speed up computation, they are normalized to between 0 and 1.
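The text does not state which normalization is used. A common choice for mapping each attribute to the range between 0 and 1 is min-max scaling, sketched below under that assumption; `min_max_scale` is a hypothetical helper applied to one attribute column at a time.

```python
def min_max_scale(column):
    """Map a list of numbers linearly onto [0, 1] (assumed min-max scaling)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no information; map it to 0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

scaled = min_max_scale([2.0, 4.0, 6.0])
```

When the data are split into training and testing parts, the minimum and maximum would normally be computed on the training part only and reused for the testing part.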
In this work, three experiments are performed. The first experiment is applied for testing the prediction accuracy and convergence stability. The second experiment is to verify the rate of convergence. In the last experiment, the computational complexity is calculated and compared for each approach.

A. EXPERIMENT I
To test the prediction accuracy and convergence stability, the proposed SVR-HGSO approach is compared with SVR-HHO, SVR-SSA, SVR-DA, SVR-ALO, SVR-PSO, SVR-FA and single ε-SVR on the ten data sets. The initial parameter settings of SVR-HGSO and the other algorithms are given in Table 2. The 10-fold cross-validation technique is applied for training and testing, so the ratio of training data to testing data is 9 to 1. The optimal prediction accuracy (MSE) is obtained via the steps of Section III based on the training and testing data. This process is repeated ten times for each algorithm. The average (Avg) and standard deviation (Std) of the ten optimal prediction accuracies are given in Table 3, and the corresponding box plots are shown in Fig. 3. In this figure, the box denotes the interquartile range, the whiskers indicate the maximum and minimum MSE, the cross sign in the box is the mean MSE, and the circles mark the outliers of the accuracy values. These results reveal that SVR-HGSO achieves the best performance on data sets h and j. In addition, the Std data show that SVR-HGSO has a highly stable output on data sets a, c, d, e, g, h, i and j. Fig. 4 shows the Std data of all ε-SVR based algorithms. According to this figure, the convergence stability of SVR-HGSO is the best among the hybrid algorithms, which is a notable result.
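The evaluation protocol above (10-fold split, MSE, then Avg and Std over ten repetitions) can be sketched as follows. The helper names are hypothetical, the shuffling is an assumption, and the sample standard deviation is used because the text does not say which variant is reported.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k nearly equal folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def summarize(values):
    """Average and sample standard deviation of repeated results."""
    return statistics.mean(values), statistics.stdev(values)

for train_idx, test_idx in k_fold_indices(100):
    pass  # train epsilon-SVR on train_idx, compute mse(...) on test_idx
```

With `k=10`, every split holds out one tenth of the samples, which gives the 9 to 1 ratio of training to testing data.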
Furthermore, Wilcoxon's test is used to check the statistical significance of the differences between the prediction outcomes and to obtain P values at a 5% significance level.
The P values of SVR-HGSO vs. the other algorithms are shown in Table 4. P values greater than 0.05 indicate that the SVR-HGSO results have no statistically significant difference from those of the other approach. These values are underlined in Table 4. According to the table, there is no significant difference between our approach and SVR-DA on 7 data sets. Compared with SVR-SSA, the number of data sets with no significant difference is 6. Compared with SVR-HHO, which has the best accuracy, there is no significant difference on data sets a, b, e, i and j. In particular, compared with ε-SVR, this number is 0, which shows that there is a significant difference between the proposed approach and ε-SVR. The above results confirm the capacity of the HGSO algorithm to optimize the parameters of ε-SVR in terms of prediction accuracy and convergence stability.

B. EXPERIMENT II
In order to verify the convergence rate, SVR-HGSO is compared with SVR-HHO, SVR-SSA, SVR-DA, SVR-ALO, SVR-PSO and SVR-FA. The only difference from Experiment I is the cut-off condition; the other experimental conditions are the same. In this experiment, the run stops when the prediction accuracy reaches the maximum value of MSE of the algorithms for each data set in Table 3. The number of iterations run is recorded. The process is repeated 10 times and the average number of iterations is calculated. The smaller the number, the faster the convergence rate. The results are given in Table 5.
SVR-HGSO obtains the minimum number of iterations on data sets b, d, f and j. SVR-HHO achieves the best results on data sets c, e and h. SVR-SSA has the best convergence rate on data sets a, f and i. SVR-DA performs best on data sets d and e. SVR-PSO has the best result on data sets d and g. SVR-FA obtains the minimum number on data sets b and d. SVR-ALO does not achieve the best result on any data set. To offer a comprehensive and intuitive evaluation (RCRank) of the convergence rate of these approaches, Eq. (12) is applied.
where nd is the total number of data sets. RCrate represents the normalized value of the iteration number. RCold indicates the original value of the iteration number. RCMin (RCMax) is the minimum (maximum) iteration number for each data set. The result is displayed in Fig. 5. According to the figure, SVR-HGSO takes first place with a score of 0.22 (lower is better). SVR-HHO and SVR-DA also perform well. Compared with our approach, SVR-FA shows a relatively wide gap. The above results confirm that the proposed approach is capable of rapid convergence.
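Based on the symbol definitions above, Eq. (12) appears to be a min-max normalization per data set followed by an average over the nd data sets, that is, RCrate = (RCold − RCMin) / (RCMax − RCMin) and RCRank = (Σ RCrate) / nd. The sketch below implements that reading; it is an assumption rather than a reproduction of the equation, `normalized_rank` is a hypothetical name, and the numbers in the usage lines are illustrative and not taken from Table 5.

```python
def normalized_rank(values_by_dataset):
    """Average min-max normalized value per algorithm (lower is better).

    values_by_dataset: one dict {algorithm: value} per data set.
    """
    algos = list(values_by_dataset[0].keys())
    totals = {a: 0.0 for a in algos}
    for row in values_by_dataset:
        lo, hi = min(row.values()), max(row.values())
        for a in algos:
            # Normalized rate in [0, 1]; all-equal rows contribute 0
            totals[a] += 0.0 if hi == lo else (row[a] - lo) / (hi - lo)
    nd = len(values_by_dataset)
    return {a: total / nd for a, total in totals.items()}

# Illustrative iteration counts for three algorithms on two data sets
ranks = normalized_rank([
    {"A": 10, "B": 20, "C": 30},
    {"A": 40, "B": 20, "C": 30},
])
```

An algorithm that is fastest on every data set scores 0, and one that is slowest on every data set scores 1. The same helper applies unchanged to running times instead of iteration counts.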

C. EXPERIMENT III
The computational complexities of these hybrid algorithms change with the number of samples and features, so they are difficult to calculate exactly. In this paper, we set the same criteria for all algorithms and record the running time on the different data sets as a measure of computational complexity. The experimental conditions are similar to Experiment II. The run stops when the prediction accuracy reaches the average value of MSE for all algorithms for each data set in Table 3. The running time is recorded. The process is repeated 10 times and the average time is calculated. The shorter the time, the lower the computational complexity. The results, in seconds, are given in Table 6.
SVR-HGSO obtains the lowest computational complexity on data sets f and j, and has competitive results on the other data sets. SVR-HHO achieves the minimum running time on data sets c and h. SVR-SSA has the best result on data set a. SVR-DA performs best on data sets b, d, e and i. SVR-PSO has the best result on data set g. SVR-ALO and SVR-FA do not achieve the best result on any data set. To offer a comprehensive and intuitive evaluation (CCRank) of the computational complexity of these approaches, Eq. (13) is applied.
where CCtrate represents the normalized value of operation time. CCold indicates the running time. CCMin (CCMax) is the minimum (maximum) of operation time for each data set.
The result is shown in Fig. 6. According to the figure, SVR-HGSO takes first place with a score of 0.18 (lower is better). SVR-DA also performs well, with a score of 0.22. The other approaches perform less well. The above results confirm that the proposed approach has the best overall computational complexity.

D. DISCUSSION
As a summary for all data sets in the three experiments, Table 7 lists the statistical values: the number of data sets each algorithm won/tied/lost on prediction accuracy, the associated P values, and the chi-square $\chi^2$. It reveals that SVR-HGSO outperforms the other algorithms on 2 data sets. Regarding significance, the outcomes show that SVR-HGSO is worse on 4, 3, 1, 2, 0, 0, and 2 data sets compared with SVR-HHO, SVR-SSA, SVR-DA, SVR-ALO, SVR-PSO, SVR-FA and single ε-SVR respectively. The $\chi^2$ results indicate that the stability of SVR-HGSO is second only to ε-SVR (lower means more stable). This means that SVR-HGSO is better than the other hybrid SVR methods in terms of convergence stability.
In order to fully evaluate the approaches, Eq. (14) is used to test the comprehensive performance (Rank) regarding prediction accuracy, convergence stability and rate, and computational complexity. Wrate indicates the normalized winning rate of MSE. Srate means the normalized stability ratio on $\chi^2$. The bigger the winning rate, the better the prediction accuracy. However, the lower the stability ratio, the better the prediction stability. The two therefore point in opposite directions and need to be unified, so different normalization formulas are used, as shown in Eq. (14). Finally, the lower the value of Rank, the better the overall performance. The results are listed in Table 7.
where [...] proportion with other algorithms is even lower. These results mean that the proposed algorithm has the best comprehensive performance. SVR-HGSO has the best prediction accuracy on data sets h and j. Apart from ε-SVR, it has the best convergence stability on data sets a, c, d, e, g, h and j. Regarding the convergence rate, it obtains the best result on data sets b, d, f, i and j. It has the lowest computational complexity on data sets f and j, and it performs competitively on many other data sets. These results show that SVR-HGSO is effective and efficient, is better at dealing with high-dimensional data, and also performs reasonably well on low-dimensional data.
In terms of prediction accuracy, convergence stability and rate, and computational complexity, SVR-HGSO is very competitive. The integrated approaches in fact sacrifice a small part of the stability of ε-SVR in pursuit of prediction accuracy. In this respect, SVR-HGSO is the best balanced compared with SVR-HHO, SVR-SSA, SVR-DA, SVR-ALO, SVR-PSO and SVR-FA, and it obtains the best convergence performance and computational complexity. Hence, we conclude that the HGSO algorithm is worth applying in the field of SVR parameter optimization. It has the potential to benefit the whole prediction process.

V. CONCLUSIONS
This work presents a novel integrated approach for optimizing SVR parameters using the HGSO algorithm. The proposed SVR-HGSO approach optimizes the SVR parameters continuously through an iterative process, in which the prediction accuracy is evaluated repeatedly until the cut-off conditions are met. We finally obtain the optimal prediction accuracy and the corresponding SVR parameters. Three experiments on prediction accuracy, convergence and computational complexity were carried out. The experimental outcomes on the benchmark data sets showed that SVR-HGSO has the best comprehensive performance compared with the other well-known algorithms.
The study confirms the considerable potential of the SVR-HGSO approach in the forecasting field.
For future work, first, new ideas for improving the SVR model and/or the HGSO algorithm can be proposed. Second, applications to real-world engineering optimization problems should be addressed.