An Efficient Parameter Adaptive Support Vector Regression Using K-Means Clustering and Chaotic Slime Mould Algorithm

Support vector regression (SVR) performs satisfactorily in prediction problems, especially small-sample prediction. The setting parameters (e.g., kernel type and penalty factor) profoundly impact the performance and efficiency of SVR, and their adaptive adjustment has long been a research hotspot. However, balancing the substantial time cost of parameter adjustment against forecast accuracy remains challenging, and this conflict is especially prominent in big data prediction. In this paper, an SVR-based prediction approach is presented using the K-means clustering method (KMCM) and the chaotic slime mould algorithm (CSMA). Eight high- and low-dimensional benchmark datasets are applied to obtain appropriate key parameters of KMCM and CSMA, and the forecast accuracy, stability and computation complexity are evaluated. The proposed approach obtains the optimal (joint best) forecast accuracy on 6 datasets and produces the most stable output on 3 datasets; it ranks first with a score of 0.024 in the overall evaluation. The outcomes reveal that the proposed approach is capable of tuning the parameters of SVR. KMCM, CSMA and SVR are skillfully integrated in this work and perform well. Although the performance is not outstanding in terms of stability, the proposed approach exhibits very strong performance with respect to prediction accuracy and computation complexity. This work validates the tremendous potential of the proposed approach in the prediction field.


I. INTRODUCTION
As a part of support vector machine (SVM) [1]- [3], support vector regression (SVR) has produced gratifying achievements in the field of prediction. However, poor choices of penalty factor, kernel type and others can markedly reduce the SVR's capability. Hence, many heuristic algorithms are combined with SVR to realize SVR parameter tuning, including the Henry gas solubility optimization (HGSO) [4], salp swarm algorithm (SSA) [5], dragonfly algorithm (DA) [6], ant lion optimizer (ALO) [7], particle swarm optimization (PSO) [8], firefly algorithm (FA) [9] and others. With the addition of these heuristic algorithms, the prediction accuracy and generalization ability have been greatly improved.
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues .
However, the parameter tuning requires constant iterations, which involves running SVR many times. The enormous time cost is challenging for many scholars, especially in high-dimensional data prediction.
The integration of the K-means clustering method (KMCM) is an interesting and feasible solution [10]. KMCM is applied to SVR for sales forecasting [11]. For accurate export trade value prediction, a three-stage model is proposed, which consists of FA-based SVR, FA-based KMCM and wavelet transform [12]. The running times of these methods are excellent. However, the prediction accuracies of these methods are not ideal.
Two meaningful findings can be gained from the above-mentioned studies.
1) The hybrid approach of heuristic algorithms and SVR produces ideal prediction accuracy but requires considerable time. KMCM makes the running time of hybrid SVR algorithms very short; however, it reduces prediction accuracy and stability.
2) Almost all research focuses on a particular application area, so the universality of the developed algorithms requires further study.
Motivated by these findings, this paper aims to fill the gap and make contributions in the following two respects. First, based on the three-stage model of KMCM-heuristic algorithm-SVR, a new heuristic algorithm is sought to enable the prediction accuracy, stability and computation complexity to reach an ideal level. Second, datasets from different fields are applied to verify the proposed algorithm.
The slime mould algorithm (SMA) is a novel heuristic algorithm which was presented by Li et al. [13] in 2020. It is similar to the bacterial foraging optimization algorithm [14]-[16]. However, SMA involves a distinct mathematical model, using accommodative weights to imitate the generation process of positive and negative feedback of a slime mould propagation wave based on a bio-oscillator, forming the optimal path connecting to food with good exploration capacity and exploitation tendency. SMA has been evaluated on an extensive set of benchmarks and showed significant advantages over some classical algorithms (e.g., ant colony optimization (ACO) [17], [18], sine cosine algorithm (SCA) [19], SSA [20], grey wolf optimization (GWO) [21], [22], moth flame optimization (MFO) [23]-[25], and whale optimization algorithm (WOA) [26]).
In this paper, the chaotic slime mould algorithm (CSMA) is presented to improve SMA's search efficiency, and then an up-to-date KMCM-CSMA-SVR model is proposed. KMCM is in charge of cluster analysis of training data for reducing the time cost. CSMA is responsible for tuning the SVR parameters to acquire high forecast accuracy and stability. First, the cluster centers are obtained using KMCM. Second, SVR parameters are randomly generated to build the parameter population. Third, the prediction accuracy data are achieved via SVR and the population. Finally, the optimal parameters and population are renewed by CSMA. The last two steps are duplicated until the cut-off condition is met. Eight varied benchmark datasets are applied to calculate the prediction accuracy, stability and computation complexity. As the comparison algorithms, HGSO-SVR, SSA-SVR, DA-SVR, ALO-SVR, PSO-SVR, FA-SVR and KMCM-SVR also participated in the experiment.
There are two innovations. First, the ingenious combination of KMCM, CSMA and SVR is modeled. KMCM is used to improve the index of operation time. CSMA participates in SVR parameter optimization, which makes up for the impact of KMCM and improves the prediction accuracy and other performances. Second, the presented model has been verified on many occasions (e.g., for appliance energy, Beijing PM2.5, and wave energy converters).
The remaining sections are arranged below. The related work is listed in Section II. The KMCM-CSMA-SVR integrated model is shown in Section III. Section IV offers the case study, including the three experiments and discussion. The last section presents the conclusion and the prospect for future research.

II. RELATED WORK
The grid search is a classical approach for adjusting SVR parameters. However, experimental results showed that heuristic algorithms are more efficient than grid search in this SVR tuning work [27]-[29]. The genetic algorithm (GA) was involved in the parameter tuning of SVR for modelling bedload transport in gravel-bed rivers [30]. To predict the long-term temperature effect in concrete dam structure health monitoring, SSA was utilized to optimize SVR [5]. The chaos-based FA and SVR were combined to perform stock market price forecasting [9]. With the rapid development of these methods, SVR prediction accuracy and generalization ability have been greatly improved in practical applications. However, the time required is approximately the number of iterations multiplied by the time that SVR takes to run alone, so the time cost increases greatly.
Hence, some scholars set out to improve operation efficiency. A rapid data preprocessing procedure was proposed for SVR [10]. The dataset was handled to obtain clusters using KMCM. SVR only trained the centroids of clusters. This method effectively reduced the time cost. The applications included sales forecasting [11], export trade value prediction [12], and hour-ahead frequency security assessment [31]. KMCM processes the training data to reduce the number of samples, thus reducing the operation time. At the same time, it discards some data, which affects the prediction accuracy and stability.
The no free lunch theorem of optimization [32] reveals that no single algorithm can solve all problems. This encourages us to find a proper way to overcome the above difficulties. SMA is a novel approach. Its strength lies in its distinct mathematical model, which utilizes accommodative weights to imitate the process of generating positive and negative feedback of the propagation wave of slime mould. Compared with other heuristic algorithms, it produced competitive outcomes [13]. In this work, a chaotic local search is added to SMA to improve its search efficiency. This offers the feasibility and rationality to build the KMCM-CSMA-SVR model. The experimental results prove that KMCM-CSMA-SVR offers outstanding comprehensive performance.

III. METHODOLOGY OF THE PROPOSED ALGORITHM
The framework of KMCM-CSMA-SVR is presented below: a) obtain cluster centers based on datasets and KMCM, b) perform the population presentation of SVR parameters, c) gain prediction results based on the population and cluster centers using SVR, and d) renew the population and optimum parameters by CSMA's operations. Steps c and d are repeated until the deadline condition is met. The deadline condition is usually set to be the maximum number of iterations. The SVR parameter population X = {X_1, X_2, ..., X_n} consists of n individuals, and the SVR parameters form the attributes of each individual X_i. The KMCM-CSMA-SVR algorithm is shown in Fig. 1. The detailed steps are listed below.
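The loop in steps b-d can be sketched with a toy quadratic fitness standing in for the SVR evaluation, and a simple hypothetical random-search update standing in for CSMA (all names are illustrative, not the paper's implementation):

```python
import random

def toy_fitness(params):
    # Hypothetical stand-in for "train epsilon-SVR on the cluster centers and
    # return the test MSE"; a quadratic bowl with its optimum at (1.0, 0.1).
    pf, gamma = params
    return (pf - 1.0) ** 2 + (gamma - 0.1) ** 2

def optimize(n=20, max_iter=50, seed=0):
    rng = random.Random(seed)
    # step b: random initial population of (PF, gamma) pairs within bounds
    pop = [(rng.uniform(0.0, 2.0), rng.uniform(0.0, 1.0)) for _ in range(n)]
    best, best_f = None, float("inf")
    for _ in range(max_iter):
        # step c: evaluate every parameter set and keep the generation's best
        cand = min(pop, key=toy_fitness)
        if toy_fitness(cand) < best_f:
            best, best_f = cand, toy_fitness(cand)
        # step d: stand-in update -- perturb the population around the best
        pop = [(best[0] + rng.gauss(0.0, 0.1), best[1] + rng.gauss(0.0, 0.05))
               for _ in range(n)]
    return best, best_f
```

The real algorithm replaces the perturbation step with CSMA's update operators and the toy fitness with an ε-SVR training/test cycle on the cluster centers.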
Step 1 Input the operation times OT, KMCM's number of clusters K, and the iteration maximum MaxIter. The CSMA's loop count variable mi equals 1. The operation count variable oi is equivalent to 1. The optimum SVR parameter archive is marked as X_arch.
Step 2 Normalize the data of the same property in the dataset to [-1, 1] and split the dataset into training data and test data.
Step 3 Obtain cluster centers CCs based on the training data, KMCM and parameter K .
Step 4 Determine the upper and lower limits of KT, PF, γ and ε to constitute the upper limit set UB and the lower limit set LB: UB = {ub_KT, ub_PF, ub_γ, ub_ε}, LB = {lb_KT, lb_PF, lb_γ, lb_ε}. X is initialized randomly within UB and LB.
Step 5 Supported by X and CCs, perform supervised learning via ε-SVR to obtain training models.
Step 6 Acquire MSE sets supported by test data and training models.
Step 7 Elect the smallest MSE and the corresponding optimal SVR parameter set X_opt of the current generation.
Step 8 If mi > MaxIter, store X_opt and the corresponding MSE in X_arch and go to Step 11; otherwise, go to Step 9.
Step 12 Output the minimum MSE and the corresponding SVR parameters in X_arch.
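Step 2's min-max normalization to [-1, 1] can be sketched as follows (a minimal illustration, applied per property/column):

```python
def normalize_column(values, lo=-1.0, hi=1.0):
    """Min-max scale one property of the dataset into [lo, hi], as in Step 2."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:                      # constant column: map to the midpoint
        return [(lo + hi) / 2.0] * len(values)
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]
```

For example, the column [0, 5, 10] maps to [-1.0, 0.0, 1.0].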

A. CHAOTIC SLIME MOULD ALGORITHM
The SMA was proposed in 2020 by Li et al. [13]. It was inspired by the oscillating patterns of slime mould. To form the optimal path connecting to food, with outstanding exploratory capacity and exploitation propensity, a distinct mathematical model was presented using accommodative weights to imitate the process of generating positive and negative feedback of the propagation wave supported by the bio-oscillator. However, SMA suffers from some limitations, including a relatively slow convergence rate and the dilemma of local optima. In this paper, chaotic maps are added to SMA to improve the search efficiency and better exploit the locality of the solutions. The operations required for this work are listed below.
Approach Food: To model the approaching behaviour of slime mould, the following rule is presented to simulate the contraction pattern:
X(t + 1) = X_b(t) + vb · (W · X_A(t) − X_B(t)), if r < p; X(t + 1) = vc · X(t), if r ≥ p    (1)
vb is a parameter in [−a, a], where a is listed in Eq. (2). vc reduces linearly from 1 to 0. t is the current generation. r is a random value in [0, 1].
X_b is the individual site where the highest concentration of odour has been found. X indicates the slime mould's location (i.e., a set of SVR parameters). X_A and X_B are two individuals randomly elected from the swarm.
W indicates the slime mould's weight and is listed in Eq. (3). p is calculated via Eq. (5). C denotes the random value generated by chaotic maps [33]. In this paper, the logistic map is elected as the chaotic map method; its mathematical model can be seen in Eq. (6).
a = arctanh(−(t / max_t) + 1)    (2)
where max_t represents the maximum number of iterations.
W(SmellIndex(i)) = 1 + r · log((bF − S(i)) / (bF − wF) + 1), condition; 1 − r · log((bF − S(i)) / (bF − wF) + 1), otherwise    (3)
SmellIndex = sort(S)    (4)
where condition denotes that S(i) ranks in the first half of the population, r is a random value in [0, 1], bF (wF) represents the optimal (worst) fitness of the current iterative procedure, and SmellIndex is the ranked sequence of fitness data.
p = tanh |S(i) − DF|    (5)
where i ∈ {1, 2, ..., n}. S(i) indicates the fitness of X_i. DF is the optimum acquired in all generations.
C_{k+1} = 4 · C_k · (1 − C_k)    (6)
where the range of C is [0, 1].
Wrap Food: Eq. (7) is used to renew the location of slime mould.
X(t + 1) = rand · (U − L) + L, if rand < z; X_b(t) + vb · (W · X_A(t) − X_B(t)), if r < p; vc · X(t), if r ≥ p    (7)
where U and L are the upper and lower boundaries of the search scope, and rand represents a random value in [0, 1]. Grabble Food: vb floats randomly in [−a, a] and approaches 0 as the iterations increase. vc oscillates between −1 and 1 and ultimately tends to 0.
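A minimal sketch of the approach-food update for a one-dimensional population, under the standard SMA formulation of Li et al. [13] (the weights W are simplified to 1 for brevity, and all names are illustrative rather than the paper's implementation):

```python
import math
import random

def sma_step(X, fitness, X_best, a, vc, rng):
    """One approach-food update for a 1-D slime mould population (a sketch of
    the standard SMA rule; the adaptive weight W is simplified to 1 here)."""
    DF = min(fitness)                           # best fitness of this generation
    new_X = []
    for i, x in enumerate(X):
        p = math.tanh(abs(fitness[i] - DF))     # Eq. (5): p = tanh|S(i) - DF|
        vb = rng.uniform(-a, a)                 # vb floats randomly in [-a, a]
        xa, xb = rng.choice(X), rng.choice(X)   # two randomly elected individuals
        if rng.random() < p:
            new_X.append(X_best + vb * (xa - xb))   # move relative to the best site
        else:
            new_X.append(vc * x)                    # contract toward the origin
    return new_X
```

In the full algorithm, a shrinks with the iteration count via Eq. (2) and vc decays toward zero, so the population gradually shifts from exploration to exploitation.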
The pseudo-code of CSMA is listed in Fig. 2. In this work, CSMA is utilized to update X based on X_opt according to the loop body of the pseudo-code. The updated X is used as the input parameters of SVR to make a new prediction and obtain the new X_opt. The iteration does not stop until the cut-off condition is met. Hence, CSMA's operations are the key of KMCM-CSMA-SVR.
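The chaotic sequence C used by CSMA comes from a logistic map; a minimal sketch, assuming the standard fully chaotic parameter μ = 4 (the paper's exact Eq. (6) may use a different parameterization):

```python
def logistic_map(c0, n, mu=4.0):
    """Generate n chaotic values in [0, 1] with the logistic map
    c_{k+1} = mu * c_k * (1 - c_k); mu = 4 gives fully chaotic behaviour.
    Seeds in {0, 0.25, 0.5, 0.75, 1} fall onto fixed or periodic orbits
    and should be avoided."""
    seq, c = [], c0
    for _ in range(n):
        c = mu * c * (1.0 - c)
        seq.append(c)
    return seq
```

These values can replace uniform random draws inside the search loop, giving the ergodic, non-repeating perturbations that chaotic local search relies on.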

B. K-MEANS CLUSTERING METHOD
Although KMCM was proposed more than 60 years ago, and countless clustering algorithms have been published since then, KMCM is still widely applied [34]. Let Y = {y_1, y_2, ..., y_m} be the set of m d-dimensional points to be clustered into a set of K clusters, Cl = {c_1, c_2, ..., c_K}. KMCM seeks a partition that minimizes the square error (SE) between the empirical mean and the points in each cluster. Let μ_i be the mean of cluster c_i, i = 1, 2, ..., K. The SE between μ_i and the points in cluster c_i is calculated below.
J(c_i) = Σ_{y_j ∈ c_i} ||y_j − μ_i||²    (8)
The objective of KMCM is the minimization of the sum of the SE over all K clusters:
J(Cl) = Σ_{i=1}^{K} Σ_{y_j ∈ c_i} ||y_j − μ_i||²    (9)
KMCM starts from an original segmentation of K clusters and allocates patterns to clusters in order to decrease SE. Since SE always decreases as the number of clusters K increases (with J(Cl) = 0 when K = m), it can only be minimized for a fixed number of clusters. The key steps of KMCM are listed below [34].
Step 1: Elect an original segmentation using K clusters; iterate steps 2 and 3 until cluster membership stabilizes.
Step 2: An updated segmentation is generated by assigning each pattern to its nearest cluster centre.
Step 3: Calculate new cluster centers.
KMCM demands three customized parameters: the number of clusters K, the cluster initialization, and the distance metric. The cluster initialization can be a random operation. The distance metric can be a Euclidean metric, cosine metric, etc. The hardest parameter to choose is K. In this paper, a cross-validation method is utilized to find the appropriate K for different datasets (i.e., K varies with the dataset).
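A minimal pure-Python sketch of these steps, with random initialization and Euclidean distance (illustrative only; the paper's implementation may differ):

```python
import random

def kmeans(points, K, iters=100, seed=0):
    """Plain Lloyd's iteration: random initial centers (Step 1), nearest-center
    assignment (Step 2), and center recomputation (Step 3) until membership
    stabilizes."""
    rng = random.Random(seed)
    centers = rng.sample(points, K)                    # Step 1: initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(K)]
        for p in points:                               # Step 2: assign to nearest center
            j = min(range(K), key=lambda k: sum((a - b) ** 2
                    for a, b in zip(p, centers[k])))
            clusters[j].append(p)
        new_centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl
                       else centers[j]                 # keep old center if cluster empty
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:                     # membership stabilized
            break
        centers = new_centers                          # Step 3: recompute centers
    return centers
```

In the proposed model, only the resulting centers (CCs) are passed to ε-SVR, which is how the training-set size, and hence the running time, is reduced.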

C. ε-SUPPORT VECTOR REGRESSION
The ε-SVR prediction method was developed by Vapnik [1]. Consider training samples {(xt_1, z_1), ..., (xt_l, z_l)}, where xt_i ∈ R^n is a feature vector and z_i ∈ R^1 is the objective. ε > 0 and the penalty factor PF > 0. ε-SVR is described as
min_{w,b,ξ,ξ*} (1/2) w^T w + PF Σ_{i=1}^{l} (ξ_i + ξ_i*)
s.t. w^T φ(xt_i) + b − z_i ≤ ε + ξ_i, z_i − w^T φ(xt_i) − b ≤ ε + ξ_i*, ξ_i, ξ_i* ≥ 0, i = 1, ..., l    (10)
The dual form is exhibited below:
min_{α,α*} (1/2) (α − α*)^T Q (α − α*) + ε Σ_{i=1}^{l} (α_i + α_i*) + Σ_{i=1}^{l} z_i (α_i − α_i*)
s.t. e^T (α − α*) = 0, 0 ≤ α_i, α_i* ≤ PF, i = 1, ..., l, with Q_{ij} = KT(xt_i, xt_j) = φ(xt_i)^T φ(xt_j)    (11)
LIBSVM is utilized to solve the problem and offer α* − α. The choice of KT(·) has a remarkable influence on performance and operational efficiency. KT(·) is a symmetric function satisfying the Mercer condition. Some conventional functions are displayed below.
1) Linear kernel: KT(xt_i, xt_j) = xt_i^T xt_j    (12)
2) Polynomial kernel: KT(xt_i, xt_j) = (γ xt_i^T xt_j + coef)^d, γ > 0    (13)
3) RBF kernel: KT(xt_i, xt_j) = exp(−γ ||xt_i − xt_j||²), γ > 0    (14)
4) Sigmoid kernel: KT(xt_i, xt_j) = tanh(γ xt_i^T xt_j + coef)    (15)
The third and fourth functions are very efficient, especially in the prediction of high-dimensional samples [4]. Hence, these functions are selected in this work, and coef is set to zero. The adjustable parameters are KT, PF, γ and ε, which are the optimization variables of this work.
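The RBF and sigmoid kernels used in this work can be computed as follows (a minimal sketch; coef defaults to zero, matching the setting above):

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: KT(x, y) = exp(-gamma * ||x - y||^2), gamma > 0."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, gamma, coef=0.0):
    """Sigmoid kernel: KT(x, y) = tanh(gamma * <x, y> + coef); coef is set
    to zero in this work."""
    return math.tanh(gamma * sum(a * b for a, b in zip(x, y)) + coef)
```

Note that the RBF kernel of a point with itself is always 1, since the squared distance term vanishes.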
The SVR model has two major limitations. First, it is inefficient in processing high-dimensional samples. Second, it is difficult to find the appropriate setting parameters. To overcome these limitations, KMCM is utilized to improve the ability to process high-dimensional data, and CSMA is applied to select reasonable parameter settings.

IV. CASE STUDY
The study conditions include (a) a computer with an Intel Core i5-6600, 3.60 GHz, 8 GB RAM, and Windows 7, (b) Matlab R2017a, (c) LIBSVM development kit, and (d) the datasets, which were acquired from the University of California at Irvine machine learning repository [35]. The information of eight datasets is displayed in Table 1. The number of instances and properties of datasets are different, which are used to test the performance with respect to complexity problems. The minimum number of properties is 6, and the maximum is 128. The minimum number of instances is 205, and the maximum is 72,000. The datasets D1, D4, D6 and D7 are selected to test the ability of each method to handle the relatively small sample size. D2, D3, D5 and D8 are chosen because they include a large number of cases.
The key parameter setting experiments for KMCM and CSMA are performed first. The second test is utilized to verify the forecast accuracy and stability. The computation complexity of each algorithm is calculated and compared in the third experiment. The comparison with highly similar approaches is provided in the last experiment.

A. EXPERIMENT I: KEY PARAMETER SETTING EXPERIMENT
The candidate parameter settings are listed in Table 2. According to the time cost data, all datasets exhibit a noisy trend as a whole, with no obvious upward or downward trend. The normalization operation and Eq. (16) were carried out to select appropriate K, MaxIter, nsa and z, taking into account two major metrics: MSE and running time. The smaller the paSe value, the better. Prediction accuracy is weighted more heavily than running time in this paper: β is set to 0.8, while δ is 0.2. The paSe outcomes are displayed in Table 3, where ItemOptimum indicates the appropriate values of these parameters for each dataset. From the table, D1-D8 have different ItemOptimum values, which makes the prediction results superior.
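If Eq. (16) is assumed to be a weighted sum of min-max normalized MSE and running time with β = 0.8 and δ = 0.2, the paSe score can be sketched as follows (this form is an assumption, since the equation itself is not reproduced here):

```python
def pa_se(mse, t, mse_all, t_all, beta=0.8, delta=0.2):
    """Hypothetical reconstruction of the paSe score: a weighted sum of
    min-max normalized MSE and running time (beta + delta = 1).
    Lower paSe means a better parameter setting."""
    def norm(v, vs):
        lo, hi = min(vs), max(vs)
        return 0.0 if hi == lo else (v - lo) / (hi - lo)
    return beta * norm(mse, mse_all) + delta * norm(t, t_all)
```

With this form, the setting with the smallest MSE and the shortest running time scores 0, and the worst on both metrics scores 1.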
B. EXPERIMENT II: FORECAST ACCURACY AND STABILITY EXPERIMENT
The parameter settings followed Tables 2 and 3 (ItemOptimum). The ten-fold cross-validation training/testing technology was adopted; the ratio of training data to test data was 9:1. The optimal MSE was achieved using the method of Section III on the datasets. Each algorithm repeated the process 10 times. The average (Avg) and standard deviation (Std) of the forecast accuracies are listed in Table 4, and box-plot charts are shown in Fig. 7. The results revealed that KMCM-CSMA-SVR produced the optimal (joint best) performance on datasets D2, D3, D4, D5, D7 and D8; its prediction accuracy was the best among the compared compound algorithms. In addition, KMCM-CSMA-SVR produced the most stable output on D2, D3 and D5 based on the Std data of Table 4 and Fig. 8. Furthermore, Wilcoxon's test was applied to verify the statistical significance of the differences between the forecast results, yielding P values at a 5% significance level. The P values of KMCM-CSMA-SVR vs. the other models are displayed in Table 5. P values greater than 0.05 indicate that KMCM-CSMA-SVR has no statistically significant difference from the corresponding algorithm; these values are underlined in Table 5. From the table, there was no significant difference between the proposed algorithm and HGSO-SVR on 5 datasets (D1, D2, D3, D5 and D6). Compared with SSA-SVR, the number of datasets with no significant difference was 4 (D2-D5). Compared with DA-SVR, there were no significant differences on datasets D1-D3 and D6. Compared with ALO-SVR, the datasets without significant differences were D1, D2 and D4. In particular, compared with PSO-SVR, FA-SVR and KMCM-SVR, this number was 0, which shows that there were significant differences between KMCM-CSMA-SVR and these algorithms. These outcomes confirmed the ability of CSMA to optimize ε-SVR's parameters regarding forecast accuracy and stability.
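The ten-fold split with a 9:1 train/test ratio per fold can be sketched as follows (index generation only, illustrative):

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs for k-fold
    cross-validation; with k = 10, each fold trains on 90% of the
    samples and tests on the remaining 10%."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]              # k disjoint test folds
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]
```

Every sample appears in exactly one test fold, so the ten per-fold MSE values cover the whole dataset once.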

C. EXPERIMENT III: COMPUTATION COMPLEXITY EXPERIMENT
The computation complexities of the integrated algorithms vary with the instance and attribute numbers, so calculating them in closed form is a challenging job [4]. This experiment set the same conditions for these models and measured the corresponding time cost on D1-D8 to quantify the complexity. The experiment criteria were identical to those of Experiment II. The cut-off condition was that the forecast accuracy reached the mean MSE of all models on each dataset of Table 4, and the time cost was reported. This process was repeated ten times, and the mean was computed. The shorter the time cost, the lower the computation complexity. The outcomes in seconds are shown in Table 6. PSO-SVR obtained the optimal complexity on dataset D4. KMCM-SVR achieved the minimum on D1-D3 and D5-D8. Eq. (17) is utilized to quantify the general evaluation (CCRank) of the computation complexity.
CCrate = (CCold − CCMin) / (CCMax − CCMin)    (17)
where CCrate is the normalized value of the time cost, CCold is the raw time cost, and CCMax (CCMin) denotes the maximum (minimum) time cost for each dataset. The outcome is displayed in Table 9 (CCRank). From the table, KMCM-CSMA-SVR won second place with a score of 0.068 (lower is better). KMCM-SVR obtained the optimum at 0.000. The other models, without KMCM, performed less effectively. These results confirmed the substantial role of KMCM in reducing computation complexity.

D. EXPERIMENT IV: COMPARISON WITH SIMILAR APPROACHES
The comparison algorithms were configured according to Tables 2 and 3. The ten-fold cross-validation training/testing technology was also adopted to assess the forecast accuracy and computation complexity. The test process was similar to those of Experiments II and III. The forecast accuracy results are revealed in Table 7, and the computation complexity results are displayed in Table 8. From the tables, the time consumption was greatly reduced for all algorithms. However, compared with the original algorithms, the prediction accuracy exhibited different degrees of attenuation, especially for KMCM-HGSO-SVR, KMCM-PSO-SVR and KMCM-FA-SVR, which were likely to fall into local optima. Considering prediction accuracy, the proposed algorithm was superior to these approaches on datasets D1 and D3-D8. In terms of computational complexity, the proposed algorithm and these methods offered their own merits. More comparison is shown in the discussion section.
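Returning to Eq. (17), the CCrate normalization and its aggregation into a per-model CCRank can be sketched as follows (the averaging across datasets is an assumption about how the per-dataset values are combined):

```python
def cc_rate(cc_old, cc_min, cc_max):
    """Eq. (17): min-max normalized time cost -- 0 for the fastest and
    1 for the slowest model on a given dataset."""
    return (cc_old - cc_min) / (cc_max - cc_min)

def cc_rank(times_per_dataset, model):
    """Aggregate CCrate into a per-model CCRank by averaging over datasets
    (a hypothetical aggregation; the paper does not spell this step out)."""
    rates = []
    for times in times_per_dataset:          # times: {model_name: seconds}
        lo, hi = min(times.values()), max(times.values())
        rates.append(cc_rate(times[model], lo, hi))
    return sum(rates) / len(rates)
```

Under this scheme, a model that is fastest on every dataset scores exactly 0.000, which matches KMCM-SVR's reported optimum.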

E. DISCUSSION
As a summary of all results of Experiments I-IV, Table 9 presents the statistical data: the numbers of datasets on which each algorithm won/tied/lost with respect to forecast accuracy, the associated P values, etc. It discloses that KMCM-CSMA-SVR is superior to the others on 6 datasets in terms of MSE. To fully evaluate the algorithms, Eq. (18) was applied to compute a comprehensive performance score (Rank) covering forecast accuracy, stability and computation complexity. ARank is the normalized win rate on MSE, and StdRank is the normalized stability ratio on the Std data. The lower the Rank, the better the overall performance. The outcomes are shown in Table 9.
From the table, KMCM helped SSA-SVR, DA-SVR and ALO-SVR improve their total scores. However, it reduced the scores of HGSO-SVR, PSO-SVR and FA-SVR; KMCM is thus not beneficial to all algorithms. KMCM-CSMA-SVR obtained the optimum of 0.024, indicating that the proposed algorithm offers the best comprehensive performance. Due to CSMA, KMCM-CSMA-SVR has the optimal prediction accuracy on 6 datasets and the optimal stability on datasets D2, D3 and D5; its low computation complexity on almost all datasets results from KMCM. It is proven that KMCM-CSMA-SVR exhibits high effectiveness and efficiency and is superior in handling high-dimensional data. Considering prediction accuracy, convergence stability and computation complexity, KMCM-CSMA-SVR shows very strong competitiveness. Thus, KMCM and CSMA are worthy of application in the field of SVR parameter optimization.

V. CONCLUSION AND FUTURE DIRECTIONS
This study proposes a new hybrid algorithm for optimizing SVR parameters using KMCM and CSMA. KMCM is applied to speed up the whole algorithm; CSMA is utilized to improve prediction accuracy. The proposed KMCM-CSMA-SVR achieves the adaptive optimization of SVR parameters. Experiments on key parameter settings, forecast accuracy and computation complexity were performed against HGSO-SVR, SSA-SVR, DA-SVR, ALO-SVR, PSO-SVR, FA-SVR and KMCM-SVR. A comparison experiment with KMCM-HGSO-SVR, KMCM-SSA-SVR, KMCM-DA-SVR, KMCM-ALO-SVR, KMCM-PSO-SVR and KMCM-FA-SVR was also implemented. The results on the benchmark datasets revealed that KMCM-CSMA-SVR offers the best prediction performance among the compared well-known algorithms. It can also effectively reduce the running cost of this kind of algorithm, removing the reliance on high-performance computers. This work validated the tremendous potential of KMCM-CSMA-SVR in the field of prediction.
For future study, more big data experiments and real-world engineering optimization problems should be added to test the proposed approach. Moreover, key parameters of KMCM and CSMA exert important effects on clustering and prediction performance. Their adaptive adjustment is a field worth studying. The structure of the whole approach is complicated and can be appropriately simplified in the future.