Predicting Entrepreneurial Intention of Students: An Extreme Learning Machine With Gaussian Barebone Harris Hawks Optimizer

This study proposes an effective intelligent model for predicting entrepreneurial intention, which can provide a reasonable reference for formulating talent training programs and guiding the entrepreneurial intention of students. The prediction model is mainly based on the kernel extreme learning machine (KELM) optimized by an improved Harris hawks optimizer (HHO). To obtain better parameters and feature subsets, the Gaussian barebone (GB) strategy is introduced to improve the HHO algorithm, strengthening its ability to tune the parameters of KELM and to identify compact feature subsets. An optimal KELM model (GBHHO-KELM) is then established from the obtained optimal parameters and feature subsets to predict the entrepreneurial intention of students. In the experiments, GBHHO is compared with nine other well-known methods on the 30 CEC 2014 benchmark problems. The experimental findings suggest that the proposed GBHHO method is significantly superior to the existing methods on most problems. GBHHO-KELM is also compared with other machine learning methods in the prediction of entrepreneurial intention. The experimental results indicate that the proposed GBHHO-KELM achieves better classification performance and higher stability according to four metrics. We therefore conclude that the GBHHO-KELM model is expected to be an effective tool for the prediction of entrepreneurial intention.


I. INTRODUCTION
With the accelerating development and continued large-scale enrollment expansion of higher education in China, the number of graduates keeps growing and the employment pressure on college students is rising. To alleviate job-hunting stress and stimulate entrepreneurial potential, the government has enacted a series of policies supporting startups, which has encouraged entrepreneurship and job creation among the general public, including college students, whose enthusiasm for entrepreneurship has been aroused. However, statistics indicate that the success rate of college students running a business is less than 5%. Indeed, college students are often undecided between employment and self-employment after graduation, and multiple factors influence their career choice. It is therefore worth investigating in depth how colleges and universities can use these factors to cultivate students' entrepreneurial ability more scientifically, in order to help students orient their careers more specifically and rationally. Moreover, the highly qualified personnel trained in higher education are representative subjects whose records form large datasets of characteristics. After further mining and processing of these data, an intelligent predictive model for training entrepreneurial talents can be built to clarify the interrelationships among the data and to identify the pivotal factors in developing entrepreneurial talents, helping fresh graduates recognize their own characteristics and make more conscious and dedicated career decisions.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhan Bu.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, the use of data mining technology in higher education has received increasing attention, and intelligent prediction applications for students' career planning guidance and employment selection are gradually increasing. Bharambe et al. [1] carried out a systematic analysis to assess students' employability through data mining technology, an approach explored further by Chen and Ma [2] for assessing graduates' comprehensive ability. Guofen and Fukang [3] combined these two aspects into a comprehensive employability evaluation that takes each student's road factor score as input in order to identify appropriate careers scientifically. Domoto et al. [4] used variance analysis to examine how a series of influencing factors in each growth phase, from senior high school to the company, affect a student's chance of obtaining a job at a highly profitable firm. Eid et al. [5] collected questionnaire data from senior students intending to become entrepreneurs and used a structural equation model to reveal the relevance and robustness of their intention to run a business. To classify correlations in employment data more quickly, Li et al. [6] and Liu et al. [7] introduced the DT model and Information Gain with Weight, respectively, to trace what determined the increase in students' employment ratio, and obtained further useful results by using the genetic algorithm. Mishra et al. [8] validated algorithms for predicting employability using various data mining classification techniques, one of which was adopted by Nie et al. [9] on the basis of students' on-campus and off-campus behavior. In another practical application, Rahman et al. [10] employed machine learning algorithms to make forecasts and discover attributes from students' datasets so that graduates can be guided to meet recruitment requirements.
Moreover, Tian [11] presented a decision tree-based model to estimate the future employment rate among students and constructed a classification equation for students' employment, which provided an effective foundation for adjusting employment policies and measures for college students. Further, Tu et al. [12] put forward an adaptive support vector machine (SVM) framework, named RF-CSCA-SVM, for predicting college students' intention toward self-employment. Although these approaches achieve excellent predictive performance and promote more efficient career planning, they cannot yet be applied thoroughly to prediction for career consultation and guidance.
In this study, a Gaussian barebone (GB) enhanced Harris hawks optimizer (HHO) is proposed to train an optimal kernel extreme learning machine (KELM) model, and the resultant GBHHO-KELM is applied for the first time to predict students' entrepreneurial intention. The model is constructed based on information about students' studies, school performance, and family economic background. In the developed method (GBHHO-KELM), the GB mechanism is embedded into HHO to further balance its exploration and exploitation capabilities. The improved HHO is employed to train a robust and optimal KELM model to predict the entrepreneurial propensity of pre-graduate students. In the experiments, GBHHO is compared with nine other renowned methods on the 30 CEC 2014 benchmark problems. The proposed GBHHO method is shown to be significantly superior to the existing methods on most problems. Meanwhile, GBHHO-KELM is compared with other machine learning methods in the prediction of entrepreneurial intention. The experimental results suggest that GBHHO-KELM achieves better classification performance and higher stability on the four metrics.
The main contributions of this study are as follows: a) A Gaussian barebone enhanced HHO (GBHHO) is proposed for function optimization and KELM training. b) The proposed GBHHO effectively tackles the parameter tuning of KELM. c) The established GBHHO-KELM model is applied to predict entrepreneurial intention.
The structure of the paper is as follows. Section 2 gives a brief description of the KELM and HHO. The proposed GBHHO is detailed in Section 3. Section 4 presents the GBHHO-KELM model. The data and experimental settings are described in Section 5. Section 6 analyzes the simulation results of GBHHO on benchmark functions and of GBHHO-KELM on the real-life dataset. The discussions are presented in Section 7. Section 8 delivers conclusions and recommendations.

A. KELM
Like many other machine learning methods [13]-[17], the extreme learning machine (ELM) [18] is a relatively accessible method for training single-hidden-layer feedforward neural networks. The traditional feedforward neural network has the apparent weaknesses of slow learning speed and a tendency to become trapped in local optima. In addition, the network depends heavily on its parameters and hyperparameters, and different settings have a significant influence on the final results. The ELM algorithm randomly generates the connection weights between the input layer and the hidden layer and the thresholds of the hidden-layer neurons. These do not need to be adjusted during training; only the number of hidden-layer neurons needs to be set, and then the unique optimal solution can be obtained. Compared with traditional training techniques, the ELM method can therefore significantly improve both learning speed and performance. Since its introduction, it has found applications in many fields [19]-[21].
KELM [22] is constructed based on the above ELM by adding a radial basis function (RBF) kernel function. The RBF kernel is selected because it can map the samples to a higher-dimensional feature space and can handle samples for which the relationship between class labels and features is non-linear. The RBF kernel also has the advantage of having few parameters. When the RBF kernel is used, only two parameters need to be considered, the penalty factor C and the kernel parameter γ, but their selection has a great impact on the final result. There is no specific prior knowledge for selecting these parameters, so some type of model selection (parameter search) must be performed.
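To make the roles of C and γ concrete, the following is a minimal pure-Python sketch of a binary KELM classifier. It assumes the commonly used closed-form solution, in which the output weights solve (Ω + I/C) α = t with Ω the RBF kernel matrix of the training set, and the kernel convention K(a, b) = exp(-γ‖a - b‖²); other sources place γ in the denominator. The function names (`rbf`, `solve`, `kelm_fit`, `kelm_predict`) are illustrative and not taken from the paper.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel; the exp(-gamma * ||a - b||^2) convention is an assumption."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kelm_fit(X, t, C, gamma):
    """Return output weights alpha solving (Omega + I/C) alpha = t."""
    n = len(X)
    A = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, t)

def kelm_predict(X_train, alpha, gamma, x):
    """Classify x as +1 or -1 from the kernel expansion over the training set."""
    score = sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X_train))
    return 1 if score >= 0 else -1
```

A larger C weakens the regularization term I/C, and a larger γ narrows the kernel, which is why both values are left to a search procedure rather than fixed in advance.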

B. HHO
Heidari et al. proposed the standard HHO in a recently published paper [23]. HHO is a population-based optimization algorithm. Its primary source of inspiration is the cooperative behavior and hunting style of Harris hawks in nature. This form of hunting is often called a surprise attack. In this smart strategy, several hawks chase the prey from multiple directions to make a surprise attack. Facing different hunting environments and different prey escape modes, Harris hawks can dynamically switch to the best hunting mode.
In HHO, the position of the prey represents the current optimal solution, while the other solutions in the population represent the positions of the Harris hawks. Harris hawks randomly search hunting areas or stay in a few locations to find prey through two strategies. The first strategy describes the Harris hawk's movement according to the positions of other members of the population and the prey. The other strategy describes a Harris hawk perching on a random tree. The selection between these two strategies is controlled by a random number q. In the update function obtained by combining these two strategies, X_{t+1}, X_t, X_m, and X_p represent the updated position in the next iteration, the current position of the Harris hawk, the average position of the Harris hawks, and the position of the prey, respectively; r_1, r_2, r_3, r_4, and q are random numbers in (0, 1); and lb and ub are the lower and upper bounds of each dimension. The heavy use of random numbers in the update function is beneficial for exploring different search areas while also helping to preserve population diversity, and it successfully simulates the group hunting behavior of Harris hawks. In the function that gives the average position of the population, X_i represents the position of the i-th Harris hawk in the current iteration, and N represents the number of Harris hawks in the population.
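For reference, the exploration update and the average position in the standard HHO formulation of Heidari et al. [23] can be written in the notation of this section as below. This is a reconstruction rather than a verbatim copy of this paper's equations; the symbol X_r for a randomly chosen hawk is an assumed name that does not appear in the text above.

```latex
X_{t+1} =
\begin{cases}
X_{r} - r_1 \left| X_{r} - 2 r_2 X_t \right|, & q \ge 0.5 \\[4pt]
\left( X_p - X_m \right) - r_3 \left( lb + r_4 \left( ub - lb \right) \right), & q < 0.5
\end{cases}
\qquad
X_m = \frac{1}{N} \sum_{i=1}^{N} X_i
```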
In the hunting stage, HHO selects different hunting strategies according to the prey's escaping energy. As the number of iterations increases, the prey's escaping energy decreases. In the energy update function, E and T represent the escaping energy of the prey and the maximum number of iterations, respectively, and E_0 represents the initial escaping energy of the prey, which is randomly selected in (0, 1) and updated at each iteration.
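For reference, the escaping-energy update in the standard HHO formulation [23] is commonly written as below, with t the current iteration. Note that the original HHO paper draws E_0 from (-1, 1), whereas the text above states (0, 1); the expression itself is unaffected by that choice.

```latex
E = 2 E_0 \left( 1 - \frac{t}{T} \right)
```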
During the exploitation phase, HHO sets up four different strategies to simulate the attack, based on the different escape modes of the prey and the different hunting strategies of the Harris hawks. HHO adopts a variable r to judge whether the prey escapes before the Harris hawks arrive: r < 0.5 means that it escapes successfully, and r ≥ 0.5 means that the prey is caught. The Harris hawks use the variable E to determine whether to perform a soft besiege or a hard besiege. When |E| ≥ 0.5, the Harris hawks adopt the soft besiege; when |E| < 0.5, they adopt the hard besiege.
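The selection logic described above can be sketched as a small function. This is a minimal illustration that assumes the mapping between (r, |E|) and the four modes follows the subsection headings below; the function name `choose_strategy` is hypothetical.

```python
def choose_strategy(E, r):
    """Pick one of the four besiege modes from escaping energy E and chance r."""
    soft = abs(E) >= 0.5   # prey still has enough energy: soft besiege family
    escaped = r < 0.5      # prey escapes before the hawks arrive
    if not escaped:
        return "soft besiege" if soft else "hard besiege"
    if soft:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"
```

For example, `choose_strategy(0.8, 0.7)` selects the plain soft besiege, while `choose_strategy(0.2, 0.1)` selects the hard besiege with progressive rapid dives.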

1) MODE 1: SOFT BESIEGE
When the Harris hawks are in the soft besiege mode, the prey still has considerable energy, so the hawks softly surround the prey before launching the attack. In the description of the soft besiege, ΔX_t denotes the difference between the position of the prey and the position of the Harris hawk, J represents the random escape strength of the prey, and r_5 is a number chosen at random in (0, 1).
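For reference, the soft-besiege update in the standard HHO formulation [23] can be written in the notation of this section as below; treat it as a reconstruction rather than a verbatim copy of this paper's equations.

```latex
X_{t+1} = \Delta X_t - E \left| J X_p - X_t \right|, \qquad
\Delta X_t = X_p - X_t, \qquad
J = 2 \left( 1 - r_5 \right)
```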

2) MODE 2: HARD BESIEGE
When the Harris hawks are in the hard besiege mode, the prey lacks the energy needed to escape capture. The hawks hardly circle the prey and make a final foray to catch it. The description of the hard besiege is as follows:
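For reference, the hard-besiege update in the standard HHO formulation [23] can be written as below, with ΔX_t = X_p - X_t as in the soft besiege; treat it as a reconstruction rather than a verbatim copy of this paper's equation.

```latex
X_{t+1} = X_p - E \left| \Delta X_t \right|
```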

3) MODE 3: SOFT BESIEGE WITH PROGRESSIVE RAPID DIVES
When the Harris hawks are in the soft besiege with progressive rapid dives mode, the prey is hypnotized. HHO uses a Levy flight to simulate the prey's escape route, and the Harris hawks continue to swoop rapidly along that route. Levy flight is the best simulation of this process, and many animals in nature share Levy-flight movement patterns. In this situation, the Harris hawks first evaluate a candidate next move, Y, and compare the result of the hunt with the previous dive. If the current flight path does not help catch the prey, the hawks take bolder irregular flights and sudden dives, which HHO simulates with a Levy flight to obtain a second candidate, Z. Here Dim, S, and Levy denote the dimension, a random vector, and the Levy flight function, respectively. In the Levy flight function, u and v are random numbers in (0, 1), and β is a given constant. The final update of the soft besiege with progressive rapid dives selects whichever of Y and Z yields the better fitness value. Every individual in the population implements this strategy.
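For reference, the candidates Y and Z, the Levy flight function, and the final selection in the standard HHO formulation [23] can be written as below. This is a reconstruction in the notation of this section; F denotes the fitness function for a minimization problem, and β = 1.5 is the default used in the original HHO paper, both stated here as assumptions about this paper's setup.

```latex
Y = X_p - E \left| J X_p - X_t \right|, \qquad
Z = Y + S \times \mathrm{Levy}(Dim)

\mathrm{Levy}(x) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}}, \qquad
\sigma = \left( \frac{\Gamma(1+\beta) \, \sin\!\left( \frac{\pi \beta}{2} \right)}
{\Gamma\!\left( \frac{1+\beta}{2} \right) \beta \, 2^{\frac{\beta - 1}{2}}} \right)^{1/\beta}

X_{t+1} =
\begin{cases}
Y, & F(Y) < F(X_t) \\
Z, & F(Z) < F(X_t)
\end{cases}
```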

4) MODE 4: HARD BESIEGE WITH PROGRESSIVE RAPID DIVES
When the Harris hawks are in the hard besiege with progressive rapid dives mode, the escape attempt of the prey will fail. The behavior of the prey in this case is similar to its behavior in the soft besiege case, but the Harris hawks directly reduce the distance between themselves and the prey. The final update of the hard besiege with progressive rapid dives can be described as follows:
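For reference, the standard HHO formulation [23] uses the same selection between Y and Z as in Mode 3, but computes Y from the average position X_m of the hawks instead of the current position. The reconstruction below follows the notation of this section, with F the fitness function for a minimization problem.

```latex
Y = X_p - E \left| J X_p - X_m \right|, \qquad
Z = Y + S \times \mathrm{Levy}(Dim), \qquad
X_{t+1} =
\begin{cases}
Y, & F(Y) < F(X_t) \\
Z, & F(Z) < F(X_t)
\end{cases}
```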
Recently, HHO has found applications in several types of problems [54], including micro-channel heat sink design [55], optimization of related components of harmonics-polluted distribution systems [56], satellite image de-noising [57], manufacturing optimization problems [58], structural design optimization of vehicle components [59], drug design and discovery [60], and soil compression coefficient prediction [61]. Many real-life cases are still not modeled as optimization problems, although some aspects of those cases can be solved using new optimizers such as HHO [62]-[65]. Bao et al. [66] described a hybrid algorithm combining HHO and differential evolution (DE) and used it for multi-threshold segmentation of color images. The proposed algorithm divided the population into two parts: one part ran HHO on the individuals, and the other part ran the DE algorithm. Jia et al. [67] proposed an improved dynamic HHO for satellite image segmentation, adding a dynamic parameter control strategy and a mutation strategy, and evaluated the method on a series of different satellite images. Golilarz et al. [57] put forth a new binary version of HHO and improved it for feature selection; the authors tested the improved binary HHO on 22 datasets and compared it with binary versions of other meta-heuristic algorithms. Chen et al. [68] proposed an improved HHO with enhanced diversity and used it in the photovoltaic cell optimization field. In the proposed algorithm, they applied a chaos drift strategy around the optimal individual and added an opposition-based exploratory mechanism. The method proposed in that study was tested on three real-world manufacturing datasets and was compared to other well-known methods. Ewees and Elaziz [69] proposed a novel and improved algorithm that introduced HHO and a chaos mechanism into the Multi-Verse Optimizer. In this improved algorithm, HHO was used in the local search process.
This paper introduces the Gaussian bare-bone strategy into the classic HHO to further improve its convergence speed and solution quality. The goal of the GB strategy is to allow individuals in the population to choose the fittest directions, which helps them avoid falling prematurely into a local minimum. In HHO, the bare-bone strategy controls its decision selection through the probability coefficient CT. If the probability is less than CT, the Gaussian distribution is used. The GB strategy offers an effective way to raise the convergence rate and seek an optimal solution; it is characterized by quick calculation and high precision, which meets the need to find the best solution in the HHO search process.
In the Gaussian bare-bone update, N(µ, σ) is a Gaussian random function and rand() is a random number in [0, 1]. The indices i1 and i2 are distinct random integers from 1 to n, both different from the index i. It is worth noting that GBHHO updates individuals with the GB strategy to enrich the diversity of the population, which strengthens the exploration of the optimizer. The pseudo-code for the whole process is given in Algorithm 1.
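Because the GB update equation itself is not reproduced in this text, the sketch below only illustrates the general shape such a rule takes in bare-bones variants of evolutionary algorithms. It rests on three loudly stated assumptions that are not confirmed by this paper: the Gaussian mean is the midpoint of the best solution and the current individual, the standard deviation is their absolute difference, and the alternative branch is a differential step X_i + rand() × (X_i1 - X_i2) applied per dimension. The function name `gb_candidate` is hypothetical.

```python
import random

def gb_candidate(pop, i, best, ct, rng=random):
    """Build one GB-style candidate for individual i (needs at least 3 agents).

    ASSUMED rule, per dimension j:
      rand() < ct : sample N(mu, sigma), mu = (best_j + x_j) / 2,
                    sigma = |best_j - x_j|
      otherwise   : x_j + rand() * (pop[i1]_j - pop[i2]_j)
    """
    n = len(pop)
    # i1 and i2 are distinct random indices, both different from i
    i1, i2 = rng.sample([k for k in range(n) if k != i], 2)
    x = pop[i]
    cand = []
    for j in range(len(best)):
        if rng.random() < ct:
            mu = (best[j] + x[j]) / 2.0
            sigma = abs(best[j] - x[j])
            cand.append(rng.gauss(mu, sigma))
        else:
            cand.append(x[j] + rng.random() * (pop[i1][j] - pop[i2][j]))
    return cand
```

Under these assumptions the Gaussian branch samples around the region between the current individual and the best solution and shrinks as the two converge, while the differential branch keeps injecting diversity from other members of the population.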
The time complexity of GBHHO is mainly related to the population size (n), the dimension (d), the number of iterations (t), and the GB strategy, and it can be calculated in three parts. For a group of n agents, the complexity of the initialization is O(n). The complexity of the individual updating mechanism is O(t × n) + O(t × n × d). Furthermore, the computational cost of the GB mechanism is O(t × n × d). Therefore, the overall time complexity of GBHHO is O(n) + O(t × n) + O(t × n × d) + O(t × n × d), which is dominated by the O(t × n × d) term.

IV. PROPOSED GBHHO-KELM METHOD
In this section, we give details of the mechanisms of the designed HHO-based framework. The flowchart of GBHHO-KELM is shown in Figure 1. The core of this model is the KELM technique, in which the input data are mapped to a hidden-layer feature space using the RBF kernel. The search space comprises the two parameters to be tuned and n features. The unknown parameters (C and γ) are optimized by the proposed GBHHO, which explores and exploits the search space to find the best parameters and feature subset. To deal with the binary part of the space (a feature is either selected or not), we converted GBHHO into a binary optimizer using the sigmoid function. Let Z be the result of the sigmoid function: if Z > 0.5, the feature is chosen; otherwise, it is rejected. The objective is to maximize the accuracy of the model. KELM then makes the decision using the obtained parameters and features. The steps of the whole GBHHO-KELM process are listed as follows:

Algorithm 1 Pseudo Code of GBHHO
Begin
  Initialize solutions (X_i, i = 1, 2, ..., N) and maximum number of iterations T;
  Initialize swarm at first generation;
  Assess the initial agents by calculating the function values for each member of the swarm;
  Update the best solution found so far;
  While (ending criterion is not reached (t < T))
    Evaluate agents by the objective function;
    Pick the best individual as the rabbit (X_p);
    For each X_i
      Update the initial escaping energy E_0, escaping energy
      Else If
        According to Eq. (17), a new candidate solution is obtained from the best position saved so far;
      End if
      The current individual is replaced if a better solution is found;
  End While
  Return the best solution obtained so far (X_p);
  Output: The position of prey and fitness value.
End

Step 1: Insert the datasets of students.
Step 4: Initialize the swarm size, the maximum number of iterations, and the related parameters. The first two dimensions of each agent are set as C and γ, and the features occupy n further dimensions, each with value 0 or 1.
Step 5: Set the limits of the feature space (boundaries) and the accuracy function (objective).
Step 6: Update the optimal value of the accuracy function (AF) after comparing the current AF value with the optimal AF.
Step 7: Update the location vectors of the agents and keep the best location.
Step 8: If the termination criterion is not reached, go to Step 5; otherwise, output the optimal solutions (values of C and γ and the feature subset).
Step 9: Once the best values of C and γ and the best feature subset are obtained, the KELM is retrained on the training set.
Step 10: Apply the best GBHHO-KELM model to the test set to predict the unknown samples.
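The agent encoding used in the steps above can be sketched as follows. This is a minimal illustration assuming that the first two dimensions of an agent hold C and γ directly and that each remaining dimension is passed through the sigmoid function and thresholded at 0.5, as described in this section; the helper names `decode_agent` and `select_columns` are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function, guarded against overflow for very negative inputs."""
    if x < -700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))

def decode_agent(agent):
    """Split an agent into (C, gamma, feature mask); Z > 0.5 keeps a feature."""
    C, gamma = agent[0], agent[1]
    mask = [1 if sigmoid(v) > 0.5 else 0 for v in agent[2:]]
    return C, gamma, mask

def select_columns(row, mask):
    """Keep only the feature values whose mask entry is 1."""
    return [v for v, keep in zip(row, mask) if keep]
```

For instance, `decode_agent([4.0, 0.5, 2.0, -1.0, 0.0])` yields the mask `[1, 0, 0]`, because a raw value of 0 maps to exactly 0.5 and is therefore rejected. The fitness of an agent would then be the accuracy of a KELM trained with the decoded C and γ on the selected columns.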

V. EXPERIMENTAL DESIGNS
A. DATA COLLECTION
The data in this study mainly come from graduates of Wenzhou Vocational College of Science and Technology from 2011 to 2018. Among these graduates, 842 students were selected as research subjects, of which 421 were self-employed and 421 were fully employed; they are regarded as the entrepreneurial samples and the employment samples, respectively, with each group randomly selected from the five-year graduate experiment samples. The analysis uses 10 characteristics: gender, political affiliation (PA), major, place of students' source (PSS), family financial situation (FFS), practical experience in innovation and entrepreneurship on campus (PEIEC), training course of innovation and entrepreneurship (TCIE), grade point average (GPA), scholarship awards (SA), and proactive personality (PP). It aims to investigate the potential importance and the internal correlation of these ten attributes so as to build a critical data model for decision-making. Table 1 shows a detailed description of the 10 characteristic attributes.

B. EXPERIMENTAL SETUP
This research was conducted on a Windows Server 2008 R2 system with an Intel(R) Core(TM) i7-6700 CPU (3.40 GHz) and 16 GB of RAM, using MATLAB R2018. The data are normalized by scaling into [-1, 1] before classification. The tests are based on k-fold cross-validation (CV), where k is set to 10 [32], [51]. The data are therefore divided into ten parts, of which nine are used for training and the remaining one for testing. With fair numerical tests and mathematical proofs of new models, we can investigate the suitability of any mathematical method [70]-[73]. To implement the models, we used standard codes of KELM, SVM, and random forest (RF) shared on related websites. The parameters were set by trial and error. The search range of the penalty factor and kernel width in SVM and KELM is set to [-10, 10]. For RF, the number of trees is set to 500 and the number of variables to 3.
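The preprocessing and validation protocol described above can be sketched in a few lines. This is an illustrative Python version, not the MATLAB code used in the study; it assumes per-feature min-max scaling for the [-1, 1] normalization and a shuffled split for the 10-fold CV, and the names `scale_column` and `kfold_indices` are hypothetical.

```python
import random

def scale_column(values):
    """Min-max scale one feature column into [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: map everything to 0
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

With the 842 samples of this study, `kfold_indices(842)` produces ten splits in which every sample is used for testing exactly once and the test folds contain 84 or 85 samples each.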

VI. EXPERIMENTAL RESULT
A. BENCHMARK FUNCTION VALIDATION
In order to test the performance of the proposed GBHHO, some representative algorithms were selected as comparison algorithms. These algorithms include the classic original algorithms MFO, FA [74], GWO [46], HHO, WOA and some improved algorithms ACWOA [75], OBLGWO [47], OBSCA [76], SCADE [77]. The detailed parameter values are reported in Table 2. The CEC2014 [47] benchmark functions were selected as test functions for the experiment. We kept the required conditions for a fair comparison. In the experiment, the number of particles is set to 30, the dimension is set to 30, and the maximum number of evaluations is set to 300,000. Each algorithm is independently executed 30 times.
Tables 3-4 tabulate the experimental results of GBHHO and the other algorithms on the CEC2014 benchmark functions. The tables show the average fitness values and standard deviations over each algorithm's 30 independent runs on each benchmark function, with the best fitness value for each function shown in bold. The tables show that the proposed GBHHO obtains the best fitness value on most of the benchmark functions. The Friedman test [78] is used to evaluate the performance of the algorithms. The average ranking value (ARV) of GBHHO obtained by the Friedman test is 1.5, which is better than that of all the other algorithms compared. Table 5 lists the results of the Wilcoxon signed-rank test [79]. The performance of GBHHO is considered significantly better than that of another algorithm only when the p-value obtained by the test is less than 0.05. In Table 5, p-values higher than 0.05 are shown in bold. The symbols "+/=/-" indicate the number of the 30 benchmark functions on which GBHHO is significantly better than, equal to, and worse than each of the other algorithms. The advantage of GBHHO arises because the Gaussian bare-bones method introduced into HHO improves the algorithm's ability to search for the optimal solution: it enhances both the local search ability of the algorithm and its ability to jump out of local optima. Figure 2 shows the convergence trends of the algorithms on 9 functions. On these 9 functions, the final values obtained by GBHHO are better than those of the other algorithms compared. On functions F2 and F5, the fitness value found by GBHHO in the early iterations is not the best, but the value found in the later iterations is the best. From the figures, the convergence trend of GBHHO in the early iterations is generally better than that of the other algorithms.
Therefore, GBHHO can obtain the optimal fitness value with fewer evaluations. The convergence curves also show that GBHHO converges far faster than the original HHO in the early iterations and finds the optimal fitness value in fewer iterations. This indicates that the introduction of the GB strategy effectively strengthens the ability of HHO to search for the optimal solution.
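The average ranking value reported with the Friedman test can be illustrated with a short sketch. It assumes that lower mean fitness is better (minimization), that algorithms are ranked separately on each benchmark function, and that tied values share the mean of their ranks; the function name `average_ranks` and the input layout are hypothetical, and the sketch does not compute the Friedman test statistic itself.

```python
def average_ranks(rows):
    """Average rank of each algorithm over all benchmark functions.

    rows: one dict per benchmark function, mapping algorithm name to its
    mean fitness on that function (lower is better).
    """
    names = list(rows[0])
    totals = {name: 0.0 for name in names}
    for row in rows:
        ordered = sorted(names, key=lambda name: row[name])
        pos = 0
        while pos < len(ordered):
            end = pos
            # extend the group while the next algorithm is tied with this one
            while end + 1 < len(ordered) and row[ordered[end + 1]] == row[ordered[pos]]:
                end += 1
            rank = (pos + end) / 2 + 1      # mean of 1-based ranks in the tie group
            for name in ordered[pos:end + 1]:
                totals[name] += rank
            pos = end + 1
    return {name: totals[name] / len(rows) for name in names}
```

An algorithm that ranks first on every function has an average rank of 1.0, so a value of 1.5 over 30 functions means the algorithm is at or near the top on almost all of them.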

B. PREDICTION RESULTS OF ENTREPRENEURIAL INTENTION
In this experiment, we evaluated the effectiveness of GBHHO-KELM with feature selection (FS) against its peers. The detailed results are shown in Table 6. GBHHO-KELM with FS obtains an ACC of 90.02%, an MCC of 0.8007, a sensitivity of 92.08%, and a specificity of 88.06%, with standard deviations (STD) of 0.0259, 0.0544, 0.03, and 0.0467, respectively. In addition, the experiments show that the GBHHO approach can automatically obtain the optimal parameters and feature subset, which indicates that the GB mechanism gives the HHO algorithm stronger search ability and accuracy.
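The four metrics reported here follow the standard confusion-matrix definitions, sketched below. Which class counts as positive is an assumption (for example, the entrepreneurial samples), and the function name `binary_metrics` is hypothetical.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """ACC, sensitivity, specificity, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        # MCC is set to 0 when a row or column of the matrix is empty
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

In a 10-fold CV setting, these values are typically computed on each test fold and then averaged, which is also where a per-metric standard deviation comes from.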
To verify the effectiveness of this method, we conducted a comparative study with four other machine learning models: GBHHO-KELM without FS, HHO-KELM, RF, and SVM. A comparison of the five methods is shown in Figure 3. The results show that GBHHO-KELM with FS is better than GBHHO-KELM without FS and HHO-KELM on all four evaluation indicators, and its STD is also smaller than theirs. This means that the introduction of the GB strategy gives GBHHO-KELM with FS better performance and stability. On ACC, GBHHO-KELM with FS achieves the best performance, 1.54% higher than the second-ranked GBHHO-KELM without FS, followed by SVM and RF; HHO-KELM has the worst result, 3.69% lower than GBHHO-KELM with FS. The STD of HHO-KELM is the largest, at 0.0523, which indicates that HHO-KELM is unstable on this problem, whereas the improved GBHHO-KELM model performs much better. On MCC, GBHHO-KELM with FS again achieves the best result, followed by GBHHO-KELM without FS, which is 3.09% lower, and then by SVM and RF; HHO-KELM has the worst performance, 7.46% lower than GBHHO-KELM with FS, and the largest STD, at 0.1039. On sensitivity, GBHHO-KELM with FS performs best, followed by GBHHO-KELM without FS, which differs by only 1.7%, and then by RF and SVM. HHO-KELM has the worst result, but the largest STD belongs to RF, reaching 0.0675. On specificity, GBHHO-KELM with FS ranks first, followed by GBHHO-KELM without FS, SVM, HHO-KELM, and RF. HHO-KELM and RF differ by only 0.02%, and GBHHO-KELM without FS is 1.31% worse than GBHHO-KELM with FS; RF is the worst at only 84.92%, and the STD of SVM is the largest, reaching 0.0657.
To describe the convergence of the proposed GBHHO-KELM with FS, we also recorded how the training accuracy of the models involved changes with the number of iterations. Figure 4 shows that after several iterations, GBHHO-KELM with FS can quickly and continuously jump out of local optima to reach the best accuracy, indicating that the method has strong local and global search capabilities. The main reason is that the embedded GB mechanism enhances both the local and the global search ability, so that the hybrid algorithm obtains a better balance between global search (exploration) and local search (exploitation). The curves in Figure 4 also show that GBHHO-KELM without FS requires more iterations to converge and that its accuracy is not as high as that of GBHHO-KELM with FS. The accuracy of HHO-KELM is the lowest of all the algorithms and much lower than that of GBHHO-KELM with FS; it does not increase significantly as the number of iterations increases, and the model easily falls into a local optimum.
In this process, the proposed GBHHO not only finds the best parameter setting of KELM but also selects the best set of features. We utilized the 10-fold CV technique. Figure 5 shows the frequency of the main features selected by GBHHO-KELM through the 10-fold CV procedure.
As shown, major (F3), practical experience in innovation and entrepreneurship on campus (F6), proactive personality (F10), training course of innovation and entrepreneurship (F7), family financial situation (F5), and place of students' source (F4) are the six features with the highest frequency, appearing 10, 10, 10, 9, 8, and 7 times, respectively. Therefore, we can conclude that these features may play a significant role in predicting students' entrepreneurial intention.
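The selection frequencies discussed above can be obtained by counting, for each feature, the number of CV folds in which the optimizer kept it. The sketch below assumes that each fold produces one 0/1 mask over the features; the function name `selection_frequency` and the toy masks in the usage note are illustrative and are not the study's data.

```python
from collections import Counter

def selection_frequency(fold_masks, names):
    """Count how often each feature is selected across folds, most frequent first."""
    counts = Counter()
    for mask in fold_masks:
        counts.update(name for name, keep in zip(names, mask) if keep)
    # sorted() is stable, so tied features keep their original order
    return [(name, counts[name]) for name in sorted(names, key=lambda name: -counts[name])]
```

For example, with three toy folds `[[1, 0, 1], [0, 0, 1], [1, 0, 1]]` over features F1 to F3, the function returns `[("F3", 3), ("F1", 2), ("F2", 0)]`. With ten folds, a count of 10 means the feature was selected in every fold.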

VII. DISCUSSIONS
The experimental results of this research show that the cultivation of entrepreneurial talents in colleges and universities is affected by many interrelated factors, shows corresponding patterns in specific aspects, and is positively correlated with most of the factors involved. Factors such as major (F3), practical experience in innovation and entrepreneurship on campus (F6), proactive personality (F10), training course of innovation and entrepreneurship (F7), family financial situation (F5) and place of students' source (F4) have a great influence on students' intention to start their own business. In particular, the major chosen by students has a significant and positive impact on their entrepreneurial behavior. The data show that students majoring in economic management think much more about entrepreneurship and have a stronger willingness and motivation to run their own businesses.
Entrepreneurial education factors, such as practical experience in innovation and entrepreneurship and training courses on innovation and entrepreneurship during school days, play a prominent role in promoting students' preference for starting a business. On campus, students who have participated in innovation and entrepreneurship practice clearly have stronger entrepreneurial motivation than those without such experience, and the former are far more likely to succeed in business. There is also a clear subconscious tendency in course selection: students with strong entrepreneurial motives place emphasis on elective courses related to innovation and entrepreneurship and to economic management. Additionally, it is found that the objects of entrepreneurial talent training exhibit ''endogenous growth'', meaning that students' entrepreneurial intention, cognition, and behavior are deeply influenced by proactive personality. Proactive personality, understood as the capacity to grow in a new environment, also has a significant regulating effect on students' entrepreneurial intention: the stronger the proactive personality, the stronger the motivation for entrepreneurial behavior and the stronger the entrepreneurial practice ability.
The family's financial situation and place of students' source have a remarkable impact on students' entrepreneurial preferences. Students who have not been identified as having family financial difficulties are more willing to start a business while in school. Owing to their living environment, students from urban areas, especially those from families with business backgrounds, have stronger motivation to start a business after graduation. This research model can be used as an auxiliary tool for entrepreneurial talent training, providing colleges with more reasonable guiding principles for the development of talent training programs and the guidance of students' entrepreneurial intention.
This paper has some limitations. First, the research samples are limited, and more data samples should be collected to improve the prediction performance of the system. Second, the sample attributes of the study are not complete, so the factors that affect students' entrepreneurial intention remain to be explored further. Third, although data spanning five consecutive years have been processed, the research samples come from only one college, so the applicability of the model needs to be further confirmed, and the reliability of its decision support for entrepreneurial talent training in colleges and universities needs to be further verified.

VIII. CONCLUSION AND FUTURE WORKS
This study established an effective hybrid GBHHO-KELM model to predict students' intention toward self-employment. The main innovation of this method is the introduction of the GB mechanism into the HHO algorithm to further balance its global and local search capabilities. Compared with several competitive optimization algorithms, the proposed GBHHO achieves smaller fitness and variance on 30 CEC2014 benchmark problems. At the same time, using GBHHO to optimize KELM yields better parameter combinations and feature subsets than other algorithms. Compared with other machine learning algorithms, it also has higher prediction accuracy and more stable performance in the prediction of students' entrepreneurial intention.
In the future, we will try to apply the GBHHO-KELM model to other prediction problems, such as disease diagnosis and financial risk prediction. In addition, the GBHHO algorithm will be extended to different application scenarios, such as solar cell optimization and engineering optimization problems.