Prediction Optimization of Cervical Hyperextension Injury: Kernel Extreme Learning Machines With Orthogonal Learning Butterfly Optimizer and Broyden-Fletcher-Goldfarb-Shanno Algorithms

In this research, X-ray and MRI images of patients suffering from cervical hyperextension injury are investigated, together with radiographic images collected from patients who suffered trauma but did not have cervical hyperextension injury. The core engine of the optimized prediction model is the kernel extreme learning machine (KELM), and the input data comprise 17 factors that may cause cervical hyperextension injury. As the optimization core, we utilized the butterfly optimization algorithm (BOA). To date, few improved variants of BOA have been reported; the original BOA converges slowly and quickly falls into locally optimal solutions. An enhanced BOA based on orthogonal learning, Lévy flight, and an exploitation engine, called LBOLBOA, is proposed in this paper to relieve these two shortcomings. Orthogonal learning is utilized to construct guidance vectors that steer agents toward the global optimum, aiming to increase the accuracy of the solutions. Lévy flight and the Broyden-Fletcher-Goldfarb-Shanno mechanism are utilized to strengthen the intensification propensities of BOA and to help it avoid stagnation. The proposed LBOLBOA is applied to continuous function optimization and machine learning problems, including the parameter optimization of KELM. We rigorously verified this variant on a comprehensive benchmark test suite and a real-world dataset on cervical hyperextension injury. The results indicate that LBOLBOA achieves improved performance on both function optimization and machine learning problems, and in particular on the prediction of cervical hyperextension injury.


I. INTRODUCTION
The kernel extreme learning machine (KELM) was developed from ELM by introducing a kernel function, and it has attracted much attention over the last ten years. (VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License; for more information, see https://creativecommons.org/licenses/by/4.0/.) Because its performance is comparable to that of SVM, and even better on many occasions, it has been widely used. The performance of KELM in practical applications is largely determined by its two parameters, so many studies focus on how to find their optimal values. The current research can be roughly divided into two categories: grid search [1], [2] and metaheuristic algorithms (MAs) [3]-[7]. Owing to the powerful search and optimization capabilities of MAs, many MAs have been proposed to tackle the parameter optimization of KELM.
Xu et al. [3] utilized a KELM optimized by a chaos-based and mutational moth-flame optimizer (MFO) to tackle financial forecasting cases; compared with a series of well-known classifiers, it offered a superior KELM model with excellent predictive performance. Wong et al. [8] used the KELM to boost the performance of a biodiesel engine; compared with a least-squares support vector machine (LS-SVM) model, the KELM realized comparable performance and obtained reliable predictions for optimization. Wang et al. [4] applied a chaotic fruit fly optimizer (FOA) to optimize KELM for the diagnosis of sepsis; across four performance indexes it achieved better results than other methods in a series of experiments, demonstrating that it can serve as a better diagnostic tool for clinical decision support. Tian et al. [9] used a quantum-based particle swarm optimization (PSO) to optimize KELM for activity recognition and obtained better performance than SVM and ELM. Wei et al. [10] used the bat algorithm to optimize KELM for predicting carbon price; analysis of a series of empirical results showed it was more robust than comparable models. Pani and Nayak [11] applied a chaotic gravitational search algorithm (GSA)-based KELM to analyze and forecast solar irradiance; it achieved higher prediction accuracy in less time and outperformed the basic KELM. Luo et al. [12] applied a hybrid GWOMFO technique, based on the grey wolf optimizer (GWO) and MFO, to improve the KELM for the analysis of somatization disorder; it not only realized higher prediction accuracy but also showed excellent robustness compared with other models.
Li et al. [13] utilized an improved binary GWO wrapped with KELM for disease diagnosis; it significantly outperformed the original GA and GWO. Kusumo et al. [14] used an ant colony optimization algorithm (ACO) to tune the parameters of KELM for optimizing the transesterification process for Ceiba pentandra oil, realizing the highest Ceiba pentandra methyl ester yield. Hu et al. [15] used cross-validated PSO to train an optimal KELM for fault diagnosis of a wind turbine gearbox; on a purpose-built compact gearbox test bench, it accurately identified four states of the gearbox. Bisoi et al. [16] used differential evolution (DE) to train an optimal KELM for predicting stock prices and movements; its superiority over the Naive-Bayes classifier, an artificial neural network, and SVM was demonstrated. Baliarsingh et al. [17] proposed a weighted-chaotic salp swarm algorithm (SSA) to simultaneously optimize the parameters of KELM and the features in genomic data; when applied to gene selection, the selected genes yielded higher classification accuracy. Yang et al. [18] used a self-adapting PSO to optimize KELM for forecasting electricity prices; compared with individual methods and other hybrid methods, it delivered more accurate predictions with better generality and practicability.
Yang et al. [19] used a PSO-optimized KELM for the recognition of holocellulose and lignin in wood; it presented the best recognition performance among the compared methods. Wang et al. [6] proposed a chaos-based MFO for generating an optimal KELM model for medical diagnosis; it provided significantly better classification performance and a smaller feature subset than alternative approaches. Wang et al. [20] enhanced the GWO for fine-tuning the optimal parameters of KELM to predict enterprise bankruptcy; compared with three competitive KELM methods, it was a better pre-warning tool for bankruptcy prediction. Luo et al. [21] proposed a multi-strategy enhanced grasshopper optimization algorithm (GOA) to improve the KELM for bankruptcy prediction; a series of experiments showed that it provides a more stable KELM model with excellent predictive performance. Liu et al. [22] proposed a PSO-optimized KELM for short-term load forecasting of micro-grids; when tested on four typical micro-grids, it obtained relatively smaller prediction errors. Liu et al. [23] used quantum genetic algorithms (GA) to optimize KELM for the reconstruction of 2-D profiles; experiments proved that it achieved faster speed, lower computational complexity, and better generalization performance than other methods. Chen et al. [24] developed a chaos-based and mutation-based bacterial foraging optimizer (BFO) for finding the optimal parameters of KELM for classification problems; compared with the original BFO, it was superior in both convergence speed and solution accuracy.
From the above analysis, we can see that MAs can find satisfactory optimal parameters for KELM and have achieved good results in many practical applications. In this work, we explored a new metaheuristic method, the butterfly optimization algorithm (BOA), for KELM training to forecast cervical hyperextension injury. BOA was proposed by Arora and Singh [25] in 2018 and is based on the feeding behavior of butterflies. Butterflies communicate cooperatively while searching for food: they can sense the smells of their peers or of food and, by analyzing these smells, locate them. BOA imitates this foraging mechanism to search for the global optimum in the search space. Since its introduction, BOA has been applied to many problems. Arora and Anand [26] proposed a discrete version of BOA for feature selection, and the experimental results proved that the proposed algorithm is highly effective. Aygäl et al. [27] used the BOA for maximum power point tracking of photovoltaic systems; the results demonstrated that BOA delivers higher accuracy and better speed than the other algorithms. In [28], BOA was applied to find the optimal gains of the controllers working with an interconnected microgrid system and the system participation factors; compared with other methods, its superiority was demonstrated under different real-world scenarios for frequency deviation, tie-line power, and objective functions.
Like other MAs [6], [24], [29]-[38], BOA has its flaws, such as slow convergence and a tendency to become trapped in locally optimal solutions. There are few studies improving BOA in the literature. Li et al. [39] developed an improved BOA based on cross-entropy for engineering design tasks; when tested on three classical engineering design problems, it showed superiority over other methods for challenging problems with restricted and unknown search domains. In this study, three mechanisms, orthogonal learning (OL), Lévy flight, and Broyden-Fletcher-Goldfarb-Shanno (BFGS), are introduced into BOA. With the Lévy flight search mode, butterflies can escape from local optima more easily, which accelerates the local search and improves search efficiency. OL produces a guidance vector that guides the butterflies toward the optimal solution. BFGS is introduced as a local search to further improve exploitative capability. The proposed LBOLBOA has been validated on a comprehensive set of problems, including the CEC2017 benchmark testbed and the parameter optimization of KELM for prediction of cervical hyperextension injury [40]. The experimental results demonstrate that LBOLBOA achieves significantly better performance than other competitive methods on most CEC2017 benchmark problems. Moreover, the LBOLBOA-optimized KELM is successfully employed to predict cervical hyperextension injury and achieves much better prediction performance than KELM methods optimized by other MAs. The major contributions of this work can be summarized as follows:
a) LBOLBOA, an improved BOA, is presented by embedding OL, Lévy flight, and BFGS into BOA.
b) The proposed LBOLBOA shows notable performance on function optimization tasks.
c) The proposed LBOLBOA successfully solves the parameter-tuning problem of KELM for the first time.
d) A potential model, LBOLBOA-KELM, is successfully used to predict cervical hyperextension injury.
This paper is structured as follows. Section 2 describes BOA. The LBOLBOA method is detailed in Section 3. Section 4 analyzes the simulation results of LBOLBOA on benchmark functions and machine learning problems. The conclusions and recommendations are presented in Section 5.

II. BUTTERFLY OPTIMIZATION ALGORITHM (BOA)
A. MOTIVATION TO IMPROVE BOA
According to the ''no free lunch'' theorem, each algorithm has its own advantages, and BOA has already been proposed and applied in several areas. For example, Zhi et al. [41] proposed an improved version of BOA to optimize the performance of a heat recovery system, and the results illustrated that high relative humidity, high inlet gas pressure, and low operating temperature improve GHG emission reduction and system exergy performance. Wen et al. [42] used a newly proposed hybrid model (hereafter referred to as EBOA-LSSVM) to predict residential CO2 emissions in the Yangtze River Delta region; the final simulation results demonstrated the outstanding performance of the new model by comparing its predictive accuracy with other models. Arora and Anand [26] proposed two binary variants of BOA and applied them to select optimal feature combinations; the experimental results confirmed the efficiency of the proposed method in improving classification accuracy compared with other wrapper-based algorithms. However, BOA itself has certain limitations, such as a tendency to fall into local optima and slow convergence during the search. Therefore, in this paper, we chose BOA for improvement and propose a scheme that significantly enhances its performance.

B. DESCRIPTION OF BOA
BOA is a new member of the MA family based on the butterfly's foraging behavior [25]. BOA mainly imitates the behavior of a butterfly population looking for food. Each butterfly emits a certain concentration of fragrance; each butterfly can sense the smell of the other butterflies around it and moves toward those that emit more fragrance. The fitness of a butterfly changes as it flies from one place to another. BOA simulates this foraging behavior to search for the optimum value in the search space. The stage in which a butterfly approaches a perceived fragrance is called the global search stage of the algorithm. When a butterfly cannot perceive any aroma around it, it moves randomly, which constitutes the local search stage. The aroma of a butterfly depends on the perceptual form, the stimulus intensity, and the power exponent, and can be expressed as follows:

F = c * I^a

where F is the aroma concentration, c is the sensory modality, I is the stimulus intensity, and a is the power exponent depending on the modality. If a = 0, the odor emitted by any butterfly cannot be perceived by the others; if a = 1, all butterflies can perceive the odor emitted by any butterfly in the space with the same ability. Therefore, different butterflies absorb aroma to different degrees.
The main steps of BOA are as below:
Step 1: Initialization: define the objective function f(x), the sensory modality c, the power exponent a, and the switch probability p; define the butterfly population X = {x_1, x_2, ..., x_n}; and determine the stimulus intensity I_i of each butterfly x_i through the objective function f(x).
Step 2: Fitness evaluation: the fitness of each butterfly in the population is calculated, and the best butterfly position in the current generation is found.
Step 3: The fragrance of each butterfly is calculated, and a random number compared against the switch probability p determines whether each butterfly performs a global search or a local search.
Step 4: In the global search stage, butterflies fly toward the butterfly with the highest fitness using the following formula:

x_i^(t+1) = x_i^t + (r^2 * g* - x_i^t) * f_i

where x_i^t is the solution vector of the i-th butterfly at iteration t, g* is the best solution found in the current iteration, f_i is the fragrance of the i-th butterfly, and r is a random number in [0, 1].
Step 5: In the local search stage, butterflies fly locally at random, as shown in the following formula:

x_i^(t+1) = x_i^t + (r^2 * x_j^t - x_k^t) * f_i

where x_j^t and x_k^t are the solution vectors of two randomly chosen butterflies j and k at iteration t, r is a random number in [0, 1], and f_i denotes the fragrance of the i-th butterfly.
Step 6: Update power exponent a.
Step 7: If the termination condition is reached, stop the whole procedure.
The pseudocode of BOA is shown below.
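As a concrete illustration of Steps 1-7, the loop can be sketched in Python; note that the parameter values, the stimulus-intensity mapping, and the greedy acceptance below are our own illustrative choices, not the authors' implementation:

```python
import numpy as np

def boa(obj, dim, lb, ub, n=30, iters=300, c=0.01, p=0.8, seed=0):
    """Minimal BOA sketch following Steps 1-7 (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))                 # Step 1: initialize population
    fit = np.array([obj(x) for x in X])
    best = X[fit.argmin()].copy()                     # Step 2: best butterfly so far
    a = 0.1                                           # power exponent
    for t in range(iters):
        I = 1.0 / (1.0 + np.abs(fit))                 # stimulus: better fitness -> stronger
        f = c * I ** a                                # Step 3: fragrance F = c * I^a
        for i in range(n):
            r = rng.random()
            if rng.random() < p:                      # Step 4: global search toward g*
                cand = X[i] + (r * r * best - X[i]) * f[i]
            else:                                     # Step 5: local random walk
                j, k = rng.integers(0, n, size=2)
                cand = X[i] + (r * r * X[j] - X[k]) * f[i]
            cand = np.clip(cand, lb, ub)
            fc = obj(cand)
            if fc < fit[i]:                           # greedy acceptance (our choice)
                X[i], fit[i] = cand, fc
        best = X[fit.argmin()].copy()
        a = 0.1 + 0.2 * (t + 1) / iters               # Step 6: update power exponent a
    return best, float(fit.min())                     # Step 7: stop at iteration budget
```

On a simple sphere function this sketch steadily contracts the population toward the best individual, driving the best fitness well below its random-initialization value.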

III. PROPOSED LBOLBOA METHOD
BOA, like most algorithms, needs improvement in search accuracy and has difficulty escaping from local optima. Because of these two points, a BOA based on OL [43], Lévy flight, and BFGS [44], called LBOLBOA, is proposed. The Lévy flight search mode makes it easier for a butterfly to jump out of a locally optimal solution and speeds up the local search. The most fragrant butterfly in the population is very important for guiding the population to the globally optimal position during the global search; therefore, OL is introduced to construct a guidance vector that leads the butterflies toward the optimal position. Finally, BFGS is used as a local search to further enhance the exploitative capability of BOA. These three mechanisms have been widely used in various MAs. Zhan et al. proposed an OL-based PSO [45], in which an effective guidance vector built with the OL strategy directs particles to fly in a better direction.
Li et al. applied OL to CS to enhance its optimization capability [46]. Xu et al. [47] applied OL to GOA to enhance its diversity. Chen et al. [48] introduced OL to improve the neighborhood search capabilities of the sine cosine algorithm (SCA). Zhu et al. [49] introduced OL into SCA to strengthen the optimization capabilities of the basic SCA. Hakli et al. applied the Lévy flight mechanism to PSO [50].
In the proposed algorithm, a limit value is first defined for every particle. If the optimal solution cannot be improved by the end of a particle's iteration, the limit counter is raised. Once a particle exceeds the given limit, the Lévy flight method is used to redistribute the particles in the search space. To deal with premature convergence on complex nonlinear problems, for which SCA is prone to fall into local optima, Li et al. combined Lévy flight with SCA [51]. Nezhad et al. [52] proposed a hybrid optimization algorithm based on a modified BFGS and PSO to solve medium-scale nonlinear programs; the proposed algorithm makes effective use of a hybrid framework when dealing with nonlinear equality constraints. Chen et al. [53] proposed a BFGS-ICA framework, based on the BFGS quasi-Newton algorithm, for brain activity localization using fMRI data; the new algorithm has good convergence and low sensitivity to the initial point. An individual's current fitness and its historical fitness are integrated into the algorithm to mark individuals that may have fallen into a local minimum; a labeled individual updates its position through Lévy flight so that it can escape from the locally optimal solution.

A. ORTHOGONAL LEARNING
Before solving an optimization problem, we usually do not know the location of the globally optimal solution. As the algorithm iterates and improves the solution vectors continuously, some solutions will come closer to the global optimum. Consequently, the solution vector obtained by combining orthogonal design with the algorithm can serve as a representative, potentially good sample.
For different optimization problems, different algorithms may need different orthogonal tables. Although many orthogonal tables can be found online, not all of them are available, so the required orthogonal table sometimes has to be constructed. An orthogonal table with N factors at Q levels is typically denoted L_M(Q^N), where L denotes an orthogonal table, M denotes the number of rows of the table, that is, the number of trials, and N denotes the number of orthogonal columns, i.e., the maximum number of factors that can be accommodated; N must be no less than the number of actual factors D.
Algorithmic pseudocode for obtaining orthogonal arrays is shown in Algorithm 2.

Algorithm 2 Construction of Orthogonal Array
Input: the number of factors D and the number of levels Q. Output: an orthogonal array.
According to the selected orthogonal table, representative trials are selected from the full set of combinations and carried out to obtain experimental data. Based on these data, the influence of each level on each factor is then analyzed. Let f_i denote the test result of trial i; the influence S_jq of level q on factor j is

S_jq = (sum_i z_ijq * f_i) / (sum_i z_ijq)

where z_ijq = 1 if trial i takes level q for factor j, and z_ijq = 0 otherwise. When the dimension of the actual problem is larger than the number of factors, the problem dimensions need to be decomposed and mapped to the factors by grouping; otherwise, the orthogonal table is used directly.
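For prime Q, the orthogonal array L_M(Q^N) can be generated with the standard basic/non-basic column construction; the sketch below follows that textbook recipe (it is our reconstruction, since the body of Algorithm 2 is not reproduced here):

```python
import numpy as np

def orthogonal_array(Q, J):
    """Build L_M(Q^N) with M = Q^J rows and N = (Q^J - 1)/(Q - 1)
    columns, valid when Q is prime (standard construction)."""
    M = Q ** J
    N = (Q ** J - 1) // (Q - 1)
    A = np.zeros((M, N), dtype=int)
    # basic columns
    for j in range(1, J + 1):
        k = (Q ** (j - 1) - 1) // (Q - 1)            # 0-based column index
        for i in range(M):
            A[i, k] = (i // Q ** (J - j)) % Q
    # non-basic columns: modular combinations of earlier columns
    for j in range(2, J + 1):
        k = (Q ** (j - 1) - 1) // (Q - 1)
        for s in range(k):
            for t in range(1, Q):
                A[:, k + s * (Q - 1) + t] = (A[:, s] * t + A[:, k]) % Q
    return A
```

For example, orthogonal_array(3, 2) yields L9(3^4): nine trials in which every column uses each level three times and every pair of columns covers all nine level pairs exactly once.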
The values of the elements in the selected orthogonal table are discrete, but the search space determined by the position Sol_i = (Sol_i,1, Sol_i,2, ..., Sol_i,n) of the i-th agent and the guidance vector T_2 = (T_2,1, T_2,2, ..., T_2,n) is continuous, so the search space must be quantized to match the selected orthogonal table. Each dimension of the search space [lb, ub] is quantized into Q levels l_i,1, l_i,2, ..., l_i,Q, where

l_i,q = lb_i + ((q - 1) / (Q - 1)) * (ub_i - lb_i), q = 1, 2, ..., Q.

The dimension D of the given problem is usually larger than the number of factors N of the selected orthogonal table L_M(Q^N), so the dimensions D_1, D_2, ..., D_n must be grouped into N groups before the table can be applied; N - 1 integers k_1, k_2, ..., k_(N-1) are generated randomly such that 1 < k_1 < k_2 < ... < k_(N-1) < D.

For instance, if k_1 = 2, k_2 = 3, k_3 = 4, and k_4 = 5, the dimensions of the given problem are grouped as {D_1, D_2}, {D_3}, {D_4}, {D_5}, and {D_6, ..., D_n}. Nine candidate solutions can then be obtained through the above search-space quantization and dimension grouping. Pseudocode for generating the optimal solution and its fitness value by orthogonal learning is shown in Algorithm 3.
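The way OL assembles candidates, taking each dimension (or dimension group) from either the current solution or the guidance vector as the orthogonal table dictates, can be illustrated with a tiny two-level example; the array L4(2^3) and the vectors used here are hypothetical, and the paper itself uses larger tables with grouped dimensions:

```python
import numpy as np

def ol_candidates(x, guide, oa):
    """One candidate per orthogonal-array row: level 0 copies the
    component from x, level 1 copies it from the guidance vector."""
    x = np.asarray(x, dtype=float)
    guide = np.asarray(guide, dtype=float)
    return np.where(oa == 0, x, guide)

# L4(2^3): four trials over three two-level factors
L4 = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
cands = ol_candidates([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], L4)
```

Each row of cands mixes the two parents in a distinct, balanced pattern; evaluating the candidates and keeping the best one realizes the guidance-vector idea.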

B. LÉVY FLIGHT
Lévy flight has been successfully applied to PSO, CS, FA, and other intelligent optimization algorithms with excellent results. Lévy flight is a kind of random walk and a Markov process whose step sizes follow a heavy-tailed stable distribution. Its characteristic is that frequent short-range local moves alternate with occasional long-distance jumps, so the search does not linger repeatedly in a small region and can escape more easily from the constraints of local extrema, thus changing the behavior of the system.
In this work, the Lévy flight strategy is introduced into BOA, and the final individual is determined by comparing the fitness fitness_T2 of the new individual T_2 produced by Lévy flight with the current fitness fitness_s; the strategy is also incorporated into the orthogonal mechanism. New individuals are generated according to the following step distribution:

Levy(λ) ∼ φ * u / |v|^(1/β)

where u and v are drawn from normal distributions and β is the stability index.
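One common way to realize such Lévy-distributed steps is Mantegna's algorithm; the sampler below is our choice of generator, as the paper does not spell out how u and v are drawn:

```python
import math
import numpy as np

def levy_steps(beta=1.5, size=1000, seed=0):
    """Sample Levy-flight step lengths via Mantegna's algorithm:
    s = u / |v|^(1/beta), u ~ N(0, sigma_u^2), v ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
               (math.gamma((1 + beta) / 2) * beta *
                2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1.0 / beta)
```

Most sampled steps are short while a small fraction are very long, which is exactly the mix of frequent local moves and occasional long jumps described above.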

C. BROYDEN-FLETCHER-GOLDFARB-SHANNO (BFGS)
Based on gradient information and the Hessian matrix of second derivatives, traditional Newton approaches can be employed to minimize the fitness of the studied functions. However, they require calculating and inverting the Hessian matrix, which is computationally expensive. Quasi-Newton (QN) algorithms instead update an approximation of the inverse Hessian by examining successive gradient vectors, which likewise drives the minimization of the objective function.
It is quite interesting how QN methods build a model of the objective function at hand: they do so by measuring the changes in the gradient. At each step the inverse Hessian approximation is updated as

H_(k+1) = (I - ρ_k s_k y_k^T) H_k (I - ρ_k y_k s_k^T) + ρ_k s_k s_k^T

where s_k is the step taken, y_k is the change in the gradient, and ρ_k = 1 / (y_k^T s_k). This model can deliver a superlinear convergence rate and improves on steepest descent, particularly on the feature spaces of hard problems. Moreover, compared with traditional Newton methods, QN methods do not require second-derivative information. Hence, generally speaking, QN approaches are more efficient.
For this reason, different versions of the well-known QN methods are available in previous works, for example the Broyden Class (BC), the Symmetric Rank One formula (SR1), Davidon-Fletcher-Powell (DFP), and the BFGS technique. In this paper, BFGS is embedded into the basic BOA to refine the best search agent of the current population. The pseudocode of BFGS is described in Algorithm 6.
In this study, BFGS is implemented with the MATLAB Optimization Toolbox. The termination conditions of BFGS are TolFun = 1E-15 and TolX = 1E-15, meaning that BFGS is executed until the tolerances of both the fitness and the position are smaller than 1E-15.
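For readers outside MATLAB, an analogous polish step can be sketched with a generic textbook BFGS using the standard inverse-Hessian update and Armijo backtracking; this is not the toolbox code used by the authors:

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=100):
    """Plain BFGS: H_(k+1) = (I - rho*s*y^T) H_k (I - rho*y*s^T) + rho*s*s^T."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                                   # inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                                  # quasi-Newton direction
        alpha = 1.0                                 # Armijo backtracking line search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        s = alpha * d                               # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                               # change in gradient
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```

On a smooth convex objective this converges superlinearly from a reasonable start, which is why it works well as a final exploitation step for the best butterfly.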

D. PROPOSED LBOLBOA METHOD
Combining the above mechanisms, this study proposes an improved BOA based on orthogonal learning, Lévy flight, and BFGS (LBOLBOA). LBOLBOA first updates the global optimal solution according to the original BOA, then refines the global optimal solution through orthogonal learning, and finally uses Lévy flight to produce a new butterfly individual from the current optimal solution, comparing the two and keeping the better one; BFGS further polishes the best agent. Figure 1 illustrates the flowchart of LBOLBOA.
The pseudocode of LBOLBOA is shown in Algorithm 7.

IV. EXPERIMENTAL RESULTS AND ANALYSIS
A. IMPACT OF THE MECHANISMS ON BOA
Overall, it is clear from Figure 2 that the two sub-mechanisms each greatly improve the original BOA, and their fusion improves it further. On F3, F5, F10, and F26, BFGS significantly boosts BOA, but LBOLBOA performs even better on these functions. On F5, F7, F8, and F10, the fusion of the two mechanisms achieves better results than either alone. Overall, the fusion of the mechanisms yields the most pronounced boost to BOA. Regarding the Wilcoxon signed-rank test, if the p-value is less than 0.05, LBOLBOA is significantly superior to its peer; otherwise, the difference is not significant. It can be seen from Table 1 that LBOLBOA is not significantly superior on functions F5, F6, F16, F17, F20, F21, F22, and F23, but on the other functions LBOLBOA is significantly superior to the other BOA variants, which is enough to show its superiority.

B. COMPARISON OF CONVENTIONAL AND ADVANCED ALGORITHMS
To study and analyze the capability of LBOLBOA in detail, it was compared with conventional and advanced algorithms, including WOA [54], GWO [5], LSFLA [55], SE [56], m_SCA [57], ALCPSO [58], CGPSO [59], SCADE [60], OBLGWO [5], and BMWOA [54], on the CEC2017 functions [61]. It is crucial to validate a method under identical limitations and parameter settings using rigorous numerical and statistical procedures; the Wilcoxon signed-rank test and the Friedman test [62] were therefore employed to evaluate the performance of LBOLBOA against the other algorithms. The detailed parameter settings of the involved algorithms are presented in Table 2. To keep the experimental results accurate and the comparison fair, all algorithms were run under the same test conditions: population size SearchAgents_no = 30 and a maximum of Max_iteration = 300,000 function evaluations. All algorithms were run independently 30 times on the test functions to reduce the influence of randomness on the results.
The original parameter settings of each comparison algorithm are adopted in this paper.
The Friedman test results are shown in Tables 3 and 4. As shown, the Avg of LBOLBOA is not the smallest on F5, F6, F7, F9, F10, F14, F16, F17, F20, F22, F26, and F28, but it is the smallest on all the other functions, so LBOLBOA outperforms the other algorithms on CEC2017 overall. The last part of Tables 3 and 4 gives the overall results of LBOLBOA compared with the other algorithms, including the rank results of the algorithm tests. Here, '+' means that LBOLBOA achieves significantly better results than the peer, '-' means significantly worse results, and '=' means that the performance of LBOLBOA is statistically equal to the peer. Analyzing these two tables, we conclude that LBOLBOA is inferior to an individual algorithm in 13.33% of the function tests, equal in 26.67%, and superior in 60%. It can be seen that the performance of LBOLBOA is superior to its counterparts.
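For reference, the average-rank figure that underlies a Friedman comparison can be computed in a few lines; the sketch below ignores ties (a full Friedman test would additionally use mid-ranks and a chi-square statistic):

```python
import numpy as np

def friedman_ranks(results):
    """results[i, j] = mean error of algorithm j on function i (lower is
    better). Returns each algorithm's rank averaged over the functions."""
    order = np.argsort(results, axis=1)                      # per-function ordering
    ranks = np.empty_like(results, dtype=float)
    rows = np.arange(results.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, results.shape[1] + 1)  # rank 1 = best
    return ranks.mean(axis=0)
```

The algorithm with the smallest average rank across all benchmark functions is judged best overall, which is how the final ordering in the tables is produced.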
The solution quality and convergence speed of the involved algorithms are shown in Figure 3. Compared with the other algorithms on F1, F3, and F4, LBOLBOA achieves better solution accuracy. The curves for F29 show that LBOLBOA not only has high accuracy but also converges much faster than the other algorithms, which either converge slowly or reach low solution quality. Analyzing F12 and F13, we find that LBOLBOA performs much better on hybrid functions than its peers; the clear gap suggests that LBOLBOA can escape locally optimal values. Additionally, F24 and F30 show that LBOLBOA achieves good performance in both solution accuracy and convergence rate. In particular, on F9 and F29, LBOLBOA finds the optimum values at the very beginning of the iterations. From the above comparative analysis, we conclude that LBOLBOA achieves a proper balance between exploration and exploitation. Figure 4 shows the results of the feasibility analysis of LBOLBOA on the CEC2017 functions. In the figure, the graphs in the first column (a) and the second column (b) record the search history of LBOLBOA, the third column (c) records the trajectory of the first individual, and the fourth column (d) records the average fitness of LBOLBOA and BOA during the iterations.
In Figure 4(b), the red dots represent the optimal solution. The search positions are densest near the optimal solution, indicating that LBOLBOA has found the target area and exploits it further. In Figure 4(c), LBOLBOA settles down after an initial phase of violent oscillation, showing that it completes the exploration phase very quickly and then converges rapidly; the violent fluctuation of the individual trajectory indicates that LBOLBOA has covered most of the search space. Similarly, at the beginning of the search, the average fitness quickly approaches the optimal value, which shows that LBOLBOA has a strong capacity for exploitation. On all the function plots shown, the convergence speed and accuracy of LBOLBOA are clearly better than those of BOA. In summary, the above analysis demonstrates that LBOLBOA does not fall into local optima and finds better solutions.

C. PREDICTION OF CERVICAL HYPEREXTENSION INJURY
1) DATA DESCRIPTION
In this study, X-ray and MRI images of 43 patients suffering from cervical hyperextension injury were collected at the Outpatient and Emergency Department of Jilin University Affiliated Second Hospital from June 2018 to July 2018 [40]. Radiographic images were also collected from 41 patients who had suffered trauma but did not have cervical hyperextension injury. Two radiologists reviewed the slices with two orthopedic surgeons and provided measurements and evaluations. A definite history of each patient was also taken, along with the patient's signs and symptoms, such as muscle weakness, pain, and numbness in the upper extremities. Details of these characteristics are given in Figure 5.

2) LBOLBOA-KELM MODEL
a: DESCRIPTION OF KELM
Compared with other machine learning methods, such as deep neural networks [63]-[65] and support vector machines [32], [35], [66]-[71], the extreme learning machine (ELM) [72], developed from the single-hidden-layer feedforward neural network, is simple to train. The ELM randomly generates the connection weights between the input layer and the hidden layer as well as the thresholds of the hidden-layer neurons; these do not need to be adjusted during training, and only the number of hidden neurons needs to be set to obtain the unique optimal solution. ELM can significantly improve learning speed and performance compared with previous training methods, and owing to this excellent performance it has gained a wide range of applications [73]-[76].
KELM [77] is developed by adding kernel functions to the ELM. The radial basis function (RBF) kernel is the most common type of kernel function. The RBF kernel was chosen in this study because it maps samples to a higher-dimensional space and can handle samples with nonlinear relationships between class labels and features. At the same time, the RBF kernel has the benefit of few parameters. When using the RBF kernel, only two parameters, C and γ, need to be considered (C is the penalty factor and γ the kernel parameter), but their choice has a large impact on the result. There is no a priori knowledge for parameter selection, so some form of model selection (parameter search) must be performed [4]-[6], [12], [20], [21], [78]. In the present study, we applied the proposed LBOLBOA to optimize the two key parameters of KELM, and the resulting LBOLBOA-KELM was applied to predict cervical hyperextension injury.

b: PROPOSED LBOLBOA-KELM
The LBOLBOA-optimized KELM, termed LBOLBOA-KELM, was established to predict cervical hyperextension injury. The flow chart of LBOLBOA-KELM is illustrated in Figure 6. The core engine of the prediction model is KELM, and the input data are 17 factors that may cause cervical hyperextension injury. The input features are mapped to a hidden feature space through the RBF kernel function. The search space comprises two vital parameters, the penalty coefficient C and the kernel bandwidth γ, together with the feature subset. The optimal parameters and feature subset are obtained through LBOLBOA evolution. Finally, based on the two best parameters and the optimal combination of features, KELM accurately predicts cervical hyperextension injury. The overall steps of using LBOLBOA-KELM to make predictions are as follows.
Step 1: Acquire relevant data on cervical hyperextension injury.
Step 2: Separate the training data from the test data. In this study, the data are partitioned using a 10-fold cross-validation scheme, where 10% of the data is used as test data and the remaining 90% as training data; this is repeated ten times in turn.
Step 3: The training data and test data were normalized within [0, 1].
Step 4: Initialize the swarm size, the number of iterations, and associated parameters, where C and γ are encoded as the first two dimensions of each search agent, while the 17 features are binary-encoded as 0 or 1.
Step 5: Set the feature space bounds, and the accuracy rate is set based on the objective function.
Step 6: Update the optimal objective function value by comparing each agent's objective function value with the current best.
Step 7: Update the location of the agent and save the best location.
Step 8: If the maximum number of iterations is reached, the two optimal parameters and the 17 features are output as final optimal values. Otherwise, go to step 5 and continue the iterative evolution.
Step 9: After obtaining the optimal values of C, γ parameters, and the combination of the optimal features, the KELM was retrained according to the training set.
Step 10: Use the optimal LBOLBOA-KELM model obtained on the test set to predict the new sample.
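Steps 2 and 4 above can be sketched in code. The following is an illustrative Python sketch of one plausible agent decoding and 10-fold partitioning, assuming agents take values in [0, 1] per dimension; the paper does not specify the exact scaling, so the function names and the 0.5 threshold are our assumptions:

```python
import numpy as np

# Search-space bounds used in the paper: C and gamma lie in [2^-5, 2^5];
# the remaining 17 dimensions are feature-selection flags (Step 4).
C_LO, C_HI = 2.0**-5, 2.0**5

def decode_agent(position):
    """Decode one agent (length 19, values in [0, 1]) into (C, gamma, mask).

    The first two entries are scaled linearly to the [2^-5, 2^5] range;
    the last 17 are thresholded at 0.5 to a binary feature mask.
    """
    position = np.asarray(position, dtype=float)
    C = C_LO + position[0] * (C_HI - C_LO)
    gamma = C_LO + position[1] * (C_HI - C_LO)
    mask = position[2:] > 0.5          # binary-encoded feature subset
    return C, gamma, mask

def ten_fold_indices(n, seed=None):
    """Yield (train_idx, test_idx) pairs for 10-fold cross-validation (Step 2)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    for fold in np.array_split(idx, 10):
        yield np.setdiff1d(idx, fold), fold
```

The fitness of an agent would then be the cross-validated classification accuracy of a KELM trained with the decoded C, gamma, and feature subset, which is the objective LBOLBOA maximizes.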

c: EXPERIMENTAL RESULTS
In this section, we run a set of experiments to confirm the impact of the proposed LBOLBOA-KELM model in diagnosing cervical hyperextension injury. The methods involved, including LBOLBOA-KELM, BOA-KELM, PSO-KELM, and SCA-KELM, were implemented from scratch in MATLAB 2018. The SVM model was implemented using LIBSVM [79], and the backpropagation neural network (BPNN) model was implemented via the MATLAB neural network toolbox. The maximum number of iterations and the population size were set to 50 and 20, respectively. 10-fold cross-validation was used to evaluate the effectiveness of the various classification methods to ensure unbiased results. As the two important parameters of SVM and KELM, the search ranges of C and γ were both set to [2^-5, 2^5]. Table 5 lists the detailed results of LBOLBOA-KELM on the cervical hyperextension injury data. The classification accuracy obtained by LBOLBOA-KELM is 97.78%, the MCC is 0.96, the sensitivity is 100.00%, and the specificity is 96.00%; the standard deviations (STD) of these metrics are 0.0468, 0.0843, 0, and 0.0843, respectively. We also find that the optimal parameters of KELM are acquired automatically by the proposed LBOLBOA, mainly because the optimal parameters can be found adaptively by the improved BOA.
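The four evaluation metrics reported above are all derived from the binary confusion matrix. A minimal, illustrative Python sketch (the experiments themselves were run in MATLAB):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Compute ACC, MCC, sensitivity, and specificity for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    sens = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    spec = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    return acc, mcc, sens, spec
```

Note that MCC is a correlation coefficient in [-1, 1] rather than a percentage, which is why it is reported as 0.96 rather than 0.96%.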
To validate the effectiveness of the developed LBOLBOA-KELM, we conduct a comparative study with five other machine learning approaches: three MA-based KELM models (SCA-KELM, PSO-KELM, BOA-KELM) and two commonly used machine learning methods (SVM and BPNN). Figure 7 shows the comparison results among the six methods. The results indicate that the LBOLBOA-KELM model performs better than the original BOA-KELM in terms of ACC, MCC, sensitivity, and specificity, and its STD values are also much smaller than those of BOA-KELM. This means that LBOLBOA-KELM has better classification performance and stability than BOA-KELM. Regarding ACC, LBOLBOA-KELM achieved the best performance, followed by the SCA-KELM model; the results of PSO-KELM and BOA-KELM are very close to SCA-KELM. SVM and BPNN scored lower than the three MA-based KELM models. The results of BPNN are the worst, and its STD is the largest, which indicates that the BPNN model is not stable on this task. In terms of MCC, LBOLBOA-KELM again achieves the best results, followed by SCA-KELM, PSO-KELM, BOA-KELM, BPNN, and SVM. The result of SVM is the worst, and its STD is the largest of all methods. In terms of sensitivity, LBOLBOA-KELM performs best, reaching 100.0% with an STD of 0, followed by PSO-KELM, SCA-KELM, BPNN, SVM, and BOA-KELM; the results of BPNN and SVM are very close. The original BOA-KELM model gives the worst sensitivity with the largest STD, whereas the improved LBOLBOA-KELM model produces the best results with the smallest STD. In terms of specificity, the LBOLBOA-KELM model again gives the best results with the smallest STD; SCA-KELM is second-best, followed by PSO-KELM, SVM, BOA-KELM, and BPNN. The BPNN model gives the worst result, 89.5%, with the largest variation.
To illustrate the convergence characteristics of LBOLBOA, we also record the convergence curves of the various MA-based KELM approaches. In Figure 8, it can be seen that after several iterations, the LBOLBOA-KELM model quickly and repeatedly jumps out of local optima to reach first-rank precision, which indicates that LBOLBOA has strong local and global search capability. The main reason is the introduction of the orthogonal learning strategy and the Levy flight mechanism, which enhance the global and local search abilities of the classic BOA. As shown in Figure 8, BOA-KELM needs more iterations to find a better solution, and its accuracy is the lowest among all algorithms, far below that of the LBOLBOA-KELM model. Although PSO-KELM and SCA-KELM are superior to BOA-KELM in accuracy and global search capability, they are still not as good as LBOLBOA-KELM.

V. CONCLUSIONS AND FURTHER WORKS
To optimize the prediction model for cervical hyperextension injury, we have enhanced the performance of the BOA optimizer with several efficient mechanisms. In this study, orthogonal learning and the Levy flight mechanism are integrated into BOA, and a new algorithm, LBOLBOA, is proposed. First, the OL mechanism is introduced to construct a guidance vector that guides the population toward the optimal direction. Then, Levy flight is employed to give butterfly individuals the ability to escape from local optima. The proposed LBOLBOA was compared against some state-of-the-art counterparts; the experimental results show that LBOLBOA has strong convergence ability and solution quality, meaningfully better than the current advanced algorithms. Moreover, the proposed LBOLBOA has also been applied to machine learning problems, including the parameter tuning of KELM, and the resulting LBOLBOA-KELM has been successfully applied to the prediction of cervical hyperextension injury.
In future research, LBOLBOA can be applied to other scenarios such as clustering, image segmentation, engineering design optimization, multi-objective optimization, and social manufacturing optimization.