An Improved Harris Hawks Optimization Algorithm With Simulated Annealing for Feature Selection in the Medical Field

The Harris Hawks Optimization (HHO) algorithm is a recent metaheuristic algorithm inspired by the cooperative behavior and chasing style of Harris' hawks in nature, known as the surprise pounce. HHO has demonstrated promising results compared to other optimization methods. However, HHO suffers from local optima and limited population diversity. To overcome these limitations and adapt it to feature selection problems, a novel metaheuristic optimizer, namely Chaotic Harris Hawks Optimization (CHHO), is proposed. Two main improvements are made to the standard HHO algorithm. The first is to apply chaotic maps at the initialization phase of HHO to enhance population diversity in the search space. The second is to apply the Simulated Annealing (SA) algorithm to the current best solution to improve HHO exploitation. To validate the performance of the proposed algorithm, CHHO was applied to 14 medical benchmark datasets from the UCI machine learning repository. The proposed CHHO was compared with the original HHO and several well-known and recent metaheuristic algorithms, including the Grasshopper Optimization Algorithm (GOA), Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Butterfly Optimization Algorithm (BOA), and Ant Lion Optimizer (ALO). The evaluation metrics include the number of selected features, classification accuracy, fitness values, Wilcoxon's statistical test ($P$-value), and convergence curves. Based on the achieved results, CHHO outperforms the standard HHO algorithm and the other optimization algorithms on the majority of the medical datasets.


I. INTRODUCTION
Recently, high-dimensional data have become an essential source for much Machine Learning (ML) research, such as data mining and pattern recognition. However, increasing data volume and dimensionality causes many problems, such as the appearance of noisy, irrelevant, and redundant data [1], [2]. This increases ML complexity and decreases classification performance. Also, the majority of ML classifiers cannot cope with all the features included in a complex dataset. This problem can affect data mining and pattern recognition performance, since it mainly depends on the ML classifier. Thus, Feature Selection (FS) is a critical process in ML for selecting relevant features and removing noisy and irrelevant ones. In high-dimensional datasets, choosing the most significant features is a challenging task. However, many studies have shown that FS methods can efficiently select the crucial features and remove irrelevant and redundant ones [3], [4], [5]. Reducing the computational complexity and the required storage space are also essential tasks of FS, which consequently enhance the classifier [6]. Therefore, the FS process has the potential to substantially improve the classification performance of the ML classifier.
The associate editor coordinating the review of this manuscript and approving it for publication was Essam A. Rashed.
Generally, the FS process consists of four parts: feature subset search, evaluation, search stop criteria, and validation [2]. Based on the evaluation criteria, FS methods are divided into two main types: filter and wrapper methods. Filter-Based Methods (FBM) use statistical functions to choose and rank the feature subsets. FBM, such as Chi-square, Information Gain, Relief, and Gini Index, have no direct contact with the classifier and operate before the classifier is employed [7]. On the other hand, Wrapper-Based Methods (WBM) have direct contact with the classifier used [8]. Many studies have applied WBM with optimization algorithms for feature selection purposes [8], [9], [10]. WBM is computationally more expensive, but it achieves better results than FBM. WBM is commonly employed in FS problems because it considers both the classification performance and the feature reduction conditions, and because of its ability to interact directly with the classifier.
In WBM, the fitness function is applied to assess the FS process based on the classification accuracy. To improve the accuracy in FS, various studies have been conducted using optimization algorithms [1], [10], [11], [12]. The main goal of using optimization algorithms in FS is to determine the optimal or near-optimal feature subset within a reasonable time. In contrast, the standard exhaustive search evaluates all possible combinations of features from the complete feature set. This process is time-consuming and is regarded as an NP-hard problem [13]. Therefore, optimization algorithms are needed to solve this problem because of their ability to obtain an optimal or near-optimal solution. However, in high-dimensional problems such as the FS problem, optimization algorithms suffer from local optima and population diversity problems. To obtain a tuned algorithm that is appropriate for feature selection problems, we focus on improving the HHO algorithm through a two-fold contribution. The first is to employ chaotic maps to improve population diversity. The second is to use Simulated Annealing (SA) to improve the exploitation capability of the algorithm and avoid the local optima problem.
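To make the cost of the exhaustive search concrete, the following minimal sketch counts the non-empty feature subsets that such a search would have to evaluate. The function name and the feature counts are illustrative assumptions and are not taken from the datasets used in this study.

```python
def count_feature_subsets(n_features: int) -> int:
    """Number of non-empty feature subsets of a set with n_features features."""
    return 2 ** n_features - 1


# The count doubles with every added feature (feature counts are hypothetical).
for n in (10, 20, 30):
    print(n, count_feature_subsets(n))
```

Since each subset requires training and testing a classifier in a wrapper setting, this growth is the reason a metaheuristic search is preferred over enumeration.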
In this study, an improvement of the standard HHO algorithm named CHHO is proposed. Chaotic maps are used to initialize the solutions (search agents) at the initialization phase of HHO. The proposed version is expected to accelerate the convergence rate of HHO and diversify its generated solutions. Furthermore, the Simulated Annealing (SA) algorithm is used to improve the exploitation ability of HHO and avoid the local optima problem. In the literature, different forms of hybrid optimization algorithms have been proposed for feature selection problems. Still, to the best of the authors' knowledge, this is the first time that a hybrid model using HHO with chaos theory and the SA algorithm has been proposed and applied to feature selection problems. CHHO is used to improve the classification performance for the feature selection problem. The main contributions of this work can be summarized as follows:
1) CHHO: an improved variant of the HHO algorithm that addresses its weaknesses and makes it suitable for the feature selection problem.
2) Two main improvements are introduced into the standard HHO:
• Chaotic maps are used at the initialization phase of HHO to improve its solution diversity.
• The Simulated Annealing (SA) algorithm is combined with HHO to improve its exploitation and avoid the local optima problem.
The experiments were conducted on 14 benchmark medical datasets from the UCI machine learning repository. The evaluation metrics include the number of features, classification accuracy, fitness values, P-value, and convergence rate. The rest of the paper is structured as follows. Section II reviews the related work in the literature. Section III provides the details of the HHO algorithm. In Section IV, the proposed CHHO algorithm is presented. Section V presents the experiments performed and the results achieved. Finally, Section VI concludes the paper.

II. RELATED WORK
Recently, optimization algorithms have become very popular due to their demonstrated efficiency in solving feature selection problems. Examples of these algorithms are the Butterfly Optimization Algorithm (BOA) [14], Grasshopper Optimization Algorithm (GOA) [15], Ant Lion Optimizer (ALO) [11], Whale Optimization Algorithm (WOA) [16], and Salp Swarm Algorithm (SSA) [7]. Despite the unique structure of each metaheuristic algorithm, they share a common characteristic. The majority of the techniques start with a random population initialization, followed by solution evaluation in each iteration based on a fitness function, solution updating, and eventually determining the best solution based on the termination criterion. These steps define the search behavior, which is mostly described in terms of exploration and exploitation phases. In the exploration phase, the optimization algorithm attempts to discover the best region of the search space. The algorithm applies its stochastic operations as much as possible to examine all areas of the feature space thoroughly.
In contrast, the exploitation phase intends to refine the search in local regions rather than across the whole feature space. Usually, exploitation is performed after the exploration phase [17]. In most complex applications, optimization algorithms become trapped in local optima due to an incorrect balance between exploitation and exploration and to the random nature of the initialization process. One of the methods used in the literature to solve the population diversity problem is chaos theory. The Chaos Optimization Algorithm (COA) [18] is one of the chaos implementations that takes advantage of the nature of chaotic structures. It has been shown that replacing random parameter values with a chaotic system can enhance classification performance [19]. As a result, several studies on optimization algorithms have incorporated chaos theory to improve performance and to adjust specific parameters. Examples of these implementations include Chaotic Crow Search Optimization (CCSA) [19], where a chaotic system was used to overcome the low convergence rate and local optima entrapment. The Chaotic Whale Optimization Algorithm (CWOA) [20] applied a chaotic system to improve the global convergence rate and obtain improved performance. The Chaotic Genetic Algorithm (CGA) [21] examined a chaotic system to improve GA performance. Chaotic Gray Wolf Optimization (CGWO) [22] applied a chaotic system to accelerate the global convergence rate. The Chaotic Grasshopper Optimization Algorithm (CGOA) [23] utilized a chaotic system to balance exploration and exploitation more effectively. These algorithms, among others, have been applied in different fields and applications. The noticeable improvements obtained by adding chaotic maps to these algorithms, as confirmed by the reported results, motivated the current study to combine chaos theory with HHO to improve population diversity.
On the other side of the search behavior, Simulated Annealing (SA) is proposed to solve the HHO local optima problem in high-dimensional FS. SA was presented in 1983 by Kirkpatrick. It is considered a hill-climbing method that attempts to enhance the candidate solution for the objective function [24]. SA has been used to improve the exploitative capability of algorithms and to avoid local optima. Many optimization algorithms have used SA to enhance their local search strategy. For instance, it was used to evaluate the performance of feature selection [9], to improve the best solution after each iteration [25], to enhance the exploitation search capability [26], and to evaluate PSO performance as a wrapper-based method [8]. The performance obtained by employing SA in these previous studies inspired this study to include the SA algorithm in the iteration process to enhance the local search in the FS problem.
Optimization algorithms have been applied successfully for FS in many applications, such as data mining [27] using Particle Swarm Optimization, pattern recognition [28] using Binary Genetic Swarm Optimization, medical applications [5] using Crow Search Optimization, image analysis [29] using Genetic Algorithm Optimization, image processing [30]-[32] using an Optimized Deep Neural Network, and many more. Nowadays, FS is an essential step in preprocessing high-dimensional datasets. It must be pointed out that representative computational intelligence algorithms have been applied to improve FS in different studies such as [7], [9], [27], [33], [34], [46], and [47]. The optimization methods aim to obtain the optimal solution for FS (i.e., a significant feature subset) within an appropriate time and cost. The successful application of these algorithms, which is confirmed by performance results in different fields, inspired this study to apply the improved HHO to the FS problem.
Harris Hawks Optimization (HHO) is a recent metaheuristic algorithm developed in [35]. It is inspired by the cooperative behavior and chasing style of Harris' hawks in nature, known as the surprise pounce. As reported by its authors, HHO solves unconstrained benchmark problems competitively compared to other popular optimization algorithms. Also, HHO is a population-based and gradient-free optimization technique that can be applied to any optimization problem subject to an appropriate formulation. However, HHO is a stochastic optimization algorithm and suffers from various issues, such as limited population diversity and local optima, when dealing with high-dimensional datasets. These reasons and the characteristics of HHO motivated this study to improve the performance of HHO in the feature selection problem, and to investigate the performance of the improved HHO (CHHO) in obtaining the optimal feature subset while achieving better classification performance. The importance of this work comes from the fact that the Harris Hawks Optimization algorithm has been applied in many fields, such as image processing [36], the optimal power flow problem [37], and drug design and discovery [38]. HHO has also been applied to feature selection using the Elite Opposition-Based Learning method [39], where it provided good results. The following section presents the basics and background of the HHO algorithm.

The Harris Hawks Optimization (HHO) algorithm is a novel optimization algorithm developed by Heidari, Mirjalili, et al. in 2019 [35]. The algorithm simulates the collaborative behavior and hunting technique of Harris' hawks in nature, named the surprise pounce. In this smart strategy, the hawks collaboratively attack from many directions to surprise the prey. Harris' hawks exhibit several chasing styles depending on the nature of the situation and the escaping patterns of the prey. The standard HHO algorithm defines exploration and exploitation strategies motivated by exploring for prey, the surprise pounce, and the unique attacking technique of Harris' hawks. The HHO algorithm is a population-based and gradient-free optimization method. Therefore, it can be applied to many optimization problems subject to an appropriate formulation. In the following steps, HHO mathematically models these techniques and behaviors to develop an optimization algorithm.

1) INITIALIZATION PHASE
In this phase, the objective function and the search space are defined. The initial population is generated using chaotic maps, and all parameter values are set.

2) EXPLORATION PHASE
In this phase, all Harris' hawks are considered candidate solutions. In each iteration, the fitness value is computed for all these candidate solutions with respect to the intended prey. Two approaches are applied to mimic the exploration behavior of Harris' hawks in the search space, as specified in (1), where $X(t+1)$ is the position of the hawks in the next iteration, $X_{rabbit}(t)$ is the prey position, and $X_{rand}(t)$ is a random solution chosen from the current population. $X(t)$ is the position vector of the hawks in the current iteration $t$; $r_1$, $r_2$, $r_3$, $r_4$, and $q$ are random scaling factors within [0, 1], which are updated in each iteration; $LB$ and $UB$ are the lower and upper bounds of the variables; and $X_m$ is the average position of the solutions. This approach generates the positions of the hawks within the $(UB - LB)$ bounds based on two rules: 1) create the solutions based on a randomly selected hawk from the current population and the other hawks; 2) create the solutions based on the prey location, the average position of the hawks, and random scaling factors. In the second rule, a randomly scaled movement length is added to $LB$: $r_3$ is a scaling factor, and as the value of $r_4$ approaches 1 it further increases the randomness of the rule. This random scaled component provides more diversification to explore different areas of the feature space. The average position of the hawks (solutions) is formulated in (2).
where $X_m(t)$ is the average position of the solutions in the current iteration, $N$ is the total number of solutions, and $X_i(t)$ is the location of each solution in iteration $t$, which is created based on chaos theory. In Eq. (1), rule one is applied when a hawk uses information from random hawks to catch the prey, while rule two is applied when all hawks share the best solution and the best hawk is employed.
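Equations (1) and (2) are not reproduced in this text. The following minimal sketch illustrates the two exploration rules, assuming the standard HHO formulation of [35]: rule one, $X_{rand}(t) - r_1 |X_{rand}(t) - 2 r_2 X(t)|$, is used when $q \geq 0.5$, and rule two, $(X_{rabbit}(t) - X_m(t)) - r_3 (LB + r_4 (UB - LB))$, otherwise. The function names, the use of scalar bounds, and the final clipping to the bounds are illustrative assumptions.

```python
import random


def mean_position(population):
    """Eq. (2): component-wise average position of all hawks."""
    n = len(population)
    dim = len(population[0])
    return [sum(hawk[d] for hawk in population) / n for d in range(dim)]


def explore(x, population, x_rabbit, lb, ub, rng=random):
    """One exploration move for hawk x (assumed standard form of Eq. (1))."""
    q = rng.random()
    r1, r2, r3, r4 = (rng.random() for _ in range(4))
    if q >= 0.5:
        # Rule one: perch relative to a randomly selected hawk.
        x_rand = rng.choice(population)
        new = [xr - r1 * abs(xr - 2.0 * r2 * xi) for xr, xi in zip(x_rand, x)]
    else:
        # Rule two: use the prey location and the average hawk position.
        x_m = mean_position(population)
        new = [(xb - xm) - r3 * (lb + r4 * (ub - lb))
               for xb, xm in zip(x_rabbit, x_m)]
    # Assumption: positions leaving the search space are clipped to [lb, ub].
    return [min(max(v, lb), ub) for v in new]
```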

3) TRANSITION FROM EXPLORATION TO EXPLOITATION
This phase describes the transition of HHO from exploration to exploitation, based on the escaping energy of the prey ($E$). HHO assumes that the energy of the prey is reduced gradually through its escaping actions. $E_0$ is the initial energy, which takes values within [−1, 1], and $E$ is modeled in (3).
where $T$ indicates the maximum number of iterations, and $t$ is the current iteration.
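Equation (3) is not reproduced in this text. Assuming the standard HHO form $E = 2 E_0 (1 - t/T)$ from [35], with $E_0$ redrawn at random for every hawk, the energy schedule can be sketched as follows. The function name is an illustrative assumption.

```python
import random


def escaping_energy(t, max_iter, rng=random):
    """Escaping energy E at iteration t (assumed form: E = 2 * E0 * (1 - t / T))."""
    e0 = 2.0 * rng.random() - 1.0  # initial energy E0, drawn from [-1, 1)
    return 2.0 * e0 * (1.0 - t / max_iter)
```

Under this assumption $|E|$ can exceed 1 only during the first half of the run, so exploration ($|E| \geq 1$) is possible early on and exploitation ($|E| < 1$) is enforced later.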

4) EXPLOITATION PHASE
The exploitation phase is carried out using one of four approaches, selected according to the parameter values. These approaches build on the position identified in the exploration phase. The prey frequently tries to escape, while the hawks track it and try to catch it. HHO exploitation mimics the attacking strategy of the hawks using four possible approaches: soft besiege, hard besiege, soft besiege with progressive rapid dives, and hard besiege with progressive rapid dives. The approach executed is determined by two variables, $r$ and $|E|$, where $|E|$ is the escaping energy of the prey and $r$ is the probability of escaping: $r < 0.5$ indicates a higher possibility that the prey escapes successfully, and $r \geq 0.5$ indicates an unsuccessful escape. These approaches are summarized as follows. In the soft besiege approach, where $r \geq 0.5$ and $|E| \geq 0.5$, the rabbit still has some energy to escape, while the hawks softly encircle the prey to make it lose more energy before performing the surprise pounce. Soft besiege is mathematically formulated in (4), (5), and (6).
where $\Delta X(t)$ is the difference between the position vector of the prey and the current location in iteration $t$, $J$ represents the jump strength of the prey, and $r_5$ is a random variable.
In the hard besiege strategy, where $r \geq 0.5$ and $|E| < 0.5$, the prey is tired and has a weak chance of escaping. In this condition, the hawks tightly encircle the prey to perform the final surprise pounce. Thus, the solution is updated using (7).
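Equations (4) to (7) are not reproduced in this text. The sketch below assumes their standard HHO forms from [35]: $X(t+1) = \Delta X(t) - E |J X_{rabbit}(t) - X(t)|$ with $\Delta X(t) = X_{rabbit}(t) - X(t)$ and $J = 2(1 - r_5)$ for the soft besiege, and $X(t+1) = X_{rabbit}(t) - E |\Delta X(t)|$ for the hard besiege. The function names are illustrative assumptions.

```python
import random


def soft_besiege(x, x_rabbit, energy, rng=random):
    """Soft besiege (r >= 0.5, |E| >= 0.5), assumed form of Eqs. (4)-(6)."""
    jump = 2.0 * (1.0 - rng.random())  # J = 2 * (1 - r5)
    return [(xb - xi) - energy * abs(jump * xb - xi)
            for xi, xb in zip(x, x_rabbit)]


def hard_besiege(x, x_rabbit, energy):
    """Hard besiege (r >= 0.5, |E| < 0.5), assumed form of Eq. (7)."""
    return [xb - energy * abs(xb - xi) for xi, xb in zip(x, x_rabbit)]
```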
VOLUME 8, 2020
Eq. (8) describes the soft besiege with progressive rapid dives approach. In this condition, $r < 0.5$ and $|E| \geq 0.5$, so the prey still has the energy to escape. The hawks move smartly around the prey and patiently dive before the surprise pounce. This action is considered an intelligent soft besiege, in which the position of the hawks is updated in two steps. In the first step, the hawks move toward the prey by estimating its next move, as in formula (8). In the second step, each hawk decides whether or not to dive, based on a comparison between the previous dive and the possible result. If the dive is not beneficial, the hawk performs an irregular dive based on the Levy Flight (LF) concept, as formulated in (9), where the dimension of the solutions is denoted $Dim$ and $S$ is a random vector of size $1 \times Dim$. LF is the Levy Flight function calculated using (10).
where $\beta$ is a default constant set to 1.5, and $u$, $v$ are random values within [0, 1]. The position update of the Harris' hawks in the soft besiege with progressive rapid dives can therefore be formulated as in (11), where $Y$ and $Z$ are obtained using (8) and (9), and both refer to the candidate location for the next iteration. The last approach is called hard besiege with progressive rapid dives, where $r < 0.5$ and $|E| < 0.5$. In this condition, the prey has no energy to escape, and the Harris' hawks attempt to reach the prey by rapid dives before performing a surprise pounce to catch it. The movement of the hawks in this condition is formulated in (12), where $Y$ is set as in (13) and $Z$ is updated as in (14). Finally, each solution is evaluated using the fitness function in Eq. (15), which includes the computation of the classification error. In (15), $\gamma_R(D)$ refers to the classification error rate of the KNN classifier used, $|R|$ is the cardinality of the selected subset, $|N|$ is the total number of features in the dataset, and $\alpha$ and $\beta$ are two parameters corresponding to the importance of classification quality and subset length, with $\alpha \in [0, 1]$ and $\beta = (1 - \alpha)$, as adopted from [25].
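Equations (8) to (14) are not reproduced in this text. The following sketch assumes their standard HHO forms from [35]: $Y = X_{rabbit}(t) - E |J X_{rabbit}(t) - X(t)|$, $Z = Y + S \times LF(Dim)$, and $LF = 0.01 \, u \sigma / |v|^{1/\beta}$ with $\sigma$ computed from the gamma function. For the hard besiege variant, $X(t)$ inside $Y$ is replaced by the mean position $X_m(t)$. The function names, the `target` argument, and the guard against division by zero are illustrative assumptions.

```python
import math
import random

BETA = 1.5  # Levy exponent, as stated in the text


def levy_flight(dim, rng=random, beta=BETA):
    """Levy flight steps (assumed form of Eq. (10)); u, v uniform in [0, 1]."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    steps = []
    for _ in range(dim):
        u = rng.random()
        v = max(rng.random(), 1e-12)  # assumption: avoid division by zero
        steps.append(0.01 * u * sigma / abs(v) ** (1 / beta))
    return steps


def rapid_dive(x, x_rabbit, energy, fitness, target=None, rng=random):
    """Besiege with progressive rapid dives (assumed form of Eqs. (11)/(12)).

    target=None uses the hawk itself (soft variant); pass the mean
    position of the population for the hard variant.
    """
    ref = x if target is None else target
    jump = 2.0 * (1.0 - rng.random())
    y = [xb - energy * abs(jump * xb - r) for xb, r in zip(x_rabbit, ref)]
    if fitness(y) < fitness(x):
        return y
    s = [rng.random() for _ in x]  # random vector S of size 1 x Dim
    z = [yi + si * li for yi, si, li in zip(y, s, levy_flight(len(x), rng))]
    if fitness(z) < fitness(x):
        return z
    return x  # neither candidate improves the hawk, so it stays in place
```

A hawk only moves when $Y$ or $Z$ improves its fitness, which is what makes these two strategies greedy refinements rather than free moves.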

B. CHAOTIC MAPS
Chaos optimization is based on dynamic systems and is one of the more recent methods for searching for global optimum solutions in a search space. In this study, we implemented ten chaotic maps, listed in Table 1, to replace the random variables used for the Harris' hawks positions. The main idea is to replace the random initialization variables with chaotic map variables. The initial value $x_0$ of all chaotic maps is set to 0.7. Here $y$ denotes the index of the chaotic sequence $x$, and $x_y$ is the $y$th number in the chaotic sequence. The remaining variables $d$, $c$, and $\mu$ are control variables that help define the chaotic behavior of the algorithm. Some studies in the literature have used chaotic maps to improve HHO. For example, in [40], the random parameters in HHO were replaced with a chaotic logistic map. In [41], the random parameters in the Multi-Verse Optimizer (MVO) were replaced using chaotic maps, and HHO was used as a local search operator within MVO to solve its local optima problem. Furthermore, in [42], the random parameter in HHO was replaced with a chaotic map value. The main differences between our work and these previous improvements on HHO are the following: 1) chaotic maps are used to initialize the solutions (search agents) positions at the initialization phase of HHO instead of using standard random numbers; 2) SA is used as a local search operator within HHO to solve its local optima problem.
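As an illustration of chaotic initialization, the sketch below fills the initial population from a logistic map started at $x_0 = 0.7$. The map definitions of Table 1 are not reproduced in this text, so the commonly used form $x_{y+1} = a \, x_y (1 - x_y)$ with $a = 4$ is an assumption, as are the function names and the scalar bounds.

```python
def logistic_map(x0=0.7, a=4.0):
    """Yield successive values of the logistic map x <- a * x * (1 - x)."""
    x = x0
    while True:
        x = a * x * (1.0 - x)
        yield x


def chaotic_population(n_hawks, dim, lb, ub, x0=0.7):
    """Initial positions scaled from one chaotic sequence into [lb, ub]."""
    chaos = logistic_map(x0)
    return [[lb + next(chaos) * (ub - lb) for _ in range(dim)]
            for _ in range(n_hawks)]


population = chaotic_population(n_hawks=10, dim=5, lb=0.0, ub=1.0)
```

Any other map from Table 1 could be substituted by replacing the generator, since the rest of the initialization only consumes values in [0, 1].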

C. SIMULATED ANNEALING
Simulated Annealing (SA) was proposed by Kirkpatrick et al. in 1983. It is a local search algorithm based on a single solution. It is considered a hill-climbing method that repeatedly tries to improve the available solution for the objective function [24]. An improved solution is always accepted, while a worse solution is accepted with a well-defined probability in order to avoid local optima. The probability of accepting a worse solution is determined by the Boltzmann probability function $P = e^{-\theta/T}$, where $\theta$ is the difference in the objective function value between the best solution ($Sol_{best}$) and the trial solution ($Sol_{trial}$). $T$ is a parameter (named temperature) that is periodically decreased throughout the search process [24].
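The acceptance rule can be sketched as follows for a minimization problem. The function name and the convention $\theta = f(Sol_{trial}) - f(Sol_{best})$ are illustrative assumptions.

```python
import math
import random


def accept(fit_trial, fit_best, temperature, rng=random):
    """Boltzmann acceptance test for a minimization problem."""
    theta = fit_trial - fit_best
    if theta <= 0:
        return True  # the trial solution is at least as good: always accept
    # A worse solution is accepted with probability exp(-theta / T).
    return rng.random() < math.exp(-theta / temperature)
```

At a high temperature almost every worse solution is accepted, and as $T$ decreases the rule approaches pure hill climbing.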

IV. THE PROPOSED CHAOTIC HARRIS HAWKS ALGORITHM (CHHO)
In this study, feature selection is regarded as a multi-objective optimization problem in which two contradictory goals must be achieved: minimizing the number of selected features and maximizing the classification accuracy. In other words, the aim is to reach the minimum number of selected features that leads to the highest classification accuracy. Every solution is evaluated according to the proposed fitness function, which depends on the KNN classifier [43], to obtain the classification accuracy of the solution as well as the number of selected features. To balance the number of selected features in each solution (to be minimized) with the classification accuracy (to be maximized), the fitness function in Eq. (15) is applied to evaluate the search agents in the algorithm.
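Equation (15) is not reproduced in this text. Based on the definitions given for it, the fitness is assumed here to be $\alpha \, \gamma_R(D) + \beta \, |R| / |N|$ with $\beta = 1 - \alpha$. The sketch below also assumes that a continuous hawk position is converted to a binary feature mask by thresholding; the threshold, the default value of `alpha`, and the function names are illustrative assumptions.

```python
def to_mask(position, threshold=0.5):
    """Assumed binarization: a feature is selected when its value exceeds the threshold."""
    return [1 if v > threshold else 0 for v in position]


def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and selected-feature ratio (lower is better)."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total


mask = to_mask([0.2, 0.7, 0.5, 0.9])
# error_rate would come from the KNN classifier trained on the selected features;
# 0.1 is a placeholder value, not a reported result.
score = fitness(error_rate=0.1, n_selected=sum(mask), n_total=len(mask))
```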
Previous studies, which used HHO to solve different problems and confirmed that it outperforms other recent and well-known optimization algorithms, motivated us to apply HHO to the feature selection problem. However, the standard HHO algorithm suffers from two significant problems when applied to high-dimensional problems such as feature selection: 1) limited solution diversity; 2) local optima. Therefore, to improve the HHO algorithm and make it suitable for the feature selection problem, two main improvements are introduced in this study. The first improvement is the use of chaotic maps at the initialization phase to improve the diversity of the solutions. The second improvement consists of using the SA algorithm with the HHO algorithm to enhance its exploitation and avoid being stuck in local optima. These contributions are detailed as follows. In the CHHO algorithm, chaotic map values replace the randomly generated values used to initialize the Harris' hawks population positions at the initialization phase. The chaotic values are generated from chaotic maps. In this work, ten chaotic maps were applied to the algorithm to compare the effect of employing different chaotic maps. These maps are Singer, Sinusoidal, Chebyshev, Circle, Tent, Sine, Piecewise, Logistic, Iterative, and Gauss/mouse. The maps and their mathematical definitions are listed in Table 1. These maps significantly increase the convergence rate and the fitness performance of HHO, as demonstrated later in the experimental discussion section.
The second improvement is to embed SA in the CHHO algorithm to enhance its local search ability and thereby improve the exploitation capability of the algorithm. After the chaotic initialization and once the best solution has been obtained, SA is used to improve the current best solution at the end of each HHO iteration. The pseudocode of the proposed CHHO algorithm is given in Algorithm 1.
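The refinement of the current best solution by SA can be sketched as follows on a binary feature mask. The single bit-flip neighborhood, the geometric cooling schedule, the parameter values, and the function name are illustrative assumptions; the text does not specify them.

```python
import math
import random


def sa_refine(best_mask, fitness, t0=1.0, cooling=0.9, steps=20, rng=random):
    """Refine a binary feature mask with simulated annealing (minimization)."""
    current, f_cur = list(best_mask), fitness(best_mask)
    best, f_best = list(current), f_cur
    temp = t0
    for _ in range(steps):
        # Neighbor: flip one randomly chosen feature bit (assumed move).
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] = 1 - trial[i]
        f_trial = fitness(trial)
        theta = f_trial - f_cur
        if theta <= 0 or rng.random() < math.exp(-theta / temp):
            current, f_cur = trial, f_trial
            if f_cur < f_best:
                best, f_best = list(current), f_cur
        temp *= cooling  # assumed geometric cooling
    return best, f_best
```

Because the best mask seen so far is tracked separately from the current one, the returned solution is never worse than the solution passed in.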
The computational complexity of the CHHO algorithm depends on initialization, fitness evaluation, and updating of candidate

Algorithm 1 Pseudo-Code of CHHO Algorithm
Inputs: The population size $N$ and the maximum number of iterations $T$
Outputs: The location of the rabbit and its fitness value
Initialize the chaotic population $X_i$ ($i = 1, 2, \ldots, N$)
while (stopping criterion is not met) do
    Compute the fitness values of the hawks
    Set $X_{rabbit}$ as the location of the rabbit (best location)
    for (each hawk ($X_i$)) do
        Update the initial energy $E_0$ and jump strength $J$
        Update $E$ using Eq. (3)
        Exploration phase
        if ($|E| \geq 1$) then
            Update the location vector using Eq. (1)
        Exploitation phase
        if ($|E| < 1$) then
            if ($r \geq .5$ and $|E| \geq .5$) then Soft besiege
                Update the location vector using Eq. (4)
            else if ($r \geq .5$ and $|E| < .5$) then Hard besiege
                Update the location vector using Eq. (7)
            else if ($r < .5$ and $|E| \geq .5$) then Soft besiege with progressive rapid dives
                Update the location vector using Eq. (11)
            else if ($r < .5$ and $|E| < .5$) then Hard besiege with progressive rapid dives
                Update the location vector using Eq. (12)
    Apply SA to the current best solution $X_{rabbit}$
Return $X_{rabbit}$
The proposed CHHO is also presented as a flowchart in Figure 1. The CHHO process starts by initializing the Harris' hawks (search agents) population using chaotic maps. Then, the fitness value of each candidate solution is computed. After that, SA is applied in each iteration. The fitness value is then evaluated using the wrapper FS method based on the KNN classifier. The whole process is repeated until the stopping condition is satisfied.

V. EXPERIMENTAL RESULTS AND DISCUSSION
To validate and evaluate the performance of the proposed CHHO algorithm, CHHO was compared with several well-known and recent optimization algorithms, including the GOA, GA, PSO, BOA, and ALO algorithms. All experiments were conducted on 14 benchmark datasets from the UCI repository. The datasets used and all experimental details are presented in the following subsections.

A. DATASETS DETAILS
In this experiment, fourteen medical benchmark datasets were used from the UCI machine learning repository. The details of these datasets are presented in Table 2. All experiments were conducted using the settings stated in Table 3.

B. ALGORITHMS AND EXPERIMENTS PARAMETER SETTING
In all experiments, the wrapper method based on the KNN classifier (with 10-fold cross-validation) was used to validate the fitness performance of the proposed algorithm. This validation technique uses $k - 1$ folds for training and one fold for testing. The parameter settings of the baseline optimization algorithms GOA, GA, PSO, BOA, and ALO are shown in Table 4. Furthermore, for all algorithms, the population size was set to 10, and the maximum number of iterations was 50. The classification accuracy was chosen as a critical metric for evaluating and validating the performance of the optimization algorithms. The results are presented in Table 5, where the results are averaged over 20 runs, each consisting of 50 iterations refined by the SA algorithm.
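The fold construction used by $k$-fold cross-validation can be sketched as follows. The shuffling, the fixed seed, and the function name are illustrative assumptions; the text does not state how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists: k - 1 folds for training, one for testing."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Each sample is used for testing exactly once across the k splits.
for train, test in k_fold_indices(n_samples=25, k=10):
    pass  # train the KNN classifier on `train` and measure the error on `test`
```

The classification error in the fitness function would then be the average of the $k$ test errors.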

C. RESULTS AND DISCUSSION
In this section, we present the summary and results of all experiments. Two main experiments were conducted using the CHHO algorithm to customize the algorithm to solve the feature selection problem. The first experiment includes evaluating the performance of the CHHO with the original

1) THE EFFECT OF DIFFERENT CHAOTIC MAPS WITH SA ON THE STANDARD HHO ALGORITHM PERFORMANCE
The first aim of this experiment is to evaluate the performance of CHHO with ten chaotic maps and to determine the best chaotic map while including the SA algorithm in the iteration process. Table 5 reports the evaluations of CHHO with the ten chaotic maps and of the standard HHO. The P-value of Wilcoxon's statistical test was used to compare HHO with the ten CHHO variants in order to highlight the significance of the improvement. In Table 5, a P-value is underlined where it is significant (P-value < 0.05). CHHO1, CHHO2, . . . , CHHO10 in Table 5 refer to the ten implemented chaotic maps presented in Table 1, and Ds1, Ds2, . . . , Ds14 refer to the 14 benchmark datasets shown in Table 2. It can be seen from Table 5 that CHHO with chaotic maps outperformed the standard HHO. For the statistical comparison between chaotic maps across all datasets, the best P-value results are underlined in Table 6. The P-value results in Table 6 show that the Sine chaotic map outperforms all other maps. As observed in Table 6, the CHHO6 (Sine map) variant obtained statistically significant results compared to the others in most cases. Therefore, CHHO6 represents a significant improvement over the original HHO algorithm. As shown in Table 5, the CHHO2 variant (Sinusoidal) provided the best number of selected features, followed by the CHHO6 variant. In addition, the CHHO6 variant obtained the best classification accuracies, while the CHHO2 variant obtained the best fitness values.
Generally, most of the CHHO variants with chaotic maps produced better solutions than the standard HHO in all metrics. This shows the importance and effect of employing a chaotic map and SA to improve population diversity and enhance the local search. However, in most cases, the worst results were obtained by the CHHO7 (Piecewise map) variant, which suggests that this map is incompatible with the search mechanism of the standard HHO algorithm. Based on the results reported in Tables 5 and 6, it can be concluded that the Sine chaotic map is a proper choice for enhancing the performance of the standard HHO algorithm and customizing it for the feature selection problem. It is worth mentioning that the highest classification performance was obtained on Ds5. This is attributed to the large number of experiments required from a small number of patients to identify Parkinson's disease. In the following section, the Sine chaotic map (CHHO6) is selected for further investigation alongside other state-of-the-art algorithms to verify the performance of CHHO in the feature selection problem.

2) COMPARISON OF THE PROPOSED CHHO WITH STATE OF THE ART ALGORITHMS
The second experiment in this study compares the performance of CHHO with other optimization algorithms. The baseline algorithms are GOA, GA, PSO, BOA, ALO, and HHO. The parameter settings for all algorithms are shown in Table 4, where the maximum number of iterations and the number of search agents are set to 50 and 10, respectively, for all algorithms. The performance score was calculated based on 20 runs. Table 7 shows the number of selected features for all evaluated algorithms. CHHO achieved the best results for selected features in 11 datasets, while HHO succeeded in two datasets and GA in one dataset. In terms of the classification accuracy presented in Table 8, CHHO obtained the best results in most cases. Still, it gave classification accuracy similar to that of HHO in four datasets, and PSO provided similar classification accuracy in two datasets. In second place, PSO and HHO obtained similar classification accuracy in most cases, and GA comes in third place. Moreover, Table 9 shows that the CHHO algorithm outperformed all other algorithms in terms of fitness value, as it attained the minimum classification error among all the algorithms.
The convergence curves were also considered to evaluate the convergence speed of CHHO on the 14 benchmark datasets, as displayed in Figure 2. From Figure 2, it is observed that the CHHO algorithm achieved higher performance on 13 datasets, while it is comparable with the standard HHO on Ds5. It is also observed that the performance of HHO is comparable with that of PSO in most cases, whereas ALO achieved the worst convergence speed. The PSO algorithm is considered to be the second most efficient method after HHO in all benchmark datasets. In other words, CHHO has a higher convergence rate and lower classification error than the competing algorithms. This superiority comes from the improvements made in the initialization and exploitation phases. The enhanced population diversity in the initialization phase accelerates the convergence speed, and the enhancement in the exploitation phase yielded better fitness values. These results indicate the algorithm's greater capability to avoid the local optima problem and solve the problem of feature selection.
In particular, the proposed CHHO framework succeeded in balancing the search process between exploration and exploitation over the search iterations.
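To make the relationship between fitness value, classification error, and the number of selected features concrete, the following sketch uses a weighted wrapper fitness of the kind that is common in the feature selection literature. The weight alpha = 0.99 and the function name are assumptions for illustration and are not taken from this section; lower values are better.

```python
def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted fitness to be minimized (assumed form).

    fitness = alpha * error_rate + (1 - alpha) * n_selected / n_total
    alpha = 0.99 is an illustrative assumption, not a reported setting.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie in [0, n_total]")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)


if __name__ == "__main__":
    # Same error, fewer features: the smaller subset gets the lower fitness.
    print(wrapper_fitness(0.10, 5, 20))
    print(wrapper_fitness(0.10, 15, 20))
```

Under this assumed weighting the classification error dominates the fitness, so an algorithm with the lowest fitness is usually also the one with the lowest error, while the subset size mainly breaks ties.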

D. THE LIMITATIONS OF CHHO ALGORITHM
The proposed CHHO is a promising algorithm that can solve high-dimensional and complex optimization problems. CHHO improved the standard HHO in different aspects, such as reducing the number of selected features and improving classification accuracy and fitness values. However, similar to other optimization algorithms, CHHO also has some limitations. The primary limitation is that it is time-consuming in comparison with other algorithms. The reason for this time consumption is the computational complexity of the standard HHO, not the proposed improvements. Also, we believe that the time consumption could be decreased if the ten iterations of SA were reduced.
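The cost of the SA step can be seen in a minimal sketch of simulated annealing applied to a binary feature mask. This is not the authors' implementation: the single bit-flip neighborhood, the geometric cooling schedule, the initial temperature, and the toy fitness function are all assumptions. Each SA iteration costs one extra fitness evaluation, which in a wrapper approach means training and testing a classifier once more.

```python
import math
import random


def simulated_annealing(mask, fitness, n_iters=10, t0=1.0, cooling=0.9, seed=0):
    """Refine a binary mask by SA; returns (best_mask, best_fitness, evaluations).

    Assumptions: one random bit flip per iteration, geometric cooling,
    and a fitness function that is minimized.
    """
    rng = random.Random(seed)
    current = list(mask)
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    evaluations = 1
    temperature = t0
    for _ in range(n_iters):
        neighbor = list(current)
        i = rng.randrange(len(neighbor))
        neighbor[i] = 1 - neighbor[i]  # flip one feature in or out
        neighbor_fit = fitness(neighbor)
        evaluations += 1
        delta = neighbor_fit - current_fit
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_fit = neighbor, neighbor_fit
            if current_fit < best_fit:
                best, best_fit = list(current), current_fit
        temperature *= cooling
    return best, best_fit, evaluations


def toy_fitness(mask, relevant=(0, 2)):
    """Hypothetical stand-in for a classifier-based fitness."""
    missing = sum(1 for i in relevant if mask[i] == 0)
    return 0.99 * missing / len(relevant) + 0.01 * sum(mask) / len(mask)


if __name__ == "__main__":
    start = [1, 1, 0, 1, 1, 0]
    best, best_fit, evals = simulated_annealing(start, toy_fitness, n_iters=10)
    print(best, best_fit, evals)
```

With n_iters=10 the sketch performs eleven fitness evaluations per call, so lowering the SA iteration count directly lowers the added cost, at the possible expense of a weaker local search.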

VI. CONCLUSION AND FUTURE DIRECTIONS
In this study, an improved CHHO algorithm is proposed by incorporating chaotic maps into the HHO algorithm at the initialization phase and the SA algorithm into the exploitation phase. Ten different chaotic maps were tested to determine the most compatible choice for the HHO algorithm to enhance the population diversity and improve the convergence speed. Furthermore, the SA algorithm was employed to improve the exploitation phase, which helps to avoid the local optima problem. The proposed CHHO framework was applied to the feature selection problem. Fourteen medical benchmark datasets from the UCI machine learning repository were selected for the experiments, along with five evaluation criteria. These criteria are the number of selected features, classification accuracy, fitness value, P-value, and convergence speed.
Additionally, the performance of CHHO was compared with other recent and well-known optimization algorithms: GOA, GA, PSO, BOA, ALO, and the original HHO. The experimental and evaluation results demonstrated the superiority of CHHO in comparison with the other optimization algorithms in all metrics. Moreover, the results showed that CHHO with the Sine map could significantly improve the performance of the standard HHO in terms of classification performance, the number of selected features, and convergence rates. The results also showed that applying the SA algorithm in the exploitation phase enhanced the local search. The modifications achieved a balanced search behavior and suggest that the proposed framework is suitable for medical applications. For future research, it would be worthwhile to investigate the performance of the proposed CHHO algorithm on more sophisticated science and engineering problems and to further reduce its computational complexity without affecting the current performance.
MOHAMMAD TUBISHAT received the B.Sc. degree in computer science and the M.Sc. degree in computer and information sciences from Yarmouk University, in 2002 and 2004, respectively, and the Ph.D. degree in computer science (artificial intelligence-natural language processing) from the University of Malaya, in 2019. He is currently working as a Lecturer with the Asia Pacific University of Technology and Innovation. His research interests include natural language processing, data mining, artificial intelligence, machine learning, optimization algorithms, data science, and sentiment analysis.

SEYEDALI MIRJALILI (Senior Member, IEEE) is currently an Associate Professor with the Centre for Artificial Intelligence Research and Optimization, Torrens University Australia. He is internationally recognized for his advances in swarm intelligence and optimization, including the first set of algorithms from a synthetic intelligence standpoint, a radical departure from how natural systems are typically understood, and a systematic design framework to reliably benchmark, evaluate, and propose computationally cheap robust optimization algorithms. He has published over 200 publications with over 20 000 citations and an H-index of 50. His research interests include robust optimization, machine learning, multi-objective optimization, swarm intelligence, evolutionary algorithms, artificial neural networks, and applied optimization. As one of the most cited researchers in artificial intelligence, he is in the list of 1% highly-cited researchers and named as one of the most influential researchers in the world by Web of Science. He is an Associate Editor of several journals, including Neurocomputing, Applied Soft Computing, Advances in Engineering Software, Applied Intelligence, PLOS One, and IEEE ACCESS.