BinHOA: Efficient Binary Horse Herd Optimization Method for Feature Selection: Analysis and Validations

In the domains of data mining and machine learning, feature selection (FS) is an essential preprocessing step that has a significant effect on the performance of the machine learning model. The primary purpose of FS is to eliminate unnecessary features, reducing time and space requirements and improving the performance of the corresponding learning model. The horse herd optimization algorithm (HOA) is a new metaheuristic algorithm that mimics the herding behavior of horses. Within a wrapper-based approach, a binary version of HOA is proposed in this study to select the optimal subset of features for classification purposes. The transfer function is the most important aspect of the binary version. Eight transfer functions, S-shaped and V-shaped, are tested to map the continuous search space into a binary search space. Two main enhancements are integrated into the standard HOA to strengthen its performance. First, a Levy flight operator is added to improve HOA's exploratory behavior and alleviate stagnation in local minima. Second, a local search algorithm is integrated to enhance the best solution obtained after each iteration of HOA. The purpose of the second enhancement is to increase the exploitation capability by searching the most promising regions discovered by HOA. Large-scale, medium-scale, and small-scale datasets from reputable data repositories are used to validate the performance of the proposed algorithm (BinHOA). Comparative tests with state-of-the-art algorithms reveal that the Levy flight operator combined with the local search algorithm has a significantly favorable impact on the performance of HOA. An enhancement of the population diversity is observed, along with avoidance of being trapped in local optima.


I. INTRODUCTION
Data mining is one of the fastest-growing subfields of information technology, owing to the vast amounts of data collected on a daily basis and the need to convert this data into meaningful knowledge [1]. The process of converting raw data into a comprehensible form is known as data preprocessing. Data preprocessing is a crucial aspect of data mining [2]. Feature selection (FS) is one of the most important data preprocessing procedures, and it generates robust models by selecting the most informative features from a certain dataset and deleting features that are irrelevant or redundant.
Generally speaking, FS approaches are mainly classified into wrapper or filter approaches. In wrapper methods, the feature subset quality is evaluated based on the performance of the classification algorithm [3]. On the other hand, filter methods are independent of any kind of learning model; instead, statistical methodologies are used to select and rank features [4]. In the literature, wrapper-based techniques for classification usually outperform filter-based methods [5]. When using wrapper-based methods, three crucial items must be specified: (i) a classifier such as k nearest neighbor (kNN) [6] or support vector machines (SVMs) [7], (ii) criteria for evaluating feature subsets, and (iii) the search algorithm for selecting a subset with the best features [8].
The interactions and connections between features have made FS one of the most challenging and computationally expensive processes. The interaction between features can be two-way, three-way, or indeed comprise several features. When a feature is used individually, it may not have a significant impact on the target, but when paired with additional features, the effect might be accentuated. Furthermore, a feature that seems to be beneficial by itself may become useless when paired with other features. Another challenge is exploring a large search space of size 2^n, where n is the total number of features.
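The exponential size of this search space can be made concrete with a short sketch (illustrative only; the helper name is ours, not from the paper):

```python
from itertools import combinations

def all_feature_subsets(n_features):
    """Enumerate every non-empty feature subset of an n-feature dataset."""
    features = range(n_features)
    for k in range(1, n_features + 1):
        yield from combinations(features, k)

# There are 2^n - 1 non-empty subsets, so the count doubles with every added
# feature -- which is why exhaustive search quickly becomes infeasible.
print(sum(1 for _ in all_feature_subsets(10)))  # 2**10 - 1 = 1023
```

Even at n = 30 (the smallest datasets used later in this paper have far more features), over a billion subsets would need to be evaluated exhaustively.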
In contrast to exact search algorithms [9]-[11], metaheuristics appear to be trustworthy and effective techniques for tackling numerous optimization problems [12]. In fact, exact methods guarantee finding the best solution to a problem given adequate time and memory. However, they are inefficient when dealing with problems of high computational complexity. In an attempt to overcome the FS problem, wrapper techniques based on metaheuristics have demonstrated their productivity and efficiency [13]. The goal of using metaheuristic algorithms to solve FS problems is to give a solution that is close to the optimal solution in an acceptable time frame [14]. To effectively explore the search space, metaheuristics start the optimization process by generating random solutions, as they have a stochastic nature. Metaheuristics are easily adaptable to a specific problem due to their basic premise and ease of implementation. Their hallmark is the ability to avoid premature convergence.
Exploration (global search) and exploitation (local search) are the two fundamental aspects of the optimization process in metaheuristic algorithms. In exploration, updating solutions results in significant changes, allowing more regions to be examined and diverse solutions to be found. On the other hand, exploitation focuses on the current solution's possible neighborhood space and looks for superior alternatives. There is no certainty that a local search will yield a global optimum. The optimization process is improved by establishing a balance between exploration and exploitation. This is due to the fact that over-exploration results in the loss of the best solution. On the contrary, over-exploitation causes premature convergence and trapping in local minima. Researchers presented various modification tactics to alleviate the shortcomings of metaheuristics and improve their efficacy in solving the FS problem. The modification tactics can be classified into: new initialization approaches [15], [16], parallelism [17], new update approaches [18], [19], new operators [20], and hybridization [21].
Basically, metaheuristics can be classified into four types based on the source of inspiration: (i) evolutionary algorithms [22], (ii) physics-based methods [23], (iii) human-based methods [24], and (iv) swarm intelligence [25], [26]. Evolutionary algorithms mimic natural evolution laws and are inspired by Darwinian evolutionary theory. Genetic algorithm (GA), an instance of evolutionary methods, has gained popularity as a result of its excellent performance in handling FS challenges [27]. It offers the possibility of dealing with complex optimization issues. To update a population and achieve the goal of optimization, GA involves three main steps: selection, crossover, and mutation [28]. For the FS of cancer micro-array datasets, a nested-GA structure is used in [29]. The structure includes two nested genetic algorithms, external-GA and internal-GA, and it operates on datasets of different types. The findings of nested-GA provided a high classification performance with a minimal subset of features. For text categorization, the authors in [30] employed the GA algorithm combined with chaotic optimization. Differential evolution (DE) [31], co-evolving algorithm [32], biogeography based optimizer [33], genetic programming [34], and stochastic fractal search [35] are other instances of evolutionary methods.
Natural physics principles have also given rise to many metaheuristic methods that are used in FS problems. Representatives of this class are: chemical reaction optimization (CRO) [36], lightning search algorithm (LSA) [37], Henry gas solubility optimization (HGSO) [38], multi-verse optimizer (MVO) [39], electromagnetic field optimization (EFO) [40], simulated annealing (SA) [41], and gravitational search algorithm (GSA) [42]. A well-known recent contribution to physics-based methods is the equilibrium optimizer (EO) algorithm [43]. The EO algorithm has the advantage of being able to change the solution randomly with great exploration and exploitation capabilities. The authors in [44] extended the EO with an opposition-based learning method to improve its population diversity, and a local search algorithm was integrated at the end of each iteration to improve its exploitation capability. Ahmed et al. proposed an automata-based improved version of EO based on a U-shaped transfer function to solve FS problems (AIEOU) [45]. The method was compared to eight well-known methods, including classical and hybrid metaheuristic algorithms, and it was evaluated on 18 datasets with the help of kNN.
Human behavior and human interaction in society inspire human related algorithms. Some instances of this class are: league championship algorithm [46], volleyball premier league algorithm [47], society and civilization [48], and tug of war optimization [49]. Teaching-learning-based optimization is a well-known method in this class that was developed by following the class grade improvement procedure [24]. Imperialistic competitive algorithm is based on the process of human sociopolitical evolution [50]. The individuals are separated into two groups: colonies and imperialist states. The concept behind this algorithm is based on imperialists competing for control of colonies. One imperialist emerges victorious at the end of the competition, taking control of all the colonies, while the weak empires crumble.
Social behaviors of swarms, teams, herds, schools, or any group of animals are imitated by swarm intelligence algorithms. Representative approaches are: particle swarm optimization (PSO) [51], firefly algorithm (FA) [52], cuckoo search (CS) [53], grasshopper optimization algorithm (GOA) [54], grey wolf optimizer (GWO) [55], flower pollination algorithm (FPA) [56], artificial bee colony (ABC) [57], ant colony optimization (ACO) [58], and Harris hawks optimization (HHO) [59]. Fong et al. [60] proposed a convenient approach for high-correlation feature spaces. A metaheuristic algorithm, called swarm search, was used as a particle search to pick features. Swarm search has the advantage of being able to use any classifier as its fitness function. Crow search algorithm (CSA) is another effective method that is based on imitating crow flocks' intelligent behavior [61]. A binary version of CSA called BCSA was introduced in [62]. BCSA proposed a V-shaped transfer function for mapping a continuous search space into a discrete one. However, it has a low convergence rate and is prone to local optima trapping. A hybrid crow search algorithm was proposed by Anter et al. for FS problems [63]. The authors combined chaos theory and the fuzzy c-means objective function, producing a new efficient algorithm denoted as CFCSA. PSO has caught the interest of numerous scholars since its inception. Xue et al. used a novel initialization and update methodology for PSO-based FS with the purpose of reducing the computational time, minimizing the number of features, and maximizing the classification accuracy [64].
Levy flight, named after the French mathematician Paul Levy, is a kind of random walking pattern that follows the Levy distribution [65]. A fat-tailed distribution describes the step length of Levy flight. Walking in a multidimensional space causes the directions of the steps to be random and isotropic. Several animals' searching behaviors for food resources exhibit Levy flight characteristics: they spend most of their feeding time in close proximity to a food source, and sometimes they require long-distance travel to efficiently locate the next destination [66]. Several studies have employed Levy flight to enhance the performance of metaheuristics and improve their ability to explore diverse solutions in the search space [67]-[70]. Recently, GWO has been combined with Levy-flight random walk to improve the classification accuracy over a direct optimization algorithm [71]. In [72], randomizing the location of salps using Levy flight improves SSA's exploitation potential, causing the model to converge to the global optima.
Horse herd optimization algorithm (HOA) is a recent metaheuristic method that was introduced by MiarNaeimi in 2021 [73]. HOA belongs to the swarm intelligence class. It mimics the social behaviors of horses at different ages. As previously stated, metaheuristics have demonstrated a favorable impact on FS problems in recent decades. Despite all of these studies, the majority of metaheuristics still face a number of issues that must be addressed, for example, a lack of diversity, trapping in local optima, and an imbalance between the algorithm's exploration and exploitation abilities. More optimization strategies are still required in order to achieve even better results. This motivated us to develop a binary version of HOA and test its applicability as a binary optimization approach for FS problems. The primary contributions of this study are: 1) BinHOA: a binary improved version of HOA is introduced. The FS problem is addressed using eight different transfer functions. 2) Levy flight is integrated into HOA to provide a high level of randomization and improve the diversity of solutions.
3) A local search algorithm is applied after each iteration of HOA to avoid being trapped in local optima by enhancing the best solution. 4) The results of the proposed algorithm are compared against nine state-of-the-art FS approaches.
The remainder of this paper is organized as follows: The original HOA is presented in Section 2. Section 3 discusses the BinHOA algorithm in detail. In Section 4, the experimental results are presented. The final section concludes the paper.

II. HORSE HERD OPTIMIZATION ALGORITHM: AN OVERVIEW
MiarNaeimi et al. introduced the horse herd optimization algorithm (HOA) as one of the most recent swarm-based optimization techniques [73]. HOA is inspired by horses' herding behavior. The social behavior of horses can be divided into six categories depending on their age: grazing (G), hierarchy (H), sociability (S), imitation (I), defense mechanism (D), and roam (R). HOA, like most metaheuristic algorithms, starts the optimization process by setting up and initializing the control parameters, such as the maximum number of iterations and the population size. Fig. 1 represents the flowchart of the HOA algorithm. At each iteration, the horses are moved in accordance with Eq.(1). It is worth mentioning that the horses in the swarm are ranked from best to worst in terms of the fitness values of their positions. The ages of the horses are selected per iteration using the following rule: α horses are the top 10% of the sorted population, the following 20% of the horses are selected as β, γ horses are the next 30%, and the age of the last 40% is designated as δ. Mathematical formulas are then used to calculate the velocity vector of each horse based on its age; the six key social behaviors are described in the following subsections.
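The ranking-and-age-assignment rule above can be sketched as follows (an illustrative Python sketch assuming a minimization objective; the helper names are ours, and the paper's own implementation is in Matlab):

```python
import numpy as np

def assign_ages(fitness):
    """Rank horses by fitness (best first) and assign age groups by the
    10%/20%/30%/40% rule of HOA: alpha, beta, gamma, delta."""
    n = len(fitness)
    order = np.argsort(fitness)          # ascending: best (lowest) fitness first
    ages = np.empty(n, dtype=object)
    cuts = [int(0.1 * n), int(0.3 * n), int(0.6 * n)]   # cumulative 10%, 30%, 60%
    ages[order[:cuts[0]]] = "alpha"                     # top 10%
    ages[order[cuts[0]:cuts[1]]] = "beta"               # next 20%
    ages[order[cuts[1]:cuts[2]]] = "gamma"              # next 30%
    ages[order[cuts[2]:]] = "delta"                     # remaining 40%
    return ages

ages = assign_ages(np.random.rand(20))
print([list(ages).count(a) for a in ("alpha", "beta", "gamma", "delta")])  # [2, 4, 6, 8]
```

With a population of 20 horses, this yields 2 α, 4 β, 6 γ, and 8 δ horses, matching the 10/20/30/40 split.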

A. GRAZING
Horses, like other herbivores, rely on plants to survive. Horses graze at any age and for the rest of their lives. According to the coefficient g, horses graze around their region. Eq.(6) defines the social behavior of grazing, where G_m^{t,AGE} is the m-th horse's grazing motion parameter at the t-th iteration, and it depicts the horse's proclivity to graze. At each iteration loop, G_m^{t,AGE} is reduced linearly in accordance with a reduction factor denoted by ω_g. l̆ and ŭ stand for the lower and the upper grazing limits; in [73], the value of l̆ is 0.95 and the value of ŭ is 1.05. p is a random value between 0 and 1. Eq.(7) updates the value of g_m^{t,AGE}. At the beginning of the search process, the coefficient g is set to 1.5 for all horse ages.

B. HIERARCHY
Horses are divided into two categories in the population: leaders and slaves. Leaders are always followed by slave horses. The coefficient h represents the horses' proclivity to follow the most experienced and strongest horse. The level of the horse's hierarchy is denoted by H_m^{t,AGE}. It shows how the best horse location affects the velocity parameter. The best horse location at iteration (t − 1) is denoted by X_best^{t−1}. H_m^{t,AGE} is reduced linearly in accordance with a reduction factor denoted by ω_h. At the beginning of the search process, the coefficient h is set to 0.5, 0.9, and 1.5 for horses of age α, β, and γ, respectively. Eq.(9) updates the value of h_m^{t,AGE}.

C. SOCIABILITY
Horses have a lifestyle that is comparable to that of other social animals. They band together to increase their chances of survival and make escaping from attackers easier. The coefficient s demonstrates this social behavior, which is defined as a movement toward the other horses' average position, where N is the population size. The social motion vector of the m-th horse at the t-th iteration is denoted by S_m^{t,AGE}, which is reduced linearly in accordance with a reduction factor denoted by ω_s. At iteration t, s_m^{t,AGE} reflects the horse's interest in the herd. It is set to 0.2 and 0.1 for horses of age β and γ, respectively. Eq.(11) updates the value of s_m^{t,AGE}.

D. IMITATION
A horse has the ability to imitate the behavior of other horses. As a result, the horse is able to pick up the bad and the good actions of other horses. The social behavior of imitation is described mathematically in Eq.(12), where I_m^{t,AGE} is the m-th horse's motion vector in the direction of the average of the best horses with X̂ positions. pN is the number of horses in the fittest positions of the current population; the value of this parameter is recommended in [73] to be 0.1 × N. Eq.(13) updates the value of i_m^{t,AGE} using a reduction factor denoted by ω_i.

E. DEFENSE
The horse's defense is structured in such a way that the horse escapes from other horses in the worst positions, which are far from the optimal ones. As shown in Eq.(14), the coefficient d is set with a negative sign to keep the current horse away from undesirable locations, where D_m^{t,AGE} represents the m-th horse's escape vector from the average of the worst horses with X̂ positions. qN is the number of horses in the worst positions of the current population; the value of this parameter is recommended in [73] to be 0.2 × N. Eq.(15) updates the value of d_m^{t,AGE} using a reduction factor denoted by ω_d.

F. ROAM
In pursuit of nourishment, horses migrate from pasture to pasture. This random movement from one location to another can be simulated as in Eq.(16), where R_m^{t,AGE} is the m-th horse's random velocity vector and p is a random value between 0 and 1. Eq.(17) updates the value of r_m^{t,AGE} using a reduction factor denoted by ω_r.

III. PROPOSED METHOD
This section details the proposed BinHOA, a wrapper-based strategy to deal with the problem of FS. The main phases of BinHOA are: transformation function, Levy flight operator, local search algorithm, and evaluation. Each phase will be thoroughly discussed in the following subsections.

A. TRANSFORMATION FUNCTION
The FS problem has been modeled as a binary problem. However, the new positions of the horses that are generated by the original HOA have continuous values. In the feature subset selection problem, only 0 or 1 values can be assigned to these positions. As a result, converting from the original HOA's continuous space to a binary search space necessitates the use of a transformation function. As shown in Fig. 2, the binary solution space in BinHOA is represented by a matrix of size n × N, where n is the population size and N is the number of features. The 1/0 entries indicate whether or not the corresponding feature is selected. Each row of this matrix describes a solution. The fitness value of each row is then calculated using the objective function, i.e., classification accuracy in the FS case. The fitness values are determined using kNN or SVM classifiers in the proposed method. Logistic transformation functions, called the S-shaped family, are ideal for mapping processes as they generate output in the range [0,1], which is needed to specify the probability of updating an element in a binary solution from 0 to 1 and vice versa [74]. In addition, the V-shaped family was proposed by Mirjalili et al. as a potent transformation function family that serves the same goal as the S-shaped family [75]. The slope of the transformation function is crucial in determining exploitation and exploration. When the curve of the transformation function is too steep, it leads to poor exploration; conversely, when the curve is flat or less steep, it results in poor exploitation and the search readily slips into local minima [76].
In this study, we have investigated the eight different transformation functions listed in Table 1 to find out which one is the most effective with HOA, as no single transfer function has been determined to be the best. The different transfer functions employed in this study are shown in Fig. 3 (the S-shaped and V-shaped families). The effectiveness of these functions will be investigated later in the experimental section. To explain how the continuous search space is converted into a binary one, the sigmoidal function is selected and described in Eq.(18) as an example of the S-shaped family, where x_i^d denotes the i-th individual's position in the d-th dimension at the t-th iteration, and x_i is calculated by Eq.(1). The output of the S-shaped function in Eq.(18) is still continuous. Thus, the i-th position is updated according to Eq.(19) to obtain the binary value, where rand denotes a random value in the range [0,1].
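The mapping can be sketched as follows. This is a minimal illustration, assuming the standard S1 sigmoid and the common |tanh| V-shaped form; the exact eight functions of Table 1 are not reproduced here, and the probabilistic bit-flip rule mirrors the description around Eq.(19):

```python
import numpy as np

rng = np.random.default_rng(0)

def s_shaped(x):
    """S1 sigmoid transfer: maps a continuous position component to [0, 1]."""
    return 1.0 / (1.0 + np.exp(-x))

def v_shaped(x):
    """A common V-shaped transfer, |tanh(x)|, also bounded in [0, 1]."""
    return np.abs(np.tanh(x))

def binarize(x_continuous, transfer):
    """Set each bit to 1 with probability T(x), as in the S-shaped update rule."""
    probs = transfer(np.asarray(x_continuous, dtype=float))
    return (rng.random(probs.shape) < probs).astype(int)

bits = binarize([-6.0, 0.0, 6.0], s_shaped)
print(bits)  # each entry is 0 or 1; large |x| makes the S1 outcome near-deterministic
```

Note that V-shaped functions are usually paired with a complement-style update rule rather than the direct probability rule shown here; the paper's Table 1 specifies the exact pairings.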

B. APPLYING LEVY FLIGHT TO BINHOA
Several modification tactics can be employed to increase the performance of metaheuristic algorithms. The most typical technique is to incorporate a new operator into the algorithm's structure. The imported operator, when used in conjunction with the original operators, may improve the optimization performance. The steps of a random walk are described by Levy flight, as seen in Fig. 4, which follows a heavy-tailed probability distribution. Fig. 5 displays the difference between the normal, Cauchy, and Levy distributions. The Levy distribution has a fat tail, which means that the probability values towards the tail of the curve are larger than in other distributions. After updating the position of horses using Eq.(1), Levy flight is integrated into the BinHOA structure. As a result, each updated horse is scheduled to use Levy flight once to enhance search space diversity. This is accomplished by changing the horses' positions in the search space in a significant way. Hence, more randomness will be obtained, resulting in a higher level of exploration. This allows trapped horses to escape from local minima. The Levy flight update is given in Eq.(20), where X_i^t denotes the i-th horse at iteration t, rand denotes a random value in the range [0, 1], ⊕ represents the dot product, and u is a randomly generated parameter with a uniform distribution. As previously stated, Levy flight is a random walk in which the step lengths follow a Levy distribution, as indicated in Eq.(21). In Eq.(22), µ and ν are standard random distributions. Eq.(23) shows how to calculate φ, where β = 1.5, as mentioned in [77], and Γ represents the standard Gamma function. It is worth noting that combining the uniform distribution with the Levy flight operator in one equation results in a high level of randomness, which increases the search's diversity and reduces the possibility of local minima trapping.
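The step generation of Eqs.(21)-(23) follows Mantegna's algorithm; a sketch is shown below with β = 1.5 as in the paper. The scale factor `alpha` applied in the position update is our assumption for illustration, not a value stated in the paper:

```python
import math
import numpy as np

rng = np.random.default_rng(42)

def levy_step(dim, beta=1.5):
    """Draw one Levy-flight step via Mantegna's algorithm:
    step = mu / |nu|^(1/beta), with mu ~ N(0, phi^2) and nu ~ N(0, 1),
    where phi follows the Gamma-function formula of Eq.(23)."""
    phi = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
           / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = rng.normal(0.0, phi, dim)
    nu = rng.normal(0.0, 1.0, dim)
    return mu / np.abs(nu) ** (1 / beta)

def levy_update(position, alpha=0.01):
    """Perturb a horse's position with a scaled Levy step, mirroring the
    high-randomness update described around Eq.(20); alpha is hypothetical."""
    return position + alpha * rng.random(position.shape) * levy_step(position.size)

print(levy_update(np.zeros(5)))  # small steps most of the time, occasional large jumps
```

The heavy tail means most steps stay near the current position while a few steps jump far, which is exactly the exploration behavior the operator is meant to add.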
The best solutions are not always the best ways to achieve the optimal solution, they may bias the search process. As a result, the optimizer loses diversity, and the solutions become stuck in local minima. Thus, other solutions with lower fitness values may share their historical data and experiences gathered along the way to the global solution. This experience could be more valuable in directing the optimizer forward into the global solution.

1) Roulette wheel selection
Roulette wheel selection (RWS) is a well-known selection method. BinHOA's performance is improved by combining the RWS operator with Levy flight, as no horse in the population will be neglected. The RWS operator is used to select a random horse from the swarm once per iteration. The RWS is based on the horses' fitness values. Hence, a horse with a high fitness value has a high probability of being selected, whereas a horse with a lower fitness value has a lesser chance of being selected, as seen in Fig. 6. The sum of all the horses' fitness values is calculated. After that, the roulette wheel is built with a circumference equal to the total of the fitness values. Each horse is given a roulette wheel sector that is proportionate to its fitness value. A random horse is selected by revolving the wheel and taking the sector indicated by the pointer when the wheel stops. After applying the RWS, the crossover operator yields a new candidate solution. Eq.(24) defines the crossover operator, where rand denotes a random value in the range [0,1], x_Best is the current best solution, x_RWS is the solution obtained by RWS, and the new candidate solution is denoted by x_New. As seen in Fig. 7, the crossover operator is performed on two binary solutions per iteration, x_Best and x_RWS. Then, the new candidate solution's fitness value is determined. The current best solution will be updated only if the new candidate provides a higher fitness value. One of the drawbacks of selecting more features from the data is that the classifier's performance is negatively affected by redundant and irrelevant features in different ways. Thus, the data dimensionality must be reduced. The necessity of an efficient search strategy for FS approaches was demonstrated in the previous discussions. An additional issue with FS techniques is determining how to evaluate the effectiveness of the selected subset.
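The RWS-plus-crossover step can be sketched as follows. This is an illustrative sketch: selection is proportional to fitness as the text describes, and a uniform per-bit crossover stands in for Eq.(24), whose exact rule is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(7)

def roulette_wheel_select(fitness):
    """Pick one index with probability proportional to its fitness value
    (higher fitness -> larger wheel sector, as in Fig. 6)."""
    f = np.asarray(fitness, dtype=float)
    probs = f / f.sum()
    return int(rng.choice(len(f), p=probs))

def uniform_crossover(x_best, x_rws):
    """Uniform crossover sketch: each bit is copied from x_best or x_rws
    depending on a fresh random draw (the rule of Eq.(24) is assumed)."""
    x_best, x_rws = np.asarray(x_best), np.asarray(x_rws)
    mask = rng.random(len(x_best)) < 0.5
    return np.where(mask, x_best, x_rws)

idx = roulette_wheel_select([0.9, 0.5, 0.1])
child = uniform_crossover([1, 0, 1, 1], [0, 0, 1, 0])
print(idx, child)
```

Positions where both parents agree are guaranteed to be preserved in the child, so the crossover only explores the bits on which the best solution and the wheel-selected solution disagree.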
Because the suggested method is a wrapper-based approach, the evaluation process should include a learning algorithm. kNN and SVM are employed in this article [7], [78]. If a dataset has two classes, we prefer to utilize the SVM classifier algorithm. The kNN classifier is used in all other cases.

Algorithm 1 The pseudo code of BinHOA
1) Initialize the parameters, such as the population size (n), the maximum number of iterations (max_iterations), and other parameters.
2) Initialize the population X.
…
for each horse do
6) Transform the positions of horses into binary space using a transfer function (Table 1).
7) Use kNN or SVM classifiers to evaluate each horse in the population.
…
11) Update the positions of horses using Eq.(1).
12) Update the position of each horse using Levy flight.
13) end for
14) x_Best ← best solution.
15) x_RWS ← random solution obtained using the roulette wheel selection method.
16) Determine a new candidate solution (x_New) by applying the crossover operator on x_Best and x_RWS using Eq.(25).
17) Use kNN or SVM classifiers to evaluate x_New.
18) Measure the fitness value of x_New and update x_Best if x_New achieves a better fitness.

The FS problem is referred to as a multi-objective optimization problem, since it requires achieving the following conflicting objectives: 1) The first goal is to reduce the number of selected features; the fewer features in the solution, the better the solution is. 2) The other and most significant goal is to have high classification accuracy; the computed classification accuracy will be better if the features in the selected subset are relevant. The following fitness function is employed to assess the solutions in BinHOA and achieve a balance between the two main objectives:

Fitness = α · Error(D) + β · (|M| / |N|)

where Error(D) denotes the classification error rate calculated with a kNN or SVM classifier, and α ∈ [0, 1] and β = 1 − α are the weight parameters. The importance of classification accuracy and the length of the selected feature subset are reflected by these two parameters. The number of selected features is denoted by |M|, and the number of original features by |N|. In Algorithm 1, the pseudo code of BinHOA is shown. The FS process in BinHOA is depicted in Fig. 8.
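As a concrete illustration, the weighted objective above can be written as a small sketch. The default α = 0.99 is a common choice in wrapper FS studies, not a value stated in this section, which only requires α ∈ [0, 1] and β = 1 − α:

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted FS objective: alpha * Error(D) + (1 - alpha) * |M| / |N|.
    Lower is better: it trades classification error against subset size.
    alpha = 0.99 is an assumed, commonly used weighting."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)

# 5% error with 10 of 100 features: 0.99*0.05 + 0.01*0.1 = 0.0505
print(fs_fitness(error_rate=0.05, n_selected=10, n_total=100))
```

Because α dominates, a solution only benefits from dropping features when doing so costs little or no accuracy, which matches the stated priority of accuracy over subset size.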

IV. EXPERIMENTS
All methods were evaluated using Matlab (version R2021a) running on macOS Big Sur (version 11.2.3) on a machine with an Apple M1 processor at 3.2 GHz and 8 GB of RAM.

A. DATASETS
Thirty benchmark datasets from the UCI data repository were selected to validate the performance of the proposed method. The datasets used in this study come from a variety of domains, including medicine, social science, physics, life science, etc. The motivation for choosing these datasets is that they include a variety of instances and features that reflect a number of challenges on which the proposed approach will be evaluated. In addition, we chose a set of high-dimensional datasets to evaluate the performance of the BinHOA algorithm in a high-dimensional search space. Table 2 depicts a summary of the used datasets, which comprise varying numbers of classes (from 2 to 16), instances (from 32 to 9298), and attributes (from 9 to 10000).

B. PARAMETER SETTINGS
BinHOA's performance is compared to several well-known FS approaches, whose parameter settings are summarized in Table 3. Each algorithm is given 20 independent runs. All tests use a population size of 10. As previously stated, the value of the current best solution will be updated by the local search algorithm if the new solution has a lower fitness than the current one; thus, the evaluation function is called an additional time in each iteration. The classification process is in charge of identifying the class label for a new incoming instance. The preferred classifiers in this investigation are kNN and SVM. For datasets with more than two classes, the 5-NN classifier is preferred to provide the best subset. Multiple trials were undertaken on diverse datasets to specify the best k value for kNN. To relieve overfitting, k-fold cross-validation is utilized for evaluation purposes. The concept behind k-fold cross-validation is that the dataset is divided into k subsets (folds) of nearly equal size. The classifier is trained on k − 1 folds, and the remaining fold is used for testing. The classification error rate is then calculated as the percentage of incorrect class label predictions.
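The k-fold splitting described above can be sketched as follows (an illustrative sketch; the fold count k = 10 is an assumption here, as the section does not fix a value):

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Split shuffled sample indices into k nearly equal folds: each fold
    serves once as the test set while the remaining k-1 folds form the
    training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

splits = list(kfold_indices(103, k=10))
print(len(splits), len(splits[0][1]))  # 10 folds; the first fold holds 11 samples
```

Every sample appears in exactly one test fold, so the averaged error rate across the k folds uses each instance for testing exactly once.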

C. RESULTS AND ANALYSIS
Experiments are carried out in three stages. In stage one, the effects of eight different transfer functions on the original HOA are studied, as we seek the best transfer function to integrate into the proposed method. In the second stage, the suggested BinHOA is compared to the original HOA. In the third stage, BinHOA is compared with other competitive wrapper FS approaches. These experiments are based on three main measures:

1) Stage-I: Comparison between different variations of HOA
In this section, the performances of the eight transfer functions with HOA are analyzed on the FS problem using different measures: execution time, best fitness value, mean fitness value, worst fitness value, standard deviation of the mean fitness values, average number of selected features, and maximum accuracy value obtained. The mean fitness values are listed in Table 4. This table clearly shows that the S-shaped transfer function HOA-s1 is ranked first in 13 cases out of the 30 datasets (43%), followed by the V-shaped transfer function HOA-v4, which achieved the best fitness in 9 cases out of 30 datasets (30%). A bold font is used to highlight the best values. Based on the standard deviation of the fitness values, the HOA-v4 variation has a high rate of success for FS problems (30%).
The best and worst fitness values for the HOA versions are displayed in Table 5. This table demonstrates that the HOA-s1 variant outperforms the other variations in terms of the best fitness values: HOA-s1 achieves the best results 37% of the time, in 11 cases out of 30 datasets. The second-highest rank is achieved by HOA-v2 with 17%. Based on the worst fitness values, HOA-s1 has the lowest value in 9 cases out of 30 datasets (30%). The second-highest rank is achieved by HOA-s2 with 23%.
The average number of selected features for the different HOA variations is listed in Table 7. HOA-s1 outperforms the other variants, ranking first in 27 cases out of 30 datasets (90%). The average number of selected features across all datasets is also shown in Fig. 10. There is a clear difference between the S-shaped and V-shaped HOA variants; for instance, HOA-s1 selects far fewer features than HOA-v4. In Fig. 9 (a), it can be observed that HOA-s1 shows significant differences in execution time on the high-dimensional datasets, while Fig. 9 (b) shows the average running time across all datasets. Based on the foregoing analysis, we can conclude that HOA-s1 is the best alternative among the tested transfer functions: it gives better results in terms of best fitness, mean fitness, worst fitness, maximum accuracy, number of features, and execution time, and it has therefore been identified as the most effective variation of the HOA algorithm.

2) Stage-II: Comparison between HOA and BinHOA
In the second stage, BinHOA is compared with the original HOA to examine how the two embedded enhancements affect it. Average fitness values are listed in Table 8. In terms of fitness values, this table clearly shows that BinHOA outperforms the original method on all 30 datasets. BinHOA also achieves the higher accuracy 100% of the time, with a significant improvement on most of the datasets; for instance, on the CNAE dataset (UCI Machine Learning Repository), a high-dimensional dataset, the classification accuracy is increased by 15%. The average number of selected features for the two algorithms is also listed in Table 8, where BinHOA ranks first in 11 cases out of 30 datasets (37%). Fig. 12 presents a comparison between HOA and BinHOA in terms of the total average number of selected features, fitness value, and classification accuracy across all the datasets.
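The wrapper objective behind these fitness values is not written out in this excerpt. A common formulation in the FS literature, shown here as a sketch (the weight alpha = 0.99 is an assumed, typical value, not confirmed by the paper), combines the classification error with a small penalty on the fraction of selected features:

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Wrapper FS fitness (lower is better): classification error dominates,
    with a small penalty on the fraction of selected features.
    alpha = 0.99 is a typical literature value, assumed here."""
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)
```

With such a weighting, reducing the error rate moves the fitness far more than reducing the feature count, which is consistent with the accuracy-first comparisons made in this section.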
BinHOA outperforms HOA with a total average classification accuracy of 86.7% and a total average fitness value of 0.1245, while ranking second with an average of 272.2 selected features. From this figure, there is almost no difference in the number of selected features between the two algorithms. Although HOA has the minimum average number of selected features, BinHOA is more useful, since classification accuracy should take precedence over feature count; moreover, the fitness value is affected more by the classification accuracy than by the number of selected features. Fig. 11 shows the average running time over 20 runs across all of the datasets. In addition, the total improvement percentage (IP), the ratio of positive change, is used to compare the performance of the two algorithms in Fig. 13. The IP is calculated as follows:

IP = (1/m) × Σ_{i=1}^{m} (f_full^i − f_alg^i) / f_full^i × 100%,

where f_full is the fitness value obtained by selecting all of the dataset's original features (see Table 4), f_alg is the best, mean, or worst fitness value obtained by HOA or BinHOA, and m denotes the number of datasets (m = 30). Compared with selecting all the original features of each dataset, the IP of both methods is significantly increased, as seen in Fig. 13; furthermore, BinHOA attains a higher IP than HOA. BinHOA outperformed the standard HOA algorithm because of its ability to balance exploration and exploitation, improve population diversity, and escape from local optima. The final experiment is carried out to verify the results by comparing our suggested method to nine other well-known approaches (EO, FDA, GWO, WOA, MFO, PSO, GOA, SSA, and DA). In this comparative performance evaluation, the following metrics are used: number of selected features, fitness values, standard deviation, and classification accuracy.
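As a sketch, the IP described above can be computed directly from the per-dataset fitness values (the function name is mine, not from the paper):

```python
def improvement_percentage(f_full, f_alg):
    """Average relative fitness reduction (in %) of an algorithm versus
    selecting all features, taken over m datasets.

    f_full[i]: fitness when all features of dataset i are selected.
    f_alg[i]:  fitness obtained by the algorithm on dataset i.
    """
    m = len(f_full)
    return 100.0 * sum((ff - fa) / ff for ff, fa in zip(f_full, f_alg)) / m
```

A positive IP means the algorithm improved on the all-features baseline on average; a larger IP for BinHOA than HOA is exactly the comparison made in Fig. 13.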
Table 9 displays the average fitness values of the competing approaches and the proposed algorithm across 20 independent runs on each of the 30 datasets. BinHOA shows the lowest fitness values on the majority of datasets compared to the other metaheuristics: it attains the best fitness value 93% of the time (28 cases out of the 30 datasets), while ranking second on two datasets, Vowel and USPS, behind the EO algorithm. In addition, Fig. 14 shows the average fitness values of all optimizers averaged across all the datasets; BinHOA's fitness of 0.125 is the lowest among the competitors. As shown in Fig. 21, BinHOA is ranked fourth in terms of the total average standard deviation of the mean fitness values, after EO, PSO, and DA, respectively. A low standard deviation means that a dataset's fitness values lie close to the mean fitness value, so the worst and best fitness values are quite close to the mean. In terms of classification accuracy, the EO algorithm ranks first on the Vowel and USPS datasets, while the DA algorithm ranks first on the Lung Cancer dataset. Fig. 15 shows the average classification accuracy of all optimizers averaged across all the datasets; compared to the other algorithms, BinHOA reaches the best classification accuracy of 86.7%. The average number of selected features for all datasets is shown in Fig. 20, where BinHOA is ranked fourth, following the GWO, WOA, and DA outcomes, respectively; there is no large difference between the first-ranked algorithm (GWO) and the proposed method. The convergence behavior of the algorithms is also examined on the tested datasets (dataset details are given in Table 2). Based on these observations, BinHOA outperformed all other methods and showed faster convergence over the majority of the datasets.
Thus, we can conclude that BinHOA prevented premature convergence on most datasets by balancing exploitation and exploration and by enhancing population diversity.
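The Levy flight operator credited with this exploration ability is not specified in this excerpt. A common way to draw Levy-distributed steps is Mantegna's algorithm, sketched below with the usual (assumed) setting beta = 1.5:

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, beta=1.5, rng=None):
    """Draw a Levy-flight step via Mantegna's algorithm.

    beta = 1.5 is the customary stability index; the paper's exact
    operator and parameters are not given in this excerpt.
    """
    if rng is None:
        rng = np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)   # heavy-tailed numerator
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)
```

The heavy tail of the resulting distribution produces occasional long jumps, which is what lets a search agent escape a local optimum rather than oscillate inside it.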
The boxplots for the 30 datasets are shown in Figs. 18 and 19 to assess the average performance of the algorithms. Note that the boxplots reflect the classification results and are produced after running each method 20 times. From these figures we can visually read the minimum value, maximum value, median, first quartile, and third quartile of the data: the minimum and maximum values are represented by the bottom and top whiskers, respectively; the first and third quartiles are represented by the bottom and top of the box; the median is the line that divides the box in two; and outliers are plotted as individual points. On the majority of the datasets, BinHOA has higher median values and higher boxplots than the other methods, as seen in these figures.
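A boxplot of this kind can be reproduced as follows; the accuracy values are synthetic, for illustration only:

```python
import matplotlib
matplotlib.use("Agg")                 # headless backend, render to file
import matplotlib.pyplot as plt
import numpy as np

# Illustrative accuracies over 20 runs (synthetic, not the paper's data).
rng = np.random.default_rng(0)
acc_binhoa = rng.normal(0.87, 0.01, 20)
acc_hoa = rng.normal(0.82, 0.02, 20)

fig, ax = plt.subplots()
ax.boxplot([acc_binhoa, acc_hoa])     # whiskers: min/max, box: Q1-Q3, line: median
ax.set_xticklabels(["BinHOA", "HOA"])
ax.set_ylabel("Classification accuracy")
fig.savefig("boxplot.png")
```

A visibly higher box for one method over 20 runs indicates consistently better accuracy, not just a lucky single run, which is why the paper reports boxplots rather than only averages.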
In addition, the Wilcoxon signed-rank test based on the fitness function is performed on 14 randomly selected datasets to assess whether there is a statistically significant difference between BinHOA and the competitors. The test is conducted pairwise between BinHOA and each competitor, and Table 11 displays the results at a significance level of 0.05. R+ refers to the number of positive ranks, in which BinHOA outperforms a competitive method, while R− represents the number of negative ranks, in which BinHOA fails to outperform it. Ties is the number of times BinHOA and a competitor achieve the same rank. Sum_R+ and Sum_R− reflect the totals of positive and negative ranks, respectively.
From the observations in Table 11, it can be seen that BinHOA outranks EO, FDA, GWO, WOA, MFO, PSO, GOA, SSA, and DA in terms of R+ on all 14 randomly selected datasets. For example, BinHOA outperforms the FDA algorithm in 19 independent runs out of 20 on the HeartEW dataset. In the Lung Cancer case, BinHOA outperforms DA in 15 runs out of 20, fails to outperform DA in four runs, and performs similarly in one run. It is worth noting that BinHOA consistently outperformed the competitors in all 20 runs on the IonosphereEW, Robot2, DNA, Arrhythmia, Colon Cancer, and Arcene datasets. For the datasets used in this test, Sum_R− is also lower than Sum_R+. The p-value in this table indicates whether there is a significant difference between the suggested method and the competitors: the smaller the p-value, the stronger the evidence against the null hypothesis, and a p-value below 0.05 is considered statistically significant. On most of the tested datasets, the p-values confirm that the performance of the suggested method is statistically significant; p-values above 0.05 are underlined in Table 11.
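The test above can be reproduced with `scipy.stats.wilcoxon` on paired per-run fitness values. The run data here are hypothetical, for illustration only:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired fitness values over 20 runs (synthetic data).
rng = np.random.default_rng(0)
binhoa = rng.normal(0.12, 0.01, 20)
rival = binhoa + 0.03 + rng.normal(0.0, 0.005, 20)  # consistently worse

diff = rival - binhoa
r_plus = int(np.sum(diff > 0))    # runs where BinHOA wins (lower fitness)
r_minus = int(np.sum(diff < 0))   # runs where BinHOA loses
ties = int(np.sum(diff == 0))
stat, p_value = wilcoxon(binhoa, rival)
significant = p_value < 0.05      # evidence against the null of equal medians
```

The counts `r_plus`, `r_minus`, and `ties` correspond to the R+, R−, and Ties columns of Table 11, and `p_value < 0.05` is the significance criterion used there.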

V. CONCLUSION
This paper introduces an improved binary version of HOA. Eight different transfer functions were tested with the HOA algorithm on thirty UCI benchmark datasets; according to the experimental findings, HOA-s1 had the best performance. The HOA-s1 approach was then compared to the most popular and high-performing methods in the literature. In addition, two main enhancements were integrated into the standard HOA to strengthen its performance: a Levy flight operator was added to improve HOA's exploration behavior and alleviate stagnation in local minima, and a local search algorithm was integrated to enhance the best solution obtained after each iteration of HOA. According to the experimental data, the proposed BinHOA achieved high performance among the compared approaches for solving FS problems; as a result, BinHOA can be prioritized when tackling FS problems.
Future research could explore other types of binary HOA conversion. Moreover, since the continuous HOA was effectively transformed into a binary form, it may be applied to other discrete optimization problems such as task scheduling, the traveling salesman problem, and the knapsack problem.