Improved Binary Sailfish Optimizer Based on Adaptive β-Hill Climbing for Feature Selection

Feature selection (FS), an important pre-processing step in the fields of machine learning and data mining, has an immense impact on the outcome of the corresponding learning models. Basically, it aims to remove all possible irrelevant as well as redundant features from a feature vector, thereby enhancing the performance of the overall prediction or classification model. Over the years, meta-heuristic optimization techniques have been applied for FS, as these are able to overcome the limitations of traditional optimization approaches. In this work, we introduce a binary variant of the recently-proposed Sailfish Optimizer (SFO), named the Binary Sailfish (BSF) optimizer, to solve FS problems. A sigmoid transfer function is utilized here to map the continuous search space of SFO to a binary one. In order to improve the exploitation ability of the BSF optimizer, we amalgamate another recently proposed meta-heuristic algorithm, <italic>namely</italic> adaptive <inline-formula> <tex-math notation="LaTeX">$\beta $ </tex-math></inline-formula>-hill climbing (<inline-formula> <tex-math notation="LaTeX">$\text{A}\beta $ </tex-math></inline-formula>HC), with the BSF optimizer. The proposed BSF and <inline-formula> <tex-math notation="LaTeX">$\text{A}\beta $ </tex-math></inline-formula>BSF algorithms are applied on 18 standard UCI datasets and compared with 10 state-of-the-art meta-heuristic FS methods. The results demonstrate the superiority of both BSF and <inline-formula> <tex-math notation="LaTeX">$\text{A}\beta $ </tex-math></inline-formula>BSF algorithms in solving FS problems. The source code of this work is available at <uri>https://github.com/Rangerix/MetaheuristicOptimization</uri>.


I. INTRODUCTION
With the recent advancements of computing devices, huge amounts of data have become available in domains such as image processing, pattern recognition, financial analysis, business management, and medical studies [1], [2], amongst others. As a consequence, data dimensionality has increased a lot, which has a huge impact on the performance of machine learning and data mining algorithms, both in terms of time and storage requirements. However, not all the attributes or features are important for the corresponding learning model. In this context, feature selection (FS) is used as a data pre-processing step which helps remove all such irrelevant and redundant features [3], and thereby reduces the required processing time and storage space. This, in turn, increases the overall classification (or prediction) accuracy of the corresponding machine learning or data mining algorithms [4]. Depending on the evaluation criteria of features, FS techniques are divided into two categories [3]: filter and wrapper. Filter methods evaluate features based on pre-defined criteria (e.g., Information Gain [5], ReliefF [6], Chi-square [7], Fisher Score [8], Laplacian score [9], etc.), and thereby select the most important features according to those criteria. On the other hand, wrapper methods use a learning algorithm to evaluate feature subsets and thereby select the optimum feature subset for the corresponding task [10]. As they do not require a learning algorithm, filter methods are faster than wrapper methods, but wrapper methods, in general, achieve higher accuracy [11]. (The associate editor coordinating the review of this manuscript and approving it for publication was Ioannis Schizas.)
In the last decade, meta-heuristic algorithms have become quite popular in solving various optimization problems due to their ability to avoid local optima, non-derivative mechanism, and flexibility [12]. Two major characteristics of a meta-heuristic algorithm are [10]: exploration or diversification, which is the ability to search the whole solution space for best solution in each iteration avoiding local optima, and exploitation or intensification, which implies finding a better solution in the neighborhood of the obtained solution, leading to faster convergence. A good meta-heuristic algorithm tries to balance between exploration and exploitation.
The presence of a significant number of meta-heuristic and hybrid meta-heuristic FS strategies naturally raises the question of why yet another hybrid meta-heuristic FS algorithm should be proposed. According to the No Free Lunch (NFL) theorem [13] for optimization, however, there cannot be any single algorithm that can solve all optimization problems. With each new algorithm inspired by some regular or natural phenomenon, researchers primarily aim to achieve a better trade-off between exploration and exploitation, so that the algorithm ultimately escapes local optima and reaches the global optimum. Nevertheless, accomplishing these objectives is not simple, especially if one wants to propose an algorithm applicable across different domains. This motivates researchers to keep formulating better methods, which, in turn, keeps the research area alive. For a specific problem, the NFL theorem reminds us that, in order to discover the best algorithm, we have to concentrate on the particular problem at hand: the hypotheses, the priors (additional information), the data and the cost.
In optimization problems involving multi-modal functions of high dimensionality, finding an ideal value for every dimension simultaneously is almost impossible. This is why researchers attempt to solve such problems using meta-heuristic strategies, whose aim is to obtain a near-optimal solution within a reasonable amount of time. As FS is considered an optimization problem [10], there may exist numerous optimal subsets, i.e., subsets with the same dimension and the same accuracy. Here too, it is extremely hard to discover an optimal feature subset while keeping storage space and running time in check alongside the performance of the machine learning algorithm, and research is still ongoing to meet these requirements. This has inspired us to propose a hybrid meta-heuristic FS method based on the Sailfish Optimizer (SFO) [14]. SFO mimics the behavior of group-hunting sailfish, which follow an attack-alternation strategy to hunt a school of sardines. In this paper, we hybridize the binary variant of SFO, known as the Binary Sailfish (BSF) optimizer, with another recently proposed meta-heuristic algorithm, adaptive β-hill climbing (AβHC) [15]. Generally, two different models are followed for hybridizing meta-heuristic algorithms [16]: low level and high level. In low-level hybridization, a function inside one meta-heuristic is replaced by another meta-heuristic. In the high-level variant, the candidate meta-heuristics are executed in sequence. We have hybridized BSF and AβHC in a high-level fashion, following the pipeline model, where each meta-heuristic optimization algorithm works on the output of the previous one. To the best of our knowledge, this is the first time BSF has been used for FS, and the first time it has been hybridized with the AβHC algorithm.
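The pipeline model of high-level hybridization described above can be sketched in a few lines. This is only an illustration of the control flow, not the paper's implementation; `bsf_step` and `abhc_step` are hypothetical stand-ins for one BSF search step and one AβHC refinement pass.

```python
def pipeline(bsf_step, abhc_step, solution, iterations):
    """High-level (pipeline) hybridisation: each meta-heuristic works on
    the output of the previous one, in sequence, every iteration."""
    for _ in range(iterations):
        solution = bsf_step(solution)   # global search step (BSF)
        solution = abhc_step(solution)  # local refinement (AβHC)
    return solution
```

Low-level hybridization, by contrast, would replace an internal operator of BSF with AβHC rather than chaining the two algorithms.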
In a nutshell, the main contributions of this work are as follows:
• A new FS method known as the BSF optimizer is introduced using the recently proposed meta-heuristic SFO.
• The newly proposed BSF optimizer is hybridized with another recently proposed meta-heuristic called adaptive β-hill climbing (AβHC) algorithm.
• The proposed hybrid FS approach is evaluated on 18 standard UCI datasets [17] using K-nearest Neighbors (KNN) classifier.
• The proposed FS approach is compared with 10 state-of-the-art meta-heuristic based FS methods.
The rest of the paper is organized as follows: Section II provides a brief review of meta-heuristic FS methods present in the literature. Section III provides detailed description of the proposed FS method. The results obtained by the proposed method are reported in Section IV. In Section V, the proposed method is compared with 10 state-of-the-art meta-heuristic and hybrid meta-heuristic FS methods. Lastly, Section VI concludes this work, discusses its limitations and provides direction of possible future work.

II. LITERATURE SURVEY
Recently, the field of optimization has gained much attention from researchers, especially the field of hybrid meta-heuristics. A meta-heuristic is a high-level procedure [18] designed to find, generate, or select a heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization problem. Meta-heuristic algorithms can be divided into different categories: single-solution based and population based [19], nature inspired and non-nature inspired [20], metaphor based and non-metaphor based [21], etc. From the 'inspiration' point of view, these algorithms can roughly be divided into four categories [22]: Evolutionary, Swarm inspired, Physics based, and Human related.
• Evolutionary algorithms are basically inspired from biology. They utilize crossover and mutation operators to evolve an initial random population over the iterations, eliminating the worst solutions to obtain improved ones. Genetic algorithm (GA) [23] is a well-known method of this category, which follows Darwin's theory of evolution. Co-evolving algorithm [24], Cultural algorithm [25], Genetic programming [26], Grammatical evolution [27], Bio-geography based optimizer [28], Stochastic fractal search [29], etc. are some well-known evolutionary algorithms.
• Swarm inspired algorithms imitate the individual and social behavior of swarms, herds, schools, teams or any group of animals. Every individual has its own behavior, but the behavior of the accumulated individuals helps to solve complex optimization problems. One of the most popular algorithms of this category is Particle swarm optimization (PSO) [30], developed by following the behavior of a flock of birds. Another notable method of this category is Ant colony optimizer (ACO) [31], inspired from the foraging method of some ant species. Some other methods belonging to this category are: Shuffled frog leaping algorithm [32], Bacterial foraging [33], Artificial bee colony (ABC) [34], Firefly algorithm [35], Grey wolf optimizer (GWO) [12], Crow search algorithm [36], Whale optimization algorithm (WOA) [37], Grasshopper optimization algorithm [38], Squirrel search algorithm [39], etc.
• Physics based algorithms are inspired by the rules governing a physical process. The inspiring processes range from music and metallurgy to mathematics, physics, chemistry, and complex dynamic systems. One of the oldest algorithms of this category is Simulated annealing (SA) [40], developed by following the annealing [41] process of metals studied in metallurgy and materials science. Another popular method of this category is Gravitational search algorithm (GSA) [42], developed by following gravity and mass interaction. Some other methods of this category are: Self propelled particles [43], Harmony search algorithm [44], Black hole optimization [45], Sine cosine algorithm [46], Multi-verse optimizer [47], Find-Fix-Finish-Exploit-Analyze [48], etc.
• Human related algorithms find global optima by following human behavior. Teaching-Learning-Based optimization [49] is one such popular method belonging to this category, developed by modelling how teaching improves the grade of a class. Some other methods of this category are: Society and civilization [50], League championship algorithm [51], Fireworks algorithm [52], Tug of war optimization [53], Volleyball premier league algorithm [54].
FS, however, is a binary optimization problem, and most of the above-mentioned optimization algorithms have been applied for solving FS problems. Different applications of GA for FS can be found in [55]-[58]. PSO based FS methods can be found in [59]-[61]. ACO and GSA based FS methods can be found in [62] and [42] respectively.
In recent times, hybrid meta-heuristic algorithms have been used extensively for solving FS problems. These algorithms are reported to achieve better performance in various real-life problems [63]. In [64], the first hybrid meta-heuristic algorithm for FS is proposed by combining GA with a local search algorithm. A hybrid of Markov chain and SA is proposed in [65]. Memetic algorithm and Late acceptance hill climbing have been hybridized and used for FS in facial emotion recognition [66]. Spotted hyena optimizer is combined with SA and used for FS on UCI datasets in [3]. A hybrid of GA and SA has been used for FS on UCI datasets in [67]. In [68], Salp swarm algorithm is hybridized with Opposition based learning (OBL) and a local search method, and applied to UCI datasets for FS. In [69], GA and PSO have been hybridized for FS and applied on digital mammogram datasets. In [70], the hybrid of GWO and PSO has been applied on UCI datasets for FS. Hybridization of PSO with GSA can be found in [71]. A hybrid combination of ACO and GA has been proposed in [72]. In [73], a hybrid version of Differential Evolution (DE) and ABC for FS has been proposed and applied on UCI datasets.
The recently developed SFO is inspired by a group of hunting sailfish. This method was originally evaluated on 20 uni-modal and multi-modal mathematical functions. It has also been applied to five engineering design problems: the circular antenna array design problem, gear train design problem, I-beam design problem, three-bar truss design problem, and welded beam design problem.

III. PRESENT WORK
A. SAILFISH OPTIMIZER: AN OVERVIEW
SFO [14] is a population-based meta-heuristic algorithm inspired by the attack-alternation strategy of a group of sailfish hunting a school of sardines. This hunting strategy gives the hunters an upper hand by letting them conserve energy. The algorithm considers two populations: a sailfish population and a sardine population. The sailfishes are considered the candidate solutions, and the problem's variables are the positions of the sailfishes in the search space. The algorithm tries to randomize the movement of the search agents (both sailfish and sardines) as much as possible. The sailfishes are considered to be scattered in the search space, whereas the positions of the sardines help to find the best solution in the search space.
The sailfish with the best fitness value is called the 'elite' sailfish and its position at the $i$-th iteration is denoted $P^{i}_{SlfBest}$. Among the sardines, the 'injured' sardine is the one with the best fitness value and its position at the $i$-th iteration is denoted $P^{i}_{SrdInjured}$. At each iteration, the positions of the sardines and sailfishes are updated. At the $(i+1)$-th iteration, the new position $P^{i+1}_{Slf}$ of a sailfish is updated using the 'elite' sailfish and the 'injured' sardine as per Equation 1.
where $P^{i}_{Slf}$ is the previous position of the sailfish, $rnd$ is a random number between 0 and 1, and $\mu_{i}$ is a coefficient generated as per Equation 2.
where $PrD$ is the prey density, which indicates the number of prey at each iteration. The value of $PrD$, calculated by Equation 3, decreases at each iteration as the number of prey decreases during group hunting.
where $Num_{Slf}$ and $Num_{Srd}$ are the numbers of sailfishes and sardines respectively.
where $Prcnt$ denotes the percentage of the sardine population that forms the initial sailfish population. The initial number of sardines is always considered to be larger than the number of sailfishes. The sardine positions are updated in each iteration as given by Equation 5.
where $P^{i}_{Srd}$ and $P^{i+1}_{Srd}$ denote the previous and updated positions of the sardine respectively, and $ATK$ represents the sailfish's attack power at iteration $itr$. The number of sardines that update their positions and the amount of their displacement depend upon $ATK$. Reducing $ATK$ assists the convergence of the search agents. Using the parameter $ATK$, the number of sardines that update their position ($\gamma$) and the number of variables updated for them ($\delta$) are calculated as follows: where $v$ is the number of variables and $Num_{Srd}$ is the number of sardines. If any sardine becomes fitter than any sailfish, that sailfish updates its position to that of the sardine, and the sardine is eliminated from its population. Random selection of sailfishes and sardines guarantees the exploration of the search space. As the attack power of the sailfishes decreases over the iterations, a sardine gets a chance to escape from the best sailfish, which assists exploitation. The $ATK$ parameter thus tries to find a balance between exploration and exploitation.
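To make the update rules above concrete, the following is a minimal sketch of the sailfish-side quantities based on our reading of the cited SFO paper [14]; the equation bodies are not reproduced in this excerpt, so the exact formulas should be checked against [14].

```python
import random

def prey_density(num_slf, num_srd):
    # Equation 3 (as given in [14]): decreases as sardines are eliminated
    return 1.0 - num_slf / (num_slf + num_srd)

def attack_power(a, kappa, itr):
    # Equation 6 (as given in [14]): ATK decays linearly from A towards 0
    return a * (1.0 - 2.0 * itr * kappa)

def sailfish_update(p_slf, p_elite, p_injured, prd):
    """Sailfish position update in the spirit of Equations 1-2 of [14]:
    move relative to the midpoint of the elite sailfish and injured sardine."""
    rnd = random.random()
    mu = 2.0 * rnd * prd - prd  # Equation 2: coefficient from prey density
    return [p_elite[d] - mu * (rnd * (p_elite[d] + p_injured[d]) / 2.0 - p_slf[d])
            for d in range(len(p_slf))]
```

With `A = 4` and `kappa = 0.001` (the values used later in Section IV-B), `attack_power` reaches 0 at iteration 500, marking the full shift from exploration to exploitation.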

B. ADAPTIVE β-HILL CLIMBING
AβHC [15] is a recently proposed meta-heuristic algorithm, an adaptive version of βHC [74], which is, in turn, an improved version of the Hill climbing (HC) algorithm. HC is the simplest form of local search, but it often gets stuck in local optima; βHC was proposed to overcome this limitation. Given a solution $R = (r_1, r_2, \ldots, r_D)$, βHC iteratively generates an improved solution $R' = (r'_1, r'_2, \ldots, r'_D)$ based on two operators: the $\mathcal{N}$-operator (neighborhood operator) and the $\beta$-operator. The $\mathcal{N}$-operator randomly chooses a neighbor $R'$ of the solution $R$, as defined in Equation 9, where $i$ is chosen randomly in the range $[1, D]$, $D$ is the dimension of the problem, and $\mathcal{N}$ denotes the highest possible distance between the current solution and its neighbor. The $\beta$-operator is inspired by the uniform mutation operator of GA: the new solution takes each value either from the current solution or randomly from the corresponding range, with probability $\beta \in [0, 1]$.
where $rnd \in [0, 1]$ is a random number and $r_r$ is another random value drawn from the range of the corresponding dimension of the problem under consideration. The outcome of βHC largely depends on the chosen values of $\mathcal{N}$ and $\beta$, and setting these two parameters requires exhaustive experiments. To avoid this overhead, AβHC was proposed, in which $\mathcal{N}$ and $\beta$ are expressed as functions of the iteration number. $\mathcal{N}(t)$, the value of $\mathcal{N}$ at the $t$-th iteration, is defined as Equation 11 following the work presented in [47].
where $K$ is a constant and $MaxIter$ is the maximum number of iterations.
The value of $\beta$ at the $t$-th iteration is denoted $\beta(t)$. As per [75], it is adapted within a specific range $[\beta_{min}, \beta_{max}]$ and defined as Equation 12.
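A single AβHC-style move can be sketched as below. This is an illustration of the two operators described above, not the paper's code: `n_width` stands in for $\mathcal{N}(t)$ and `beta` for $\beta(t)$, whose exact iteration-dependent schedules are given by Equations 11 and 12 in [15].

```python
import random

def abhc_step(r, lower, upper, n_width, beta):
    """One AβHC-style move on a continuous solution r (a list of floats)."""
    cand = list(r)
    # N-operator: perturb one randomly chosen dimension by at most n_width
    i = random.randrange(len(cand))
    cand[i] += random.uniform(-n_width, n_width)
    # β-operator: with probability beta, resample a dimension from its range
    for d in range(len(cand)):
        if random.random() < beta:
            cand[d] = random.uniform(lower[d], upper[d])
        cand[d] = min(max(cand[d], lower[d]), upper[d])  # clamp to range
    return cand
```

A greedy acceptance step (keep `cand` only if it improves the fitness) completes one βHC iteration, as described at the start of the next paragraph.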
Now, if the generated neighbor $R'$ is better than the solution under consideration $R$, then $R$ is replaced with $R'$.

C. PROPOSED BSF AND AβBSF ALGORITHMS
To map the continuous search space of the standard SFO to a binary one, we use a transfer function [76]. We have used the sigmoid transfer function, depicted in Figure 1 and expressed by Equation 13.
Now, using the probability values generated by Equation 13, the current position of the sailfish is updated as per Equation 14.
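The binarization step can be sketched as follows; the sigmoid form of Equation 13 and the probabilistic bit assignment of Equation 14 follow the usual definitions for sigmoid (S-shaped) transfer functions [76].

```python
import math
import random

def sigmoid(x):
    # Equation 13: squashes a continuous position component into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position):
    """Equation 14, as commonly defined for sigmoid transfer functions:
    bit d is set to 1 with probability sigmoid(x_d), else 0."""
    return [1 if random.random() < sigmoid(x) else 0 for x in position]
```

Each resulting 0/1 vector is then interpreted as a feature mask: a 1 in position $d$ means feature $d$ is selected.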
Generally, FS is a multi-objective problem with two objectives: (a) achieving the highest classification accuracy (a maximization problem), and (b) selecting the lowest number of features (a minimization problem). These two objectives are contradictory in nature. To resolve this, we consider the classification error rate instead of the accuracy; using Equation 15, the two objectives are combined and the FS problem is converted into a single-objective minimization problem.
where $S$ represents the selected feature subset, $|S|$ its cardinality (the number of selected features), $\gamma(S)$ the classification error rate of $S$, $D$ the original dimension of the dataset, and $\omega \in [0, 1]$ a weight. In SFO, exploration is taken care of [14] by the random initialization of the sailfish and sardine populations and by the encircling strategy that follows a hyper-sphere neighborhood. Exploitation is handled by the sardine population and the movement of sardines around the best sailfish and sardine. The parameter $ATK$ (defined in Equation 6) tries to balance the exploration and exploitation capabilities of the algorithm. Now, to find the optimal feature subset, the FS technique needs to find the global optimum, which requires proper exploration and exploitation of the search space. We have therefore used AβHC to enhance both the exploration and exploitation abilities of BSF.
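The combined objective of Equation 15 in the usual wrapper-FS form is a one-liner. The default `omega = 0.99` is a typical literature value, not necessarily the setting used in this paper's experiments (see Table 3 for the actual parameter values).

```python
def fitness(error_rate, num_selected, total_features, omega=0.99):
    """Equation 15: weighted sum of the classification error rate gamma(S)
    and the fraction of features kept |S|/D; lower is better."""
    return omega * error_rate + (1.0 - omega) * num_selected / total_features
```

For example, with $\omega = 0.9$, a subset with 10% error using 5 of 10 features scores $0.9 \cdot 0.1 + 0.1 \cdot 0.5 = 0.14$.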
The analysis of AβBSF shows that its worst-case time complexity is $O(MaxIter \times (N_{srd} \times t_{fitness} + D))$, where $MaxIter$ is the maximum number of iterations, $N_{srd}$ is the number of sardines, $t_{fitness}$ is the time required to calculate the fitness value of a particular agent using a given classifier, and $D$ is the dataset dimension.

IV. EXPERIMENTAL RESULTS
We have used the KNN classifier [77] with the Euclidean distance metric to measure the classification accuracy of the feature subset obtained by applying the proposed FS method on the entire (i.e., original) dataset. As per the recommendations provided in [10], [78], [79], we have set K = 5. For each dataset, 80% of the instances are used for training the model and the remaining 20% for testing. We apply the FS methods on the training data to determine which features to include in the selected feature subset; only those features are then retained in the test data, and the classification accuracy is measured on the test data using the KNN classifier. The proposed method is implemented in Python3 [80] and the graphs are plotted using Matplotlib [81].
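The wrapper evaluation described above can be sketched as follows. The use of scikit-learn is our assumption for illustration; the paper only states that the method was implemented in Python3.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(X, y, mask, k=5, seed=0):
    """Train 5-NN on 80% of the instances using only the features where
    mask == 1, and report accuracy on the held-out 20%."""
    cols = [i for i, m in enumerate(mask) if m == 1]
    if not cols:                 # an empty subset cannot classify anything
        return 0.0
    Xtr, Xte, ytr, yte = train_test_split(
        X[:, cols], y, test_size=0.2, random_state=seed, stratify=y)
    knn = KNeighborsClassifier(n_neighbors=k)  # Euclidean metric by default
    return knn.fit(Xtr, ytr).score(Xte, yte)
```

The returned accuracy (or its complement, the error rate) feeds directly into the fitness of Equation 15.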

A. DATASET DESCRIPTION
In order to assess the performance of BSF and AβBSF, 18 UCI datasets [17], selected from various backgrounds, have been considered. The description of these datasets is presented in Table 1, which shows that there are 15 bi-class and 3 multi-class datasets. These datasets are diverse in terms of both the number of attributes (features) and the number of instances; this diversity helps establish the robustness of the proposed methods.

B. PARAMETER TUNING
Two parameters are always very important for any multi-agent evolutionary algorithm: the population size and the maximum number of iterations. The population size characterizes how a single agent learns from other agents' experiences, whereas the iterations provide step-wise evolution of the agents. In order to find proper values for these two parameters, experiments have been performed by varying one parameter w.r.t. the other.
We have experimented with different values of $A$, $\kappa$ and $Prcnt$, mentioned in Equation 6 and Equation 4. Both $A$ and $\kappa$ are responsible for the decrease in attack power: $ATK$, decreasing linearly from $A$ to 0, is intended to lead the method from exploration towards exploitation. Following [14] and our own thorough experimentation, we have set $A = 4$ and $\kappa = 0.001$. In the case of $Prcnt$ (Equation 4), a decrease in value implies more sardines (larger $N_{srd}$), which in turn implies increased exploration; hence, with a decrease in $Prcnt$, the classification accuracy improves. However, as per the computational complexity of AβBSF mentioned in Section III-C, the time requirement also increases with $N_{srd}$. So, in order to maintain a trade-off between the obtained classification accuracy and the time requirement, we have set $Prcnt = 0.1$. Figure 2 shows the effect of the population size on the classification accuracy achieved by the proposed methods. Considering that the time requirement increases with population size, together with its effect on classification accuracy, we have decided to use a population size of 20 for all further experiments. Figure 3 shows the value of the fitness function at each iteration.

C. DISCUSSION
In this section, we discuss the results of the proposed BSF and AβBSF methods on the datasets described in Section IV-A. From Table 2, it is quite evident that the proposed methods have performed quite well for FS. BSF has achieved ≥ 90% accuracy on 11 (61.1%) datasets, whereas for AβBSF that count is 15 (83.3%). AβBSF has achieved 100% accuracy on 10 (55.6%) datasets, including Breastcancer, BreastEW, CongressEW, Lymphography, M-of-n, PenglungEW, Vote, WineEW, and Zoo, which is quite impressive. For the KrvskpEW dataset, it has achieved 99.06% accuracy. Observing Table 2, we can also conclude that AβHC significantly helps the BSF algorithm explore different parts of the search space and achieve better solutions: for datasets such as Exactly, HeartEW, M-of-n, PenglungEW, SonarEW, and SpectEW, the classification accuracies have improved by a significant margin (> 9%).

V. COMPARISON
To check the applicability of the proposed methods, we have compared them with 10 state-of-the-art methods: 4 popular meta-heuristic FS methods (GA, PSO, Ant Lion optimizer (ALO), and GSA) and 6 hybrid meta-heuristic FS methods. GWO and WOA are hybridized following three different strategies [78]: serial grey-whale optimizer (HSGW), random switching grey-whale optimizer (RSGW), and adaptive switching grey-whale optimizer (ASGW). WOASAT-2 [10] is a hybrid of WOA and SA. BGWOPSO [70] is developed by hybridizing GWO and PSO. In WOA-CM [82], the performance of WOA is enhanced by using crossover and mutation. The values of the control parameters of these methods are described in Table 3. Table 4 shows the performance of AβBSF in terms of achieved classification accuracy. For each dataset, the methods are ranked as per their achieved classification accuracies; the average rank is obtained by averaging over the 18 UCI datasets, and the assigned rank is the rank given to each method as per its average rank. From Table 4, it can be observed that AβBSF performs best in 16 cases (88.9%), which is quite high. On the Exactly2 dataset, it is second best after HSGW, whereas on the SonarEW dataset it is third best after BGA and RSGW.
It is worth mentioning that AβBSF outperforms BGA outright in 15 cases and ties in 2 cases; only on the SonarEW dataset does BGA perform better. AβBSF outperforms PSO in 16 cases with ties in 2 cases, and it completely outperforms both the BALO and BGSA methods on all 18 datasets, which is quite impressive.
The HSGW method outperforms AβBSF on the Exactly2 dataset; the two methods tie in 4 cases, while AβBSF outperforms HSGW in the remaining 13 cases. With respect to the RSGW method, AβBSF has 4 ties, 13 wins and 1 loss. The ASGW method is unable to outperform AβBSF on any dataset: against ASGW, AβBSF has 5 ties and 13 wins. With respect to the WOASAT-2 method, AβBSF has 0 losses, 2 ties and 16 wins. The BGWOPSO method also fails to outperform AβBSF in any case: the two methods tie in 4 cases, while AβBSF wins in the remaining 14. WOA-CM and AβBSF tie on the Exactly dataset, and AβBSF wins on the remaining 17 datasets. Table 5 shows the performance of AβBSF w.r.t. the number of selected features; here, the average rank is calculated by averaging the ranks obtained for each dataset based on the number of features selected. AβBSF has selected the lowest number of features on 6 datasets. For the SonarEW dataset, although AβBSF fails to achieve the highest classification accuracy, it selects the lowest number of features while achieving the third-best accuracy. So, considering both Table 4 and Table 5, we can say that AβBSF performs best among the state-of-the-art methods considered here for comparison. Figure 4 shows the average classification accuracies achieved by AβBSF and the 10 state-of-the-art methods used here for comparison; it clearly shows that AβBSF has achieved the highest average classification accuracy. Figure 5 shows the average number of features selected by AβBSF and the 10 state-of-the-art methods; from Figure 5, it can be observed that AβBSF has selected the second-lowest number of features.
To determine the statistical significance of the AβBSF algorithm, the Wilcoxon rank-sum test [83] has been performed. It is a non-parametric statistical test in which a pairwise comparison is performed. Here, the null hypothesis states that the two sets of results come from the same distribution. This implies that if the distributions of two results are statistically different, then the p-value generated from the test statistic will be < 0.05 when the test is performed at the 0.05 significance level, resulting in the rejection of the null hypothesis. From the test results provided in Table 6, we can conclude that the results of the proposed AβBSF algorithm are statistically significant.
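The test above is available as `scipy.stats.ranksums`. The accuracy values below are hypothetical placeholders, not the paper's numbers, used only to show the mechanics of the comparison.

```python
from scipy.stats import ranksums

# Hypothetical per-dataset accuracies for the proposed method and one
# baseline, collected over the same datasets (NOT the paper's results)
proposed = [1.00, 0.99, 0.98, 1.00, 0.97, 0.95, 1.00, 0.96]
baseline = [0.91, 0.90, 0.88, 0.93, 0.89, 0.85, 0.92, 0.87]

stat, p = ranksums(proposed, baseline)
reject_null = p < 0.05  # distributions differ at the 0.05 level
```

When `reject_null` is true, the null hypothesis of identical distributions is rejected, i.e., the difference between the two methods is statistically significant.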
A meta-heuristic algorithm can fail to find the optimal subset if (i) it cannot find the 'promising' area where the optimal solution (global optimum) may lie and gets stuck in a local optimum, (ii) it fails to properly search the promising areas discovered and cannot converge, or (iii) both. We have tried to address both issues in the proposed method. The proposed AβBSF method uses an encircling strategy [14] that provides a hyper-sphere neighborhood around the solutions. This ensures that all possible solutions are examined properly, in terms of both exploration and exploitation: the two main aspects of any meta-heuristic algorithm.
In terms of the factors present in the proposed methods, exploration is controlled by the sardine population and their diverse movement, and by the bandwidth of the neighborhood ($\mathcal{N}$). Exploitation is mainly taken care of by the movement of sailfishes around the best sardine, the sailfish's attack power, and the random-walk strategy in the $\beta$-operator.

VI. CONCLUSION
In this work, we have proposed a binary version of SFO to solve FS problems, using the sigmoid transfer function to convert the continuous SFO to BSF (its binary variant). Besides, we have hybridized BSF with the AβHC algorithm and proposed a binary hybrid meta-heuristic, AβBSF, for FS. Both the proposed BSF and AβBSF methods are applied on 18 standard UCI datasets. From the obtained results and the comparison with 10 state-of-the-art meta-heuristic FS approaches, we can conclude that the proposed AβBSF performs significantly well for solving FS problems. The proper functioning of the AβBSF algorithm, however, depends on the parameters of AβHC and SFO; their optimal values for a different set of problems may be completely different, which would require exhaustive experiments to determine, and this can be considered a limitation of the proposed method. Also, as per the No Free Lunch theorem [13], just like any other meta-heuristic optimization algorithm, the AβBSF algorithm is not guaranteed to produce the best results for all FS problems. In future studies, we can apply the AβBSF algorithm to different real-world problems like facial emotion recognition, handwriting recognition, and script recognition. It would also be interesting to hybridize BSF, as well as AβHC, with other meta-heuristic algorithms.