Hybrid of Harmony Search Algorithm and Ring Theory-Based Evolutionary Algorithm for Feature Selection

Feature Selection (FS) is an important pre-processing step in machine learning and data mining, and it has a major impact on the performance of the corresponding learning models. The main goal of FS is to remove irrelevant and redundant features, which reduces time and space requirements and enhances the performance of the learning model under consideration. Many meta-heuristic optimization techniques have been applied to FS problems because of their advantages over traditional optimization approaches. Here, we introduce a new hybrid meta-heuristic FS model based on the well-known Harmony Search (HS) algorithm and the recently proposed Ring Theory based Evolutionary Algorithm (RTEA), which we name Ring Theory based Harmony Search (RTHS). The effectiveness of RTHS has been evaluated on 18 standard UCI datasets and compared with 10 state-of-the-art meta-heuristic FS methods. The obtained results show the superiority of RTHS over the state-of-the-art methods considered here for comparison.


I. INTRODUCTION
In this era of computers and technology, with advancements in fields such as image processing, pattern recognition, financial analysis, business management, medical studies [1], [2] and many more, we have to deal with huge amounts of data whose dimensions are increasing every day. This has a great impact on the time and space requirements of the algorithms used in machine learning and data mining. A dataset may contain numerous features, but not all of them are useful or important for a particular task. Feature selection (FS), a data pre-processing step, can be used to remove irrelevant and redundant features [3], thereby reducing time and space requirements. These redundant features essentially act as noise, and removing them improves the performance of the corresponding machine learning or data mining algorithm [4]. Based on the criteria used to evaluate features, FS techniques fall into two categories [3]: filter and wrapper. A filter method evaluates features using predefined mathematical or statistical criteria (e.g., Relief [5], Information Gain [6], Laplacian Score [7], Chi-Square [8], Fisher Score [9]) and selects the most important features accordingly. A wrapper method, in contrast, uses a learning algorithm to evaluate feature subsets and selects the optimal subset for the task at hand [10]. Filter methods are generally faster than wrapper methods because they do not invoke a learning algorithm, whereas wrapper methods generally achieve higher accuracy [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Adnan Kavak.
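To make the filter idea concrete, here is a minimal Python sketch of one filter criterion: a Fisher score in one common formulation, namely the between-class scatter of a feature divided by its within-class scatter. The function names `fisher_scores` and `top_k_features` are illustrative only, and the exact score used in [9] may be normalized differently.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature by between-class scatter / within-class scatter.

    X is a list of samples (each a list of numbers), y the class labels.
    Higher scores indicate features that separate the classes better.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = mean(column)
        between = within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (mean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating (inf) or constant (0)
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, feature 1 barely does.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
print(top_k_features(scores, 1))  # [0]
```

A wrapper method would instead score each candidate subset by training and testing a classifier on it, which costs far more per evaluation.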
In recent times, meta-heuristic methods have become popular for solving various optimization problems due to their advantages over traditional optimization methods, such as avoidance of local optima, a derivative-free mechanism and flexibility [12]. Two important aspects of a meta-heuristic algorithm are exploration and exploitation [13]. Exploration is the ability to search the solution space for new potential solutions in each iteration while avoiding local optima, and exploitation is the search for a better solution in the neighborhood of the solutions obtained so far. A good meta-heuristic algorithm maintains a balance between exploration and exploitation.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this work, we introduce a new hybrid meta-heuristic based on the well-known Harmony Search (HS) algorithm [14] and the recently proposed Ring Theory based Evolutionary Algorithm (RTEA) [15]. HS is inspired by an artificial phenomenon, musical harmony. Just as musical performers seek the best state (a fantastic harmony) as determined by aesthetic estimation, HS seeks the best state (the global optimum) as determined by a fitness or objective function. RTEA draws its inspiration from algebraic (ring) theory applied to the evolutionary process. Generally, two models are followed for hybridizing meta-heuristic algorithms [16]: low level and high level. In low-level hybridization, a function in one meta-heuristic is replaced by another meta-heuristic. In high-level hybridization, the base meta-heuristics are executed in sequence. We have hybridized HS and RTEA in a high-level fashion that follows the pipeline model, where each meta-heuristic works on the output of the previous one. To the best of our knowledge, this is the first time HS has been hybridized with RTEA for solving FS problems. In a nutshell, the main contributions of this work are as follows:
• RTHS: A new FS method named Ring Theory based Harmony Search is introduced, built on the popular meta-heuristic HS and the recently proposed RTEA.
• The proposed hybrid FS approach is evaluated on 18 standard UCI datasets [17] using K-nearest Neighbors (KNN), Random Forest, and Naive Bayes classifiers.
• The proposed FS approach is compared with 10 state-of-the-art meta-heuristic FS methods.
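The high-level (pipeline) hybridization described above can be sketched as a chain of stages, each receiving the population produced by the previous one. This is a hypothetical illustration: `run_pipeline` and `make_hill_climb_stage` are made-up names, and the toy stages (bit-flip hill climbing on a count-the-ones objective) merely stand in for real HS and RTEA stages.

```python
import random
from typing import Callable, List

Population = List[List[int]]
Stage = Callable[[Population], Population]


def run_pipeline(stages: List[Stage], population: Population) -> Population:
    """High-level (relay) hybrid: each stage refines the previous stage's output."""
    for stage in stages:
        population = stage(population)
    return population


def ones(bits: List[int]) -> int:
    """Toy objective to maximize: number of 1-bits."""
    return sum(bits)


def make_hill_climb_stage(steps: int, rng: random.Random) -> Stage:
    """Hypothetical stand-in for one meta-heuristic: bit-flip hill climbing."""
    def stage(population: Population) -> Population:
        refined = []
        for bits in population:
            best = list(bits)
            for _ in range(steps):
                candidate = list(best)
                d = rng.randrange(len(candidate))
                candidate[d] = 1 - candidate[d]
                if ones(candidate) >= ones(best):  # never accept a worse move
                    best = candidate
            refined.append(best)
        return refined
    return stage


rng = random.Random(42)
start = [[0] * 10 for _ in range(3)]
result = run_pipeline([make_hill_climb_stage(5, rng), make_hill_climb_stage(5, rng)], start)
print([ones(bits) for bits in result])
```

The design point is only the data flow: swapping the order of the stages or adding a third one requires no change to `run_pipeline`.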

II. LITERATURE REVIEW
Meta-heuristic algorithms are categorized in different ways in the literature: single-solution based versus population based [18], nature inspired versus non-nature inspired [19], and metaphor based versus non-metaphor based [20]. From the point of view of 'inspiration', these algorithms can also be divided into four categories [21]: evolutionary, swarm inspired, physics based and human behavior related. FS is a binary optimization problem, and most of the standard and popular optimization algorithms introduced so far in the literature have been applied to it. Applications of the Genetic Algorithm (GA) to FS can be found in [22]-[25]. Particle Swarm Optimization (PSO) based FS methods can be found in [26]-[28]. Ant Colony Optimization (ACO) and Gravitational Search Algorithm (GSA) based FS methods can be found in [29] and [30], respectively. The different types of meta-heuristic algorithms, grouped by source of inspiration, are briefly described here.
• Evolutionary algorithms are inspired by biology. They use mutation and crossover to evolve a randomly generated initial population over iterations and eliminate the worst solutions in order to obtain better ones. GA [31] is inspired by biological evolution, and mutation and crossover are two of its most common operators. Mutation operates on a single solution and generally changes a feature randomly or according to some pre-defined criterion. Crossover, on the other hand, operates on two parent solutions to produce two offspring, yielding new and potentially better solutions. Co-evolving algorithm [32], Differential Evolution (DE) [33], Genetic Programming [34], Evolutionary Programming [35], Biogeography-based optimizer [36] and Stochastic Fractal Search [37] are some of the well-known evolutionary algorithms.
• Swarm inspired algorithms mimic the individual and social behavior of swarms, herds, schools, teams or any other group of animals. Every individual has its own behavior, but the collective behavior of the individuals helps to solve complex optimization problems. One of the most popular swarm inspired algorithms is PSO [38], which simulates social behavior such as the movement of organisms in a bird flock or fish school. It searches for the optimal solution through agents referred to as particles. The movement of each particle is influenced by the best positions found so far (its own and that of the swarm), which are updated whenever a better solution is found. Another approach in this category is ACO [39], inspired by the foraging behavior of ant species. Grey Wolf Optimizer (GWO) [12] is inspired by the leadership hierarchy and hunting mechanism of grey wolves. Four types of grey wolves, namely α, β, δ and ω, are employed to replicate the leadership ranking, and the three main steps of hunting, namely searching for prey, surrounding the prey and attacking the prey, drive the evolution. Ant Lion Optimizer (ALO) [40] mimics the hunting strategy of antlions in nature.
• Physics based algorithms are inspired by physical processes in nature. The inspiring processes range from music and metallurgy to mathematics, physics, chemistry and complex dynamic systems. One of the oldest algorithms in this category is Simulated Annealing (SA) [50], which mimics the annealing [51] process of metals studied in metallurgy and materials science. Another popular method in this category is GSA [30], which is based on gravity and mass interaction: the search agents are treated as a collection of masses that interact with each other according to Newton's law of gravitation and the laws of motion. HS, which is used in the present work, belongs to this category. Some other methods in this category are Self-propelled particles [52], Black hole optimization [53], Charged system search [54], Sine Cosine algorithm [55] and Multiverse optimizer [56].
• Human related algorithms are inspired by human behavior and interactions. Teaching-Learning-Based Optimization [57] is a very popular method in this category, which models the influence of a teacher on the grades of the learners in a class. The Imperialist Competitive Algorithm [58] is inspired by the human socio-political evolution process. Here, the population is divided into two categories: colonies and imperialist states. The algorithm rests on the competition among imperialists to take control of the colonies. At the end of the competition, only one imperialist stands out as the victor and takes control of all the colonies, while the weak empires collapse. Some other methods in this category are Society and civilization [59], League championship algorithm [60], Tug of war optimization [61] and Volleyball premier league algorithm [62].
Nowadays, hybrid meta-heuristic algorithms are frequently used for solving FS problems. Hybrid meta-heuristics have proven to be an efficient approach for achieving better performance in various real-life problems [63]. In [64], the first hybrid meta-heuristic method for FS was proposed by combining GA with a local search algorithm. A hybrid of Markov chain and SA is proposed in [65]. Memetic algorithm and Late acceptance hill climbing have been hybridized and used for FS in facial emotion recognition [66]. In [3], the Spotted Hyena optimizer is combined with SA and used for FS on UCI datasets. A hybrid of GA and SA has been used for FS on UCI datasets in [67]. In [68], the Salp Swarm algorithm (SSA) is hybridized with Opposition Based Learning (OBL) and a local search method and then applied to FS on UCI datasets. In [69], GA and PSO have been hybridized for FS and applied to Digital Mammogram datasets. In [70], the hybrid of GWO and PSO has been applied to FS on UCI datasets. A hybrid of PSO and GSA can be found in [71], and a hybrid of ACO and GA has been proposed in [72].
In [73], a hybrid of DE and ABC has been proposed for FS and applied to UCI datasets. This naturally raises a question: why do we need a new hybrid meta-heuristic algorithm for solving FS problems when so many such algorithms already exist? The question is best answered by the No Free Lunch theorem proposed in [74], which states that no single optimization algorithm is best suited for solving every type of optimization problem. With each new optimization algorithm based on some natural or artificial phenomenon, researchers primarily aim to give the algorithm a new facet that achieves a better trade-off between exploration and exploitation, so that it escapes local optima and converges towards the global optimum. Nevertheless, accomplishing these objectives is not simple, particularly when the algorithm must be applicable to different domains. This motivates researchers to develop optimization algorithms that improve on those proposed previously. It is what keeps research in the field of FS active, and it motivates us to propose a new hybrid FS algorithm, called RTHS, based on the HS method [14] and the recently proposed RTEA [15].
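As a concrete illustration of the two GA operators described in the evolutionary-algorithms item of the literature review, the sketch below implements textbook bit-flip mutation and one-point crossover on binary feature masks. These are generic versions chosen for illustration; the GA variants cited above may use other mutation or crossover schemes.

```python
import random


def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def one_point_crossover(parent_a, parent_b, rng):
    """Cut both parents at one random point and swap their tails."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length (at least 2)")
    cut = rng.randint(1, len(parent_a) - 1)  # both children keep genes from both parents
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


rng = random.Random(7)
child_1, child_2 = one_point_crossover([0] * 6, [1] * 6, rng)
print(child_1, child_2)
print(bit_flip_mutation(child_1, 0.1, rng))
```

Mutation touches one solution, while crossover needs two parents and returns two offspring, matching the distinction drawn in the text.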

III. PRELIMINARIES
A. HARMONY SEARCH: AN OVERVIEW
The HS algorithm [14] is inspired by an artificial phenomenon, musical harmony. It transforms the qualitative improvisation process into a quantitative optimization process with some well-defined rules, thus turning the beauty and harmony of music into solutions for optimization problems. Just as musical performers seek a fantastic harmony, this algorithm seeks the best state as determined by the fitness (objective) function. Just as music can be improved for better aesthetic estimation, the fitness value can be improved in every iteration in order to find a better solution. The three important components of this algorithm are harmony memory, pitch adjustment and randomization. The HS algorithm does not require derivative (gradient) information, so it can handle discontinuous as well as continuous functions. The basic steps of the HS algorithm are as follows:
Step 1: Randomly initialize the population, here referred to as the Harmony Memory (HM).
Step 2: Improvise a new harmony from it.
Step 3: If the new harmony is better than the worst harmony in HM, replace the worst harmony with it.
Step 4: If the stopping criterion is not met, go to Step 2.
Memory consideration alone assumes that all parts of the global solution initially exist in HM, which is not always the case. So, to bring in diversity, the algorithm uses the Harmony Memory Considering Rate (HMCR), HMCR ∈ [0, 1]. If this rate is too low, very few elite harmonies are selected and the algorithm may converge too slowly. On the other hand, if this rate is extremely high (near 1), the pitches in HM are mostly used and other ones are not explored well, which may not lead to good solutions. Therefore, HMCR ∈ [0.7, 0.95] is typically used [75]. Another factor, called the Pitch Adjustment Rate (PAR), is introduced by mimicking the pitch adjustment procedure. It produces a new pitch by adding a small random amount to an existing pitch.
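Steps 1 to 4, together with HMCR and PAR, can be sketched for binary (FS-style) harmonies as follows. This is an illustrative sketch, not the authors' code: we assume that pitch adjustment on a binary variable means flipping the bit, that lower fitness is better, and that the stopping criterion is a fixed iteration budget. The default parameter values mirror the settings reported in Section V-B (population size 20, 30 iterations, HMCR 0.8, PAR 0.2).

```python
import random


def improvise(hm, hmcr, par, rng):
    """Step 2: build a new binary harmony one decision variable at a time."""
    new = []
    for d in range(len(hm[0])):
        if rng.random() < hmcr:
            # Memory consideration: copy the d-th value of a random stored harmony
            value = rng.choice(hm)[d]
            if rng.random() < par:
                # Pitch adjustment: assumed to be a bit flip for binary variables
                value = 1 - value
        else:
            # Randomization: draw a fresh value with probability 1 - HMCR
            value = rng.randint(0, 1)
        new.append(value)
    return new


def harmony_search(fitness, dim, pop_size=20, max_iter=30, hmcr=0.8, par=0.2, seed=0):
    """Minimize `fitness` over binary vectors of length `dim`."""
    rng = random.Random(seed)
    # Step 1: random initial Harmony Memory (HM)
    hm = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(h) for h in hm]
    for _ in range(max_iter):  # Step 4: stop after a fixed iteration budget
        new = improvise(hm, hmcr, par, rng)
        new_score = fitness(new)
        worst = max(range(pop_size), key=scores.__getitem__)
        if new_score < scores[worst]:  # Step 3: replace the worst harmony
            hm[worst], scores[worst] = new, new_score
    best = min(range(pop_size), key=scores.__getitem__)
    return hm[best], scores[best]


# Toy objective: Hamming distance to a hidden target mask (0 is optimal).
target = [1, 0, 1, 1, 0, 0, 1, 0]


def distance(h):
    return sum(a != b for a, b in zip(h, target))


best, score = harmony_search(distance, len(target), max_iter=300, seed=1)
print(best, score)
```

Because Step 3 only ever replaces the worst harmony with a better one, the best score in HM can never get worse from one iteration to the next.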
Below, we discuss applications of the HS algorithm as a meta-heuristic in different domains, as well as some of its modifications found in the literature. In [76], the authors tried to generate new solution vectors that improve the accuracy and convergence rate of the HS algorithm and applied the method to constrained functions (minimization of the weight of a spring, pressure vessel design, welded beam design, etc.) and unconstrained functions. This work mainly discusses the effect of the constant parameters of the HS algorithm. Another work, reported in [77], describes a new version of the HS algorithm for engineering optimization problems with continuous design variables. This algorithm has been applied to unconstrained function minimization problems (Rosenbrock function, Eason and Fenton's gear train inertia function, Wood function, Powell quartic function, etc.), constrained function minimization problems, structural engineering optimization problems (pressure vessel design, welded beam design, etc.) and more. In [78], a new variant of the HS algorithm, known as Global-best Harmony Search (GHS), has been proposed. This algorithm borrows ideas from swarm intelligence to improve the performance of the HS algorithm, and it has been applied to the Sphere function, Schwefel's problem, Step function, Rosenbrock function, Rastrigin function, etc. In [79], the authors present a cost minimization model for the design of water distribution networks. It is applied to five water distribution networks: the Two-loop, Hanoi, New York City, GoYang and BakRyun water distribution networks. The work in [75] reviews and analyzes the HS algorithm in the context of meta-heuristic algorithms.
Algorithm 1: Pseudocode of HS Algorithm. Input: popSize, maxIter, HMCR, PAR. The algorithm begins by randomly generating the initial population HM(0).

B. RING THEORY BASED EVOLUTIONARY ALGORITHM
RTEA [15] is a recently proposed approach for solving combinatorial problems using algebraic theory. A global exploration operator (R-GEO) and a local development operator (R-LDO) are defined using the addition, multiplication and inverse operations of the direct product of rings. New individuals are then generated by R-GEO and R-LDO, and a greedy strategy decides which of them survive.

1) SOME PROPERTIES OF RING
a: DEFINITION OF RING [80]:
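A ring (R, +, ·) is a nonempty set R together with two binary operations, + and ·, defined on R such that:
(i) (R, +) is an abelian group;
(ii) multiplication is associative, i.e., (a · b) · c = a · (b · c) for all a, b, c ∈ R;
(iii) the distributive laws hold, i.e., a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c for all a, b, c ∈ R.
For example, let Z_n = {0, 1, ..., n − 1}, where n ≥ 2 and Z is the set of all integers. We define two binary operations on Z_n as follows:
$$a \oplus b = (a + b) \bmod n, \qquad a \otimes b = (a \times b) \bmod n, \qquad a, b \in \mathbb{Z}_n.$$
Hence, Z_n is a ring with operations ⊕ and ⊗.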

b: DIRECT PRODUCT OF RINGS [80]:
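The direct product of rings is another ring, each of whose elements is an ordered m-tuple. If R_i are rings for i ∈ I = {1, 2, ..., m}, then
$$\prod_{i \in I} R_i = R_1 \times R_2 \times \cdots \times R_m$$
is a ring under componentwise addition and multiplication.
R-GEO reflects the global exploration ability of RTEA, and R-LDO, which acts as the local search operator, is given by 2.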
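The ring operations above are easy to check mechanically. The following sketch (helper names are our own, not from [15] or [80]) implements ⊕, ⊗ and the additive inverse on Z_n, verifies the ring axioms exhaustively for small n, and extends the operations componentwise to a direct product Z_{r1} × ... × Z_{rD}.

```python
from itertools import product


def z_add(a, b, n):
    """a ⊕ b in Z_n."""
    return (a + b) % n


def z_mul(a, b, n):
    """a ⊗ b in Z_n."""
    return (a * b) % n


def z_neg(a, n):
    """Additive inverse of a in Z_n, so that a ⊕ z_neg(a) = 0."""
    return (-a) % n


def is_ring_zn(n):
    """Exhaustively check the ring axioms for (Z_n, ⊕, ⊗)."""
    elems = range(n)
    for a, b, c in product(elems, repeat=3):
        if z_add(z_add(a, b, n), c, n) != z_add(a, z_add(b, c, n), n):
            return False  # ⊕ must be associative
        if z_mul(z_mul(a, b, n), c, n) != z_mul(a, z_mul(b, c, n), n):
            return False  # ⊗ must be associative
        if z_mul(a, z_add(b, c, n), n) != z_add(z_mul(a, b, n), z_mul(a, c, n), n):
            return False  # left distributivity
        if z_mul(z_add(a, b, n), c, n) != z_add(z_mul(a, c, n), z_mul(b, c, n), n):
            return False  # right distributivity
    commutative = all(z_add(a, b, n) == z_add(b, a, n) for a, b in product(elems, repeat=2))
    inverses = all(z_add(a, z_neg(a, n), n) == 0 for a in elems)
    return commutative and inverses


def dp_add(x, y, moduli):
    """Componentwise ⊕ in the direct product Z_{r1} x ... x Z_{rD}."""
    return [z_add(a, b, n) for a, b, n in zip(x, y, moduli)]


def dp_mul(x, y, moduli):
    """Componentwise ⊗ in the direct product."""
    return [z_mul(a, b, n) for a, b, n in zip(x, y, moduli)]


def dp_neg(x, moduli):
    """Componentwise additive inverse in the direct product."""
    return [z_neg(a, n) for a, n in zip(x, moduli)]


print(is_ring_zn(6), dp_add([1, 0, 1], [1, 1, 0], [2, 2, 2]))
```

When every modulus equals 2 (the binary case used for FS), `dp_add` reduces to bitwise XOR, `dp_mul` to bitwise AND, and `dp_neg` leaves a vector unchanged.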
Concisely, R-LDO and R-GEO are used to generate new individuals, and a greedy strategy is used to select individuals to form the new generation.
FS aims to select a subset F of o features from the original D features, where o < D, such that F has a lower classification error rate than any other subset of the same size or any proper subset of F. FS is a binary optimization problem, where 1 indicates that the corresponding feature is selected and 0 indicates that it is discarded. So, we select the features having value 1 and discard those having value 0. Our main goal is to decrease the number of 1's while increasing the classification accuracy. In the original paper, RTEA was applied to the Knapsack problem (KP), and two cases were considered: first, the problem has a feasible solution, and second, the opposite, i.e., there is no feasible solution to the problem under consideration. Since we assume that potential solutions exist in the search space, we consider the first case [15], which is itself a binary problem. The direct product Z[r_1, r_2, ..., r_D] from the previous section thus becomes Z[2, 2, ..., 2], so we do not need any transfer function. We must also consider another major aspect of FS: achieving higher classification accuracy and selecting the lowest number of features are contradictory objectives. To handle this, the classification error rate is considered here, and Equation 2 combines these two objectives.
where F represents the set of selected features, |F| represents the number of selected features, ζ(F) represents the classification error rate of F, D is the original dimension (total number of features) of the dataset, and ω ∈ [0, 1] represents a weight. The HS algorithm searches for the global optimum with the help of HMCR and uses PAR to escape local optima. RTEA uses R-LDO for local search and R-GEO for global search. As FS is a binary optimization problem and the population contains binary values, R-LDO has been modified slightly; this modified version is given by Equation 3.
where X_d(t) represents the d-th dimension of solution X in the t-th iteration. The direct product of rings is also a ring, and since we use the {0, 1} version of RTEA, the results obtained remain in {0, 1}. With the help of R-GEO, RTEA performs global exploration, and with R-LDO it performs exploitation, so there is a balance between the exploration and exploitation phases. The flowchart of our proposed method is given in Figure 1.
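Assuming Equation 2 takes the weighted-sum form that is common in wrapper FS, fitness = ω·ζ(F) + (1 − ω)·|F|/D, it can be sketched as below using the symbols defined above. The name `fs_fitness`, the stand-in `constant_error`, the value ω = 0.9 in the example, and the rule that an empty subset receives the worst score are all our own assumptions, not values taken from the paper.

```python
def fs_fitness(mask, error_rate, omega):
    """Assumed form of Equation 2: omega * error + (1 - omega) * |F| / D.

    mask: binary list of length D (1 = feature selected).
    error_rate: callable mapping the list of selected feature indices to the
        classification error rate zeta(F) in [0, 1] (e.g. from a wrapped KNN).
    omega: weight in [0, 1]; lower fitness is better.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    if not selected:
        return 1.0  # assumption: an empty subset gets the worst possible score
    return omega * error_rate(selected) + (1 - omega) * len(selected) / len(mask)


def constant_error(selected):
    """Hypothetical stand-in for a real classifier: always 10% error."""
    return 0.1


print(fs_fitness([1, 0, 0, 1], constant_error, omega=0.9))
```

With equal error rates, the subset with fewer selected features gets the lower (better) fitness, which is how the weighted sum trades accuracy against subset size.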
The HS algorithm depends on the values of HMCR and PAR to find the global best solution. HMCR [76] mainly helps to find the 'promising' areas where the optimal (global best) solution may lie, i.e., it ensures exploration. PAR [76], on the other hand, helps to search the already discovered areas properly, i.e., it ensures exploitation. Therefore, the chance of obtaining the best solution depends on HMCR and PAR, but their values must be defined at the beginning of the HS algorithm. Appropriate values are needed for the algorithm to converge, and the same set of values may not produce optimal results for all problems. In [76], the authors tried to address this issue with an iterative approach, but precision then becomes another problem: as precision increases by a 10% margin, the time requirement increases exponentially. Here, we tackle this problem with a completely different approach: we enhance both the exploration and exploitation operations of the HS algorithm using the recently proposed meta-heuristic RTEA. In each iteration, we not only 'enhance' a particular harmony by finding a fitter neighbor using the R-LDO operator, but also improve the 'improvise a new harmony' step of the HS algorithm with the help of the R-GEO operator. This reduces the dependence of the HS algorithm on the initial values of HMCR and PAR for reaching the global optimum. R-GEO aids exploration by randomly selecting four harmonies from the current HM (population) and checking whether a better harmony can be formed from them. R-LDO improves exploitation by searching for a better neighbor of a particular harmony. The results shown in subsection V-C and section VI validate this claim.
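The interplay described above can be illustrated with a heavily hedged sketch of one RTHS-style pass over the harmony memory. Three assumptions are made here, because Equation 3 and the flowchart in Figure 1 are not reproduced in this text: R-GEO is taken as the DE-like combination y = a ⊕ (b ⊗ (c ⊕ (−d))) of four randomly drawn harmonies over Z_2; the binary R-LDO flips each bit with probability Prb_m; and a greedy rule keeps a candidate only if it beats the harmony it would replace. The function names are hypothetical, and this is not the authors' exact control flow.

```python
import random


def r_geo(a, b, c, d):
    """ASSUMED binary R-GEO over Z_2: y = a ⊕ (b ⊗ (c ⊕ (-d))).

    In Z_2, -d = d, ⊕ is XOR and ⊗ is AND, so this is a DE-like combination
    of four harmonies. The exact operator in [15] may differ.
    """
    return [ai ^ (bi & (ci ^ di)) for ai, bi, ci, di in zip(a, b, c, d)]


def r_ldo(x, prb_m, rng):
    """ASSUMED binary R-LDO: flip each bit independently with probability prb_m."""
    return [1 - xi if rng.random() < prb_m else xi for xi in x]


def rths_step(hm, scores, fitness, prb_m, rng):
    """One hypothetical RTHS-style pass over the harmony memory (minimization).

    For every harmony, R-GEO combines four randomly drawn harmonies, R-LDO
    perturbs the result, and a greedy rule keeps the candidate only if it
    improves on the harmony it would replace.
    """
    for i in range(len(hm)):
        a, b, c, d = rng.sample(hm, 4)  # needs a population of at least 4
        candidate = r_ldo(r_geo(a, b, c, d), prb_m, rng)
        candidate_score = fitness(candidate)
        if candidate_score < scores[i]:
            hm[i], scores[i] = candidate, candidate_score
    return hm, scores


rng = random.Random(3)
target = [1, 1, 0, 0, 1, 0]


def distance(h):
    return sum(x != t for x, t in zip(h, target))


hm = [[rng.randint(0, 1) for _ in target] for _ in range(8)]
scores = [distance(h) for h in hm]
before = list(scores)
rths_step(hm, scores, distance, prb_m=0.005, rng=rng)
print(before, scores)
```

The greedy rule guarantees that no harmony's fitness gets worse within a pass, which is the property the greedy selection in RTEA is meant to provide.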

V. RESULTS AND DISCUSSION
We have evaluated the proposed RTHS algorithm using three popular classifiers, KNN [81], Random Forest [82] and Naive Bayes [83], to assess its effectiveness. For each dataset, 80% of the instances are used to train the model and the remaining 20% are used for testing. We apply the FS methods to the training data to determine the useful features, which form the optimal feature subset. Only these features are then selected from the test data, and the test classification accuracy is measured on them using the above-mentioned classifiers. The proposed FS method is implemented in Python3 [84], and the graphs are plotted using Matplotlib [85].
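The evaluation protocol above (an 80/20 split, FS on the training part only, then keeping the same columns in the test part) can be sketched with the standard library alone. The helper names are hypothetical, and we assume a plain random split, since the text does not say whether the split was stratified.

```python
import random


def split_80_20(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out `test_fraction` of the instances for testing."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = round(len(order) * test_fraction)
    test_ids, train_ids = order[:n_test], order[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    train_X, train_y = pick(train_ids)
    test_X, test_y = pick(test_ids)
    return train_X, train_y, test_X, test_y


def project(rows, mask):
    """Keep only the columns whose mask bit is 1 (the selected features)."""
    return [[value for value, bit in zip(row, mask) if bit == 1] for row in rows]


rows = [[i, i * 10, i % 3] for i in range(10)]
labels = [i % 2 for i in range(10)]
train_X, train_y, test_X, test_y = split_80_20(rows, labels)
mask = [1, 0, 1]  # e.g. a mask found by the FS method on the training data only
print(project(test_X, mask))
```

Keeping the mask search confined to the training rows is what prevents the reported test accuracy from being biased by the feature selection itself.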

A. DATASET DESCRIPTION
In order to examine the performance of the HS, RTEA and RTHS algorithms, 18 standard UCI datasets [17] have been considered. These datasets are selected from various backgrounds. Their description is presented in Table 1, which shows that there are 14 bi-class and 4 multi-class datasets. The datasets are diverse in terms of the number of features and instances, and this mixture helps us establish the robustness of the proposed FS method.

B. PARAMETER TUNING
For multi-agent evolutionary algorithms, both the population size and the maximum number of iterations play a significant role: the former characterizes how well one agent learns from the experiences of others, and the latter governs the step-by-step evolution of the agents. To find appropriate values for these two parameters, we have performed experiments varying one parameter with respect to the other. Figure 2 shows the effect of the population size on the classification accuracy achieved by the proposed FS method. Considering that the time requirement grows with the population size, as well as the effect of the population size on classification accuracy, we have fixed the population size to 20, the maximum number of iterations to 30, HMCR to 0.8, PAR to 0.2 and Prb_m to 0.005 (as suggested in [15]) for all further experiments. Figure 3 shows the best value of the fitness function in each iteration.
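The fixed settings can be captured in a small, validated configuration object. The class and field names below are our own, and the evaluation-budget property assumes one fitness evaluation per harmony per iteration, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTHSConfig:
    """Settings reported in Section V-B (field names are our own)."""
    pop_size: int = 20
    max_iter: int = 30
    hmcr: float = 0.8
    par: float = 0.2
    prb_m: float = 0.005

    def __post_init__(self):
        if self.pop_size < 4:
            # R-GEO draws four distinct harmonies from the memory
            raise ValueError("pop_size must be at least 4")
        if self.max_iter < 1:
            raise ValueError("max_iter must be positive")
        for name in ("hmcr", "par", "prb_m"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

    @property
    def fitness_evaluations(self) -> int:
        """Rough budget, assuming one evaluation per harmony per iteration."""
        return self.pop_size * self.max_iter


config = RTHSConfig()
print(config, config.fitness_evaluations)
```

Freezing the dataclass keeps one experiment's settings from being mutated mid-run, and the validation catches rates outside [0, 1] early.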

C. DISCUSSION
This section reports the results of the proposed FS method, RTHS, on the datasets mentioned in Section V-A. Table 2, Table 3 and Table 4 present the results obtained by the proposed RTHS algorithm, compared with RTEA and the HS algorithm, using the KNN, Random Forest and Naive Bayes classifiers, respectively. These results show that the proposed RTHS algorithm performs best on the UCI datasets with the KNN classifier. Besides, KNN is widely used in the literature for FS on UCI datasets [10], [86], [87]. Hence, for further experiments and analysis, we use only the KNN classifier with K = 5.
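For readers unfamiliar with the wrapped classifier, a minimal KNN with K = 5 looks like the sketch below. It uses Euclidean distance and a simple majority vote, which are common defaults; the paper does not state its distance metric or tie-breaking rule, and the function names are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common breaks ties by first occurrence, favoring the class of the closer neighbors
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train_X, train_y, q, k) == t for q, t in zip(test_X, test_y))
    return correct / len(test_y)


train_X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_accuracy(train_X, train_y, [[0.5], [11.5]], ["a", "b"], k=5))  # 1.0
```

In a wrapper setting, `1 - knn_accuracy(...)` on the selected columns would play the role of the error rate ζ(F).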
From Table 2, it is evident that the proposed FS method has performed well. The RTHS algorithm has achieved accuracy > 90% for 15 datasets (83.33%) and 100% accuracy for 9 datasets (50%): CongressEW, Exactly, Ionosphere, M-of-n, PenglungEW, Sonar, Vote, WineEW and Zoo. Out of 18 datasets, it achieves the highest accuracy in 17 cases (94.44%). Compared with the HS algorithm, RTHS performs better in 15 cases and produces equivalent results in the remaining 3 cases. Compared with RTEA, both achieve the same result in 12 cases and RTHS outperforms RTEA in 5 cases; only for the Tic-tac-toe dataset could RTHS not outperform RTEA in terms of classification accuracy. As for the number of selected features, the RTHS algorithm selects the least number of features in 15 cases (83.33%). A careful look at Table 2 reveals that, in terms of the number of selected features, the RTHS algorithm outperforms both the HS algorithm and RTEA by a significant margin in most cases. It outperforms the HS algorithm in 11 cases and gives equivalent results in 4 cases (CongressEW, Exactly2, HeartEW and M-of-n datasets), whereas for the Exactly, Ionosphere and Tic-tac-toe datasets the HS algorithm produces better results than RTHS. RTEA, on the other hand, does not perform better than RTHS in any case, though it gives equivalent results in 4 cases (Breastcancer, BreastEW, Exactly2 and M-of-n datasets).
To visualize these results, bar charts comparing the average accuracy achieved and the number of features selected by the three algorithms (RTHS, HS and RTEA) with the KNN classifier are plotted in Figure 4 and Figure 5, respectively. These charts show that the three FS methods perform almost equivalently in terms of average accuracy, but RTHS has the upper hand in terms of the average number of features selected.
From Table 3, it is clear that the proposed FS method performs better than both RTEA and the HS algorithm. It has achieved > 90% accuracy for 14 datasets (77.78%) and 100% accuracy for 7 datasets (38.89%): BreastEW, IonosphereEW, M-of-n, PenglungEW, Vote, WineEW and Zoo. It achieves the highest accuracy for all 18 datasets, with ties for 10 datasets with RTEA and 3 datasets with the HS algorithm. This shows that RTHS still performs best with the Random Forest classifier. Figure 6 compares the average accuracies achieved by the three algorithms using the Random Forest classifier. In terms of the number of selected features, however, RTHS is slightly behind the HS algorithm and RTEA: it selects the least number of features for 8 datasets (44.44%), with ties for 8 datasets with RTEA and 4 datasets with the HS algorithm. From this perspective, the RTHS algorithm may seem less efficient, but given the classification accuracy it achieves, its overall results are better than those of the others. Figure 7 compares the number of features chosen by the three algorithms using the Random Forest classifier and shows that, on average, RTHS selects fewer features than RTEA.
From Table 4, the RTHS algorithm has achieved the highest accuracy for all 18 UCI datasets, with ties for 4 datasets with the HS algorithm and 9 datasets with RTEA. For 14 datasets (77.78%), the proposed FS method has achieved accuracy > 90%, and for 7 datasets (38.89%), namely BreastEW, IonosphereEW, Lymphography, PenglungEW, Vote, WineEW and Zoo, it has achieved 100% accuracy. These results show that the RTHS algorithm produces the best classification accuracy. Figure 8 compares the average accuracies achieved by the three algorithms using the Naive Bayes classifier. Regarding the number of selected features, Figure 9 is informative: for 13 datasets (72.22%), the RTHS algorithm selects the least number of features along with ties …
Based on the above discussion and Table 2, we can conclude that RTHS significantly helps both the HS algorithm and RTEA explore different parts of the search space and find better solutions in terms of both classification accuracy and the number of selected features.
When comparing classifiers, a classifier is considered better if it achieves a higher classification accuracy; if two classifiers achieve the same accuracy, the number of selected features is used to break the tie. From Table 2, Table 3 and Table 4, we can also conclude that the KNN classifier produces better results than both the Random Forest and Naive Bayes classifiers.
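The comparison rule above (accuracy first, fewer selected features as the tie-breaker) can be written as a small comparator. The function name, the tolerance for treating two accuracies as equal, and the example pairs are made up for illustration and are not values from Tables 2 to 4.

```python
def compare_results(result_a, result_b, tol=1e-9):
    """Return "a", "b" or "tie" for two (accuracy, n_selected_features) pairs.

    Higher accuracy wins; if accuracies are equal (within `tol`),
    the result with fewer selected features wins.
    """
    acc_a, feats_a = result_a
    acc_b, feats_b = result_b
    if abs(acc_a - acc_b) > tol:
        return "a" if acc_a > acc_b else "b"
    if feats_a != feats_b:
        return "a" if feats_a < feats_b else "b"
    return "tie"


print(compare_results((0.95, 12), (0.95, 7)))  # b
```

The small tolerance guards against floating-point noise when accuracies that should be equal are computed in slightly different ways.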
Comparing the results of the KNN and Random Forest classifiers, the KNN classifier performs better than Random Forest for 13 datasets (72.22%), considering both the classification accuracy and the number of selected features. For the BreastEW, Tic-tac-toe and WaveformEW datasets, Random Forest has the upper hand over KNN. For the Exactly2 and M-of-n datasets, the two classifiers produce the same classification accuracy and choose the same number of features.
Furthermore, comparing the results of the KNN and Naive Bayes classifiers on the same criteria, the KNN classifier outperforms the Naive Bayes classifier for 12 datasets (66.67%). For 4 datasets, namely BreastEW, Lymphography, WaveformEW and Zoo, the Naive Bayes classifier produces better results. For the Exactly2 and Vote datasets, the two classifiers tie.
We can finally conclude from the above discussion that the KNN classifier produces better results than the Random Forest and Naive Bayes classifiers. Therefore, we use only the KNN classifier when comparing the proposed RTHS algorithm with state-of-the-art FS methods.

VI. COMPARISON
To check the effectiveness of the proposed FS method, we have compared it with 10 state-of-the-art FS methods: four popular meta-heuristic FS methods, namely GA, PSO, ALO and GSA, and six hybrid meta-heuristic FS methods, namely serial grey-whale optimizer (HSGW), random switching grey-whale optimizer (RSGW), adaptive switching grey-whale optimizer (ASGW), WOASAT-2, BGWOPSO and WOA-CM. HSGW, RSGW and ASGW are three FS strategies formed by hybridizing GWO and WOA [86]. WOASAT-2 [10] is a hybrid of WOA and SA. BGWOPSO [70] is developed by hybridizing GWO and PSO. In WOA-CM [88], the performance of WOA is enhanced using both crossover and mutation. The values of the control parameters of these FS methods are given in Table 5. Table 6 shows the performance of the RTHS algorithm in terms of classification accuracy. From Table 6, it can be observed that the RTHS algorithm performs best in 16 cases (88.9%). For the Tic-tac-toe dataset, it ranks third, behind the ASGW and RSGW methods.
It is worth mentioning that the RTHS algorithm outperforms the BGA method in 15 cases and ties in 2 cases; only on the Exactly2 dataset does BGA perform better than RTHS. Similarly, the RTHS algorithm outperforms BPSO in 15 cases and ties in 2 cases, while on the Exactly2 dataset BPSO performs slightly better than RTHS. The proposed algorithm outperforms both the BALO and BGSA methods on 17 datasets, i.e., on every dataset except Exactly2.
The HSGW method outperforms the RTHS algorithm on the Exactly2 dataset and ties with it in 4 cases, while RTHS outperforms HSGW in 13 cases. With respect to the RSGW method, the RTHS algorithm has 4 ties, 12 wins and 2 losses. ASGW ties with RTHS in 4 cases, outperforms it in 3 cases, and loses to it in the remaining 11 cases. Against the WOASAT-2 method, the RTHS algorithm has 2 ties and 16 wins. The BGWOPSO method has 3 ties and 15 losses when compared with the RTHS algorithm. WOA-CM and RTHS tie on the Exactly dataset, whereas on the remaining 17 datasets the proposed RTHS algorithm wins.
Table 7 shows the performance of the RTHS algorithm w.r.t. the number of selected features. The RTHS algorithm selects the lowest number of features on 8 datasets. On the Exactly2 dataset, the RTHS algorithm fails to achieve the highest accuracy, but it selects the fewest features, followed by the BGA, BPSO and BGSA methods. Considering both Table 6 and Table 7, it can be concluded that the RTHS algorithm performs the best w.r.t. the 10 state-of-the-art FS methods considered here for comparison.
Figure 10 shows the graphical comparison of the average classification accuracies achieved by the RTHS algorithm and the 10 state-of-the-art FS methods; the RTHS algorithm attains the highest average classification accuracy among them. Figure 11 depicts the graphical comparison of the average number of features selected by the RTHS algorithm and the 10 state-of-the-art FS methods, and shows that the proposed RTHS algorithm selects the lowest number of features on average.
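The averages compared in Figures 10 and 11 can be obtained by averaging each method's per-dataset results and sorting the methods. The sketch below assumes the results are stored as one list per method. The function name `rank_by_average`, the method names A, B, C and their values are toy placeholders, not the values reported in Tables 6 and 7.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_by_average(per_method: Dict[str, List[float]],
                    higher_is_better: bool) -> List[Tuple[str, float]]:
    """Average each method's per-dataset results and sort methods best-first."""
    averages = {method: mean(values) for method, values in per_method.items()}
    return sorted(averages.items(), key=lambda kv: kv[1],
                  reverse=higher_is_better)


# Toy per-dataset results for three hypothetical methods A, B, C
accuracy = {"A": [0.90, 0.80], "B": [0.85, 0.95], "C": [0.70, 0.90]}
n_features = {"A": [5, 7], "B": [4, 4], "C": [10, 4]}
print([m for m, _ in rank_by_average(accuracy, higher_is_better=True)])     # ['B', 'A', 'C']
print([m for m, _ in rank_by_average(n_features, higher_is_better=False)])  # ['B', 'A', 'C']
```

Accuracy is ranked in descending order and the number of selected features in ascending order, since fewer features is better.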
To determine the statistical significance of the results of the RTHS algorithm, the Wilcoxon rank-sum test [89] has been performed. It is a non-parametric statistical test, used here for pairwise comparison of the proposed FS method with each of the 10 other state-of-the-art FS methods. The null hypothesis states that the two sets of results follow the same distribution. The test is performed at the 5% significance level, so the null hypothesis is rejected, i.e., the two sets of results are considered statistically different, when the p-value obtained from the test statistic is below 0.05. From the test results provided in Table 8, it can be concluded that the results of the proposed RTHS algorithm differ significantly from those of each of the 10 other state-of-the-art FS methods.
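A self-contained sketch of the two-sided Wilcoxon rank-sum test is given below. It uses the large-sample normal approximation and assumes that each sample is a list of results from one FS method (for example, accuracies over repeated runs). Tied values receive average ranks, but no tie correction is applied to the variance. This mirrors the behaviour of `scipy.stats.ranksums`, which can be used instead when SciPy is available. The helper names and sample values are hypothetical, and for very small samples an exact test would be more appropriate than the normal approximation.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns the z statistic and the two-sided p-value.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2                  # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))         # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical accuracies of two FS methods over five runs
method_a = [0.95, 0.96, 0.97, 0.98, 0.99]
method_b = [0.80, 0.81, 0.82, 0.83, 0.84]
z, p = rank_sum_test(method_a, method_b)
alpha = 0.05                                     # 5% significance level
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {p < alpha}")
```

With every value of the first sample above every value of the second, the rank sum lies far from its expected value and the p-value falls below 0.05. Two identical samples give z = 0 and p = 1.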
Observing both Table 6 and Table 7, we can conclude that the three best-performing FS methods after the proposed RTHS algorithm are HSGW, RSGW, and ASGW. Table 9 provides detailed results of the proposed FS method and these three methods in terms of four popular statistical measures, namely precision, recall, F-score, and roc_auc_score, for all 18 UCI datasets considered in the present work.
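For reference, the four measures reported in Table 9 can be computed for a binary problem as sketched below, with ROC AUC obtained from its pairwise-ranking (Mann-Whitney) formulation. The sketch assumes labels in {0, 1}, a real-valued score for the positive class, and both classes present. The text does not state how the measures are averaged over classes for multi-class datasets, so the sketch covers only the binary case. The function name `binary_metrics` is hypothetical; the corresponding scikit-learn functions are `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` in `sklearn.metrics`.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   y_score: Sequence[float]) -> Dict[str, float]:
    """Precision, recall, F-score and ROC AUC for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    # ROC AUC = probability that a random positive is scored above a random
    # negative, counting score ties as one half (assumes both classes present)
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    auc = wins / (len(pos) * len(neg))
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "roc_auc": auc}


print(binary_metrics([1, 1, 0, 0], [1, 0, 1, 0], [0.9, 0.4, 0.6, 0.1]))
# {'precision': 0.5, 'recall': 0.5, 'f_score': 0.5, 'roc_auc': 0.75}
```

Precision, recall and F-score depend only on the hard predictions, whereas ROC AUC uses the scores. In the toy example, three of the four positive-negative pairs are ordered correctly, which gives an AUC of 0.75.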

VII. CONCLUSION
In this paper, we have proposed a hybrid meta-heuristic method for FS, named RTHS, based on a well-known meta-heuristic, the HS algorithm, and a recently proposed meta-heuristic, RTEA. The proposed RTHS has been applied to 18 standard UCI datasets and compared with 10 state-of-the-art meta-heuristic and hybrid meta-heuristic FS approaches. The obtained results demonstrate the superiority of RTHS over these methods, so RTHS can be considered a competent method for solving FS problems. A close look at the results shows that RTEA helps the HS algorithm overcome its limitations in terms of exploration and exploitation (as discussed in Section IV). However, there may be cases where it fails to find the global optimum required by the problem, which is consistent with the No Free Lunch theorem [74]. Moreover, fairly exhaustive experiments are needed to find suitable values of the parameters used in this algorithm for different problems, which is another shortcoming of the proposed work. As future scope of the work, the proposed RTHS can be applied to other popular and interesting research problems, such as facial emotion recognition, musical symbol recognition, and handwritten or printed script recognition. RTHS can also be applied to high-dimensional datasets such as gene expression data. It would also be interesting to hybridize RTHS with other recently proposed or classical meta-heuristic algorithms.