EHHM: Electrical Harmony Based Hybrid Meta-Heuristic for Feature Selection

Selecting the most relevant features from a high-dimensional dataset is always a challenging task. In this regard, feature selection (FS) methods act as a solution to this problem, mainly in the domains of data mining and machine learning. FS aims at greatly improving the performance of a learning model by choosing the relevant features and ignoring the redundant ones. Besides, this also helps the learning model under consideration to make efficient use of space and time. Though over the years many meta-heuristic algorithms have been proposed by researchers to solve the FS problem, it is still considered an open research problem because of its enormous challenges. In particular, these algorithms at times suffer from poor convergence because of improper tuning of the exploration and exploitation phases. Here lies the importance of hybrid meta-heuristics, which help to improve the searching capability and convergence rate of the parent algorithms. To this end, the present work introduces a new hybrid meta-heuristic FS model that combines two meta-heuristics, the Harmony Search (HS) algorithm and the Artificial Electric Field Algorithm (AEFA), which we have named Electrical Harmony based Hybrid Meta-heuristic (EHHM). The proposed hybrid meta-heuristic converges faster than its predecessors, thereby ensuring its capability to search efficiently. The usability of EHHM is examined by applying it on 18 standard UCI datasets. Moreover, to prove its supremacy, we have compared it with 10 state-of-the-art FS methods. Link to the code implementation of the proposed method: khalid0007/Metaheuristic-Algorithms/FS_AEFAhHS.


I. INTRODUCTION
Whenever we have something in abundance, choosing the best one or the required ones becomes very difficult. The reason may be that we do not need all of them, or that we cannot process all of them. The choice becomes difficult because we have to search a huge pool of data, which is time consuming as well as error-prone. Owing to the recent advancement and growth of technologies, datasets consisting of hundreds or thousands of attributes/features have become omnipresent in the fields of machine learning, pattern recognition and data mining [1], [2]. However, selecting the most significant attributes/features from such high-dimensional data is a challenging task. To address this problem, FS techniques came into play [3]; FS is considered a preprocessing step. FS tries to select a subset of features that is useful for both efficient learning and classification purposes. In other words, an FS technique tries to get rid of the features which are irrelevant, redundant or act as noise, and hence do not contribute to the learning process of a model. If we have a d-dimensional feature set, then there are 2^d possible feature subsets, which makes FS an NP-hard problem.

The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Ayoub Khan. VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

To this end, several methods have been proposed by researchers throughout the years to solve FS effectively and efficiently. FS methods can broadly be classified into the following categories:
• Filter: This method utilizes some statistical means to find the correlation between the attributes and the outcome variables, and ranks the features based on that score. Some popular filter methods are Chi-square [4], Fisher score [5], Mutual information [6], Relief [7] etc.
• Wrapper: A learning model is trained using a subset of features. Based on the observations and inferences, some feature(s) are added or removed. So, the problem becomes one of changing the size of the subset efficiently. Some wrapper based FS methods are binary particle swarm optimization using SVM [8], FS using ant colony optimization (ACO) [9], the adaptive β binary sailfish optimizer (AβBSF) [10], FS using a hybrid of the grey wolf optimizer (GWO) and the whale optimization algorithm (WOA) [11] etc.
• Embedded: This method is based on a combination of the filter and wrapper methods. It is usually implemented by algorithms which have their own built-in FS methods. LASSO [12] and RIDGE [13] regressions are two such popular methods, which have a penalization function to reduce over-fitting of the model.
Generally, filter methods are faster, as they do not use a supervised learning method, whereas wrapper methods are time consuming, as they use a learning algorithm. Most often, wrapper methods produce better classification accuracy than the former [14].
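To make the filter strategy above concrete, the following sketch computes the Fisher score mentioned in the list, ranking features by how far the class-conditional means are separated relative to the within-class variance. The toy data and the small stabilizing constant are illustrative assumptions, not part of the paper.

```python
import numpy as np

def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (higher = more relevant)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        # Between-class scatter: how far this class's mean is from the overall mean.
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        # Within-class scatter: how spread out the class is on each feature.
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small term avoids division by zero

# Toy data: feature 0 separates the two classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal([5, 0], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
scores = fisher_scores(X, y)
ranked = np.argsort(scores)[::-1]  # feature indices, best first
```

A filter method would then keep the top-k features of `ranked` without ever training a classifier, which is why it is typically faster than a wrapper.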
Because of their flexibility, avoidance of local optima, non-derivative nature, ease of implementation and effectiveness, meta-heuristic algorithms have been successfully used in different optimization problems over the years [15]. One of the complex characteristics of any meta-heuristic algorithm is maintaining a good trade-off between its exploration and exploitation phases. As these two phases decide the success of an algorithm, tuning them properly is a challenging task. Simply speaking, exploration is the process by which solutions visit un-visited regions of the search space in anticipation of finding better solutions, and exploitation means ploughing a neighboring region effectively to discover more appropriate solutions. Some meta-heuristic cost based FS algorithms are proposed in [16], [17], [18], [19]. FS is considered an optimization problem where we try to find the optimal set of relevant features [20]. This property motivates us to use meta-heuristic algorithms to solve the FS problem efficiently. Because of such effectiveness, meta-heuristics have been applied in various domains like facial emotion recognition [21], image contrast enhancement [22], deluge based FS [23], wrapper-filter FS [24], FS for handwriting classification [25], digit classification [26], feature combination for handwritten numeral recognition [27] etc.
As mentioned before, for any meta-heuristic algorithm, exploration and exploitation decide the capability of the algorithm to find the optimal solution. Not only must these two phases be efficient, but proper tuning between them is also very important to obtain the final result. Meta-heuristic algorithms try to achieve this tuning between exploration and exploitation through operations which are generally inspired by natural phenomena. However, this does not guarantee efficient inspection of the whole search space. In a nutshell, any meta-heuristic algorithm may fail to find the optimal solution if (i) it cannot find the 'promising' area where the optimal solution may belong, and gets stuck in local optima, (ii) it fails to properly search the promising areas discovered, and hence fails to converge towards the optimal solution, or (iii) both. To overcome this shortcoming, hybrid meta-heuristics have come into existence. As a hybrid algorithm is the result of a union of two different meta-heuristic algorithms, we generally expect it to perform better than its underlying methods. The main reason for hybridizing two methods resides in the fact that the final algorithm takes advantage of both of its parent algorithms' exploration and exploitation phases to maintain the appropriate tuning between them, thereby ensuring the desired result.
This approach has been proven to achieve better results in real-life problems [28]. Hybrid meta-heuristics have been applied for solving FS problems effectively and efficiently. This work [29] proposes hybridization of GA with a local search. Memetic algorithm and Late acceptance hill climbing have been hybridized and used for FS for facial emotion recognition [30]. Hybrid of GA and SA has been used for FS and applied on UCI datasets in [31]. In [32], GA and PSO have been hybridized for FS and applied on Digital Mammogram datasets. In [33], a hybrid version of DE and ABC for FS has been proposed and applied on UCI datasets.
The effectiveness of hybrid meta-heuristic algorithms gives us a reason to explore the world of hybridized meta-heuristics. As they are generally better at searching for the optimal solution than their underlying algorithms, we are motivated to propose a new hybridized method to solve an important real-life problem, namely FS. To this end, we introduce a hybrid of two meta-heuristic algorithms, namely the popular harmony search (HS) [34] algorithm and the recently proposed artificial electric field algorithm (AEFA) [35]. The HS algorithm, an old and popular optimization algorithm, takes motivation from the charm of musical harmony to find a better solution, hence converting a qualitative improvisation into a quantitative optimization process. AEFA, on the other hand, takes inspiration from Coulomb's law of electrostatics. It mimics the behavior of charged bodies attracting opposite charges and repelling similar charges, which eventually helps to find the optimal solution. Our proposed work tries to simulate an artificial phenomenon, which we have named Electrical Harmony based Hybrid Meta-heuristic (EHHM), such that, because of the interactions of the charged bodies with each other, they move in harmony so as to create a natural and yet artificial environment. The proposed method is used for FS. Talbi et al. proposed various ways of hybridizing meta-heuristic algorithms in their work reported in [36]. Broadly, these can be categorized into two approaches: high-level and low-level hybridization. The high-level approach can be further classified into two sub-categories: 1) High-level relay hybrid (HRH): two meta-heuristics are executed in sequence. 2) High-level teamwork hybrid (HTH): two meta-heuristics are executed in parallel. Similarly, the low-level approach can also be classified into two sub-categories: 1) Low-level relay hybrid (LRH): a meta-heuristic is embedded into a single-solution algorithm.
2) Low-level teamwork hybrid (LTH): a meta-heuristic algorithm is embedded into a population based algorithm. The proposed hybrid method follows the HRH approach, where the output of one algorithm acts as the input of the other in a pipeline fashion. To the best of our knowledge, this is the first time the HS algorithm is hybridized with AEFA to solve the FS problem. In short, the contributions of this work are as follows: 1) A hybrid of the popular HS algorithm and the recently proposed AEFA is introduced, which is named the Electrical Harmony based Hybrid Meta-heuristic (EHHM) algorithm. 2) 18 standard UCI datasets are used to evaluate the proposed method using the K-Nearest Neighbors (KNN) classifier. 3) To prove the supremacy of the EHHM algorithm, it is compared with 10 state-of-the-art meta-heuristic based FS methods. The rest of the paper is organized as follows: section II gives a basic idea of various meta-heuristic algorithms, section III discusses the HS algorithm and AEFA, section IV explains the hybrid meta-heuristic proposed in this work, section V analyzes the time complexity of the proposed method, section VI discusses the obtained results and depicts the position of the proposed method when compared with other state-of-the-art methods, and, last but not least, section VII gives the concluding remarks along with possible future extensions of this work.

II. LITERATURE SURVEY
Throughout the years, numerous optimization algorithms have been proposed to solve complex optimization problems. Naturally, this huge number of algorithms exhibits different behaviors and properties. Based on their nature, we can categorize the algorithms in several ways. They may be classified as single solution based or population based [37], metaphor based or non-metaphor based [38], nature inspired or non-nature inspired [39]. Based on the inspiration used to design the algorithms, we may categorize them as evolutionary, swarm based and physics based [40]. Here, we have briefly described some of these algorithms.
• Evolutionary algorithm: This class of algorithms is motivated by biological evolution and incorporates Darwinian concepts of evolution. The most popular of them is the genetic algorithm (GA) proposed in [41]. Initially, a population of solutions is generated randomly, and these solutions are updated in every iteration of the algorithm. GA uses the crossover and mutation operators to search for the optimal solution efficiently. As the best individuals have a higher probability of participating in producing new offspring, the newly formed solutions are expected to be better. Some other popular EAs are differential evolution (DE) [42], evolutionary programming (EP) [43], evolution strategy (ES) [44], bio-geography based optimizer (BBO) [45] etc.
• Swarm based: Algorithms of this category are inspired by the behavior of swarms, herds, schools or flocks of creatures in nature. The movements of the search agents are navigated using simulated collective and social information and intelligence. Particle swarm optimization (PSO) [46] is the most famous swarm based meta-heuristic. PSO is inspired by the social behavior of flocking birds. It employs the global best solution to ensure exploration and the local best position to ensure exploitation. Ant colony optimization (ACO) [47] is another popular algorithm which belongs to this class. It is inspired by ants in an ant colony, and relies upon the strategy ants use to find the shortest route between their nest and a food source. Some other popular swarm based algorithms are the artificial bee colony (ABC) algorithm [48], grey-wolf optimizer (GWO) [40], whale optimization algorithm (WOA) [49], cuckoo search (CS) [50] etc.
• Physics based: This category of algorithms takes motivation from physical laws. The oldest physics based algorithms are simulated annealing (SA) [51] and the HS algorithm [34]. SA is inspired by the annealing process in metallurgy, where metals are heated up to a certain degree followed by controlled cooling to remove impure particles. The HS algorithm is inspired by musical performances where musicians try to achieve the best musical harmony. This phenomenon is adapted in the HS algorithm to find the optimal solution of an optimization problem. Another popular algorithm of this family is the gravitational search algorithm (GSA) [52], which mimics Newton's law of gravitation. The search agents are considered as masses which interact with each other following this law to find the optimal solution. Charged system search (CSS) [53] follows Coulomb's law of electrostatics and Newton's laws of mechanics to search for optimal solutions efficiently. The equilibrium optimizer (EO) [54] is a recently proposed algorithm inspired by well-mixed dynamic balance on a control volume, where a mass balance equation is used to describe the concentration of a non-reactive constituent in a control volume. Some other physics based meta-heuristics are the multi-verse optimizer (MVO) [55], sine-cosine algorithm (SCA) [15], black hole optimization (BHO) algorithm [56] etc. There has been a rapid inflow of new meta-heuristics in the field of optimization. Naturally, a question may arise as to whether we need any more new meta-heuristic algorithms. As of now, there are abundant meta-heuristic algorithms which are used to solve various complex optimization problems. Table 1 summarizes the pros and cons [57] of some state-of-the-art meta-heuristic methods. So, this question is quite pertinent to ask. But, following the No Free Lunch (NFL) theorem [58], it can be said that there is no optimization algorithm which can solve every optimization problem efficiently.
This implies that if a particular optimization algorithm performs effectively for one optimization problem, that does not guarantee that it will perform in the same way for some other optimization problem. More specifically, there may be another optimization algorithm which performs better in the second case. Hence, we cannot draw a conclusion about the superiority of a particular meta-heuristic algorithm over others, considering the huge variety of optimization problems. This very fact motivates researchers to come up with new, or in many cases hybrid, versions of meta-heuristic algorithms to solve the optimization problems at hand effectively and efficiently. The same reason has motivated us to introduce a new hybrid meta-heuristic FS algorithm named EHHM, which is based on the HS algorithm and AEFA. While proposing this hybrid meta-heuristic algorithm, the primary concern is that it should produce better results than its parent algorithms by overcoming their limitations. We have proposed our hybrid meta-heuristic FS algorithm, known as EHHM, keeping all these factors in mind.

III. PRELIMINARIES

A. HARMONY SEARCH (HS) ALGORITHM: AN OVERVIEW
The HS algorithm [34] is a meta-heuristic algorithm inspired by the artificial phenomenon of musical harmony. A better musical harmony means that a certain musical composition is aesthetically more pleasing to the ears than other compositions. The process of finding the best musical performance requires practice, along with improvising different compositions. With this, one can improve the aesthetic quality of musical harmony. Similarly, in an optimization problem the solution can be improved over the iterations. This inspired the authors of [34] to incorporate this phenomenon into an optimization algorithm. While the aesthetic quality of a musical composition is determined solely by listeners, the optimizing capability of a specific solution is determined by an objective or fitness function. The HS algorithm follows a few simple steps, described below. It is to be noted that the standard HS algorithm has many appreciable qualities in comparison to other state-of-the-art meta-heuristic algorithms at our disposal. They are as follows: 1) The HS algorithm does not require much mathematical parameter setting.
2) It follows purely random stochastic measures to generate a new solution from the available local solutions in the HM. 3) Unlike GA, it does not consider only the best two solutions to generate a new solution; rather, all the harmonies in the HM contribute to finding a new harmony. This is because the best two harmonies may skip some parts of the global search space. Due to these features, the HS algorithm often has an upper hand in locating the global best solution in the search space within a reasonable amount of time, compared to its contemporaries. Despite these impressive attributes, the HS algorithm sometimes lacks fine tuning. It builds new solutions solely from a randomly initialized HM, which may not necessarily contain good candidate solutions; rather, it may be restricted to a local solution, thereby failing to achieve the desired global optimal solution.
For this reason, in [59], the authors proposed a few modifications to the standard HS algorithm. They introduced two major additions to the standard algorithm, namely the Harmony Memory Consideration Rate (HMCR) and the Pitch Adjustment Rate (PAR), where N is the population size or harmony memory size. For example, if HMCR = 0.75, the probability of using the HM is 75% and the probability of using the whole range of values is 25%. Every h_i calculated using HMCR then undergoes PAR.
where Rand() is a uniform random number generator which gives values in the range [0,1]. In the case of feature selection, each row of the HM consists of two parts: an agent or harmony (as used in [34]) and its fitness value. An agent is a binary vector h = [h_1, ..., h_FL], where FL is the length of the feature set. Here, h_i = 1 corresponds to the selection of the i-th feature in the reduced feature set, and h_i = 0 corresponds to discarding the i-th feature from the reduced feature set. For each agent, the fitness value is calculated using the fitness function. Here, the optimization goal is to achieve an optimal agent; in other words, we have to maximize the fitness function of the agents. The improvisation procedure is summarized in Algorithm 1. In Step 1, the population (collection of agents) is initialized randomly, and then for each agent in the population the fitness value is calculated by equation (20). Both the HMCR and PAR values are also initialized. In Step 2, an improvised agent is created considering the HMCR and PAR values, using equations (1) and (2) respectively. In Step 3, the fitness value is calculated for the newly improvised agent. If the calculated fitness value is greater than the least fitness value in the HM, the least performing agent is replaced with the newly formed agent. In Step 4, the stopping criterion is checked. If the stopping criterion is not satisfied, the whole procedure starts again from Step 2.
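The improvisation step described above can be sketched in a few lines. This is a minimal illustration, assuming the binary agent encoding described in the text (1 = feature kept) and the HMCR and PAR settings reported later in the paper (0.75 and 0.2); it is not the paper's exact implementation.

```python
import random

def improvise(HM, HMCR=0.75, PAR=0.2):
    """Build one new binary harmony from harmony memory HM."""
    FL = len(HM[0])
    new = []
    for j in range(FL):
        if random.random() < HMCR:          # reuse memory with probability HMCR
            bit = HM[random.randrange(len(HM))][j]
            if random.random() < PAR:       # pitch adjustment: flip the bit
                bit = 1 - bit
        else:                               # otherwise draw from the whole range
            bit = random.randint(0, 1)
        new.append(bit)
    return new

random.seed(42)
HM = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
h = improvise(HM)
```

The new harmony `h` would then be scored by the fitness function, and it replaces the worst member of `HM` only if it outperforms it, exactly as Steps 3 and 4 describe.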

B. ARTIFICIAL ELECTRIC FIELD ALGORITHM (AEFA)
AEFA [35] is a meta-heuristic algorithm inspired by Coulomb's law of electrostatics. According to Coulomb's law, the electrostatic force (attractive or repulsive) between two charged particles is directly proportional to the product of the charges of the two particles, and inversely proportional to the square of the Euclidean distance between them. The charged particles attract and repel each other to find the optimal position in Euclidean space, their movement being guided by Coulomb's law. In AEFA, agents are considered to carry charges, and their charges are measurements of their strengths. Hence the agents can interact with each other through electrostatic force, and move in the search space. The positions of these charged particles are considered to be solutions of the problem. The authors considered only the attractive force throughout the algorithm. Here, the charge on each particle is a function of the fitness value of the agent. Therefore, the agent with the highest charge value (the ''best individual in the population'') attracts the other agents towards itself. As the electrostatic force acts on each individual charged particle, it results in a change in the velocity of each particle, which in turn results in a change of its position. Therefore, a position is a solution to the optimization problem. AEFA works in the following way:
• Step 1: Initialize population: The positions of the agents are initialized randomly, and the velocity of each agent is set to 0.
• Step 2: Calculate fitness: The fitness of each agent is calculated by equation (20).
Let fitness_i(t) be the fitness value of the i-th agent at time t. The worst and best fitness values, denoted by fitness_worst(t) and fitness_best(t) respectively, are calculated as follows:

fitness_worst(t) = min(fitness_i(t)), i ∈ [1, N] (6)
fitness_best(t) = max(fitness_i(t)), i ∈ [1, N] (7)

• Step 3: Calculate the charge of each agent: The charge Q_i(t) is calculated from the fitness value using equations (8) and (9).
• Step 4: Calculate the electrostatic force: Here, we calculate the electrostatic force acting on all agents. At time t, the force acting on the i-th agent due to the j-th agent is given by equation (10), where ε is a very small value and Dist_ij is the Euclidean distance between the i-th and j-th agents.
The authors of [35] proposed the Coulomb constant (given by equation (12)) as an exponentially decaying function. Initially the constant is set to a rather high value to increase the exploration capability of the algorithm, and the value is reduced iteration by iteration to facilitate control over the search accuracy of the algorithm.
where κ_0 is the initial Coulomb constant. Finally, the total electrostatic force acting on the i-th agent due to all other agents is calculated by equation (13). Here, Rand() is a uniform random number generator which gives values in the range [0,1]. By introducing this term, the stochastic nature of the algorithm is increased.
• Step 5: Update velocity and position: The acceleration of the i-th agent in dimension d, a_i^d(t), can be represented as a_i^d(t) = F_i^d(t)/M_i, where M_i is the mass of the i-th agent. The velocity and position are then updated using equations (14) and (15). As, in the case of the FS problem, the position value in any dimension must be either 1 or 0, the s-shaped function defined in equation (16) is used as a transfer function to calculate the updated position of the agents. Finally, the position value is updated using equation (4).

• Step 6: Final solution: If the stopping criterion is not met, the algorithm begins again from Step 2. Otherwise, the agent with the best fitness value is considered the global optimal solution.
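The steps above can be sketched as a single AEFA iteration over binary agents. This is a simplified illustration under stated assumptions: unit masses, an exponential charge normalisation in the spirit of equations (8)-(9), the decaying Coulomb constant K(t) = K0·exp(-α·t/T), and a |tanh| transfer function; K0, α and ε are illustrative values, not the paper's.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two position vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def aefa_step(pos, vel, fit, t, T, K0=500.0, alpha=30.0, eps=1e-9):
    """One AEFA iteration over binary positions (unit masses assumed)."""
    N, FL = len(pos), len(pos[0])
    best, worst = max(fit), min(fit)
    # Charges grow with fitness, then are normalised over the population.
    q = [math.exp((f - worst) / (best - worst + eps)) for f in fit]
    total = sum(q)
    Q = [qi / total for qi in q]
    K = K0 * math.exp(-alpha * t / T)      # decaying Coulomb constant
    for i in range(N):
        for d in range(FL):
            # Total stochastic force on agent i along dimension d.
            F = sum(random.random() * K * Q[i] * Q[j]
                    * (pos[j][d] - pos[i][d]) / (euclid(pos[i], pos[j]) + eps)
                    for j in range(N) if j != i)
            vel[i][d] = random.random() * vel[i][d] + F   # a = F under unit mass
            s = abs(math.tanh(vel[i][d]))                 # transfer function
            pos[i][d] = 1 if random.random() < s else 0   # binary position
    return pos, vel

random.seed(0)
pos = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
vel = [[0.0] * 3 for _ in pos]
fit = [0.9, 0.4, 0.7]
pos, vel = aefa_step(pos, vel, fit, t=1, T=30)
```

Note how the highest-fitness agent carries the largest charge and therefore exerts the strongest pull on the others, which is exactly the attraction mechanism described above.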

IV. PROPOSED METHOD: ELECTRICAL HARMONY BASED HYBRID META-HEURISTIC (EHHM)
The proposed EHHM is a meta-heuristic FS algorithm which hybridizes the HS algorithm and AEFA, and tries to overcome the limitations of its parent algorithms. Let, for a particular classification problem, the feature set consist of FL features. The optimization goal is to select a subset of this feature set such that the classification accuracy achieves an optimal value. The way to approach this problem is to introduce an agent called an EH, a name coined based on the working principle of this hybridization. An EH can be represented as h = [ξ_1, ξ_2, ..., ξ_FL], where ξ_i = 1 or 0. ξ_i = 1 means the i-th feature is included in the selected subset, and a value of 0 represents that it is not selected. So, the FS problem is now reduced to finding the optimal EH. The proposed method also introduces a term called EHM (Electrical Harmony Memory).
where N is the population size. The reason for calling the proposed method EHHM is that a charge value (an additional attribute) is introduced for each harmony in the EHM. AEFA is basically a population based algorithm where the population gets updated based on the movement of the charged particles following Coulomb's law. Here, we aim to boost its exploration and exploitation capabilities by using the HMCR parameter of the HS algorithm. HMCR is the probability of using the past information available in the memory, which in turn helps increase the exploitation capability. Accordingly, (1 - HMCR) is the probability of using the whole range of available values, which improves the exploration capability of the proposed algorithm. In each iteration, the EHs interact with each other following equation (10), and the improvement is assessed by the fitness function defined in equation (20). The EH value is equivalent to the position in the analogy of section III-B. In each iteration, a new EH is improvised from the EHM to increase the stochastic nature of the algorithm, which can reduce the chances of the algorithm getting stuck in local optima, thereby making a path towards the global optimum. In equation (20), Acc_classifier is the classification accuracy attained using the selected feature set, and a weight parameter controls the weightage given to the accuracy in the fitness function fit(EH). A similar binary encoding is used in [60].
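Since the fitness in equation (20) weighs classification accuracy against the size of the selected subset, a small illustrative sketch can make the trade-off concrete. The weight `omega` and the exact form of the reduction term below are assumptions for illustration, not the paper's exact formula.

```python
def fitness(accuracy, n_selected, n_total, omega=0.9):
    """Weighted mix of accuracy and feature reduction (omega is illustrative)."""
    reduction = (n_total - n_selected) / n_total  # fraction of features dropped
    return omega * accuracy + (1 - omega) * reduction

# Same accuracy with fewer features scores higher:
a = fitness(0.9, 5, 10)   # half the features kept
b = fitness(0.9, 4, 10)   # one fewer feature kept
```

With a high `omega`, accuracy dominates, but between two subsets of equal accuracy the smaller one always wins, which is the behaviour an FS fitness function needs.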
In short, our proposed EHHM algorithm follows these steps:
• Step 1: Introduce the EHM and initialize it randomly.
For each electrical harmony, the charge and velocity vectors are set to 0. Calculate the fitness value of each EH using equation (20).
• Step 2: A new EH is improvised from the EHM as in Algorithm 1. If the newly improvised EH outperforms the worst performing EH, the worst performing EH is replaced with the new one, and the velocity of the newly inserted EH is set to 0.
• Step 3: Calculate the amount of charge for each EH using fitness value by using the equations (8) and (9).
• Step 4: For each EH in the EHM, the total Coulomb force acting on it due to all other EHs is calculated using equations (10) and (13).
• Step 5: Update the velocity of each EH using equations (14) and (15). The position (i.e., the EH itself) is also updated accordingly using equations (16) and (4).
As the EH can only contain 1s and 0s, equation (17) is used.
• Step 6: Calculate the fitness value of each EH using equation (20).
• Step 7: If the stopping criterion is not satisfied, the algorithm repeats from Step 2; otherwise, the fittest EH is selected as the optimal solution. A comprehensive flowchart shown in Figure 2 explains the proposed EHHM algorithm.
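Steps 1-7 can be tied together in a compact, runnable skeleton. This is a sketch under loud assumptions: the fitness is a toy that rewards matching a hypothetical ideal feature mask (the real method scores each EH with a KNN classifier), and the AEFA part (Steps 3-6) is replaced by a crude pull towards the current best EH rather than the full charge/force computation.

```python
import random
random.seed(7)

FL, N, ITER, HMCR, PAR = 8, 6, 30, 0.75, 0.2
target = [1, 0, 1, 1, 0, 0, 1, 0]                  # hypothetical ideal subset
fit = lambda eh: sum(a == b for a, b in zip(eh, target)) / FL

# Step 1: random binary EHM.
EHM = [[random.randint(0, 1) for _ in range(FL)] for _ in range(N)]
for t in range(ITER):
    # Step 2: improvise a new EH (HS part: HMCR reuse, PAR bit-flip).
    new = [(EHM[random.randrange(N)][j] if random.random() < HMCR
            else random.randint(0, 1)) for j in range(FL)]
    new = [1 - b if random.random() < PAR else b for b in new]
    worst = min(range(N), key=lambda i: fit(EHM[i]))
    if fit(new) > fit(EHM[worst]):                 # replace the worst harmony
        EHM[worst] = new
    # Steps 3-6 (AEFA part, simplified): nudge every EH towards the best one.
    best = max(EHM, key=fit)
    for eh in EHM:
        for j in range(FL):
            if random.random() < 0.3:
                eh[j] = best[j]
# Step 7: fittest EH is the selected feature subset.
best_eh = max(EHM, key=fit)
```

Even in this toy form, the two halves play the roles described above: the HS half keeps injecting fresh candidates (exploration), while the attraction step concentrates the population around good solutions (exploitation).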

V. TIME COMPLEXITY ANALYSIS
Before proceeding with the complexity analysis, we must consider the input parameters of the proposed EHHM algorithm: the population size N, the feature length FL, and the number of iterations It. The complexity analysis is as follows:
• Step 1: Starts by randomly initializing the electrical harmony memory, which is practically a 2D array of size N × FL. Considering that the random function performs c_1 non-trivial operations, the contribution of EHM initialization to the time complexity T is c_1 · N · FL.
• Step 2: The improvisation step considers either the EHM or the whole solution space, based on the HMCR score. When the EHM is considered, an EH is improvised based on a random index for each feature; therefore, its contribution is c_2 · FL. When the whole solution space is considered, a random value is selected for each feature index, which again costs time proportional to FL.
• Step 3: The calculation of fitness and charge each costs time proportional to the population size; therefore, it costs c_6 · N per iteration. The dominant cost is the pairwise force calculation, which costs time proportional to N^2 · FL per iteration. Therefore, the worst-case time complexity of the proposed method is O(It · FL · N^2).
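The dominant term of the bound can be verified by counting the inner operations of the force computation directly; the tiny parameter values below are arbitrary.

```python
# Count the force contributions: one per (iteration, agent pair, dimension).
# For simplicity the trivial i == j pair is included; it does not change the bound.
It, N, FL = 5, 4, 3
ops = 0
for _ in range(It):
    for i in range(N):
        for j in range(N):
            for _ in range(FL):
                ops += 1
# ops now equals It * N * N * FL, matching the O(It * FL * N^2) bound.
```

The lower-order terms from initialization (N · FL) and improvisation (FL per iteration) are dominated by this quadratic-in-N cost.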

VI. EXPERIMENTAL RESULTS AND DISCUSSION
Before reporting the experimental outcomes, we first standardized the parameters used in our proposed FS algorithm. To maintain a fair comparison throughout the experimental process, the performance of the proposed EHHM, as well as that of both parent algorithms, has been assessed using the KNN classifier with the number of nearest neighbours set to 5. The KNN classifier is used to evaluate the classification accuracy of the model trained using the selected features. The proposed method has been implemented on the MATLAB platform, and the graphs are plotted using matplotlib. Further, for experimental coherence, each dataset considered here is divided into training and test sets in a 4:1 ratio. The training set has been used to train the KNN classifier using the features selected by the proposed FS algorithm. The performance of the FS algorithm is then evaluated on the test set, using the same features decided by the proposed FS algorithm during training.
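The evaluation protocol above (4:1 split, K = 5 nearest-neighbour vote) can be sketched end to end. The two-cluster toy dataset is made up for illustration; the paper evaluates on the UCI datasets in MATLAB.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in nearest[:k]).most_common(1)[0][0]

random.seed(1)
# Toy 2-class data: class means at (0, 0) and (4, 4), unit spread.
X = [[random.gauss(c * 4, 1), random.gauss(c * 4, 1)]
     for c in (0, 1) for _ in range(50)]
y = [c for c in (0, 1) for _ in range(50)]

idx = list(range(len(X)))
random.shuffle(idx)
split = int(0.8 * len(idx))                       # 4:1 train/test ratio
tr, te = idx[:split], idx[split:]
preds = [knn_predict([X[i] for i in tr], [y[i] for i in tr], X[j]) for j in te]
acc = sum(p == y[j] for p, j in zip(preds, te)) / len(te)
```

In the full pipeline, the columns of `X` would first be restricted to the features selected by the FS algorithm before the split and the KNN evaluation.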

A. DATASET DETAILS
The proposed EHHM has been trained and tested on 18 standard UCI datasets [61], which are as follows: Breastcancer, BreastEW, CongressEW, Exactly, Exactly2, HeartEW, IonosphereEW, KrvskpEW, Lymphography, M-of-n, PenglungEW, SonarEW, SpectEW, Tic-tac-toe, Vote, WaveformEW, WineEW, and Zoo. Among these datasets, 4 are multi-class datasets (Lymphography, WaveformEW, WineEW, Zoo), whereas the remaining 14 are bi-class datasets. The diverse nature of the attributes of the datasets helps to establish the robustness of the proposed FS algorithm. A brief description of the datasets is given in Table 2.

B. PARAMETER TUNING
The population size plays an important role in efficient exploitation of the search space. Hence, assigning a proper value to it can influence the result significantly. Too small a population size can result in inefficient exploration, while too large a population size can increase the computational time. Figure 3 shows the plots indicating the variation of the classification accuracies with population size for all 18 UCI datasets. Figure 4 shows the variation of the fitness scores with the number of iterations for the HS algorithm, AEFA and the proposed EHHM, for all 18 UCI datasets. The number of iterations used in the proposed EHHM is experimentally set to 30. The values of HMCR and PAR are set to 0.75 and 0.2 respectively. These are amongst the core parameters of the proposed algorithm.

C. ANALYSIS
In this section, we analyze the performance of the EHHM algorithm, based on the classification accuracy obtained using the KNN classifier as well as the percentage of features used to obtain this accuracy. To justify the usefulness of our new hybrid FS algorithm, we first compare its performance with both of its parent algorithms (i.e., the HS algorithm and AEFA), in terms of classification accuracy and feature reduction. To ensure a fair comparison and experimental coherence throughout the result section, we have used the KNN classifier on the MATLAB platform, and the datasets are divided into train-test sets in a 4:1 ratio. Table 3 shows the performance comparison in terms of both classification accuracy and the percentage of features selected by EHHM, AEFA and the HS algorithm for the 18 UCI datasets. It is evident from Table 3 that the proposed EHHM has outperformed its parent algorithms by a fair margin. On 7 out of 18 datasets, our proposed EHHM has secured 100% classification accuracy, and more than 90% accuracy on 16 out of 18 datasets. The proposed EHHM has secured accuracies better than or equal to those of its parents on these datasets, which include BreastCancer, BreastEW, CongressEW, Exactly, Exactly2, HeartEW, IonosphereEW, Lymphography, M-of-n, PenglungEW, SpectEW, Tic-tac-toe, Vote, WaveformEW, WineEW, and Zoo. In most cases, even the percentage of features selected by the proposed EHHM is below 50%, as seen from Table 3. For 10 of the datasets, the proposed FS algorithm has utilized the least number of attributes. For the remaining datasets it has not, which may mislead one to conclude that the algorithm is inefficient in feature reduction. But it has to be kept in mind that on those datasets this algorithm has secured accuracy scores significantly better than those secured by its parent algorithms, which used fewer features.
Hence, it is quite difficult to judge the competitiveness of the algorithms based only on the number of features selected. It is worth mentioning that feature reduction is an important concern for any FS algorithm, but improving, or at least retaining, the classification accuracy obtained with the entire feature set is another vital aspect. In terms of the number of features selected, the proposed EHHM outperforms its two parent algorithms on more than half of these datasets. All these factors cumulatively confirm the successful and efficient hybridization of the two parent algorithms, leading to the development of a better one. The graphs in Figure 3 and Figure 4 show, for the proposed EHHM, the HS algorithm and AEFA, the variation of classification accuracy with population size and of fitness score with the number of iterations, respectively. It can be clearly seen that the proposed EHHM is the best among the three algorithms.
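The evaluation protocol described above, scoring a candidate feature subset by KNN accuracy on a 4:1 train-test split together with the fraction of features retained, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' MATLAB implementation; the choice of 1-NN and the toy data layout are assumptions made for the example.

```python
import math
import random

def knn_accuracy(train, test, mask, k=1):
    """Accuracy of a KNN classifier restricted to the features where mask[j] == 1.
    train/test are lists of (feature_vector, label) pairs."""
    idx = [j for j, bit in enumerate(mask) if bit]

    def dist(a, b):
        # Euclidean distance over the selected features only.
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in idx))

    correct = 0
    for x, label in test:
        neighbours = sorted(train, key=lambda t: dist(t[0], x))[:k]
        votes = [lab for _, lab in neighbours]
        if max(set(votes), key=votes.count) == label:
            correct += 1
    return correct / len(test)

def evaluate_subset(data, mask, k=1, split=0.8):
    """Shuffle the data in place, split 4:1 into train/test, and return
    (classification accuracy, fraction of features kept)."""
    random.shuffle(data)
    cut = int(split * len(data))
    acc = knn_accuracy(data[:cut], data[cut:], mask, k)
    return acc, sum(mask) / len(mask)
```

A wrapper FS algorithm would call `evaluate_subset` once per candidate mask, seeking high accuracy with a small feature fraction.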

D. COMPARISON
The effectiveness of the proposed EHHM algorithm is demonstrated by comparing it with 10 state-of-the-art meta-heuristic FS algorithms: Binary Genetic Algorithm (BGA) [62], Binary Particle Swarm Optimization (BPSO) [63], Embedded Chaotic Whale Survival Algorithm (ECWSA) [20], Mine Blast Algorithm (MBA) [64], Whale Optimization Algorithm (WOA) [49], Binary Optimization Using Hybrid Grey Wolf Optimization (BGWOPSO) [65], Hybrid GA for FS (HGAFS) [66], Hybrid Mine Blast Algorithm for FS (MBA-SA) [67], Wrapper-Filter FS algorithm based on Ant Colony Optimization (WFACOFS) [24], and Whale Optimization Algorithm assisted with Crossover and Mutation (WOA-CM) [68]. Table 4 and Table 5 show the performance comparison of the proposed EHHM algorithm with the above-mentioned FS algorithms, in terms of classification accuracy and percentage of features selected respectively, over the 18 UCI datasets. Some of the observations are as follows:
• For the WaveformEW dataset, the classification accuracy achieved by the proposed EHHM is significantly higher than that secured by the rest of the algorithms. As evident from Table 4, our algorithm achieves 86.8% on this dataset, whereas the second highest accuracy, 80.74%, is scored by the HGAFS algorithm. Hence, our algorithm has completely outperformed the other FS algorithms on this dataset.
• For the BreastCancer, PenglungEW, and BreastEW datasets, the proposed FS algorithm is the only one to achieve exactly 100% classification accuracy.
• For the PenglungEW dataset, our proposed algorithm achieves a significantly higher accuracy score while using the fewest features among all the compared FS algorithms.
• As seen from Table 5, for the WineEW dataset, the fraction of the total number of features used by the proposed FS algorithm (0.077) is far less than that used by any other FS algorithm.
• For the Lymphography dataset, the proposed EHHM has scored an accuracy of 96.9% which is about 18% greater than that of WOA-CM algorithm and about 5% higher than that obtained with BGWOPSO algorithm.
• Zoo, M-of-n, WineEW and Exactly are the other datasets for which the proposed EHHM algorithm has succeeded in securing exactly 100% classification accuracy, tying with the BGWOPSO algorithm. The average rank of the proposed EHHM algorithm in terms of classification accuracy is found to be 1.22, and its overall rank is 1. In terms of the percentage of features selected, both the average and the overall rank of the algorithm are found to be 4.
• It must be noted that although the proposed algorithm has used a greater number of features in some cases, in most of those cases it has fetched an accuracy value far better than that secured by any of its competitors. This margin of difference between the accuracy values is far more significant than the margin of difference between the fractions of features utilized by our algorithm and by the other state-of-the-art FS algorithms. Figure 5 compares the average classification accuracy achieved by EHHM and the 10 state-of-the-art methods, and Figure 6 compares the average number of features they select.
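The average ranks quoted above follow the usual scheme of ranking all algorithms on each dataset (rank 1 being the best, with ties sharing the mean of their ranks) and then averaging each algorithm's ranks across datasets. A minimal sketch of this computation is given below; the score table in the test is hypothetical and does not reproduce the paper's numbers.

```python
def average_ranks(scores, higher_is_better=True):
    """scores: {method: [score on dataset 1, score on dataset 2, ...]}.
    Rank the methods on each dataset (rank 1 = best; tied methods share
    the mean of their ranks), then average each method's ranks."""
    methods = list(scores)
    n_data = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for d in range(n_data):
        # Order methods from best to worst on dataset d.
        col = sorted(methods, key=lambda m: scores[m][d],
                     reverse=higher_is_better)
        pos = 0
        while pos < len(col):
            # Extend the group while the score is tied.
            end = pos
            while end + 1 < len(col) and scores[col[end + 1]][d] == scores[col[pos]][d]:
                end += 1
            tie_rank = (pos + end) / 2 + 1  # mean rank of the tied group
            for m in col[pos:end + 1]:
                totals[m] += tie_rank
            pos = end + 1
    return {m: totals[m] / n_data for m in methods}
```

Averaging ranks rather than raw accuracies prevents a single easy or hard dataset from dominating the overall comparison.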

E. STATISTICAL SIGNIFICANCE TEST
A statistical test provides a mechanism for making a quantitative decision about a process. The goal is to determine whether there is enough evidence to ''reject'' a conjecture, called the null hypothesis, about the process. In our case, the null hypothesis states that the two sets of results follow the same distribution. This implies that if the two result distributions are statistically different, the p-value generated from the test statistic will be < 0.05 when the test is performed at the 0.05 significance level, resulting in the rejection of the null hypothesis. To determine the statistical significance of the EHHM algorithm, the Wilcoxon rank-sum test [69], a non-parametric statistical test based on pairwise comparison, has been performed. From the test results provided in Table 6, we can conclude that the results of the proposed EHHM algorithm are statistically significant.
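For readers unfamiliar with the test, the sketch below implements the two-sided Wilcoxon rank-sum test using the standard normal approximation. It is an illustrative implementation, not the one used to produce Table 6; in practice a library routine (e.g., `ranksum` in MATLAB or `scipy.stats.ranksums` in Python) would be used.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.
    Returns (W, p) where W is the rank sum of the first sample."""
    # Pool both samples, remembering which sample each value came from.
    combined = sorted((v, src) for src, sample in ((0, x), (1, y)) for v in sample)
    # Assign ranks; tied values share the mean of their ranks.
    rank_of = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_of[k] = mean_rank
        i = j + 1
    w = sum(r for r, (_, src) in zip(rank_of, combined) if src == 0)
    n1, n2 = len(x), len(y)
    # Mean and standard deviation of W under the null hypothesis.
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w, p
```

A p-value below 0.05 rejects the null hypothesis that the two sets of accuracy scores come from the same distribution.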

VII. CONCLUSION
In this paper, we have introduced a hybrid meta-heuristic algorithm for FS, namely EHHM, based on a popular meta-heuristic called the HS algorithm and a recently proposed meta-heuristic called AEFA. The method follows the HRH approach of hybridization, where the output of the HS algorithm is fed to AEFA in a pipeline fashion. The proposed EHHM is applied on 18 standard UCI datasets and compared with 10 state-of-the-art FS methods that include both meta-heuristic and hybrid meta-heuristic algorithms. The obtained results confirm the superiority of the proposed method over the other state-of-the-art FS algorithms as well as over its parent algorithms. The Wilcoxon rank-sum statistical test, performed on the proposed method along with the 10 state-of-the-art FS methods, indicates that the method is statistically significant, rejecting the null hypothesis that the proposed method's result distribution is the same as those of its competitors. Therefore, our proposed FS method not only outperforms other state-of-the-art FS methods but is also statistically distinct, and we can safely claim that EHHM has a better capability in reducing the feature dimension as well as in enhancing the classification accuracy. However, there may be some cases where it fails to reach the global optimum, which is in accordance with the NFL theorem [58]. Another possible shortcoming of the proposed EHHM algorithm is that certain parameters may need different values for other optimization problems, which requires some amount of trial and error.
As future scope, the proposed EHHM algorithm can be applied to other popular research problems such as the class imbalance problem or the Knapsack problem. It can also be applied to high-dimensional data such as gene expression data to prove its robustness and scalability. A competent local search algorithm may also be incorporated into it.
Moreover, EHHM can be used for document clustering, as in [70], [71] and [72]. Text or document clustering, in which similar text documents are grouped into coherent clusters, is a highly relevant field in modern times, and meta-heuristic FS algorithms are providing promising results in it. The FS methods proposed in [70] and [71] have also performed quite well in this field.
PAWAN KUMAR SINGH (Member, IEEE) received the B.Tech. degree in information technology from the West Bengal University of Technology, in 2010, and the M.Tech. degree in computer science and engineering and the Ph.D. degree in engineering from Jadavpur University (JU), in 2013 and 2018, respectively. He also received the RUSA 2.0 Fellowship for pursuing his postdoctoral research with Jadavpur University (JU), in 2019. He is currently working as an Assistant Professor with the Department of Information Technology, JU. He has published more than 50 research papers in peer-reviewed journals and international conferences. His current research interests include computer vision, pattern recognition, handwritten document analysis, image processing, machine learning, and artificial intelligence. He is a member of the Institution of Engineers, India, and the Association for Computing Machinery (ACM), as well as a Life Member of the Indian Society for Technical Education (ISTE), New Delhi, and the Computer Society of India (CSI).
JIN HEE YOON received the B.S., M.S., and Ph.D. degrees in mathematics from Yonsei University, South Korea. She is currently a Faculty Member of the School of Mathematics and Statistics, Sejong University, Seoul, South Korea. Her research interests include fuzzy regression analysis, fuzzy time series, optimizations, intelligent systems, and deep learning. She is a Board Member of the Korean Institute of Intelligent Systems (KIIS) and has been working as an associate editor, guest editor, and an editorial board member of several journals including SCI and SCIE journals. In addition, she has been regularly working as an organizer and a committee member of several international conferences.
ZONG WOO GEEM (Member, IEEE) received the B.Eng. degree from Chung-Ang University, the M.Sc. degree from Johns Hopkins University, and the Ph.D. degree from Korea University. He researched at Virginia Tech, the University of Maryland at College Park, and Johns Hopkins University. He is an Associate Professor with the Department of Energy IT, Gachon University, South Korea. He invented a music-inspired optimization algorithm, Harmony Search, which has been applied to various scientific and engineering problems. His research interests include phenomenon-mimicking algorithms and their applications to energy, environment, and water fields. He has served as an Associate Editor for various journals, such as Engineering Optimization, and as a Guest Editor for Swarm and Evolutionary Computation, the International Journal of Bio-Inspired Computation, the Journal of Applied Mathematics, Applied Sciences, and Sustainability.