Binary Social Mimic Optimization Algorithm With X-Shaped Transfer Function for Feature Selection

Exact optimization algorithms cannot solve high-dimensional optimization problems when the search space grows exponentially with the problem size, and an exhaustive search also becomes impractical. To counter this problem, researchers use approximation algorithms. Meta-heuristic algorithms are one category of approximation algorithms and have shown an acceptable degree of efficiency in solving problems of this kind. The Social Mimic Optimization (SMO) algorithm is a recently proposed meta-heuristic algorithm for optimizing problems with a continuous solution space. It is modeled on the behavior of people in society. SMO can efficiently explore the solution space to obtain an optimal or near-optimal solution by minimizing a given fitness function. Feature selection is a binary optimization problem where the aim is to maximize the classification accuracy of a learning algorithm using the minimum number of features. To convert the continuous search space to a binary one, a proper transfer function is required. The choice of transfer function strongly affects the binary variant of an optimization algorithm, because the transfer function determines which subset of features is selected from the solution values attained by the algorithm in the continuous search space. To this end, we propose a new transfer function, namely the X-shaped transfer function, to enhance the exploration and exploitation ability of binary SMO. The proposed X-shaped transfer function utilizes two components and a crossover operation to obtain a new solution. The effect of the proposed X-shaped transfer function is compared with that of four S-shaped and four V-shaped transfer functions on SMO in terms of achieved classification accuracy, rate of convergence, and number of features selected over 18 standard UCI datasets. The proposed algorithm is also compared with state-of-the-art meta-heuristic feature selection (FS) algorithms.
Experimental results confirm the efficiency of the proposed approach in improving the classification accuracy compared to other meta-heuristic algorithms, and the superiority of X-shaped transfer function over commonly used S-shaped and V-shaped transfer functions. The source code of the proposed method along with the datasets used can be found at https://github.com/Rangerix/SocialMimic.

considerably [5]. Again, high dimensional datasets have various disadvantages, such as a larger time requirement for constructing the learning model, the possible existence of irrelevant and redundant attributes or features, and degraded performance due to feature redundancy, all of which make analysis or classification of the data very difficult. This is where feature selection (FS) methods become important. FS is a data pre-processing step which aims to remove all possible irrelevant and redundant features [6] from the underlying dataset or feature vector, thereby reducing the storage and time required to process the data.
FS is considered an NP-complete combinatorial optimization problem. Generating and evaluating all possible feature subsets is not feasible for large datasets: a dataset containing $n$ features has $2^n$ feature subsets, and evaluating all of them incurs a huge computational cost. Randomized algorithms attempt to search for the optimum feature subset in a randomized manner. A heuristic search strategy, on the other hand, performs a guided search, which may not always find the optimum solution but tries to produce a near-optimal solution within a reasonable computational time. Heuristic approaches are classified into two categories: specific heuristics, which are designed for a particular problem, and general-purpose meta-heuristics, which are designed to solve a wide range of problems [7].
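To make the growth of the search space concrete, the short sketch below counts the candidate feature subsets for a few values of $n$. The function name `subset_count` and the chosen values of $n$ are illustrative only and do not come from the paper.

```python
# Number of candidate feature subsets for a dataset with n features.
# Every feature is either kept or dropped, which gives 2**n subsets
# (the empty subset included).
def subset_count(n_features: int) -> int:
    return 2 ** n_features


for n in (10, 20, 30, 60):
    print(f"n = {n:2d}: {subset_count(n):,} subsets")
```

Even at a few dozen features the count is far beyond what a wrapper method, which trains a classifier per subset, could evaluate exhaustively.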
Based on the usage of learning algorithm, FS methods can broadly be divided into two categories [8]: filter and wrapper. Filter methods do not use any learning algorithm during elimination (selection) of the irrelevant (important) features, rather use different pre-defined scoring criteria to rank the features indicating their importance in terms of classification ability. Wrapper methods use learning algorithms (such as classifiers) as a part of the selection as well as evaluation of the subset of the selected features in each step of the algorithm. Filter methods are faster but wrapper methods, in general, perform much better [8]. Meta-heuristic methods are mostly wrapper based, since they require a classification algorithm for evaluation of a selected feature subset. In the last decade, meta-heuristic algorithms have become quite popular in solving FS problems also due to their ability to obtain an optimal or near-optimal solution in a reasonable time [9]. Two main characteristics of these algorithms are: exploration or diversification, which is the ability to search the whole solution space when looking for new solution in each iteration by avoiding local optima, and exploitation or intensification, which implies finding a better solution in the neighborhood of the obtained solution, leading to faster convergence. A good meta-heuristic algorithm tries to find a proper balance between exploration and exploitation.
In this work, we have proposed a meta-heuristic FS algorithm. Here, we have introduced a new transfer function and applied it to a recently proposed meta-heuristic optimization algorithm called Social Mimic Optimization (SMO) for the purpose of FS. The main contributions of this work are as follows:
• A new FS technique is developed following a recently proposed optimization algorithm called SMO.
• A novel X-shaped transfer function is introduced.
• The performance of the new transfer function in combination with SMO is compared with widely used S-shaped and V-shaped transfer functions.
• The proposed FS method is evaluated on 18 standard UCI datasets [10].
• It is also compared with five classical and five recently proposed meta-heuristic based FS methods.
• The performance of the proposed FS method is statistically validated using Wilcoxon rank-sum test [11].
The rest of this paper is organized as follows: Section II provides a brief review about the FS methods and transfer functions found in the literature. Section III provides detailed description of the proposed FS method. The results obtained by the FS versions of SMO are reported in Section IV. Section V provides the comparison results of the proposed model with state-of-the-art FS methods. Lastly, Section VI concludes this work and provides the possible future extension of this work.

II. LITERATURE SURVEY
FS is an optimization problem where the aim is to maximize the classification accuracy of a learning algorithm using the minimum number of features. The role of FS is crucial because it helps us gauge the performance of different machine learning and data mining techniques.
Over the past two decades, nature-inspired meta-heuristic algorithms have been at the forefront owing to several important properties: they are easy to adopt, flexible, require little mathematical derivation, and are able to avoid local optima. These algorithms can exploit the information of the population in order to find the optimal solutions. Meta-heuristic algorithms can also be divided into different categories based on different criteria: single solution based and population based [12], nature inspired and non-nature inspired [13], metaphor based and non-metaphor based [14]. From the 'inspiration' point of view, these algorithms can roughly be divided into four categories [15]: Evolutionary, Swarm inspired, Physics based, and Human related.
• Evolutionary algorithms are inspired by biology. They utilize crossover and mutation operators to evolve the initial population, usually selected in a random fashion, over the iterations, and eliminate the worst solutions in order to obtain improved solutions. Genetic algorithm (GA) [16] is a well-known method of this category which follows Darwin's theory of evolution. Co-evolving algorithm [17], Cultural algorithm [18], Genetic programming [19], Grammatical evolution [20], Bio-geography based optimizer [21], Stochastic fractal search [22], Salp swarm algorithm [23], Black widow optimization [24], Barnacles mating optimizer [25] etc. are some well-known evolutionary algorithms.
• Swarm inspired algorithms imitate the individual and social behavior of swarms, herds, schools, teams or any group of animals. Every individual has its own behavior, but the collective behavior of the individuals helps to solve complex optimization problems. One of the most popular algorithms of this category is Particle swarm optimization (PSO) [26], developed by following the behavior of a flock of birds. Another notable method of this category is Ant colony optimizer (ACO) [27], inspired by the foraging method of some ant species. Some other methods belonging to this category are: Bacterial foraging [28], Firefly algorithm [29], Grey Wolf optimizer (GWO) [9], Ant Lion optimizer (ALO) [30], Whale optimization algorithm [31], Grasshopper optimization algorithm (GOA) [32], Squirrel search algorithm [33], Harris Hawks optimization (HHO) [34] etc.
• Physics based algorithms are inspired by the rules governing a physical process. The inspiring physical process ranges from music, metallurgy to mathematics, physics, chemistry, and complex dynamic systems. One of the oldest algorithms of this category is Simulated Annealing (SA) [35], developed by following the annealing [36] process of metals present in metallurgy and materials sciences. Another popular method of this category is Gravitational search algorithm (GSA) [37], developed by following gravity and mass interaction. Some other methods of this category are Harmony search (HS) algorithm [38], Black hole optimization [39], Sine Cosine algorithm [40], Multi-verse optimizer [41], Find-Fix-Finish-Exploit-Analyze [42], Atom search optimization [43], Equilibrium optimizer [44] etc.
• Human related algorithms search for the global optima by following human behavior. Teaching-Learning-Based optimization [45] is one popular method belonging to this category, developed by following the procedure of enhancing the class grade. Some other methods of this category are: Society and civilization [46], League championship algorithm [47], Fireworks algorithm [48], Tug of war optimization [49], Volleyball Premier League algorithm [50], Political optimizer [51].
FS is a binary optimization problem, and transfer functions are required to convert the search space of a continuous optimization algorithm to a binary one. A transfer function generates a probability value based on the position/velocity of a solution, and with this probability value the real valued solution is converted to a binary one. Kennedy and Eberhart have proposed the binary PSO (BPSO) algorithm, using a sigmoid transfer function [52]. GA is used in [53] for the selection of features in an automatic pattern classifier. In [54], the authors have proposed the V-shaped transfer function. In [37], binary GSA (BGSA) is proposed using a V-shaped transfer function ($|\tanh(x)|$). In [55], the authors have proposed eight binary variants of PSO using four S-shaped and four V-shaped transfer functions. These transfer functions are given in Table 1.
In [56], the authors have proposed six binary variants of ALO using three S-shaped (S2, S3, and S4) and three V-shaped (V2, V3, and V4) transfer functions, as mentioned in Table 1. In [57], the Dragonfly algorithm is used for FS by utilizing the V3 transfer function and applied on 18 standard UCI datasets. In [58], binary variants of GOA are proposed using the S1 and V1 transfer functions. HHO [59] is converted to its binary version using the S1 and V1 transfer functions and applied on microarray datasets. In [60], the authors have proposed binary variants of the Butterfly optimization algorithm using the S1 and V2 transfer functions and applied them on 21 UCI datasets. In [61], four V-shaped transfer functions V1, V2, V3, and V4 are used to convert GWO into its binary variant for solving FS problems.
The presence of such a significant number of meta-heuristic FS algorithms and transfer functions clearly raises the question of whether there is a need for (i) another meta-heuristic FS method, and (ii) another transfer function. However, as indicated by the No Free Lunch [62] theorem for optimization, no single algorithm can be equally applicable to all optimization problems. With each new algorithm modeled on some regular or natural phenomenon, researchers primarily aim to add a new facet that gives a better trade-off between exploration and exploitation, thereby escaping local optima and eventually converging to the global optima. Accomplishing these objectives is not straightforward, which motivates researchers to propose new algorithms that can be applied to different problem domains. In summary, this is the key reason why researchers keep attempting to formulate better methods than the existing ones, which in turn keeps research in this domain alive. To discover the best algorithm for a specific problem, the No Free Lunch theorem suggests that researchers have to concentrate on the particular problem at hand, the hypotheses, the priors (additional data), the information and the cost.
For complex optimization problems, the multi-modal functions have a huge number of dimensions, and finding an ideal value for all those dimensions at the same time is next to impossible. This challenging aspect of the optimization problem prompts researchers to turn to meta-heuristic strategies, where the aim is to get an optimal solution within a reasonable amount of time. FS is considered an optimization problem in which there may exist numerous optimal feature subsets, i.e., subsets having the same dimension and the same precision. Here likewise, it would be extremely hard to discover an optimal feature set that lessens the burden of extra storage space and running time while preserving the performance of the machine learning algorithm. Research therefore continues on developing new algorithms which can meet these requirements. This has also inspired us to propose a new meta-heuristic FS methodology based on the SMO [63] algorithm.

III. PRESENT WORK
A. SOCIAL MIMIC OPTIMIZATION: AN OVERVIEW
The SMO algorithm [63] is modeled on human behavior. Each individual tries to 'mimic' or assimilate himself/herself to someone more esteemed, more intelligent and more powerful. Accordingly, each solution (analogous to an individual) in an optimization problem moves towards the global optima reached so far by imitating the parameters of that global optima. In this algorithm, $Follower$ represents the population, $Follower_i$ represents the $i$-th solution in the population, and $Leader$ represents the global optima obtained so far. During an iteration, each $Follower_i$ calculates the difference between its fitness value and the fitness value of the global optima using Equation 1.
In the next step, each $Follower_i$ updates itself using Equation 3.
The fitness value of each $Follower_i$ is calculated and $Leader$ is updated accordingly. A brief overview of the SMO algorithm is presented in Figure 1. We have chosen this optimization method because SMO is simple to implement but can produce effective results. Besides, in contrast to other popular meta-heuristic algorithms, it does not require any inherent parameter except the population size and the maximum number of iterations. As a result, no parameter tuning is required, which would otherwise demand exhaustive experiments to find the optimal parameter values.
It has already been mentioned that FS is a binary optimization problem [64], where the solution is limited to the binary values {0, 1}. Here, a solution is represented using a binary vector where 1 indicates that the corresponding feature is selected and 0 indicates otherwise. The size of this vector is equal to the number of features in the original dataset. The SMO algorithm is proposed to solve continuous optimization problems where a solution consists of real values. To map the continuous search space of the standard SMO algorithm to a binary one, a transfer function is required [55]. In the literature, two types of transfer functions are commonly used: S-shaped and V-shaped.
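The binary solution encoding described above can be illustrated with a small sketch. The helper `select_features` and the toy data are hypothetical and serve only to show how a 0/1 vector picks columns of a dataset.

```python
def select_features(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    if any(len(row) != len(mask) for row in rows):
        raise ValueError("mask length must equal the number of features")
    return [[value for value, bit in zip(row, mask) if bit == 1] for row in rows]


# Toy dataset with four features per sample (values are made up).
rows = [[5.1, 3.5, 1.4, 0.2],
        [6.2, 2.9, 4.3, 1.3]]
mask = [1, 0, 0, 1]  # select the first and the last feature
reduced = select_features(rows, mask)
print(reduced)  # [[5.1, 0.2], [6.2, 1.3]]
```

The length of `mask` equals the number of features in the original data, as stated in the text.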
In case of S-shaped transfer functions, the solutions are updated based on Equation 4.
where $rnd \in [0, 1]$ is a random number and $F_i^d(t+1)$ represents the $d$-th dimension of the $i$-th solution (Follower) in the $(t+1)$-th iteration.
In case of V-shaped transfer functions, the solutions are updated based on Equation 5.
and vice-versa. In the case of an S-shaped transfer function, the solution in the next ($(t+1)$-th) iteration is modified without considering the impact of the solution in the current ($t$-th) iteration. This may cause the agents to diverge, leading to slower convergence of the algorithm. In swarm inspired algorithms, where the agents are updated based on their velocity values, a large velocity in the positive or negative direction indicates that the agents need large movements to reach the optimum position. In contrast, a small velocity indicates insignificant movement, and a zero velocity means that the position should not be changed [54]. These concepts no longer hold with an S-shaped transfer function: velocities in the negative and the positive directions produce different values for the new position, and a zero velocity generates either zero or one with probability 0.5 for the new position [54]. With a V-shaped transfer function, on the other hand, the solution may get stuck in local optima, because if low velocities are associated with a particular solution, the solution remains the same in the next iteration with high probability.
The transfer function plays a key role in helping a binary optimization algorithm to find the optimum solution [55]. In the early steps, exploration is very important in order to search promising regions and avoid getting trapped in local optima, but during the later steps exploitation is more essential so that the probability of finding better solutions increases. In other words, a balance between exploration and exploitation is essential in order to achieve a good result. In the literature, we have found many cases where meta-heuristic strategies need to be enhanced by a local or global search to be able to find the optimal solution [65]-[69].
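The contrast between the two families can be sketched in a few lines. This is a minimal illustration, assuming the update rules commonly used in the binary meta-heuristic literature: an S-shaped function sets a bit to 1 with probability $S(\cdot)$, while a V-shaped function flips the current bit with probability $V(\cdot)$. The logistic sigmoid and $|\tanh(x)|$ are used as representative functions; the function names and the variable `step` (the continuous value fed to the transfer function) are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """A representative S-shaped transfer function (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))


def v_tanh(x: float) -> float:
    """A representative V-shaped transfer function, |tanh(x)|."""
    return abs(math.tanh(x))


def s_shaped_update(step, rng):
    """Set each bit to 1 with probability S(step); the current bit is ignored."""
    return [1 if rng.random() < sigmoid(s) else 0 for s in step]


def v_shaped_update(current, step, rng):
    """Flip each bit with probability V(step); otherwise keep the current bit."""
    return [1 - bit if rng.random() < v_tanh(s) else bit
            for bit, s in zip(current, step)]


rng = random.Random(0)
current = [1, 0, 1, 0]
step = [0.0, 0.0, 3.0, -3.0]
print(s_shaped_update(step, rng))           # bits for step 0.0 are a coin toss
print(v_shaped_update(current, step, rng))  # bits for step 0.0 never change
```

With a zero step the S-shaped rule yields 0 or 1 with probability 0.5 each, whereas the V-shaped rule leaves the bit unchanged, which matches the behavior discussed in the paragraph above.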
Considering the limitations of the commonly used transfer functions found in the literature, we have introduced a new transfer function which is X-shaped. Two components, as shown in Figure 2, are used to generate two different results. The best result is chosen and compared with the previous solution. If the new solution is better than the previous one, it will be selected as the next position; otherwise, a crossover operator is applied on the new and previous solutions. In this case, the best result of the crossover operator is chosen as the new position. Due to crossover, there is a chance where $y_i$ and $z_i$ are the binary versions of $Follower_i$ generated by Equation 6 and Equation 8 respectively, and $rnd_1, rnd_2 \in [0, 1]$ are random numbers.
Now, if $fitness(F_i(t+1)) < fitness(F_i(t))$, then $F_i(t+1)$ is kept as the next solution. Otherwise, a crossover operation is performed on $F_i(t+1)$ and $F_i(t)$. The crossover results in two children, of which the best one is chosen as the next solution. In this case, the child has a chance to retain the good qualities of the parent $F_i(t)$. Uniform crossover [70] has been chosen for the crossover operation. This part is summed up in Equation 11.
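The selection-and-crossover step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the two binary candidates `y` and `z` are taken as given inputs (how they are produced from the two components is not shown), lower fitness is treated as better, and all function names and the toy fitness are hypothetical.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Swap each gene between the two parents with probability 0.5."""
    child_a, child_b = [], []
    for a, b in zip(parent_a, parent_b):
        if rng.random() < 0.5:
            a, b = b, a
        child_a.append(a)
        child_b.append(b)
    return child_a, child_b


def next_position(previous, y, z, fitness, rng):
    """Choose the next binary solution (lower fitness is better)."""
    candidate = min((y, z), key=fitness)       # best of the two components
    if fitness(candidate) < fitness(previous):
        return candidate                       # new solution is accepted
    children = uniform_crossover(candidate, previous, rng)
    return min(children, key=fitness)          # best child of the crossover


# Toy fitness (hypothetical): the number of selected features.
toy_fitness = sum
rng = random.Random(42)
previous = [1, 0, 1, 0]
print(next_position(previous, [1, 1, 1, 0], [0, 0, 1, 0], toy_fitness, rng))
# [0, 0, 1, 0]
```

In the toy call the candidate `[0, 0, 1, 0]` has a lower fitness than `previous`, so it is accepted directly and the crossover branch is not reached.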
$$F_i(t+1) = \begin{cases} F_i(t+1), & \text{if } fitness(F_i(t+1)) < fitness(F_i(t)) \\ \text{best child of } crossover(F_i(t+1), F_i(t)), & \text{otherwise} \end{cases} \quad (11)$$
In this work, we have compared the performance of the introduced X-shaped transfer function with the performance of eight different transfer functions (four S-shaped and four V-shaped transfer functions) when these are used with the SMO algorithm. Table 1 shows the mathematical formulas of the eight transfer functions considered here, whereas Figure 3 shows their corresponding graphs. FS is a multi-objective optimization problem with two main objectives: achieving maximum classification accuracy and selecting the minimum number of features. Since these two goals are opposite in nature (one is maximized, the other minimized), we have considered the classification error rate instead of the accuracy, so that both objectives are minimized. These two objectives are then combined into a single one and used as the fitness function, given in Equation 12. Each follower (solution) is assessed by the proposed fitness function, which relies on the performance of the K-Nearest Neighbor (KNN) classifier [71] to determine the classification error rate and on the number of features selected.
where $|F|$ represents the total number of features in the original dataset, $|F'|$ represents the number of features in the selected subset, $\gamma(F')$ denotes the classification error rate of $F'$ using the KNN classifier, and $\omega \in [0, 1]$ denotes the relative importance of classification quality and selected subset dimension.
The time complexity of the proposed method is $O(maxIter \times popSize \times D \times t_{fitness})$, where $maxIter$ is the maximum number of iterations, $popSize$ represents the number of followers (individuals), $D$ represents the dimension of the problem in consideration, and $t_{fitness}$ denotes the time required to calculate the fitness value of a particular individual using a given classifier. It is to be noted that using the X-shaped transfer function instead of S-shaped or V-shaped transfer functions does not alter the time complexity.
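A fitness function built from the quantities defined above might look like the sketch below. It assumes the weighted-sum form that is common in wrapper-based FS, $\omega \, \gamma(F') + (1 - \omega) \, |F'| / |F|$; this exact form and the default $\omega = 0.99$ are assumptions made for illustration and are not confirmed by the text.

```python
def fitness(error_rate: float, n_selected: int, n_total: int,
            omega: float = 0.99) -> float:
    """Weighted sum of the error rate and the fraction of selected features.

    Lower is better. The weighted-sum form and the default omega are
    assumptions made for this sketch.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_selected <= n_total")
    return omega * error_rate + (1.0 - omega) * (n_selected / n_total)


# Hypothetical numbers: 5% error using 10 of 40 features.
print(round(fitness(0.05, 10, 40), 6))  # 0.052
```

With $\omega$ close to 1 the error rate dominates, and the feature count acts mainly as a tie-breaker between subsets of similar accuracy.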

IV. RESULTS AND DISCUSSION
We have used the KNN [71] classifier with the Euclidean distance metric to measure the classification accuracy of the optimal feature subset selected by the SMO algorithm. As per the recommendation found in [64], [72], [73], we have set K = 5. For each dataset, a five-fold cross-validation scheme is used for evaluation. In k-fold cross-validation, the dataset is divided into k equal partitions (folds), where k − 1 folds are utilized for training and the remaining fold is utilized for testing the classification model. This procedure is repeated k times so that each fold serves as the test fold once. We have applied the FS methods on the training folds and determined which features are to be included in the selected feature subset. Only those features are then selected from the test fold, and the test classification accuracy is measured using the KNN classifier. The test fold is completely hidden from the FS method and used for the final evaluation only. This work is implemented using Python3 [74] and graphs are plotted using Matplotlib [75].
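The evaluation protocol described above can be sketched in pure Python. This is an illustrative sketch, not the authors' code: the function names, the callback `select_mask` (standing in for any FS method) and the toy data are hypothetical. The point it shows is that the FS step sees only the training folds, while the held-out fold is used for the final accuracy.

```python
import math
import random
from collections import Counter


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validate(x, y, select_mask, n_folds=5, k=5, seed=0):
    """Mean test accuracy; select_mask is given the training folds only."""
    folds = k_fold_indices(len(x), n_folds, random.Random(seed))
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        mask = select_mask(train_x, train_y)   # FS on training data only
        keep = [j for j, bit in enumerate(mask) if bit]
        reduced_train = [[row[j] for j in keep] for row in train_x]
        hits = 0
        for i in held_out:                     # test fold: same features
            query = [x[i][j] for j in keep]
            hits += knn_predict(reduced_train, train_y, query, k) == y[i]
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / len(accuracies)


# Toy data (made up): two well separated clusters on the first feature.
x = [[float(i), 0.0] for i in range(10)] + [[i + 100.0, 0.0] for i in range(10)]
y = [0] * 10 + [1] * 10
keep_all = lambda train_x, train_y: [1] * len(train_x[0])
print(cross_validate(x, y, keep_all))  # 1.0
```

The sketch assumes at least as many samples as folds; a real experiment would also need stratification and tie handling choices that the text does not specify.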

A. DATASET DESCRIPTION
For assessing the performance of the proposed FS method, 18 standard UCI datasets [10] are considered. The datasets are selected from various backgrounds. The underlying reason for selecting these datasets is that they are diverse in terms of number of attributes and instances present [76]. The description of these datasets is presented in Table 2. These variances help in establishing the robustness of the proposed method.

B. PARAMETER TUNING
There are two parameters which are very important for any multi-agent evolutionary algorithm: (a) population size and (b) maximum number of iterations to be used to run the algorithm. Population size characterizes how a single agent learns from other agents' experience, and iterations provide step-wise evolution of the agents. In order to find the optimal values for these two parameters, exhaustive experiments have been performed by varying one parameter w.r.t. the other. Figure 4 shows the effect of different population sizes on achieved classification accuracy using SMO algorithm with the proposed X-shaped transfer function. We have decided to set population size as 20 because (i) it is consistent, and (ii) it is able to achieve highest classification accuracy for most of the datasets. Figure 5 shows the values of the fitness function in each iteration using the proposed X-shaped, and the commonly used S-shaped and V-shaped transfer functions. Now, from the computational complexity of the SMO algorithm, mentioned in Section III, it can be observed that either increase in population size or maximum number of iterations, increases the time requirement. Considering both Figure 4 and Figure 5, it has been decided to set the values of population size as 20 and the maximum number of iterations as 30 for further experiments.

C. EXPERIMENTAL RESULTS
In this section, we discuss the results achieved by the binary SMO algorithm using the proposed X-shaped transfer function and the four S-shaped and four V-shaped transfer functions. The details of these transfer functions are given in Table 1. We denote the binary SMO algorithm with the $i$-th S-shaped and $j$-th V-shaped transfer functions (as mentioned in Table 1) as SMOsi and SMOvj respectively. The proposed binary SMO algorithm with the X-shaped transfer function is abbreviated as SMOX. Table 3 displays the classification accuracies achieved by the SMOsi, SMOvj, and SMOX methods. From Table 3, it can be observed that the SMOX algorithm has achieved the highest accuracy for nearly all of the 18 UCI datasets. The SMOX algorithm is able to achieve 100% classification accuracy for nine datasets (50%): BreastCancer, CongressEW, Exactly, M-of-n, PenglungEW, SonarEW, Vote, WineEW, and Zoo. For the BreastEW dataset, it has achieved the second best classification accuracy of 99.12%. For the Exactly2, Tic-tac-toe, and WaveformEW datasets, the SMOX algorithm has achieved 80.5%, 82%, and 84.4% classification accuracies respectively. Table 4 displays the number of features selected by the SMOsi, SMOvj, and SMOX algorithms. From Table 4, it can be observed that the proposed SMOX algorithm has selected the minimum number of features for eight datasets: CongressEW, KrvskpEW, M-of-n, PenglungEW, Tic-tac-toe, Vote, WineEW and Zoo. The second best performing algorithm is SMOs4, which selects the minimum number of features for five datasets: BreastCancer, BreastEW, Exactly2, IonosphereEW, and Zoo. Figure 6 displays the average accuracies achieved by the nine binary variants of the SMO algorithm (four SMOsi, four SMOvj, and SMOX) over the 18 UCI datasets. It can be clearly seen that the SMOX algorithm has achieved the highest classification accuracy among the binary variants.
On average, the SMOX algorithm has achieved about 96% classification accuracy. Figure 7 shows the average number of features selected by the nine binary variants of the SMO algorithm. From Figure 7, it can be observed that the SMOX algorithm has selected the lowest number of features in most of the cases. Averaged over the 18 UCI datasets, the proposed SMOX algorithm has selected fewer than 10 features.

D. STATISTICAL ANALYSIS
To determine the statistical significance of the results of the proposed SMOX algorithm, a non-parametric statistical test, known as the Wilcoxon rank-sum test [11], has been performed. This is done in order to check whether the results of an algorithm are statistically different from those of other algorithms [77]. The null hypothesis states that the two sets of results are from the same distribution, so that any difference in the two mean ranks comes only from sampling error. We have performed the test at the 5% significance level (α = 0.05): if the distributions of two sets of results are statistically different, the p-value generated from the test statistic will be < 0.05, resulting in the rejection of the null hypothesis.
Here, we have used the Wilcoxon test to check whether the results obtained by the proposed SMOX algorithm are statistically different from the results obtained by the SMOsi and SMOvj methods. For every dataset, each of the binary variants has been run 20 times, and the accuracies obtained by the SMOX algorithm are compared with those of each of the SMOsi and SMOvj methods via the Wilcoxon test. The p-values obtained for the pair-wise comparison of the SMOX, SMOsi and SMOvj algorithms on the 18 UCI datasets are provided in Table 5.
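The pair-wise test described above can be sketched without external libraries. The sketch below uses the large-sample normal approximation of the two-sided Wilcoxon rank-sum test, with average ranks for ties and no tie correction of the variance; this choice of approximation, the function names and the synthetic accuracy values are assumptions made for illustration and are not results from the paper.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_sum_test(sample_a, sample_b):
    """Return (z, two-sided p-value) using the normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n_a])                       # rank sum of the first sample
    mean = n_a * (n_a + n_b + 1) / 2.0
    std = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (w - mean) / std
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic accuracies for 20 runs of two methods (not real results).
method_a = [0.90 + 0.001 * i for i in range(20)]
method_b = [0.80 + 0.001 * i for i in range(20)]
z, p = rank_sum_test(method_a, method_b)
print(f"z = {z:.2f}, reject null at 0.05: {p < 0.05}")
```

For 20 runs per method the normal approximation is usually considered adequate; an exact test or a library routine could be substituted for small samples.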

V. COMPARISON
In Section IV, the proposed X-shaped transfer function has been shown to outperform the other transfer functions. In this section, we compare the proposed SMOX algorithm with some popular meta-heuristic FS methods present in the literature.

A. COMPARISON WITH CLASSIC META-HEURISTIC FS METHODS
Here, we have compared the results obtained by the SMOX algorithm with five traditional state-of-the-art approaches which are widely applied to solve FS problems in the literature. These approaches are GA, PSO, ALO, GSA, and HS. The values of the control parameters considered for these five methods are mentioned in Table 6. Table 7 shows the performance of the SMOX algorithm as compared to the above mentioned five methods, both in terms of classification accuracies achieved and number of features selected. The SMOX algorithm has achieved better classification accuracy than the BGA method in 15 cases and the same accuracy in 3 cases. In comparison to BGA, the SMOX algorithm has selected fewer features in 6 cases and the same number of features in 6 cases. The SMOX algorithm has achieved better classification accuracy than BPSO in 16 cases and the same accuracy in 2 cases. Considering the number of selected features, the SMOX algorithm has 9 wins and 3 ties against the BPSO method. As compared to both the BALO and BGSA methods, SMOX has achieved better accuracy in all 18 cases. In terms of the number of selected features, the SMOX algorithm has 14 wins and 2 ties against the BALO method and 10 wins against the BGSA method. In terms of classification accuracy, the SMOX algorithm outperforms the Binary HS algorithm in 17 cases and achieves the same classification accuracy only for the PenglungEW dataset. Figure 8 illustrates the average accuracies achieved by the proposed SMOX algorithm and the five state-of-the-art FS methods considered here. From Figure 8, it can be observed that the SMOX algorithm has achieved the highest classification accuracy. Averaged over all 18 UCI datasets, the SMOX algorithm has achieved > 95% classification accuracy. Figure 9 provides the average number of features selected by SMOX and the five state-of-the-art FS methods. It can be seen from Figure 9 that the proposed SMOX algorithm has selected the lowest number of features among all the methods considered. It can also be observed that the SMOX algorithm has selected fewer than 8 features on average. This supports the robustness of the proposed SMOX algorithm.
To prove the statistical significance of the results obtained by SMOX as compared to the state-of-the-art FS methods, we have also performed Wilcoxon rank-sum test for pairwise comparison of the proposed SMOX with other methods. In Table 8, the obtained p-values for each pair of methods are provided, with p < 0.05 marked bold.

B. COMPARISON WITH RECENT META-HEURISTIC FS METHODS
In this section, we have compared the results obtained by the proposed SMOX algorithm with five recently proposed meta-heuristic FS methods, namely SSDs+LAHC, SSDv+LAHC, AβBSF, bBOA-S and BGWOPSO. SSDs+LAHC and SSDv+LAHC [69] are proposed by hybridizing the social ski driver (SSD) algorithm and late acceptance hill climbing (LAHC), using an S-shaped transfer function S1 (as referred in Table 1) and a V-shaped transfer function V3 (as referred in Table 1) respectively. AβBSF [78] is proposed by hybridizing the sailfish optimizer with the adaptive β-hill climbing algorithm. bBOA-S [60] is developed by following the recently proposed butterfly optimization algorithm (BOA) [79]. BGWOPSO [80] is developed by hybridizing the PSO and GWO methods. The parameter details of these methods considered for experimentation are mentioned in Table 9. Table 10 shows the performance of the SMOX algorithm as compared to the above mentioned five FS methods, both in terms of classification accuracies achieved and number of features selected. In terms of classification accuracy, the SMOX algorithm performs the best for 17 datasets. For the BreastEW dataset, it performs the second best, following the AβBSF method. For 11 datasets (61.11%), the SMOX algorithm has selected the lowest number of features. Figure 10 shows the average classification accuracies achieved by SMOX and the five recent meta-heuristic FS methods considered here. It clearly shows that the SMOX algorithm has achieved the highest average classification accuracy over all the 18 UCI datasets. Figure 11 shows the average number of features selected by SMOX and the five recent meta-heuristic FS methods considered. From Figure 11, it can also be observed that the SMOX algorithm has selected the lowest average number of features over all the 18 UCI datasets.
To assess the statistical significance of the results obtained by the SMOX algorithm in comparison to the recently proposed meta-heuristic FS methods considered here, we have again performed the Wilcoxon rank-sum test for pairwise comparison of SMOX with the five recent meta-heuristic FS methods. Table 11 provides the p-values obtained for each pair of methods, with p < 0.05 marked in bold. Table 11 supports the statistical significance of the results obtained by the proposed SMOX algorithm.
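As an illustration of the pairwise test described above, the following Python sketch computes a two-sided Wilcoxon rank-sum p-value using the normal approximation. It is a minimal sketch under stated assumptions, not the implementation used to produce the reported p-values: the function names `average_ranks` and `rank_sum_test` are hypothetical, the two samples are assumed to be accuracies from repeated independent runs of two methods, and the accuracy values are invented purely for demonstration. No tie or continuity correction is applied.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0        # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p


# Made-up accuracies for two methods (illustration only, not reported results).
method_a_runs = [0.95, 0.96, 0.97, 0.98, 0.99]
method_b_runs = [0.90, 0.91, 0.92, 0.93, 0.94]
z, p = rank_sum_test(method_a_runs, method_b_runs)
print(f"z = {z:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

For samples this small, an exact test is usually preferable to the normal approximation; the approximation is used here only to keep the sketch short.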
A meta-heuristic algorithm can fail to find the optimal subset if (i) it cannot find the 'promising' area where the optimal solution (global optimum) may lie, and instead converges to a local optimum, or (ii) it is unable to properly search the promising areas discovered, and fails to converge, or (iii) both. We have tried to address both of these issues in the proposed SMOX algorithm. The proposed X-shaped transfer function utilizes two different components as well as a crossover operation, thereby enhancing the search ability of the SMOX algorithm.
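The idea of generating two binary candidates and combining them by crossover can be sketched in Python as follows. This is a hedged illustration only: it assumes that the two components are a sigmoid and its mirror image (whose curves cross in an X shape), that the fitness is minimized, that uniform crossover is applied to the two candidates when neither improves on the current solution, and that a non-improving result is rejected. The function names, the crossover variant, the acceptance rule, and the toy fitness are all assumptions made for demonstration; the exact definition is the one given earlier in the paper.

```python
import math
import random


def x_shaped_candidates(delta, rng):
    """Binarize a continuous step vector with two mirrored sigmoid components."""
    first, second = [], []
    for d in delta:
        d = max(-50.0, min(50.0, d))          # clamp to avoid overflow in exp
        x1 = 1.0 / (1.0 + math.exp(-d))       # rising component (assumed)
        x2 = 1.0 / (1.0 + math.exp(d))        # mirrored, falling component (assumed)
        first.append(1 if x1 > rng.random() else 0)
        second.append(1 if x2 > rng.random() else 0)
    return first, second


def uniform_crossover(a, b, rng):
    """Take each bit from parent a or parent b with equal probability."""
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]


def x_shaped_step(current, delta, fitness, rng):
    """Return the next binary solution; fitness is minimized (assumed)."""
    first, second = x_shaped_candidates(delta, rng)
    best = min((first, second), key=fitness)
    if fitness(best) < fitness(current):
        return best
    # Neither candidate improved: try a crossover of the two candidates.
    child = uniform_crossover(first, second, rng)
    return child if fitness(child) < fitness(current) else current


def toy_fitness(bits):
    """Hypothetical fitness: distance from selecting exactly two features."""
    return abs(sum(bits) - 2)


rng = random.Random(0)
current = [1, 1, 1, 1, 1, 1]
for _ in range(20):
    delta = [rng.uniform(-3.0, 3.0) for _ in current]  # stand-in for the SMO update
    current = x_shaped_step(current, delta, toy_fitness, rng)
print(current, toy_fitness(current))
```

Because a non-improving result is rejected in this sketch, the fitness of the retained solution never worsens from one step to the next; a less greedy acceptance rule would trade this guarantee for more exploration.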

VI. CONCLUSION AND FUTURE DIRECTIONS
In this work, we have proposed a new transfer function that inherently utilizes a crossover operation, thus helping the optimization algorithm to find regions where the global optimum may lie. We have chosen a competent meta-heuristic algorithm called SMO, which was proposed recently following the human behavior of mimicking/copying more esteemed individuals. The SMO algorithm itself has no parameters to tune, since the agents simply follow the best agent found so far. We have compared the effect of the proposed X-shaped transfer function with four S-shaped and four V-shaped transfer functions commonly used in the literature for converting the continuous search space of the SMO algorithm to a binary one. Eighteen publicly available standard UCI datasets have been considered to assess the performance of our proposed algorithm. The comparison shows the superiority of the X-shaped transfer function in terms of both achieved classification accuracy and reduction of feature dimension. Hence, it can be concluded that the X-shaped transfer function helps the SMO algorithm to search the regions that are likely to contain the global optimum. Finally, the proposed FS algorithm, SMOX (SMO with X-shaped transfer function), is compared with five state-of-the-art FS methods and five recently proposed meta-heuristic FS methods. The experimental results show that SMOX is able to achieve higher classification accuracy with a lower number of features in both cases. This, in turn, indicates that SMOX searches the feature space effectively and finds better feature subsets than the other FS methods considered. The statistical significance of the obtained results is also assessed using the Wilcoxon rank-sum test.
However, having the same stochastic nature as other meta-heuristic FS algorithms, and as per the No Free Lunch theorem [62], SMOX is not guaranteed to produce outstanding results for all FS problems. As future scope of this work, we can apply the proposed X-shaped transfer function to different state-of-the-art FS methods. We can also apply SMOX to different real-world problems, such as musical symbol recognition, facial emotion recognition, and handwritten digit/character/word recognition. It would be interesting to investigate the performance of SMOX on high-dimensional datasets such as microarray datasets. Enhanced initialization techniques can also be explored, in which the algorithm starts with an initial population closer to the global optimum. We can also hybridize this algorithm with other population-based meta-heuristic algorithms.
ZONG WOO GEEM (Member, IEEE) received the B.Eng. degree from Chung-Ang University, the M.Sc. degree from Johns Hopkins University, and the Ph.D. degree from Korea University. He researched at Virginia Tech, the University of Maryland at College Park, and Johns Hopkins University. He is currently an Associate Professor with the Department of Energy IT, Gachon University, South Korea. He invented a music-inspired optimization algorithm, harmony search, which has been applied to various scientific and engineering problems. His research interest includes phenomenon-mimicking algorithms and their applications to energy, environment, and water fields. He has served for various journals as an Editor (Associate Editor for Engineering Optimization and a Guest Editor for Swarm and Evolutionary Computation, the International Journal of Bio-Inspired Computation, the Journal of Applied Mathematics, Applied Sciences, and Sustainability).