MbGWO-SFS: Modified Binary Grey Wolf Optimizer Based on Stochastic Fractal Search for Feature Selection

Grey Wolf Optimizer (GWO) simulates the leadership and hunting behavior of grey wolves in nature. GWO has performed well in the literature as a meta-heuristic algorithm for feature selection problems; however, it shows low precision and slow convergence. This paper proposes a Modified Binary GWO (MbGWO) based on Stochastic Fractal Search (SFS) to identify the main features by balancing exploration and exploitation. First, the modified GWO is developed by applying an exponential form for the number of iterations of the original GWO, which increases the search space and, accordingly, exploration, together with crossover/mutation operations that increase the diversity of the population and enhance the exploitation capability. Then, the diffusion procedure of SFS is applied to the best solution of the modified GWO by using the Gaussian distribution method for the random walk in a growth process. The continuous values of the proposed algorithm are then converted into binary values so that it can be used for the problem of feature selection. To ensure the stability and robustness of the proposed MbGWO-SFS algorithm, nineteen datasets from the UCI machine learning repository are tested. The K-Nearest Neighbor (KNN) classifier is used to measure the quality of the selected subset of features. The results, compared to binary versions of state-of-the-art optimization techniques such as the original GWO, SFS, Particle Swarm Optimization (PSO), a hybrid of PSO and GWO, Satin Bowerbird Optimizer (SBO), Whale Optimization Algorithm (WOA), Multiverse Optimization (MVO), Firefly Algorithm (FA), and Genetic Algorithm (GA), show the superiority of the proposed algorithm. Wilcoxon’s rank-sum test is performed at the 0.05 significance level to verify that the proposed algorithm performs significantly better than its competitors.


I. INTRODUCTION
Optimization arises in several research areas such as engineering, medicine, agriculture, computer science, and feature selection. In optimization, the main goal is to select the optimal solution of a given problem from the available solutions, subject to the problem description. Moreover, in optimization algorithms, there is an objective that should be minimized or maximized according to the problem to be solved [1], [2]. Filter, wrapper, and hybrid-based are the main categories of feature selection techniques [3]. The filter-based (traditional) feature selection techniques have the advantages of speed and the ability to scale to large datasets. Filter methods such as Information Gain (IG) are often most useful in situations in which wrappers may over-fit. IG measures how much information a feature gives about the class, and it is useful for reducing the number of features, which can improve the accuracy of the classification model [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Jiju Poovvancheri.
In the wrapper technique, the search space for selecting features is reduced; the technique is accurate but time-consuming because it includes a learning algorithm as part of the selection function. Genetic algorithms (GA) are randomized algorithms based on the process of natural selection underlying biological evolution. They can be applied to many challenges, including optimization, machine learning problems, and feature selection [5]. Wrapper feature selection requires an optimization algorithm; however, the classical optimization techniques are somewhat restricted in solving such problems. Thus, evolutionary computation (EC) algorithms are considered as an alternative for finding the optimum solution and overcoming these limitations. Swarm-based algorithms are inspired by nature and by the biological and social behavior of animals such as birds, whales, bats, grasshoppers, fireflies, salps, fish, and wolves [6]- [9]. Many studies have used optimizers such as the Whale Optimization Algorithm (WOA) [10], [11] to solve a given problem. WOA can be used to find the optimal weights to train a neural network. A multi-objective version of WOA is developed and applied to the problem of forecasting the wind speed in [12]. Another algorithm is the Grey Wolf Optimizer (GWO), an optimization algorithm that simulates the grey wolves in nature [2], [7], [13]. GWO has the advantages of simplicity, flexibility, a derivation-free mechanism, and the ability to avoid local optima.
Because of that, it has been used in many research areas in recent years, such as feature subset selection [1], DC motor control [14], [15], solving the optimal reactive power dispatch problem [16], and financial crisis prediction [13]; in some applications, the GWO algorithm was used to train the Multilayer Perceptron (MLP) network [17]. For the problem of feature selection, a solution can be represented as a vector of size n, where n is the number of features, and each item of the vector is a binary value: 1 (the feature is included) or 0 (the feature is not included). Hence, GWO starts with an initial random population of vectors holding randomly selected features. Then, using its exploration and exploitation capabilities, GWO can find the optimal subset of features. Wrapper feature selection methods use a learning algorithm to evaluate the quality of the selected subset of features [7].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, to solve the feature selection problems, a binary GWO algorithm is integrated with a multi-phase mutation in [7] based on the wrapper methods. In [18], a multi-strategy ensemble GWO is proposed. This method overcomes the single search strategy limitation of GWO in solving function optimization problems. Another research proposed a hierarchy strengthened GWO (HSGWO) algorithm in [19] for solving large-scale problems. To improve the accuracy of identification, a chaos-based grey wolf optimization (EGWO) algorithm is proposed in [20] to find the optimal feature sets. Hybrid algorithms are also proposed for improving the GWO performance for different applications. In [21], a fusion between Particle Swarm Optimization (PSO) exploitation ability with the GWO exploration ability is proposed. Their algorithm was evaluated based on benchmark functions and real-world problems. Another research proposed a hybrid of GWO with a Crow Search Algorithm (CSA) (GWOCSA) in [22]. This hybrid algorithm combines both algorithms' strengths to generate a promising solution for achieving global optima efficiently.
In this paper, a Modified Binary GWO based on Stochastic Fractal Search (SFS) is proposed. The proposed algorithm balances exploration and exploitation in the identification of the main features. First, a modified GWO is developed by applying an exponential form for the parameter a of the original GWO to increase the search space, and crossover/mutation operations to increase the diversity of the population. Then, the SFS diffusion process is applied to the best solution of the modified GWO by using the Gaussian distribution method for the random walk in the growth process. The continuous values of the proposed algorithm are then converted into binary values so that it can be used for the problem of feature selection. To ensure the stability and robustness of the proposed MbGWO-SFS algorithm, nineteen datasets from the UCI machine learning repository are tested, including two datasets with more than 500 attributes. As a preprocessing step, the class imbalance of the datasets is solved using the LSH-SMOTE algorithm [5] to improve the processing time. Compared to the binary versions of the state-of-the-art optimization techniques of the original GWO [1], SFS [23], PSO [24], a hybrid of PSO and GWO [21], Satin Bowerbird Optimizer (SBO) [25], WOA [26], Multiverse Optimization (MVO) [27], and Firefly Algorithm (FA) [28], in addition to GA [29] and a hybrid of GA and GWO, the results show the superiority of the proposed algorithm. In the experiments, the K-Nearest Neighbor (KNN) classifier [30] is used to measure the quality of the selected subset of features. Wilcoxon's rank-sum test is performed at the 0.05 significance level to determine whether the difference between the results of the proposed algorithm and those of the compared algorithms is statistically significant. This paper is organized into seven sections. The related work is presented in Section II. Section III gives the background of the basic mechanisms used in this work. The proposed algorithm (MbGWO-SFS) is described in detail in Section IV. Sections V and VI present the evaluation metrics and the experimental results. Lastly, conclusions are stated in Section VII.

II. RELATED WORK
The grey wolf optimizer has been applied in the literature in different research directions such as face recognition, gene selection, electromyography classification, diagnosis of diseases, interference detection systems, and feature selection. The binary form of GWO can be used efficiently for feature selection and classification problems [36]- [38]. Table 1 shows a summary of some binary GWO algorithms in the literature. Binary GWO algorithms were introduced in [31], [32] to select the subset of features for wrapper feature selection and classification. In these algorithms, a KNN classifier was used as the fitness function to evaluate the selected feature subsets. Eight benchmark datasets from the machine learning repository were used for evaluation. The methods were compared with the PSO and GA algorithms to show their effectiveness in terms of accuracy and reduction in the number of features. Another binary GWO wrapper method was presented in [33] to classify cancer on gene expression data. It used classifiers with cross-validation based on the C4.5 decision tree. Ten microarray cancer datasets were used to evaluate the method, and a comparison with Self-Organizing Map (SOM), MLP, and Support Vector Machine (SVM) was provided.
Recently, authors in [1] proposed a binary GWO based on PSO and they used the KNN classifier. They have assessed the performance of their method by using eighteen standard benchmark datasets from the repository of machine learning and compared their proposed method with different optimization approaches such as PSO, GA, and GWO to prove the enhancement in computational time, classification accuracy, and the number of selected features. In [34], a method based on Bag-of-Keypoint Features (BoKF) model and Binary GWO (BGWO) is proposed to distinguish nucleolar and centromere staining patterns. Authors in [35] introduced five transfer functions to get the binary values from the continuous values. They proposed an updating equation for the a parameter to balance between the local and global search.
Stochastic Fractal Search (SFS) was proposed firstly in [23] based on the fractal concept, which is a self-similarity property of objects. A chaotic SFS (CSFS) algorithm was introduced in [39] to improve SFS performance. This method integrated ten chaotic maps into the original SFS algorithm. The algorithm random scheme is replaced by the chaotic maps to enhance the accuracy of the solution and convergence speed of the original SFS. Recently, a modified SFS (MSFS) algorithm was proposed in [40] to solve the problem of economic load dispatch. In this method, the power system constraints are taken into consideration. A Multi-Objective SFS (MOSFS) algorithm was proposed to solve complex multi-objective optimization problems for the first time in [41].
The binary GWO still struggles to achieve a high exploration capability. A high exploration capability can be achieved by creating new particles based on the diffusion procedure of SFS, which employs the Gaussian distribution method for the random walk in the Diffusion Limited Aggregation (DLA) growth process. A series of Gaussian walks participating in the diffusion process around the best solution $\vec{G}_{\alpha}$ can be listed and checked to get the best solution. This increases the exploration capability of the proposed MbGWO, which uses the diffusion process of the SFS algorithm to get the best solution.

III. BACKGROUND
A. GREY WOLF OPTIMIZER
The grey wolf optimizer simulates the movements of wolves in the process of searching for prey. Wolves usually live in packs, where a pack consists of 5 to 12 wolves. A pack has four different kinds of wolves, named alpha, beta, delta, and omega wolves [42]. The alpha wolves make the decisions in each pack. The beta wolves help the alpha wolves in making decisions. The delta wolves submit to the alpha and beta wolves. The omega wolves submit to all other wolves. The GWO algorithm is shown step by step in Algorithm 1.
Mathematically, the best solution is named the alpha ($\vec{G}_{\alpha}$), and the beta ($\vec{G}_{\beta}$) and delta ($\vec{G}_{\delta}$) are the second and third best solutions. Other solutions are indicated as omega ($\vec{G}_{\omega}$). During the process of catching the prey, as shown in Fig. 1, the alpha, beta, and delta wolves guide the other wolves as denoted in Equations (1), (2), (3), and (4), where t is the current iteration, the components of $\vec{a}$ decrease linearly from 2 to 0 throughout the iterations, and the vectors $\vec{r}_1$, $\vec{r}_2$ hold random values in [0, 1]. The parameter $\vec{a}$ is updated in each iteration and controls the balance between the exploration and exploitation processes [42]. The values of $\vec{a}$ are computed as in Eq. 5 [42], where $M_t$ is the available number of iterations for the optimizer.
The three best solutions, $\vec{G}_{\alpha}$, $\vec{G}_{\beta}$, and $\vec{G}_{\delta}$, guide the other individuals to change their positions toward the estimated position of the prey, as shown in Fig. 1. Equations (6), (7), and (8) show the process of updating the positions.
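For reference, the standard GWO update rules of [42] can be written in the notation of this section as follows. This is a sketch rather than a transcription: the symbols $\vec{D}$, $\vec{C}$, and $\vec{G}_p$ (the prey position), as well as the assignment of the numbers (1) to (8), are assumptions chosen to match the descriptions in the text.

```latex
\begin{align*}
\vec{G}(t+1) &= \vec{G}_p(t) - \vec{A}\cdot\vec{D}, \tag{1}\\
\vec{D} &= \left|\vec{C}\cdot\vec{G}_p(t) - \vec{G}(t)\right|, \tag{2}\\
\vec{A} &= 2\,\vec{a}\cdot\vec{r}_1 - \vec{a}, \tag{3}\\
\vec{C} &= 2\,\vec{r}_2, \tag{4}\\
\vec{a} &= 2 - t\,\frac{2}{M_t}, \tag{5}\\
\vec{D}_{\alpha} &= \left|\vec{C}_1\cdot\vec{G}_{\alpha} - \vec{G}\right|,\quad
\vec{D}_{\beta} = \left|\vec{C}_2\cdot\vec{G}_{\beta} - \vec{G}\right|,\quad
\vec{D}_{\delta} = \left|\vec{C}_3\cdot\vec{G}_{\delta} - \vec{G}\right|, \tag{6}\\
\vec{G}_1 &= \vec{G}_{\alpha} - \vec{A}_1\cdot\vec{D}_{\alpha},\quad
\vec{G}_2 = \vec{G}_{\beta} - \vec{A}_2\cdot\vec{D}_{\beta},\quad
\vec{G}_3 = \vec{G}_{\delta} - \vec{A}_3\cdot\vec{D}_{\delta}, \tag{7}\\
\vec{G}(t+1) &= \frac{\vec{G}_1 + \vec{G}_2 + \vec{G}_3}{3}. \tag{8}
\end{align*}
```

With this form, $\vec{a} = 2$ at $t = 0$ and $\vec{a} = 0$ at $t = M_t$, which agrees with the linear decrease from 2 to 0 stated in the text.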
Algorithm 1 GWO algorithm (excerpt)
    ...
    Update individual positions based on Eq. 8
  end for
  Update $\vec{a}$ by Eq. 5
  ...
  Calculate the fitness function $F_n$ for each $\vec{G}_i$
  ...
  Set t = t + 1 (increase counter)
end while
return $\vec{G}_{\alpha}$

B. GENETIC ALGORITHM
The genetic algorithm (GA) is based on techniques such as inheritance, mutation, crossover, and selection, which are inspired by evolutionary biology. The algorithm uses the chromosome/gene representation of living organisms [43]. In GA, a solution x ∈ ζ is an individual, with ζ as the search space. Each chromosome x consists of discrete units, or genes. The algorithm is started randomly and the individuals are then generated. Crossover and mutation operators, as shown in Fig. 2, are used to get new generations, and then all the individuals are evaluated to select the best individuals for the next iteration.
The GA has the following challenges:
• The agents are moved randomly in the entire search space, thus the algorithm may select sub-optimal solutions.
• The exploration capability of the GA algorithm is very limited, and it may become trapped in a local minimum which is not the best solution (global minimum).
• The algorithm has slow convergence due to the encoding and decoding steps, and more recent optimization algorithms are easier to implement than GA.

C. STOCHASTIC FRACTAL SEARCH
Using the characteristics of the original fractal method, a meta-heuristic algorithm can be inspired by random fractals with regard to time consumption and accuracy [23]. To find a solution for a given problem, the basic Fractal Search (FS) method uses the following three simple rules:
1) A particle can have electrical potential energy.
2) Each particle can diffuse and other random particles can be created. The original particle energy is distributed among the new particles.
3) In each generation, a few of the best particles remain and the other particles are discarded.
Stochastic Fractal Search (SFS) was proposed based on the mathematical model of the fractal [23]. The author proposed a Fractal Search (FS) algorithm using the DLA method, which is employed to generate fractal-shaped objects. Figure 3 (a) shows a sample of a random fractal generated by the DLA method. The main SFS structure consists of three processes, namely the diffusion process and the first and second update processes, which overcome the disadvantages of the FS algorithm. Figure 3 (b) presents the diffusion process in the SFS algorithm. A series of Gaussian walks participates in the diffusion process around the best solution (best particle) BP; the resulting points can be listed around this best solution as $BP_1$, $BP_2$, $BP_3$, $BP_4$, $BP_5$.

D. K-NEAREST NEIGHBOR
In this work, a wrapper approach based on the K-Nearest Neighbor (KNN) classifier, a supervised learning algorithm, is used for feature selection [30]. In KNN, each sample is classified into a specific class label based on the majority of its K neighbors. To decide the class of the unknown instance, KNN uses training instances instead of building models.
In our experiments, KNN is used for classification tasks to measure the quality of the selected subset of features. The Euclidean distance, $Euc_D$, between the features of the training data and the features of the testing data is calculated to determine the nearest K neighbors of a sample as follows:

$$Euc_D = \sqrt{\sum_{i=1}^{k} \left(Train\_F_i - Test\_F_i\right)^2} \tag{9}$$

where $Train\_F_i$ is a feature in the training data, $Test\_F_i$ is a feature in the testing data, and k is the number of features.
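The distance computation and the majority vote described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `euclidean` and `knn_predict` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def euclidean(train_row, test_row):
    """Euclidean distance over the k features of two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(train_row, test_row)))


def knn_predict(train_X, train_y, sample, n_neighbors=5):
    """Label `sample` by majority vote among its nearest training rows."""
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], sample))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated classes in a two-feature space.
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(train_X, train_y, [0.15, 0.1], n_neighbors=3)
```

In a wrapper setting, the rows passed to `knn_predict` would first be restricted to the columns selected by a candidate solution, so that the classifier error reflects the quality of that feature subset.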

IV. MbGWO-SFS: MODIFIED BINARY GREY WOLF OPTIMIZER WITH STOCHASTIC FRACTAL SEARCH
This section shows the Modified binary Grey Wolf Optimizer (MbGWO) with the Stochastic Fractal Search (SFS) in detail. Also, the fitness function that is used to measure the quality of the original GWO solutions and the proposed algorithm solutions is presented. The proposed MbGWO-SFS algorithm is explained in Algorithm 2 step by step.

A. MODIFIED GREY WOLF OPTIMIZER
The process of finding the global minimum is a challenging task. GWO uses exploration and exploitation to do its job. GWO achieves the balance between exploration and exploitation, to avoid stagnation in local optimum and to converge on the global minimum, using the two parameters of − → A and − → a . The value of − → a decreases linearly from 2 to 0 during iterations according to Eq. 5. Thus, part of the iterations are associated to exploration (| − → A | > 1) and the remaining part is associated to exploitation (| − → A | < 1).
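The role of $\vec{a}$ and $\vec{A}$ can be illustrated with a sketch of one position update. It assumes the standard GWO rules of [42] (A = 2·a·r1 − a, C = 2·r2, and averaging of the three leader-guided candidates); the function names and toy values are hypothetical.

```python
import random


def linear_a(t, max_iter):
    """Linear decrease of a from 2 to 0, as described for Eq. 5."""
    return 2 - t * (2 / max_iter)


def gwo_step(position, leaders, a, rng):
    """Move one wolf toward the alpha, beta, and delta leaders.

    Assumed rules: A = 2*a*r1 - a, C = 2*r2, D = |C*leader - position|,
    candidate = leader - A*D, new position = mean of the candidates.
    """
    new_position = []
    for d, x in enumerate(position):
        candidates = []
        for leader in leaders:
            r1, r2 = rng.random(), rng.random()
            A = 2 * a * r1 - a      # |A| > 1 is possible only when a > 1
            C = 2 * r2
            D = abs(C * leader[d] - x)
            candidates.append(leader[d] - A * D)
        new_position.append(sum(candidates) / len(candidates))
    return new_position


rng = random.Random(0)
leaders = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]  # alpha, beta, delta (toy values)
wolf = [0.9, 0.1]
# At the last iteration a is 0, so A vanishes and the wolf lands on the
# mean of the three leaders: pure exploitation.
final_step = gwo_step(wolf, leaders, linear_a(80, 80), rng)
```

Because |A| is bounded by a, large exploratory jumps are only possible while a is above 1, which is why the schedule of a determines how many iterations can be spent on exploration.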

1) EXPONENTIAL FORM
To achieve the balance between exploration and exploitation, Eq. 5 is changed so that the value of $\vec{a}$ decreases exponentially throughout the iterations, as shown in Eq. 10. By applying this exponential change, the number of iterations that can be used for exploration is increased, and hence the proposed modified GWO explores the search space for more iterations. Figure 4 illustrates the difference between a linear and an exponential change of the value of $\vec{a}$, which indicates that exploration is achieved for a greater number of iterations. In Eq. 10, the iteration number is denoted as t and the total number of iterations of the optimizer is denoted as $M_t$.
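The effect of a slower decay on the number of exploration iterations can be checked numerically. In the sketch below, `slow_decay_a` is a hypothetical stand-in with the form a = 2(1 − (t/M_t)²); it is an assumption made for illustration and should not be read as the paper's Eq. 10. Exploration is counted as the iterations with a > 1, since |A| > 1 can only occur there.

```python
def linear_a(t, max_iter):
    """Linear schedule of the original GWO (a goes from 2 to 0)."""
    return 2 - t * (2 / max_iter)


def slow_decay_a(t, max_iter):
    """Hypothetical stand-in for Eq. 10: a = 2 * (1 - (t / M_t) ** 2)."""
    return 2 * (1 - (t / max_iter) ** 2)


def exploration_iterations(schedule, max_iter):
    """Count iterations with a > 1, the only ones where |A| > 1 can occur."""
    return sum(1 for t in range(max_iter) if schedule(t, max_iter) > 1)


max_iter = 80
linear_count = exploration_iterations(linear_a, max_iter)
slow_count = exploration_iterations(slow_decay_a, max_iter)
```

Under this assumed form, the schedule stays above 1 for roughly 70% of the run instead of the first half, which is the qualitative behavior that Figure 4 is described as showing.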

2) CROSSOVER AND MUTATION
Crossover is the operation that combines the information of different solutions to generate a new offspring; it is the way to generate new solutions from an existing population. The crossover operation increases the diversity of the population and enhances the exploitation capability. A single crossover point, $cp_i$, i = 0 to N − 1, is chosen randomly for a number with N bits. The offspring of the three suggested solutions consists of the pre-$cp_i$ section of the first solution followed by the post-$cp_i$ section of the next one, as shown in Fig. 5. The crossover process is represented by Eq. 11. The mutation operator changes one or more components of the offspring randomly. This is used to prevent premature convergence. The mutation operation is employed to enhance the position of a specific solution around randomly selected leaders. The positions are then updated, as shown in Fig. 5, based on a random point $mp_i$, i = 0 to N − 1, which is chosen randomly for the offspring number with N bits. The mutation process is represented by Eq. 12, where $\vec{G}_3$ represents the updated position after the crossover and mutation processes.
To summarize, two different modifications to the original GWO are presented in this subsection. The first modification forces the parameter $\vec{a}$ to change exponentially and hence increases the number of iterations for exploration. The second modification applies the crossover and mutation processes to the suggested solutions. The crossover operator enhances the exploitation process, while the mutation operator enhances the exploration process. By merging these modifications, the proposed modified GWO has higher exploration and exploitation capabilities than the original GWO.
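The two operators can be sketched on bit vectors as follows. This is an illustration under assumptions: the solutions are treated as lists of N bits, one crossover point and one mutation point are drawn uniformly from 0 to N − 1, and the function names are hypothetical. The paper's Eq. 11 and Eq. 12 may differ in detail.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Offspring = pre-cut section of parent_a + post-cut section of parent_b."""
    cut = rng.randrange(len(parent_a))      # crossover point in 0 .. N-1
    return parent_a[:cut] + parent_b[cut:], cut


def point_mutation(bits, rng):
    """Flip the bit at one randomly chosen position in 0 .. N-1."""
    point = rng.randrange(len(bits))
    mutated = list(bits)
    mutated[point] = 1 - mutated[point]
    return mutated, point


rng = random.Random(42)
g1 = [1, 1, 1, 1, 1, 1]
g2 = [0, 0, 0, 0, 0, 0]
child, cut = single_point_crossover(g1, g2, rng)
mutant, point = point_mutation(child, rng)
```

The crossover recombines material that already exists in the population, while the mutation introduces a value that neither parent necessarily carried at that position, which is what helps against premature convergence.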

B. SFS DIFFUSION PROCESS
To create new particles based on the diffusion procedure of SFS, the Gaussian distribution method is employed for the random walk in the DLA growth process. A list of walks generated in the diffusion process around the best solution $\vec{G}_{\alpha}$ can be calculated as in Eq. 13.
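A simplified version of such a diffusion step is sketched below. It is written in the spirit of the Gaussian walks of SFS [23] and is not a transcription of Eq. 13: the per-dimension standard deviation |log(g)/g · (P − best)|, the greedy selection, and all names are assumptions made for illustration.

```python
import math
import random


def gaussian_walks(best, point, generation, n_walks, rng):
    """Generate candidate positions around `best` by Gaussian random walks.

    Assumption: sigma = |log(g) / g * (point - best)| per dimension, so the
    step size shrinks as the generation counter g grows.
    """
    scale = math.log(generation) / generation
    walks = []
    for _ in range(n_walks):
        walk = [rng.gauss(b, abs(scale * (p - b)))
                for b, p in zip(best, point)]
        walks.append(walk)
    return walks


def diffuse(best, point, generation, n_walks, fitness, rng):
    """Keep the better of the current leader and its diffused candidates."""
    candidates = [best] + gaussian_walks(best, point, generation, n_walks, rng)
    return min(candidates, key=fitness)


def sphere(position):
    """Toy fitness to minimize (not the paper's fitness function)."""
    return sum(x * x for x in position)


rng = random.Random(7)
best = [0.5, 0.5]
other = [1.0, 0.0]
walks = gaussian_walks(best, other, generation=2, n_walks=5, rng=rng)
improved = diffuse(best, other, 2, 5, sphere, rng)
```

Because the current leader is kept among the candidates, the diffusion step in this sketch can never return a solution worse than the one it started from.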

C. BINARY OPTIMIZER
The problem of feature selection is special because the search space is limited to the two binary values 0 and 1. Hence, the traditional continuous version of an optimizer should be modified to work properly for this problem. Here, a technique is presented to convert the continuous values of the proposed optimizer (MbGWO-SFS) to binary values, so that it can be used for the feature selection problem. To convert the continuous values to binary values, Eq. 14 is applied, as shown in the proposed Algorithm 2.

A threshold of 0.5 is used to decide whether the value of each dimension will be zero or one.

Algorithm 2 MbGWO-SFS algorithm (excerpt)
    ...
    Apply Mutation Process from Eq. 12 to get updated positions
  end for
  for (i = 1 : i < n + 1) do
    Apply Diffusion Process from Eq. 13 to get ...
    ...
  Convert updated solution to binary using Eq. 14
  Calculate the fitness function $F_n$ for each $\vec{G}_i$
  ...
  Set t = t + 1 (increase counter)
end while
return $\vec{G}_{\alpha}$
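The conversion step can be sketched as a sigmoid followed by the 0.5 threshold. The plain logistic function is assumed here; the paper's Eq. 14 may scale or shift its input, and mapping a value of exactly 0.5 to 1 is also an assumption. The names are hypothetical.

```python
import math


def sigmoid(x):
    """Plain logistic function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_binary(position, threshold=0.5):
    """Map each continuous dimension to 1 (feature kept) or 0 (dropped)."""
    return [1 if sigmoid(x) >= threshold else 0 for x in position]


mask = to_binary([2.3, -1.7, 0.0, -0.2, 4.1])
```

The resulting vector plays the role of the feature mask: position i is 1 when feature i is passed to the classifier and 0 when it is left out.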

D. FITNESS FUNCTION
The fitness function is used to measure the quality of the optimizer solutions. It depends on two factors: the number of selected features and the classification error rate. A solution is considered good if it selects a subset of features that gives a lower classification error rate with a lower number of selected features. To evaluate the quality of each solution, the fitness function $F_n$ is computed from E(D), s, and f, where E(D) is the error rate of the classifier, s is the number of selected features, and f is the total number of features.
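A common way to combine the two factors is a weighted sum, sketched below. The form F_n = h1·E(D) + h2·s/f is an assumption; the default weights 0.99 and 0.01 are the values of h1 and h2 reported in the experimental configuration, and the function name is hypothetical.

```python
def fitness(error_rate, mask, h1=0.99, h2=0.01):
    """Weighted sum of classification error and selected-feature ratio.

    Assumed form: F_n = h1 * E(D) + h2 * s / f, with lower values better.
    """
    selected = sum(mask)    # s: number of selected features
    total = len(mask)       # f: total number of features
    return h1 * error_rate + h2 * selected / total


# Same classifier error, different subset sizes (toy values).
small_subset = fitness(0.10, [1, 0, 0, 1, 0, 0, 0, 0])
large_subset = fitness(0.10, [1, 1, 1, 1, 1, 1, 0, 0])
```

With these weights the error term dominates, so the subset size mainly acts as a tie-breaker between solutions with similar classification error.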

V. EVALUATION METRICS
The following metrics are used to evaluate the effectiveness of the proposed MbGWO-SFS algorithm. Assume that M is the number of runs of an optimizer for the feature selection problem, $g^*_j$ is the best solution at run number j, and N is the number of tested points.
• Average Error shows the accuracy of the classifier given the selected feature set. It is calculated over the M runs and the N tested points from Match($C_i$, $L_i$), where $C_i$ is the label of the classifier output for point i, $L_i$ is the label of the class for point i, and Match calculates the matching between its two inputs.
• Average Select Size is the average size of the selected feature subsets relative to the total number of features in the dataset (D). It is calculated over the M runs from size($g^*_j$), where size($g^*_j$) is the size of the vector $g^*_j$.
• Mean is the average of the solutions obtained from running an optimizer M times.
• Best Fitness is the minimum fitness value obtained by an optimizer over the M runs.
• Worst Fitness is the worst solution found by an optimizer over the M runs.
• Standard Deviation (SD) is the variation of the best solutions obtained by running an optimizer M times. SD is an important indicator of the stability and robustness of an optimizer: a smaller SD indicates that the optimizer tends to converge to the same solution. SD is calculated from the deviations of the best solutions from Mean, the average defined in Equation 18.
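The metrics above can be computed from the per-run results as in the following sketch. Two choices are assumptions rather than statements of the paper's equations: the error is taken as 1 minus the mean matching rate, and SD uses the M − 1 normalization. The function names and the toy values are hypothetical.

```python
import math


def summarize_runs(best_fitness_per_run):
    """Mean, best, worst, and SD of the best fitness g*_j over M runs."""
    m = len(best_fitness_per_run)
    mean = sum(best_fitness_per_run) / m
    variance = sum((g - mean) ** 2 for g in best_fitness_per_run) / (m - 1)
    return {
        "mean": mean,
        "best": min(best_fitness_per_run),   # fitness is minimized
        "worst": max(best_fitness_per_run),
        "sd": math.sqrt(variance),           # assumes the M - 1 normalization
    }


def average_error(predictions_per_run, labels):
    """1 minus the mean fraction of test points whose label is matched."""
    accuracies = [
        sum(c == y for c, y in zip(preds, labels)) / len(labels)
        for preds in predictions_per_run
    ]
    return 1 - sum(accuracies) / len(accuracies)


def average_select_size(masks_per_run):
    """Mean ratio of selected features to total features over the runs."""
    return sum(sum(m) / len(m) for m in masks_per_run) / len(masks_per_run)


stats = summarize_runs([0.20, 0.25, 0.30])
err = average_error([[1, 0, 1, 1], [1, 1, 1, 1]], [1, 0, 1, 0])
size = average_select_size([[1, 0, 0, 1], [1, 1, 0, 1]])
```

A small "sd" together with a low "mean" is the pattern the text associates with a stable and robust optimizer.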

VI. EXPERIMENTAL RESULTS AND DISCUSSION
To evaluate the quality and effectiveness of the proposed MbGWO-SFS algorithm, nineteen datasets from the UCI machine learning repository are tested. The datasets are selected with various numbers of attributes, instances, and classes to represent different kinds of problems on which the proposed algorithm can be tested; two of the datasets have more than 500 attributes. Table 2 shows the description of the UCI datasets that are used in the experiments. Each dataset is randomly divided into three equal-size parts for training, validation, and testing. The training part is used to train the KNN classifier during the learning phase. The validation part is used when calculating the fitness function for a specific solution, and the testing part is used to evaluate the efficiency of the proposed model. Table 3 shows the configuration of the proposed algorithm in the experiments. Each optimizer is run 20 times for 80 iterations, and the number of search agents is set to 10. For the KNN classifier, the number of neighbors k is 5 and the value of the k-fold cross-validation is set to 10. The parameters $h_1$ and $h_2$ in the fitness function are set to 0.99 and 0.01, respectively. Table 4 shows the configuration of the compared algorithms in the experiments. The proposed MbGWO-SFS algorithm is compared in the experiments to different optimization algorithms with single and combined mechanisms. The single mechanisms are the binary versions of GWO [1] (bGWO), SFS [23] (bSFS), PSO [24] (bPSO), SBO [25] (bSBO), WOA [26] (bWOA), MVO [27] (bMVO), FA [28] (bFA), and GA [29] (bGA), where b indicates the binary output of the algorithm. The binary version uses the Sigmoid function, with x representing the algorithm output.
The combined mechanisms, namely a hybrid of PSO and GWO (bGWO-PSO) [21], a hybrid of GA and GWO (bGWO-GA), and the MbGWO algorithm without the diffusion process of the SFS algorithm, are also applied to the tested datasets; these three mechanisms are introduced to clarify the effectiveness of the proposed algorithm. Seven different experiments are conducted to evaluate the performance of the proposed MbGWO-SFS optimizer. The performance metrics of average error, average select size, average fitness (Mean), best fitness, worst fitness, standard deviation of the fitness, and processing time are evaluated for the different optimization techniques during the experiments.
The results of the average error, the average select size, and the average fitness (Mean) for the optimization techniques are shown in Table 5. The lower error indicates that the optimizer has selected the proper set of features that can train the classifier and produce a lower error on the hidden test data. Note that, the lowest error is achieved by the proposed (MbGWO-SFS) algorithm for the Hepatitis, Ionosphere,  bGA, and bWOA showed lower error for Blood, Lymphography, and Tic-Tac-Toe datasets. The proposed algorithm uses the crossover operator to move toward the optimal solution, which contains the optimal subset of features, that minimizes the error.
The average number of selected features in Table 5 shows the effectiveness of the proposed algorithm. Although choosing a lower number of features indicates that the optimizer performs feature selection, maintaining a lower error is important. Thus, the fitness function assigns a higher weight to the classification error while still encouraging the optimizer to choose a lower number of features. The MbGWO-SFS algorithm finds the smallest number of features for most of the datasets and achieves a lower classification error for them. However, MbGWO-SFS chooses a higher number of features for the Seeds and Lymphography datasets, and it maintains the smallest error for these datasets. The bGWO and bGWO-PSO algorithms show better results for the Seeds and Lymphography datasets. Table 5 also shows that the proposed algorithm finds the lowest fitness value for all datasets except the Vertebral, Parkinsons, Blood, and Tic-Tac-Toe datasets, for which bGWO-PSO, bGWO-GA, and bWOA achieve better values. This means that MbGWO-SFS can select the optimal subset of features that gives the lowest classification error. The reason for this high performance is the cooperative nature of the individuals of the GWO, which uses the proposed modification of the $\vec{a}$ parameter and the mutation operator to thoroughly explore the search space for different solutions. Moreover, the proposed crossover and the diffusion procedure of the SFS algorithm enhance the exploitation process.
The results of the best fitness, the worst fitness, and the standard deviation of the fitness for the different optimization techniques are shown in Table 6. From the table, the proposed MbGWO-SFS algorithm finds the best fitness compared to the other optimization techniques throughout the runs. However, the bGWO-GA, MbGWO, and bGWO-PSO algorithms achieved better results for the Blood, Lymphography, and Titanic datasets. On the other hand, MbGWO-SFS does not produce the worst fitness in any of the tested datasets, even in the higher-dimensional datasets HAR Using Smartphones and ISOLET, which demonstrates the capability of the proposed algorithm to find the optimal subset of features compared to other techniques. Table 6 also outlines the standard deviation of the statistical results. The proposed MbGWO-SFS algorithm has the lowest standard deviation compared to the other algorithms in most of the datasets, which indicates its stability and robustness. For the Seeds, Breast-Cancer, Ring, Waveform, and Mofn datasets, a better standard deviation is obtained by other optimization techniques, including the bMVO, bGWO-GA, and bGWO-PSO algorithms.
The last experiment investigates the processing time that is required by the different optimization techniques, as shown in Table 7. As a preprocessing step for the proposed algorithm, the problem of class imbalance that may occur in some datasets is solved by applying the LSH-SMOTE [5] algorithm to improve the processing time. A lower processing time in most cases indicates that the optimizer finds the optimal subset of features in less time. The proposed optimizer has competitive results compared to other algorithms for the higher-dimensional datasets HAR Using Smartphones and ISOLET. The bPSO and bGA algorithms achieved a better processing time for the Blood and Towonorm datasets. The faster convergence shown in Fig. 7 indicates the high exploitation capability of the proposed optimizer and its ability to avoid local optima. This supports the robustness and reliability of the MbGWO-SFS algorithm in finding the optimal subset of features in a reasonable amount of time.
Figure 8 outlines the average error, the average select size, the mean fitness, the best fitness, the worst fitness, and the standard deviation of the fitness, averaged over all nineteen datasets, for the different optimization techniques. This figure shows the stability of the proposed algorithm compared to the other algorithms. Figure 9 shows the processing time on the test data, averaged over all the datasets, using the features selected by the different optimization techniques. Note from these figures that the proposed MbGWO-SFS algorithm performs better than most of the other optimization techniques.
To summarize the results of the seven experiments, the proposed MbGWO-SFS algorithm outperforms the other optimization techniques in most datasets. Averaged over all datasets, the proposed algorithm achieved a standard deviation of 0.0685, an average error of 0.3831, an average select size of 0.4356, a best fitness of 0.8052, a mean fitness of 0.6918, a worst fitness of 0.9621, and an average processing time of 111.3980. This is due to the high exploration and exploitation capabilities of MbGWO-SFS, which allow it to find the best subset of features. This confirms its robustness and reliability in classification tasks for various datasets in finding the optimal subset of features.

A. WILCOXON'S RANK-SUM
Wilcoxon's rank-sum test is performed here to get the p-values of the proposed MbGWO-SFS algorithm in comparison to the other meta-heuristic algorithms. This test helps to determine whether the results of the proposed algorithm and those of the other algorithms differ significantly. A p-value < 0.05 means that the results of the proposed algorithm are significantly different from those of the compared algorithm. Otherwise, a p-value > 0.05 means that the results have no significant difference. Table 8 shows the resulting p-values, where the worst values, those greater than 0.05, are underlined. Note from the table that the p-values obtained between the proposed algorithm and the other algorithms using this test are smaller than 0.05. This shows the superiority of the MbGWO-SFS algorithm and that its improvement is statistically significant.
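The test can be sketched with the standard library alone. The version below uses the normal approximation of the rank-sum statistic with average ranks for ties and no tie correction; the paper does not state which variant or software was used, so this is an assumption, and the fitness values are toy numbers rather than results from the experiments.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Ties receive the average of the ranks they span; no tie correction is
    applied to the variance.
    """
    pooled = sorted((value, group)
                    for group, sample in enumerate((x, y))
                    for value in sample)
    ranks_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        ranks_x += average_rank * sum(1 for k in range(i, j)
                                      if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    spread = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (ranks_x - expected) / spread
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Toy per-run fitness values for two optimizers (lower is better).
proposed = [0.19, 0.20, 0.21, 0.22, 0.23]
baseline = [0.30, 0.31, 0.33, 0.35, 0.36]
z, p_value = rank_sum_test(proposed, baseline)
significant = p_value < 0.05
```

With only a handful of runs per optimizer the normal approximation is coarse, so an exact test may be preferable for small M.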

VII. CONCLUSION AND FUTURE DIRECTIONS
This paper proposed a modified binary GWO algorithm based on a stochastic fractal search technique (MbGWO-SFS) that is used with the KNN classifier to select the optimal subset of features for different problems by balancing exploration and exploitation. The modified GWO was developed first by applying an exponential form of the parameter $\vec{a}$ of the original GWO to increase the search space for exploration, together with crossover/mutation operations to increase the diversity of the population. The diffusion process of the SFS technique was then applied to the best solution of the modified GWO, using the Gaussian distribution method for the random walk. Finally, the continuous values of the proposed algorithm were converted into binary ones by a Sigmoid function so that the algorithm can be used for the problem of feature selection. The stability and robustness of the proposed MbGWO-SFS algorithm were investigated in the experiments using nineteen datasets from the UCI machine learning repository. The results were compared to the optimization techniques of MbGWO, bGWO, bSFS, bPSO, the hybrid of PSO and GWO (bGWO-PSO), bGA, the hybrid of GA and GWO (bGWO-GA), bSBO, bWOA, bMVO, and bFA. The results showed the superiority of the proposed MbGWO-SFS algorithm. In future work, the proposed algorithm will be tested on continuous problems, constrained engineering problems, and other binary problems such as EEG problems, as well as binary problems with more than 1000 attributes. The authors will try to improve the continuous MbGWO-SFS and validate the performance of the proposed algorithm on CEC2017 or CEC2019.