Improved Reptile Search Optimization Algorithm Using Chaotic Map and Simulated Annealing for Feature Selection in Medical Field

The increasing volume of medical datasets has produced high-dimensional features, which negatively affect machine learning (ML) classifiers. In ML, feature selection is fundamental for selecting the most relevant features and removing redundant and irrelevant ones. Optimization algorithms have demonstrated their capability to solve feature selection problems. The Reptile Search Algorithm (RSA) is a new nature-inspired optimization algorithm that simulates Crocodiles' encircling and hunting behavior. The unique search of the RSA obtains promising results compared to other optimization algorithms. However, when applied to high-dimensional feature selection problems, RSA suffers from population diversity and local optima limitations. An improved metaheuristic optimizer, namely the Improved Reptile Search Algorithm (IRSA), is proposed to overcome these limitations and adapt the RSA to the feature selection problem. Two main improvements are made to the standard RSA. The first is to apply chaos theory at the initialization phase of RSA to enhance its exploration capabilities in the search space. The second is to combine the Simulated Annealing (SA) algorithm with the exploitation search to avoid the local optima problem. The performance of IRSA was evaluated on 20 medical benchmark datasets from the UCI machine learning repository. IRSA was also compared with the standard RSA and state-of-the-art optimization algorithms, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Grasshopper Optimization Algorithm (GOA) and Slime Mould Optimization (SMO). The evaluation metrics include the number of selected features, classification accuracy, fitness value, Wilcoxon statistical test ($p$-value), and convergence curve. Based on the results obtained, IRSA outperformed the original RSA and the other optimization algorithms on the majority of the medical datasets.


I. INTRODUCTION
Disease detection and diagnosis critically depend on the classification of biomedical datasets. Classifying such datasets can help detect complex diseases such as COVID-19 and tumors. The early detection of such diseases increases the survival rate [1]. In biomedical sciences, diseases are classified based on various features [2]-[4]. Biomedical datasets are growing rapidly, resulting in high-dimensional features [5]. In some cases, these features are redundant, inefficient, or carry the same classification effect as others [6]. A robust ML classifier is required to reduce the complexity and the time taken to classify these features [7]. ML classifiers suffer from redundant, inefficient, and biased features [8]. Thus, feature selection (FS) is an important component of the ML process [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Li He.
Feature selection (FS) plays an important role in ML as a pre-processing phase, pruning the redundant and irrelevant features and selecting the most relevant ones. This is accomplished by excluding the features that may negatively impact classifier performance, such as unrelated, redundant, and less-informative features [10]. FS has been applied widely in many applications, including image segmentation [11], image processing [12], medical diagnosis [13], cancer detection [14], text recognition [15] and more. Based on the literature, the FS technique has four basic steps: (1) creating the feature subset, (2) evaluating the feature subset, (3) defining the stop condition, and (4) validating the selected subset [16]. According to the evaluation criteria, FS techniques are divided into two main approaches: the Filter-Based Approach (FBA) and the Wrapper-Based Approach (WBA).
The FBA filters feature subsets based on static evaluation tests. The filtering of the feature subsets is independent of the ML classifier [17], [18]. Pearson's correlation, the Chi-squared test, and Linear Discriminant Analysis (LDA) are examples of FBA, where filtering is performed before the ML classifier is applied, with no direct contact with the classifier [19]. In contrast, the WBA is connected directly to the classifier [20]. The WBA evaluates subsets of features to find possible correlations between the features based on the applied ML classifier [5]. The WBA is computationally expensive, but it yields better results than the FBA [21], [22].
Commonly, WBA is used for FS problems because it considers both the classification performance and the feature reduction conditions, and because it interacts directly with the classifier. Furthermore, WBA minimizes the search area; as a result, the classification performance improves and the number of selected features declines, as illustrated in [23]. In WBA, the fitness function evaluates the FS process based on the classification accuracy [24]. Based on the literature, the WBA is commonly categorized into three main groups: Forward Feature Selection (FFS), Backward Feature Elimination (BFE), and Recursive Feature Elimination (RFE) [25]. FFS is an iterative process in which the model starts with no features; in each iteration, new features are added until adding features no longer improves the model performance. BFE starts with all features and eliminates the least significant feature in each iteration; as a result, the model performance improves. Finally, RFE is a greedy optimization algorithm that repeatedly builds models and sets aside the best or the worst performing feature at each iteration. It then builds a new model with the remaining features until all the features are consumed. After that, the features are ranked based on the order of their elimination. Several researchers have used WBA methods in optimization algorithms to solve the feature selection problem [5], [9], [24]. However, an exhaustive search, which evaluates all possible combinations of features from the total set of features, is time-consuming and is known to be an NP-hard problem [26]. These reasons, along with the powerful WBA characteristics, motivated this study to use WBAs for feature selection problems.
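The FFS procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in this study: the function name `forward_feature_selection`, the `score` callback, and the toy scorer are hypothetical, and in a real wrapper the score would come from a classifier such as KNN.

```python
from typing import Callable, List, Sequence


def forward_feature_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
) -> List[int]:
    """Greedy FFS: add the best single feature until the score stops improving."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Score every one-feature extension of the current subset.
        trial_scores = {f: score(selected + [f]) for f in candidates}
        best_feature = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_feature] <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(best_feature)
        best_score = trial_scores[best_feature]
    return selected


def toy_score(subset: Sequence[int]) -> float:
    """Hypothetical scorer: features 0 and 2 help, every feature has a small cost."""
    useful = {0: 0.5, 2: 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


print(forward_feature_selection(4, toy_score))
```

Each pass evaluates every remaining feature once, so the search costs far fewer evaluations than the exhaustive enumeration of all subsets, at the price of possibly missing feature combinations that only help together.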
Based on the literature, optimization algorithms have been used in wrapper mode to solve the FS problem. For example, the Chimp Optimization Algorithm (COA) was improved in wrapper mode for feature selection [5], the Dragonfly Algorithm (DA) with Evolutionary Population Dynamics and Adaptive Crossover was developed in wrapper mode for feature selection [27], the Butterfly Optimization Algorithm (BOA) was developed in wrapper mode for feature selection [28], Particle Swarm Optimization was improved in wrapper mode for feature selection [29], and the Whale Optimization Algorithm (WOA) was combined with simulated annealing in wrapper mode for feature selection [30]. The main purpose of using optimization algorithms in FS is to find the optimal combination of features, or one close to optimal, within a reasonable time. The wrapper mode evaluates the classification accuracy based on the classifier [20]; in this work, a KNN classifier is used.
However, optimization algorithms suffer from local optima and population diversity problems when dealing with high-dimensional problems, such as the FS problem [10], [30]-[32]. Additionally, according to the "No Free Lunch" (NFL) theorems, an algorithm may achieve high performance on one problem and low performance on another [33]-[35]. Therefore, designing new optimization algorithms and improving existing ones is of great interest to researchers in this field. The Reptile Search Algorithm (RSA) is one of the newest optimization algorithms [36]. RSA is a wildlife-inspired metaheuristic algorithm that mimics Crocodiles' encircling and hunting behavior. RSA's unique search strategies demonstrated superior results over other optimization algorithms. However, RSA is limited by the problems of population diversity and local optima when applied to high-dimensional feature selection. These reasons, together with the characteristics of RSA, motivated the researchers of this study to improve RSA in wrapper mode for feature selection problems.
This research proposes a novel algorithm named the Improved Reptile Search Algorithm (IRSA). The goal of IRSA is to improve classification performance for feature selection problems in medical datasets and to overcome the limitations of the standard RSA algorithm. To address the weaknesses of the standard RSA and adapt it to the FS problem, the following improvements are introduced. In the initialization phase of IRSA, a chaotic map is used to initialize the solutions (search agents). As a result, IRSA is expected to achieve a faster convergence rate and generate a wider range of solutions. Furthermore, to avoid local optima and improve the exploitation ability of RSA, IRSA combines the SA algorithm with the local search capabilities of the RSA. A number of hybrid optimization algorithms have been presented in the literature to solve feature selection problems. However, to the best of the authors' knowledge, there is no previously published work on improving RSA with a chaotic map and the SA algorithm for feature selection problems. The contributions of this work are summarized as follows:
1) IRSA: a modified variant of the RSA algorithm intended to address its weaknesses and provide better performance in feature selection.
2) The standard RSA has been improved in two main ways:
• Chaotic maps are used in the initialization phase of RSA to improve the diversity of its solutions.
• To improve exploitation and avoid local optima, simulated annealing (SA) is combined with RSA.
3) The IRSA algorithm is developed in wrapper mode for feature selection problems.
4) To evaluate the performance of the IRSA algorithm, experiments are conducted on 20 UCI medical datasets with various dimensionalities. In addition, IRSA results are compared with the original RSA and four well-known optimization algorithms: Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Grasshopper Optimization Algorithm (GOA) and Slime Mould Optimization (SMO). The number of features, classification accuracy, fitness values, p-value, and convergence rate are used as evaluation metrics.
The rest of the article is organized as follows: Section 2 presents a review of related works. Section 3 provides a brief description of the RSA, Chaotic Maps (CM), and Simulated Annealing (SA). The proposed IRSA algorithm is illustrated in Section 4. Section 5 describes the datasets used and experimental details, and Section 6 illustrates the experimental results and discussion. Finally, Section 7 concludes the article.

II. RELATED WORK
A meta-heuristic algorithm is a higher-level sequence of programmable instructions that performs a specific task and provides a sufficiently good solution to an optimization problem within a reasonable time [37]. The meta-heuristic optimization algorithms contain two main phases: (1) exploration (global search) and (2) exploitation (local search). Exploration is the ability to search for solutions in the search space globally. Its ability is associated with escaping and preventing being trapped in local optima. The exploitation is the ability to search locally for a more optimal solution. Good performance is obtained by achieving an optimal balance between these two phases. All population-based algorithms use these features but with different operators and structures [38]. Metaheuristics are categorized into three main classes: swarm intelligence optimization algorithm, evolutionary optimization algorithm, and physics-based optimization algorithm. The RSA is a new swarm intelligence optimization algorithm. The Swarm Intelligence Optimization algorithm (SIO) is a meta-heuristic algorithm that mimics animals' social behavior in groups (e.g., Crocodiles, Whales, Wolves, etc.).
The main feature of SIO is the ability to share information from multiple sources during the optimization process [39]. The most popular algorithm that belongs to this class is the PSO algorithm, which was developed by Eberhart and Kennedy [40]. PSO simulates the behavior of birds flying together in flocks. Other examples of this type include the Whale Optimization Algorithm (WOA) [41], Grey Wolf Optimizer [42], Harris Hawks Optimization (HHO) [43], Salp Swarm Algorithm [44] and others.
Recently, optimization algorithms (OAs) have been applied in various applications to solve high-dimensional feature selection problems. OAs achieved significant improvements in classification accuracy and reduced the number of selected features in these applications. Examples of recent applications include WOA developed in wrapper mode for the feature selection problem [45]; WOA improved for feature selection in Arabic sentiment analysis [15]; the Butterfly Optimization Algorithm (BOA) developed in binary mode for feature selection [46]; the Salp Swarm Algorithm (SSA) developed based on opposition and a new local search mechanism for feature selection [23]; Antlion Optimization (ALO) similarly developed in wrapper mode for feature selection [47]; PSO hybridized with a spiral-shaped algorithm for feature selection [29]; GOA improved using opposition-based learning for feature selection [48]; the Equilibrium Optimization Algorithm (EOA) improved using an Elite Opposition-Based Learning method and a new local search strategy for feature selection [20]; and many more. Although each optimization algorithm has its own unique structure, they share some common characteristics: the algorithm initializes a random population of search agents (solutions) as the primary step and records the best solution so far; then, in each iteration, the new solutions are evaluated based on the defined fitness function; finally, the best solution is chosen once a termination criterion is met [49]. All optimization algorithms perform exploration and exploitation phases. An imbalanced trade-off between exploration and exploitation slows the convergence towards the optimal solution [50]. The original RSA may still not achieve an optimal balance between local and global search, especially when applied to feature selection in high-dimensional datasets. This imbalanced behavior causes slow convergence and makes the algorithm fall quickly into local optima.
Thus, two main improvements need to be applied in RSA. The first improvement is to enhance the population diversity of the algorithm by applying a Chaotic map to the initial solution. The second improvement is improving the local search by combining SA with the local search strategy in RSA.
The Chaotic Map (CM) is a dynamic system [51]. It is one of the modern methods used in the literature to address the population diversity problem and the low convergence speed of optimization algorithms. It is a useful method for searching for global optimum solutions in a search space [52]. The Chaos Optimization Algorithm (COA) exploits the benefits of chaotic structures in several applications, as reported in [53]. It has been shown that replacing random parameter values with a chaotic system can enhance classification [54]. Therefore, several optimization algorithms have incorporated chaos theory to improve performance and adjust specific parameters. Examples of these implementations include Harris Hawks Optimization (HHO) [55], where a chaotic map was applied to improve the initial solution of HHO; Chaotic Crow Search Optimization (CCSA) [52], where a chaotic map was applied to improve the convergence speed and prevent the local optima problem; the Chaotic Grasshopper Optimization Algorithm (CGOA), which accelerates the global convergence speed of the GOA algorithm [56]; the Chaotic Whale Optimization Algorithm (CWOA), which uses chaotic maps to improve the global convergence rate and enhance the performance of the WOA algorithm [51]; the Chaotic Salp Swarm Algorithm (CSSA), which used a chaotic map to address the local optima problem and low convergence; and Chaotic Gray Wolf Optimization (CGWO), where a chaotic system was applied to accelerate the global convergence rate [54]. All of these algorithms embed chaotic maps to improve global optimization and have been used in different fields and applications. The reported results verified noticeable improvements after integrating chaotic maps into these algorithms.
These results encouraged our research to explore the effect of combining chaotic maps with RSA to improve population diversity. In this work, Circle chaotic map values replace the randomly generated values used to initialize the Reptile positions at the initialization phase. It is worth mentioning that different types of chaotic maps have been applied to optimization algorithms [55]. Examples of these maps are Singer, Sinusoidal, Chebyshev, Circle, Tent, Sine, Piecewise, Logistic, Iterative, and Gauss/mouse. These maps, with their statistical equations, are used in several applications. They significantly increase the convergence rate and the fitness performance of the algorithms, as reported in several studies [57]-[60]. However, the Circle map outperforms other chaotic maps in several studies [61], [62]. In addition, the Circle map provided high stability with high classification performance and a small number of features [57], [63], [65]. Therefore, we used the Circle chaotic map to improve the diversity of solutions at the initialization phase of RSA.
In contrast to exploration, the exploitation phase intends to enhance the search within local regions rather than across the whole feature space. Usually, exploitation is performed after the exploration phase [66]. In most complex applications, optimization algorithms become trapped in local optima due to an incorrect balance between exploitation and exploration and the random nature of the initialization process. Based on the literature, many optimization algorithms use the Simulated Annealing (SA) algorithm to enhance the local search strategy. In our work, SA is proposed to solve the RSA local optima problem, specifically for high-dimensional FS. SA was presented in 1983 by Kirkpatrick et al. [67]. It is considered a hill-climbing method that enhances the candidate solution for the objective function. SA has been used to improve the exploitative capability of algorithms and to prevent local optima problems. Examples of these implementations include the hybridization of PSO with SA for feature selection [68]; the hybridization of SA with Moth-Flame Optimization to improve its exploitation capability [69]; the hybridization of the Whale Optimization Algorithm with SA to improve WOA exploitation for feature selection [70]; the hybridization of the Salp Swarm Algorithm (SSA) with SA to adjust the balance between exploration and exploitation of the SSA algorithm [71]; and, finally, Monarch Butterfly Optimization (MBO) with an SA strategy to improve the convergence speed of the MBO algorithm. The unique structure and performance obtained by employing SA in these previous studies inspired this research to include the SA algorithm in the iteration process to enhance the RSA local search.
The Reptile Search Algorithm (RSA) is a new nature-inspired meta-heuristic optimizer [36]. This algorithm is inspired by Crocodiles' encircling and hunting behaviours in the wild.
The key difference between the RSA algorithm and other optimization algorithms is that RSA updates the search-agent locations using four new methods. For instance, encircling is conducted by high walking or belly walking, and the Crocodiles coordinate or cooperate to perform hunting. RSA attempts to provide powerful search methods that can produce better-quality results and new solutions to complex real-life problems. As reported by its authors, RSA solves Artificial Landscape Functions (ALF) and real-world engineering problems more successfully than other popular optimization algorithms. The ALF are benchmark mathematical functions used to evaluate the performance of optimization algorithms. However, although RSA is a stochastic population-based optimization algorithm, it is prone to issues such as population diversity and local optima when dealing with high-dimensional features. These reasons and the characteristics of RSA motivated this study to improve the performance of the RSA and adapt it to the feature selection problem. The following section provides an overview of and background on the RSA algorithm.

III. BASICS AND BACKGROUND
RSA is a novel optimization algorithm developed by Abualigah et al. [36], which mimics Crocodiles' encircling and hunting behaviour. Crocodiles are semi-aquatic reptiles with unique physical characteristics, such as a streamlined body shape and the ability to raise their legs to the side when they walk, to belly walk, and to swim. These characteristics make them powerful hunters in the wild. This section describes the exploration and exploitation capabilities of the RSA, which are based on the smart encircling and hunting of prey. Furthermore, the mathematical functions and pseudo-code of the algorithm are covered. The RSA is a population-based, gradient-free method that can solve simple and complex optimization problems subject to specific constraints.
VOLUME 10, 2022

1) INITIALIZATION PHASE
In this phase, the initial candidate solutions are generated based on chaotic maps as in Eq. (1). The search space and the objective function are also defined, and all parameter values are set before computation.
where $X$ represents the set of candidate solutions produced using Eq. (2), $x_{i,j}$ indicates the jth position of the ith solution, $N$ is the number of candidate solutions, and $n$ is the dimension of the problem.
where rand is an initiation value, and UB and LB specify the upper and lower bounds of the given problem, respectively.

2) EXPLORATION PHASE (ENCIRCLING)
In this phase, the exploratory behaviour (encircling) of RSA is discussed. Crocodiles perform two strategies during their encircling process: high walking and belly walking. These movements correspond to different approaches that represent the algorithm's exploration capabilities (global search). Because these movements (high walk and belly walk) are noisy, they prevent the Crocodiles from catching the prey unless another search mechanism is employed. Hence, the exploration search covers a wide search space and may find a promising area only after several searches. The RSA balances exploration (encircling) and exploitation (hunting) according to four conditions, which break the total number of iterations into four parts. Exploration mechanisms in RSA concentrate on two major search strategies (high walking and belly walking) to explore the search space and find a better solution. The high walk strategy applies when $t \le T/4$, and the belly walk strategy applies when $t \le 2T/4$ and $t > T/4$. This means that the high walk is used for about half of the exploration iterations and the belly walk for the other half. The position updating formula for the exploration phase is shown in Eq. (3).
where $Best_j(t)$ is the jth position in the best solution obtained so far, rand is a random number between 0 and 1, $t$ is the current iteration number, and $T$ is the maximum number of iterations. $\eta_{(i,j)}$ is the hunting operator of the jth position in the ith solution, calculated by Eq. (4). $\beta$ is a sensitive parameter that controls the exploration accuracy of the encircling (i.e., high walking) over the iterations; it is inherited from the original RSA and set to 0.1. $R_{(i,j)}$ is a value used to reduce the search area, calculated by Eq. (5). $r_1$ is a random number in $[1, N]$, and $x_{(r_1,j)}$ refers to the jth position of the randomly selected solution $r_1$. The Evolutionary Sense $ES(t)$ is a probability ratio that takes randomly decreasing values between 2 and −2 throughout the iterations, calculated by Eq. (6).
where $r_2$ is a random number in $[1, N]$ and $\epsilon$ is a small value. In Eq. (6), the constant 2 is a correlation value used to give values between 2 and 0, and $r_3$ is a random integer between −1 and 1. $P_{(i,j)}$ corresponds to the difference between the jth position of the best-obtained solution and the jth position of the current solution, calculated by Eq. (7).
where $M(x_i)$ is the average position of the ith solution, calculated by Eq. (8). $UB_{(j)}$ and $LB_{(j)}$ are the upper and lower boundaries of the jth position, respectively. $\alpha$ is a sensitive parameter that controls the exploration accuracy of the hunting cooperation over the course of the iterations; it is set to 0.1 in this work.

3) EXPLOITATION PHASE (HUNTING)
In this phase, the exploitative behaviour (hunting) of RSA is introduced. Crocodiles perform two strategies during their hunting process: coordination and cooperation. These strategies simulate the exploitation search (local search), formulated as in Eq. (9). The strategy for hunting coordination in [...]
[Algorithm listing: pseudo-code of the RSA. Only fragments survive: the operators are calculated using Eqs. (4), (5) and (7), respectively, and the position update is then selected by the iteration conditions, beginning with $t \le T/4$.]
where $Best_j(t)$ is the jth position in the best solution found so far, and $\eta_{(i,j)}$ is the hunting operator for the jth position in the ith solution, calculated by Eq. (4). $P_{(i,j)}$ is the difference between the jth position of the best-found solution and the jth position of the current solution, calculated by Eq. (7). $R_{(i,j)}(t)$ is a value used to reduce the search area in the current iteration, calculated by Eq. (5).
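The division of the iterations into four parts can be illustrated with a short Python sketch. The first two conditions follow the exploration description in this section. The assignment of the third and fourth quarters to hunting coordination and hunting cooperation is an assumption made here for illustration, since the corresponding conditions are not reproduced in this text. The function name `rsa_phase` is hypothetical.

```python
def rsa_phase(t: int, T: int) -> str:
    """Return the search strategy used at iteration t (1-based) out of T."""
    if not 1 <= t <= T:
        raise ValueError("t must satisfy 1 <= t <= T")
    if t <= T / 4:
        return "high_walk"             # exploration, first quarter
    if t <= 2 * T / 4:
        return "belly_walk"            # exploration, second quarter
    if t <= 3 * T / 4:
        return "hunting_coordination"  # exploitation (assumed third quarter)
    return "hunting_cooperation"       # exploitation (assumed fourth quarter)


# Count how many of 100 iterations fall into each strategy.
counts = {}
for t in range(1, 101):
    phase = rsa_phase(t, 100)
    counts[phase] = counts.get(phase, 0) + 1
print(counts)
```

With this schedule the first half of the run is spent on global search and the second half on local search, which is why a poor initial population or a weak local search directly affects the final solution.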

B. CIRCLE CHAOTIC MAP
Chaos theory is commonly used in optimization algorithms to improve the diversity of the initialized solutions. The population represents possible solutions, parts of a solution, or some structure that can be easily transformed into a solution. As noted in the literature review, RSA is a population-based algorithm, meaning that it starts solving the problem by initializing random solutions and then evaluates these solutions based on the defined criteria. To initialize the solutions, search agents are used (in this work, the search agents are the reptiles). In the original RSA, the search agents start at random positions and generate random solutions. These random solutions determine the population diversity, and purely random initialization can lead to a population diversity problem. In this work, we use the Circle chaotic map function to set the locations of the search agents. Improving the initialized solutions using a chaotic map increases the performance of algorithms. Moreover, chaos theory can explore the search space more thoroughly than random search [72]. To make the initial population as effective as possible, it is important to cover as much of the solution space as possible. This work applies the Circle Map (CM) from chaos theory to initialize the IRSA and improve population diversity. The Circle map is a one-dimensional function derived from the circle itself. Mathematically, it is equivalent to a point on the circle, taken as a starting point x and calculated modulo 2π, which identifies the angle of the point on the circle [73]. Two numbers are congruent modulo a given number if they leave the same remainder when divided by that number. When the modulo is taken with a value other than 2π, the result still represents an angle but must be normalized to the whole range [0, 2π], as shown in [73]. In this implementation, the CM control variables are set to a = 0.5 and b = 0.2.
The mathematical model of the CM is computed as in Eq. (10).
where $n$ is the index within the chaotic sequence $x$, and $x_n$ is the nth chaotic number of the sequence. As defined earlier, $a$ and $b$ are the control variables that determine the chaotic behaviour. In the IRSA, the CM values replace the random initial position values of the Crocodiles (search agents).
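As a hedged illustration of chaotic initialization, the Python sketch below uses the Circle map in the form commonly found in the chaotic-metaheuristics literature, $x_{n+1} = \mathrm{mod}\left(x_n + b - \frac{a}{2\pi}\sin(2\pi x_n),\, 1\right)$, with a = 0.5 and b = 0.2 as stated above. That this form matches Eq. (10) is an assumption. The seed value `x0 = 0.7`, the scaling of chaotic values into the bounds, and the function names `circle_map_sequence` and `chaotic_init` are likewise illustrative choices, not taken from this work.

```python
import math
from typing import List


def circle_map_sequence(length: int, x0: float = 0.7,
                        a: float = 0.5, b: float = 0.2) -> List[float]:
    """Generate `length` Circle-map values in [0, 1) starting from x0."""
    values = []
    x = x0
    for _ in range(length):
        # Assumed standard Circle map, normalized to the unit interval.
        x = (x + b - (a / (2 * math.pi)) * math.sin(2 * math.pi * x)) % 1.0
        values.append(x)
    return values


def chaotic_init(n_agents: int, dim: int, lb: float, ub: float,
                 x0: float = 0.7) -> List[List[float]]:
    """Build an n_agents x dim population, using chaotic values instead of rand."""
    chaos = circle_map_sequence(n_agents * dim, x0)
    return [
        [lb + chaos[i * dim + j] * (ub - lb) for j in range(dim)]
        for i in range(n_agents)
    ]


population = chaotic_init(n_agents=10, dim=5, lb=0.0, ub=1.0)
print(len(population), len(population[0]))
```

Unlike a pseudo-random generator, the sequence is fully determined by `x0`, `a`, and `b`, so a run can be reproduced exactly by fixing those three values.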

C. SIMULATED ANNEALING
The Simulated Annealing (SA) algorithm has been used by several optimization algorithms to improve exploitative capability and to prevent local optima problems, as illustrated in the literature review. In this work, to avoid the local optima stagnation problem of the original RSA, SA is applied at the end of each RSA iteration to improve the best solution. A better solution is always accepted, whereas a worse solution is accepted with a well-defined probability to avoid local optima. The Boltzmann probability function determines the likelihood of accepting a worse solution, as in Eq. (12), where $e$ is the energy of the system and $T$ is a parameter (named temperature) that decreases periodically throughout the search process. The decreasing rate is $\alpha = 0.99$; thus, in the next iteration, $T = \alpha T$. The ratio of the probabilities of two states is known as the Boltzmann factor, which is computed from the fitness difference between the best solution ($Best_{Sol}$) and the generated solution ($Generated_{Sol}$). In this experiment, all SA parameters are based on the cooling schedule in [74] and adopted as in Yarpiz.com [75].
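The acceptance rule and cooling schedule can be sketched as follows. This is a generic SA loop for a minimization problem, not the implementation used in this work: the geometric cooling step (temperature multiplied by the rate 0.99), the initial temperature `t0`, the number of steps, and the function names are assumptions, and the toy objective stands in for the feature selection fitness.

```python
import math
import random
from typing import Callable, Tuple


def accept(candidate_fit: float, current_fit: float,
           temperature: float, rng: random.Random) -> bool:
    """Always accept improvements; accept worse moves with Boltzmann probability."""
    if candidate_fit <= current_fit:
        return True
    delta = candidate_fit - current_fit
    return rng.random() < math.exp(-delta / temperature)


def simulated_annealing(fitness: Callable[[float], float],
                        neighbor: Callable[[float, random.Random], float],
                        start: float, t0: float = 1.0, rate: float = 0.99,
                        steps: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Minimize `fitness` from `start`, returning the best point and its fitness."""
    rng = random.Random(seed)
    current, current_fit = start, fitness(start)
    best, best_fit = current, current_fit
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_fit = fitness(candidate)
        if accept(candidate_fit, current_fit, temperature, rng):
            current, current_fit = candidate, candidate_fit
            if current_fit < best_fit:
                best, best_fit = current, current_fit
        temperature *= rate  # assumed geometric cooling with rate 0.99
    return best, best_fit


# Toy usage: minimize x^2 starting from x = 3 with small random steps.
square = lambda x: x * x
step = lambda x, rng: x + rng.uniform(-0.5, 0.5)
best_x, best_f = simulated_annealing(square, step, start=3.0)
print(best_x, best_f)
```

Early in the run the temperature is high, so worse candidates are accepted often and the search can leave a local optimum; as the temperature falls, the rule approaches plain hill climbing.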

IV. THE PROPOSED IMPROVED REPTILE SEARCH ALGORITHM (IRSA)
In this study, a novel IRSA for feature selection is proposed. The proposed IRSA is a hybrid of the original RSA with chaos theory and the SA algorithm. The aim of this improvement is to increase the classification accuracy and decrease the number of selected features. The original RSA has two noteworthy drawbacks when used to solve high-dimensional problems, such as feature selection: the limited diversity of the initial solutions and the local optima problem. Therefore, two modifications to the RSA are suggested to adapt it to the feature selection problem. The first improvement integrates a chaotic map, specifically the Circle Map (CM), at the initialization phase to improve the diversity of RSA solutions. The second improvement combines the SA algorithm with the exploitation phase of the RSA to improve the local search. The details of these improvements are presented in this section as follows.
In the IRSA algorithm, the CM value will replace the stochastic values of initializing the RSA population positions at the initialization phase. The chaotic values are generated from the Circle chaotic map. This map notably increases the convergence speed and the fitness performance of the RSA, as will be presented later in the experimental result and discussion section.
Furthermore, the second improvement is to combine SA with the IRSA to enhance its exploitation capabilities. After the CM initialization and the search for the best solution, SA is used to improve the current best solution at the end of each RSA iteration. The pseudocode of the proposed IRSA algorithm is illustrated in Algorithm 1.

A. FITNESS FUNCTION
In this work, the proposed fitness function accounts for both the classification accuracy of each solution and the number of selected features. Each solution is evaluated according to a fitness function that depends on a K-Nearest Neighbor (KNN) classifier in wrapper mode (Altman, 1992). After the candidate solutions are initialized, the fitness value is calculated and the best solution so far is saved. Then, in each iteration, the fitness function is computed following the exploration and exploitation of the current best position. If the fitness value of the new position (solution) is better than that of the current position, the best solution is replaced by the improved solution, and a neighbourhood search is performed. This process is repeated until the stopping criterion is met. The proposed fitness function is given in Eq. (13):
$$\text{Fitness} = \alpha\,\gamma_R(D) + \beta\,\frac{|R|}{|N|} \quad (13)$$
where $\gamma_R(D)$ refers to the classification error rate of the KNN classifier, $|R|$ is the number of selected features, $|N|$ is the total number of features in the dataset, and $\alpha$ and $\beta$ are two parameters corresponding to the importance of classification quality and subset length, with $\alpha \in [0, 1]$ and $\beta = (1 - \alpha)$, as adopted in [76] and [70]. The pseudo-code of the proposed IRSA algorithm is explained in Algorithm 1. Additionally, the flowchart of the proposed IRSA is presented in Figure 1.
where, T is the number of iterations, N presents the number of solutions, and Dim refers to the solution size.
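To make the trade-off in the fitness function concrete, the sketch below assumes the weighted-sum form commonly used in wrapper feature selection, α · (error rate) + β · (selected features / total features) with β = 1 − α. The default α = 0.99 and the function name weighted_fitness are illustrative assumptions rather than values stated in this section.

```python
def weighted_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and feature-subset ratio (lower is better).

    alpha weights classification quality; beta = 1 - alpha weights subset length.
    alpha = 0.99 is an assumed default, not a value taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

With these assumed weights, a solution with a 10% error rate that keeps 5 of 20 features scores 0.99 × 0.10 + 0.01 × 0.25 = 0.1015, so accuracy dominates and the subset size mainly breaks ties between equally accurate solutions.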

V. EXPERIMENTAL RESULTS AND DISCUSSION
This section discusses the experimental details and presents the performance evaluation and validation criteria of the proposed IRSA. In this context, the IRSA algorithm was compared with well-known and recent optimization algorithms, including PSO, GA, GOA, and SMO. The experiments were conducted over 20 benchmark medical datasets from the UCI machine learning repository. The datasets and experiment details are presented in the following subsections.

A. DATASETS DETAILS
In this work, all the experiments were performed on 20 medical benchmark datasets from the UCI repository. The UCI repository is a popular machine learning repository that contains benchmark datasets, which have been used in several studies to evaluate optimization algorithms. The details of the datasets used are presented in Table 1. The experiments were conducted on a PC with the settings listed in Table 2.

B. ALGORITHMS AND EXPERIMENTS PARAMETER SETTING
A KNN classifier based on a wrapper method (k-fold cross-validation) was used to validate the fitness performance of the proposed algorithm. The validation technique uses k-1 folds for training and one fold for testing. The parameter settings of the baseline optimization algorithms PSO, GA, GOA, and SMO are listed in Table 3. Furthermore, for all algorithms, the number of search agents was set to 10, and the maximum number of iterations was set to 100. The classification accuracy was selected as a critical metric for evaluating and validating the performance of the optimization algorithms. In addition, the statistical measures were computed for each algorithm after performing 30 runs. The parameters of the RSA were set experimentally to α = 0.1 and β = 0.005.
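The wrapper evaluation described above can be sketched in plain Python as follows: a feature mask selects columns, and the error rate of a KNN classifier is estimated by training on k-1 folds and testing on the remaining one. The number of neighbours (k = 5), the number of folds (10), the Euclidean distance, and the interleaved fold assignment are assumptions for illustration; they are not specified in this section.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=5):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], sample)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kfold_error_rate(features, labels, mask, n_folds=10, k=5):
    """Cross-validated KNN error rate using only the features selected by `mask`."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 1.0  # assumption: an empty subset is treated as the worst case
    data = [[row[i] for i in selected] for row in features]
    n = len(data)
    errors = 0
    for fold in range(n_folds):
        # Interleaved split: every n_folds-th sample goes to the test fold.
        test_idx = set(range(fold, n, n_folds))
        train_x = [data[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, data[i], k) != labels[i]:
                errors += 1
    return errors / n
```

The returned error rate is the quantity a wrapper fitness function would combine with the size of the selected subset.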

C. RESULTS AND DISCUSSION
This section demonstrates the effectiveness of the proposed IRSA through two main experiments. The first experiment compared the proposed IRSA with the standard RSA. The second experiment compared IRSA with state-of-the-art algorithms, namely PSO, GA, GOA, and SMO. In all experiments, each algorithm was applied to all the datasets to verify its robustness across different feature dimensionalities. Additionally, the reported results are averages over 30 runs for every experiment.

1) THE COMPARISON OF RSA AND IRSA
In this section, the proposed IRSA is compared to the original RSA. Four metrics are used in this comparison: classification accuracy, number of selected features, fitness value, and Wilcoxon statistical test (p-value). Table 4 displays the experimental results of IRSA in comparison to the original RSA algorithm; the best results are underlined.
To determine whether the classification accuracy of IRSA is statistically improved, the p-value is computed, where the improvement is considered statistically significant if the p-value is smaller than 0.05; otherwise, it is not.
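As an illustration of how such a p-value could be obtained from the per-run accuracies of two algorithms, the sketch below implements a two-sided Wilcoxon test. It assumes the paired signed-rank variant with a normal approximation and no tie or continuity correction; the paper does not state which Wilcoxon variant was used, and the function name wilcoxon_signed_rank_p is hypothetical.

```python
import math


def wilcoxon_signed_rank_p(sample_a, sample_b):
    """Two-sided p-value of the Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied absolute differences get average ranks.
    """
    diffs = [a - b for a, b in zip(sample_a, sample_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # identical samples: no evidence of a difference
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / std
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With 30 runs per algorithm, the normal approximation is usually adequate; an improvement would be reported as significant when the returned value is below 0.05.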
The results show that IRSA has a higher classification accuracy than RSA for the majority of the datasets, while it provided similar accuracy to RSA in one dataset, as illustrated in Table 4. Accordingly, the results indicate that applying CM and SA in IRSA enhances its classification performance. In terms of the number of selected features, IRSA outperformed the original RSA by reducing the number of selected features by 61.18% across all datasets. In addition, IRSA performed better than RSA in all datasets in terms of fitness value. In terms of classification accuracy, IRSA significantly outperforms RSA in 16 datasets. The overall results for classification accuracy, number of selected features, fitness value, and p-value on most datasets indicate the remarkable improvement accomplished by IRSA.
In addition, the results displayed in Table 4 show that the enhancement introduced in the initialization phase using the CM method improved the candidate solutions compared with the random solutions used in the original RSA. A possible reason is that the improved population diversity obtained by replacing random solutions with chaotic solutions balances the convergence speed towards the optimal solution. Also, the enhancement in the exploitation phase with SA provided a better solution. These results indicate the capability of the IRSA algorithm to avoid the local optima problem and solve the feature selection problem.

2) COMPARISON OF IRSA ALGORITHM WITH OTHER OPTIMIZATION ALGORITHMS
Prior experiments have demonstrated the superiority of IRSA over the original RSA, especially in terms of classification accuracy and fitness value. This advantage is the result of improving population diversity and maintaining an appropriate balance between exploration and exploitation to avoid local optima. Therefore, to validate the advantage of IRSA, an extended comparison was performed between IRSA and well-known and recent optimization algorithms, namely PSO, GA, GOA, and SMO, using the same evaluation metrics. First, the classification performance of the considered algorithms was evaluated; the results of IRSA and all compared algorithms are presented in Table 5, where the significant results are shown in bold. Based on the results achieved, IRSA outperformed the other optimization algorithms over all datasets in terms of classification accuracy. PSO ranked second, with a classification accuracy 0.59% lower than that of IRSA, followed by GA, SMO, and GOA, which obtained the lowest accuracy.
The second evaluation metric used to evaluate the IRSA performance is the average number of selected features. The best results are shown in bold in Table 6. Based on the results achieved, IRSA outperformed the other optimization algorithms with the lowest number of selected features in 16 datasets, while GA ranked second, achieving the lowest number in 4 datasets. Overall, PSO, GOA, and SMO selected more features, with increases of 2.85%, 4.15%, and 4.8%, respectively.
The third evaluation metric used to evaluate the IRSA performance is the average fitness value. The fitness value is calculated based on the classification error rate of the KNN classifier, the number of selected features, and the original number of features, as presented in Eq. (13). A low fitness value means that the solution is closer to the optimum, since this research treats feature selection as a minimization problem. The results show that IRSA outperforms all other optimization algorithms in all selected datasets. PSO ranked second in fitness value, followed by GA, SMO, and GOA, respectively. The results are presented in Table 7.
The fourth evaluation metric used to evaluate the IRSA performance is the p-value of the Wilcoxon statistical test. The Wilcoxon test was applied to verify the significance of the differences in classification accuracy, as displayed in Table 8, where the significant results (p-value < 0.05) are shown in bold. IRSA shows significant improvement over all selected algorithms on the majority of datasets. IRSA significantly outperformed GOA and SMO in all datasets, GA in 18 datasets, and PSO in 14 datasets. These significant results support the superiority of IRSA over the other algorithms. The results indicate the capability of IRSA to balance exploration and exploitation. Moreover, it has a better chance of avoiding the trap of local optima, which ultimately leads to a significant improvement in the classification accuracy of IRSA.
Furthermore, the IRSA performance was evaluated based on convergence curves. The convergence curves show the average fitness value over the iterations. The convergence curves of all selected optimization algorithms on all datasets are illustrated in Figure 2. Based on the results obtained, IRSA outperformed all other algorithms in terms of convergence. PSO ranked second in convergence across the datasets. This superiority comes from the improvements implemented in the initialization and exploitation phases. In the initialization phase, the chaotic map is applied to accelerate the convergence speed over the iterations. The improved population diversity obtained by replacing random solutions with chaotic solutions balances the convergence speed towards the optimal solution. Also, the enhancement in the exploitation phase improved the fitness value. These results are an indication of the algorithm's higher capability to avoid the local optima problem and solve the feature selection problem.
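A convergence curve of this kind can be computed as sketched below, assuming that each run records one fitness value per iteration and that the curve reports the best-so-far fitness averaged over the independent runs. This interpretation and the function name mean_convergence_curve are assumptions for illustration.

```python
def mean_convergence_curve(runs):
    """Average the best-so-far fitness per iteration over several runs.

    `runs` is a list of equal-length lists of fitness values (lower is better).
    """
    if not runs:
        raise ValueError("at least one run is required")
    n_iters = len(runs[0])
    if any(len(run) != n_iters for run in runs):
        raise ValueError("all runs must have the same number of iterations")
    curves = []
    for run in runs:
        best = float("inf")
        curve = []
        for value in run:
            best = min(best, value)  # a convergence curve never gets worse
            curve.append(best)
        curves.append(curve)
    # Column-wise mean across runs gives one averaged value per iteration.
    return [sum(column) / len(curves) for column in zip(*curves)]
```

Plotting the returned list against the iteration index for each algorithm would give curves comparable to those in a convergence figure.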

3) THE LIMITATIONS OF IRSA ALGORITHM
The superiority of IRSA comes from the improvements introduced to the RSA algorithm. Improving the exploration phase (global search) controls the algorithm's population diversity. At the same time, the improvement of the exploitation phase (local search) mitigates the local optima problem. However, this has some limitations: applying the SA algorithm in each iteration to refine the best solution and avoid the local optima problem increases the execution time of the algorithm. As the results show, the average run time of IRSA is 6.4% higher than that of the second-best algorithm, PSO. It is worth mentioning that the choice of optimization algorithm (and its parallelization) highly depends on the properties of the objective function and constraints.

VI. CONCLUSION AND FUTURE DIRECTIONS
The Reptile Search Algorithm (RSA) is a novel population-based optimization algorithm. RSA is a swarm-based meta-heuristic algorithm that mimics the encircling and hunting behavior of crocodiles in the wild. This study proposes an improved version of RSA, named IRSA, which adds two main improvements to the original RSA: (1) applying chaos theory at the initialization phase of RSA to enhance its exploration capabilities in the search space, and (2) combining the Simulated Annealing (SA) algorithm with the exploitation process to avoid the local optima problem. These two improvements substantially increased the exploration and exploitation search capability of IRSA. Specifically, the use of a Circle chaotic map improves the population diversity, whereas the SA algorithm helps avoid becoming trapped in local optima. Additionally, these two improvements provide a good balance when switching between exploration and exploitation search. The performance of IRSA was evaluated over 20 medical benchmark datasets from the UCI repository. Moreover, IRSA was compared with other well-known and recent optimization algorithms, including PSO, GA, GOA, and SMO. Four evaluation metrics were used in the comparison: classification accuracy, fitness value, number of selected features, and p-value. According to these metrics, IRSA outperformed the other algorithms on the majority of the datasets. Furthermore, the results indicated that IRSA was capable of improving the classification accuracy and accelerating the convergence rate. In addition, the results showed that IRSA was able to minimize the number of selected features for the majority of the datasets. Based on the obtained results, IRSA can be employed as a technique for real-world applications. For future work, IRSA could be further developed by using a filter feature selection method in conjunction with it to deal with real-world datasets. Finally, IRSA could possibly be applied in the development of other optimization algorithms.

ACKNOWLEDGMENT
The contributors would like to acknowledge the editor, the reviewer, and Prof. Mohammad Tubishat for their valuable comments.