A Binary Multi-Objective Chimp Optimizer With Dual Archive for Feature Selection in the Healthcare Domain

Medical datasets frequently contain large feature sets in which many features are correlated with one another. As a result, the curse of dimensionality hampers learning from medical data, making it necessary to reduce the feature set. Feature selection (FS) is a major step in classification and in dimensionality reduction. This study proposes a novel Binary Multi-objective Chimp Optimization Algorithm (BMOChOA) with dual archive and a k-nearest neighbors (KNN) classifier for mining relevant features from medical data. In this research, 12 versions of BMOChOA are implemented based on the group information and types of chaotic functions used. The best Pareto front obtained from the suggested BMOChOA variants is compared with three benchmark multi-objective FS methods on 14 popular medical datasets of variable dimensions. Analysis of the experimental outputs using four multi-objective performance evaluators shows that the proposed FS method is superior in finding the best trade-off between the two objective functions: the number of features and classification performance.


I. INTRODUCTION
The importance of diagnosis is known to all professionals. Timely and effective diagnosis and treatment can save patients' lives, so proper computer-aided diagnostic (CAD) systems that doctors can use are necessary. In general, classification is the most important aspect of CAD systems. A vast number of observations, mostly electronic health records (EHRs), are collected in the medical sector [1]. Nowadays, machine learning is becoming more common in healthcare. Because of the complex nature and size of the data, the huge quantity of data created by Electronic Data Interchange (EDI) clinical transactions cannot be handled and evaluated by conventional techniques. Classifying medical datasets is a difficult challenge because they frequently contain many attributes and examples. The need for rapid and precise diagnosis drives the search for more accurate and faster classification methods in CAD systems [2]. Classification accuracy depends greatly on the chosen feature set, which must allow classification methods to distinguish instances and detect similarities between examples of the same class. Noisy, duplicated, and irrelevant features may also be present in high-dimensional medical data. These properties harm the learning process and the efficiency of classification by enlarging the search space excessively. This effect is often called the ''curse of dimensionality''. Analysis and data mining therefore require heavy computation, sophisticated models, and extremely long learning times [1].

The associate editor coordinating the review of this manuscript and approving it for publication was Xinyu Du.
The application of FS methods is one of the most effective answers to this problem [3]. FS techniques pick a subset of highly significant features by removing or reducing duplicate and unnecessary features. Therefore, FS techniques can lead to improved data understanding, reduced learning time, and simplified prediction models with potentially improved performance [4]. Because of its vast search space and the intricate interactions between features, FS is a complicated process. A search technique that identifies the optimal feature subset of an L-dimensional dataset must consider 2^L candidate subsets, which makes exhaustive search impractical for large L. Thus, the FS problem is characterized as an NP-hard task [5].
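To give a sense of scale for the 2^L search space, the following minimal Python sketch counts the candidate subsets for two dimensions. The function name `subset_count` is hypothetical, and the two values of L are simply the smallest and largest attribute counts (9 and 7129) reported for the datasets used in this article.

```python
def subset_count(num_features: int) -> int:
    """Number of candidate feature subsets of an L-dimensional dataset (2^L)."""
    return 2 ** num_features


# Smallest dataset dimension: still small enough to enumerate.
print(subset_count(9))                  # 512

# Largest dataset dimension: far too many subsets to enumerate.
print(subset_count(7129) > 10 ** 2000)  # True
```

Even a very fast evaluator could not enumerate the second search space, which is why heuristic search is used instead of exhaustive search.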
FS methods are mostly classified into wrapper, filter, and embedded techniques [6]. Our emphasis in this study is on wrapper methods. In the wrapper approach, the FS process combines a search procedure for near-optimal feature subsets with a prediction system that evaluates them [7]. Wrapper techniques give better outcomes than filter approaches, but because the classification algorithm must be trained repeatedly, they take longer than filter techniques [8]. By contrast, filter strategies are based on statistical methods and information theory and attempt to identify a near-optimal subset of features having the strongest individual association with the outcome and the lowest inner correlation [6]. Furthermore, embedded methods integrate the FS phase into the training phase of the classifier [9]. Numerous wrapper solutions to the FS task have been published to date, including greedy search techniques such as sequential forward selection (SFS) [10] and sequential backward selection (SBS) [11]. These strategies, however, have shortcomings such as slow convergence, trapping in local optima, and high computational cost [12].
In principle and in practice, their population-based search and their ability to produce several distinct solutions make evolutionary algorithms well suited to multi-objective problems. Due to its structural character, FS may be classified as a multi-objective optimization problem (MOP), given that it takes account of at least two conflicting objectives. The solution to a MOP is generally a set of non-dominated (ND) solutions that represent a compromise between opposing aims, providing various options for decision-makers. In addition, according to related research, FS has been examined in the multi-objective setting in far fewer studies than in the single-objective setting [29], [30].
The Chimp Optimization Algorithm (ChOA) is one of the advanced metaheuristic techniques developed for solving optimization problems. According to prior studies, this technique has properties such as a low number of fitness evaluations, high speed, and excellent global and local search ability [31], [32]. To the best of our knowledge, the capacity of this method to handle the FS task has yet to be explored.
This article mainly aims to build a ChOA-based multi-objective wrapper FS technique with dual archive, which can simultaneously decrease the number of features and boost classification performance while providing a set of Pareto solutions. Because FS is a multi-objective optimization problem, we propose for the first time a binary variant of multi-objective ChOA called Binary Multi-Objective Chimp Optimization with a Sigmoid Transfer Function (BMOChOA-S).
In particular, the following goals have been examined in this article.
1) To gain insight into the strengths and weaknesses of recent works on metaheuristic-based FS tasks.
2) To introduce BMOChOA-S to discover Pareto optimal solutions for the FS task.
3) To present a comparative analysis of the performance of BMOChOA on the FS task for healthcare data by implementing twelve different variants of BMOChOA, depending on the chimp group and the type of chaotic map used.
4) To compare the suggested technique with three well-regarded multi-objective approaches and examine whether the proposed strategy outperforms the benchmark methods in limiting the size of the feature subset and increasing classification accuracy.
5) To confirm the performance of the presented technique in terms of computational cost.
The remainder of this article is organized as follows. In Section II, we discuss the ChOA algorithm, the concept of multi-objective optimization, and current FS studies. The suggested multi-objective technique is explained in Section III. Section IV describes the experimental setup. Section V presents the results and discussion. Finally, the last section provides the conclusion and suggests directions for future work.

II. BACKGROUND
First, the basic ChOA principles and multi-objective optimization techniques are outlined in this part. Then, a summary of significant work done in the FS world is presented.

A. ChOA
ChOA's conceptual foundation is based on chimpanzees' hunting habits. The standard ChOA separates the chimp group into four types: attacker, barrier, chaser, and driver. The attacker is the leader among them. The other three types of chimpanzees assist in the hunt and hold a correspondingly lower status. Drivers follow the prey but do not try to catch up with it. Barriers place themselves in trees to build a dam across the path of the prey. Chasers move rapidly after the prey to catch it. In the end, the attackers predict the break-out route of the prey in order to force it back towards the chasers or down to the lower canopy. This crucial role (attack) correlates positively with age, intelligence, and physical capacity. In addition, during the same hunt, chimps might swap roles or retain them over the whole procedure [33]. Broadly speaking, chimp hunting is separated into two major phases: exploration (driving, blocking, and chasing the prey) and exploitation (attacking the prey).
Mathematically, the two phases are described below:

1) DRIVING AND CHASING THE PREY
The prey is pursued during both the exploration and exploitation phases. Eqs. 1 and 2 are used for the mathematical modelling of driving and chasing the prey.
Here, rnd1 and rnd2 are random vectors in [0, 1], and c2 is a chaotic vector computed on the basis of several chaotic maps to represent the influence of the chimps' sexual motivation on the hunting process.
In principle, separate independent groups with a shared purpose may be used in any population-based optimization method to perform directed and random search simultaneously. Any continuous function can be used to update the distinct chimp groups, provided that it is selected such that f decreases over the course of the iterations [34]. The procedure of driving and chasing the prey is shown pictorially in FIGURE 1 [31].
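The driving and chasing step can be illustrated with a minimal Python sketch of the commonly cited ChOA position update, in which a chimp moves relative to the prey by a scaled distance. The function name `drive_and_chase` and the symbols `a`, `c`, `m`, and `f` are assumptions taken from the usual ChOA formulation and may not match this article's notation one to one; `m` stands for a chaotic vector supplied by the caller.

```python
import random


def drive_and_chase(chimp, prey, f, m, rng):
    """One driving/chasing step for a single chimp (assumed standard ChOA form).

    chimp, prey, m: lists of floats of equal length.
    f: scalar coefficient that decreases over the iterations.
    rng: a random.Random instance supplying the random vectors in [0, 1].
    """
    new_position = []
    for x_chimp, x_prey, m_j in zip(chimp, prey, m):
        r1, r2 = rng.random(), rng.random()
        a = 2.0 * f * r1 - f                   # step coefficient in [-f, f]
        c = 2.0 * r2                           # prey weighting in [0, 2]
        d = abs(c * x_prey - m_j * x_chimp)    # distance between chimp and prey
        new_position.append(x_prey - a * d)    # move relative to the prey
    return new_position


rng = random.Random(42)
print(drive_and_chase([1.0, 2.0], [3.0, 4.0], f=2.5, m=[0.7, 0.7], rng=rng))
```

When f reaches 0 the coefficient a vanishes and the chimp lands exactly on the prey, which matches the intended shift from exploration to exploitation as f decreases.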

2) ATTACKING (EXPLOITATION)
Hunting is generally carried out by attackers. Occasionally, drivers, barriers, and chasers also take part. In an abstract search space, however, there is no prior information about the optimal position (the prey). To model the behaviour of chimps mathematically, it is assumed that the attacker (the best solution), driver, barrier, and chaser are better informed about the location of the potential prey. The four best solutions obtained so far are therefore kept, and the other chimps update their positions according to the positions of these best chimps. This is expressed as follows: Here, CI is the current iteration. The dynamic coefficient c1 and vector b are computed using Eqs. 6 and 7. As the iterations proceed, f decreases non-linearly from 2.5 to 0.
The dynamic coefficients of f for the two versions of ChOA (ChOA1 and ChOA2) with different independent groups are given in APPENDIX VI [31].
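The attacking phase can be sketched in Python as follows, assuming the usual ChOA rule in which a chimp computes one candidate position per leader and then moves to their average. The helper names `pull_towards` and `attack_step` are hypothetical, the symbols follow the common ChOA formulation rather than this article's notation, and the schedule of f is left to the caller because the group-specific formulas are given only in the appendix.

```python
import random


def pull_towards(chimp, leader, f, m, rng):
    """Candidate position of a chimp relative to one leader (assumed ChOA form)."""
    candidate = []
    for x, x_lead, m_j in zip(chimp, leader, m):
        a = 2.0 * f * rng.random() - f
        c = 2.0 * rng.random()
        d = abs(c * x_lead - m_j * x)
        candidate.append(x_lead - a * d)
    return candidate


def attack_step(chimp, leaders, f, m, rng):
    """Average the candidates obtained from the attacker, barrier, chaser, driver."""
    candidates = [pull_towards(chimp, leader, f, m, rng) for leader in leaders]
    return [sum(values) / len(leaders) for values in zip(*candidates)]


leaders = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]
rng = random.Random(7)
print(attack_step([1.0, 5.0], leaders, f=1.0, m=[0.7, 0.7], rng=rng))
```

Averaging over four leaders rather than following a single best solution is what lets the swarm keep some diversity while it converges.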

B. MULTI-OBJECTIVE OPTIMIZATION (MOP)
Many real-world tasks consist of a group of goals that must be optimized at the same time. The solution to such a problem is a set of solutions that represent a compromise between the distinct objectives. The set containing all the trade-off solutions to a given problem is called the Pareto optimal set, and its image in the objective space is called the Pareto front. For a single-objective optimization problem (SOP), a candidate's superiority over other solutions is determined by a fitness comparison. In a MOP, by contrast, the merit of a candidate solution is judged by the concept of dominance. A solution P in the objective space of a C-objective problem dominates another solution Q if the following two criteria hold:
1) $\forall c \in \{1, \dots, C\}$: $P_c$ is not worse than $Q_c$
2) $\exists c \in \{1, \dots, C\}$: $P_c$ is strictly better than $Q_c$
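The two dominance criteria translate directly into code. The following minimal Python sketch assumes that every objective is to be minimized and that a solution is represented by a tuple of objective values; the function names `dominates` and `non_dominated` are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q: no worse in every objective, strictly better in one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def non_dominated(points):
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Each tuple is (number of features, classification error), both minimized.
candidates = [(3, 0.2), (5, 0.1), (4, 0.3)]
print(non_dominated(candidates))  # [(3, 0.2), (5, 0.1)]
```

In this example (4, 0.3) is removed because (3, 0.2) uses fewer features and has a lower error, while the remaining two points represent a genuine trade-off.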

C. RELATED WORKS
In practice, FS methods are divided into three primary types: embedded, filter, and wrapper. Our focus, however, is on the wrapper-based approach. Wrapper techniques use a search procedure to discover near-optimal solutions and a classifier to rate the solutions that have been identified. Wrapper methods may therefore be subdivided into evolutionary and non-evolutionary groups according to the type of search strategy. Branch & Bound [35], SFS [10], and SBS [11] are among the most well-known non-evolutionary solutions. Although these methods are relatively simple to implement, they suffer from convergence to local optima and from substantial computing overheads on large data. Both SFS and SBS also share a structural weakness: later steps cannot remove (add) features that have previously been included (excluded) [6]. Sequential Forward Floating Selection (SFFS) and Sequential Backward Floating Selection (SBFS) were developed to tackle this issue [36]. Unfortunately, these improvements have not solved the problem of convergence to local optima [37]. Focus [37] and Relief [38] are two of many non-evolutionary filter approaches. Filter methods such as mRMR [39] and MIFS [40] also attempt to enhance the efficiency of the FS algorithm by applying ideas from information theory. This improvement is obtained by considering the relevance of an attribute to the outcome and any redundancy among the features. Researchers have used evolutionary techniques to resolve the difficulties described above and to apply better search strategies. Because they are population-based, these algorithms create and assess numerous solutions concurrently and have better global search ability than traditional approaches.
Single objective wrapper techniques generally serve the purpose by restricting the length of the feature set, or by enhancing the classification efficiency, or by aggregating these targets [41]. For a better understanding of the single-objective evolutionary methods for solving FS task the interested reader can refer to DA [21], SSA [26], [42], HS [43], TLBO [28], grasshopper optimization [44], Jaya algorithm [22], [45], HHO [18], atom search [46], SMO [24], SHO [25], CS [47], ALO [48], ABC [30], FOA [49], FPA [50], and WOA [51], [52] etc.
The excellent quality of any meta-heuristic methodology is focused and constrained. Exploration and exploitation are two opposing conditions to be considered when developing meta-heuristics. The meta-heuristic algorithms perform well in some cases but poorly in others, so it is critical to strike a reasonable balance between exploration and exploitation to improve the algorithms' efficiency. Each nature-influenced methodology has its own positive and negative aspects, such that the right algorithm for a particular problem is not guaranteed. We cannot find the optimal solution for each kind of function with the individual optimization algorithm [53]. The implementation and proposal of modern meta-heuristics with high precision for actual implementations has therefore become a challenge to scientists, [54]. As a result, the hybridization of evolutionary methods has engaged many research people to solve FS problems. The aim of hybridization is to identify compatible alternatives in order to ensure the optimal output of optimization methods, which is accomplished by combining and coordinating the exploration and exploitation processes [55]. Hybridization of evolutionary methods is a common method for combining the strengths of independent architectures to address such shortcomings [54]. Some recently suggested hybrid evolutionary techniques for handling FS task are: ABC-GA [56], MA-HS [57], PSO-GE [58], GWO-PSO [41], TEO-SOA [59], GWO-CSA [60], PSO-FLA [61], SCA-ALO [62], TLBO-SSA [63], HHO-CS [64], SCA-HHO [65], GWO-HHO [66], and SCA-CS [67] etc.
The ChOA has been proposed recently to solve different optimization tasks [31]. ChOA is meant to mitigate two challenges in the resolution of high-dimensional issues: poor convergence speed and entrapment in the local optimum. Jia et al. [32] in 2021 have attempted an enhanced ChOA for solving optimization problems in the continuous domain. Gaurav Dhiman has suggested a fusion of SSA and SHO based ChOA for tackling the optimization applications in engineering [68]. Also, a hybrid SCA-ChOA method is proposed by Kaur et al. [69] for HLS of datapaths in digital filters and engineering applications. To the best of our knowledge, no work exists in the literature for discrete ChOA in order to find solutions to problems like feature selection. For the first time, we proposed a discrete version of the ChOA with a multi-objective essence for selecting relevant factors from variable sized healthcare data.
It is noticeable that these strategies provide a single near-optimal result using single-objective techniques. However, FS is fundamentally a MOP that attempts to achieve at least two opposing goals: to reduce the number of features and to increase the efficiency of classification. In such cases, the appropriate response is to provide users with a number of non-dominated (ND) alternatives from which they can choose solutions suited to their circumstances. We look at the multi-objective techniques proposed for the FS issue in the following part of this article.
In the past few years, multi-objective heuristics have been the focus of significant investigation because they evaluate several typically conflicting objectives simultaneously and present a set of ND solutions. In the study [70], GA and non-domination concepts were used to address the FS issue with the NPGA multi-objective method. NSGA-I was used in [14] with the aims of lowering the number of features and decreasing the classification error of an artificial neural network. Zhu et al. [71] introduced a hybrid filter and wrapper procedure based on a memetic method, in which a filter process enhances wrapper MOEA solutions by adding or removing features depending on a correlation condition. Huang et al. [72] in 2010 sought to locate the Pareto front using the NSGA-II method. The MOPSO method was applied to FS in the studies by Xue et al. [12]. Vignolo et al. [73] introduced a novel MOGA-based method for FS in the face recognition domain. Three major goals are considered there: 1) increasing accuracy, 2) decreasing the number of features, and 3) limiting mutual information (MI). In Ref [74], a multi-objective FS (DEMOFS) method employing differential evolution was presented. In Ref [75], an NSGA-II wrapper methodology was introduced in which the accuracy of every class is treated as a cost function. The study in Ref [76] investigates MOFS-BDE, a novel multi-objective FS strategy that incorporates three operators to increase performance: a unique binary mutation operator, a one-bit purifying search operator, and a fast non-dominated sorting operator. Zhang et al. [77] suggested the first multi-objective PSO with Pareto dominance and an external repository for cost-based FS problems. PSOMOFS, presented by Hu et al. [78], is a fuzzy multi-objective FS approach with PSO for solving the FS issue with fuzzy costs.
To trim the elitist repository, this technique provides a fuzzy dominance relation to check the excellence of nominee particles and sets a fuzzy crowding distance (CD) metric to discover the global leader. In Ref [79], an innovative MOPSO technique was presented for addressing the FS challenge by using local search to enhance the repository solutions. An enhanced MOPSO technique is proposed in Ref [80] that utilizes t-score and precision as goal functions. Ragothaman and Sarojini [81] have tried to address the FS problem on healthcare datasets using a MOABC inspired by non-dominated sorting. Jimenez et al. [82] utilized the ENORA technique, which applied dominance and slot principles to identify the Pareto front in order to pick parameters in the online sales prediction method. In the article [30], the FS problem was solved using a multi-target ABC approach. In Ref [83], a mixed MO approach built on the SSA and the SHO has been proposed for FS. This approach prefers to mix SSA and SHO to balance variability and convergence to obtain the best Pareto front. The NSGA-II method-based hybrid filter wrapper was developed in the work [84]. Baliarsingh et al. [85] used a fisher score technique together with a multi-objective Penguin Optimizer. Numerous recent research papers, such as [86]- [88], are concurrently working on resolving the FS issue and optimizing the classifier's parameters. For example, [86] utilized MOPSO to fulfil three objectives: optimizing the settings of SVM, choosing a kernel function, and also picking the right characteristics. A unique method using the multi-objective Grey Wolf optimizer (MOGWO) was introduced by Al-Tashi et al. [17] in 2020. The technique presented employs the repository to preserve non-dominated alternatives. Recently, in 2021, Piri et al. [20] have suggested a MO quadratic binary harris hawk optimizer for selecting required features from 12 medical datasets. 
Also, they have proved the superiority of the proposed technique by comparing the results with deep-based FS methods such as AE and TSFS. A multi-objective FOA has been introduced in the article [3] for tackling the FS task. When the depth of FS issues grows, the solution space grows exponentially, resulting in a large number of local optima. As a result, for large-scale FS issues, present evolutionary approaches still suffer from the problem of local optima stagnation. To address this, Xue et al. [89] have presented a self-adaptive PSO (SaPSO) method for FS, specifically for widescale data. To cope with high-dimensional FS challenges, Song et al. [90] have presented an innovative FS technique called ''bare bones PSO'' (BBPSO), integrating mutual information. They additionally devised a successful swarm initialization technique based on correlation in order to hasten swarm convergence. Also, the authors in Ref [91] have developed a novel three-phase hybrid FS method based on correlation-guided clustering and PSO for large dimensional data.
Most prior research has employed single objective evolutionary algorithms to address the FS issue, as can be observed from the literature, whereas few studies have concentrated on MOFS techniques compared to the other methods. Moreover, the effectiveness of the ChOA to solve MOPs has yet to be studied given the potential functionalities of ChOA, such as the ease of operators, the reduced number of fitness assessments, and the small number of parameters. Therefore, in this article, we suggest the BMOChOA method to solve the FS issue.

III. PROPOSED METHODOLOGY
As discussed in the previous section, FS is a MOP that intrinsically addresses at least two goals: limiting the number of features and boosting classification accuracy. A binary version of ChOA for discrete optimization problems like FS has not been established yet. Furthermore, no application of ChOA to FS as a MOP has been suggested in the existing work. This section therefore presents a Binary Multi-Objective ChOA (BMOChOA) to tackle the FS problem for the first time in healthcare data mining. FIGURE 2 shows the overall design of the presented BMOChOA-based FS task. In the following, FS is regarded as a binary optimization task with a 0/1 (discrete) representation of candidate solutions. ChOA is not directly suited to the FS challenge, as it is designed for continuous optimization. Several changes are therefore necessary:
1) Representation of chimp: Each chimp location in BMOChOA is represented as a vector of length L consisting of only 0s and 1s. If a particular bit in the candidate position vector is 1, the corresponding feature is included in the reduced dataset; otherwise it is not. The structure of each chimp's location is shown in FIGURE 2.
2) Fitness Assessment: This research considers FS as a bi-objective optimization problem. Therefore, each chimp in the population is evaluated by applying the following two goal functions: where P is the location string of a chimp of length L.
The following procedure is used to calculate the objective function Obj2 for each chimp:
• First, a reduced dataset is formed by taking the features at those indices of the chimp location string where the value is 1.
• Then the KNN model with 10-fold cross-validation (CV) is applied to compute the Obj2 value of each chimp. For training and testing, the samples of a dataset are randomly split into 10 groups.
3) Maintenance of Primary Archive: The primary archive must be updated after each chimp in the swarm has been evaluated because, after completion of each iteration, BMOChOA provides a collection of Pareto optimal solutions rather than a single one. A fresh non-dominated solution NS_new of the current iteration is inserted into the primary archive according to the following scenarios:
• If NS_new is dominated by at least one member of the primary archive, it is not allowed to enter.
• If NS_new dominates any current member of the primary archive (say Q), NS_new replaces Q for improved archive construction [20].
4) Secondary Archive Management: In this paper, after each generation, we consider a solution optimal if its rank value is 1. The primary archive stores all the optimal solutions of a particular iteration. However, to simulate the chimps' behavior, the algorithm requires the four best solutions (A: Attacker, B: Barrier, C: Chaser, D: Driver) in each iteration. During a given iteration there may be fewer than four rank-1 solutions from which to choose ABCD. Therefore, in each iteration, we order the chimps by rank and choose the top four solutions as ABCD. After each generation, we may obtain a different ABCD set that is better than the previous one, so the locations of ABCD must be preserved after each iteration. A secondary archive is therefore needed to keep the ABCD of each iteration. After a particular iteration finishes, the new ABCD group is compared with the older one using the dominance principle, and the new one replaces the existing one if it is fitter. The new ABCD set is then used to update the chimp positions in the next iteration. The schematic diagram for both primary and secondary archive management is shown in FIGURE 3.
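The fitness assessment and the primary archive update described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: Obj1 is taken to be the number of selected features and Obj2 the 10-fold CV error rate of KNN (both minimized), an empty feature subset is given the worst error, a candidate that neither dominates nor is dominated is simply added, and exact duplicates are rejected. The names `evaluate`, `knn_predict`, `update_archive`, and the default `k=5` are hypothetical.

```python
import math
import random


def dominates(p, q):
    """Pareto dominance for objective tuples that are all minimized."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = {}
    for i in nearest:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)


def evaluate(mask, X, y, k=5, folds=10, seed=0):
    """Return (Obj1, Obj2) = (number of selected features, CV error rate)."""
    cols = [j for j, bit in enumerate(mask) if bit == 1]
    if not cols:
        return (0, 1.0)  # assumption: an empty subset gets the worst error
    reduced = [[row[j] for j in cols] for row in X]
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)  # random split into folds
    errors = 0
    for fold in range(folds):
        test_idx = order[fold::folds]
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_X = [reduced[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, reduced[i], k) != y[i]:
                errors += 1
    return (len(cols), errors / len(X))


def update_archive(archive, mask, objectives):
    """Insert a candidate into the primary archive using the dominance rules."""
    if any(dominates(obj, objectives) or obj == objectives for _, obj in archive):
        return archive  # dominated (or duplicate): not allowed to enter
    kept = [(m, obj) for m, obj in archive if not dominates(objectives, obj)]
    kept.append((mask, objectives))  # replaces every member it dominates
    return kept
```

A typical call would evaluate a chimp with `evaluate(mask, X, y)` and pass the resulting tuple to `update_archive`; the secondary archive holding the four leaders is not modelled here.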

5) Conversion from Continuous to Binary ChOA:
Existing research demonstrates the effectiveness of a transfer function (TF) for turning a continuous optimizer into a binary one because of its simplicity, low cost, and rapid and easy execution. The majority of researchers have employed the common S-shaped and V-shaped TFs to convert a continuous optimizer into a discrete one. This proposal uses the S-shaped (sigmoid) TF for that purpose.
6) Chimp Location Update: Assume that µ is a random probability value between 0 and 1. If µ is less than 0.5, equation 9 is applied to alter the current location of the respective chimp. Otherwise, if µ >= 0.5, a chaotic value is used to update the chimp (given in equation 13). In this article we have used six important chaotic maps, which are described in APPENDIX VI.
Here we have implemented two variants of BMOChOA, namely BMOChOA1 and BMOChOA2. Depending on the chaotic map used by each variant to update the chimp location, 12 different BMOChOAs are coded. APPENDIX VI contains the detailed naming. After the new location of the chimp, $p^{i}_{L}(CI+1)$, has been calculated in the continuous domain, the sigmoid TF converts it to a probability value using equation 14:
$T\big(p^{i}_{L}(CI+1)\big) = \mathrm{sigmoid}\big(p^{i}_{L}(CI+1)\big)$ (14)
Then BMOChOA sets the new location of each chimp for the next iteration by applying equation 15, where i is the i-th bit of the chimp's position, L is the original dimension, rd is a random number between 0 and 1, and CI is the current iteration.
7) Returning the best of Primary archive: The primary archive holds all the mutually non-dominated solutions after the given number of steps. As a screening method, the CD value is used here to pick an optimal combination of features from the group of non-dominated ones. When the entire population is present in the primary archive, the calculation of the CD needs C sortings of at most S solutions, resulting in a time complexity of O(CS log S). For details regarding the CD calculation procedure, refer to the article [20].
8) BMOChOA Algorithm
Algorithm 1 CD-Based BMOChOA for FS
1: Set the initial size of the population (S) and the upper bound of iterations to be used in the simulation (MI).
2: Set the initial chimps' location.
3: Evaluate the chimps using fitness functions.
4: Select the Attacker, Barrier, Chaser, and Driver based on their rank and store them in a secondary archive.
5: Save all non-dominated chimp locations into a primary archive.
6: for CNT ← 1 to MI do
7:   for each chimp do
8:     if µ < 0.5 then
9:       Update the position of the chimp using equation 9.
Update: f, c1, c2, c3, and b.
17:   Evaluate the updated chimp position.
18:   Update P_Attacker, P_Barrier, P_Chaser, P_Driver.
19:   Update the primary and secondary archives.
20: end for
21: Return the primary archive along with the best of it using CD measure.
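The sigmoid-based conversion to binary positions and the CD computation used to screen the primary archive can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the bit is set to 1 when the random number rd is smaller than the sigmoid value (the usual S-shaped rule), and the crowding distance follows the standard NSGA-II definition with infinite distance for boundary solutions. How the final solution is chosen from the CD values is described in [20] and is not reproduced here; the function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Map a continuous position to bits: 1 if rd < sigmoid(value), else 0."""
    return [1 if rng.random() < sigmoid(value) else 0 for value in position]


def crowding_distance(objs):
    """Crowding distance of each objective tuple (assumed NSGA-II definition)."""
    n = len(objs)
    dist = [0.0] * n
    if n == 0:
        return dist
    for c in range(len(objs[0])):
        order = sorted(range(n), key=lambda i: objs[i][c])  # one sort per objective
        lo, hi = objs[order[0]][c], objs[order[-1]][c]
        dist[order[0]] = dist[order[-1]] = math.inf          # boundary solutions
        if hi == lo:
            continue
        for r in range(1, n - 1):
            gap = objs[order[r + 1]][c] - objs[order[r - 1]][c]
            dist[order[r]] += gap / (hi - lo)
    return dist


rng = random.Random(0)
print(binarize([2.0, -2.0, 0.0], rng))
print(crowding_distance([(1, 0.4), (2, 0.3), (4, 0.1)]))
```

The single sort per objective inside `crowding_distance` is what gives the O(CS log S) cost quoted for the CD calculation.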
The time complexity of the KNN for Q samples is O(Q * L) [20]. To find the non-dominated solutions after each iteration, we use a dominance tree, which minimizes the number of duplicate comparisons and results in a time complexity of O(CS log S) [93]. The complexity of the initialization task can be derived from equation 17: Here, S is the population size, C the number of objectives, Q the sample size, and L the original dimension. The CD-based ordering of the primary archive in each iteration takes O(CS log S) [20]. Hence, the overall computational complexity of the introduced FS method can be expressed as:

1) Datasets in Details:
The performance of BMOChOA is assessed on eleven conventional datasets of various dimensions from UCI and three microarray cancer datasets [94]. TABLE 1 describes each dataset's structure, including the number of attributes (9-7129), instances, and categories (2-21). To avoid numerical issues, every dataset is normalized.
2) Methods for Comparison: To verify the superior performance of the presented BMOChOA in producing Pareto fronts, three of the most popular MOFS techniques are used in this article: 1) MOGA, 2) MOPSO, and 3) NSGA-II. In the first approach [15], MOGA, a genetic algorithm with a multi-objective flavour, is applied to solve the FS task. Here, the CD measure is used to maintain population diversity. The second method, MOPSO [12], [95], is a very popular multi-objective technique for the FS task. Here too, a repository is used to keep the updated fittest solutions from every iteration. The third strategy, NSGA-II, suggested by Deb et al. [96], is a widely recognised MO approach. The method consists

where $\lambda_I$ is the I-dimensional Lebesgue measure.
• Spread [99]: This metric is used to estimate the spread of solutions present in the PF. The higher the value, the better the spread of Pareto solutions.

lower for all the 14 datasets as compared to other variants of BMOChOAs. A lower value of IGD indicates a higher convergence speed because it denotes the gap between the TPF and CLF. BMOChOA16 uses a tent chaotic map for the position updates of each chimp. The tent map is a piecewise non-smooth chaotic map. It covers the entire phase space, and the region is chaotic. The better Pareto solutions obtained from BMOChOA16 may be due to the ability of the tent map to visit all states within a specific range without repetition. This helps BMOChOA16 to escape from local minima and reach the global best faster. In addition, we have used the Wilcoxon signed rank test to verify whether BMOChOA16 is significantly superior to the other variants. The performance of two approaches is significantly different if the P-value obtained from the Wilcoxon signed rank test is less than 0.05, and similar otherwise. The outputs of the Wilcoxon signed rank test are given along with the P-values.

Similarly, for SRBCT samples, the performance of MOGA is also very close to that of the presented approach in producing non-dominated solutions. The Pareto front for the LungCancer dataset has only one solution, but that solution is superior to the others. For the arrhythmia, Parkinson, and colon tumour data, there is a wide gap between the front of BMOChOA16 and the fronts of the other three methods, which indicates the greater efficiency of the proposed approach over the other standard methods. All four methods are executed for 50 iterations, after which the CD measure is applied to pick the best of the best from the least crowded area of the primary archive. TABLE 5 lists the Pareto solutions selected by the CD value for all 14 datasets.
In the BreastCancerW data, the CD-based BMOChOA16 selected the best solution, which gives 96.9% accuracy using fewer than 50% of the original features. However, the best solution of MOPSO gives 0.1% more accuracy than BMOChOA16 at the expense of 2 additional attributes. For the lung cancer data, the CD-based BMOChOA16 performs excellently: its best Pareto solution classifies the samples with an accuracy 45.5% higher than the original by considering only 2 out of 56 features. For the Arrhythmia dataset, the proposed method obtains a solution that gives 12.1% more classification accuracy than the original by using only 1.4% of the actual feature set. For the colon tumour and cardiotocography samples, MOPSO and BMOChOA16 perform equally well in selecting the best solution of the primary archive, and the result is quite satisfactory. Similarly, MOGA and BMOChOA16 achieve the same results for the SRBCT, diabetic, and lymphography data. In the Parkinson's dataset, BMOChOA16 picked a solution with 2% more accuracy than the original by using only 2 out of 754 features.
TABLES 6 and 7 record the statistical evaluation results for the number of selected features and the corresponding accuracy, respectively, produced by BMOChOA16 and the other three well-known algorithms. BMOChOA16 gives a higher average classification accuracy with fewer features on the Lymphography, Diabetic, LungCancer, Parkinsons, and Leukemia datasets. All the methods perform equally on the ILPD data. For the BreastCancerW dataset, the results are satisfactory because MOPSO and NSGA-II produce only 0.3% more average accuracy at the expense of an average of 0.75 and 1.66 attributes, respectively. Similarly, the proposed method gives satisfactory output for the cardiotocography data, selecting a lower average number of features while giving up only 0.7% of classification accuracy relative to MOPSO. For the arrhythmia data, our suggested approach gave an accuracy of 0.623 with an average of only 3 features (1.07% of the original dimension). However, MOPSO produces 4.1% more average accuracy than BMOChOA16 by including 78 extra features. For a high-dimensional dataset such as colon tumor, BMOChOA16 classifies the samples with an accuracy of 0.796 using only 12% of the original number of attributes. However, MOPSO achieves 4.3% more accuracy than BMOChOA16 by using 45% of the original features. In the case of the SRBCT dataset, MOGA gave an excellent average accuracy compared to that of the suggested approach, at the expense of 3% more features. The number of features chosen by the presented method has a lower standard deviation than the others, indicating that its Pareto solutions lie close to their mean with respect to the first objective function.
From the above analysis and discussion, it can be concluded that the proposed approach can efficiently generate a set of non-dominated solutions in most of the datasets by discarding the irrelevant attributes.
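The trade-offs reported above can be read through Pareto dominance. The sketch below is illustrative only: it treats the average arrhythmia figures quoted in the text (3 features at 0.623 accuracy for BMOChOA16, and 78 more features for 4.1% more accuracy for MOPSO) as if they were single solutions, and it assumes that "4.1% more" means 0.041 in absolute accuracy. The function names are hypothetical.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates solution b.

    Each solution is (n_features, accuracy): fewer features and
    higher accuracy are both preferred.
    """
    feats_a, acc_a = a
    feats_b, acc_b = b
    no_worse = feats_a <= feats_b and acc_a >= acc_b
    strictly_better = feats_a < feats_b or acc_a > acc_b
    return no_worse and strictly_better

def non_dominated(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# Average arrhythmia figures from the text, used as stand-in solutions.
bmochoa16 = (3, 0.623)
mopso = (3 + 78, 0.623 + 0.041)  # assumes 4.1% is an absolute difference

front = non_dominated([bmochoa16, mopso, (10, 0.60)])
```

Under these assumptions neither of the two points dominates the other: one uses far fewer features and the other is more accurate, which is exactly the kind of trade-off a Pareto front is meant to expose.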

C. ANALYSIS OF CONVERGENCE CHARACTERISTICS
Two key concerns in evolutionary multi-objective optimization (EMO) are the diversity of the final estimate of the Pareto front (CLF) and its convergence to the true Pareto set of a MOP. In this article, we have used IGD to assess convergence, Spread for distribution analysis, HV for both convergence and distribution analysis, and SCC to quantify the contribution of the four multi-objective methods. After the 50 iterations of all the FS strategies, the performance measures for the Pareto fronts shown in FIGURES 4 and 5 are recorded in TABLE 8. As the IGD values of the Pareto fronts obtained by the proposed approach are lower than the others on all the datasets, these fronts are very close to their respective TPFs. Also, the non-dominated solution sets resulting from BMOChOA16 for all 14 datasets are well scattered, as indicated by their Spread and HV values. The SCC values of the CLFs from the suggested approach are higher than the others for most of the datasets, indicating their larger contribution towards the TPFs. We have also compared the IGD values of the Pareto fronts obtained from 25 separate executions of BMOChOA16, MOGA, MOPSO, and NSGA-II by applying the Wilcoxon signed rank test. The statistical results of this test at the 0.05 significance level are listed in TABLE 9. In TABLE 9, the '++', '--', and '==' entries in any column Approach1-Approach2 denote that Approach1 is statistically superior to, statistically worse than, or statistically equivalent to Approach2, respectively. The entries of TABLE 9 indicate that the proposed approach is significantly better than the others for most datasets, except ILPD and cardiotocography, where it is equivalent. Finally, we can conclude that the presented BMOChOA16 with Tent Chaotic Map outperforms the others in selecting the critical features of healthcare data.
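To make the role of IGD concrete, the following is a minimal sketch of one common form of the metric: the mean, over the points of the TPF, of the Euclidean distance to the nearest point of the CLF. The exact formula used in this article is not restated here, so this form, the function name, and the toy fronts are assumptions. Objectives are also assumed to be scaled to comparable ranges beforehand.

```python
import math

def igd(true_front, obtained_front):
    """Inverted generational distance (mean nearest-neighbour form).

    For every point of the true Pareto front (TPF), take the Euclidean
    distance to its closest point in the obtained front (CLF), then
    average. Lower values mean the CLF lies closer to the TPF.
    """
    if not true_front or not obtained_front:
        raise ValueError("both fronts must be non-empty")
    total = 0.0
    for ref in true_front:
        total += min(math.dist(ref, sol) for sol in obtained_front)
    return total / len(true_front)

# Toy fronts in a normalized (feature ratio, error rate) space.
tpf = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
close_clf = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
far_clf = [(0.0, 1.0)]

igd_close = igd(tpf, close_clf)
igd_far = igd(tpf, far_clf)
```

A front that covers every region of the TPF yields a value of zero, while a front that misses part of the TPF is penalized by the distance from the uncovered reference points, which is why IGD also reacts to poor coverage.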

D. RUNNING TIME COMPARISON
To record the average running time of the four methods discussed above in TABLE 10, each method is executed 25 times separately on each of the 14 datasets. TABLE 10 shows that for most of the datasets, namely BreastCancerW (5.26), ILPD (5.12), PrimaryTumor (3.34), Diabetic (7.17), Cardiotocography (9.45), Cervical Cancer (2.14), Arrhythmia (7.59), Parkinsons (14.23), and Leukemia (17.02), the proposed FS method took less time to complete 50 iterations. Furthermore, for many datasets, MOPSO and NSGA-II took longer to execute than the others, because in MOPSO both position and velocity are updated for each particle, whereas in NSGA-II two populations are combined and different fronts are computed for the next loop at each iteration. In MOChOA, either exploration or exploitation is performed for each chimp according to the value of µ. This may be one of the causes of its lower execution time. In the other three methods, by contrast, both exploration and exploitation are carried out for each solution of the population.

E. MERITS OF THE PROPOSED FS METHOD
A careful review of the experimental outcomes presented in the previous subsections shows that the suggested BMOChOA16 method could be an efficient FS technique for discarding unnecessary features from healthcare samples.
• The offered FS approach, BMOChOA16, is found to be excellent in generating the top Pareto fronts with respect to both objectives as compared to the other three benchmark multi-objective approaches.
• The most distinctive characteristic of the proposed method is that it produces a repository of Pareto solutions that are well separated from one another because of the crowding distance measure.
• For the classification task in FS, we used KNN as the wrapper because it is one of the finest supervised classifiers and has the lowest computational cost.
• The comparative experimental outcomes of all four FS methods show that the proposed BMOChOA16 with Tent chaotic map reaches higher classification accuracy with small feature subsets in less time.
• The rate of convergence of the proposed method is high as compared to the others, as indicated by the lower IGD values of its CLFs on all the datasets. Also, the CLFs of BMOChOA16 are able to dominate a larger area in the objective space, as indicated by the higher spread values.
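The wrapper idea summarized in these points can be sketched as a bi-objective evaluation of one binary feature mask with a KNN classifier. This is a minimal pure-Python illustration, not the implementation used in the experiments: the value of k, the leave-one-out validation, the use of error rate as the second objective, and all names and data are assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training rows closest to the query."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def evaluate_mask(X, y, mask, k=3):
    """Return (n_selected_features, error_rate) for a binary mask.

    The error rate is estimated with leave-one-out KNN on the selected
    columns. An empty mask is given the worst possible error of 1.0.
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0, 1.0
    reduced = [[row[j] for j in selected] for row in X]
    errors = 0
    for i, query in enumerate(reduced):
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, query, k) != y[i]:
            errors += 1
    return len(selected), errors / len(X)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.9], [0.1, 0.1], [0.2, 0.5],
     [0.9, 0.8], [1.0, 0.2], [0.8, 0.4]]
y = [0, 0, 0, 1, 1, 1]

informative = evaluate_mask(X, y, [1, 0])
noisy = evaluate_mask(X, y, [0, 1])
```

Both objectives are minimized here, so a multi-objective optimizer can compare masks directly on the returned pair. In practice the data would be normalized first and a library KNN would normally replace the hand-written one.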

VI. CONCLUSION AND FUTURE WORKS
A binary multi-objective chimp optimization algorithm is introduced in this study for the first time to handle the bi-objective FS task in the healthcare domain. To establish the method as a strong competitor for FS, we have combined six chaotic maps with two variants of BMOChOA, resulting in 12 distinct versions of the suggested approach.
The results of all 12 BMOChOAs are compared and analyzed, and BMOChOA16 with Tent chaotic map is statistically selected as the best one in identifying the non-dominated solution sets closest to their TPFs. Finally, to verify the robustness of the offered FS method, it is compared with three well-known multi-objective FS algorithms, namely MOGA, MOPSO, and NSGA-II. BMOChOA16 proved to be the best method in terms of achieving smaller feature subsets and high classification accuracy when compared with the other three popular methods. The best quality Pareto fronts are obtained from the proposed method in less time for most of the datasets and are verified using several multi-objective performance evaluation measures. The soundness of the proposed BMOChOA16 is further verified statistically by performing a Wilcoxon signed rank test on the IGD values of the CLFs of all four methods.
Here we have considered the FS problem as a bi-objective wrapper-based optimization task. However, scalability and computational complexity could also be taken into consideration. Also, the proposed wrapper method can be extended to a hybrid filter-wrapper optimization task by focusing on the mutual information and correlation among the features and the target attribute. In this article, we have applied the CD measure to choose the best of the best from the primary archive. However, other criteria, such as the knee point, can be used for the same purpose. The efficiency of the proposed BMOChOAs is verified using only healthcare data. However, the method can be applied to several other real-world optimization tasks.

APPENDIX A DYNAMIC COEFFICIENT
See Table 11. VOLUME 10, 2022

APPENDIX B CHAOTIC MAPS
See Table 12.
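As an illustration of the kind of chaotic map listed in this appendix, the following is a minimal sketch of a tent map. It uses one commonly cited skewed form with a breakpoint at 0.7. This form, the clamping to [0, 1], and the function names are assumptions, and the definition in Table 12 takes precedence if it differs.

```python
def tent_map(x):
    """One step of a skewed tent map on [0, 1] (assumed form).

    Rising branch for x < 0.7 and falling branch otherwise. The result
    is clamped to [0, 1] to guard against floating-point overshoot.
    """
    if x < 0.7:
        value = x / 0.7
    else:
        value = (10.0 / 3.0) * (1.0 - x)
    return min(1.0, max(0.0, value))

def tent_sequence(x0, n):
    """Generate n successive tent-map values starting from seed x0."""
    values = []
    x = x0
    for _ in range(n):
        x = tent_map(x)
        values.append(x)
    return values

# A short chaotic sequence from an arbitrary seed in (0, 1).
sequence = tent_sequence(0.3, 100)
```

A sequence of this kind can stand in for uniform random numbers in a position update. The map is piecewise linear with a kink at the breakpoint, which corresponds to the "piecewise non-smooth" description given in the experimental discussion.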

APPENDIX C NAMING OF BMOChOAS
See Table 13.
JAYASHREE PIRI received the B.Tech. degree in information technology from BPUT, India, in 2007, and the M.E. degree in information technology from Jadavpur University, Kolkata, India, in 2012. She is currently pursuing the Ph.D. degree with the Department of Computer Science, IIIT Bhubaneswar, India. She has more than ten years of teaching experience. She has published more than ten research articles in peer-reviewed journals, such as Computers in Biology and Medicine, and presented various papers in international conferences. Her main research interests include multi-objective optimization, evolutionary computing, medical data mining, and machine learning. She is currently an Assistant Professor at the International Institute of Information Technology, Bhubaneswar, and has more than 18 years of teaching experience. She has more than 35 articles in reputed journals, conferences, and book chapters. Her research interests include biomedical data mining, time series data analysis, and different fields like machine learning, deep learning, and evolutionary computing. She is also associated with various educational and research societies, like IEEE, ACM, IAENG, and OITS.
MANAS RANJAN PRADHAN (Member, IEEE) received the Master of Technology degree in computer science from Utkal University, India, and the Ph.D. degree in computer science from the University of Mysore, India. He has vast experience in teaching, research, and academic administration in India and abroad. He is currently working at Skyline University College, Sharjah, United Arab Emirates. As an Academic Leader, he has worked as the Head of Program at UPES, India, and the Dean-Faculty of IT and science at INTI International University, Malaysia. He has been associated with the IT industry for industry-academic collaboration, internships, placements, and workshops. He has executed the IBM Center of Education for Cloud Computing and Business Analytics at INTI International University, under Laureate International Universities, USA. He has presented and published many research papers in various conferences and journals. He has three Indian patents and three Australian patents to his credit. His research interests include business analytics, data mining, data warehouse, retail/ecommerce analytics, artificial intelligence, machine learning, and business process modeling. He received the Mentor Award for the i-talent project contest from the Confederation of Indian Industry (CII). He has played a vital role in organizing three international conferences, namely NGCT-2015 (UPES, India), ICQMOIT-2008 (ICFAI, India), and ICD-2019 (SUC, United Arab Emirates).
BISWARANJAN ACHARYA (Member, IEEE) received the M.C.A. degree from IGNOU, New Delhi, India, in 2009, and the M.Tech. degree in computer science and engineering from the Biju Pattanaik University of Technology (BPUT), Rourkela, Odisha, India, in 2012. He is currently pursuing the Ph.D. degree in computer application with the Veer Surendra Sai University of Technology (VSSUT), Burla, Odisha, India. He is currently an Academic with the Kalinga Institute of Industrial Technology Deemed University. He has a total of ten years of experience in both academia, at reputed universities such as Ravenshaw University, and the software development field. He has published many research articles in international reputed journals and serves as a reviewer for many peer-reviewed journals. He has more than 50 patents to his credit. His research interests include multiprocessor scheduling along with different fields, such as data analytics, computer vision, machine learning, and the IoT. He is also associated with various educational and research societies, like IEEE, IACSIT, CSI, IAENG, and ISC.
TAPAS KUMAR PATRA (Member, IEEE) received the Master of Engineering degree from the National Institute of Technology, Rourkela, and the Ph.D. degree from the Indian Institute of Science, Bangalore. He is currently working as an Associate Professor at the College of Engineering and Technology, Bhubaneswar. He has more than 25 years of teaching, academic administration, and research experience. Furthermore, he has guided many award-winning projects at the national and international levels. He has published many papers in reputed journals and conferences. His research interests include electronic systems, communication, wireless networking, the IoT, artificial intelligence, machine learning, computer vision, VLSI, and embedded systems. He is an Active Member of various societies, such as IEEE, COMSOC, ACM, LMISTE, and LMISOI. He is a Microsoft Certified Professional (MCP) and a Microsoft Certified System Engineer (MCSE). He has received the Best Paper Award in IEEE COMSWARE-2007. In addition, he was awarded the Gold Medal in the Intel India Embedded Challenge, in 2011, as a Finalist.