An Enhanced Binary Multiobjective Hybrid Filter-Wrapper Chimp Optimization Based Feature Selection Method for COVID-19 Patient Health Prediction

This work aims to discover the relevant factors for predicting the health condition of COVID-19 patients by employing a new, enhanced binary multi-objective hybrid filter-wrapper chimp optimization (EBMOChOA-FW) based feature selection (FS) approach. FS is a preprocessing approach that has been highly fruitful in medical applications, as it not only reduces dimensionality but also helps us understand the origins of an illness. Wrappers are computationally expensive but achieve excellent classification performance, whereas filters are fast but less accurate. This study presents an advanced binary multi-objective chimp optimization method that hybridizes filter and wrapper approaches for the FS task using two archives. In some instances, the original ChOA becomes stuck in local optima. Therefore, a novel ChOA variant termed EBMOChOA is developed here by integrating Harris Hawk Optimization (HHO) into the original ChOA to improve the optimizer's search capabilities and broaden its application areas. The position-update step in the ChOA optimizer is separated into three parts: modifying the population using HHO to produce an HHO-based population; creating hybrid individuals from the HHO-based and ChOA-based individuals; and updating the search agents using a greedy technique and ChOA's operators. The effectiveness of EBMOChOA-FW is demonstrated by comparing it with five other well-known algorithms on nine benchmark datasets. It is then applied to three real-world COVID-19 datasets to predict the health condition of COVID-19 patients.

rapid global spread and the involvement of healthcare centres throughout the world, researchers have access to a vast amount of publicly available data. This virus's activity may be studied in new ways because of this "big data". Despite these advantages, the enormous amount of data makes it difficult to analyse on smaller systems. Scalability is a difficulty on the one hand, whereas high dimensionality is a problem on the other. New and more advanced approaches

good balance between exploration and exploitation is crucial to improving the methods' efficacy. Each nature-inspired approach has its own strengths and weaknesses, making it impossible to predict which method is best for a given task. No single optimization algorithm can find the best solution for every type of function [12]. Researchers now face the task of implementing and proposing meta-heuristics with high precision for practical applications [13]. Thus, the hybridization of evolutionary approaches has attracted a large number of researchers working on FS problems. The goal of hybridization is to find compatible components so that the optimization procedure can produce the best possible results. This is done by merging and synchronising the exploration and exploitation stages [14]. Hybridization of evolutionary algorithms is a popular way of addressing such flaws by combining the strengths of independent methods [13].

The chimp optimization algorithm (ChOA) [15] is a recent evolutionary algorithm inspired by chimps' individual intelligence and sexual motivation in group hunting. Among many well-known meta-heuristic methods, ChOA has been recognized for its strong performance. It was developed to overcome two typical flaws in evolutionary techniques: slow convergence and becoming trapped in a local optimum.
ChOA has been shown to be effective when the search space is large and contains many local extrema. The base variant of ChOA solves continuous problems; this work therefore develops a binary representation of ChOA. According to previous studies, this approach has a low feature rating rate, a fast processing rate, and strong global and local search ability [15], [16], [17]. To the best of our knowledge, its potential for the FS task has yet to be studied. Although ChOA is an effective optimization technique, it still has difficulties in terms of exploration capability, convergence speed, and computationally intensive tasks. In most cases, ChOA has a satisfactory convergence rate and a straightforward design. However, in certain computationally intensive situations it may fail to keep the balance between exploration and exploitation and slip into a local optimum. These drawbacks become more apparent when dealing with high-dimensional functions and multimodal problems. The basic ChOA's optimization power is determined by the best solution. In this study, we present two techniques (multi-objective and HHO) for improving the basic ChOA's efficiency. To make this algorithm more valuable, the application fields of ChOA must be expanded. The hybrid design is used in this research to show a novel advancement in ChOA and its use in FS.

HHO [18] is a recently proposed population-based algorithm with excellent capabilities for continuous optimization problems. The cooperative behaviour of Harris' hawks when chasing prey was the main inspiration for HHO. Because the different pursuit strategies depend on the prey's adaptive escape tactics and on natural circumstances, HHO is divided into two stages: exploration and exploitation. In HHO, six steps are followed at random to find the best option. The classical HHO has been shown to outperform numerous popular approaches, such as PSO, GWO, FOA, ALO, ABC, BA, and WOA. The balance between exploration and exploitation, as well as fast convergence, are the key advantages of this optimization method. HHO, in particular, demonstrates excellent exploitative capacity in later phases. Because of these benefits, HHO or its revised variants have been studied as an optimization method in a variety of works [19], [20], [21], [22], [23]. Some customised variants of HHO have also been designed to address specific optimization challenges [24], [25], [26], [27]. Owing to its superior performance, HHO has been combined with other optimizers in the literature, such as HHO-CS [28], HHO-SSA [2], HHO-GWO [29], and HHO-SCA [30], to enhance their performance.

VOLUME 10, 2022

Our approach to the feature selection problem here is a

This study makes the following major contributions:

multi-criteria approaches and determine whether it outperforms conventional procedures for shrinking feature subset size and enhancing accuracy rate.
6) To verify the provided method's reliability using three real-world COVID-19 datasets.

The following is the paper's structure. The background material is discussed in Section II.
The proposed FS approach is presented in Section III, and the experimental setups and findings are presented in Section IV. The application of the proposed approach to real-world COVID-19 data is presented in Section V. Finally, Section VI concludes the paper.

ChOA [15] is a novel meta-heuristic method proposed by Khishe and Mosavi in 2020. ChOA is based on chimpanzees' group hunting behaviour and mating impulses. The search for prey is used in ChOA to find the best solution to an optimization problem. Driving, blocking, chasing, and attacking are the four main phases of the hunting process in this method. At the beginning, the population of chimpanzees is generated randomly. The chimpanzees are then classified into four types: attacker, barrier, chaser, and driver. Continuous ChOA allows chimpanzees to shift their location at any time and in any direction. In discrete (binary) optimization, each variable can take only two values: 0 or 1. This results in a binary form of ChOA (BChOA). The search space of a discrete meta-heuristic technique can be viewed as a hypercube: its operators can only move from one corner of the hypercube to another by flipping bits between 0 and 1. Since BChOA's design relies heavily on the position-update cycle, a number of fundamental principles have to be changed.

In BChOA, the primary position-update formula is based on equation 1 [31].
Here, r, rn1, and rn2 are random numbers between 0 and 1.

HHO is a swarm-based optimization technique inspired by how Harris hawks trap prey that is trying to evade capture [32]. In HHO, the search agents are referred to as "Harris hawks", and the targeted rabbit is the best or near-optimal solution found so far. During the exploration phase, the Harris hawks initially perch at random locations from which they can detect rabbits. This procedure is divided into two stages based on the hawks' roosting locations. During the exploitation phase, when the hawks close in on the targeted rabbit, they employ a variety of hunting methods to deal with the rabbit's numerous evasion behaviours. Thus, there are four stages in this operation, each depending on a different pursuit strategy. Harris hawks search a variety of regions in pursuit of prey during exploration. During each phase, one of two protocols is executed with equal probability, as given in equation (19).

where P is the position of a hawk; P_rn is the position of a randomly chosen hawk; P_best is the position of the prey; Max and Min are the upper and lower limits of the search space; rn1, rn2, rn3, rn4, and a are random values in [0, 1]; and P_mean is the average position of the hawks in the current population.

HHO switches between the two operations depending on the prey's energy. The escaping energy (EE) is modelled as a time-varying stochastic variable because the prey loses energy while fleeing.

In exploitation, the search agents take advantage of candidates that are close to the best solution found so far. HHO models the exploitation step using four processes based on the varying pursuit techniques of the hawks and the evasive behaviour of the rabbit. Whether or not the prey flees successfully, the hawks choose the method of besieging the target according to the prey's remaining energy. Here, EE governs the switch between soft and hard besiege.

As previously stated, the two methods apply whenever the prey attempts but fails to flee (rn >= 0.5).

[19]. Then, the older location is modified using equation 28.
We treat the FS task as a binary optimization problem. The S-shaped transfer function (TF) is applied as follows to squash the continuous value in each dimension:
where p_j is the continuous location of the hawk for the j-th variable.
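
To illustrate the binarization step described above, the following is a minimal Python sketch. It assumes the standard sigmoid S(p_j) = 1 / (1 + exp(-p_j)) as the S-shaped TF and the common stochastic thresholding rule (bit j is set to 1 when a uniform random number is below S(p_j)); the exact form used in the paper's equation is not reproduced here, and the names `sigmoid` and `binarize` are illustrative only.

```python
import math
import random


def sigmoid(x: float) -> float:
    """S-shaped transfer function mapping a real value into [0, 1]."""
    # Numerically stable form: never exponentiates a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng=random):
    """Turn a continuous position vector into a 0/1 feature mask.

    Assumed rule (not taken from the paper): feature j is selected when
    a uniform random draw in [0, 1) falls below sigmoid(p_j).
    """
    return [1 if rng.random() < sigmoid(p) else 0 for p in position]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    print(binarize([-3.0, -0.5, 0.0, 0.5, 3.0], rng))
```

Strongly negative coordinates are almost never selected and strongly positive ones almost always are, which is what lets a continuous optimizer drive a binary feature mask.
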
The main purpose of single-objective optimization (SOP) is to identify the "optimal" solution, which refers to the minimum or maximum value of a single objective function that combines all of the objectives into one. This type of optimization is useful as a tool for providing planners with information about the problem at hand, but it rarely provides a set of potential solutions that trade off distinct objectives.

In a multi-objective optimization problem (MOP) with competing objectives, on the other hand, there is no single best solution. The interplay of several objectives results in a collection of compromise solutions, which are referred to as trade-off, non-dominated, non-inferior, or Pareto-optimal solutions. In an SOP, a fitness comparison is used to establish a candidate's superiority over other alternatives. In an MOP, by contrast, the idea of dominance is used to assess the merit of a potential solution. An alternative A1 in the feasible region of a C-objective problem dominates an alternative A2 if the following two requirements are true.

The authors in [33] developed a wrapper technique based on genetic algorithms (GAs) that uses NSGAII and the K-Nearest Neighbours (KNN) method to reduce both misclassification and feature counts. In [34], two NSGAII-based filter techniques, NSGAIIMI and NSGAIIE, were developed using MI and entropy as the assessment criteria, respectively. Recently, a text feature selection technique based on a filter-based multi-objective algorithm was proposed by Labani et al. [35]. A text feature's significance is determined using the Relative Discriminative Criterion (RDC), whereas redundancy is determined using the correlation measure. Cervante et al. [36] employed rough set theory and MOBPSO to perform filter-based FS. Two multi-objective filter FS methods were proposed by Xue et al.

Harmony Search. The suggested MA-HS technique was tested on 18 UCI datasets as well as 3 high-dimensional microarray datasets and compared with 12 other cutting-edge meta-heuristic FS approaches. The test findings show that, in contrast to the others, MA-HS is capable of reaching the required high classification efficiency with a lower number of attributes.
In [50], a hybrid PSO-GE is suggested to improve performance, minimize query processing time, and reduce the processing load of PSO. The test findings demonstrate that the hybrid PSO-GE approach is more efficient than current approaches. The authors of [51] presented a new form of the Salp Swarm Algorithm (SSA), known as ISSAFD, for FS. Using sinusoidal mathematical functions inspired by the Sine Cosine optimizer, ISSAFD adjusts the follower (F) locations in SSA. Al-Tashi et al. [52] proposed a hybrid of PSO and grey wolf optimization (GWO), known as BGWOPSO, to find the best attribute subset and solve FS problems. The research findings showed that the BGWOPSO approach is more effective in terms of computational time, accuracy, and selection of the best features. The authors of [53] provided 3 hybrid architectures for the FS task based on the thermal exchange optimizer (TEO) and seagull optimisation (SOA). The experimental findings showed that the proposed hybrid algorithm improves classification performance, ensures the ability to choose hybrid SOA-algorithms, reduces CPU time, and picks the informative variables. In [9], an alternative swarm approach known as OMFODE is presented, in which the OBL technique is combined with DE and MFO in order to improve the exploitation and exploration capability of the MFO algorithm and to provide another way of creating an optimum feature vector that represents the complete set of characteristics. For solving FS tasks, [54] suggests a novel hybrid approach called the Hybrid Binary Bat Enhanced Particle Swarm Optimization Algorithm (HBBEPSO). The findings of that study show the potential of the suggested HBBEPSO method for finding the optimum combination of variables.
The authors of [55] aimed to address the shortcomings of the Sine Cosine Algorithm (SCA) by using differential evolution operators as local search methods. The test findings showed that, in terms of performance metrics and predictive analysis, the suggested approach provides better performance than the other approaches. The work of Houssein et al. [28] merged the Harris Hawk optimizer (HHO) with Cuckoo Search (CS) and chaotic maps to improve the efficiency of the original HHO, yielding a hybrid evolutionary approach called CHHO-CS. In addition, the suggested approach was paired with an SVM classifier for the selection of chemical descriptors and chemical compounds. A hybrid optimization approach that incorporates SCA into HHO is suggested in [2]. The SCA integration aims to address inefficient exploration in HHO and also improves exploitation through the dynamic adjustment of candidate solutions to prevent solution stagnation in HHO. A discrete hybrid GWO and HHO approach called HBGWOHHO is offered in [29]. The sigmoid function translates the continuous search space into a discrete one to satisfy the requirements of FS.

Although many researchers describe their approaches as multi-objective, which implies that the objectives are optimized simultaneously, the FS in those works is still limited to a single-objective task, because the objective functions are optimized sequentially during the filter and wrapper stages, respectively.

In order to select features, one must look for a group of features that collectively have the greatest relevance to the target class and the least amount of redundancy among them. Therefore, maximizing the correlation between the feature subset and the class attribute and minimizing the dependency between the features within a subset are normally emphasised for FS purposes. MI and the Pearson correlation coefficient (PCC) are typical measures of relevance or interdependency. This motivated us to formulate the second objective function using equation 36 [44].
where f_i denotes a discrete feature present in the feature subset and class denotes the class attribute.

Then, rank the solutions according to their domination count: rank 1 (the best) is assigned to the option with the lowest domination count.
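
The ranking step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: all objectives are assumed to be minimized (for example, classification error and number of selected features), the domination count of a solution is taken to be the number of other solutions that dominate it, and ties receive the same rank. The function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized).

    a dominates b when a is no worse than b in every objective and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def rank_by_domination_count(objectives):
    """Return 1-based ranks; the fewest dominators gets rank 1.

    Solutions with equal domination counts share a rank (an assumption
    made for this sketch).
    """
    counts = [
        sum(dominates(other, own) for other in objectives)
        for own in objectives
    ]
    levels = sorted(set(counts))
    return [levels.index(c) + 1 for c in counts]


if __name__ == "__main__":
    # Hypothetical (error rate, number of features) pairs.
    population = [(0.10, 5), (0.20, 3), (0.20, 6), (0.30, 7)]
    print(rank_by_domination_count(population))
```

In the hypothetical population, the first two solutions are mutually non-dominated and therefore both receive rank 1, which is why an archive of several rank-1 solutions is needed.
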

3) Update the Archive1 and Archive2: Because EBMOChOA-FW produces a set of Pareto-optimal options after every generation, Archive1 should be updated to reflect the succeeding evaluations of each population member. Depending on the cases described in FIGURE 3, a new non-dominated option NP_new of the present population may be added to Archive1. We assume an option to be best after an iteration if its rank is 1. The proposed EBMOChOA therefore updates Archive1 at the end of each loop so that, after completing all the iterations, it can output the overall Pareto solutions of the FS task. Simulating the behaviour of chimpanzees requires four optimal answers (P_Attacker: P_A, P_Barrier: P_B, P_Chaser: P_C, P_Driver: P_D) in each cycle. However, it is possible that during a cycle there are not four optimal alternatives. As a result, the chimps are sorted by rank in each iteration, and the four leading options are chosen as P_A, P_B, P_C, and P_D. Thus, we might acquire a new set of P_A, P_B, P_C, and P_D that is superior to the older one after every cycle, so we must remember their values. A supplementary archive (Archive2) is needed here to save each cycle's P_A, P_B, P_C, and P_D. By applying the dominance approach, the newer P_A, P_B, P_C, and P_D group is compared with the prior group, and if it is better, it replaces the existing group. Finally, the chimps' positions are updated with the most recent P_A, P_B, P_C, and P_D.

the best solutions stored in Archive1 are used in the location update process (as P_best) using the crowding distance (CD) measure. In most cases, Archive1 contains more than one optimum option, as we are using multiple objectives to evaluate the individuals.
Therefore, here, the CD value is used to pick one best option (having the highest CD value) from the set of best

Create hybrid solutions based on P_j and HP_j using equation 37.
14: Obtain the fitness of P_hybrid, P_j, and HP_j.
15: Pick the best one as current member for further processing.
Treat the current search members as chimp population
18: Compute the fitness of chimp j
19: end for
20: Rank the chimps
21: Update the Archive 1 and Archive 2
22: Obtain recent P_A, P_B, P_C, and P_D from Archive 2
23: for j ← 1 to PS do
24: Update the chimp j using equation 1
25: end for
26: end for
27: Return Archive1 and the best of it using CD measure.

Similarly, BMOChOA is also a multi-objective wrapper approach that attempts to find the major aspects causing a particular disease.

It uses the "tent" chaotic map and the CD measure in its architecture.
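
As an illustration of the CD measure mentioned above, the following Python sketch computes the crowding distance as it is commonly defined in NSGA-II and then picks the archive member with the highest value. It is a sketch under assumptions: boundary solutions receive an infinite distance, per-objective gaps are normalized by the objective's range, and ties are resolved by taking the first maximum. How the paper resolves ties among boundary solutions is not stated, and the function names are hypothetical.

```python
def crowding_distance(front):
    """Crowding distance of each objective vector in a non-dominated front."""
    n = len(front)
    if n == 0:
        return []
    distance = [0.0] * n
    for m in range(len(front[0])):
        # Indices of the solutions sorted by the m-th objective.
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary solutions get infinite distance so they are preserved.
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal: this objective adds nothing
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distance[order[k]] += gap / (hi - lo)
    return distance


def pick_leader(front):
    """Index of the solution with the highest crowding distance.

    Ties (e.g. several boundary solutions) go to the first one found;
    this tie-breaking rule is an assumption of the sketch.
    """
    cd = crowding_distance(front)
    return max(range(len(front)), key=lambda i: cd[i])


if __name__ == "__main__":
    # Hypothetical archive of (error rate, feature ratio) pairs.
    archive = [(0.0, 1.0), (0.25, 0.5), (1.0, 0.0)]
    print(crowding_distance(archive), pick_leader(archive))
```

A larger CD means the solution lies in a sparsely populated part of the front, so preferring it steers the search toward less explored trade-offs.
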

In the second phase of our experiment, three multi-

respectively. TABLE 3 shows that, when compared to BMOChOA, the IGD values of the calculated Pareto fronts (CPFs) produced by EBMOChOA-FW are lower or equivalent for all 9 datasets. A smaller IGD value signifies that the CPF is closer to the true Pareto front (TPF), which indicates better convergence. This shows that EBMOChOA-FW improves the exploitation and exploration potential for dimension reduction and addresses the disadvantages of the classic ChOA, which is attributable to the inclusion of the HHO-based strategy. For every chimp's positional adjustment, EBMOChOA-FW employs a tent map. This is a piece-wise non-smooth map. It spans the complete phase space, and the space is chaotic. The capacity of this map to visit all states inside a given range without repetition may explain why EBMOChOA-FW produces superior non-dominated solutions. This enables EBMOChOA-FW to escape local optima and reach the global optimum more quickly. Furthermore, the Wilcoxon signed rank test was employed to see whether EBMOChOA-FW is significantly better than the other methods. The outputs of the Wilcoxon signed rank test are given in TABLE 4. The symbols (++, ==, −−) indicate that EBMOChOA-FW is significantly better than, equivalent to, or worse than the above two methods. After inspecting TABLE 4, we observed that EBMOChOA-FW is significantly superior to the above two approaches.

Overall, the results reveal that our approach outperforms the comparative techniques in the vast majority of situations. The tests showed that the proposed method is capable of addressing two of the most difficult issues associated with FS: the curse of dimensionality and improving classification efficiency.
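
For readers unfamiliar with the IGD indicator used in the comparison above, the following Python sketch shows one common definition: the average Euclidean distance from each point of the true Pareto front to its nearest point on the computed front. The paper's exact variant (for example, any normalization of the objectives) is not stated in this section, so this is an assumption, and the function name `igd` is illustrative.

```python
import math


def igd(true_front, computed_front):
    """Inverted generational distance between two fronts.

    For every point on the true Pareto front, find the nearest point on
    the computed front, then average those distances. Lower is better;
    0 means every true-front point is matched exactly.
    """
    if not true_front or not computed_front:
        raise ValueError("both fronts must be non-empty")
    total = sum(
        min(math.dist(t, c) for c in computed_front)
        for t in true_front
    )
    return total / len(true_front)


if __name__ == "__main__":
    # Hypothetical fronts in a two-objective space.
    tpf = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    cpf = [(0.1, 1.0), (0.5, 0.6), (1.0, 0.1)]
    print(igd(tpf, cpf))
```

Because the average runs over the true front, a computed front that covers only part of the true front is penalized, so IGD reflects both closeness and spread.
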
(SARS-CoV-2) had begun to target China and had spread fast around the globe. As of August 2020, COVID-19, the disease caused by SARS-CoV-2, had killed more than 600,000 people all over the world [62]. Machine learning and deep learning

input. Because of this, even scalability becomes a problem when the datasets are so enormous [65]. Genomic data from COVID-19 patients has been extensively studied [66], [67]. An important issue in this scenario is converting genomic sequences into a fixed-length feature space so that they can be used as inputs for ML classifiers when making predictions. Recently, Ali et al. [68] designed a COVID-19 virus prediction model using two very popular FS techniques:

of COVID-19 severity. However, the application of evolutionary and, more specifically, multi-objective evolutionary FS techniques for predicting COVID-19 patient health is still an unexplored domain of research. Therefore, we provide a method for accurately predicting patient death based on a wide range of variables. Doctors can use these predictions to prescribe drugs and devise strategies in advance that will help save the most lives. The suggested MOBChOA-FW is used

In order to classify COVID-19 patients, we have developed an effective model employing an optimised FS approach and ML methods. Simple classification algorithms such as random forest and KNN were able to accurately forecast the health of COVID-19 patients with our FS approach. A new, enhanced hybrid multi-objective optimizer for tackling FS problems is introduced in this article. The suggested technique builds on the latest ChOA approach integrated with BHHO by combining a filter and a wrapper model into a single system in the hope of maximising the advantages of each type. During the training phase, a combination of MI and PCC and the performance of the KNN model are employed as the filter and wrapper assessment criteria, respectively. Furthermore, the sigmoid transfer function is used to enable EBMOChOA-FW to handle binary problems. A comparative analysis with five well-known algorithms was conducted on nine benchmark datasets and three real-world COVID-19 datasets. According to the results, the suggested algorithm outperforms the selected alternatives in terms of both the number of features and classification performance.

Furthermore, we have noticed that EBMOChOA-FW takes longer to execute in some circumstances due to the presence and management of two archives as well as the filter function employed. As a result, we intend to examine additional fitness functions in the future in order to maintain high performance without increasing the running time. Here, only the mutual information and correlation between features and the class attribute are taken into account; the mutual information and correlation among the features themselves must also be considered. This study uses the crowding distance measure to choose the best alternative in the Pareto front that gives

Gujarat, India. He has a total of ten years of experience in both academia, at reputed universities such as Ravenshaw University, and the software development field. He has published many research articles in reputed international journals and serves as a reviewer for many peer-reviewed journals. He has more than 50 patents to his credit. His research interests include multiprocessor scheduling along with different fields, such as data analytics, computer vision, machine learning, and the IoT. He is also associated with various educational and research societies, such as IACSIT, CSI, IAENG, and ISC.