Overview on Binary Optimization Using Swarm-Inspired Algorithms

Swarm Intelligence is applied to optimisation problems due to its robustness, scalability, generality, and flexibility. Based on simple rules, the simple reactive agents of a swarm (e.g. fish, birds, and ants) directly or indirectly exchange information to find an optimal solution. Among multiple nature inspirations and versions, the dilemma of choosing a proper swarm-based algorithm for each type of problem hinders their recurrent application. This scenario gets even more challenging when considering binary optimisation because of the absence of overview papers that assemble the trends, benefits and limitations of swarm-based techniques. To tackle this research gap, and based on 403 scientific papers, we describe the basis of the leading binary swarm-based algorithms, presenting their rationales, equations, pseudocodes, and descriptions of their applications. We also define a new classification based on the mechanism used to update the solutions and the displacements, indicating that the Binary-Binary approach (binary decision variables and binary search space) is more efficient for binary optimisation in accuracy and computational cost.


I. INTRODUCTION
Optimisation is an essential task for several fields such as Computer Science, Economy, Engineering, Bioinformatics and Operational Research [1]-[4]. Classical or brute force approaches, represented by linear or non-linear programming, suffer from the "curse of dimensionality" and require extensive knowledge of the problem (e.g. objective function, parameters, preconditions) [5]-[7]. Computational Intelligence (CI) arises as an alternative to solve optimisation problems, as it decreases the computational cost considerably and does not require a complete understanding of the problem.
The development of the Genetic Algorithm (GA) in the 1970s can be defined as the beginning of optimisation using metaheuristics. The methods which use the evolution paradigm, such as the GA or the Differential Evolution (DE), are classified as Evolutionary Computation algorithms [8], [9], and they can be mapped to binary or continuous problems [9]. Only in recent years has binary optimisation, in which the variables are binary (as in "on/off" or "selected/not selected" problems), gained considerable attention in the literature. The increase in the complexity of current binary/discrete tasks leads to the necessity of developing strategies to deal with more complex optimisation problems [10]-[12].
The associate editor coordinating the review of this manuscript and approving it for publication was Jamshid Aghaei.
In the early 1990s, another area of CI became recognised: Swarm Intelligence (SI). Similarly to Evolutionary Algorithms, SI is inspired by mechanisms from nature. The term Swarm Intelligence represents the "coordinated" behaviour that emerges from the individual and collective movements of animals such as birds (Particle Swarm Optimization, PSO [13], [14]), bees (Artificial Bee Colony, ABC [15]), fish (Fish School Search, FSS [16]) or ants (Ant Colony Optimization, ACO [17]). These groups of animals, or simple reactive agents, behave mainly without any supervision, and their actions have a stochastic component according to the perception of their neighbourhood [6], [18]. The intelligence of the swarm lies in the agent-agent and agent-environment interactions, using collective characteristics (e.g. decentralisation and self-organisation) [2], [19], [20]. Evolutionary computation is mainly based on competition and natural selection, while swarm intelligence relies mainly on the cooperation of the agents.
Most of the swarm-based algorithms were developed for continuous optimisation, but the demand for better algorithms for binary optimisation inspired researchers to adapt continuous algorithms to binary ones. The first version of the binary PSO (BPSO) that we located in the literature [10] uses the same pseudocode as its continuous version, with some adjustments to deal with binary data. The first version located in the literature for the bees, Binary Artificial Bee Colony (BABC) [21], was inspired not only by the continuous version of ABC but also by this first BPSO algorithm. Until now, many binary algorithms still use mapping functions to transform continuous information into binary solutions during the iterations. A few papers use binary strings either in the solutions or in their displacement. However, binary optimisation differs from continuous optimisation in several aspects, and the application of continuous methods and mechanisms to binary problems may not be the most efficient approach.
In order to better understand the swarm-based algorithms for binary optimisation, we have analysed a large number of papers proposed in the field. Our work points out the characteristics of the binary optimisation problem for swarm intelligence techniques, which are rarely discussed in the literature. Also, we discuss the advantages and disadvantages of considering particular algorithms, operators, and functions. We identified four main categories of methodologies applied to binary optimisation: (a) the use of genetic operators, (b) the use of mapping functions (or transfer functions), (c) the use of logic gates and (d) the use of similarity or information theory metrics. Then, we grouped the algorithms into three classes: Continuous-Continuous, Binary-Continuous and Binary-Binary, which represent the type of decision variables and the search space. Each class shares several methods and operators regardless of the inspiration of the algorithms.
Finally, the paper is organised as follows: Section II details the process of selecting and analysing the literature; Section III presents the general notation used in this investigation and the most used transfer functions to transform a continuous vector into a binary string; Section IV discusses a new classification proposal to group the binary swarm-based algorithms and presents the nine most prominent binary swarm algorithms; Section V describes the most relevant applications found in the literature regarding each algorithm; Section VI covers the other nature-inspired algorithms found in the search, while Sections VII and VIII discuss some hybrid proposals and multiobjective approaches; in Section IX we perform a discussion and point out some future directions, and Section X shows the main conclusions.

II. SYSTEMATIC METHODOLOGY
We selected the most relevant articles regarding swarm-based algorithms for binary optimisation from five of the most important scientific platforms: IEEE Xplore [22], ACM [23], Springer [24], Science Direct [25] and, as a cross-check, Google Scholar [26]. The search criteria were the keywords "binary" and "swarm"; the papers are from 2006 until the first half of 2020, and their relevance was ranked using the search engine of each electronic library. We carried out an experimental analysis of the impact of using different search strings, and we noticed that including more keywords resulted in a very specialised set of papers. Therefore, as our goal was to understand the generality of the area and not specific patterns of a subtopic, we did not select other keywords.
It is important to remark that we also double-checked whether some of the most important papers also appeared on Google Scholar [26], which is well recognised for its ranking. We examined the 400 most relevant papers on Google Scholar [26] and added 39 extra papers to our list because of their relevance to our study. Most of the results presented by Google Scholar had also been selected from the other four databases, confirming that we could indeed identify some of the most relevant papers in the literature. We also point out that, using the five platforms together, we were able to cover a large number of papers from the field.
In Table 1, we present the total number of analysed papers using the keywords binary and swarm. Notice that all platforms returned a large number of articles, so we considered only the first 1,200 papers in each case. The exception was the IEEE Xplore library, for which we analysed the first 500 occurrences (as the following articles were too specific to a particular topic). The number of most relevant articles was not selected a priori; in fact, each paper was equally likely to be selected in our investigation. We analysed the papers based on five iterations of reading: (i) title, (ii) abstract, (iii) conclusions, (iv) methods and results and (v) entire paper. Finally, we selected the papers that we considered important to the literature based on historical (first proposals), generality (proposal not dependent on a specific problem) and novelty (outstanding performance and recent proposal) perspectives.
The number of selected works in our investigation has been expressive since 2006. More than half of the papers selected (233, or 58%) are from journals, while the others are from conferences (170). We identified 43 different swarm-based proposals, nine of which are well-known algorithms in Swarm Intelligence: Artificial Bee Colony (ABC), Ant Colony Optimisation (ACO), Bat Algorithm (BA), Cat Swarm Optimisation (CSO), Firefly Algorithm (FA), Flower Pollination Algorithm (FPA), Grey Wolf Optimiser (GWO), Gravitational Search Algorithm (GSA) and Particle Swarm Optimisation (PSO). A large number of papers address continuous benchmark functions as a case study. Most papers convert the continuous values into binary ones by applying transfer functions (e.g. sigmoid and tangent), or by mapping continuous values into bits [27], [28]. New algorithms are proposed in 237 papers, in which the authors usually present some modification or improvement of previously existing proposals. Additionally, 30 proposals were found for multiobjective approaches.

III. DEFINITIONS AND NOTATIONS
In this section, we present concepts, definitions, terminologies, and notations used throughout this paper. Table 2 summarises the most common notations. An agent (i.e. particle, bat, cat, etc.) has a vector with $D$ elements, each element corresponding to a variable in the problem domain. To exemplify this idea, let us consider the agent $i$, $x_i = \{0, 1, 0, 0, 0, 1, 0, 1, 1, 1\}$ with $D = 10$. Therefore, the dimensions are $x_{i,1} = 0$, $x_{i,2} = 1$, $x_{i,3} = 0$, ..., $x_{i,10} = 1$. This example is graphically presented in Figure 2.
It is very usual to apply continuous mathematical operations using the bits as real numbers. For example, consider Equation (1) [12], [29]:
$$z_{ij} = x_i - x_j \tag{1}$$
where $x_i = \{1, 1, 0, 0\}$ and $x_j = \{0, 1, 0, 1\}$. The resulting array is the subtraction of the values in each corresponding dimension, generating $z_{ij} = \{1, 0, 0, -1\}$; the elements of $z$ are assumed to be real numbers. Most of the binary swarm-based algorithms (but not all of them) use, during the intermediate steps, continuous information to update the displacement of an agent or its next position. Sometimes, the solution itself is also calculated as a continuous vector in the literature. In both cases, it is necessary to transform a continuous vector into a binary string before the evaluation of the fitness [30], [31]. There are some possibilities to transform (or map) a continuous array with $D$ dimensions into a binary string of the same size. Often, the mapping process is performed in two steps. The first one is the application of a nonlinear function to the value of each dimension $d$. Then, the number obtained is used as a probability to determine whether the respective binary dimension $d$ is "0" or "1"; the conversion is done by some conditional criterion [10], [32]. This first proposal is the most common in the literature.
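As a quick check of the example above, the following Python sketch computes the element-wise difference of two bit vectors treated as real numbers. The function name `elementwise_difference` is ours and is introduced only for illustration.

```python
def elementwise_difference(x_i, x_j):
    """Subtract two equal-length bit vectors dimension by dimension,
    treating the bits as real numbers (the result may contain -1)."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same number of dimensions")
    return [a - b for a, b in zip(x_i, x_j)]


x_i = [1, 1, 0, 0]
x_j = [0, 1, 0, 1]
z_ij = elementwise_difference(x_i, x_j)
print(z_ij)  # [1, 0, 0, -1]
```

Note that the result is no longer a binary string, which is why a mapping step is needed before the fitness can be evaluated.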
Considering a general continuous (real) vector $z^{cont}_{i}$ [33], [34], for each dimension $z^{cont}_{i,d}$, it is possible to apply a sigmoid function $S$, according to (2):
$$S(z^{cont}_{i,d}) = \frac{1}{1 + e^{-z^{cont}_{i,d}}} \tag{2}$$
Then, the value found, $S(z^{cont}_{i,d})$, is used as a probability to define the bit in the respective dimension $d$, creating a binary element $x^{bin}_{i,d}$ of a vector as described in (3) [10], [35]:
$$x^{bin}_{i,d} = \begin{cases} 1 & \text{if } rand(0, 1) < S(z^{cont}_{i,d}) \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
We can modify (2) by adding a constant $b$, generating $S(b\,z^{cont}_{i,d})$ [7]. If $b = 1$, the equation stands as presented before; otherwise, $b$ changes the shape of the function. Figure 3A shows the graphic behaviour of a sigmoid function. Because of its shape, it is usually called S-shaped [30].
A different approach is to use "V-shaped" functions [30]. The first proposal is the utilisation of the modulus of the hyperbolic tangent in the same way as in the previous case, generating $|\tanh(z^{cont}_{i,d})|$. Equation 4 describes the mathematical formulation of this proposal [36], [37]:
$$|\tanh(z^{cont}_{i,d})| = \left| \frac{e^{z^{cont}_{i,d}} - e^{-z^{cont}_{i,d}}}{e^{z^{cont}_{i,d}} + e^{-z^{cont}_{i,d}}} \right| \tag{4}$$
Alternatively, another transfer function, $V$, is addressed in (5) [38]-[40]. We can resort to (5) to perform the mapping of the continuous vector $z^{cont,t}_{i,d}$ in each dimension to generate $x^{bin}_{i,d}$ as in (6) [30], [38]:
$$x^{bin,t+1}_{i,d} = \begin{cases} \bar{x}^{bin,t}_{i,d} & \text{if } rand(0, 1) < |\tanh(z^{cont,t}_{i,d})| \\ x^{bin,t}_{i,d} & \text{otherwise} \end{cases} \tag{6}$$
where $\bar{x}^{bin}_{i,d}$ means that the current state of the variable $d$ must be flipped and $x^{bin}_{i,d}$ means that the current bit has to be maintained in the next iteration. Note that in this case $|\tanh(z^{cont}_{i,d})|$ can be replaced by $|V(z^{cont}_{i,d})|$. Figure 3B shows the graphs of the most used V-shaped functions. Observe the difference between the two proposals: the S-shaped rule sets the bit directly, whereas the V-shaped rule decides whether the current bit is flipped.
We highlight that there are other mapping methods. Dahi et al. [41] describe mapping methods using nearest-integer, normalisation, angle modulation and search process. Mirjalili and Lewis [30] discuss the impact of S-shaped and V-shaped functions in the BPSO. In the following sections, we describe other proposed mapping methods which are used a few times in specific swarm-based methods.
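The two mapping rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation taken from any of the surveyed papers: the function names are ours, the S-shaped rule follows the description of (2) and (3), and the V-shaped rule follows the description of (6) using the modulus of the hyperbolic tangent.

```python
import math
import random


def s_shaped(z):
    """Sigmoid transfer function, as described for (2)."""
    return 1.0 / (1.0 + math.exp(-z))


def v_shaped(z):
    """Modulus of the hyperbolic tangent, as described for (4)."""
    return abs(math.tanh(z))


def binarise_s(z_cont, rng):
    """S-shaped rule: each bit is 1 with probability S(z), regardless of its old value."""
    return [1 if rng.random() < s_shaped(z) else 0 for z in z_cont]


def binarise_v(z_cont, x_bin, rng):
    """V-shaped rule: flip the current bit with probability |tanh(z)|, else keep it."""
    return [1 - x if rng.random() < v_shaped(z) else x
            for z, x in zip(z_cont, x_bin)]


rng = random.Random(0)  # fixed seed only to make the run repeatable
z = [-4.0, -0.5, 0.0, 0.5, 4.0]
x = [0, 1, 0, 1, 1]
print(binarise_s(z, rng))
print(binarise_v(z, x, rng))
```

With the S-shaped rule, a value of zero gives a 50% chance of a 1; with the V-shaped rule, a value of zero always keeps the current bit, and only large magnitudes (of either sign) make a flip likely.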
We highlight that one transfer function is not necessarily better for all types of swarm-based algorithms. Figure 4 depicts the algorithms most frequently mentioned in the selected papers and, as we can see, the most prominent names (calculated according to the number of papers that mentioned a particular algorithm) are mostly related to the PSO and some of its variations. Besides the PSO-based algorithms, ABC, ACO, GSA and BA are other prominent algorithms in Figure 4.

IV. ALGORITHMS
All the versions of the swarm-based algorithms in this section can be divided into four main categories based on their similar methodologies and operators:
a) Use of genetic operators: natural algorithms for optimisation started with the evolutionary approaches, mainly the Genetic Algorithm, which uses genetic operators (such as crossover and mutation) [3], [9]. The use of genetic operators can be found in the algorithms as a kind of hybridisation;
b) Conversion of decision variables from continuous to binary values: the majority of binary swarm-based algorithms were adapted from continuous versions. The researchers then created adaptations to deal directly with binary decision variables, usually using some mapping function such as the S-shaped function described in Section III [1];
c) Use of logic gates: all the variables and agents are in the binary domain and the operations consider logic gates such as AND, OR, XOR and NOR [2];
d) Use of similarity or information theory metrics: recently, a new trend appeared in the literature, the use of measurements of binary structure similarities, like Jaccard's coefficient [42]. Other metrics from information theory, such as entropy, can also be used [43].
The 403 analysed papers showed that almost all papers fall under items b) and c). Looking at the similar patterns of the most popular methodologies and operators in swarm-based algorithms, we introduce a new proposal to classify binary swarm-based algorithms using the representation of the decision variables and the search space. We observe three groups that summarise the typical patterns of swarm-based techniques for binary optimisation:
FIGURE 4. Wordcloud of the swarm-based algorithms mentioned in the selected papers. Note that the size of the word is proportional to the frequency with which it appears in the papers (i.e. the number of papers that mention the algorithm). Therefore, PSO and BPSO appear more frequently in the majority of the papers.

1) Binary-Continuous
The candidate solutions in this Binary-Continuous class are binary vectors from the start until the end of the iterations. However, the displacement (e.g. the velocity in the PSO) is calculated and updated as a continuous array before it acts on the agent. The information of the displacement vector is converted into binary numbers, commonly using a transfer function like those described in Section III.
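A minimal Python sketch of one Binary-Continuous update for a single agent is given below. It assumes a PSO-style velocity rule and a sigmoid transfer function; the function name and the coefficients `w`, `c1` and `c2` are illustrative assumptions and are not values prescribed by the surveyed papers.

```python
import math
import random


def binary_continuous_step(x, v, p_best, g_best, rng, w=0.7, c1=1.5, c2=1.5):
    """One update of a single agent: continuous velocity, binary position."""
    new_x, new_v = [], []
    for d in range(len(x)):
        # Continuous displacement (PSO-style rule; coefficients are assumptions).
        vd = (w * v[d]
              + c1 * rng.random() * (p_best[d] - x[d])
              + c2 * rng.random() * (g_best[d] - x[d]))
        # The transfer function turns the displacement into a probability.
        prob_one = 1.0 / (1.0 + math.exp(-vd))
        new_v.append(vd)
        new_x.append(1 if rng.random() < prob_one else 0)
    return new_x, new_v


rng = random.Random(1)  # fixed seed only to make the run repeatable
x = [0, 1, 0, 1]            # binary position
v = [0.0, 0.0, 0.0, 0.0]    # continuous displacement
p_best = [1, 1, 0, 0]       # best position found by this agent
g_best = [1, 0, 0, 1]       # best position found by the swarm
new_x, new_v = binary_continuous_step(x, v, p_best, g_best, rng)
print(new_x, new_v)
```

The position stays a bit string throughout, while the velocity remains a real-valued vector that is carried over to the next iteration.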

2) Continuous-Continuous
The Continuous-Continuous class is the closest to the original versions of the algorithms, which were initially developed to solve continuous problems. In this case, all the steps are performed in the continuous (real) space. Therefore, the candidate solutions, as well as the displacement process, are continuous vectors. The conversion to binary occurs by applying some mapping process to the agents to evaluate their fitness. Interestingly, this proposal is rarely found.
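The following Python sketch illustrates the evaluation step of a Continuous-Continuous algorithm: the agent remains a real-valued vector and only a temporary binary copy is passed to the fitness function. The sigmoid mapping and the toy `one_max` fitness (number of ones) are assumptions chosen for illustration.

```python
import math
import random


def to_binary(z_cont, rng):
    """Map a continuous agent to bits with a sigmoid transfer function."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0 for z in z_cont]


def one_max(bits):
    """Toy fitness used only for illustration: number of ones."""
    return sum(bits)


def evaluate(z_cont, fitness, rng):
    """Evaluate a continuous agent without overwriting it:
    only a binary copy is passed to the fitness function."""
    bits = to_binary(z_cont, rng)
    return fitness(bits), bits


rng = random.Random(42)  # fixed seed only to make the run repeatable
agent = [2.5, -3.0, 0.1, 4.0]   # continuous position, kept as-is
score, bits = evaluate(agent, one_max, rng)
print(agent, bits, score)
```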

3) Binary-Binary
In the Binary-Binary approach, the algorithms operate in the binary space. The candidate solutions and the displacement are binary vectors, and the interactions within the swarm occur using binary methods such as logic gates. Consequently, there is no need to use transfer functions. Therefore, the algorithms in this class are usually more efficient and less computationally costly than the others because they treat the problem in the proper binary search space using binary decision variables.
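To make the idea concrete, the sketch below moves an agent towards a reference solution using only bitwise operations. It is a generic illustration of a logic-gate update and is not the operator of any specific algorithm surveyed here; the function name and the random-mask strategy are assumptions.

```python
import random


def binary_binary_step(x, best, n_bits, rng):
    """Move agent x towards `best` using only bitwise operations.

    x and best are Python ints holding n_bits-bit strings.
    """
    diff = x ^ best                  # XOR: dimensions where x differs from best
    mask = rng.getrandbits(n_bits)   # random binary string, one bit per dimension
    displacement = diff & mask       # AND: keep a random subset of the differences
    return x ^ displacement          # XOR: flip exactly the selected bits


rng = random.Random(7)  # fixed seed only to make the run repeatable
x, best = 0b1100, 0b0101
new_x = binary_binary_step(x, best, 4, rng)
print(format(new_x, "04b"))
```

Both the position and the displacement are bit strings, so no transfer function is involved, and the Hamming distance to `best` can never increase in this particular update.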
Algorithms inside each group also share similar patterns concerning computational complexity and cost. This is the case because their mechanisms usually have similar levels of memory consumption and a similar number of methods and functions. In Swarm Intelligence, we argue that a fair comparison across algorithms from different inspirations must be based on the number of fitness evaluations, the number of agents and the complexity of any called function and method.
For binary optimisation, there is more development in Swarm Intelligence on the Binary-Continuous approach. In the following sections, according to our classification rule, we describe the core insights of the nine most popular swarm-based algorithms applied to binary problems: Artificial Bee Colony (BABC), Ant Colony Optimisation (BACO), Bat Algorithm (BBA), Cat Swarm Optimisation (BCSO), Firefly Algorithm (BFA), Flower Pollination Algorithm (BFPA), Gravitational Search Algorithm (BGSA), Grey Wolf Optimiser (GWO), and Particle Swarm Optimisation (BPSO). The binary algorithms inspired by the behaviour of birds (PSO) and bees (ABC) have versions in all three categories. The binary algorithms inspired by the behaviour of cats (CSO) have versions in the Binary-Continuous and Binary-Binary categories. The binary algorithms inspired by the movement of ants (ACO), bats (BA), wolves (GWO) and planets (GSA) have versions only in the Binary-Continuous category. Firefly (FA) and Flower Pollination (BFPA) are adapted only to the Continuous-Continuous category.
Rashedi et al. [51] and Mirjalili et al. [52] argue that some aspects should be considered when applying a transfer function to swarm-based techniques:
• The values mapped by the transfer function should range between 0 and 1.
• Small values (closer to 0) from the transfer function represent small probabilities of changing the position, or few changes in the position.
• High values (closer to 1) from the transfer function represent high probabilities of changing the position, or several changes in the position.
• The algorithm should have a mechanism to control how the transfer function influences the swarm based on the aforementioned points.
Even though the transfer function is a straightforward map between the two search spaces, it does not account for all the characteristics of binary optimization problems, such as the emergence of agents (e.g. fish, bee, particle) with similar positions, which rarely happens in continuous optimization. In this way, the transfer function can indeed contribute to the emergence of several agents with the same position and to premature convergence.
It seems that for some binary problems, such as Feature Selection, in which the goal is to minimise the number of selected features, the transfer function can be biased to select more 0s than 1s [53], so that the swarm-based algorithm quickly chooses the best feature vector with a small number of 1s.
As it is common in this class to use methods similar to those of the continuous versions, the core strategies of the algorithms are not changed. Therefore, the new versions here focus on finding the best step/way to map the continuous solution to a binary one. Sigmoid functions are widely applied in any swarm-based algorithm, but the V-shaped function seems to give better results for the binary version of the Flower Pollination Algorithm, for example.

B. BINARY-CONTINUOUS ALGORITHMS
For the Binary-Continuous approach, the swarm-based algorithms continue to apply a transfer function to map continuous vectors to binary vectors, but the agents are represented by binary vectors (position). The majority of the mechanisms are also maintained from the original versions, but the operations may change depending on the version.
In this category, most swarm-based algorithms have a binary version, such as the following:
1) Binary Artificial Bee Colony (BABC) [36], [54] (explained in Appendix B-A);
2) Binary Ant Colony Optimization (BACO) [55]-[61] (explained in Section B-A1);
3) Binary Bat algorithm (BBA) [30], [39], [62]-[65] (explained in Section B-A2);
4) Binary Cat Swarm Optimization (BCSO) [28], [66]-[69] (explained in Section B-A3);
5) Binary Gravitational search algorithm (BGSA) [37], [51], [70]-[73] (explained in Section B-A4);
6) Grey Wolf Optimizer (GWO) [7], [74]-[79] (explained in Section B-A5);
7) Binary Particle Swarm Optimization (BPSO) [31], [38], [40], [74], [80]-[96] (explained in Section A-D).
As the key idea in swarm intelligence is to update the position of an agent considering a previous position, a collective position or a neighbour position, the algorithms provide adaptations of the original equations to incorporate binary solutions. One commonly used adaptation is a transfer function (e.g. S-shaped and V-shaped) that maps a value of a continuous vector (such as the velocity vector in PSO) directly to a binary solution. Other methods use mathematical operations, such as the frequency, average and probability of values, to create a binary solution. For example, an agent will flip a dimension to be similar to the majority of the swarm because more than half of the swarm has that dimension set to a particular value (0 or 1). Some algorithms also map each dimension, 0 or 1, to a continuous value to apply a continuous mechanism and then return to a binary solution after the mechanism is performed. We see that even though these versions find optimal solutions for a set of binary problems, the excessive mapping of values from continuous to binary and from binary to continuous might not be the most efficient alternative.
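The majority-based adaptation mentioned above can be sketched in Python as follows. This is a simplified, deterministic illustration; the function name and the tie-breaking rule (an agent keeps its own bit when the swarm is split evenly) are assumptions.

```python
def follow_majority(agent, swarm):
    """Set each dimension of `agent` to the value held by more than half of the swarm."""
    n = len(swarm)
    updated = []
    for d, bit in enumerate(agent):
        ones = sum(other[d] for other in swarm)
        if 2 * ones > n:        # more than half of the swarm holds 1
            updated.append(1)
        elif 2 * ones < n:      # more than half of the swarm holds 0
            updated.append(0)
        else:                   # exact tie: keep the agent's own bit
            updated.append(bit)
    return updated


swarm = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
print(follow_majority([0, 1, 1, 1], swarm))  # [1, 0, 1, 1]
```

In the example, the first two dimensions are flipped to match the majority, while the last two are tied in the swarm and therefore left unchanged.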

C. BINARY-BINARY ALGORITHMS
Binary-Binary algorithms form the only class in which the proposals are designed specifically for binary optimization: their mechanisms and operators, their decision variables and their search space are all binary. In this category, the proposals usually use logic gates, genetic operators (e.g. crossover and mutation) and binary stochastic processes. We found three kinds of swarm-based algorithms in this category:
1) Binary Artificial Bee Colony (BABC) [54], [97]-[100] (explained in Appendix C-A);
2) Binary Cat Swarm Optimization (BCSO) [101], [102] (explained in Appendix C-A1);
3) Binary Particle Swarm Optimization (BPSO) [103]-[108] (explained in Appendix C-A2).
Taking into account the computational cost, Binary-Binary algorithms seem to be much faster than the other categories because they consider a much smaller search space. Small changes in the continuous search space can represent no change in a projected binary search space, causing unnecessary search. Moreover, binary methods such as Boolean operators and crossover are usually less computationally costly than methods that account for continuous search. Thus, binary methods, search space and storage usually require less memory and fewer operations.
In this class, we also see that the operators follow different operations from the original ones, but the inspiration and goal of these strategies remain the same. For example, the update of velocities and displacements will continue to provide convergence depending on the collective information at the moment. Moreover, mechanisms of local search will be adapted to wiser random changes depending on the individual or social information.
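The remark that a small move in the continuous space may leave the projected binary solution unchanged can be illustrated with a short Python sketch. The deterministic threshold used here (bit = 1 when the value is positive) is an assumption made only for this example.

```python
def project(z_cont):
    """Deterministic projection used only for this illustration: bit = 1 when z > 0."""
    return [1 if z > 0 else 0 for z in z_cont]


z_before = [0.80, -1.20, 2.10, -0.30]
z_after = [0.75, -1.10, 2.30, -0.35]   # a small continuous move

print(project(z_before))                      # [1, 0, 1, 0]
print(project(z_after))                       # [1, 0, 1, 0]
print(project(z_before) == project(z_after))  # True
```

The agent has moved in the continuous space, yet the candidate solution passed to the fitness function is identical, so the corresponding evaluation brings no new information.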

V. MAIN APPLICATIONS
In this section, we describe the main applications found in the literature. The results presented in Figure 5 show the diversity of the problems used to assess and validate the performance of the algorithms. It is worth mentioning that the benchmark functions and the feature selection problem were the most common applications. PSO displays the most extensive set of applications, which covers most of the problems present in the literature.

A. APPLICATIONS OF BABC
• Wang et al. [109] propose a hybrid approach using BABC and SVM to improve Intrusion Detection Systems (IDSs). The results showed that the proposal was able to outperform BPSO and GA. Similarly, Wei et al. [54] showed that BABC could outperform the BPSO and BGA when optimising benchmark functions.
• A Novel BABC algorithm (disABC) was developed by Kashan et al., in which the authors evoke the concept of dissimilarity to generate new solutions [12]. They applied the algorithm in the uncapacitated facility location problem (in 15 benchmark instances) and showed that the disABC could be better than BinDE and PSO.
• Jia et al. introduced the BABC using Bitwise Operation (BitABC) [98]. In this work, because of the binary nature of the variables, they suggest the use of binary operations such as those performed by logic gates. The proposal was the best in comparison with DisABC, normABC and BinABC when optimising benchmark functions.
• Ozturk et al. proposed the use of evolutionary mechanisms, like the crossover, to improve the BABC, creating the BABC based on Genetic Operators (GB-ABC) [110]. The proposal overcame GA, PSO based dynamic image clustering (DCPSO) and a binary ABC model (DisABC) to solve dynamic clustering problems. The same authors optimised three problems: dynamic image clustering, 0/1 knapsack problems and benchmark functions from CEC2005 using the Improved DisABC (IDisABC), which was proposed in this study [3]. The new approach overcame the GB-ABC, DisABC, Quantum Inspired BPSO (QBPSO), BPSO and GA.
• In the paper from Hancer et al. [5] [44]. The proposal was the best in comparison with the ABCbin, DEbin, GA and A-SUKP algorithms, and it was the only paper presenting real encoding for all the operations during the search process.
• Zhang and Zhang used the BABC to construct spanning trees in vehicular ad hoc networks [112]. Despite using more computational time, BABC found results similar to those of Kruskal's algorithm, a classical algorithm. Moreover, BABC could also produce candidate suboptimal spanning trees that could be useful when some nodes become unavailable.
• The work from Shunmugapriya [18] addresses feature selection, using benchmark functions as a case study to evaluate the hybrid BABC and ACO algorithm (AC-ABC). The new proposal outperformed the ACO, ABC, ABC-DE, PSO and CatFish Binary PSO.
• In the article from Xu et al., the authors solve a two-level distribution optimisation problem using a multiobjective approach named BEES-Binary Bees Algorithm (BBA), which adopts the concept of Pareto dominance to guide the search [113].

B. APPLICATIONS OF BACO
• The paper from Fernandes et al. [114], published in 2007, was the oldest we found about BACO. The basis of the algorithm is present in the paper, even though they called the proposal Binary Ant Algorithm (BAA). The update in the pheromone is a little different from those described in Section B-A1. They compared the algorithm with the GA using benchmark functions.
FIGURE 5. Network representation of the swarm-based algorithms applied to different problems. In this network, the nodes represent problems or algorithms, and the links indicate that an algorithm was assessed in a given problem. The node size is proportional to the number of connections it has, and the colours are a result of a clustering process. Note that the small red nodes correspond to a variety of problems to which only the PSO was applied. Moreover, we can see that continuous benchmark functions and feature selection are the most used problems.
• Zhao and Yan [115] introduce the Bottleneck Assigned Binary Ant System (baBAS), based on the traffic organisation phenomenon in ant swarms under highly crowded conditions. The proposal achieved better performance than Binary Ant System (BAS), Binary Ant System with Elitist Strategy (BASe) and Niche GA in benchmark problems.
• Kuo [61] uses the BACO with a hypercube framework in pixelated source optimization for improving lithographic resolution. The numerical simulation showed that BACO effectively searches for the optimal source shape in the problem. [57]. They applied the algorithm in some benchmark functions and compared the achieved solutions with the NSGA-II and Ant Colony Algorithm for solving multiobjective problems.
• Zangari et al. proposed the Improved Decomposition-based Multi-objective BACO (MOEA/D-ACO) [121] to solve the multi-objective unconstrained binary quadratic programming (mUBQP). The results show that the proposal can be better than the traditional multi-objective evolutionary algorithm based on decomposition (MOEA/D).

C. APPLICATIONS OF BBA
• Kaur et al. [122] use the BBA together with the DWT-SVD model to enhance the quality of underwater images. The simulation showed that the hybrid approach using BBA outperformed the existing technique when compared using various parameters, such as Bit Error Rate, Entropy, and Normalized Cross Correlation.
• Gupta et al. apply the algorithm to perform feature selection to classify white blood cells using the KNN [123]. The results showed that the proposed Optimized Binary Bat Algorithm outperformed the Optimized Cuttlefish Algorithm (OCFA) and the Optimized Crow Search Algorithm (OCSA) in efficiency and accuracy.
• Mokhov et al. [124] created a mathematical formulation of the problem called "on space flights" from the film "Planet Ant: Life Inside the Colony (2012)." In summary, it is a discrete optimisation task that consists of finding the best spacecraft route between two planets. The results showed that BBA presented better results than the BPSO.
• Basetti et al. [63] suggest the application of the Taguchi method to generate the initial population of the bats. They give the acronym TBBA to the hybrid proposal. The application was the optimal PMU placement for the power system.
• Amine et al. [39] proposed a multi-objective bat algorithm (MBBA) using the concept of Pareto Dominance. They applied the algorithm to optimise benchmark functions, and the results overcame those achieved by NSGA-II.

D. APPLICATIONS OF THE BCSO
• Sharafi et al. [28] were among the first to propose a binary version of the CSO algorithm, which they named Discrete Binary CSO. They apply the BCSO to solve some instances of the 0/1 knapsack problem and benchmark functions. The results were compared to the GA and two different versions of the BPSO.
• In the work of Mohamadeen et al. [66], the authors utilise the BCSO to define the parameters of an SVM. The goal was to select the tests that can be employed to classify transformer health indexes. The results were successfully compared to those achieved by the BPSO.
• In the same way, Srivastava and Maheswarapu applied the BCSO to solve the optimal PMU placement problem [69]. The algorithm presented better results than BPSO, Generalized Integer Linear Programming and Effective Data Structure Based Strategies.
• The paper developed by Li et al. [68] applies the BCSO to the antenna selection problem. The authors analysed the algorithm by changing the number of iterations and agents (cats) in the swarm. Simulations showed that, for large-scale MIMO systems, the BCSO has an advantage in the antenna selection problem.
• Kumar et al. [67] solved scheduling workflow applications in cloud systems using the application of a Discrete Binary Cat Swarm Optimization (DBCSO) that they introduced. Their proposal achieved good results in comparison to BPSO.
• Pappula and Ghosh utilised the BCSO for planar thinned antenna array synthesis. In this case, they present a multi-objective version (MOBCSO), using the Pareto dominance criteria. The results were better than those obtained with the multi-objective BPSO (MBPSO) [125].
• Siqueira et al. [101] proposed a version of a Binary-Binary BCSO, the Boolean BCSO. In this algorithm, the authors compared the performance of the method with the BPSO, GA and the BCSO in the 0/1 knapsack problem. In 2020, Siqueira et al. [102] proposed the Simplified version of BCSO that overcame other swarm-based algorithms in the One max, Subset sum, 0/1 Knapsack, Multiple Knapsack and Feature Selection problems.
• A BCSO algorithm was also applied for manufacturing cell design problem [126] that optimises the transportation of manufactured parts between cells. Soto et al. include the Autonomous Search algorithm into the BCSO, improving the fitness on the middle-to-late stages of the simulation.
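Several of the works above evaluate the BCSO on the OneMax and 0/1 knapsack problems. As a minimal sketch of how such binary benchmarks can be scored, the functions below take a candidate solution as a list of bits. The function names and the rule of assigning zero fitness to overweight solutions are illustrative assumptions, not the formulation used in any of the cited papers.

```python
from typing import Sequence


def one_max(bits: Sequence[int]) -> int:
    """OneMax fitness: the number of bits set to 1 (to be maximised)."""
    return sum(bits)


def knapsack_fitness(
    bits: Sequence[int],
    values: Sequence[float],
    weights: Sequence[float],
    capacity: float,
) -> float:
    """0/1 knapsack fitness: total value of the selected items.

    Assumption for illustration: a solution that exceeds the capacity
    receives fitness 0.0; other penalty or repair schemes are possible.
    """
    total_weight = sum(w for b, w in zip(bits, weights) if b)
    if total_weight > capacity:
        return 0.0
    return float(sum(v for b, v in zip(bits, values) if b))


if __name__ == "__main__":
    values = [10, 20, 30]
    weights = [1, 2, 3]
    print(one_max([1, 0, 1, 1]))
    print(knapsack_fitness([1, 0, 1], values, weights, capacity=4))
    print(knapsack_fitness([1, 1, 1], values, weights, capacity=4))
```

Any binary swarm algorithm can then treat these functions as a black box that maps a bit string to a score.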

E. APPLICATIONS OF BFA
• Chandrasekaran et al. [6] address a unit commitment (UC) problem, achieving better results than PSO and GA, among others. Note that in this paper, the calculation of the parameter vector r ij is specific to solve this problem.
• Liu et al. [45] solve a spectrum allocation optimisation for cognitive radio networks (CRN). The computational results showed that, using the network reward or the average user's fairness as fitness, the proposal performed better than BPSO and GA in the spectrum allocation problem.
• Shilaja and Ravi optimised the emission/economic dispatch in solar photovoltaic generation utilising the new Euclidean Affine Flower Pollination Algorithm (eFPA) and BFPA [129]. The final results were favourable to this proposal in comparison to PSO and FPA.

G. APPLICATIONS OF BGSA
• The work from Ji et al. [98] proposes an improvement in the algorithm using a quantum-inspired computing concept. This numerical computational method addresses the principle of quantum mechanics, creating the Quantum-inspired BGSA (QIBGSA). They solve an instance of the thermal unit commitment with wind power integration problem. The authors show that the QIBGSA can overcome the BPSO and BGA.
• The paper developed by Nezamabadi-Pour presented another proposal of Quantum-inspired BGSA. He addressed three different applications: 0/1 knapsack problem, Max-ones problem and the optimisation of Royal-Road functions. The novel proposal was superior to standard BGSA, GA and three versions of Quantum-Inspired Evolutionary Algorithms (QIEAs) [37].
• Barani et al. [71] introduced the Improved Binary Quantum-Inspired Gravitational Search Algorithm (QGSA-UC) to solve a unit commitment problem. The results obtained showed that the method could be better than GA, Evolution Programming (EP), Differential Evolution (DE), Simulated Annealing (SA) and some versions of BPSO.
• Chakraborti and Chatterjee [72] introduced a Binary Adaptive Weight GSA (BAW-GSA). The application addressed was a feature selection for face recognition. The BAW-GSA was a better optimiser than BPSO and BGA in this case.
• Rouhi and Nezamabadi-pour investigated the use of an improved version of the BGSA to perform feature selection in 5 high-dimensional microarray databases. The classifier addressed is the KNN [70].
• The work from Khanesar and Branson III [130] proposed the XOR BGSA to improve the capability of the method to solve instances of the knapsack problem and to optimise benchmark functions. It was shown that the proposal results overcome BGSA, NBPSO, IBPSO, and BPSO.

H. APPLICATIONS OF BGWO
• Emary et al. [7] and Hu et al. [76] solved feature selection using UCI datasets. Chantar et al. [131] employed BGWO to enhance a wrapper-based feature selection technique for Arabic text classification. The results showed that the GWO-based process using the elite-based crossover approach (55) achieved superior performance in the problem. Lastly, a modified binary grey wolf optimizer (MBGWO) was proposed by Alzubi et al. [77] to choose relevant features for the intrusion detection system (IDS) problem. The results showed that the proposal significantly enhanced the performance of the IDS.
• Luo et al. [78] introduced a new BGWO proposal to solve the multidimensional knapsack problem. Compared with BFOA, Hybrid Harmony Search (HHS), Quantum PSO (QPSO) and bGWO-o, the proposal obtained the best performance on the two benchmark problems. Also, the results showed that the V-shaped function could achieve better results in the problem.
• A quantum-inspired binary grey wolf optimizer (QI-BGWO) was proposed by Srikanth et al. [79] to solve unit commitment problem. Statistical tests are performed to show the superior performance of the proposal.
• Jiang et al. [75] proposed the Improved Binary GWO to solve the dependent task scheduling problem in edge computing. The proposal was faster than BGWO and performed better than BBA and BPSO.
• Panwar et al. [132] showed that BGWO has superior results in solving the unit commitment problem when compared to classical and heuristic approaches. The algorithm has demonstrated better performance in small, medium and large instances. Reddy et al. [133] used BGWO to solve profit unit commitment. The results showed that one of the BGWO configurations reached the highest profit in all the cases.

I. APPLICATIONS OF BPSO
In our search, we found 263 papers addressing BPSO, about 65% of the total amount. Hence, unlike the other algorithms discussed in this work, we chose to present the applications by dividing the papers into main areas. We list below the number of works categorised in each case and the main subjects:
a) Benchmark (54 papers) - feature selection in repositories, optimisation of mathematical functions, OneMax problem and 0/1 knapsack problem;
b) Biology (34 papers) - drug design, DNA design, detection of diseases, cell layout, feature selection, classification in diseases dataset, among others;
c) Computer Sciences (36 papers) - face recognition, image processing, software tests, computing security, coalitional games, neural networks design, cryptanalysis and software reliability;
d) Engineering (21 papers) - allocation problems, scheduling, electrohysterogram signal, design resonators and structural topology;
e) Electrical Engineering (69 papers) - unit commitment problems (UCP), phasor measurement unit (PMU) placement, control, power system reliability, distribution feeder scheduling, distributed generation interconnection, placement of generators, fault location, hydrothermal generation scheduling, microgrid and voltage regulation;
f) Telecommunications (27 papers) - antenna problems, wireless sensor network, wireless local area network (WLAN) and cognitive radios;
g) Others (22 papers) - NARX model structure, classification, time series, text mining, steganography, routing problem in VLSI circuits, among others.
Most of the papers used the BPSO as a preliminary step before applying another technique to solve the problem. In this sense, the algorithm is utilised especially in feature selection. In Appendix B (Section D), we present all the acronyms of the versions of the PSO, which is the algorithm most widely present in our study. We also note that the paper from Jordehi et al. [1] provides a thorough review of PSO for discrete optimisation problems.

VI. OTHER ALGORITHMS
We present in this section a summary of 20 other swarm-inspired techniques that are not as popular as the ones previously discussed in Section IV, but that can present better results depending on the problem.

Algae: The Binary Artificial Algae Algorithm (BAAA) mimics the way algae search for food [134], [135]. The BAAA [134] was created in 2016 using three mechanisms: elite local search, a transfer function and a repair operator. The repair operator, which is unique to the BAAA, narrows the search by removing infeasible solutions. Applied to 94 benchmark problems, this version of the BAAA is efficient when compared to MBPSO, BPSOTVAC, CBPSOTVAC, GADS, bAFSA, and IbAFSA. Then, in 2018, Korkmaz and Kiran [135] proposed a new version of the BAAA using stigmergic behaviour and the XOR logic operator to solve uncapacitated facility location and benchmark problems. The stigmergic operator is a controlled mutation that allows the swarm to find better solutions.
[...] is a mono-objective version that adapts the fitness function as a regression to solve a multi-objective problem [138]. The BOA shows a high exploitation and convergence rate based on the use of a random walk and elitism. The method is inspired by the food foraging behaviour of butterflies, which is based on sensing the fragrance of flowers. In the investigation provided by Arora and Anand, the authors use 21 datasets from the UCI repository to demonstrate the efficiency of the method [138]. They also use S- and V-shaped transfer functions.
Coyote: The Binary Coyote Optimization Algorithm (BCOA) was inspired by the intelligent social organisation of a group of coyotes. A binary approach, which uses the hyperbolic tangent as a transfer function to deal with binary data, was introduced by Souza et al. [139]. In this work, the authors investigated the effectiveness of the method using a Naive Bayes classifier and benchmark functions.
Crow: We identified two works that addressed the Binary Crow Search Algorithm (BCSA). The natural inspiration comes from flocks of crows that fly over a surface. These animals can hide food, and memorise and protect their caches. Also, they follow each other to find a better food source. The work from Souza et al. suggests a binary version of the continuous CSA that uses the V-Shaped transfer function [140]. The case study involves feature selection using benchmark problems. A second work proposes to solve the two-dimensional packing problem with Fixed Orientation using a BCSA with an S-Shaped transfer function [141]. The computational results showed the superiority of the method in comparison to the BPSO.
Cuckoo: The Binary Cuckoo Optimization Algorithm is inspired by the brood parasitism of the cuckoo bird, which lays its eggs in the nests of other species. Dalili and Karegar applied a modified version of the algorithm (the MBCOA) to some PMU placement problems [142]. Some instances of the Unit Commitment Problem are solved using the Improved Binary Cuckoo Search Algorithm (IBCSA) by Zhao et al. [143]. Garcia et al. [...] [146].
Fish: The Breeding Artificial Fish Swarm Algorithm (BAFSA) is used by Sengottuvelan and Prasath to solve an optimal cluster head selection problem in Wireless Sensor Networks [147]. The method was inspired by the preying behaviour of a school of fish. The Binary Fish School Search Algorithm (BFSS) is another fish-inspired algorithm, introduced by Sargo et al. [148]. In this investigation, the method was applied to feature selection in benchmark problems and to the intensive care unit readmission problem. The algorithm mimics the collective behaviour of fish schools, regarding their mechanisms of feeding and coordinated movement. Carneiro et al. [53] improved the BFSS for Feature Selection by adding a random flipping mechanism of variables and biasing the swarm to initialise closer to 0. Then, Santana et al. introduced the Simplified Binary Fish School Search (SBFSS), which presents several modifications in comparison to the BFSS, such as a reduction in the number of free parameters [149]. Considering the KNN as a classifier, the SBFSS overcame other methods in feature selection, like ABC, GA, and PSO [149].

Fruitfly: The Binary Fruitfly Optimization Algorithm (BFOA) was inspired by the foraging behaviour of fruit flies, highlighting their sensitive vision and sense of smell for food [150]. In the paper from Wang et al., the authors present some modifications to solve some instances of the multidimensional knapsack problem (MKP).
Grasshopper: The Grasshopper Optimization Algorithm (GOA) was inspired by the behaviour of grasshopper swarms in nature, considering food search and social interactions [151]. In larval stages, the agents of a swarm perform slow movements through small steps. On the other hand, in adulthood, they perform long-range and abrupt movements. These combinations are used as the basis for the GOA [152]. The work from Hichem et al. [151] introduces a binary version of the algorithm to deal with feature selection in benchmark functions. The work from Mafarja et al. proposes the application of transfer functions to solve similar tasks. Pinto et al. use these premises to deal with some instances of the binary knapsack problem [153].
Glowworm: The ability of glowworms to change the intensity of the luciferin emission was the main inspiration to create the Binary Glowworm Swarm Optimization (BGSO). Mingwei et al. [154] addressed this idea to solve instances of the unit commitment problem. Xia et al. introduce a forecasting method based on improved binary glowworm swarm optimisation and multi-fractal dimension (IBGSOMFD) for feature selection. In this work, the proposal combined with an SVM overcame other methods, considering benchmark databases [155]. Also, the method is used for risk prediction of P2P lending investment.
Moth: A discounted knapsack problem is solved by using a binary version of the Moth Search Algorithm [156]. The method was inspired by the Lévy flights and straight flights of moths. The authors proposed an investigation considering, for the first time, nine mutation procedures.
Owl: Once again, feature selection is the application addressed to test a swarm-based method. In the work from Mandal et al., the Binary Owl Search Algorithm (BOSA) is introduced considering six variants of transfer functions [157]. The method was able to improve the classification accuracy of an SVM, overcoming traditional approaches such as BPSO, BGA and BHS.
Pigeon: Rojas-Galeano proposed a modified version of the Urban Pigeon-Inspired Swarm Algorithm to deal with binary problems [158]. Urban pigeons present feeding habits that can be classified into two modes, flock feeding and solitary feeding (similar to the Cat Swarm Optimization). The solitary mode is used to explore the surface to avoid premature convergence. In the flock mode, the agents follow a pigeon that has found a food source. The authors used benchmark functions and real-world problems to evaluate the search capability of the method.
Salp: Salps are marine animals with cylindrical gelatinous bodies. They move by pumping water longitudinally through their bodies. Simultaneously, they filter such water through a set of internal structures to retain plankton. A group of these animals is named a salp chain. Based on these premises, the Salp Swarm Algorithm (SSA) was developed, using their food search as inspiration. The work from Ahmed et al. proposed a binary version of the SSA with chaotic maps to solve feature selection, using benchmark functions [159]. Antenna array synthesis problems are solved by Mondal and Saxena using the same algorithm and transfer functions to transform the output response into binary strings [160]. The work from Rizk-Allah et al. proposed a new binary version of the algorithm, using modified transfer functions [161]. Other works have addressed the BSSA, as in [160], [162]- [164], mainly in feature selection tasks.
Spider: Two spider-inspired proposals serve as a basis here. The one from Cuevas et al. was inspired by groups of spiders that interact using as rules the biological laws of the cooperative colony [165]. The one from Yu and Li is based on the foraging strategy of social spiders, using the vibrations on the spider web to determine the positions of prey [166]. Shukla and Nanda used the Binary Social Spider Optimization (BSSO) algorithm [167], which was based on the proposal from Cuevas et al. [165]. Such a method was addressed for unsupervised band selection in compressed hyperspectral images. In 2020, the Binary Social Spider algorithm (BinSSA) [168] was introduced, showing the effect of choosing different transfer functions and of using or not using a crossover operator. The authors stated that the usage of crossover is effective because it balanced the exploitation and exploration mechanisms. This algorithm follows Yu and Li's approach. Similar propositions were presented in [169] and [168].
Symbiotic Organism: The Binary Symbiotic Organism Search (BSOS) was proposed in the work from Han et al. [170]. They used the continuous version of the algorithm as the base, addressing S-Shaped functions to transform the agents into binary strings. The algorithm was inspired by the symbiotic relationship between two individuals from different populations in an ecosystem. The authors classify these relationships into mutualism, commensalism, and parasitism. Once again, the feature selection problem is addressed considering benchmark classification problems using the KNN as the classifier.
Vulture: Almonacid et al. solve the manufacturing cell design problem using the Egyptian Vulture Optimization Algorithm (EVOA) [127]. The inspiration of the method arises from the abilities of the Egyptian vulture to break eggs using pebbles and to rotate objects using twigs.
Whale: The Whale Optimization Algorithm (WOA) was inspired by the feeding behaviour of whales in the oceans. These marine mammals swim towards prey in a unique spiral. This process was modelled considering three aspects: the scope of hunting, the spiral trajectory, and random search. Some works have addressed binary versions of the WOA. The investigation conducted by Xu et al. proposed the use of an improved BWOA for Feature Selection of Network Intrusion Detection [171]. The research group from Hussien solved feature selection and discrete optimisation problems using this algorithm [172], [173]. Reddy K. et al. and Kumar and Kumar approached unit commitment problems [174], [175]. In all cases, they showed that the algorithm could overcome many other metaheuristics, mainly versions of the PSO.
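Many of the binary versions summarised in this section rely on S-shaped or V-shaped transfer functions to turn a continuous quantity into a bit. The sketch below shows one common way of doing this, assuming the sigmoid as the S-shaped function and |tanh| as the V-shaped function; the function names and the update rules (set the bit for S-shaped, flip the bit for V-shaped) are illustrative assumptions and individual papers may use other variants.

```python
import math
import random


def s_shaped(v: float) -> float:
    """Sigmoid transfer function: probability that the bit becomes 1."""
    # Numerically stable form that avoids overflow for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def v_shaped(v: float) -> float:
    """|tanh(v)| transfer function: probability that the bit is flipped."""
    return abs(math.tanh(v))


def update_bit_s(v: float, rng: random.Random) -> int:
    """S-shaped rule (assumed): the new bit is 1 with probability s_shaped(v)."""
    return 1 if rng.random() < s_shaped(v) else 0


def update_bit_v(bit: int, v: float, rng: random.Random) -> int:
    """V-shaped rule (assumed): flip the current bit with probability v_shaped(v)."""
    return 1 - bit if rng.random() < v_shaped(v) else bit


if __name__ == "__main__":
    rng = random.Random(0)
    velocities = [-2.0, -0.5, 0.0, 0.5, 2.0]
    position = [0, 1, 0, 1, 0]
    print([update_bit_s(v, rng) for v in velocities])
    print([update_bit_v(b, v, rng) for b, v in zip(position, velocities)])
```

Under these assumptions, the S-shaped rule ignores the current bit, whereas the V-shaped rule keeps the current bit when the continuous value is close to zero and only changes it when the magnitude is large.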

A. COOPERATIVE ALGORITHMS
In this section, we present other algorithms that are not directly inspired by groups of animals' behaviour. However, their operators and forms of information spread between the agents are cooperative instead of evolutionary (generational). In this sense, the agents do not ''die'' along with the iterations, but change their position in the search space considering the collective information.
To exemplify the idea, nature-inspired algorithms, such as the Black Hole Algorithm or Tree-Seed Algorithm, mimic other kinds of natural phenomena. We can find a similar idea in the BGSA algorithm of Section B-A4. In contrast, the Fireworks Algorithm or the Open Source Development Model comes from artificial phenomena inspirations.
The Forest Optimization Algorithm (FOA) comes from the process of seeding of trees. Ghaemi et al. introduced this method to solve feature selection tasks using benchmark functions proposing a binary version, the FSFOA [176].
The binary version of the Tree-Seed Algorithm was proposed by Cinar and Kiran [42]. They introduced three ideas to implement the binary proposal: using logic gates (LogicTSA), similarity measurement techniques (SimTSA), and a hybrid variant (SimLogicTSA). As the name indicates, it also mimics natural tree-seed behaviour. The last approach was very competitive in comparison to binary versions of the ABC, PSO and DE.
The Binary Brain Storm Optimization (BBSO) was applied to medical data classification by Ogwo et al. [177]. The human brainstorming process inspired the BBSO. The authors used a wrapper methodology and several classification methods. Similarly, the Imperialist Competitive Algorithm (ICA) emulates human social evolution. The work from Mirhosseini and Nezamabadi-pour addresses a binary version of this algorithm based on transfer functions to solve knapsack problems, feature selection problems, and Content-Based Image Retrieval [178].
The display of fireworks upon explosion inspired the Fireworks Algorithm (FWA). A binary version (BFWA) is introduced by Reddy et al. [179] to solve a profit based unit commitment (PBUC) problem. They execute an extensive investigation on the performance of the proposal, including versions of the PSO, GA and others, proving the efficiency of the BFWA. Xu et al. [180] also used the BFWA to solve instances of the knapsack problem.
The open-source software development mechanism and community's behaviours inspired the Open Source Development Model (ODMA) algorithm. The binary version of the method is introduced by Khormouji et al. [27]. In their study, the method overcame the BPSO and GA in benchmark functions optimisation.
The behaviour of black holes in outer space and the gravitational attraction inspired the Binary Black Hole Algorithm (BBHA). Pashei and Aydin investigated the application of the method in feature selection and classification on biological data [11]. They conclude that it can overcome versions of the BPSO, GA and others.
Binary Harmony Search Algorithm (BHSA) has an exciting inspiration: the improvisation process of jazz musicians. Three papers addressed this method in binary problems: i) from Wang et al. [181], to solve 0/1 knapsack problems; ii) from Gholami et al. [182], in which the authors use versions of the BHSA in feature selection for classification; iii) from Lee et al. in which the authors use it in feature classification in EEG signal [183].
The Binary Equilibrium Optimization (BEO) is a physics-based algorithm (as the GSA) inspired by dynamic controlled volume mass balance models to estimate equilibrium states. The paper from Gao et al. [184] presented a version considering the sigmoid as transfer function for feature selection in benchmark problems. Using similar problems, Zhao et al. changed the transfer functions to V-Shaped approaches [185]. In both cases, the comparative analysis involved the use of distinct parameters.
The Binary Water Wave Optimization (BWWO) for feature selection was introduced by Ibrahim et al. [186]. The algorithm was inspired by water waves phenomena, like propagation, refraction, and breaking. Using 17 datasets, the authors showed that the BWWO could overcome traditional approaches, as the PSO and GWO, considering the KNN classifier.
The Binary Multi-Verse Optimizer (BMVO) [187] is applied to 13 benchmark functions (unimodal and multimodal), Feature Selection using four UCI datasets, and the 0/1 knapsack problem. The BMVO was also compared to Binary Bat Algorithm, Binary Particle Swarm Optimization, Binary Dragon Algorithm, and Binary Grey Wolf Optimizer. The method was superior on the majority of functions with different dimensions, with a very fast convergence speed.

VII. HYBRID APPROACHES
Some papers addressed in this work present significant improvements using the hybridisation of swarm-based algorithms with some other nature-inspired proposals. Figure 6 presents a visualisation of the most common algorithms/methods adopted in the hybridisation process. Note that the connections in the network mean that the connected nodes were used in a hybrid algorithm. Also, the size of the node is proportional to the number of times that the algorithm was used. Note that, among the papers analysed, depending on the characteristic of the problem tackled by the algorithm, the hybridisation can happen between an optimisation method and a problem-specific algorithm (e.g. the PSO combined with the K-Means to deal with clustering problems). Furthermore, we can see that the Bird, Bat, Wolf Pack and Firefly are the swarm inspirations that are most frequently used to produce hybrid versions.
Zhao et al. [188] proposed a priority planning and hierarchical learning approach (PHSO) that adds the velocity updating from LLSO to BPSO. The proposed algorithm can produce high-quality solutions. The added mechanism divides particles into groups in which particles in inferior groups can learn from superior groups. Using learning steps, it is possible to select high-priority resources to become a candidate solution.
A Hybrid iBPSO and SFLA algorithm was proposed by Rajamohana and Umamaheswari [189] to solve feature selection to improve the accuracy of classification of fake reviewers. In the proposal, iBPSO population is provided as an input to SFLA algorithm that uses the pre-optimised solution as the initial population. Mafarja and Mirjalili [43] have created a hybrid approach using a similar approach. In their proposal, the binary ant lion starts using an improved population generated by two rough sets entropy reduct methods (QuickReduct and CEBARKCC).
Jia and Lu [190] have proposed a Taguchi binary particle swarm optimisation (HTBPSO) to optimise antennas designs. They have also included catfish operator in order to avoid premature convergence.
Kumar et al. [191] have proposed hybrid binary PSO and sine cosine algorithm (HBPSOSCA) to solve feature selection. The proposed algorithm uses SCA and PSO in order to improve exploration and exploitation, respectively.
Another hybrid technique with PSO is present in the literature. Using binary PSO with the decision tree pruning technique, Malik et al. applied the algorithm to network intrusion detection [192].
Lin et al. [193] have proposed a Hybrid Binary Particle Swarm Optimization (HBPSO) that adds a new position updating rule, the tabu-based mutation operators to generate diversity. Additionally, an iterated greedy local search procedure was proposed to repair infeasible solutions of the obnoxious p-median problem. In a different paper [194], they have proposed HBPSO/TS, a variation that uses a tabu search to intensify the search.
A home energy management system formulated as an MKP was proposed by Naz et al. [195]. Enhanced Differential Harmony Binary Particle Swarm Optimization is a hybrid algorithm that uses HSA, EDE and BPSO.
A Hybrid Binary Dragonfly Enhanced PSO (HBDEPSO) [196] and a Hybrid Binary Bat Enhanced PSO (HBBEPSO) [197] were proposed by Tawhid and Dsouza to solve the feature selection problem. Using the dragonfly algorithm, the HBDEPSO proposal can obtain diverse solutions, and the enhanced PSO increased the convergence to the global best solutions. As in HBDEPSO, in HBBEPSO the velocity vectors were updated independently for both algorithms. It was done to allow the algorithm to explore the search space in an alternative fashion.
FIGURE 6. The network depicts examples of hybridisation proposed in some of the papers selected in this study. The nodes represent algorithms or techniques, while the links indicate that the nodes connected were used to produce a hybrid algorithm. Also, the nodes' size means the number of times that a node was a component of a hybrid algorithm.
Sarhani et al. [198] proposed BMPSOGSA and BPSOGSA, two approaches that combine BPSO and BGSA to solve the feature selection problem. BMPSOGSA differs from the other proposal because the authors have included a mutation operator to enhance population diversity. BMPSOGSA proposal has reached better results than other metaheuristics and other well-known methods for feature selection.
Too et al. [199] have proposed a new hybrid method called Binary Particle Swarm Optimization Differential Evolution (BPSODE) to solve the feature selection problem in EMG signal classification. In the proposed approach, BPSO and BDE are computed in sequence. Hence, no extra computational cost is required. The proposal proved to be a powerful feature selection tool, outperforming other algorithms in the metrics used.
Al-Tashi et al. [200] proposed the BGWOPSO, a hybrid algorithm for feature selection using bGWO1 and PSO. Al-Tashi et al. also proposed the BMOGWO-S (Binary Multi-Objective Grey Wolf Optimizer) based in sigmoid transfer function [201] as proposed in (56) for feature selection.
Shunmugapriya et al. [18] developed the ACABC algorithm, a hybrid between the ACO and ABC to deal with binary problems. They applied it to optimise benchmark functions.
Rajamohana et al. [202] introduce the hybrid IBPSO with Cuckoo Search Optimization (CSO) in spam detection. Galvan et al. [203] and Ko et al. [83] apply a hybrid between BPSO and Differential Evolution to solve feature selection problems.
The paper from Ruiz-Rodrigues et al. [204] uses the BPSO hybridised with the Jumping Frog Optimization to solve a voltage regulation problem.
Some versions of Artificial Immune System were utilised to perform hybrid methods. Sayed et al. [128] create the Binary Clonal Flower Pollination Algorithm mixing the BFPA and Clonal Selection Algorithm to solve an instance of the unit commitment problem. Pu et al. [105] and Zhai et al. [205], on the other hand, developed the BPSO hybridised with some versions of the Artificial Immune System.
Remarkably, most of the papers suggest hybridisation with the best-known evolutionary model, the Genetic Algorithm (GA). The BACO algorithm is hybridised with the GA to create the modified BACO (MBACO) by Wan et al. [120]. A similar proposal is developed by Wang et al. [60], the modified coded ACO algorithm combined with GA (MBACO). Both papers solved feature selection problems.
The works of Fathy et al. [206] and from Mirjalili et al. [30] present hybrid versions of the BPSO and GSA. The last introduced the BPSOGSA algorithm. The paper of Zeng et al. [207] introduces the Mixed-Binary Evolutionary PSO (MB-EPSO), using BPSO and GA to solve a scheduling problem.
In the paper from Ozturk et al. [110], the authors proposed the BABC Based on Genetic Operators (GB-ABC), utilising the idea of crossover from the GA. The same research group in [3] introduces the Improved DisABC (IDisABC) using the same approach. In both papers, the problems addressed are benchmark datasets. In the same way, Suresh et al. [103] proposed the Hybrid Improved BPSO (IBPSO) and solved the generation maintenance scheduling problem. In addition, the BPSO with Crossover (BPSOC) was discussed by Singh et al. [4].
The mutation in GA is the main inspiration to create two distinct versions of the Modified BPSO, one from Lee et al. [208] and other from Luh et al. [209]. While in the first they solve a benchmark problem, in the second the proponents apply the new model in continuum structural topology optimisation.
Wei et al. [210] present the hybrid BPSO (HBPSO), which uses the GA. The application is related to find the free parameters of an SVM to classify Crohn's disease and Lung cancer. The same idea is used by Jin et al. [211] (hybrid BPSOGA), and Zhou et al. [212] (Differential BPSO-GA), which were applied in the 0/1 Multidimensional Knapsack Problem. Zouache et al. introduced the Quantum-Inspired Firefly Algorithm with Particle Swarm Optimization (QIFAPSO) to solve the same task.
In the same way, Pashei et al. created the hybrid Binary Black Hole Algorithm and Modified BPSO to solve some instances of the uncapacitated facility location problem [213]. They have shown that, by applying the BBHA as a local optimiser for BPSO (4-2), it was possible to significantly increase the local search capability, effectiveness, and reliability of BPSO (4-2) in solving the gene selection problem.

VIII. MULTI-OBJECTIVE APPROACHES
Multi-objective approaches are more complex than mono-objective ones because their optimisation considers two or three fitness functions simultaneously. There are several operators to qualify multiple fitness functions and to adapt the mono-objective methods (operators for exploration, exploitation, convergence and diversity aspects). In mono-objective problems, one solution is either better or worse than another solution. However, in multi-objective problems, we can have mutually indifferent solutions, called non-dominated solutions: they are better than all other (dominated) solutions, but within their group no solution is ultimately better than another. For example, suppose solutions i and j are better than all other possible solutions. Solution i is better than solution j in objective a, but the opposite happens in objective b. In this way, we cannot say that solution i is better than solution j, nor that solution j is better than solution i, so they are non-dominated solutions. Non-dominated solutions thus form a set of solutions that are better than all the other solutions found during the execution of an algorithm, while no solution in the set is better than the others.
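The notion of non-dominated solutions described above can be illustrated with a short sketch. It assumes that every objective is to be minimised and that each solution is represented only by its tuple of objective values; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming all objectives are minimised.

    a dominates b when a is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    objectives = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
    print(non_dominated(objectives))
```

In this example, (1, 4) and (4, 1) play the roles of solutions i and j: each is better than the other in one objective, so neither dominates the other, while (3, 3) and (4, 4) are dominated by (2, 2).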
Some multi-objective algorithms store the non-dominated solutions in a so-called External Archive (EA) [140], [214], [215]. Because each solution is represented by its value in every dimension, together with the fitness values for the two or three functions, the required memory grows with the number of dimensions of the problem. The EA therefore requires considerable memory, but it simplifies the optimisation mechanisms because the swarm does not need to keep any current information about the search space. The majority of multi-objective algorithms use an external archive because the computational cost of letting the swarm maintain good solutions at execution time is usually higher than the cost of using an EA [216]. The EA must also be updated and sorted as new non-dominated solutions are found in the search space. Crowding distance is one method used to rank and limit the solutions in the external archive [215], [217].
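The update of a bounded archive can be sketched as follows. This is only an illustration under stated assumptions: objectives are minimised, the crowding distance follows the usual NSGA-II style definition, and the most crowded member is discarded when the capacity is exceeded. The names `update_archive`, `crowding_distance` and `capacity` are hypothetical; the cited algorithms may rank or truncate their archives differently.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Pareto dominance for minimisation."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def crowding_distance(front: List[Point]) -> List[float]:
    """Sum, over objectives, of the normalised gap between each point's neighbours."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary points are always kept.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def update_archive(archive: List[Point], candidate: Point,
                   capacity: int) -> List[Point]:
    """Insert a candidate if it is non-dominated, then enforce the capacity."""
    if candidate in archive or any(dominates(a, candidate) for a in archive):
        return archive
    archive = [a for a in archive if not dominates(candidate, a)] + [candidate]
    while len(archive) > capacity:
        dist = crowding_distance(archive)
        archive.pop(dist.index(min(dist)))  # drop the most crowded member
    return archive


archive: List[Point] = []
for cand in [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (2.1, 1.9)]:
    archive = update_archive(archive, cand, capacity=3)
print(archive)
```

The sketch stores only objective vectors; a real archive would also store the binary decision vector of each solution, which is where the memory cost discussed above comes from.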
The vast majority of multi-objective algorithms are applied to feature selection regardless of the type of data or the inspiration [168], [201], [215], [218]- [221]. The applications are varied, such as biological or health-related problems [92], [222], [223], antenna design [224], electric power problems [225]- [228] and computer science [192], [229], [230]. As PSO is one of the most applied algorithms in the literature, the same holds for multi-objective problems.  [233] to optimise benchmark functions. The proposal presented better results than the BPSO and novel BPSO (NBPSO). 6) The improved binary particle swarm optimisation (IBPSO) is proposed to solve the multi-objective operation mode optimisation of medium voltage distribution [234]. By introducing nonlinear dynamic adjustment of the learning factors and inertia weight, the convergence and optimisation of the proposed algorithm are improved compared to BPSO. Some algorithms were inspired not only by the PSO but also by another metaheuristic. The Multiobjective Hybrid Real-Binary (MOHPSO) algorithm was inspired by the MOEA and PSO operators [235], and MOHPSO outperformed versions of GA, BPSO, and PSO. Xu et al. [219] proposed the crowding, mutation and dominance binary PSO for feature selection (CMDPSOFS), which outperformed a variant of the NSGA-II [217]. A New Modified BPSO (mBPSO) was presented by Fan et al. [29] to solve a multi-objective resource allocation problem (MORAP), and it performed better than versions of GA and ACO.
Another example is the Hybrid Improved Binary Quantum Particle Swarm Optimization (HI-BQPSO) [221], a multi-objective algorithm applied to feature selection that reduces the number of selected features while maximising the classification performance. The results show that HI-BQPSO, compared to ABC, SA, GA, and BQPSO, has good overall performance and strong search capability, and maintains high efficiency with a range of different classifiers. The Co-Operation of Biology Related Algorithms (COBRA-bm) [236], developed by Akhmedova et al., is also a hybrid binary method, which involves the PSO, Wolf Pack Search, FA, Cuckoo Search Algorithm and Bat Algorithm. They showed that COBRA-bm could achieve better performance on benchmark problems than each of these algorithms separately.
Other inspirations are also present for binary multi-objective optimisation, as follows.
Fish: Macedo et al. [215], [220] proposed several versions of the Multi-Objective Binary Fish School Search. Their goal was to reach the most effective version regarding computational cost and performance. The most effective versions are MOBFSS-1-LS and MOBFSS-3-LS. In both versions, they replace the original mechanism of turbulence on the external archive with the local search method inspired by the BMOPSOCDRLS [218]. The MOBFSS-1-LS version shows the best results considering hypervolume, spacing and maximum spread, but the MOBFSS-3-LS shows the smallest computational cost. In their proposal, they show that having reasonable control of the collective movements makes the individual movement dispensable.
Wolf: The binary multi-objective grey wolf optimiser based on a sigmoid binary transfer function (BMOGWO-S) [201] is a multi-objective algorithm inspired by the GWO. BMOGWO-S uses an external archive, and three leaders from the archive perform the hunting mechanism. BMOGWO-S is applied to feature selection using 15 datasets from the UCI repository. The results showed that BMOGWO-S could effectively determine a set of non-dominated solutions. In most cases, it outperforms the existing multi-objective approaches and the benchmarking algorithms in terms of feature reduction and classification error rate while benefiting from a lower computational cost. The authors also show that, when feature selection is handled as a problem with more than one objective, BMOGWO-S can explore the space more efficiently and attain a better set of non-dominated solutions than when the problem is treated as single-objective.
Firefly: Zouache et al. [237] [125] were previously described in their respective sections.
We notice that the literature on multi-objective algorithms is focused on versions of PSO. This is again the case because of the popularity and simplicity of PSO and the lack of computing power in the past. We see that the multi-objective and many-objective algorithms have been more present in the literature in the past years but are still concentrated in a small number of research groups because of the need for high computational power and memory. Besides, the complexity of these two classes of problems is much higher than that of mono-objective ones.

IX. DISCUSSION AND DIRECTIONS
This section presents our considerations on the binary swarm-based algorithms extracted from the 403 selected papers. The Genetic Algorithm was created using binary variables because of the computational power available at that time. However, the development of computer components and the reduction of their cost contributed to the creation of nature-inspired methods for continuous problems. This change originated several new continuous techniques, but it may also have slowed the creation or improvement of binary techniques. Recently, binary optimisation has expanded because of the absence of robust solutions for many real problems.
Even though the Genetic Algorithm has been successfully solving several binary problems for years, our own work with Swarm Intelligence (SI) suggests that SI might be more capable of solving harder binary problems because of its smooth convergence operators. In GA, two operators are mainly used: crossover and mutation. With crossover, the chance of creating similar solutions is high because mixing already known individuals does not tend to create a highly diverse set of individuals for binary optimisation. A high presence of similar individuals can quickly lead to premature convergence. Problems with high dimensionality are less susceptible to this effect. With mutation, flipping random features (0 to 1 or 1 to 0) can prevent the swarm from returning to a previous state over the iterations. For instance, some features only increase the fitness value when combined with other features, so random flips can drastically affect the fitness value. A lack of impact can also reveal the irrelevance of a feature, but whether a flip will have a strong negative or positive impact is uncertain.
Adapting continuous algorithms to binary problems can require more resources and rules than needed. A binary search space has fewer combinatorial options and is more sensitive than a continuous search space. Each feature of a binary problem must be either 0 or 1, whereas in a continuous problem it can vary over many ranges (e.g. 0.0001 to 1, 0.0001 to 0.01, or 0.0001 to 0.0002). Consequently, some variations in a continuous search space represent no change in the binary search space, which prevents the agents from exploring new solutions in the binary space. In addition, when a feature depends on another one, small changes in the continuous search space can have less impact on the fitness value than in the binary search space because it is easier for the agent to return to previous regions.
For example, a fish, ant or bee moves around in small steps, so changes in the binary mapped vector are rare. The agent needs to move several times before it influences the binary vector, which can be avoided by using Binary-Binary approaches. A binary problem demands a particular balance of convergence and diversity that avoids similar solutions and premature convergence. Even though continuous optimisation appears smooth, its projection onto the binary space does not follow the same pattern. Thus, we need to treat binary problems using proper mechanisms that provide optimal convergence, small computational cost, and high accuracy.
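The effect of small continuous steps on the projected binary vector can be illustrated with a toy example. The sketch assumes a simple deterministic threshold at 0.5 as the mapping, which is a simplification chosen for clarity (the surveyed algorithms generally use stochastic transfer functions); the helper `to_binary`, the positions and the step size are hypothetical.

```python
def to_binary(position, threshold=0.5):
    """Assumed mapping: a dimension becomes 1 when it reaches the threshold."""
    return [1 if value >= threshold else 0 for value in position]


# Hypothetical continuous position of one agent.
position = [0.10, 0.80, 0.30, 0.95]
start_bits = to_binary(position)

unchanged_moves = 0
for step in range(10):
    # The agent takes a small step in every dimension.
    position = [value + 0.01 for value in position]
    if to_binary(position) == start_bits:
        unchanged_moves += 1

print(start_bits, to_binary(position), unchanged_moves)
```

All ten moves change the continuous position, yet none of them changes the binary vector, so ten fitness evaluations would be spent on the same candidate solution.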
We perceive that new binary operators, which are promising in swarm-based techniques, are being developed in the literature. These operators are well suited to binary optimisation, also because they require fewer parameters and mapping functions. In fact, we see parameterless techniques as a future trend. Adaptive versions of swarm-based algorithms are being published, and they will soon be replaced as more effective and faster algorithms are proposed. Unfortunately, this trend is less visible in binary optimisation than in continuous problems, so we encourage researchers to work on this gap.
We observed that the number of relevant papers has tended to increase since 2006. There is a noticeable increase until 2013 and then a small reduction until 2017. This reduction may be explained by the growth of Deep Learning and other classification techniques, which attracted much of the Machine Learning community. The massive presence of the Binary Particle Swarm Optimization (BPSO) algorithm is not a surprise since, in the continuous case, PSO is the most prominent algorithm. This can be explained by its simplicity of implementation allied to the good results achieved in many real problems. The importance of the BPSO is so high that almost all the papers that address another proposal as the main algorithm use some PSO-based method for the comparative analysis.
We notice that a large number of papers, around half, proposed new versions of existing algorithms. Researchers are still looking for the best binary swarm algorithm version. It is also possible to say that the field is in constant evolution and that, maybe, a definitive method can be developed soon. However, we cannot identify the best binary version among all swarm-based algorithms because the area is still not organised regarding comparisons, advantages and disadvantages. Several new versions are compared with only one or two techniques, along with one or two versions, which does not help us to elect promising versions. Another aspect is that some of the new versions or techniques might display similar behaviours after a closer look at their rules, but the literature lacks in-depth analysis of the social behaviour of the swarm. Some works argue that, by evaluating the social interaction of simple reactive agents, we can understand the peculiarities of a swarm-based algorithm and probably understand the differences between versions, rules, and operators [238]- [243]. Besides, the extensive use of benchmark functions to evaluate the search capability of the methods does not help the field to determine the best models to deal with real-world problems.
We indicate in Table 3 some promising leading proposals that we believe have higher chances of being successful, being novel or having mechanisms that will be used in the future. The application of transfer functions will continue to appear in the literature, not because of its computational cost or efficiency, but because of the flexibility of applying any algorithm to a binary (or discrete) problem. The most used transfer function to transform a continuous vector into a binary string is the sigmoid function. The fact that the first binary version of the PSO, presented by Kennedy and Eberhart in 1997 [10], uses this tool may explain why the sigmoid function is so widely adopted. Some of the concepts addressed in the pioneering versions of the BPSO were used in many other swarm-based algorithms. The use of V-Shaped functions, especially the hyperbolic tangent, is also very common [96]. In both cases, the main idea is to make the agent more similar to some best solution, as in the continuous case.
We highlight that each specific swarm-based technique or problem can display better results with a different transfer function. However, V-Shaped functions seem to be more efficient than S-Shaped ones in general. S-Shaped functions seem to be more effective at the beginning of the iterations, but not necessarily from the middle to the end of the iterations. Moreover, adaptations of V-Shaped functions seem to be the most efficient choice or even a trend in the literature. For instance, NBBA shows good performance using a multi-V-shaped transfer function [64].
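The two families of transfer functions can be sketched as follows. This is a minimal illustration, not the formulation of any specific cited paper: the S-shaped rule sets a bit to 1 with probability given by the sigmoid, and the V-shaped rule shown here is one commonly used variant, assumed for this sketch, in which the current bit is flipped with probability |tanh(v)|. The function names are hypothetical.

```python
import math
import random


def s_shaped(v: float) -> float:
    """Sigmoid transfer function: maps a real value to a probability in (0, 1)."""
    v = max(-60.0, min(60.0, v))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-v))


def v_shaped(v: float) -> float:
    """|tanh(v)|: the probability grows with the magnitude of v, whatever its sign."""
    return abs(math.tanh(v))


def update_bit_s(v: float, rng: random.Random) -> int:
    """S-shaped rule: the bit becomes 1 with probability S(v)."""
    return 1 if rng.random() < s_shaped(v) else 0


def update_bit_v(bit: int, v: float, rng: random.Random) -> int:
    """V-shaped rule (assumed variant): flip the current bit with probability |tanh(v)|."""
    return 1 - bit if rng.random() < v_shaped(v) else bit


rng = random.Random(0)
for v in (-2.0, 0.0, 2.0):
    print(v, round(s_shaped(v), 3), round(v_shaped(v), 3),
          update_bit_s(v, rng), update_bit_v(1, v, rng))
```

With a zero displacement, the S-shaped rule still assigns the bit at random with probability 0.5, whereas the V-shaped rule leaves the bit untouched. This difference is one plausible reading of why the two families behave differently in the early and late iterations.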
Interestingly, NMBPSO [87] uses a hybrid transfer function of S and V-shaped forms that seems really effective because it provides local (V-shaped) and global (S-shaped) search over the iterations. We argue that new algorithms in the future might use hybrid transfer functions to better balance exploration and exploitation in other swarm-based algorithms. Hybrid or adaptive mechanisms will likely be more present in the literature in the next years because they can provide a continuous search towards better solutions, which is generally better for most complex, high dimensional and multi-modal problems.
Looking at other binary operators, logic gates seem to be the most efficient ones in computational cost and performance. The simplicity of logic gates avoids unnecessary calculations and provides efficient algorithms such as SBCSO [102], binABC [97] and BPSO [103], [104]. However, even though logic gates are very efficient for position displacement, they are not the most efficient way to produce a high rate of diverse individuals. In this matter, we believe that methods inspired by similarity metrics are a good way to provide diversity in the swarm, such as the use of Jaccard's similarity coefficient in TSA [42]. The use of entropy is as rare as the use of similarity in swarm-based techniques. Entropy can also be used to promote diversity of individuals, but in practice it is used as a filter and mapping step that reduces complexity by focusing on a problem of reduced dimension.
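As an illustration of a similarity metric on binary vectors, the Jaccard coefficient can be computed as below. This sketch only shows the metric itself; how TSA [42] embeds it in its update rules is not reproduced here, and the function name and example vectors are hypothetical.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient of two binary vectors: |a AND b| / |a OR b|."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Two all-zero vectors are treated as identical (an assumption of this sketch).
    return 1.0 if either == 0 else both / either


agent_a = [1, 1, 0, 0, 1]
agent_b = [1, 0, 0, 1, 1]
similarity = jaccard_similarity(agent_a, agent_b)
print(similarity)
```

A swarm whose pairwise similarities are all close to 1 has lost diversity, which is the situation a similarity-based operator would try to detect or avoid.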
Depending on the inspiration, genetic operators showed high efficiency, as in MBGWO [77], BABC [100], [110], BinSSA [168] and BPSO [100]. The crossover and mutation methods have limited efficiency for binary optimisation as they tend to generate individuals similar to the current population. In contrast to logic gates, genetic operators tend towards premature convergence because of the lack of diversity in the swarm. The advantage of Boolean gates over genetic operators is that they tend to create similar individuals in the direction of the Boolean gate and not of the swarm positions. For example, an AND gate changes more positions to 0 than to 1, indicating a higher probability of producing vectors with more 0s than 1s.
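The bias of a gate can be checked by enumerating its truth table. The sketch below assumes that the two input bits are independent and equally likely to be 0 or 1, which is a simplification; in a real swarm the inputs are correlated with the current positions. The names `GATES` and `fraction_of_ones` are hypothetical.

```python
from itertools import product

# Two-input logic gates applied to single bits.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def fraction_of_ones(gate):
    """Share of the four input combinations for which the gate outputs 1."""
    outputs = [gate(a, b) for a, b in product((0, 1), repeat=2)]
    return sum(outputs) / len(outputs)


bias = {name: fraction_of_ones(gate) for name, gate in GATES.items()}
print(bias)
```

Under this assumption AND outputs 1 for one input pair in four, OR for three in four and XOR for two in four, which matches the observation that AND pushes vectors towards more 0s than 1s.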
The ant-inspired algorithms show fewer major changes from the original proposal because it already adapts well to binary optimisation problems. We highlight the MBACO [60] algorithm because it outperformed a diverse set of algorithms such as GA, BPSO, BACO, BDE, and Binary-Coded Cuckoo Search (BCS). In contrast, the algorithms inspired by bees, cats and fish show very different techniques based on different operators and strategies. For example, NBABC [99] outperformed multiple algorithms in accuracy and computational cost: ABCBin [244], BABC [44], BitABC [98], GBABC [3] and XBABC [97], BCSO [245], BFSS [148], BGA [246] and MBPSO [247]. Other algorithms are not very popular in the literature but offer good insights, such as the wolf-inspired algorithms, among which QI-BGWO [79] appears as an efficient version to balance exploration and exploitation.
Hybrid and adaptive algorithms might be the next bets for Swarm Intelligence, as the field seems to be trying to create one single algorithm that solves any problem perfectly. Therefore, the only way to have an algorithm good enough for several problems is to make the swarm-based technique as flexible and adaptive as possible. The balance of exploration and exploitation is key to reaching optimal solutions.
Even though we constantly acknowledge that there is no free lunch, mixed algorithms come with the hope that swarm-based algorithms will be as effective and flexible as mathematics and nature allow. Three algorithms, HBBEPSO [197], HI-BQPSO [221] and ABPSO [248], seem to be interesting options to start studying this new branch of Swarm Intelligence. HBBEPSO [197] is applied to 20 feature selection problems, compared to six algorithms and assessed by six metrics. HI-BQPSO [221] divides the complexity of the problem into two steps (coarse and fine-grained), considers principles of cross-variation and learning, and evaluates its performance on nine gene expression datasets and 36 UCI datasets. ABPSO [248] highlights strategies to balance exploration and exploitation, analyses its time complexity and evaluates its performance on 150 benchmark instances. Taken together, HBBEPSO [197], HI-BQPSO [221] and ABPSO [248] give us a summary of advantages, drawbacks, data, metrics and methods for Particle Swarm Optimization. We argue here that, even though PSO is the most popular and straightforward algorithm in the literature, this does not mean that PSO is the best choice to be transformed into the best adaptive algorithm. We still need to consider other options before reaching a consensus that one of the techniques is better for adaptive mechanisms.
Finally, for the case of multi-objective algorithms, the literature is still scarce in proposals but provides really promising techniques. In summary, the multi and many-objective approaches evaluate more than one objective at the same time. Balancing these objectives substantially increases the computational cost of the techniques. However, most of the relevant recent problems are described by more than one fitness function, and several papers have already shown that the mono-objective approach is not as efficient in such cases, even when several mono-objective results are combined. Consequently, the multi and many-objective areas should grow in order to help our society more and more.

TABLE 3. Current leading proposals based on the operators, transfer functions and inspirations. We also present the number of papers in which the popular options were proposed or applied in the literature.

We observe that PSO continues to be popular and effective in the literature, but other swarm-based algorithms can be highlighted as effective while using logic gates, genetic operators and similarity metrics. We identify in Table 3 leading multi-objective proposals inspired by bird, bee, ant, wolf and fish. BMOGWO-S [201] extensively compares its performance to the BMOPSO [229] and NSGA-II [217] on 15 feature selection datasets. BMOGWO-S improves the computational cost, the diversity of the leaders that guide the optimisation, and the storage of solutions for the next iterations. Because of its low memory usage and straightforward mechanisms, BMOGWO-S shows high performance on large datasets with high dimensionality. There are other multi-objective algorithms that would be useful to compare, such as BMOPSOCDR-LS [218] and MOBFSS [215]. All three swarm-based techniques use the archive mechanism, which is widely used in the literature.
In summary, the leading proposals of the swarm-based methods (PSO, ACO, and ABC) are present in most papers, as expected. However, it is remarkable that we found a total of 43 other approaches using different insights from nature. Likewise, it was unexpected that algorithms inspired by cats and bats were so popular in the literature. The future opens a diverse set of inspirations for proposing new efficient binary operators.
The idea of proposing a new classification (Binary-Binary, Continuous-Binary, and Continuous-Continuous) in Section I arose when we observed that the binary algorithms follow some predetermined central concepts during their development. We notice that these ideas are not systematised and that, sometimes, these various paradigms can hinder understanding. Also, this unprecedented concept may help researchers to increase the search power of the algorithms.
From the Swarm Intelligence point of view, it seems that the area will develop adaptive versions to prevent premature convergence by balancing exploration and exploitation. The big challenge of making swarm-based approaches parameterless is that it strongly impacts the balance between exploitation and exploration, the diversity of solutions and the convergence of the swarm. Moreover, differences in the landscape of fitness functions are currently handled by using different sets of parameter values.
The Binary-Binary optimisation might not be the only approach, but we believe it is the cheapest and most robust way of working with binary optimisation in Swarm Intelligence. We also believe that the future relies on applying multi and many-objective approaches to real problems, in which it will be fundamental to understand both the swarm techniques and the type of problem because of the added complexity of multiple objectives.
Our work brings awareness and understanding of many applications, algorithms and methods for Binary Optimisation from Swarm Intelligence. We understand that some problems might be better solved by other kinds of algorithms such as binary search trees, coevolutionary algorithms or binary neural networks. However, we argue that Swarm Intelligence brings robustness, flexibility, scalability, modularity, parallelism, and decentralisation, which makes it hard to compete with, especially in high dimensional and dynamic problems. Swarm Intelligence will be broadly used in the future as the literature converges to a smaller volume of versions, fast and robust mechanisms, and above all more understanding and organisation in the field.
Some limitations and biases can be reflected in our analyses. Our analyses might not represent well papers that are not well cited in the literature, that are not in English, or that were published before or after our data collection. Moreover, some swarm-based inspirations are more popular than others. For instance, bird-inspired algorithms have a much larger volume of papers than those inspired by fireflies, fish, or wolves. In this way, our analyses follow this unbalanced distribution of papers across inspirations.

X. CONCLUSION
This paper presents an investigation of the most prominent swarm-based algorithms to deal with binary optimisation. This study analysed 403 papers from four important scientific databases: IEEE Xplore, ACM, Science Direct, and Springer, and some of them were also ranked by Google Scholar.
We propose a new way to categorise binary swarm-based algorithms: Binary-Binary, Binary-Continuous and Continuous-Continuous approaches. The difference between them lies in the intermediate steps to improve the solutions during the iterative process. Moreover, they differ generally in computational cost as they apply similar time-consuming mechanisms.
The final solution presented by the models has to be a binary vector, but, based on the previous continuous versions of the algorithms, the displacement (as the velocity in the BPSO) can be binary or continuous. The most used transfer functions to map continuous variables into binary vectors are the sigmoid function (S-Shaped) and two proposals of V-Shaped functions, which use the hyperbolic tangent and an expression based on the arctangent. However, new adaptive and hybrid functions have been proposed in recent years that better balance the exploration and exploitation necessary for effective optimisation.
It is important to remark that the binary swarm algorithms we found always come from previous continuous versions. Around 7.5% of the papers address multi-objective methods, while another 10% present hybrid proposals. Most of the hybrid algorithms combine a swarm-based method with the Genetic Algorithm. We consider the multi and many-objective methods the future not only of swarm intelligence but also of the general optimisation area because of the increasing necessity of applying more than one fitness function in real problems, which usually require at least the decrease of computational cost and the growth of some kind of profit (for example, money or accuracy in the diagnosis of diseases).
We highlight that the use of binary versions inspired by the PSO algorithm continues to be the most popular in the literature. While the second most used proposal is the BACO, with 55 papers found, the BPSO appears in 263 papers, about 65% of the total number of articles selected. The interest in this method may be related to the fact that the BPSO was the first swarm algorithm to be proposed in the literature, and also to the simplicity of its implementation. The popularity of the continuous PSO version can also have an influence.
We encourage the field to focus on the Binary-Binary approach because it targets the problem with less complexity and more efficiency. Binary problems can be sensitive to small changes. Flipping some features can change the context entirely, and this may harm the convergence. Moreover, binary problems usually suffer from the creation of similar solutions, which stagnates the system rapidly. Consequently, what seems to be an easy problem of flipping or not flipping features becomes a hard problem of cautiously balancing convergence and diversity. We believe that binary problems should be solved slowly and with careful use of the randomness of the operators. Moreover, using a continuous search space adds unnecessary work and time to the system because moving a little in the search space does not affect the solutions and the fitness in the binary space, which creates a false appearance of slow convergence and diversity.
We highlight that at least half of the papers are composed of new proposals or improvements in existing algorithms, increasing the applications of benchmark problems. In the same way, almost half of the papers were published in conference proceedings. We hope that this paper encourages researchers to work more in Swarm Intelligence in many ways: comparing existing techniques, understanding the nuances of operators, proposing new algorithms, and applying them to complex problems.

APPENDIX A CONTINUOUS-CONTINUOUS ALGORITHMS

A. BEE-INSPIRED
The Binary Artificial Bee Colony (BABC) versions are based on the Artificial Bee Colony algorithm, which was inspired by the behaviour of a honey bee swarm and proposed by Karaboga in 2005 [3], [5], [97], [112], [251]. The bees are classified into three categories, each one playing a specific role during the process of finding a food source: employed, onlooker and scout bees [44]. The employed bees are in charge of bringing nectar from a known food source. Then, the employed bees share the information about the quality of this source (amount of nectar) with the onlooker bees, using a process called waggle dance. In nature, the duration of the dance is proportional to the quality of a source [5], [12], [112].
The onlooker bees have to choose one of the food sources to explore. Their probability of selecting a source is proportional to its amount of nectar. Better food sources tend to attract more bees [5], [110]. The scout bees search for unknown food sources in the vicinity of the hive, flying randomly. Whenever a source is exhausted (its nectar is over), an employed bee becomes a scout. In the search process, employed and onlooker bees perform exploitation (local search) whereas the scout bees perform exploration (global search) [3], [110].
Based on this metaphor, the food sources are the candidate solutions to the problem, and their nectar amount corresponds to their fitness. The solutions are not encoded in the agents (bees) but in the environment (food sources). The metaphor of different types of bee is used to select which food sources are more or less explored, as well as to define unexplored candidate solutions [18], [98].
The proposition of the BABC from He et al. [44] is a Continuous-Continuous method. The algorithm starts by generating the food sources $x_i$ with real numbers according to (7), where $i = 1, \ldots, N$, with $N$ the number of food sources, $d = 1, \ldots, D$, with $D$ the dimension of the problem (the number of features), and $x^{cont}_{d,\max}$ and $x^{cont}_{d,\min}$ are the upper and lower bounds of the $d$-th parameter defined by the user. At each iteration, new food sources are generated in the vicinity of the previous ones using (8), where $j$ is a randomly selected food source and $j \neq i$. This idea is the same as in the original ABC algorithm for real-valued problems. However, the fitness in the binary space must be calculated using binary strings. In this case, the authors suggest the use of the operation described in (9) to map the solution. After that, the evaluation of the binary solution $fit(x_i)$ can be performed. Note that (9) can also be used to convert $v^{cont}_i$ into $v_i$. The solutions and the process of changing their positions remain in the real space. The conversion into a binary string happens only to calculate the fitness.
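The employed bee stage can be sketched as follows. Since equations (7) to (9) are not reproduced in this text, the sketch relies on assumptions: the initialisation and the neighbour rule follow the standard continuous ABC (uniform sampling inside the bounds, and a perturbation of one random dimension towards or away from another source), the mapping `to_binary` (rounding and taking the parity) is only a stand-in and is not necessarily the operation of (9), and `onemax` is a hypothetical fitness function. The neighbour rule needs at least two food sources.

```python
import random


def init_food_sources(n_sources, dim, lower, upper, rng):
    """Sample each dimension uniformly inside its bounds (standard ABC rule)."""
    return [[lower[d] + rng.random() * (upper[d] - lower[d]) for d in range(dim)]
            for _ in range(n_sources)]


def neighbour(sources, i, lower, upper, rng):
    """Perturb one random dimension of source i using another source j != i."""
    j = rng.choice([k for k in range(len(sources)) if k != i])
    d = rng.randrange(len(sources[i]))
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(sources[i])
    value = sources[i][d] + phi * (sources[i][d] - sources[j][d])
    candidate[d] = min(max(value, lower[d]), upper[d])  # keep inside the bounds
    return candidate


def to_binary(x_cont):
    """Assumed stand-in mapping: round the magnitude and take its parity."""
    return [int(round(abs(v))) % 2 for v in x_cont]


def onemax(bits):
    """Hypothetical fitness: number of ones in the binary string."""
    return sum(bits)


rng = random.Random(42)
lower, upper = [-5.0] * 8, [5.0] * 8
sources = init_food_sources(6, 8, lower, upper, rng)
before = [onemax(to_binary(s)) for s in sources]

# Employed bee stage: the move happens in the real space, the fitness is
# computed on the binary string, and a greedy selection keeps the better one.
for i in range(len(sources)):
    v = neighbour(sources, i, lower, upper, rng)
    if onemax(to_binary(v)) >= onemax(to_binary(sources[i])):
        sources[i] = v

after = [onemax(to_binary(s)) for s in sources]
print(before, after)
```

Note that the stored solutions stay continuous throughout; the binary string exists only while the fitness is being computed, which is what characterises the Continuous-Continuous approach.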
After the employed bee stage, the onlooker bees choose a food source to explore based on the information found by the employed bees. The probability $p_i$ of selecting a food source is proportional to its fitness (amount of nectar) and is calculated using (10). For a food source $i$, each onlooker draws a number $r = rand(0, 1)$ and, if $r < p_i$, it explores the corresponding source $x_i$, as in a roulette wheel scheme. Then, a new source $v_i$ is generated in the same way as in the employed bee stage. In the end, a greedy selection is performed.
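The onlooker selection can be sketched as below. Equation (10) is not reproduced in this text, so the sketch assumes the usual fitness-proportional form, $p_i = fit_i / \sum_k fit_k$, with non-negative fitness values to be maximised. The function names and the cyclic visiting order of the sources are hypothetical choices made for this illustration.

```python
import random


def selection_probabilities(fitness):
    """Fitness-proportional probabilities (assumed form of the selection rule)."""
    total = sum(fitness)
    if total == 0:
        return [1.0 / len(fitness)] * len(fitness)
    return [f / total for f in fitness]


def onlooker_choices(fitness, n_onlookers, rng):
    """Each onlooker visits the sources in turn and accepts source i if r < p_i."""
    p = selection_probabilities(fitness)
    chosen, i = [], 0
    while len(chosen) < n_onlookers:
        if rng.random() < p[i]:
            chosen.append(i)
        i = (i + 1) % len(p)
    return chosen


rng = random.Random(7)
fitness = [1.0, 3.0, 6.0]
probabilities = selection_probabilities(fitness)
choices = onlooker_choices(fitness, n_onlookers=10, rng=rng)
print(probabilities, choices)
```

Each chosen index would then receive a neighbour candidate and a greedy selection, exactly as in the employed bee stage.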
The last stage of the BABC is the scout bee phase. It only occurs if some food source is exhausted, that is, when an employed bee exceeds the maximum number of trials (limit) to improve the fitness of a source by searching in its vicinity. In this case, a new food source is randomly generated, as at the beginning of the algorithm, and the variable trial is reset to zero. Algorithm 1 summarises the general steps to implement the BABC algorithm.

B. FIREFLY-INSPIRED
The Binary Firefly Algorithm (BFA), as the name suggests, is inspired by the social behaviour of fireflies [252]. We found two different implementation approaches, both using the Continuous-Continuous idea. There are around two thousand species of fireflies, and most of them produce a short and rhythmic flashing light. These flashes are generated through a process of bioluminescence and have two specific goals: a) to attract other fireflies and b) to attract potential prey [127]. Based on this natural behaviour, the Firefly Algorithm was developed according to three idealised rules [6]: i) fireflies are unisex; ii) the degree of attractiveness of a firefly is proportional to its brightness, so that, for any two flashing fireflies, the less bright one moves towards the brighter one, and if two fireflies have the same brightness they move randomly; iii) the brightness of a firefly is proportional to its fitness. To implement the BFA, each simple agent is a firefly. The variation of the attractiveness is a vector $\beta$ whose dimensions depend on the distance between the fireflies $i$ and $j$. Each dimension is calculated separately as shown in (11), where $\beta_0 \in [0, 1]$, $\gamma \in [0, 10]$ is the absorption coefficient, $n \geq 1$ is a user-defined coefficient and the vector $r_{ij}$ has all positions equal to ''−1,'' ''0'' or ''1'' as presented in (1). Then, the position is updated by (12), in which $\alpha$ is a parameter defined by the user a priori, $x^t_j$ is some brighter firefly (higher fitness), and $(x_{j,d} - x_{i,d})$ is equivalent to $r_{ij,d}$ and also follows the rule defined in (1). Note that, in (12), the second component is related to attraction while the third one is a random step.
To determine the value of each dimension, the sigmoid function can be applied as in (2), creating $S(x_{i,d}^{t+1})$. The binary position is then given by (3).
Another possibility described in [6] is to apply the hyperbolic tangent function instead of the sigmoid, using (4) to obtain $|\tanh(x_{i,d})|$. Then, (3) is changed by replacing $S(x_{i,d})$ with $|\tanh(x_{i,d})|$ in the inequality.
Liu et al. [45] introduce two important modifications to the BFA. First, they change (1) by calculating the distance between two fireflies as their Hamming distance, that is, the total number of differing bits between $x_i$ and $x_j$, which is a scalar, as in (13):
$$r_{ij} = \sum_{d=1}^{D} x_{i,d} \otimes x_{j,d} \tag{13}$$
where ⊗ is the logical XOR function, which outputs ''0'' if the bits in dimension d of both vectors are the same and ''1'' if they are different. Hence, β also becomes a scalar, shared by all dimensions in (12). Second, in (12) they replace $rand(0, 1) - \tfrac{1}{2}$ by a randomly generated vector ε. Finally, the steps of the Binary Firefly Algorithm are summarised in Algorithm 2.
Algorithm 2 BFA Pseudocode
Initialise randomly the parameters: number of fireflies N, the initial position of the fireflies as binary vectors, γ, β_0, α and the stop criterion
while stop criterion is not reached do
    for i = 1 to N do
        for j = i + 1 to N do
            if fit(x_j) < fit(x_i) then
                Calculate the attractiveness β
                Update the position according to (12)
                Evaluate the fitness of each firefly
                Transform the position into a binary vector
            end if
        end for
        Rank the fireflies according to their fitness and find the best solution
    end for
end while
Output the best solution
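The loop of Algorithm 2 can be sketched in a few lines of Python. This is only an illustrative sketch of the Hamming-distance variant, under several assumptions that are not taken from this section: the transfer function of (2) is assumed to be the logistic sigmoid, rule (3) is assumed to set a bit to ''1'' when a uniform random number is below the sigmoid value, the attractiveness is assumed to follow the usual form $\beta_0 e^{-\gamma r^n}$, and lower fitness is treated as better. The function names (`hamming`, `move_firefly`, `bfa`) and the default parameter values are hypothetical, and the inner loop visits every j rather than only j > i.

```python
import math
import random


def sigmoid(x):
    """Logistic function, assumed here to play the role of (2)."""
    return 1.0 / (1.0 + math.exp(-x))


def hamming(a, b):
    """Number of differing bits between two bit lists (XOR sum), as in (13)."""
    return sum(u ^ v for u, v in zip(a, b))


def move_firefly(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.5, n=2, rng=random):
    """Move firefly i towards a brighter firefly j and return a new bit list."""
    r = hamming(x_i, x_j)
    beta = beta0 * math.exp(-gamma * r ** n)  # scalar attractiveness (assumed form)
    new_bits = []
    for xi, xj in zip(x_i, x_j):
        # attraction term plus random step, following the structure of (12)
        step = xi + beta * (xj - xi) + alpha * (rng.random() - 0.5)
        # assumed binarisation rule: bit is 1 with probability sigmoid(step)
        new_bits.append(1 if rng.random() < sigmoid(step) else 0)
    return new_bits


def bfa(fitness, dim, n_fireflies=10, iterations=30, seed=0):
    """Minimise `fitness` over bit lists of length `dim`."""
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_fireflies)]
    best = min(swarm, key=fitness)
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fitness(swarm[j]) < fitness(swarm[i]):
                    swarm[i] = move_firefly(swarm[i], swarm[j], rng=rng)
        best = min(swarm + [best], key=fitness)
    return best
```

A call such as `bfa(lambda bits: bits.count(0), dim=8)` searches for the all-ones string; the best-so-far solution is kept separately because the stochastic binarisation can move a good firefly away from its position.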

C. FLOWER POLLINATION-INSPIRED
The Flower Pollination Algorithm (FPA) is a nature-inspired, population-based algorithm which, depending on the definition, might not fit in the swarm group. Its inspiration differs from that of most swarm methods because it does not come from the collective behaviour of groups of animals [253]. However, as in the GSA case, the communication between the agents is cooperative, instead of generational or competitive.
The Binary Flower Pollination Algorithm (BFPA) was inspired by the natural pollination process of flowering plants and is the binary counterpart of the FPA introduced by Yang [254]. This algorithm is a case of the Continuous-Continuous method, since the solution is first calculated as a real vector and then mapped into a binary string. At the same time, the process of changing the position is performed using real numbers.
The main topics addressed are optimal reproduction and the survival of the best-adapted plant. Accordingly, the algorithm follows four basic rules [46], [128]: i) Biotic cross-pollination is considered global pollination, and pollen-carrying pollinators move according to Lévy flights; ii) Self-pollination or abiotic pollination is viewed as local pollination; iii) Pollinators, such as insects, can develop flower constancy, which means that the reproduction probability is proportional to the similarity of the two flowers involved; iv) The switching between local and global pollination is controlled by the switching probability p ∈ [0, 1]. Due to physical proximity and other factors, such as the wind, local pollination has a more significant fraction p.
Rules i and iii are related to global pollination. In this case, the pollen from the flowers is carried by pollinators (insects, wind, etc.), which allows it to travel long distances. This process is described by (14):
$$x_i^{t+1} = x_i^{t} + \alpha L(\lambda)\,(gbest - x_i^{t}) \tag{14}$$
where
$$L(\lambda) \sim \frac{\lambda\,\Gamma(\lambda)\,\sin(\pi\lambda/2)}{\pi}\,\frac{1}{s^{1+\lambda}}, \quad s \gg s_0 > 0 \tag{15}$$
in which s is the step size, $x_i^t$ is the pollen i at iteration t, gbest is the best position achieved so far, α is the variable which controls s, $L(\lambda)$ is the Lévy flight step size (strength of the pollination) and $\Gamma(\lambda)$ stands for the gamma function, with λ ∈ [1, 2].
In (15), observe that the distribution is valid for large steps s > 0, and $s_0$ is suggested to be 0.1. Indeed, some papers omit α, as in [129]. The local pollination (rule ii), in turn, is defined by (16), where $x_j^t$ and $x_i^t$ are pollen from different flowers of the same species. Finally, rule iv is addressed through the switching probability p, which selects between local and global pollination.
Following the basic rule proposed for the first BPSO, Rodrigues et al. [46] again suggest applying the sigmoid function to transform the pollen into a binary vector, using (2) to generate $S(x_{i,d}^{t})$; the new position is given by (3). Algorithm 3 presents the complete steps to implement the BFPA.
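One BFPA iteration, as described by the four rules, might be sketched as follows. Several points are assumptions rather than details from this section: the Lévy step is drawn with Mantegna's algorithm (a common sampling recipe, swapped in here because the section gives only the distribution), the local pollination is assumed to have the form $x_i + \epsilon (x_j - x_k)$ with $\epsilon \sim U(0, 1)$, the sigmoid of (2) is assumed to be the logistic function, and rule (3) is assumed to set a bit to ''1'' when a uniform random number is below the sigmoid value. All function names and default values are hypothetical, and the selection step that keeps only improving pollen is left out.

```python
import math
import random


def sigmoid(x):
    """Logistic function with clamping, since Lévy steps can be very large."""
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def levy_step(lam, rng):
    """Draw one Lévy-distributed step using Mantegna's algorithm (assumption)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    while v == 0:  # avoid a division by zero in the (very unlikely) exact-zero case
        v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / lam)


def bfpa_step(pollen, gbest, p=0.8, alpha=1.0, lam=1.5, rng=random):
    """Return a new binary population after one global/local pollination pass."""
    n = len(pollen)
    new_population = []
    for x in pollen:
        if rng.random() < p:
            # global pollination: Lévy flight towards gbest, one step per dimension
            cont = [xd + alpha * levy_step(lam, rng) * (gd - xd)
                    for xd, gd in zip(x, gbest)]
        else:
            # local pollination between two randomly chosen flowers (assumed form)
            j, k = rng.sample(range(n), 2)
            eps = rng.random()
            cont = [xd + eps * (a - b)
                    for xd, a, b in zip(x, pollen[j], pollen[k])]
        # sigmoid mapping of the continuous pollen to a binary vector
        new_population.append([1 if rng.random() < sigmoid(c) else 0 for c in cont])
    return new_population
```

In a full run, each new pollen would typically replace the old one only if its fitness improves, and gbest would be refreshed after every pass.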
The work from Dahi et al. [41] discusses the application of five techniques to map the continuous solution into a binary string: nearest-integer, normalisation, angle modulation, search process and the traditional sigmoid function. They conclude that the V-Shaped proposals can achieve better results.

Algorithm 3 BFPA Pseudocode
Initialise the N flower/pollen gametes randomly
Evaluate the population and define the best initial solution gbest
Determine p ∈ [0, 1] as the switch probability
while stop criterion is not reached do
    for each pollen gamete do
        if rand < p then // Global pollination
            Draw a (D-dimensional) step vector L which obeys a Lévy flight distribution, as in (15)

Undoubtedly, Particle Swarm Optimization (PSO), proposed in 1995 by Kennedy and Eberhart [14], is the best-known and most widely used swarm-based algorithm in the literature. The biological metaphor that inspired the algorithm was the collective intelligence of flocks of birds or schools of fish, simulating their social behaviour.
The BPSO is characterised by simple rules of information sharing between individual agents. The agent is called a particle, and a population of particles is named as a swarm. Each of them is assumed to be a location in the multi-dimensional search space or a candidate solution for the addressed problem. As usual, a particle is associated with a performance measure, the fitness.
Another critical remark is that the particle's position is changed based on its best position achieved so far (self-experience) and the best position found by some particle in its neighbourhood (the group experience). Sometimes, the entire swarm can be defined as the neighbourhood. Hence, an emergent complex global behaviour arises. The general steps of the BPSO are described in Algorithm 4.
In our search, we found just a few papers which address the Continuous-Continuous paradigm. In this case, the velocity and the position of the particles are vectors containing real numbers. The conversion into binary strings occurs in the particles' position immediately before the fitness evaluation.

Algorithm 4 BPSO Pseudocode
Initialise all particles' positions x_i^0 randomly with ''0'' and ''1'' as the values
Initialise the particles' velocities v_i^0
Initialise the particles' best-known positions pbest_i^0 with their initial positions
Let fit(x_i) be the fitness of particle i: evaluate the fitness of the whole swarm
Set gbest^0 as the position of the particle which has the best fitness
while stop criterion is not reached do
    for each particle i = 1 to N do
        for each particle dimension d = 1 to D do
            Update the particle's position
            Update the particle's fitness fit(x_i)
            if fit(x_i) < fit(pbest_i) then
                Update the particle's best-known position: pbest_i = x_i
            end if
            if fit(x_i) < fit(gbest) then
                Update the swarm's best-known position: gbest = x_i
            end if
        end for
        Update the particle's velocity
    end for
end while
Output the best solution
Two papers from Yassin et al. use the original PSO with both position and velocity as real vectors [47], [48]. They state that the particles' positions $x_{i,d}^{cont}$ are the probabilities of flipping a bit in a binary string. Therefore, the binary positions are generated from the continuous position using (17), where $\bar{x}_{i,d}^{t}$ denotes flipping the bit in dimension d of $x_{i,d}$. Babu et al. [49] proposed a similar way to solve phasor measurement unit (PMU) problems. The normalisation method [41], [50] is another way to use the current paradigm; the binary vector is then created using (19).

Binary Coupled Spring Forced Multiagent Coordination Optimization (BCSFMCO) is an algorithm developed by Zhang and Hui [255]. They combined a PSO communication topology with multi-agent consensus protocols from control theory to create this method. The problems addressed as case studies are benchmark tasks and topology design for multi-agent formation control. The comparative analysis includes some PSO versions. The same authors developed the Binary Hybrid Multiagent Swarm Optimization Algorithm (BHMSO) [256], also inspired by multi-agent consensus protocols from control theory. Benchmark problems were addressed and solved, and the proposal proved superior to the BPSO versions.

APPENDIX B BINARY-CONTINUOUS ALGORITHMS

A. BEE-INSPIRED
There are different ways to represent solutions and displacements for bee-inspired algorithms. The approach from Lu et al. [36] uses an initial position x i in the binary space. However, the new food source v i initially is in the real domain -a Binary-Continuous representation.
The food sources are randomly generated by randbin(0, 1), and $v_i^{cont}$ is calculated directly from (8). The elements of the resulting vectors $v_i^{cont}$ lie in the range [−1, 1]. Then, the hyperbolic tangent function $\tanh(v_{i,d}^{cont})$ described in (4) is used to perform the mapping, and the binary $v_i$ is given by (20). After $v_i$ is determined, a greedy selection is applied between it and $x_i$, and the one with the better fitness value remains in the next iteration. A similar strategy was used by Wei et al. [54], but instead of directly applying (20), they suggest adopting the round function, that is, the nearest integer, to generate $v_i$ from $v_i^{cont}$. Like the employed bees, the onlooker and scout bees also apply the transfer function to their positions.

1) ANT-INSPIRED
The Binary Ant Colony Optimization (BACO) is a binary version of the previous ACO algorithm introduced by Dorigo [55], developed to solve integer problems like scheduling or routing [55], [56]. It is inspired by the behaviour of ant colonies searching for the shortest path to a food source [60]. In nature, ants randomly select a way to reach some food source. On finding it, they release pheromone trails on their way back to the nest as a manner to communicate with other members of the colony [116], [257]. The pheromone evaporates over time, and its concentration decreases with the path length. The ants are more likely to follow the trails that present the highest pheromone concentrations [59], [61].
BACO differs from several swarm-based algorithms because the optimum solution is stored in the environment (trails) instead of in the agent [116]. The ants move in a graph whose vertices (nodes) consist of the bits ''0'' and ''1,'' or of the state transitions of every bit [58]. Algorithm 5 presents the steps of the BACO. At each iteration, an ant travels through all nodes to build a candidate solution. The ant departs from a randomly selected node and travels through the digraph along the arcs. Its trace generates a binary string with D bits and, by this means, the colony constructs a group of candidate solutions. The ants perform D walks to form a complete solution. Figure 7 presents the search space of an ant. Note that changing (or flipping) a bit is also called a state transition, which means that a state 0 becomes a state 1 or vice versa [57]- [59]. The initial concentration of pheromone is usually the same on all edges. The index of an agent is i. Here, (d, 0) and (d, 1) are the edges which link the current node d, where the ant is located, to the next node d + 1. Some authors also consider the next node as a variable.
While moving, each ant decides the next node (bit 0 or bit 1) depending on the amount of pheromone on the path and the visibility from the current node to the next. Therefore, the probability of going to ''0'' is calculated as in (21), and that of going to ''1'' is given by (22) [120]:
$$p_{d,0} = \frac{\tau_{d,0}^{\alpha}\,\eta_{d,0}^{\beta}}{\tau_{d,0}^{\alpha}\,\eta_{d,0}^{\beta} + \tau_{d,1}^{\alpha}\,\eta_{d,1}^{\beta}} \tag{21}$$
and
$$p_{d,1} = \frac{\tau_{d,1}^{\alpha}\,\eta_{d,1}^{\beta}}{\tau_{d,0}^{\alpha}\,\eta_{d,0}^{\beta} + \tau_{d,1}^{\alpha}\,\eta_{d,1}^{\beta}} \tag{22}$$
where $\tau_{d,0}$ is the artificial pheromone trail, or pheromone density, of the edge which leads to ''0'' and $\tau_{d,1}$ is that of the edge which leads to ''1,'' $\eta_{d,0}$ and $\eta_{d,1}$ are the visibility densities of each edge, and α and β are the relative importance of the pheromone and the visibility, respectively. The majority of the selected papers in this work mention a heuristic function, which can be seen as the visibility densities in some problems. However, they disregard these parameters in their applications, as in [60] and [57]. Other works, like [61] and [58], define this probability without mentioning the visibility. In both cases, (21) can be rewritten as (23) [61]:
$$p_{d,0} = \frac{\tau_{d,0}}{\tau_{d,0} + \tau_{d,1}} \tag{23}$$
The update of the pheromone on each edge at iteration t is given by (24) and (25), respectively, in which ρ ∈ [0, 1] is the evaporation rate and $\Delta\tau_{gbest}^{t}$ is the incremental amount of pheromone, calculated via (26) [120]. This increment is non-zero if the arc from d to (0 or 1) is in the trace and 0 otherwise, where fit(gbest) is the best fitness value. The literature presents some important variations in the calculation of the parameters of the algorithm. Kuo et al. [61] introduce a convergence factor $cf^{t} \in [0, 1]$, calculated via (27). This step is performed after the update of the pheromone trail.
This factor is proportional to the difference between $\tau_{d,0}$ and $\tau_{d,1}$. The authors suggest that when cf is close to 1 after t iterations, the ants are trapped in a local optimum. Therefore, the pheromone values are reinitialised, and the algorithm is restarted. Then, the pheromone update is given by (28) and (29), in which the deposited amount, defined in (30), equals $w_{ib} s_{ib} + w_{rb} s_{rb} + w_{gb} s_{gb}$ if the arc from d to (0 or 1) is in the trace and 0 otherwise. Here, $s_{ib}$ is the best solution achieved so far, $s_{rb}$ is the best solution achieved during the current iteration and $s_{gb}$ is the best solution found since the last re-initialisation of the pheromone values. The weights w set the importance of each component of $S_{gbest}^{t}$.
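The construction of a solution and the pheromone update can be illustrated with a short Python sketch. It uses the simplified transition rule without visibility, in which the probability of choosing a bit is its share of the pheromone on the two edges. The update rule is an assumption: every edge evaporates by a factor (1 − ρ) and the edges on the trace of the best solution receive a constant deposit, whereas the papers cited above compute the deposit from the fitness. The function names and default values are hypothetical, and lower fitness is treated as better.

```python
import random


def construct_solution(tau, rng):
    """One ant walk; tau[d] holds the pheromone of edges (d, 0) and (d, 1)."""
    bits = []
    for tau0, tau1 in tau:
        p0 = tau0 / (tau0 + tau1)  # simplified rule: pheromone share of edge (d, 0)
        bits.append(0 if rng.random() < p0 else 1)
    return bits


def update_pheromone(tau, best_bits, rho, deposit):
    """Evaporate every edge, then reinforce the edges on the best trace."""
    for d, bit in enumerate(best_bits):
        tau[d][0] *= 1.0 - rho
        tau[d][1] *= 1.0 - rho
        tau[d][bit] += deposit  # constant deposit: an illustrative assumption


def baco(fitness, dim, n_ants=10, iterations=50, rho=0.1, deposit=0.1, seed=0):
    """Minimise `fitness` over bit lists of length `dim`."""
    rng = random.Random(seed)
    tau = [[0.1, 0.1] for _ in range(dim)]  # equal initial pheromone on all edges
    gbest = None
    for _ in range(iterations):
        for _ in range(n_ants):
            candidate = construct_solution(tau, rng)
            if gbest is None or fitness(candidate) < fitness(gbest):
                gbest = candidate
        update_pheromone(tau, gbest, rho, deposit)
    return gbest, tau
```

Because the solution is stored in the trails, reading off the edge with the larger pheromone value in each dimension of the returned `tau` gives the path that the colony currently favours.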

2) BAT-INSPIRED
The Bat Algorithm (BA) mimics the collective intelligence from a group of bats [39]. The Binary BA (BBA) is the adaptation of the original continuous BA, introduced by Yang in 2010 [62], to deal with binary variables. In Algorithm 6, we describe all steps to be followed in the BBA.
The key characteristic of bats is their advanced capability of echolocation. The species are divided into two subtypes: megabats and microbats. The natural echolocation search mechanism of the microbats is the main inspiration of the BA [123]. Echolocation via ultrasonic pulses produces an echo, which is used to define the location, the exact distance and the measurements and qualities [122].

Algorithm 5 BACO Pseudocode
Initialise randomly the pheromone trails with small positive values
while stop criterion is not reached do
    for all ants do
        Compute transition probabilities using (21) and (22)
        for all dimensions do
            Draw where to go using the calculated probabilities
        end for
    end for
    Update pheromone trails
    Evaporate pheromone trails
end while
Output the best solution (the way with the maximum pheromone levels)

Using echolocation, the bats can distinguish food (or prey) from background barriers. An artificial bat flies randomly and has a position ($x_i$), a velocity ($v_i$), a fixed frequency $f_{min}$ and a loudness $A_0$ to search for prey. The loudness varies from a significant (positive) value $A_0$ to a minimum constant value $A_{min}$. As in nature, the artificial bat can automatically tune the frequency of its emitted pulses and adjust the rate of pulse emission per, depending on the proximity of its target (notice that some papers name per as r). The frequency F of a bat i is defined according to (31):
$$F_i = F_{min} + (F_{max} - F_{min})\,\beta \tag{31}$$
where $F_{max}$ and $F_{min}$ are the limits within which the frequency must lie and β = rand(0, 1). The displacement of an agent is defined based on its velocity, described in (32):
$$v_{i,d}^{t+1} = v_{i,d}^{t} + (x_{i,d}^{t} - gbest_d)\,F_i \tag{32}$$
where gbest is the best solution obtained so far, as usual. According to Amine et al. [39], the new position of a bat is defined by applying the sigmoid function to the velocity using (2), generating $S(v_{i,d}^{t})$. Then, the position is given by (3). In the same way, Basetti et al. [63] proposed the use of the V-shaped transfer function to update the position based on the velocity. They applied (5), generating $V(v_{i,d}^{t})$, and the position is updated according to (6). A local search scheme is evaluated using the loudness A and the pulse emission rate per, calculated using (33) and (34), respectively:
$$A_i^{t+1} = \alpha A_i^{t} \tag{33}$$
$$per_i^{t+1} = per_i^{0}\left[1 - e^{-\gamma t}\right] \tag{34}$$
where, in general, $A_i^0 \in [1, 2]$ and α is a constant, and $per_i^0 \in [0, 1]$ and γ is a constant. These two quantities are used to determine whether a local search will be performed during a given iteration.
In 2018, a novel binary bat algorithm (NBBA) was proposed to solve the 0-1 knapsack problem [64]. Different from [30], NBBA applies a rough set scheme (RSS), a one-to-one strategy and a multi-V-shaped transfer function. Using these three mechanisms, its diversity and convergence outperform those of the BBA.

Algorithm 6 BBA Pseudocode
Initialise randomly the bat population (x^0), velocities v_i, pulse frequency F, pulse rates per_i, and loudness A_i
Evaluate the bats by calculating their fitness
Set gbest as the particle with the lowest fitness
while stop criterion is not reached do
    for all bats do
        Let x_i be the current position of the bat
        if rand > per_i then
            Generate a local solution (x_i^new) around gbest by flipping one of its dimensions
        else
            Update velocity using (32)
            Adjust frequency by means of (31)
In 2020, Zhang [65] published another version of the binary bat-inspired algorithm, called the Binary Cooperative Bat searching Algorithm (BCBA), in which four different transfer functions are tested, and an optimal topology is proposed as a trade-off between effectiveness and convergence.
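A compact Python sketch of the BBA with a V-shaped transfer function is given below. It rests on assumptions that are not stated in this section: the frequency is drawn uniformly between its limits, the velocity is updated from the difference between the current position and gbest scaled by the frequency, the V-shaped function is taken as $|\frac{2}{\pi}\arctan(\frac{\pi}{2} v)|$ with a bit flipped when a uniform random number falls below it, and a candidate is accepted only if it improves the bat and a random number is below the loudness, after which the loudness decays and the pulse rate grows in the usual Bat Algorithm fashion. Names and default values are hypothetical; lower fitness is treated as better.

```python
import math
import random


def v_shaped(v):
    """V-shaped transfer function (assumed form); maps any velocity into [0, 1)."""
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * v))


def bba(fitness, dim, n_bats=10, iterations=40, f_min=0.0, f_max=2.0,
        a0=1.5, per0=0.5, alpha=0.9, gamma=0.9, seed=0):
    """Minimise `fitness` over bit lists of length `dim`."""
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loudness = [a0] * n_bats
    pulse_rate = [per0] * n_bats
    gbest = min(x, key=fitness)[:]
    for t in range(1, iterations + 1):
        for i in range(n_bats):
            if rng.random() > pulse_rate[i]:
                # local search: flip one randomly chosen dimension of gbest
                candidate = gbest[:]
                d = rng.randrange(dim)
                candidate[d] = 1 - candidate[d]
            else:
                freq = f_min + (f_max - f_min) * rng.random()
                candidate = x[i][:]
                for d in range(dim):
                    v[i][d] += (x[i][d] - gbest[d]) * freq
                    if rng.random() < v_shaped(v[i][d]):
                        candidate[d] = 1 - candidate[d]  # flip with probability V(v)
            # acceptance rule: an assumption borrowed from the continuous BA
            if rng.random() < loudness[i] and fitness(candidate) < fitness(x[i]):
                x[i] = candidate
                loudness[i] *= alpha
                pulse_rate[i] = per0 * (1.0 - math.exp(-gamma * t))
            if fitness(x[i]) < fitness(gbest):
                gbest = x[i][:]
    return gbest
```

With a V-shaped function, a velocity near zero leaves the bit unchanged, whereas a sigmoid would still assign it a value of ''1'' with probability 0.5; this is the practical difference between the two mapping families mentioned above.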

3) CAT-INSPIRED
The Binary Cat Swarm Optimization (BCSO) is inspired by the ability of domestic cats to hunt and to stay alert to possible dangers [66]. According to Sharafi et al. [28], cats spend most of their time resting when they are awake. In this case, they change their position carefully and slowly or do not move. However, for the rest of the time, the cats are tracing targets [125]. The first proposal that we found was from Chu et al. [258], which is adequate to solve continuous problems.
The BCSO defines two modes of behaviour: seeking mode and tracing mode. In the first, the motions of the cats are slow and near the original position. Biologically, it corresponds to the resting state of the cats. In the second, a cat moves according to its velocity.
The mixture ratio MR is a parameter that defines the percentage of cats that will perform the tracing mode; the remaining cats will be in the seeking mode. In each iteration, the cats are randomly selected based on the MR. The complete steps of the BCSO are summarised in Algorithm 7, and we describe both modes of the BCSO separately.

a: SEEKING MODE
The cats' displacement is usually slow and near the current position. A cat can look around and seek the next position. In this mode, there are four factors [66]: i) seeking memory pool (SMP): the size of the seeking memory for each cat, a positive integer SMP ∈ (1, N). It can be seen as the number of identical copies (clones) generated from a given solution; ii) probability of mutation operation (PMO): the parameter which defines the mutation probability for the selected dimensions. In this case, 0% < PMO ≤ 100%; iii) counts of dimension to change (CDC): how many dimensions will be selected for possible mutation. Note that 1 ≤ CDC < D. Usually, it is presented as a percentage; iv) self-position considering (SPC): a boolean flag which defines whether or not the current position of a cat will be one of the candidates in the seeking mode. Its state can be true or not true. Then, the seeking mode is performed according to the steps described below [69]:
Step 1: Consider the SPC flag. If it is set as true, produce SMP − 1 copies of the current position of each cat and take the current position as one of the candidates. If the SPC flag is not true, make SMP copies of the current position of each cat;
Step 2: For each copy, select CDC dimensions and mutate (flip) them according to PMO;
Step 3: Evaluate all candidate copies, calculating their fitness;
Step 4: Calculate the selecting probability of each candidate according to (35) [67], apply a roulette wheel method and replace the current position by the selected one:
$$P_i = \frac{|fit_i - fit_b|}{fit_{max} - fit_{min}} \tag{35}$$
where $fit_i$ is the fitness of the i-th cat, $fit_b = fit_{max}$ if we are working on a minimisation problem and $fit_b = fit_{min}$ if we are working on a maximisation problem. Some authors [66], [68] propose a greedy selection between the copies, instead of applying the roulette wheel.
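The four steps of the seeking mode translate almost directly into code. The sketch below uses the greedy selection variant instead of the roulette wheel, so it does not depend on the selection probabilities, and it assumes a minimisation problem. PMO is expressed as a probability in [0, 1] and CDC as a number of dimensions; the function name `seeking_mode` and the default values are hypothetical.

```python
import random


def seeking_mode(cat, fitness, smp=5, cdc=2, pmo=0.5, spc=True, rng=random):
    """Return the best candidate generated around `cat` (greedy selection)."""
    # Step 1: the current position is itself a candidate when SPC is true
    candidates = [cat[:]] if spc else []
    n_copies = smp - 1 if spc else smp
    for _ in range(n_copies):
        copy = cat[:]
        # Step 2: choose CDC dimensions and flip each one with probability PMO
        for d in rng.sample(range(len(cat)), cdc):
            if rng.random() < pmo:
                copy[d] = 1 - copy[d]
        candidates.append(copy)
    # Steps 3 and 4: evaluate the candidates and keep the one with the lowest fitness
    return min(candidates, key=fitness)
```

When the SPC flag is true, the returned position is never worse than the current one, because the unmodified position competes with its mutated copies.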

b: TRACING MODE
In tracing mode, a cat is metaphorically tracing targets, moving towards the best solution. Therefore, the next movement is determined based on the cat's velocity and the best position found by the rest of the swarm [67]. The notion of velocity in the BCSO is different from the PSO. In this case, the velocity is the probability of a dimension being changed (flipped).
This mode presents the following steps [68]. First, select the best-positioned cat (highest fitness) and name it as gbest.
Define $v_i^0$ and $v_i^1$ as the vectors representing the intermediate velocities of each cat. The first is the probability of a bit changing to ''0,'' and the other is the probability of a bit changing to ''1.'' Note that they are not complementary [69]. The update process for each dimension happens according to (36).
where w is the inertia weight and $d_{i,d}^1$ and $d_{i,d}^0$ are temporary values, updated using gbest as a guide, as in (37).
where $r_1 = rand(0, 1)$ and $c_1$ is a constant defined by the user.
According to the position of cat $x_i$, its velocity (the probability of changing the d-th bit of the i-th cat) is calculated by (38), in which the velocity is bounded by $[v_{min}, v_{max}]$. In the last step, the positions of the swarm are updated. As often happens, it is necessary to apply the sigmoid function defined in (2) to $v_{i,d}$, generating the values $S(v_{i,d})$. In the proposal from Sharafi et al. [28], the new value of each dimension is updated as in (39). On the other hand, Mohamadeen et al. [66] suggest another way to perform this using (40).
where $\bar{x}_{i,d}^{t}$ denotes flipping the bit in dimension d, that is, the 1's complement of $x_{i,d}$.

4) GRAVITATIONAL SEARCH ALGORITHM
The Binary Gravitational Search Algorithm (BGSA) was inspired by Newton's laws of motion and gravity and is based on the metaphor of gravitational interaction between masses [70]. Note that the algorithm is not a classic swarm-based approach, since it is not inspired by the collective behaviour of groups of animals [130]. However, we decided to discuss the method since the mechanisms of collaboration between the agents are cooperative, as in the swarm approaches [37].
The BGSA is a nature-inspired proposal, introduced by Rashedi et al. [51], and is presented in Algorithm 8. In this case, the displacement is a real vector, while the position is always a binary string. The agent is a mass. The force acting on mass i due to mass j at iteration t, considering dimension d, is described by (41) [72], where $M_i$ and $M_j$ are the masses of agents i and j, respectively (updated according to (47) ahead), ε is a small positive constant, $R_{ij}^{t}$ is the distance between these agents, which follows the subtraction method described in (1) [73], and $G^{t}$ is the gravitational constant, calculated by (42):
$$G^{t} = G_0\, e^{-\alpha t / T_{max}} \tag{42}$$
in which $G_0$ is the initial gravitational constant, $T_{max}$ is the total number of iterations and α is an exponential decay constant. Observe that G gradually decreases over time. The distance $R_{ij}^{t}$ can also be defined in two other ways. The first is the Hamming distance, calculated in (43), similar to (13) [37]:
$$R_{ij}^{t} = \sum_{d=1}^{D} x_{i,d}^{t} \otimes x_{j,d}^{t} \tag{43}$$
Another possibility is to divide the result of (43) by D, generating a normalised Hamming distance [71]. The total gravitational force on an individual i in the d-th dimension is the sum of all forces provided by the other masses, which is given by (44) [73]. The acceleration $a_{i,d}^{t}$ of mass i in the dimension d at iteration t is given by (45), and the velocity is updated using (46). The last step updates the gravitational mass, which is done by (47), where $fit^{t}(x_i)$ is the current fitness of $x_i$, while the variables $best^{t}$ and $worst^{t}$ are the fitness of the best and worst masses, selected according to (49) and (50). We highlight the similarity between (48) and (35) from the BCSO. The first way to map the velocity into a binary position vector applies the sigmoid function described in (2), generating $S(v_{i,d})$ [72]. Then, the positions are updated using (3). Another possibility, discussed by Ji et al. [73] and by Nezamabadi [37], performs the transformation by the use of (4) (creating $|\tanh(v_{i,d})|$) and (6).

Algorithm 7 BCSO Pseudocode
    Select MR% of the cats to perform the tracing mode and consider that the other cats are in seeking mode
    Evaluate each cat and save gbest
    for all cats do
        if x_i^t is in seeking mode then
            if SPC flag is true then
                Produce SMP − 1 copies of the present position of each cat and take the current position as one of the candidates
            else
                Make SMP copies of the present position of each cat
            end if
            for all copies do
                Select CDC dimensions
                Randomly mutate these CDC dimensions according to PMO
            end for
            Evaluate the fitness of all copies
            Apply the roulette wheel method, select one candidate and replace the current position by it
        end if
        if x_i^t is in tracing mode then
            Calculate the intermediate velocities using (36) and (37)
            Update the cat's velocity utilising (38)
            Update the cat's position as a binary vector
        end if
    end for
end while
Output the best solution (gbest)

Algorithm 8 BGSA Pseudocode
Initialise randomly the N masses/agents
Evaluate the fitness and define best and worst using (49) and (50), respectively
while stop criterion is not reached do
    Update G using (42)
    for all masses do
        Evaluate the fitness
        Calculate the gravitational mass by means of (47) and (48)
        Calculate the acceleration using (44) and (45)
        Update velocity utilising (46)
        Update position using some mapping method to have a binary vector
    end for
    Update best and worst using (49) and (50), respectively
end while
Output the best solution
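Three of the building blocks described above are simple enough to sketch in isolation: the decaying gravitational constant, the normalised Hamming distance and the gravitational masses. The mass formulas are assumptions based on the usual Gravitational Search Algorithm formulation, in which each fitness is first rescaled between the worst and the best value and the results are then normalised to sum to one. The default values of `g0` and `alpha` are illustrative only, and all function names are hypothetical.

```python
import math


def gravitational_constant(t, t_max, g0=100.0, alpha=20.0):
    """Exponentially decaying gravitational constant G at iteration t."""
    return g0 * math.exp(-alpha * t / t_max)


def normalised_hamming(a, b):
    """Hamming distance between two bit lists divided by the dimension D."""
    return sum(u ^ v for u, v in zip(a, b)) / len(a)


def gravitational_masses(fits, minimise=True):
    """Map fitness values to masses that sum to one (assumed GSA-style rule)."""
    best = min(fits) if minimise else max(fits)
    worst = max(fits) if minimise else min(fits)
    if best == worst:
        # all agents are equally good: share the mass evenly
        return [1.0 / len(fits)] * len(fits)
    raw = [(f - worst) / (best - worst) for f in fits]
    total = sum(raw)
    return [m / total for m in raw]
```

For the fitness values 1, 2 and 3 in a minimisation problem, the best agent receives two thirds of the total mass and the worst agent receives none, so the worst agent attracts nobody while still being pulled by the others.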

5) WOLF-INSPIRED
The Grey Wolf Optimizer (GWO) was inspired by the leadership hierarchy and hunting mechanism of grey wolves [52]. These animals have a strict social dominance hierarchy that is divided into alpha (α), betas (β), deltas (δ) and omegas (ω). The most dominant individual is the alpha, and the lowest one is the omega. The algorithm mimics the hunting mechanism of wolves in which, guided by the alpha, the pack recognises the location of the prey, encircles it and attacks.
Emary et al. [7] introduced the algorithm in the binary domain, proposing the Binary Grey Wolf Optimizer (bGWO). The general BGWO algorithm is shown in Algorithm 9. They proposed two different approaches to solve feature selection, bGWO1 and bGWO2. In bGWO1, the central update equation is shown in (51):
$$x_i^{t+1} = Crossover(x_1, x_2, x_3) \tag{51}$$
where $Crossover(x_1, x_2, x_3)$ is a crossover between three solutions $x_1$, $x_2$ and $x_3$ that represent the effect of alpha, beta and delta on the current wolf, respectively. The combined effect of $x_1$, $x_2$ and $x_3$ is calculated using (52):
$$x_d = \begin{cases} a_d & \text{if } rand < \tfrac{1}{3} \\ b_d & \text{if } \tfrac{1}{3} \leq rand < \tfrac{2}{3} \\ c_d & \text{otherwise} \end{cases} \tag{52}$$
where $a_d$, $b_d$, $c_d$ are the binary values of the d-th dimension for each parameter.
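The stochastic crossover of bGWO1 can be sketched as below, assuming that each dimension of the new position is copied from $x_1$, $x_2$ or $x_3$ with equal probability, which is consistent with the impact rate of 0.33 per leader mentioned for this algorithm. The function name `crossover` is hypothetical.

```python
import random


def crossover(x1, x2, x3, rng=random):
    """Build a new bit list by taking each dimension from one of three parents."""
    child = []
    for a, b, c in zip(x1, x2, x3):
        r = rng.random()
        if r < 1.0 / 3.0:
            child.append(a)    # effect of the alpha wolf
        elif r < 2.0 / 3.0:
            child.append(b)    # effect of the beta wolf
        else:
            child.append(c)    # effect of the delta wolf
    return child
```

Whenever the three parents agree on a dimension, the child inherits that bit with certainty, so the operator only explores the dimensions on which the leaders disagree.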

Algorithm 9 BGWO Pseudocode
Initialise the N wolves randomly
Find the α, β, δ solutions based on fitness
while stop criterion is not reached do
    for all wolves do
        Update wolf position
    end for
    Update a, A and C
    Evaluate the current position of individual wolves
    Update α, β, δ
end while
Output the best solution

In bGWO2, a different approach is proposed, in which only the position is converted to binary using (56).
where $x_d^{t+1}$ is the new binary position in dimension d at iteration t, and sigmoid(a) is defined in (57).
Jiang et al. [75] upgraded the convergence parameter (a), as shown in (58), in order to improve the algorithm. Hu et al. [76] also proposed changes in the a parameter, as described in (59), to improve feature selection using BGWO.
where e is Euler number and m is the maximum number of iterations.
Alzubi et al. [77] proposed a modified binary grey wolf optimizer (MBGWO). The proposal used omega information to reduce the impact rate of the best solutions from 0.33 (α, β and δ) to 0.25 (α, β, δ and ω). In this new approach, the new crossover operator is defined as shown in (60).
Luo et al. [78] introduced a new mechanism to highlight the leadership hierarchy by using a differentiated position updating strategy. In this approach, the leader wolves can only move if the new position is better than the previous one and other wolves unconditionally move to their new position.
A quantum-inspired binary grey wolf optimizer (QI-BGWO) was proposed by Srikanth et al. [79]. The usage of quantum concepts has improved the balance between exploration and exploitation in the problem.

6) BIRD-INSPIRED
Regarding the Binary PSO, the large number of papers yields a large number of distinct proposals. Formally, consider a population of N particles (swarm). The agent i has a current position $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,D})$ and a velocity $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,D})$. In the majority of the versions of BPSO, while the position is a binary string, the velocity is a continuous (real) vector limited to the interval $[-v_{max}, +v_{max}]$, where $v_{max}$ is defined by the user. The update of the velocity is performed according to (61):
$$v_{i,d}^{t+1} = w\,v_{i,d}^{t} + c_1 r_1 (pbest_{i,d}^{t} - x_{i,d}^{t}) + c_2 r_2 (gbest_{d}^{t} - x_{i,d}^{t}) \tag{61}$$
where w is the inertia weight, $c_1$ and $c_2$ are the cognitive and social coefficients, respectively, $r_1$ and $r_2$ are random numbers drawn from rand(0, 1), $pbest_{i,d}^{t}$ is the best position found by particle i until iteration t (the individual position that achieved the best performance index during the search process) and $gbest_{d}^{t}$ is the best position found by some neighbour of the same agent i according to the communication topology. The algorithm is often initialised by randomly spreading the particles over the search space. The same process is applied to generate the initial velocities, but, in some cases, the velocities are initialised as zero. Some papers, such as Phuangpornpitak and Tia [89], Unler and Murat [90] and Azad et al. [91], suggest a mechanism that makes the inertia weight decay linearly, as in (62):
$$w^{t} = w_{max} - (w_{max} - w_{min})\,\frac{t}{t_{max}} \tag{62}$$
in which $w \in [w_{min}, w_{max}]$ and $t_{max}$ is the maximum number of iterations [92]. The concept of inertia weight was in fact introduced by Shi and Eberhart [74] to balance the local and global search. Liu et al. [31] performed an extensive analysis of the parameter w for BPSO. Large values of inertia tend to display exploitation, and smaller values of inertia tend to display exploration. As exploration is usually recommended at the beginning of the iterations, a linearly increasing inertia weight is most likely to show better results. The opposite is suggested by Shi and Eberhart [74] for continuous optimisation, which highlights that, even with the same main mechanisms of PSO, binary and continuous optimisation, or different fitness functions, can require a different balance of parameters.
Another idea is presented in Pookpunt et al. [93] and Chanthaphavong and Chetty [95]. The proposals suggest that the acceleration coefficients ($c_1$, $c_2$) change linearly over the iterations, as in (63) and (64), respectively:
$$c_1 = c_{1i} + (c_{1f} - c_{1i})\,\frac{t}{t_{max}} \tag{63}$$
where $c_1$ changes from $c_{1i} = 2.5$ to $c_{1f} = 0.5$, and
$$c_2 = c_{2i} + (c_{2f} - c_{2i})\,\frac{t}{t_{max}} \tag{64}$$
in which $c_2$ is modified from $c_{2i} = 0.5$ to $c_{2f} = 2.5$.
The initial values are suggestions of the authors and can be adapted to each problem [94]. To transform the elements of the velocity into a binary vector, they proposed the use of a logistic sigmoid function, taking the current velocity as the argument of (2) to generate $S(v_{i,d})$. The position is then updated according to (3). On the other hand, Mirjalili et al. [96] and Kumar et al. [38] apply a V-shaped function to the velocity, $|V(v_{i,d}^{t})|$. Hence, the binary position is updated by means of (5) and (6).
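The pieces discussed so far (velocity update, linearly varying parameters and sigmoid mapping) can be combined in a short Python sketch. The assumptions are: the sigmoid of (2) is the logistic function, rule (3) sets a bit to ''1'' when a uniform random number is below the sigmoid of the velocity, the whole swarm is the neighbourhood, and lower fitness is better. The inertia weight here decays linearly, although, as noted above, an increasing schedule has also been recommended for BPSO. The names `linear` and `bpso` and the default values other than the coefficient ranges quoted above are hypothetical.

```python
import math
import random


def sigmoid(v):
    """Logistic function, assumed here to play the role of (2)."""
    return 1.0 / (1.0 + math.exp(-v))


def linear(start, end, t, t_max):
    """Linear interpolation from `start` (t = 0) to `end` (t = t_max)."""
    return start + (end - start) * t / t_max


def bpso(fitness, dim, n_particles=10, t_max=40, w_max=0.9, w_min=0.4,
         v_max=6.0, seed=0):
    """Minimise `fitness` over bit lists of length `dim`."""
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    gbest = min(pbest, key=fitness)[:]
    for t in range(t_max):
        w = linear(w_max, w_min, t, t_max)   # decaying inertia weight
        c1 = linear(2.5, 0.5, t, t_max)      # cognitive coefficient
        c2 = linear(0.5, 2.5, t, t_max)      # social coefficient
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][d]
                       + c1 * r1 * (pbest[i][d] - x[i][d])
                       + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, vel))  # keep within [-v_max, v_max]
                x[i][d] = 1 if rng.random() < sigmoid(v[i][d]) else 0
            if fitness(x[i]) < fitness(pbest[i]):
                pbest[i] = x[i][:]
                if fitness(pbest[i]) < fitness(gbest):
                    gbest = pbest[i][:]
    return gbest
```

Bounding the velocity matters in the binary setting: without the limit, the sigmoid saturates and the bits stop changing, which removes the exploration capability of the swarm.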
Modified versions of the V-shaped transfer function are also present in the literature, such as MBPSO [88]. Jiang et al. [87] proposed a binary version of PSO (NMBPSO) using a hybrid of the S-shaped and V-shaped transfer functions, which outperforms BPSO because NMBPSO provides local search capability even at later iterations. In the same way, Minzu et al. [86] suggested applying $|\tanh(v^t_{i,d})|$ from (4). Souza et al. [85] applied an alternative equation using the modulus of a sigmoid function, generating $|S_{alt}(v^t_{i,d})|$. This value replaces the probability $|\tanh(v^t_{i,d})|$ in (6). A modification called modified BPSO is shown in the papers by Menhas et al. [84] and Ko et al. [83]. In this case, the velocity is calculated without the inertia weight and the acceleration coefficients via (65).
The position is then updated according to (66), where $a \in [0, 1]$ is a static probability fixed as a constant, usually set to a = 0.5. Note that the velocity is constrained to the interval [0, 1].
The Novel BPSO is presented in the works of de Sousa et al. [82] and Puri and Hsiao [40]. The noticeable fact about this version is that it uses a velocity equation with similarities to the BCSO algorithm. To do so, it is necessary to define four temporary values: two regarding pbest and two regarding gbest. These values are calculated using (67) and (68), where w is the same inertia weight. Here, the final velocity can be interpreted as the probability of flipping the d-th bit of the i-th particle, which is performed using (70). Usually, the velocity is bounded by $v_{max}$. Then, the new position $x^{t+1}_{i,d}$ is given by (71) [40], in which $\bar{x}^t_{i,d}$ denotes flipping the bit in dimension d of $x^t_{i,d}$. Puri and Hsiao [40] and de Sousa et al. [82] mention that (71) can be replaced by the sigmoid function $S(v_{i,d})$ or the V-shaped proposal $V(v_{i,d})$, as in (2) and (5), respectively. Then, one can apply (3) or (6), as discussed.
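The Novel BPSO step can be illustrated with the following sketch. It follows the commonly cited formulation of this variant and is an assumption about the form of (67) to (71): two velocities per bit (`v1` for changing towards 1, `v0` for changing towards 0), temporary terms whose sign depends on the pbest and gbest bits, and a sigmoid that turns the selected velocity into a flip probability. All names are hypothetical, and the $v_{max}$ bound is omitted for brevity.

```python
import math
import random

def novel_bpso_step(x, v1, v0, pbest, gbest, w, c1, c2, rng=random):
    """One assumed Novel BPSO update for a single particle.

    Returns the new position and the two updated velocity vectors.
    """
    new_x, new_v1, new_v0 = [], [], []
    for d in range(len(x)):
        a = c1 * rng.random()  # magnitude of the pbest-related terms
        b = c2 * rng.random()  # magnitude of the gbest-related terms
        # A best bit of 1 rewards "change to 1" and penalises "change to 0".
        d1_p, d0_p = (a, -a) if pbest[d] == 1 else (-a, a)
        d1_g, d0_g = (b, -b) if gbest[d] == 1 else (-b, b)
        u1 = w * v1[d] + d1_p + d1_g
        u0 = w * v0[d] + d0_p + d0_g
        # The velocity of change depends on the current bit value.
        change = u1 if x[d] == 0 else u0
        p_flip = 1.0 / (1.0 + math.exp(-change))
        new_x.append(1 - x[d] if rng.random() < p_flip else x[d])
        new_v1.append(u1)
        new_v0.append(u0)
    return new_x, new_v1, new_v0
```

In this reading, a bit that already agrees with both pbest and gbest receives a strongly negative velocity of change, so its flip probability is small.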
Siqueira et al. [81] introduced the Double Swarm BPSO, which was inspired by the original BPSO and the steps of the BCSO. The authors divide the swarm into two sub-swarms, named the swarm of mutation and the displacement swarm. Half of the particles are placed in the displacement swarm using the roulette wheel, and the rest remain in the swarm of mutation. In the swarm of mutation, NC clones are created, and NM dimensions are flipped. The roulette wheel is then applied again to select the winner among the original agent and its clones. On the other hand, in the displacement swarm, the particles fly in the search space, following the velocity described in (72):
$$v^{t+1}_{i,d} = w\, v^t_{i,d} + rand(0, 1)\, c_1\, (2\, gbest^t_d - x^t_{i,d} - 1) \quad (72)$$
in which $c_1$ is a user-defined variable.
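Equation (72) translates directly into code. The sketch below is a literal transcription for one particle; the function name and the `rng` argument are illustrative additions.

```python
import random

def displacement_velocity(v, x, gbest, w, c1, rng=random):
    """Velocity of a displacement-swarm particle following (72)."""
    return [
        w * v[d] + rng.random() * c1 * (2 * gbest[d] - x[d] - 1)
        for d in range(len(x))
    ]
```

The factor $(2\,gbest^t_d - x^t_{i,d} - 1)$ equals 1 when gbest is 1 and the bit is 0, 0 when both are 1, and is negative when gbest is 0, so the velocity is pushed upwards only where the particle is missing a 1 that gbest has.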
The Competitive Swarm Optimizer (CSO) is also a kind of PSO in which the particles learn from randomly selected competitors and not from the global or the personal best position. The idea is to perform better in large-scale optimisation problems. A binary approach was introduced by Gu et al. [80]. In this work, the authors dealt with feature selection problems, considering the KNN classifier and benchmark problems.

APPENDIX C BINARY-BINARY ALGORITHMS
A. BEE-INSPIRED
Kiran and Gunduz proposed the first Binary-Binary ABC, the binABC, in 2013 [97]. In this case, the authors suggested generating the bees as binary strings, with each bit value set according to a probability p. Then, the positions of the employed and onlooker bees are updated following (73), where ⊗ is the ''XOR'' operator and the random term works as a NOT gate: if it is less than 0.5, the result obtained is inverted.
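A minimal sketch of this XOR-based move, assuming that (73) has the form "old bit XOR (randomly inverted (old bit XOR neighbour bit))" and that a single dimension `d` is updated per move, as in the standard ABC. The function name and the choice of a single dimension are assumptions.

```python
import random

def binabc_candidate(x_i, x_k, d, rng=random):
    """Candidate food source built from x_i and a neighbour x_k in dimension d."""
    candidate = list(x_i)
    diff = x_i[d] ^ x_k[d]   # 1 where the two sources disagree
    if rng.random() < 0.5:   # NOT gate applied with probability 0.5
        diff ^= 1
    candidate[d] = x_i[d] ^ diff
    return candidate
```

Without the inversion, the candidate copies the neighbour's bit; with the inversion, it takes the complement of the neighbour's bit.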
Another approach is presented by Wei and Hanning [54]. Initially, they set $x^{min}_d = 0$ and $x^{max}_d = 1$, and rand(0, 1) is replaced by randbin(0, 1), which means that each draw generates a bit 0 or 1. Similarly, Santana et al. [99] initialise the agents as 0 or 1 depending on whether rand(0, 1) is smaller than 0.5 or not. Depending on the problem, the initialisation can speed up the results; for instance, in a minimisation problem, starting with zeros in the majority of dimensions is generally beneficial.
Then, we can rewrite (7) as (74). We observe that the computational cost of applying (74), the most widely used option, is much smaller than that of applying (7).
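The binary initialisation described above can be sketched as follows. The parameter `p_one` (the probability of drawing a 1) is a hypothetical name: 0.5 corresponds to an unbiased randbin(0, 1), and smaller values give the zero-heavy start mentioned for minimisation problems.

```python
import random

def init_food_sources(n_sources, n_dims, p_one=0.5, rng=random):
    """Generate n_sources binary strings; each bit is 1 with probability p_one."""
    return [
        [1 if rng.random() < p_one else 0 for _ in range(n_dims)]
        for _ in range(n_sources)
    ]
```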
Once the food sources are generated, each employed bee moves to its food source and finds a new one, $v_i$, in its neighbourhood. To do so, Jia et al. [98] adapted the equation of the continuous space using logic gate operations, generating (75), which combines the ''XOR'' (⊗), ''AND'' and ''OR'' (⊕) operators, and in which j, with j ≠ i, is a randomly selected food source. The variable $\phi_{i,d}$ is defined by (76), where the parameter r controls the probability of generating ''0'' and ''1''. The binary value $\phi_{i,d}$ is more likely to be ''0'' when r takes a small value. Therefore, the new food source will be closer to the old one under the ''XOR'' operation. When using this procedure, we work in a binary space [98].
In Santana et al. [99], the employed and onlooker bees update their positions differently. If the chosen food sources have higher fitness than the bees and the chosen dimensions do not have the same value, the bees change those dimensions to the values found in the food sources; otherwise, the bees do not move. Santana et al. [99] also show the impact of different strategies on the algorithm.
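The two ideas in this passage can be illustrated with a short sketch. It does not reproduce (75): `term_bit` is a stand-in for whatever the gated part of that equation evaluates to, and the code only shows why $\phi = 0$ keeps the old bit under XOR. Drawing $\phi = 1$ with probability r is one choice consistent with the text, not necessarily the exact form of (76). The second function follows the verbal description of the move in Santana et al. [99]; all names are hypothetical.

```python
import random

def draw_phi(r, rng=random):
    """Return 1 with probability r and 0 otherwise, so a small r favours 0."""
    return 1 if rng.random() < r else 0

def gated_xor(x_bit, phi, term_bit):
    """XOR the old bit with (phi AND term_bit); phi = 0 leaves x_bit unchanged."""
    return x_bit ^ (phi & term_bit)

def copy_from_better_source(bee, source, bee_fitness, source_fitness, dims):
    """Copy the source's bits in dims only when the source has higher fitness."""
    new_bee = list(bee)
    if source_fitness > bee_fitness:
        for d in dims:
            if new_bee[d] != source[d]:
                new_bee[d] = source[d]
    return new_bee
```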
Aytimur et al. [100] compare three versions of the binary ABC: one using the sigmoid function, one using the crossover technique from the Genetic Algorithm and one using a Boolean technique based on the XOR operator. They show that the version using the crossover technique performed better than the others.

A Binary-Binary approach is introduced by Siqueira et al. [101], named Boolean BCSO, which presents some important differences in comparison with the BCSO. Firstly, in the seeking mode, the SPC flag is suppressed. Then, in the tracing mode, the new velocity $v^{t+1}_{i,d}$ is defined according to (77).
Finally, the last difference is the allocation of the MR cats in the tracing mode using the roulette wheel method.
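Roulette wheel selection, used here to allocate cats to the tracing mode, can be sketched as below. This is a generic fitness-proportional selection without replacement, not the authors' exact procedure: it assumes non-negative fitness values where larger is better and `k` no larger than the population size.

```python
import random

def roulette_select(fitness, k, rng=random):
    """Pick k distinct indices with probability proportional to fitness."""
    remaining = list(range(len(fitness)))
    chosen = []
    for _ in range(k):
        total = sum(fitness[i] for i in remaining)
        pick = rng.random() * total
        acc = 0.0
        winner = remaining[-1]  # fallback for zero totals or rounding
        for i in remaining:
            acc += fitness[i]
            if pick < acc:
                winner = i
                break
        chosen.append(winner)
        remaining.remove(winner)
    return chosen
```

For a minimisation problem, the fitness values would first have to be transformed so that better solutions receive larger weights.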
In 2020, Siqueira et al. [102] proposed a version of the BCSO called Simplified Binary Cat Swarm Optimization (SBCSO). SBCSO uses a new velocity and position strategy that outperforms binary versions of CSO, FSS, ABC, GA and PSO not only in accuracy but also in computational cost.

2) BIRD-INSPIRED
Suresh et al. [103] present an improved version of the BPSO, which is a Binary-Binary algorithm, to solve the maintenance scheduling problem. The same proposal is presented by Sedighizadeh et al. [104].
In this case, each particle is a binary string, and its position and velocity are updated using binary digital operations (Boolean gates), according to (79) and (80), respectively, where d is the current dimension, the gates are AND, XOR (''⊗'') and OR (''⊕''), and $r_1 = randbin(0, 1)$ and $r_2 = randbin(0, 1)$ are two randomly generated binary numbers.
An interesting variation appears in the works of Pu et al. [105], Bin et al. [106], Gomez et al. [107] and Pirhayati and Mazlumi [108], in which the authors swap the positions of the AND and XOR gates in (79) and (80).
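A Boolean update of this kind can be sketched with bitwise operators. The arrangement of the gates below is an assumption in the spirit of the Boolean PSO, since (79) and (80) are not reproduced here: the velocity bit is the OR of two AND-gated disagreement bits, and the position is XOR-ed with the velocity.

```python
import random

def boolean_pso_step(x, pbest, gbest, rng=random):
    """One assumed Boolean update; all quantities are bits (0 or 1)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1 = rng.randint(0, 1)  # randbin(0, 1)
        r2 = rng.randint(0, 1)
        # Velocity bit: set when a randomly enabled best position disagrees with x.
        vd = (r1 & (pbest[d] ^ x[d])) | (r2 & (gbest[d] ^ x[d]))
        new_v.append(vd)
        # Position: flip the bit wherever the velocity bit is 1.
        new_x.append(x[d] ^ vd)
    return new_x, new_v
```

Because every operation is a single gate on bits, no transfer function or floating-point arithmetic is needed, which is the source of the low computational cost attributed to Binary-Binary approaches.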
ELLIACKIN FIGUEIREDO received the B.Eng. degree in computer engineering from the University of Pernambuco, Brazil, in 2010, and the M.Sc. and Ph.D. degrees in computer science from the Federal University of Pernambuco (UFPE), Brazil, in 2013 and 2017, respectively. Since 2020, he has been a Brazilian Civil Servant and a Project Manager with the Innovation Laboratory of Information Technology (STI Labs), UFPE. In the STI Labs, he develops and delivers software solutions that meet the business needs of the university. His passion for agile methods is based on his belief that these methods enable the delivery of high-quality software products to customers and end-users. His research interests include artificial intelligence and data science as well as their applications in engineering and financial problems. He is also interested in software engineering and agile methods, such as XP and Scrum.
CLODOMIR SANTANA received the bachelor's degree in computer engineering from the Polytechnic School, University of Pernambuco, and the master's degree in systems engineering from the University of Pernambuco. He is currently pursuing the Ph.D. degree in computer science with the University of Exeter, U.K. During his undergraduate studies, he was awarded a Scholarship from the Brazilian Coordination of Superior Level Staff Improvement (CAPES) to be a Visiting Student at the Faculty of Engineering and Applied Science, Memorial University, Canada, for a period of 16 months. His master's dissertation received an honorable mention from the University of Pernambuco, in 2019. His research interests include bio-inspired metaheuristics, robot swarms, clustering techniques, multi-objective optimization, complex networks, human dynamics, and neural networks.